Media tools

This is my personal "Swiss army knife" for working with (usually large) collections of media. It is made up of a number of scripts that download media, extract samples from that media, extract features from those samples, and recombine them in various ways. For example, this will sort an audio file by frequency:

python -in "media/sample/sonata.mp3" -sort "hz=asc" -out "output/sonata_sorted_by_hz.mp3"

Where sort can be one of: tsne (spectral similarity), hz (frequency), power (volume), clarity (clear harmonic bands), dur (clip duration)
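To make the sort option concrete, here is a minimal sketch of how a "feature=direction" string such as "hz=asc" could be turned into an ordering over samples. The function names and the sample dictionaries are hypothetical and only for illustration; the actual scripts may parse and sort differently.

```python
def parse_sort(spec):
    """Split a spec such as "hz=asc" into (feature, descending)."""
    feature, _, direction = spec.partition("=")
    return feature, direction.lower() == "desc"


def sort_samples(samples, spec):
    """Return the samples ordered by the feature named in the spec."""
    feature, descending = parse_sort(spec)
    return sorted(samples, key=lambda s: s[feature], reverse=descending)


# Hypothetical samples: start and dur in milliseconds, hz in Hertz
samples = [
    {"start": 0, "dur": 120, "hz": 440.0},
    {"start": 120, "dur": 90, "hz": 220.0},
    {"start": 210, "dur": 150, "hz": 880.0},
]
ordered = sort_samples(samples, "hz=asc")
print([s["hz"] for s in ordered])
```

Playing the clips back in this new order is what produces the "sorted" audio file.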


Not all of these are required for individual scripts, but together they cover what's needed for most workflows.

  • Python (I developed using 3.6, so 3.6+ is recommended; the scripts may not work with 2.7)
  • SciPy for math functions (probably already installed)

Working with video

Working with audio


  • scikit-learn for statistics and machine learning features (e.g. TSNE, clustering, classification)
  • Multicore-TSNE for faster TSNE
  • RasterFairy for transforming point cloud to grid
  • Requests for making remote web requests for scraping metadata
  • Curl for binary downloads

Large collection workflow

The scripts in this repository were mostly designed for analyzing and visualizing very large collections of media. These scripts are not for the faint of heart and could easily take a few days to run from start to finish. You would benefit from a powerful workstation built for gaming, computation, or video rendering (e.g. multiple cores, plenty of RAM, an SSD, and a decent graphics card). That said, here's an example workflow:

1. Metadata retrieval

Download the metadata for all movies in the Internet Archive's FedFlix collection that were created by the National Archives, and save it to a CSV file:

python ingesters/internet_archive/ \
-query " collection:(FedFlix) AND mediatype:(movies) AND creator:(national archives and records administration)" \
-out "tmp/ia_fedflixnara.csv"

By default, the above script will look for the largest .mp4 asset of each record and store it in the filename property. You can change this format by adding a flag, e.g. -format .mp3. If each record has multiple assets associated with it, add the -multi flag and each asset with the indicated format will be retrieved and saved as its own row (the record metadata will be the same, except for filename).

For details on how to construct a search query, visit the Internet Archive's advanced search page.
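The asset selection described above (largest file of a given format, or every matching file with -multi) can be sketched as follows. This assumes each record's file list is a list of dictionaries with "name" and "size" keys, which is how I understand the Internet Archive metadata response; pick_assets is a hypothetical helper, not the ingester's actual code.

```python
def pick_assets(files, ext=".mp4", multi=False):
    """Return the filenames to keep for one record.

    files: list of dicts with "name" and "size" keys (assumed shape).
    ext:   the wanted format, matching the -format flag.
    multi: keep every matching file, matching the -multi flag.
    """
    matches = [f for f in files if f["name"].lower().endswith(ext)]
    if multi:
        return [f["name"] for f in matches]
    if not matches:
        return []
    # Sizes may arrive as strings, so convert before comparing
    largest = max(matches, key=lambda f: int(f.get("size", 0)))
    return [largest["name"]]


# Hypothetical file list for a single record
files = [
    {"name": "film.mp4", "size": "734003200"},
    {"name": "film_512kb.mp4", "size": "104857600"},
    {"name": "film.mp3", "size": "9437184"},
]
print(pick_assets(files))
print(pick_assets(files, multi=True))
```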

2. Asset download

Next, you can download the assets for each of the movies retrieved in the previous step. You can add a flag to indicate how many parallel downloads to run, e.g. -threads 3. Make sure the output directory has plenty of space. This particular collection has 100GB+ of files.

python ingesters/internet_archive/ \
-in "tmp/ia_fedflixnara.csv" \
-out "tmp/downloads/ia_fedflixnara/"
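As an illustration of what -threads controls, here is a minimal sketch of running downloads in parallel with a thread pool. The "identifier" column name and the fetch callable are assumptions made for this example; in practice fetch would do the actual transfer (for instance by calling curl), and the real script may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote


def asset_url(identifier, filename):
    """Build an Internet Archive download URL for one asset."""
    return "https://archive.org/download/%s/%s" % (quote(identifier), quote(filename))


def download_all(rows, fetch, threads=3):
    """Call fetch(url) for every row, running at most `threads` at a time.

    rows:  dicts with "identifier" and "filename" keys (assumed columns).
    fetch: hypothetical callable that downloads one URL.
    """
    urls = [asset_url(r["identifier"], r["filename"]) for r in rows]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() keeps results in the same order as the input rows
        return list(pool.map(fetch, urls))


rows = [{"identifier": "gov.archives.111-tv-221",
         "filename": "gov.archives.111-tv-221.mp4"}]
# A stand-in fetch that just echoes the URL instead of downloading it
print(download_all(rows, fetch=lambda url: url, threads=2))
```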

3. File features

Then retrieve the file features: video duration, whether the file has an audio track, and whether it has a video track. By default, the script opens the file to get an accurate duration. You can speed this up by just looking at the metadata (less accurate) by adding the -fast flag.

python \
-in "tmp/ia_fedflixnara.csv" \
-dir "tmp/downloads/ia_fedflixnara/"

Note that checking for an audio track is not guaranteed to catch all silent films, since some video files may have a silent audio track. These cases will be captured in the next step.
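For reference, one way to read these features from container metadata is ffprobe's JSON output. This is only a sketch under my own assumptions: the README does not say which tool the script uses, the hasAudio and hasVideo keys are hypothetical names, and reading metadata like this is closer to the -fast behaviour than to the accurate default.

```python
import json
import subprocess


def probe(path):
    """Ask ffprobe for the duration and stream types of a file, as JSON text."""
    cmd = ["ffprobe", "-v", "error",
           "-show_entries", "format=duration:stream=codec_type",
           "-of", "json", path]
    return subprocess.check_output(cmd, universal_newlines=True)


def parse_features(probe_json):
    """Turn ffprobe's JSON text into a small feature dictionary."""
    info = json.loads(probe_json)
    kinds = {s.get("codec_type") for s in info.get("streams", [])}
    duration = float(info.get("format", {}).get("duration", 0))
    return {"duration": duration,
            "hasAudio": "audio" in kinds,
            "hasVideo": "video" in kinds}


# Example of the JSON shape ffprobe returns for a video-only file
sample = '{"streams": [{"codec_type": "video"}], "format": {"duration": "12.5"}}'
print(parse_features(sample))
```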

4. Audio analysis

Now we analyze each movie file's audio track for "samples." These are essentially clips of audio that have a distinct onset and release. Each could be thought of as a distinct sonic "pulse," or a syllable in the case of speech. The -features flag adds an analysis step that looks for each sample's volume (power) and pitch (hz or frequency). The note and octave features are added for convenience, and the clarity feature attempts to measure how distinct a particular note is (i.e. very clear harmonic bands). Samples with high clarity values should be good candidates for musical notes.

python \
-in "tmp/ia_fedflixnara.csv" \
-dir "tmp/downloads/ia_fedflixnara/" \
-out "tmp/sampledata/ia_fedflixnara/%s.csv" \

The above command will save all samples to .csv files, where each media file will have one .csv file with its respective sample data. Each .csv file will have the same filename as the media source's filename. This will take a long time for large collections.
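The note and octave features can be derived from hz with the standard equal-temperament formula. The sketch below assumes A4 = 440 Hz and scientific pitch notation; hz_to_note is a hypothetical helper, and the script itself may compute these values another way (for example through an audio library).

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_note(hz, a4=440.0):
    """Return (note, octave) for the pitch nearest to a frequency in Hz."""
    # MIDI note number: 69 is A4, and each semitone is a factor of 2^(1/12)
    midi = int(round(69 + 12 * math.log2(hz / a4)))
    return NOTE_NAMES[midi % 12], midi // 12 - 1


print(hz_to_note(440.0))   # ('A', 4)
print(hz_to_note(261.63))  # ('C', 4)
```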

5. Audio analysis metadata

Next we will update the original metadata .csv file with metadata about the samples per file, e.g. number of samples, median volume, median pitch. This will help identify which movies are silent or have unusable audio, e.g. if a file has few samples or its medianPower is very low.

python \
-in "tmp/ia_fedflixnara.csv" \
-dir "tmp/sampledata/ia_fedflixnara/"
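The per-file summary boils down to counting rows and taking medians. Here is a minimal sketch, assuming each sample row carries power, hz, clarity, and dur columns as text (as they would when read from a .csv file); summarize_samples is a hypothetical name, and files with no samples are given zeros here as my own choice.

```python
from statistics import median

# Sample column -> summary column written back to the metadata file
SUMMARY_COLUMNS = (
    ("power", "medianPower"),
    ("hz", "medianHz"),
    ("clarity", "medianClarity"),
    ("dur", "medianDur"),
)


def summarize_samples(rows):
    """Summarize one file's sample rows as a count plus medians."""
    summary = {"samples": len(rows)}
    for feature, column in SUMMARY_COLUMNS:
        values = [float(r[feature]) for r in rows if r.get(feature) not in (None, "")]
        summary[column] = median(values) if values else 0.0
    return summary


rows = [
    {"power": "0.2", "hz": "220", "clarity": "1", "dur": "100"},
    {"power": "0.6", "hz": "440", "clarity": "3", "dur": "300"},
    {"power": "0.4", "hz": "880", "clarity": "2", "dur": "200"},
]
print(summarize_samples(rows))
```

A file with very few samples or a very low medianPower then stands out as likely silent or unusable.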

6. View audio analysis

Optionally, you can view the stats of the samples you created:

python \
-in "tmp/ia_fedflixnara.csv" \
-plot "duration,samples,medianPower,medianHz,medianClarity,medianDur"

python -in "tmp/ia_fedflixnara.csv"

Or view more detailed stats of an individual file's samples:

python -in "tmp/sampledata/ia_fedflixnara/gov.archives.111-tv-221.mp4.csv"

Or view two properties as a scatter plot:

python \
-in "tmp/sampledata/ia_fedflixnara/gov.archives.111-tv-221.mp4.csv" \
-props "power,hz"

7. Visualizing audio/video

Create a subset by taking all films that have sound and more than 500 samples, limiting each film to 100 samples, and keeping the 16,384 (128x128) samples with the most power and clarity:

python \
-in "tmp/ia_fedflixnara.csv" \
-dir "tmp/sampledata/ia_fedflixnara/" \
-out "tmp/ia_fedflixnara_subset.csv" \
-filter "samples>500&medianPower>0.5" \
-lim 16384 -ffilter "octave>1&power>0" \
-fsort "power=desc=0.75&clarity=desc" \
-flim 100

If the resulting subset is too small, try modifying the -filter query and lowering the filter thresholds. Typically you'd want as small a -flim (sample limit per file) value as you can get away with if you want a relatively diverse set of samples.
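To show how a query like "samples>500&medianPower>0.5" and a per-file limit interact, here is a minimal sketch. Only the > operator appears in the example above; the other operators, the numeric-only comparison, and all function names are assumptions for illustration, not the script's actual parser.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}
CLAUSE = re.compile(r"^(\w+)(>=|<=|>|<|=)(.+)$")


def parse_filter(query):
    """Turn "samples>500&medianPower>0.5" into (key, op, value) clauses."""
    clauses = []
    for part in query.split("&"):
        match = CLAUSE.match(part.strip())
        if not match:
            raise ValueError("Cannot parse filter clause: %r" % part)
        key, op, value = match.groups()
        clauses.append((key, OPS[op], float(value)))
    return clauses


def matches(row, clauses):
    """True if the row satisfies every clause (values compared as numbers)."""
    return all(op(float(row[key]), value) for key, op, value in clauses)


def limit_per_file(samples, flim):
    """Keep at most `flim` samples per filename, preserving input order."""
    counts, kept = {}, []
    for sample in samples:
        seen = counts.get(sample["filename"], 0)
        if seen < flim:
            kept.append(sample)
            counts[sample["filename"]] = seen + 1
    return kept


clauses = parse_filter("samples>500&medianPower>0.5")
print(matches({"samples": "812", "medianPower": "0.9"}, clauses))
```

Because the per-file limit keeps the first samples it sees, the samples should already be sorted (e.g. by power and clarity) before it is applied.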

Now we can attempt to sort this subset by spectral similarity via t-SNE. First, we must extract 1-D t-SNE values from the subset and cache the feature data:

python \
-in "tmp/ia_fedflixnara_subset.csv" \
-dir "tmp/sampledata/ia_fedflixnara/" \
-components 1 \
-prefix "stsne" \
-angle 0.1 \
-cache "tmp/ia_fedflixnara_subset_features.p" \
-threads 4

You can tweak the number of parallel processes (-threads) to work best with your processor.
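The -cache file saves the expensive feature extraction so that later runs can skip it. A minimal sketch of that pattern, assuming the .p file is a Python pickle (my guess from the extension) and using a hypothetical compute callable in place of the real feature extraction:

```python
import os
import pickle


def load_or_compute(cache_path, compute):
    """Return cached features if the file exists, else compute and cache them."""
    if os.path.isfile(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features
```

The first call pays the full cost of compute(); every later call with the same path only reads the file. Delete the cache file if the subset changes, since nothing here detects stale data.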

Then sort the samples by t-SNE and output to audio file:

python3 \
-in "tmp/ia_fedflixnara_subset.csv" \
-dir "tmp/sampledata/ia_fedflixnara/" \
-sort "stsne=asc" \
-out "output/ia_fedflixnara_sort_stsne_asc.mp3"

In the resulting audio file, you should hear clear "clusters" of similar-sounding audio.

Now we will lay out the samples on a 2-D grid using t-SNE again, but with two components instead of one. This should be faster if you cached the features in the previous section.

python \
-in "tmp/ia_fedflixnara_subset.csv" \
-dir "tmp/downloads/ia_fedflixnara/" \
-components 2 \
-angle 0.2 \
-cache "tmp/ia_fedflixnara_subset_features.p" \
-threads 4

Then put the sample subset in a 128x128 grid based on the t-SNE features (essentially creating a matrix of samples organized by spectral similarity). This requires RasterFairy and Python 2.7+.

python \
-in "tmp/ia_fedflixnara_subset.csv" \
-grid "128x128"
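To give a feel for what "point cloud to grid" means, here is a deliberately simple stand-in that is not RasterFairy's algorithm: it sorts the 2-D points into horizontal bands by y, then orders each band by x. It assumes the number of points exactly fills the grid, and assign_grid is a hypothetical name.

```python
def assign_grid(points, cols, rows):
    """Give every (x, y) point a unique (col, row) cell.

    Simple band sort: split points into rows by y, then order each row by x.
    Neighbouring points tend to land in neighbouring cells, but less
    faithfully than a dedicated tool would manage.
    """
    if len(points) != cols * rows:
        raise ValueError("Need exactly cols * rows points")
    by_y = sorted(range(len(points)), key=lambda i: points[i][1])
    cells = [None] * len(points)
    for row in range(rows):
        band = sorted(by_y[row * cols:(row + 1) * cols], key=lambda i: points[i][0])
        for col, i in enumerate(band):
            cells[i] = (col, row)
    return cells


# Four hypothetical 2-D t-SNE points placed on a 2x2 grid
points = [(0.9, 0.1), (0.1, 0.2), (0.2, 0.8), (0.8, 0.9)]
print(assign_grid(points, cols=2, rows=2))
```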

Before we visualize the grid, I like to run this script as well:

python \
-in "tmp/ia_fedflixnara_subset.csv" \
-dir "tmp/downloads/ia_fedflixnara/"

The above script is optional and mostly for aesthetic purposes. It analyzes the video that accompanies each audio sample extracted in the previous steps and extends the sample's duration so the result doesn't flicker too much. For example, looping an audio sample that is only 100 milliseconds long would flicker and be visually overwhelming. A sample might also start right before a new visual scene starts. This script attempts to extend each sample to about a second if no new scene starts in that time, and otherwise attempts to end the sample before the new scene starts. The result adds two columns, vstart and vdur, to the input sample data file.
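The extension rule can be sketched as a small function. This is a simplification under my own assumptions: times are in milliseconds, the scene-change times are already known (the scene detection itself is not shown), and vstart is left equal to the audio sample's start. extend_sample is a hypothetical name.

```python
def extend_sample(start, dur, scene_starts, target=1000):
    """Return (vstart, vdur) in milliseconds for one sample.

    start, dur:   the audio sample's start and duration.
    scene_starts: times at which a new visual scene begins.
    target:       the duration to extend short samples toward.
    """
    # Extend short samples up to the target; never shorten long ones here
    end = max(start + dur, start + target)
    # Cut off at the first scene change after the sample starts
    upcoming = [t for t in scene_starts if t > start]
    if upcoming:
        end = min(end, min(upcoming))
    return start, end - start


print(extend_sample(5000, 100, [9000]))  # extended to one second
print(extend_sample(5000, 100, [5400]))  # stopped at the scene change
```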

Finally we can generate a visualization. All of the different visualizations are in the ./compositions/ folder. We can start with a simple one:

python compositions/ \
-in "tmp/ia_fedflixnara_subset.csv" \
-dir "tmp/downloads/ia_fedflixnara/" \
-out "output/ia_fedflixnara_proliferation.mp4" \
-cache \
-cd "tmp/ia_fedflixnara_cache/" \
-ckey "ia_fedflixnara_proliferation" \
-outframe "tmp/ia_fedflixnara_proliferation_frames/frame.%s.png"

This script simply plays the samples starting from the center, then outward. When you first run this, it will take a long time since it does some preprocessing which it caches, so it will be faster for subsequent commands. Here's what the script will do, in order:

  1. Construct the audio component of the visualization by layering each source audio file as a separate track. This will result in an .mp3 file in the output directory.
  2. Analyze the sequence to calculate the maximum size (width/height) of each clip. The result will be cached in the cache directory (-cd).
  3. Each frame from each clip will be extracted from the source video and cached in the cache directory.
  4. Then each frame of the result video is processed and saved to the -outframe folder. You can speed this up by adding parallel processing (-threads).
  5. Finally, the frames and the audio file are compiled into a video file using ffmpeg.
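The last step above can be sketched as an ffmpeg call that combines numbered frames with the audio track. The frame naming (zero-padded to five digits), the frame rate, and the codec choices are assumptions for this example; the composition script may pass different options.

```python
def ffmpeg_command(frame_pattern, audio_path, out_path, fps=30):
    """Build an ffmpeg argument list that muxes image frames with audio."""
    return [
        "ffmpeg", "-y",               # overwrite the output if it exists
        "-framerate", str(fps),       # rate at which to read the frames
        "-i", frame_pattern,          # e.g. frame.%05d.png (ffmpeg pattern)
        "-i", audio_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                  # stop at the shorter of the two inputs
        out_path,
    ]


cmd = ffmpeg_command(
    "tmp/ia_fedflixnara_proliferation_frames/frame.%05d.png",
    "output/ia_fedflixnara_proliferation.mp3",
    "output/ia_fedflixnara_proliferation.mp4",
)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```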

more soon...

