Skip to content

Generate image stack representing audio. This can be typically used as one of the raw material to create video clip.

License

Notifications You must be signed in to change notification settings

crowtec/audio-display

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

audio-display

Forked from https://gitlab.com/zeograd/audio-display

audio-display is a set of utility aimed at rendering images based on audio input. It aims at generating images to produce visual "companion" to audio files. Typically to create a video clip supporting a musical composition.

fft2png

fft2png is a command line utility to create a set of image files representing audio spectrum from a wav file. It is an offline version of a spectrum analyser output and is not totally dissimilar to Sonic Candle.

Generated files can be imported and postprocessed in any video edition tool accepting a set of images as input. Just make sure that the framerate used during the call to fft2png matches the framerate at which you consume images in your video edition software. Libre software able to consume and use those images includes, but aren't limited to, Natron.

You can also use FFmpeg for either previewing quickly or for simple needs (just the spectrum over a fixed image or a video. Some samples of ffmpeg usage can be found later.

Usage

usage: fft2png [-h] [-d] [-n] [-r TARGET_FPS] [-R {0,1,2,3}] [-v]
[-w BAR_WIDTH] [-s BAR_SPACING] [-c BAR_COUNT] [-C COLOR] [-b BLENDING] [-W FFT_WINDOW] [--image-height IMAGE_HEIGHT] [--audio-min-freq AUDIO_MIN_FREQ] [--audio-max-freq AUDIO_MAX_FREQ] [--silence-ceiling SILENCE_CEILING] [-i INPUT_FILENAME] -o OUTPUT_FILENAME_MASK

GPL v3+ 2015 Olivier Jolly

optional arguments:
-h, --help show this help message and exit
-d, --debug debug operations [default: False]
-n don't perform actions [default: False]
-r TARGET_FPS, --framerate TARGET_FPS
 output framerate [default: 30]
-R {0,1,2,3}, --renderer {0,1,2,3}
which renderer to use to display bars (0=filled, 1=hollow, 2=symetrical filled, 3=symetrical hollow)
-v, --version show program's version number and exit
-w BAR_WIDTH, --bar-width BAR_WIDTH
 bar width in output images
-s BAR_SPACING, --bar-spacing BAR_SPACING
 bar spacing in output images
-c BAR_COUNT, --bar-count BAR_COUNT
 number of bars in output images
-C COLOR, --color COLOR
 hexa color of bars [RRGGBB or RRGGBBAA, default: FFFFFFFF]
-b BLENDING, --blending BLENDING
 blending of previous spectrum into current one (0 = display only fresh data, 1 = use as many previous than fresh data)
-W FFT_WINDOW, --window FFT_WINDOW
 window size for FFT [default: 4096]
--image-height IMAGE_HEIGHT
 output images height
--audio-min-freq AUDIO_MIN_FREQ
 min frequency in input audio
--audio-max-freq AUDIO_MAX_FREQ
 max frequency in input audio
--silence-ceiling SILENCE_CEILING
 opposite of threshold considered silence [in dB, default: 70]
-i INPUT_FILENAME
 input file in wav format
-o OUTPUT_FILENAME_MASK
 output filename mask (should contain {:06} or similar to generate sequence)

Convert audio file to stack of images

-r
Framerate of the generated images. Should match the framerate at which they will be consumed. Higher framerate gives a smoother result.
-R
Aspect of the bar representing power for one frequency. 0 uses filled boxes, 1 uses hollow boxes, 2 uses filled boxes vertically centered and 3 uses hollow boxes vertically centered.
-w
Width (in pixel) per bar.
-s
Spacing (in pixel) between bars.
-c
Number of bars per images.
-C
Color of the bars in hexa. Can be RGB or RGBA. For instance, FF0000 will render pure opaque red bars, 00FF0080 will render 50% transparent pure green bars, ...
-b
Blending ratio from previous frame into current one. When set to 0, only fresh data will be used to render bars. When set to 1, bars will be rendered from an average of the fresh and previous frame data. Intermediate values will inject a fraction of the previous frame data into the current one for rendering. Lower values tends to render more reactive spectrum while higher ones will smooth data over time and react slower.
-W
Spectrum generation window is the amount of data in the audio file used to determine the spectrum raw data. Lower value will make spectrum blockier but will be slightly faster to generate.

Example

To use default values when generating spectrum, just invoke:

fft2png -i input.wav -o output-{:06}.png

result of default fft2png settings

For a slightly different result, you can invoke it like this:

fft2png -R2 -w4 -s4 -c30 -C FF8080A0 --audio-min-freq 100 -i input.wav -o output-{:06}.png

You'll end up with 30 symetrical transparent redish solid bars 4 pixels wide, spaced by 4 pixels

result of red solid symetrical bars ff2png settings


FFMpeg usage

If you already have a video as background and want to add spectrogram center on it while adding some musique, you can invoke ffmpeg like this:

ffmpeg -i <background_video.mp4> -framerate <generated frames framerate> -i <audio-00%4d.png> -filter_complex "overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2:shortest=1" -i <music.wav> -map 2:0 -vframes <number of generated frames> -strict -2 <output.mp4> -y
where :
  • <background_video.mp4> is the filename of your background video
  • <generated frames framerate> is the framerate used when generating spectrogram frames
  • <audio-00%4d.png> is the mask of the generated frames to overlay
  • <music.wav> is the filename of the your music
  • <number of generated frames> is, well, the number of generated spectrogram frames
  • <output.mp4> is the generated muxed video
A few notes :
  • you can change the overlay position by setting the position in absolute coordinates or using some maths with main_w, main_h, overlay_w, overlay_h as show here
  • -y is for overwriting the result file
  • -strict -2 alleviates some error with aac encoding on my version/system combo
  • the background video will not loop. As for now (ffmpeg 3.0.1), looping is not for video. If your video is too short, prepare one which is long enough by concatenating it several times. The shortest=1 in the filter expression will stop whenever an input stream (background video, spectrogram images or music) reaches its end.
  • use the ffmpeg manual, Luke

If you want to use a static image as background, the invocation becomes something like:

ffmpeg -loop 1 -i <background_image.jpg> -framerate <generated frames framerate> -i <audio-00%4d.png> -filter_complex "overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2:shortest=1" -i <music.wav> -map 2:0 -vframes <number of generated frames> -strict -2 <output.mp4> -y

The main difference being the -loop 1 to loop the background image over and over until one of the other stream ends.

Installation

audio-display is installable from PyPI with a single pip command:

pip install audio-display

Alternatively, audio-display can be run directly from sources after a git pull (recommended if you want to tweak or read the source):

git clone https://gitlab.com/zeograd/audio-display.git
cd audio-display && python setup.py install

or directly from its git repository:

pip install git+https://gitlab.com/zeograd/audio-display.git

About

Generate image stack representing audio. This can be typically used as one of the raw material to create video clip.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages