<a href="https://colab.research.google.com/github/jeffheaton/present/blob/master/youtube/video/fft-frequency.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Copyright 2022 by [Jeff Heaton](https://www.heatonresearch.com/), released under [LGPLv3](https://www.gnu.org/licenses/lgpl-3.0.en.html)

[YouTube video about this code](https://www.youtube.com/watch?v=rj9NOiFLxWA)

In [1]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

PATH = '/content/drive/MyDrive/projects/audio'

!pip install -U kaleido

# Configuration
FPS = 30
FFT_WINDOW_SECONDS = 0.25 # how many seconds of audio make up an FFT window

# Frequency range (Hz) shown on the x-axis
FREQ_MIN = 10
FREQ_MAX = 1000

# Number of strongest notes to label on each frame
TOP_NOTES = 3

# Names of the notes
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Output size. Generally use SCALE for higher res, unless you need a non-standard aspect ratio.
RESOLUTION = (1920, 1080)
SCALE = 2 # 0.5=QHD(960x540), 1=HD(1920x1080), 2=4K(3840x2160)

Note: not using Google CoLab
Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl.metadata (15 kB)
Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.9/79.9 MB 7.9 MB/s eta 0:00:00
Installing collected packages: kaleido
Successfully installed kaleido-0.2.1
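With these settings, the FFT window length fixes the frequency resolution (the spacing between `rfft` bins is 1 / `FFT_WINDOW_SECONDS` Hz), while `FPS` fixes how far the window advances between video frames. The arithmetic below is a sketch that assumes a 44.1 kHz sample rate; `SAMPLE_RATE` is a hypothetical stand-in, since the notebook reads the real rate, `fs`, from the WAV file in the next cell.

```python
# Hypothetical sample rate; the notebook reads the real value (fs) from the WAV header.
SAMPLE_RATE = 44100
FPS = 30
FFT_WINDOW_SECONDS = 0.25

window_size = int(SAMPLE_RATE * FFT_WINDOW_SECONDS)  # samples per FFT window
bin_spacing_hz = SAMPLE_RATE / window_size           # spacing between rfft bins
hop = SAMPLE_RATE / FPS                              # audio samples between video frames
overlap = 1 - hop / window_size                      # fraction shared with the next window

print(f"window={window_size} samples, bins every {bin_spacing_hz:.1f} Hz")
print(f"hop={hop:.0f} samples, overlap={overlap:.0%}")
```

At 4 Hz per bin, adjacent semitones below roughly 67 Hz are closer together than one bin (a semitone is about 5.9% of the frequency), so note names reported for very low frequencies are only approximate.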


In [2]:
import os
from scipy.io import wavfile  # WAV file reader

# Get a WAV file from GDrive, such as:
# AUDIO_FILE = os.path.join(PATH, 'short_popcorn.wav')

# Or download my sample audio
!wget https://github.com/jeffheaton/present/raw/master/youtube/video/sample_audio/piano_c_major_scale.wav
AUDIO_FILE = "/content/piano_c_major_scale.wav"

fs, data = wavfile.read(AUDIO_FILE)  # sample rate (Hz) and sample array
# wavfile.read returns shape (n,) for mono and (n, channels) otherwise; keep the first channel
audio = data[:, 0] if data.ndim > 1 else data
FRAME_STEP = fs / FPS  # audio samples per video frame
FFT_WINDOW_SIZE = int(fs * FFT_WINDOW_SECONDS)  # samples per FFT window
AUDIO_LENGTH = len(audio) / fs  # duration in seconds

--2024-11-07 20:10:49--  https://github.com/jeffheaton/present/raw/master/youtube/video/sample_audio/piano_c_major_scale.wav
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/jeffheaton/present/master/youtube/video/sample_audio/piano_c_major_scale.wav [following]
--2024-11-07 20:10:51--  https://raw.githubusercontent.com/jeffheaton/present/master/youtube/video/sample_audio/piano_c_major_scale.wav
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4921652 (4.7M) [audio/wav]
Saving to: ‘piano_c_major_scale.wav’


2024-11-07 20:10:52 (132 MB/s) - ‘piano_c_major_scale.wav’ saved [4921652/4921652]



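`scipy.io.wavfile.read` returns a 1-D array for mono files and a `(samples, channels)` array otherwise, and the dtype follows the file (for example int16, int32, float32 or uint8). The notebook keeps only the first channel. If you would rather mix all channels down, a sketch such as the hypothetical `to_mono_float` helper below could be used instead. Because pass 2 divides every spectrum by the maximum found in pass 1, rescaling the samples this way does not change the plotted charts.

```python
import numpy as np

def to_mono_float(data: np.ndarray) -> np.ndarray:
    """Average all channels and scale integer PCM to roughly [-1, 1] (hypothetical helper)."""
    if data.dtype == np.uint8:
        # 8-bit WAV is unsigned and centered on 128.
        data = (data.astype(np.float64) - 128.0) / 128.0
    elif np.issubdtype(data.dtype, np.integer):
        # Signed integer PCM: divide by the largest value the dtype can hold.
        data = data.astype(np.float64) / np.iinfo(data.dtype).max
    else:
        # Float WAVs are already in the expected range.
        data = data.astype(np.float64)
    # Multi-channel audio has shape (samples, channels); average across channels.
    if data.ndim == 2:
        data = data.mean(axis=1)
    return data

stereo = np.array([[32767, -32767], [0, 16384]], dtype=np.int16)
print(to_mono_float(stereo))
```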


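Several utility functions: `plot_fft` draws one frame's spectrum with Plotly, `extract_sample` cuts out the audio window that ends at a given video frame, and `find_top_notes` returns the strongest distinct notes in a spectrum.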
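`find_top_notes` turns each peak frequency into a note name through `freq_to_number` and `note_name` (defined in the FFT cell further down), using the MIDI convention in which A4 = 440 Hz is note 69 and each semitone is a factor of 2^(1/12), so n = 69 + 12 log2(f / 440). The pure-Python sketch below reproduces that mapping. `nearest_note` is a hypothetical helper, and it uses floor division for the octave so that MIDI notes 0 to 11 get octave -1.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_number(f: float) -> float:
    """Fractional MIDI note number for a frequency in Hz (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(f / 440.0)

def note_name(n: int) -> str:
    """Name such as 'C4'; floor division keeps the octave right for n < 12."""
    return NOTE_NAMES[n % 12] + str(n // 12 - 1)

def nearest_note(f: float) -> str:
    """Hypothetical helper: name of the equal-tempered note closest to f."""
    if f <= 0:
        raise ValueError("frequency must be positive (the DC bin has f = 0)")
    return note_name(round(freq_to_number(f)))

for f in (261.63, 440.0, 523.25, 20.0):
    print(f, nearest_note(f))
# prints one per line: 261.63 C4, 440.0 A4, 523.25 C5, 20.0 D#0
```

The 20 Hz case lands on D#0. With 4 Hz bins, the bin at 20 Hz is the one nearest that note, which may explain why D#0 shows up repeatedly in the frame-by-frame output.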

In [3]:
import numpy as np
import plotly.graph_objects as go

def plot_fft(p, xf, fs, notes, dimensions=(960,540)):
  """Plot magnitudes p against bin frequencies xf (Hz) and label the given notes.

  fs is accepted for compatibility but not used.
  """
  layout = go.Layout(
      title="Frequency spectrum",
      autosize=False,
      width=dimensions[0],
      height=dimensions[1],
      xaxis_title="Frequency (Hz)",
      yaxis_title="Magnitude",
      font={'size' : 24}
  )

  fig = go.Figure(layout=layout,
                  layout_xaxis_range=[FREQ_MIN,FREQ_MAX],
                  layout_yaxis_range=[0,1]
                  )

  fig.add_trace(go.Scatter(
      x = xf,
      y = p))

  # Each note is [frequency, name, magnitude]; place the label just right of its peak
  for note in notes:
    fig.add_annotation(x=note[0]+10, y=note[2],
            text=note[1],
            font = {'size' : 48},
            showarrow=False)
  return fig

def extract_sample(audio, frame_number):
  """Return the FFT_WINDOW_SIZE samples that end at this video frame."""
  end = frame_number * FRAME_OFFSET
  begin = int(end - FFT_WINDOW_SIZE)

  if end == 0:
    # No audio yet (very beginning): return a window of zeros
    return np.zeros((np.abs(begin)),dtype=float)
  elif begin<0:
    # Some audio, but less than a full window: pad the front with zeros
    return np.concatenate([np.zeros((np.abs(begin)),dtype=float),audio[0:end]])
  else:
    # Normal case: return the full window
    return audio[begin:end]

def find_top_notes(fft,num):
  """Return up to num [frequency, note name, magnitude] entries, strongest first.

  Uses the global bin frequencies xf; each note name is reported at most once.
  """
  if np.max(fft.real)<0.001:
    return []

  # (bin index, magnitude) pairs ordered by magnitude, largest first
  lst = sorted(enumerate(fft.real), key=lambda x: x[1], reverse=True)

  idx = 0
  found = []
  found_note = set()
  while( (idx<len(lst)) and (len(found)<num) ):
    f = xf[lst[idx][0]]
    y = lst[idx][1]
    idx += 1
    if f <= 0:
      # The DC bin (0 Hz) has no note; log2(0) is -inf and cannot be rounded to an int
      continue
    n0 = int(round(freq_to_number(f)))
    name = note_name(n0)

    if name not in found_note:
      found_note.add(name)
      found.append([f,name,y])

  return found
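`extract_sample` returns the `FFT_WINDOW_SIZE` samples that end at `frame_number * FRAME_OFFSET`, left-padding with zeros near the start of the file. The spectrum drawn for a frame therefore describes the preceding 0.25 s of audio rather than a window centered on that frame. The same bookkeeping can be checked in isolation when it is written against explicit arguments instead of globals; `window_bounds` and `extract_window` are hypothetical names used only in this sketch.

```python
import numpy as np

def window_bounds(frame_number: int, hop: int, window_size: int) -> tuple:
    """Return (begin, end, pad) for the window that ends at frame_number * hop.

    pad is the number of leading zeros needed when the window starts before sample 0.
    """
    end = frame_number * hop
    begin = end - window_size
    pad = max(0, -begin)
    return max(begin, 0), end, pad

def extract_window(audio: np.ndarray, frame_number: int, hop: int, window_size: int) -> np.ndarray:
    begin, end, pad = window_bounds(frame_number, hop, window_size)
    # Left-pad with zeros so every window has exactly window_size samples.
    return np.concatenate([np.zeros(pad), audio[begin:end].astype(float)])

audio = np.arange(10, dtype=float)
print(extract_window(audio, 1, hop=3, window_size=4))  # one zero, then samples 0..2
```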

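Run the FFT on successive windows of the audio, one per video frame, and render each frequency chart to a PNG image. The first pass finds the largest magnitude in the whole file so that every frame can be scaled to the same 0 to 1 range; the second pass draws the frames.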
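Each frame's spectrum comes from multiplying the audio window by a Hann window, taking `np.fft.rfft`, and reading the bin frequencies from `np.fft.rfftfreq`. A minimal self-contained check of that pipeline on a synthetic signal (an assumed 44.1 kHz sample rate, with an A4 at 440 Hz plus a quieter tone at 659.25 Hz, roughly E5) shows the strongest bin landing on 440 Hz.

```python
import numpy as np

fs = 44100                      # assumed sample rate for this synthetic test
window_size = int(fs * 0.25)    # same 0.25 s window as FFT_WINDOW_SECONDS
t = np.arange(window_size) / fs
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.25 * np.sin(2 * np.pi * 659.25 * t)

# Periodic Hann window, built the same way as the notebook's `window`.
hann = 0.5 * (1 - np.cos(np.linspace(0, 2 * np.pi, window_size, endpoint=False)))

spectrum = np.abs(np.fft.rfft(tone * hann))   # magnitude of each positive-frequency bin
freqs = np.fft.rfftfreq(window_size, 1 / fs)  # bin centre frequencies in Hz

peak = freqs[np.argmax(spectrum)]
print(f"strongest bin: {peak:.1f} Hz (bin spacing {freqs[1]:.1f} Hz)")
# prints: strongest bin: 440.0 Hz (bin spacing 4.0 Hz)
```

Because 440 Hz is an exact multiple of the 4 Hz bin spacing, its energy is concentrated in one bin; a tone between bins spreads over its neighbours, which is presumably why `find_top_notes` skips note names it has already reported.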

In [11]:
import numpy as np
import tqdm

!rm -f /content/*.png  # remove frames left over from an earlier run

# MIDI note number <-> frequency (A4 = 440 Hz = note 69). See https://newt.phys.unsw.edu.au/jw/notes.html
def freq_to_number(f): return 69 + 12*np.log2(f/440.0)
def number_to_freq(n): return 440 * 2.0**((n-69)/12.0)
def note_name(n): return NOTE_NAMES[n % 12] + str(n // 12 - 1)

# Hann window (periodic form), applied before each FFT to reduce spectral leakage
window = 0.5 * (1 - np.cos(np.linspace(0, 2*np.pi, FFT_WINDOW_SIZE, False)))

xf = np.fft.rfftfreq(FFT_WINDOW_SIZE, 1/fs)  # frequency (Hz) of each rfft bin
FRAME_COUNT = int(AUDIO_LENGTH*FPS)
FRAME_OFFSET = int(len(audio)/FRAME_COUNT)  # audio samples between consecutive frames

# Pass 1, find out the maximum amplitude so we can scale.
mx = 0
for frame_number in range(FRAME_COUNT):
    sample = extract_sample(audio, frame_number)
    fft = np.abs(np.fft.rfft(sample * window))
    mx = max(np.max(fft), mx)

print(f"Max amplitude: {mx}")

last_note = None

# Pass 2, produce the animation
for frame_number in tqdm.tqdm(range(FRAME_COUNT)):
    sample = extract_sample(audio, frame_number)
    fft = np.fft.rfft(sample * window)
    fft = np.abs(fft) / mx  # scale every frame by the global maximum

    # Find the TOP_NOTES strongest notes for the current frame
    s = find_top_notes(fft, TOP_NOTES)

    if s:
        # Select the note with the highest amplitude
        highest_amplitude_note = max(s, key=lambda x: x[2])
        current_note = highest_amplitude_note[1]  # name of the strongest note

        # Print only if the current note is different from the last one
        if current_note != last_note:
            timestamp = frame_number / FPS  # frame time in seconds
            print(f"New highest amplitude note: {current_note} at {timestamp:.2f} seconds")
            last_note = current_note  # update last printed note

    fig = plot_fft(fft.real,xf,fs,s,RESOLUTION)
    fig.write_image(f"/content/frame{frame_number}.png",scale=SCALE)


Max amplitude: 1035.1945494237968

New highest amplitude note: A0 at 0.07 seconds
New highest amplitude note: G0 at 0.10 seconds
New highest amplitude note: D#0 at 0.13 seconds
New highest amplitude note: C4 at 0.17 seconds
New highest amplitude note: C5 at 0.73 seconds
New highest amplitude note: G5 at 1.00 seconds
New highest amplitude note: D4 at 1.03 seconds
New highest amplitude note: F4 at 1.93 seconds
New highest amplitude note: E5 at 1.97 seconds
New highest amplitude note: D#0 at 2.67 seconds
New highest amplitude note: F4 at 2.80 seconds
New highest amplitude note: G#4 at 3.67 seconds
New highest amplitude note: G5 at 3.70 seconds
New highest amplitude note: G4 at 4.50 seconds
New highest amplitude note: A4 at 4.53 seconds
New highest amplitude note: A1 at 5.27 seconds
New highest amplitude note: D#0 at 5.37 seconds
New highest amplitude note: B4 at 5.40 seconds
New highest amplitude note: D#0 at 6.13 seconds
New highest amplitude note: A1 at 6.27 seconds
New highest amplitude note: B4 at 6.33 seconds
New highest amplitude note: C5 at 6.40 seconds
New highest amplitude note: B4 at 7.27 seconds
New highest amplitude note: D#0 at 7.97 seconds
New highest amplitude note: A1 at 8.07 seconds
New highest amplitude note: D#0 at 8.10 seconds
New highest amplitude note: A4 at 8.17 seconds
New highest amplitude note: D#0 at 8.87 seconds
New highest amplitude note: B1 at 8.93 seconds
New highest amplitude note: G#4 at 9.03 seconds
New highest amplitude note: G5 at 9.10 seconds
New highest amplitude note: G4 at 9.87 seconds
New highest amplitude note: F4 at 9.93 seconds
New highest amplitude note: E5 at 10.87 seconds
New highest amplitude note: D#0 at 11.63 seconds
New highest amplitude note: A1 at 11.67 seconds
New highest amplitude note: D4 at 11.73 seconds
New highest amplitude note: C4 at 12.70 seconds
New highest amplitude note: C5 at 13.23 seconds
New highest amplitude note: G5 at 13.53 seconds
New highest amplitude note: C5 at 13.63 seconds
New highest amplitude note: G1 at 13.87 seconds
New highest amplitude note: D#0 at 13.90 seconds

100%|██████████| 418/418 [02:57<00:00,  2.36it/s]
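The printed notes include entries such as D#0, G0 and A0, which correspond to roughly 19 to 28 Hz. For a C major scale played around C4 these are probably low-frequency energy rather than played notes (an interpretation, not something the notebook checks). `FREQ_MIN` and `FREQ_MAX` only set the plotted x-axis range, and `find_top_notes` searches every bin. A sketch of restricting the peak search to a band follows; `top_bins_in_band` is a hypothetical helper, and the 27.5 Hz lower bound (A0, the lowest piano key) used in the example is an assumption you may want to tune.

```python
import numpy as np

def top_bins_in_band(spectrum, freqs, fmin, fmax, k):
    """Indices of the k largest magnitudes whose frequency lies in [fmin, fmax]."""
    band = np.flatnonzero((freqs >= fmin) & (freqs <= fmax))
    # Sort the in-band bins by magnitude, largest first.
    order = np.argsort(spectrum[band])[::-1]
    return band[order[:k]]

freqs = np.arange(0, 100, 4.0)       # 4 Hz bins, as in the notebook
spectrum = np.zeros_like(freqs)
spectrum[[0, 5, 10, 15]] = [20.0, 10.0, 5.0, 3.0]  # DC, 20 Hz, 40 Hz, 60 Hz
print(top_bins_in_band(spectrum, freqs, 27.5, 100.0, 2))  # skips DC and the 20 Hz bin
```

This selects bins, not distinct notes, so the note-name deduplication in `find_top_notes` would still be applied to the result.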


Use [ffmpeg](https://ffmpeg.org/) to combine the input audio WAV and the individual frame images into an MP4 video.
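The cell below runs ffmpeg through the notebook's `!` shell escape. Outside Jupyter, where `!` is unavailable, a similar command can be assembled as a Python argument list and passed to `subprocess.run`. `ffmpeg_args` is a hypothetical helper; `-framerate` is the image2 demuxer's input frame-rate option, and `-shortest`, which stops encoding when the shorter input ends, is an addition not present in the original command.

```python
import shlex

def ffmpeg_args(fps, frame_pattern, audio_file, output="movie.mp4"):
    """Hypothetical helper: argument list for muxing numbered PNG frames with a WAV."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", frame_pattern,  # image sequence input
        "-i", audio_file,                             # soundtrack input
        "-c:v", "libx264", "-pix_fmt", "yuv420p",     # widely playable H.264 output
        "-shortest",                                  # stop at the shorter stream
        output,
    ]

args = ffmpeg_args(30, "/content/frame%d.png", "/content/piano_c_major_scale.wav")
print(shlex.join(args))
# To actually encode: subprocess.run(args, check=True)
```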

In [5]:
!ffmpeg -y -r {FPS} -f image2 -s 1920x1080 -i frame%d.png -i {AUDIO_FILE} -c:v libx264 -pix_fmt yuv420p movie.mp4

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)

Download the generated movie.

In [None]:
if COLAB:
    from google.colab import files
    files.download('movie.mp4')
else:
    print("movie.mp4 was written to the current working directory")
