# ENF Extraction from Audio and Video Files

Audio recordings may contain some hum caused by the grid frequency interfering with the audio signal. Whether this noise is present depends on the recording equipment, the cabling, and so on.

It is known that the grid frequency is not strictly 50 or 60 Hz but fluctuates slightly around the nominal value. These fluctuations are then also present in the hum in audio recordings. If one matches the fluctuations in the audio against past fluctuations of the grid frequency, then it is possible to chronolocate the audio recording, that is, to determine the time when the recording was made.
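A minimal simulation makes the idea concrete. Every number in it is an illustrative assumption: a 400 Hz sample rate, a 50 Hz grid wandering sinusoidally by ±20 mHz, and hum at the 2nd harmonic buried in white noise. `synthetic_hum` is a hypothetical helper, not part of the enf-matching modules. The hum's instantaneous frequency follows the grid, and integrating that frequency gives the hum's phase:

```python
import numpy as np

def synthetic_hum(duration_s=60.0, fs=400, nominal=50.0, harmonic=2,
                  drift_hz=0.02, hum_amp=0.05, noise_amp=0.1, seed=0):
    """Simulate a grid-frequency trace and a recording whose hum follows it.

    All parameter values are illustrative assumptions, not measured data.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * fs)) / fs
    # Assumed grid behaviour: a slow sinusoidal wander of +/- drift_hz around nominal
    f_grid = nominal + drift_hz * np.sin(2 * np.pi * t / 30.0)
    # The hum at the chosen harmonic has instantaneous frequency harmonic * f_grid;
    # integrating (cumulative sum) that frequency gives the phase of the hum
    phase = 2 * np.pi * np.cumsum(harmonic * f_grid) / fs
    hum = hum_amp * np.sin(phase)
    # White noise stands in for the wanted audio content
    noise = noise_amp * rng.standard_normal(t.size)
    return t, f_grid, hum + noise

fs = 400
t, f_grid, x = synthetic_hum(fs=fs)
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(x.size, d=1 / fs)
band = (freqs > 90) & (freqs < 110)
peak = freqs[band][np.argmax(spectrum[band])]
print(f"Strongest component between 90 and 110 Hz: {peak:.3f} Hz")
```

Although the hum carries less power than the broadband noise, its energy is concentrated near 2 × 50 Hz, so it stands out in the spectrum. This concentration is what the bandpass filter and the spectrogram in the later steps exploit.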

For this matching to work, one needs:
- access to a database of historical grid frequencies,
- an audio clip containing a sufficient amount of grid hum.

## 1. Import Standard Modules

In [None]:
import sys
import os
import subprocess
import matplotlib.pyplot as plt
import numpy as np
import scipy as sc

# yt-dlp downloads Internet video clips; install it if it is missing.
# https://stackoverflow.com/questions/74157935/getting-the-file-name-of-downloaded-video-using-yt-dlp
try:
  import yt_dlp
except ImportError:
  !pip install -q yt-dlp
  import yt_dlp

# Install the Python modules that are not yet present on Colab
try:
  import py7zr
except ImportError:
  !pip install -q py7zr
  import py7zr

# google.colab (and its file upload dialogue) is only available on Colab
try:
  import google.colab
  from google.colab import files
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

## 2. Load Custom ENF Modules from GitHub

In [None]:
# Clone the files on GitHub to Colab so that they can be used; pull updates if already cloned
![ -d enf-matching ] || git clone https://github.com/CoRoe/enf-matching.git
!cd enf-matching; git pull

# Add the path of the just cloned Python files to the Python path:
if '/content/enf-matching' not in sys.path:
    sys.path.insert(0, '/content/enf-matching')
#print(sys.path)

from enf import AudioClipEnf, GridEnf
from enf import notch_filter, butter_bandpass_filter

## 3. Mount your Google Drive

Mounting Google Drive is sensible if you want to analyse media files you have stored there. Google will pop up a dialogue asking you to authorise the mounting.

In [None]:
if not os.path.exists('/content/drive/MyDrive'):
  try:
    from google.colab import drive
    drive.mount('/content/drive')
  except Exception:
    print("Google Drive not mounted")

## 4. Choose an Audio or Video Clip

The clip is passed through `ffmpeg`, which does two things:

1.   Extract the audio track from a video file;
2.   downsample it to 400 Hz and convert it to uncompressed PCM data.

Downsampling reduces the amount of data the Python code has to store and process.
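The notebook does not show the exact `ffmpeg` call that `loadAudioFile()` makes. The following is a hypothetical equivalent, built as an argument list for `subprocess.run`. The standard ffmpeg options `-vn` (drop video), `-ar` (output sample rate) and `-c:a pcm_s16le` (16-bit PCM) perform the two steps listed above; mixing down to mono with `-ac 1` is an extra assumption.

```python
import shlex

def ffmpeg_extract_cmd(src, dst, sample_rate=400):
    """Hypothetical helper: build an ffmpeg command that keeps only the audio,
    mixes it to mono, resamples it and writes uncompressed PCM WAV."""
    return [
        "ffmpeg", "-y",             # overwrite dst without asking
        "-i", src,                  # input audio or video file
        "-vn",                      # ignore any video stream
        "-ac", "1",                 # mix down to one channel (assumption)
        "-ar", str(sample_rate),    # resample to the target rate
        "-c:a", "pcm_s16le",        # uncompressed 16-bit little-endian PCM
        dst,
    ]

cmd = ffmpeg_extract_cmd("clip.mp4", "clip_400hz.wav")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.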

There are three ways to access a media file; choose one of the following methods in the *enf_data_source* field:

1. **Internet video clip**: Paste the URL of the video page into the *url* field.
2. **File on your computer**: Running the next cell opens an upload dialogue in which you select the file.
3. **File on Colab**: Type the filename into the *media_file* field.

In [None]:
enf_data_source = "Internet video clip" # @param ["File on Colab","File on your computer","Internet video clip"]
media_file = "/content/enf-matching/samplemedia/001.wav" # @param {"type":"string","placeholder":"Audio or video file"}
url = "https://www.youtube.com/watch?v=4Un6B5ZnCUk" # @param {"type":"string"}


In [None]:
if enf_data_source == 'File on your computer':
  uploaded = files.upload()
  if len(uploaded) > 0:
    media_file = list(uploaded.keys())[0]
    print(f"Selected file: '{media_file}'")
elif enf_data_source == 'Internet video clip':
  with yt_dlp.YoutubeDL() as ydl:
    info_dict = ydl.extract_info(url, download=True)
    media_file = ydl.prepare_filename(info_dict)
    print(f"Downloaded '{media_file}'")


## 5. Load the Media File

In [None]:
clip = AudioClipEnf()
if clip.loadAudioFile(media_file):
  print(f"Loaded '{media_file}' ok, sample rate {clip.sampleRate()}")
else:
  print(f"Failed to load audio file '{media_file}'")

## 6. Generate a Spectrogram

A spectrogram visualises which frequencies a clip contains and how they vary over time. The hum component is usually very weak and becomes visible only when frequencies outside the range of interest are suppressed. A bandpass filter is used for that purpose.
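The notebook's own `butter_bandpass_filter` lives in the cloned `enf` module and may be implemented differently. As a sketch of the technique, here is a zero-phase Butterworth bandpass built with SciPy's second-order sections (`output="sos"`). For very narrow bands, SOS form is numerically more robust than plain transfer-function coefficients. The test signal (weak 100 Hz hum plus a strong 80 Hz tone, sampled at 400 Hz) is made up for illustration.

```python
import numpy as np
from scipy import signal

def bandpass_sos(x, locut, hicut, fs, order=4):
    """Zero-phase Butterworth bandpass built from second-order sections."""
    sos = signal.butter(order, [locut, hicut], btype="bandpass", fs=fs, output="sos")
    # sosfiltfilt runs the filter forwards and backwards, so the output has no phase shift
    return signal.sosfiltfilt(sos, x)

def rms(v):
    return np.sqrt(np.mean(v ** 2))

fs = 400
t = np.arange(60 * fs) / fs
hum = 0.05 * np.sin(2 * np.pi * 100.0 * t)    # weak hum at the 2nd harmonic of 50 Hz
other = np.sin(2 * np.pi * 80.0 * t)          # much stronger unrelated tone
y = bandpass_sos(hum + other, 99.6, 100.4, fs)
mid = slice(20 * fs, 40 * fs)                 # skip the filter's start-up transients
print(f"hum RMS {rms(hum[mid]):.4f}, filtered RMS {rms(y[mid]):.4f}")
```

After filtering, essentially only the hum remains. The first and last seconds are excluded from the comparison because a narrow filter needs time to settle.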

For further analysis, the filter parameters have to be chosen. You can experiment with them to obtain better results.

- **Grid frequency**: 50 Hz in most parts of the world and 60 Hz in the US.
- **Harmonic**: in many recordings, a harmonic of the grid frequency (for example 100 Hz on a 50 Hz grid) is present instead of the fundamental.
- **Bandwidth of the bandpass**: the range around the nominal grid frequency in which fluctuations are expected. The filter passes *harmonic* × (grid frequency ± bandwidth). A sensible value is 0.2 Hz.
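The passband follows from the three parameters in the same way as `locut`/`hicut` in the filter cell. A quick check also shows which harmonics can be analysed at all. This assumes the clip really is resampled to the 400 Hz mentioned above, which puts the Nyquist frequency at 200 Hz. Both helper names are hypothetical.

```python
def band_edges(grid_freq, harmonic, freq_band):
    """Passband as computed in the notebook: harmonic * (grid_freq -/+ freq_band)."""
    return harmonic * (grid_freq - freq_band), harmonic * (grid_freq + freq_band)

def fits_below_nyquist(grid_freq, harmonic, freq_band, sample_rate=400):
    """True if the whole passband lies below sample_rate / 2 (the Nyquist frequency)."""
    _, hicut = band_edges(grid_freq, harmonic, freq_band)
    return hicut < sample_rate / 2

for grid in (50, 60):
    for h in (1, 2, 3, 4):
        lo, hi = band_edges(grid, h, 0.2)
        print(f"{grid} Hz grid, harmonic {h}: {lo:.1f}-{hi:.1f} Hz, "
              f"usable at 400 Hz: {fits_below_nyquist(grid, h, 0.2)}")
```

Under that assumption, the 4th harmonic is out of reach for both grids. On a 60 Hz grid it lies near 240 Hz. On a 50 Hz grid, the 199.2-200.8 Hz band straddles the 200 Hz limit.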

The spectrogram shows the frequency range around the chosen harmonic of the grid frequency. Brighter colours indicate a higher amplitude.
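The resolution of the spectrogram follows from the FFT length. This assumes the 400 Hz sample rate mentioned above and the `NFFT = 4096` used in the next cell:

$$\Delta f = \frac{f_s}{N_\mathrm{FFT}} = \frac{400\ \mathrm{Hz}}{4096} \approx 0.098\ \mathrm{Hz}, \qquad T_\mathrm{window} = \frac{N_\mathrm{FFT}}{f_s} = \frac{4096}{400\ \mathrm{Hz}} = 10.24\ \mathrm{s}$$

Each spectrogram column therefore averages about 10 s of audio, and the ±0.2 Hz band around the fundamental covers only about four bins. Precise ENF values need interpolation between bins rather than just the brightest bin.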

In [None]:
# @title For the next steps, some parameters have to be chosen.
grid_freq = "50" # @param ["50","60"]
harmonic = "2" # @param ["1","2", "3", "4"]
freq_band = 0.2 # @param {"type":"slider","min":0,"max":0.5,"step":0.01}


In [None]:
butter_order = 20
NFFT = 4096

locut = int(harmonic) * (int(grid_freq) - freq_band)
hicut = int(harmonic) * (int(grid_freq) + freq_band)
ylim_lower = int(harmonic) * (int(grid_freq) - 5 * freq_band)
ylim_upper = int(harmonic) * (int(grid_freq) + 5 * freq_band)

filtered_data = butter_bandpass_filter(clip.data, locut, hicut,
                                        clip.sampleRate(), butter_order)
t = np.linspace(0, len(filtered_data)/clip.sampleRate(), len(filtered_data))

fig, ax = plt.subplots(1, 1)

Pxx, freqs, bins, im = ax.specgram(filtered_data, NFFT=NFFT, Fs=clip.sampleRate())
# The `specgram` method returns 4 objects. They are:
# - Pxx: the periodogram
# - freqs: the frequency vector
# - bins: the centers of the time bins
# - im: the matplotlib.image.AxesImage instance representing the data in the plot
ax.set_ylim((ylim_lower, ylim_upper))
ax.set_xlabel('Time (s)')
ax.set_ylabel('Frequency (Hz)')
ax.set_title('Spectrogram')

## 7. Extract ENF Fluctuations

This step estimates how the hum frequency, i.e. the ENF, varies over time.
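What `AudioClipEnf.makeEnf()` does internally is not shown in this notebook. One common approach is sketched below under that caveat. It takes a short-time Fourier transform, picks the strongest bin inside the band in each frame, and refines it with parabolic interpolation on the log magnitude. `estimate_enf` is a hypothetical helper; it returns Hz at the chosen harmonic, whereas the notebook's plots divide the library's values by 1000. Dividing the result by the harmonic number gives the grid frequency.

```python
import numpy as np
from scipy import signal

def estimate_enf(x, fs, f_center, half_width, nperseg=4096, hop=None):
    """Return (frame_times, enf): the strongest frequency in
    [f_center - half_width, f_center + half_width] for each STFT frame."""
    hop = hop or int(fs)                         # default: one estimate per second of audio
    f, t, Z = signal.stft(x, fs=fs, window="hann", nperseg=nperseg,
                          noverlap=nperseg - hop, boundary=None, padded=False)
    mag = np.log(np.abs(Z) + 1e-12)              # log magnitude makes the peak near-parabolic
    band = np.flatnonzero((f >= f_center - half_width) & (f <= f_center + half_width))
    df = f[1] - f[0]
    enf = np.empty(mag.shape[1])
    for j in range(mag.shape[1]):
        k = band[np.argmax(mag[band, j])]        # coarse peak: best bin inside the band
        delta = 0.0
        if 0 < k < len(f) - 1:
            a, b, c = mag[k - 1, j], mag[k, j], mag[k + 1, j]
            denom = a - 2 * b + c
            if denom != 0:
                delta = 0.5 * (a - c) / denom    # vertex of the parabola through 3 bins
        enf[j] = f[k] + delta * df
    return t, enf

# Usage on a synthetic 100.03 Hz hum (2nd harmonic of a 50.015 Hz grid)
fs = 400
tt = np.arange(60 * fs) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 100.03 * tt) + 0.01 * rng.standard_normal(tt.size)
frame_t, enf = estimate_enf(x, fs, f_center=100.0, half_width=0.4)
print(f"{len(enf)} frames, mean estimate {enf.mean():.4f} Hz")
```

The interpolation recovers the 100.03 Hz tone to within a few mHz, even though the FFT bins are almost 0.1 Hz apart.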

In [None]:
clip.makeEnf(int(grid_freq), freq_band, int(harmonic))
t, f_enf = clip.getEnf()
fig, ax1 = plt.subplots()
ax1.plot(t, f_enf/1000)
ax1.set_xlabel('Time (s)')
ax1.set_ylabel('ENF (Hz)')

## 8. Chronolocate the Clip

There are several ways to chronolocate the clip:

1.   Match against a database of historical ENF data. Unfortunately, such historical data are (so far) available only for the UK (option *GB*).
2.   Match against self-recorded ENF values in a CSV file.
3.   Match against a test WAV file.

The latter two options use files in the `git` repository. The fields *month* and *year* below are not relevant for the test cases.
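The notebook lets you choose between *Convolution*, *Euclidian* and *Pearson* matching. The internals of `GridEnf.matchClip()` are not shown here, so the following is only a sketch of the Pearson idea. It slides the clip's ENF along the grid's ENF and keeps the alignment with the highest correlation coefficient. It assumes both series have the same sampling interval (here one value per second) and the same scaling. `best_offset` is a hypothetical helper; `sliding_window_view` requires NumPy 1.20 or later. The data are synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def best_offset(grid_enf, clip_enf):
    """Return (offset, r): the start index in grid_enf where clip_enf has the
    highest Pearson correlation. Both series must share one sampling interval."""
    grid_enf = np.asarray(grid_enf, dtype=float)
    clip_enf = np.asarray(clip_enf, dtype=float)
    n = clip_enf.size
    if n < 2 or grid_enf.size < n or clip_enf.std() == 0:
        raise ValueError("clip must vary, have >= 2 samples and fit into the grid record")
    windows = sliding_window_view(grid_enf, n)            # every candidate alignment
    centred = windows - windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1)
    clip = (clip_enf - clip_enf.mean()) / clip_enf.std()  # z-scored clip
    with np.errstate(divide="ignore", invalid="ignore"):
        r = centred @ clip / (n * sd)                     # Pearson r per alignment
    r = np.where(sd > 0, r, -np.inf)                      # flat windows cannot match
    k = int(np.argmax(r))
    return k, float(r[k])

# Usage with synthetic data: one ENF value per second for one hour
rng = np.random.default_rng(2)
grid = 50 + np.convolve(0.01 * rng.standard_normal(3600), np.ones(5) / 5, mode="same")
clip = grid[1234:1534] + 0.0005 * rng.standard_normal(300)
offset, r = best_offset(grid, clip)
print(f"best match at t = {offset} s, r = {r:.4f}")
```

Pearson correlation ignores constant offsets and scale differences between the two curves. This helps when the clip's ENF estimate is slightly biased, but it also means a nearly flat clip carries little information to match on.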

In [None]:
# @title Source of historical ENF data
enf_hist_data_source = "Test (CSV file)" # @param ["GB","Test (WAV file)", "Test (CSV file)"]
month = "1" # @param ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"]
year = 2024 # @param {"type":"integer"}
enf_data_csv = "/content/enf-matching/samplemedia/2024-08-19T15:26:02.csv" # @param {"type":"string"}
match_algo = "Convolution" # @param ['Convolution', 'Euclidian', 'Pearson']

In [None]:
# The class GridEnf caches historical grid data in an SQL database.
if os.path.exists('/content/drive/MyDrive'):
  database_path = '/content/drive/MyDrive'
else:
  database_path = '/content'

# Progress callbacks passed to loadGridEnf() and matchClip(); they ignore the progress reports
def match_callback2(hint, progr):
  pass

def match_callback1(progr):
  pass

fig, ax = plt.subplots(nrows=2, sharex=True)
ax[0].set_xlabel('Time (s)')
ax[0].set_ylabel('ENF grid (Hz)')
ax[1].set_xlabel('Time (s)')
ax[1].set_ylabel('ENF clip (Hz)')
ax[0].set_title('ENF Match')

# Create the grid ENF object; it caches downloaded grid data in the database file
grid_data_loaded = False
grid = GridEnf(database_path + '/hum.dp')
if enf_hist_data_source == 'Test (WAV file)':
  if grid.loadAudioFile('/content/enf-matching/samplemedia/71000_ref.wav'):
    grid.makeEnf(int(grid_freq), freq_band, int(harmonic))
    grid_data_loaded = True
  else:
    print("Failed to load audio file")
elif enf_hist_data_source == 'Test (CSV file)':
  grid.loadCSVFile(enf_data_csv)
  print("timestamp", type(grid.getTimestamp()))
  grid_data_loaded = True
else:
  grid.loadGridEnf(enf_hist_data_source, int(year), int(month), 1, match_callback2)
  _, d = grid.getEnf()
  if d is not None:
    grid_data_loaded = True

if grid_data_loaded:
  print("Grid ENF data loaded; matching the clip")
  grid.matchClip(clip, match_algo, match_callback1)
  t = grid.getMatchTimestamp()
  clip.setTimestamp(t)

  r = grid.getMatchRange()
  print("Range:", r)

  ax[0].set_xlim(r)
  ax[0].set_ylim(int(grid_freq) - freq_band, int(grid_freq) + freq_band)
  ax[1].set_xlim(r)
  ax[1].set_ylim(int(grid_freq) - freq_band, int(grid_freq) + freq_band)

  t0, f_enf0 = grid.getEnf()
  ax[0].plot(t0, f_enf0/1000)
  t1, f_enf1 = clip.getEnf()
  ax[1].plot(t1, f_enf1/1000)