Lab 6

1. Background:
Creation of a table of peak pairs from a song or clip.
Use of these tables to form a searchable database.

2. Finding a Clip Match:
Generating a clip table from the input clip.
Searching the database for matching entries.
Using hash functions to facilitate fast lookup.
Determining the song with the most matches.

3. Noise and SNR:
Adding noise to clips and measuring the classification performance at different SNRs.
Evaluating the system's robustness by varying noise levels.
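The table-building and lookup steps above can be sketched on a toy example. This is a minimal illustration, not the lab's implementation: it assumes the spectral peaks ("constellation") are already given as (frame, frequency-bin) pairs, pairs each anchor with at most `fan_out` peaks in later frames, and uses the tuple (f1, f2, dt) directly as a Python dict key so that lookup is a hash-table access. The names `make_pairs`, `build_database` and `match_counts` are hypothetical.

```python
from collections import defaultdict

def make_pairs(peaks, fan_out=3):
    """Turn (frame, freq_bin) peaks into (f1, f2, t1, dt) peak pairs.

    Each anchor is paired with up to `fan_out` peaks in later frames.
    Times are frame indices here, so dt is an integer.
    """
    peaks = sorted(peaks)
    table = []
    for i, (t1, f1) in enumerate(peaks):
        later = [(t2, f2) for t2, f2 in peaks[i + 1:] if t2 > t1][:fan_out]
        for t2, f2 in later:
            table.append((f1, f2, t1, t2 - t1))
    return table

def build_database(songs):
    """Map each (f1, f2, dt) key to every (song_name, t1) that produced it."""
    db = defaultdict(list)  # list values keep colliding entries
    for name, peaks in songs.items():
        for f1, f2, t1, dt in make_pairs(peaks):
            db[(f1, f2, dt)].append((name, t1))
    return db

def match_counts(db, clip_peaks):
    """Count, per song, how many clip pairs have a matching key."""
    counts = defaultdict(int)
    for f1, f2, t1, dt in make_pairs(clip_peaks):
        for name, _ in db.get((f1, f2, dt), []):
            counts[name] += 1
    return dict(counts)

songs = {
    "song_a": [(0, 10), (1, 40), (2, 25), (3, 60), (4, 10), (5, 33)],
    "song_b": [(0, 12), (1, 44), (2, 70), (3, 15), (4, 50), (5, 22)],
}
db = build_database(songs)
# A clip cut from song_a at frame 2: same peaks, times shifted by -2
clip = [(t - 2, f) for t, f in songs["song_a"] if t >= 2]
counts = match_counts(db, clip)
print(max(counts, key=counts.get))  # song_a
```

Because only time differences enter the key, the clip's pairs match the song's pairs even though the clip starts at a different time.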

* Important Points

1. Hash Table Construction:
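Each peak pair (f1, f2, dt) in a song table is hashed into a single integer key that maps to the song ID and the anchor time t1.
Different songs (or different pairs) can produce the same key, so each key stores a list of (song ID, t1) entries rather than a single value; handling these collisions correctly is necessary for accurate matching.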

2. Clip Matching:
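Every clip pair whose key is found in the table yields a candidate song and a time offset, offset = t1(song) - t1(clip).
For the correct song, many matches share the same offset, so the song whose offset histogram has the tallest bin is reported.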

3. Performance Testing:
Performance is tested by adding Gaussian noise to clips and measuring the correct classification rate.
The effect of clip length on classification accuracy is also assessed.
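Two of the points above can be made concrete with a small sketch. First, packing (f1, f2, dt) into one integer is collision-free only if each field has its own bit range; the example assumes frequency bins below 1024 (10 bits each) and dt in integer milliseconds, and contrasts this with a layout that gives each frequency only 8 bits. Second, raw hit counts can favour a song that shares many pairs at scattered times, while the tallest bin of the offset histogram rewards hits that line up. Times in the toy data are integer frames, so offsets are exact; with times in seconds they would be rounded into bins. `pack_hash`, `overlapping_hash` and `score_by_offset` are illustrative names.

```python
from collections import Counter

FREQ_BITS = 10  # assumption: frequency-bin indices are below 1024

def pack_hash(f1, f2, dt_ms):
    """Pack two frequency bins and an integer dt (ms) into non-overlapping bit fields."""
    if not (0 <= f1 < (1 << FREQ_BITS) and 0 <= f2 < (1 << FREQ_BITS)):
        raise ValueError("frequency bin does not fit in FREQ_BITS")
    return (dt_ms << (2 * FREQ_BITS)) | (f1 << FREQ_BITS) | f2

def overlapping_hash(f1, f2, dt_ms):
    """Layout with 8 bits per frequency: fields overlap once f1 or f2 exceeds 255."""
    return (dt_ms << 16) + (f1 << 8) + f2

# Two different pairs that the 8-bit layout maps to the same key
print(overlapping_hash(1, 256, 10) == overlapping_hash(2, 0, 10))  # True
print(pack_hash(1, 256, 10) == pack_hash(2, 0, 10))                # False

def score_by_offset(matches):
    """matches: (song, t_song, t_clip) hits. Score = height of tallest offset bin."""
    hist = {}
    for song, t_song, t_clip in matches:
        hist.setdefault(song, Counter())[t_song - t_clip] += 1
    return {song: max(c.values()) for song, c in hist.items()}

# song_x has more raw hits, but only song_y's hits agree on one offset (40)
matches = ([("song_x", t, 0) for t in (5, 17, 29, 41, 53)]
           + [("song_y", 40 + t, t) for t in (0, 3, 6, 9)])
raw = Counter(song for song, _, _ in matches)
scores = score_by_offset(matches)
print(raw.most_common(1)[0][0])     # song_x
print(max(scores, key=scores.get))  # song_y
```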

In [11]:
import os
import pickle
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import find_peaks, spectrogram

# Hash table (hash -> list of (song_id, t1)) and song names indexed by song_id
HASH_TABLE = {}
SONG_LIST = []

# Utility functions
def to_mono(data):
    """Return wavfile data as a 1-D float array (stereo channels are averaged)."""
    data = np.asarray(data, dtype=np.float64)
    return data.mean(axis=1) if data.ndim > 1 else data

def hash_function(f1, f2, dt):
    """Pack two frequency bins (each < 1024) and dt (seconds) into one integer key."""
    dt_int = int(round(dt * 1000))  # Scale dt to milliseconds and convert to int
    return (dt_int << 20) | (int(f1) << 10) | int(f2)  # 10 bits per frequency field

def add_to_hash(song_id, table):
    # Several songs can produce the same key (a collision), so each key stores a list
    for f1, f2, t1, dt in table:
        HASH_TABLE.setdefault(hash_function(f1, f2, dt), []).append((song_id, t1))

def save_database():
    with open('HASHTABLE.pkl', 'wb') as f:
        pickle.dump(HASH_TABLE, f)
    with open('SONGID.pkl', 'wb') as f:
        pickle.dump(SONG_LIST, f)

def load_database():
    global HASH_TABLE, SONG_LIST
    with open('HASHTABLE.pkl', 'rb') as f:
        HASH_TABLE = pickle.load(f)
    with open('SONGID.pkl', 'rb') as f:
        SONG_LIST = pickle.load(f)

def make_table(audio, fs, peaks_per_frame=3, fan_out=5, max_dt=2.0, nperseg=1024):
    """Return a list of (f1, f2, t1, dt) peak pairs.

    f1 and f2 are spectrogram frequency-bin indices, t1 is the anchor peak's time
    in seconds and dt is the time from the anchor to the paired peak.
    """
    _, times, sxx = spectrogram(to_mono(audio), fs=fs, nperseg=nperseg,
                                noverlap=nperseg // 2)
    log_sxx = np.log10(sxx + 1e-10)
    # Constellation: the strongest spectral peaks in every time frame
    constellation = []
    for ti in range(log_sxx.shape[1]):
        bins, _ = find_peaks(log_sxx[:, ti])
        strongest = bins[np.argsort(log_sxx[bins, ti])[-peaks_per_frame:]]
        constellation.extend((ti, int(b)) for b in strongest)
    constellation.sort()
    # Pair each anchor with up to fan_out peaks in later frames, at most max_dt apart
    table = []
    for i, (ti1, f1) in enumerate(constellation):
        paired = 0
        for j in range(i + 1, len(constellation)):
            ti2, f2 = constellation[j]
            if ti2 == ti1:
                continue  # skip peaks in the same frame
            dt = times[ti2] - times[ti1]
            if dt > max_dt or paired == fan_out:
                break
            table.append((f1, f2, times[ti1], dt))
            paired += 1
    return table

def add_songs_to_database(directory='.'):
    # Rebuild from scratch so re-running the cell does not duplicate songs
    HASH_TABLE.clear()
    SONG_LIST.clear()
    for file in os.listdir(directory):
        if file.endswith('.wav'):
            print(f"Processing {file}...")
            fs, data = wavfile.read(os.path.join(directory, file))
            song_id = len(SONG_LIST)
            SONG_LIST.append(file)
            table = make_table(data, fs)
            add_to_hash(song_id, table)
    save_database()

def myshazam(clip_file, test_directory='.'):
    load_database()
    test_file_path = os.path.join(test_directory, clip_file)
    print(f"Looking for file at: {os.path.abspath(test_file_path)}")  # Print the absolute path for debugging
    if not os.path.isfile(test_file_path):
        return f"File not found: {test_file_path}"
    fs, clip_data = wavfile.read(test_file_path)
    clip_table = make_table(clip_data, fs)
    # Per song, histogram the offsets (song time - clip time) of matching pairs
    offsets = {}
    for f1, f2, t1, dt in clip_table:
        for song_id, ts1 in HASH_TABLE.get(hash_function(f1, f2, dt), []):
            offsets.setdefault(song_id, Counter())[round(ts1 - t1, 2)] += 1
    if not offsets:
        return "No match found"
    # The correct song has many matches that agree on a single offset
    best_match = max(offsets, key=lambda k: max(offsets[k].values()))
    return SONG_LIST[best_match]

def test_minimum_duration(directory='.', min_duration_sec=1):
    results = {}
    for file in os.listdir(directory):
        if file.endswith('.wav'):
            fs, data = wavfile.read(os.path.join(directory, file))
            duration = len(data) / fs
            results[file] = duration >= min_duration_sec
    return results

# Check which files are at least min_duration_sec long
min_duration_sec = 1  # Set this to the duration you want to test
duration_results = test_minimum_duration('.', min_duration_sec)
print(duration_results)

# Sample usage
# The database songs must be .wav files in the same directory as this notebook
add_songs_to_database('.')
print(myshazam('StarWars3.wav', r'C:\Users\DHANASHRI\Downloads'))  # Replace with your test clip file name and folder

# Test performance with noise (test_performance_with_noise is not defined in this notebook)
snr_db_list = [-15, -12, -9, -6, -3, 0, 3, 6, 9, 12, 15]
# performance_results = test_performance_with_noise('.', snr_db_list, 10)
# print(performance_results)


{'CantinaBand3.wav': True, 'CantinaBand60.wav': True, 'filtered_PinkPanther60.wav': True, 'filtered_whkight.wav': True, 'PinkPanther60.wav': True, 'pure_tone.wav': True, 'simple_signal.wav': True, 'sound.wav': True, 'StarWars60.wav': True, 'taunt.wav': True, 'tel.wav': True, 'temp_clip.wav': True, 'whkight.wav': True}
Processing CantinaBand3.wav...
Processing CantinaBand60.wav...
Processing filtered_PinkPanther60.wav...
Processing filtered_whkight.wav...
Processing PinkPanther60.wav...
Processing pure_tone.wav...
Processing simple_signal.wav...
Processing sound.wav...
Processing StarWars60.wav...
Processing taunt.wav...
Processing tel.wav...
Processing temp_clip.wav...
Processing whkight.wav...
Looking for file at: C:\Users\DHANASHRI\Downloads\StarWars3.wav
StarWars60.wav
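The noise experiment is only outlined in the cell above: `test_performance_with_noise` is called in a commented-out line but never defined. One way to run it is sketched below, under assumptions: the noise is white Gaussian, SNR is defined from average power as SNR_dB = 10 log10(P_signal / P_noise), and an `identify(clip, fs)` callable returns the predicted song name. `add_noise`, `random_clip`, `classification_rate` and the toy `identify` are hypothetical helpers; using the notebook's matcher instead would require writing each noisy clip to a .wav file or adapting `myshazam` to accept an array.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db."""
    signal = np.asarray(signal, dtype=np.float64)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

def random_clip(audio, fs, seconds, rng):
    """Return a random excerpt `seconds` long (the whole signal if it is shorter)."""
    n = int(seconds * fs)
    if n >= len(audio):
        return audio
    start = int(rng.integers(0, len(audio) - n + 1))
    return audio[start:start + n]

def classification_rate(songs, identify, fs, snr_db, seconds, trials, rng):
    """Fraction of noisy random clips that identify(clip, fs) assigns to the right song."""
    names = sorted(songs)
    correct = 0
    for _ in range(trials):
        name = names[rng.integers(len(names))]
        clip = add_noise(random_clip(songs[name], fs, seconds, rng), snr_db, rng)
        correct += identify(clip, fs) == name
    return correct / trials

# Demo with two pure tones and a toy FFT-peak matcher standing in for the real system
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(5 * fs) / fs
songs = {"tone_440": np.sin(2 * np.pi * 440 * t),
         "tone_880": np.sin(2 * np.pi * 880 * t)}

def identify(clip, fs):
    """Toy matcher for this demo only: name of the tone nearest the FFT peak."""
    spectrum = np.abs(np.fft.rfft(clip))
    peak_hz = np.fft.rfftfreq(len(clip), 1 / fs)[np.argmax(spectrum)]
    return "tone_440" if abs(peak_hz - 440) < abs(peak_hz - 880) else "tone_880"

snr_db_list = [-15, -12, -9, -6, -3, 0, 3, 6, 9, 12, 15]
rates = [classification_rate(songs, identify, fs, snr, 1.0, 20, rng) for snr in snr_db_list]
for snr, rate in zip(snr_db_list, rates):
    print(f"SNR {snr:+d} dB: {rate:.2f}")
```

Holding `snr_db` fixed and varying `seconds` instead gives the clip-length experiment.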


In [7]:
pip install ffmpeg-python


Note: you may need to restart the kernel to use updated packages.

Collecting ffmpeg-python
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl.metadata (1.7 kB)
Collecting future (from ffmpeg-python)
  Downloading future-1.0.0-py3-none-any.whl.metadata (4.0 kB)
Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Downloading future-1.0.0-py3-none-any.whl (491 kB)
Installing collected packages: future, ffmpeg-python
Successfully installed ffmpeg-python-0.2.0 future-1.0.0


Note: os, random, pickle and hashlib ship with Python's standard library, so they need no installation and can be imported directly. Running "pip install os" (or random, pickle, hashlib) fails with "ERROR: No matching distribution found".
