<a href="https://colab.research.google.com/github/SandeepKonduruFeb12/aiml/blob/master/silver/A2AudioFIleComparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Find the most similar audio files to an uploaded query audio file from a collection of stored audio files, using feature extraction and similarity metrics.

## Setup Environment
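Install the necessary Python libraries for audio processing and similarity calculation. TorchCodec is included because recent torchaudio releases use it to decode audio files in `torchaudio.load()`.


In [3]:
%pip install torchaudio torchcodec librosa scipy numpy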



## Prepare Stored Audio Files

### Subtask:
Organize your collection of audio files by placing them in a designated directory (e.g., `stored_audio`). The feature-extraction step below picks up files with the extensions .wav, .mp3, .flac, .ogg, and .m4a.


### Download Sample Audio Files

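I will now download some example audio files. These are from public domain or Creative Commons sources; feel free to replace them with your own audio files. The first four are saved to `stored_audio`, and the fifth is saved as `query.wav`. Note that `query.wav` actually contains MP3 data; it still loads later because the FFmpeg-based decoder appears to identify the format from the file contents rather than from the extension.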

In [21]:
import os

# Example URLs for public domain/CC-licensed audio files.
# Note: these links are examples and might change or become unavailable over time.
# Always ensure you have the right to use the audio files you download.
stored_audio_urls = [
    "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3",
    "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-2.mp3",
    "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-3.mp3",
    "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-4.mp3"
]

# This MP3 becomes the query file (saved below under the name query.wav)
query_audio_url = "https://www.soundhelix.com/examples/mp3/SoundHelix-Song-5.mp3"

# Create the stored_audio directory if it does not exist yet
os.makedirs('stored_audio', exist_ok=True)

print("Downloading stored audio files...")
for i, url in enumerate(stored_audio_urls, start=1):
    # Reuse the remote file extension (e.g. 'mp3'), ignoring any '?query' suffix
    extension = url.split('.')[-1].split('?')[0]
    filename = os.path.join('stored_audio', f'stored_audio_{i}.{extension}')
    # Equivalent to `!wget -O {filename} {url}`; only works inside IPython/Colab
    get_ipython().system(f'wget -O {filename} {url}')
    print(f"Downloaded {filename}")

print("\nDownloading query audio file...")
# You can change this name; the downloaded content is MP3 regardless of the extension
query_filename = 'query.wav'
get_ipython().system(f'wget -O {query_filename} {query_audio_url}')
print(f"Downloaded {query_filename}")

print("\nAll sample audio files downloaded.")

Downloading stored audio files...
--2025-11-27 10:32:18--  https://www.soundhelix.com/examples/mp3/SoundHelix-Song-1.mp3
Resolving www.soundhelix.com (www.soundhelix.com)... 81.169.145.157, 2a01:238:20a:202:1157::
Connecting to www.soundhelix.com (www.soundhelix.com)|81.169.145.157|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8945229 (8.5M) [audio/mpeg]
Saving to: ‘stored_audio/stored_audio_1.mp3’


2025-11-27 10:32:21 (4.92 MB/s) - ‘stored_audio/stored_audio_1.mp3’ saved [8945229/8945229]

Downloaded stored_audio/stored_audio_1.mp3
--2025-11-27 10:32:21--  https://www.soundhelix.com/examples/mp3/SoundHelix-Song-2.mp3
Resolving www.soundhelix.com (www.soundhelix.com)... 81.169.145.157, 2a01:238:20a:202:1157::
Connecting to www.soundhelix.com (www.soundhelix.com)|81.169.145.157|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10222911 (9.7M) [audio/mpeg]
Saving to: ‘stored_audio/stored_audio_2.mp3’


2025-11-27 10:32:24 (5.57 MB/s) -
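The download cell relies on `get_ipython().system` to call `wget`, so it only runs inside IPython or Colab, and it stores the fifth MP3 under the name `query.wav`. As a portable alternative, here is a minimal sketch that uses only the Python standard library (`urllib.request.urlretrieve`) and keeps each file's real extension. The helper names `local_filename` and `download` are hypothetical and are not part of the original notebook.

```python
import os
import urllib.parse
import urllib.request


def local_filename(url, stem, directory="."):
    """Build a local path '<directory>/<stem><ext>' that reuses the remote file's extension."""
    # Parse the URL so a query string such as '?dl=1' is not mistaken for part of the extension
    path = urllib.parse.urlparse(url).path
    extension = os.path.splitext(path)[1]  # e.g. '.mp3'; empty string if the URL has no extension
    return os.path.join(directory, stem + extension)


def download(url, stem, directory="."):
    """Hypothetical helper: download `url` into `directory` and return the local path."""
    os.makedirs(directory, exist_ok=True)
    target = local_filename(url, stem, directory)
    # Raises urllib.error.URLError (or its subclass HTTPError) if the download fails
    urllib.request.urlretrieve(url, target)
    return target
```

For example, `download(query_audio_url, "query")` would save the query as `query.mp3`, in which case `query_audio_path` in the query-processing cell has to be changed to match.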

In [17]:
import os

directory_name = 'stored_audio'
if not os.path.exists(directory_name):
    os.makedirs(directory_name)
    print(f"Directory '{directory_name}' created successfully.")
else:
    print(f"Directory '{directory_name}' already exists.")

Directory 'stored_audio' already exists.


In [18]:
import torchaudio
import torchaudio.transforms as T
import torch
import numpy as np

def extract_features(audio_path, sample_rate=16000, n_mfcc=40, n_mels=64, n_fft=2048, hop_length=512):
    """
    Loads an audio file, resamples it, and extracts MFCC features.

    Args:
        audio_path (str): Path to the audio file.
        sample_rate (int): Target sample rate for resampling.
        n_mfcc (int): Number of MFCCs to return.
        n_mels (int): Number of mel filterbanks.
        n_fft (int): Size of FFT, creates `n_fft // 2 + 1` bins.
        hop_length (int): Length of the hop between windows.

    Returns:
        np.ndarray: A 1D array of length n_mfcc holding the MFCCs averaged over time,
        or None if the file cannot be loaded or processed.
    """
    try:
        # Load the audio file; waveform has shape [channels, samples]
        waveform, orig_sample_rate = torchaudio.load(audio_path)

        # Keep a single channel (the first one if the file is stereo).
        # Selecting it before resampling avoids resampling a channel that is then discarded.
        # Averaging the channels, waveform.mean(dim=0, keepdim=True), is a common alternative.
        if waveform.shape[0] > 1:
            waveform = waveform[0, :].unsqueeze(0)

        # Resample if necessary
        if orig_sample_rate != sample_rate:
            resampler = T.Resample(orig_sample_rate, sample_rate)
            waveform = resampler(waveform)

        # Extract MFCCs
        mfcc_transform = T.MFCC(
            sample_rate=sample_rate,
            n_mfcc=n_mfcc,
            melkwargs={
                "n_mels": n_mels,
                "n_fft": n_fft,
                "hop_length": hop_length
            }
        )
        mfcc_features = mfcc_transform(waveform)

        # mfcc_features has shape [channel, n_mfcc, time]: average over time (dim=2),
        # then squeeze out the single channel dimension to get a vector of length n_mfcc
        mean_mfccs = torch.mean(mfcc_features, dim=2).squeeze(0)

        # Convert to NumPy array and return
        return mean_mfccs.numpy()

    except Exception as e:
        print(f"Error processing {audio_path}: {e}")
        return None

print("Defined `extract_features` function.")

Defined `extract_features` function.
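`extract_features` summarizes each clip by averaging its MFCC frames over time. This turns recordings of any length into a vector of `n_mfcc` values, but it discards how the coefficients change over the clip. The sketch below (plain NumPy, hypothetical helper `pool_over_time`) shows mean pooling and a common extension that appends the per-coefficient standard deviation. Inside `extract_features` it could be applied to `mfcc_features[0].numpy()` (shape `[n_mfcc, time]`) in place of the mean; the random matrices here only stand in for real MFCCs.

```python
import numpy as np


def pool_over_time(frames, include_std=True):
    """Collapse a (n_coeffs, n_frames) feature matrix into one fixed-length vector.

    Mean pooling alone returns n_coeffs values. Appending the per-coefficient
    standard deviation also captures how much each coefficient varies over time,
    giving 2 * n_coeffs values.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=1)  # average value of each coefficient across frames
    if not include_std:
        return mean
    std = frames.std(axis=1)  # temporal variability of each coefficient
    return np.concatenate([mean, std])


# Stand-ins for two clips of different length: 40 coefficients, 100 vs 250 frames
rng = np.random.default_rng(0)
short_clip = rng.normal(size=(40, 100))
long_clip = rng.normal(size=(40, 250))
print(pool_over_time(short_clip).shape, pool_over_time(long_clip).shape)  # (80,) (80,)
```

Both clips map to vectors of the same size, so they can be compared with the similarity functions used later in the notebook.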


## Extract Features for Stored Files

### Subtask:
Iterate through all audio files in the 'stored_audio' directory, apply the feature extraction function defined previously, and store the extracted features along with the corresponding file paths.


**Reasoning**:
I will implement the logic to iterate through the 'stored_audio' directory, apply the previously defined `extract_features` function to each audio file, and store the results in a list as per the subtask instructions.



In [22]:
import os

# Collect (file_path, feature_vector) pairs for every stored audio file
stored_audio_features = []

# Common audio file extensions to include
audio_extensions = ('.wav', '.mp3', '.flac', '.ogg', '.m4a')

processed_count = 0
failed_count = 0

# Reuse directory_name from the earlier cell, falling back to the default if that cell was not run
directory_name = globals().get('directory_name', 'stored_audio')

if os.path.exists(directory_name):
    for filename in os.listdir(directory_name):
        if filename.lower().endswith(audio_extensions):
            full_file_path = os.path.join(directory_name, filename)

            # Extract the mean-MFCC vector; None signals a load or decode failure
            features = extract_features(full_file_path)

            # Keep successful extractions and count failures
            if features is not None:
                stored_audio_features.append((full_file_path, features))
                processed_count += 1
            else:
                print(f"Skipping {filename} due to feature extraction failure.")
                failed_count += 1
else:
    print(f"Error: Directory '{directory_name}' not found. Please ensure audio files are uploaded.")

# Summarize the results
print("\nFeature extraction complete.")
print(f"Successfully processed {processed_count} audio files.")
print(f"Failed to process {failed_count} audio files.")
print(f"Total features stored in 'stored_audio_features': {len(stored_audio_features)}.")

Error processing stored_audio/stored_audio_3.wav: Failed to create AudioDecoder for stored_audio/stored_audio_3.wav: Could not open input file: stored_audio/stored_audio_3.wav Invalid data found when processing input
Skipping stored_audio_3.wav due to feature extraction failure.
Error processing stored_audio/stored_audio_2.wav: Failed to create AudioDecoder for stored_audio/stored_audio_2.wav: Could not open input file: stored_audio/stored_audio_2.wav Invalid data found when processing input
Skipping stored_audio_2.wav due to feature extraction failure.
Error processing stored_audio/stored_audio_4.wav: Failed to create AudioDecoder for stored_audio/stored_audio_4.wav: Could not open input file: stored_audio/stored_audio_4.wav Invalid data found when processing input
Skipping stored_audio_4.wav due to feature extraction failure.

Feature extraction complete.
Successfully processed 4 audio files.
Failed to process 3 audio files.
Total features stored in 'stored_audio_features': 4.
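The log above reports three `.wav` files that fail to decode. The download cell only writes `.mp3` files, so these are presumably left over from an earlier run and can be deleted. Also, `os.listdir` returns entries in arbitrary order, so the processing order can differ between machines. Here is a small sketch of a reproducible alternative using `pathlib`, with a hypothetical `list_audio_files` helper that can optionally search nested folders:

```python
from pathlib import Path

# Same extensions as the notebook's audio_extensions tuple
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def list_audio_files(directory, recursive=False):
    """Return audio file paths under `directory` as strings, sorted for a reproducible order."""
    root = Path(directory)
    if not root.is_dir():
        return []  # nothing to process when the folder is missing
    # iterdir() lists only the top level; rglob("*") also walks subdirectories
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Path.suffix is the final extension (e.g. '.MP3'); lower() makes the match case-insensitive
    return sorted(
        str(path) for path in candidates
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Its result could drive the extraction loop directly, e.g. `for full_file_path in list_audio_files(directory_name): ...`.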


In [20]:
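# Recent torchaudio releases delegate torchaudio.load() to TorchCodec;
# install it before running the feature-extraction cells.
get_ipython().system('pip install torchcodec')
print("Installed torchcodec.")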

Installed torchcodec.


In [23]:
query_audio_path = "query.wav"  # <<< IMPORTANT: Change this to the actual name of your uploaded query audio file if it's not 'query.wav'

print(f"Processing query audio file: {query_audio_path}")
query_features = extract_features(query_audio_path)

if query_features is not None:
    print("Query audio features extracted successfully.")
    print(f"Shape of query features: {query_features.shape}")
else:
    print(f"Error: Failed to extract features from {query_audio_path}. Please check the file path and format.")

Processing query audio file: query.wav
Query audio features extracted successfully.
Shape of query features: (40,)


## Calculate Similarities

### Subtask:
Develop a function to calculate the similarity between the features of the uploaded query audio file and the features of each stored audio file. Common metrics are cosine similarity and Euclidean distance; `scipy.spatial.distance` provides `cosine` (the cosine *distance*, i.e. 1 minus cosine similarity) and `euclidean`.
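The two metrics behave differently: cosine similarity compares only the direction of two feature vectors, while Euclidean distance also reacts to their magnitude. The toy vectors below are invented for illustration (they are not real MFCCs) and show this with the `scipy.spatial.distance` functions `cosine` and `euclidean`.

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean

a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a  # same direction, twice the magnitude

# Cosine similarity depends only on the angle between the vectors,
# so scaling one of them leaves it unchanged.
cos_sim = 1 - cosine(a, b)

# Euclidean distance depends on magnitude, so the scaled copy is sqrt(14) away.
euc_dist = euclidean(a, b)

print(round(cos_sim, 6), round(euc_dist, 4))  # 1.0 3.7417
```

When ranking by Euclidean distance instead of cosine similarity, sort in ascending order, since a smaller distance means a closer match.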


In [24]:
from scipy.spatial.distance import cosine

def calculate_cosine_similarity(features1, features2):
    """
    Calculates the cosine similarity between two feature vectors.

    Args:
        features1 (np.ndarray): The first feature vector.
        features2 (np.ndarray): The second feature vector.

    Returns:
        float: The cosine similarity between the two vectors.
    """
    # scipy.spatial.distance.cosine returns cosine distance, so subtract from 1 for similarity
    return 1 - cosine(features1, features2)

print("Defined `calculate_cosine_similarity` function.")

Defined `calculate_cosine_similarity` function.


In [25]:
similarities = []

# Ensure query_features is available and valid
if 'query_features' not in locals() or query_features is None:
    print("Error: query_features not found or is None. Please ensure the query audio file was processed successfully.")
else:
    for file_path, features in stored_audio_features:
        try:
            similarity = calculate_cosine_similarity(query_features, features)
            similarities.append((file_path, similarity))
        except ValueError as e:
            print(f"Could not calculate similarity for {file_path}: {e}")

print(f"Calculated similarities for {len(similarities)} stored audio files.")


Calculated similarities for 4 stored audio files.
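The loop above calls `scipy.spatial.distance.cosine` once per stored file. With many files, the same scores can be computed in one step by stacking the stored vectors into a matrix and normalizing each row to unit length. This is a minimal NumPy sketch; `cosine_similarities` and `top_n_matches` are hypothetical helpers, and zero-norm vectors would produce NaN, so they would need to be filtered out first.

```python
import numpy as np


def cosine_similarities(query, stored_matrix):
    """Cosine similarity between one query vector (d,) and each row of stored_matrix (n, d)."""
    query = np.asarray(query, dtype=np.float64)
    stored_matrix = np.asarray(stored_matrix, dtype=np.float64)
    # Scale every vector to unit length; the dot product of unit vectors is their cosine
    q_unit = query / np.linalg.norm(query)
    rows_unit = stored_matrix / np.linalg.norm(stored_matrix, axis=1, keepdims=True)
    return rows_unit @ q_unit  # shape (n,)


def top_n_matches(paths, scores, n=3):
    """Return up to n (path, score) pairs, highest score first."""
    order = np.argsort(scores)[::-1][:n]  # indices sorted by descending score
    return [(paths[i], float(scores[i])) for i in order]
```

With the notebook's variables, the inputs would be `query_features`, `np.stack([f for _, f in stored_audio_features])`, and `[p for p, _ in stored_audio_features]`; the scores should match the loop's results up to floating-point rounding.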


## Display Most Similar Files
Rank the stored audio files based on their similarity scores to the query file (from most similar to least similar). Display the file paths and similarity scores for the top N most similar audio files.


In [26]:
# Sort the (file_path, similarity) pairs from most to least similar
sorted_similarities = sorted(similarities, key=lambda item: item[1], reverse=True)

# Number of results to display; change this to show more or fewer matches
top_n = 3

# Print a descriptive header
print(f"\nTop {top_n} Most Similar Audio Files to Query Audio:")
print("---------------------------------------------------")

# Print the top_n results with 1-based ranks
for rank, (file_path, similarity) in enumerate(sorted_similarities[:top_n], start=1):
    print(f"{rank}. File: {file_path}\n   Similarity: {similarity:.4f}")



Top 3 Most Similar Audio Files to Query Audio:
---------------------------------------------------
1. File: stored_audio/stored_audio_4.mp3
   Similarity: 0.9989
2. File: stored_audio/stored_audio_1.mp3
   Similarity: 0.9937
3. File: stored_audio/stored_audio_2.mp3
   Similarity: 0.9923
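All three scores lie above 0.99, so the ranking rests on very small differences. A plausible cause is that every mean-MFCC vector is dominated by a few large coefficients shared across clips (often the 0th coefficient, which tracks overall energy), so all vectors point in nearly the same direction. One common remedy, not part of this notebook, is to standardize each coefficient using the mean and standard deviation of the stored collection before computing cosine similarity. The sketch below shows the effect on invented toy vectors; `standardize` and `cosine_rows` are hypothetical helpers, and with only four stored files the statistics are rough, so treat it as an experiment. Dropping the 0th coefficient (`features[1:]`) is a simpler alternative.

```python
import numpy as np


def standardize(stored_matrix, query, eps=1e-8):
    """Z-score each feature dimension using statistics of the stored collection.

    The same mean and standard deviation are applied to the query so that
    stored and query vectors live in the same normalized space.
    """
    stored_matrix = np.asarray(stored_matrix, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    mean = stored_matrix.mean(axis=0)
    std = stored_matrix.std(axis=0) + eps  # eps avoids division by zero for constant dimensions
    return (stored_matrix - mean) / std, (query - mean) / std


def cosine_rows(query, matrix):
    """Cosine similarity between `query` and each row of `matrix`."""
    return (matrix @ query) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))


# Toy vectors whose first dimension is large and nearly constant,
# mimicking a dominant coefficient shared by every clip (hypothetical data).
stored = np.array([[-300.0, 5.0, -2.0],
                   [-298.0, -4.0, 3.0],
                   [-302.0, 1.0, -6.0]])
query = np.array([-299.0, 4.0, -1.5])

raw_scores = cosine_rows(query, stored)  # all very close to 1
z_stored, z_query = standardize(stored, query)
standardized_scores = cosine_rows(z_query, z_stored)  # spread out much more

print("raw:         ", np.round(raw_scores, 4))
print("standardized:", np.round(standardized_scores, 4))
```

In the notebook, `stored` would correspond to `np.stack([f for _, f in stored_audio_features])` and `query` to `query_features`.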
