#Matrix Representation in Audio Similarity
IE University BCSAI 2023

Matrices and Linear Transformations Project

##Introduction
This project takes a simple approach to analyzing audio signals: each song, loaded from an mp3 file, is represented as a matrix in which each row corresponds to a frequency component and each column to a short time segment of the signal.

Applying singular value decomposition (SVD) to these matrices summarizes each song by its vector of singular values, which ranks the components of the matrix by importance and so reduces the dimensionality of the data.

These singular value vectors are then compared using three metrics (cosine similarity, Euclidean distance, and Manhattan distance) to determine how similar two audio signals are.

This approach can be useful for various applications, such as audio classification, music recommendation, and audio similarity analysis.





In [11]:
import numpy as np
import librosa

## Load the audio files
The `load` function from the Librosa library returns the audio time series and the sampling rate of the audio. By default it converts the audio to mono and resamples it to 22,050 Hz.

In [6]:
# Load the audio files
song1, sr1 = librosa.load('Avalon_Audio_-_Groove_Funk_Beat_Main.mp3')
song2, sr2 = librosa.load('Gypsy_Romance_-_GYPSY_RING.mp3')
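# song3 currently loads the same file as song2, so the song 2 vs. song 3
# comparison acts as an identical-input baseline; uncomment the line below
# to compare against a third track instead.
song3, sr3 = librosa.load('Gypsy_Romance_-_GYPSY_RING.mp3')
#song3, sr3 = librosa.load('Meditation_Background_-_cleanmindstudio.mp3')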

In [7]:
# Truncate every song to the length of the shortest one so that
# the STFT matrices (and their singular value vectors) have matching shapes
min_len = min(len(song1), len(song2), len(song3))
song1 = song1[:min_len]
song2 = song2[:min_len]
song3 = song3[:min_len]

##Convert the audio files into matrix representation using STFT
Convert the audio time series into a matrix representation using the Short-Time Fourier Transform (STFT). 

The STFT breaks the audio signal into small overlapping time windows and calculates the Fourier transform for each window. 

The resulting matrix is complex-valued: each entry encodes the magnitude and phase of one frequency component in one time window.
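To make the shape of this matrix concrete, here is a minimal NumPy-only sketch of an STFT. It is an illustration under simplifying assumptions, not how Librosa implements it: the helper name `stft_sketch` and the 440 Hz test tone are made up for this example, and `librosa.stft` pads the signal by default, so its frame count will differ.

```python
import numpy as np

def stft_sketch(signal, n_fft=2048, hop_length=512):
    """Naive STFT (hypothetical helper): Hann-windowed frames, one column per frame."""
    window = np.hanning(n_fft)
    # Number of full frames that fit in the signal without padding
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack(
        [signal[i * hop_length : i * hop_length + n_fft] * window
         for i in range(n_frames)],
        axis=1,
    )  # shape: (n_fft, n_frames)
    # One-sided FFT of each frame: rows are frequency bins, columns are time windows
    return np.fft.rfft(frames, axis=0)  # shape: (1 + n_fft // 2, n_frames)

sr = 22050                                # assumed sampling rate
t = np.arange(sr) / sr                    # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)      # synthetic 440 Hz test tone

spec = stft_sketch(tone)
print(spec.shape)                         # (1025, 40)

# The row with the largest average magnitude should sit near 440 Hz
peak_bin = np.abs(spec).mean(axis=1).argmax()
peak_freq = peak_bin * sr / 2048
print(peak_freq)
```

With `n_fft = 2048` the matrix has 1 + 2048/2 = 1025 rows (frequency bins), and each hop of 512 samples adds one column.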

In [8]:
# Convert the audio files into matrix representation using STFT
hop_length = 512
n_fft = 2048
stft1 = librosa.stft(song1, hop_length=hop_length, n_fft=n_fft)
stft2 = librosa.stft(song2, hop_length=hop_length, n_fft=n_fft)
stft3 = librosa.stft(song3, hop_length=hop_length, n_fft=n_fft)

##Apply SVD to the resulting matrix

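The Singular Value Decomposition (SVD) factorizes the resulting matrix A into three matrices, A = UΣV*. Because the STFT matrix is complex-valued, U and V are unitary (the complex counterpart of orthogonal), and Σ is a diagonal matrix containing the singular values of the original matrix, which are real and non-negative. Note that `np.linalg.svd` returns the singular values as a 1-D array sorted in descending order, and returns the conjugate transpose V* rather than V.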
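The factorization can be checked on a small example. The sketch below uses a random 6×4 complex matrix as a stand-in for an STFT matrix (the matrix, the seed, and the choice `k = 2` are assumptions for illustration). It verifies the reconstruction and shows how keeping only the `k` largest singular values gives a lower-rank approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small complex matrix standing in for an STFT matrix
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

# full_matrices=False returns the reduced factors: U is 6x4, s has 4 entries, Vh is 4x4
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# s is real, non-negative, and sorted in descending order
print(s)

# Rebuild A from its factors: scale the columns of U by s, then multiply by Vh
A_rebuilt = (U * s) @ Vh

# Rank-k approximation: keep only the k largest singular values
k = 2
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]

# The Frobenius-norm error equals the root-sum-square of the discarded singular values
err = np.linalg.norm(A - A_k)
tail = np.sqrt(np.sum(s[k:] ** 2))
print(err, tail)
```

The last two numbers agree, which is the sense in which the leading singular values capture the most important part of the matrix.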

In [9]:
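# Apply SVD to the resulting matrices; s1, s2, s3 are 1-D arrays of singular values
# (the third return value of np.linalg.svd is the conjugate transpose of V)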
U1, s1, V1 = np.linalg.svd(stft1)
U2, s2, V2 = np.linalg.svd(stft2)
U3, s3, V3 = np.linalg.svd(stft3)

##Compare the values of Σ for the songs



###Cosine similarity
Cosine similarity measures the similarity between two vectors through the angle between them. It is calculated as the dot product of the two vectors divided by the product of their magnitudes.

cosine_similarity(a, b) = (a dot b) / (||a|| * ||b||)

In this case, we use it to compare the singular value vectors of each pair of songs.
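As a small illustration of the formula, the sketch below wraps it in a helper and applies it to short made-up vectors (the function name `cosine_similarity` and the sample values are assumptions, not part of the notebook's data).

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 2.0, 1.0])

same = cosine_similarity(a, a)                  # identical vectors: approximately 1
scaled = cosine_similarity(a, 10 * a)           # same direction, different scale: approximately 1
orthogonal = cosine_similarity([1, 0], [0, 1])  # perpendicular vectors: 0

print(same, scaled, orthogonal)
```

Cosine similarity ignores overall scale, so a louder copy of the same spectrum still scores close to 1. Because singular values are non-negative, the result for two singular value vectors always falls between 0 and 1.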

In [10]:
# Calculate the cosine similarity between the Σ values for the songs
similarity12 = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
similarity13 = np.dot(s1, s3) / (np.linalg.norm(s1) * np.linalg.norm(s3))
similarity23 = np.dot(s2, s3) / (np.linalg.norm(s2) * np.linalg.norm(s3))

print(f"Cosine similarity between song 1 and 2: {similarity12}")
print(f"Cosine similarity between song 1 and 3: {similarity13}")
print(f"Cosine similarity between song 2 and 3: {similarity23}")

Cosine similarity between song 1 and 2: 0.9414443373680115
Cosine similarity between song 1 and 3: 0.9414443373680115
Cosine similarity between song 2 and 3: 1.0


###Euclidean distance 
The Euclidean distance is the straight-line distance between two points in a multidimensional space. It is calculated as the square root of the sum of the squared differences between corresponding elements of the two vectors, here the singular value vectors of two songs.

Euclidean distance = √( Σ_i (a[i] - b[i])² )

where the sum runs over all elements i of the vectors a and b. (Here Σ denotes summation, not the matrix of singular values.)
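A short worked example, using made-up three-element vectors, shows the formula written out explicitly next to the `np.linalg.norm` call (the helper name `euclidean_distance` and the sample values are assumptions for illustration).

```python
import numpy as np

def euclidean_distance(a, b):
    """Square root of the sum of squared element-wise differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

a = np.array([5.0, 3.0, 1.0])
b = np.array([2.0, 3.0, 5.0])

# Differences are 3, 0, -4, so the distance is sqrt(9 + 0 + 16) = 5
d = euclidean_distance(a, b)
print(d)                          # 5.0

# np.linalg.norm of the difference vector computes the same quantity
d_norm = np.linalg.norm(a - b)

# Unlike cosine similarity, the distance scales with the vectors
d_doubled = euclidean_distance(2 * a, 2 * b)
print(d_doubled)                  # 10.0
```

Because the value grows with the magnitude of the vectors, distances between songs are only comparable when the signals have similar lengths and levels.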

In [20]:
# Compute the Euclidean distance between the singular value vectors
E_dist_12 = np.linalg.norm(s1 - s2)
E_dist_23 = np.linalg.norm(s2 - s3)

print(f"Euclidean distance between song 1 and 2: {E_dist_12}")
print(f"Euclidean distance between song 2 and 3: {E_dist_23}")

Euclidean distance between song 1 and 2: 6897.376953125
Euclidean distance between song 2 and 3: 0.0


###Manhattan distance
Also known as the taxicab distance or L1 distance, the Manhattan distance is a measure of distance between two points in a multidimensional space. It is calculated as the sum of the absolute differences between corresponding elements of the two vectors, here the singular value vectors of two songs.

Manhattan distance = Σ_i |a[i] - b[i]|

where the sum runs over all elements i of the vectors a and b. (Here Σ denotes summation, not the matrix of singular values.)
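The same made-up vectors can illustrate the Manhattan distance (the helper name `manhattan_distance` and the sample values are assumptions for illustration).

```python
import numpy as np

def manhattan_distance(a, b):
    """Sum of absolute element-wise differences (L1 distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

a = np.array([5.0, 3.0, 1.0])
b = np.array([2.0, 3.0, 5.0])

# Absolute differences are 3, 0, 4, so the distance is 3 + 0 + 4 = 7
m = manhattan_distance(a, b)
print(m)                              # 7.0

# For 1-D arrays, np.linalg.norm with ord=1 computes the same sum
m_norm = np.linalg.norm(a - b, ord=1)
```

This equivalence holds only for 1-D inputs such as the singular value vectors. For a 2-D array, `np.linalg.norm(..., ord=1)` returns the maximum absolute column sum instead.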

In [22]:
# Compute the Manhattan distance between the singular value vectors
M_dist_12 = np.linalg.norm(s1 - s2, ord=1)
M_dist_23 = np.linalg.norm(s2 - s3, ord=1)

print(f"Manhattan distance between song 1 and 2: {M_dist_12}")
print(f"Manhattan distance between song 2 and 3: {M_dist_23}")

Manhattan distance between song 1 and 2: 64502.2265625
Manhattan distance between song 2 and 3: 0.0
