## Using Computational Methods to Simulate Spotify's Algorithm to find music characteristics such as tempo, danceability, key, loudness, instrumentalness, and energy

In this code we use librosa, a Python library for audio and music analysis. It provides tools for loading audio files, extracting features, performing transformations, and visualizing audio data, and it is widely used in the field of music information retrieval (MIR).

import librosa
import numpy as np
import json

# Function to calculate danceability
def calculate_danceability(y, sr):
    # Calculate tempo
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    
    # Calculate beat strength
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    beat_strength = onset_env.mean()
    
    # Calculate spectral features
    spectral_contrast = librosa.feature.spectral_contrast(y=y, sr=sr).mean()
    spectral_flatness = librosa.feature.spectral_flatness(y=y).mean()
    
    # Normalize features to a scale of 0.0 to 1.0
    tempo_norm = min(tempo / 200, 1.0)  # Assuming 200 BPM as a high tempo
    beat_strength_norm = min(beat_strength / 1.0, 1.0)  # Assuming beat strength ranges from 0 to 1
    spectral_contrast_norm = min(spectral_contrast / 50, 1.0)  # Assuming 50 as a high spectral contrast
    spectral_flatness_norm = min(spectral_flatness / 1.0, 1.0)  # Assuming spectral flatness ranges from 0 to 1
    
    # Combine features to estimate danceability
    danceability = (tempo_norm * 0.3) + (beat_strength_norm * 0.4) + (spectral_contrast_norm * 0.2) - (spectral_flatness_norm * 0.1)
    danceability = max(0.0, min(danceability, 1.0))  # Ensure the value is between 0.0 and 1.0
    return danceability

# Function to calculate instrumentalness
def calculate_instrumentalness(y, sr):
    # Harmonic-Percussive Source Separation (HPSS)
    y_harmonic, y_percussive = librosa.effects.hpss(y)
    
    # Calculate the share of the total energy that is harmonic
    harmonic_energy = np.sum(y_harmonic**2)
    percussive_energy = np.sum(y_percussive**2)
    total_energy = harmonic_energy + percussive_energy
    if total_energy == 0:  # Silent input: avoid dividing by zero
        return 0.0
    instrumentalness = harmonic_energy / total_energy
    
    # Normalize to a scale of 0.0 to 1.0
    instrumentalness = max(0.0, min(instrumentalness, 1.0))
    return instrumentalness

# Function to calculate energy
def calculate_energy(y):
    # Calculate RMS energy
    rms = librosa.feature.rms(y=y).mean()
    
    # Normalize to a scale of 0.0 to 1.0
    energy = min(rms / 1.0, 1.0)  # Assuming RMS energy ranges from 0 to 1
    return energy

# Function to extract audio features using librosa
def extract_audio_features(file_path):
    y, sr = librosa.load(file_path, sr=None)

    # Extract features (only tempo ends up in the returned dict; the others are not used below)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    chroma_stft = float(librosa.feature.chroma_stft(y=y, sr=sr).mean())
    rmse = float(librosa.feature.rms(y=y).mean())
    spectral_contrast = float(librosa.feature.spectral_contrast(y=y, sr=sr).mean())
    spectral_bandwidth = float(librosa.feature.spectral_bandwidth(y=y, sr=sr).mean())
    spectral_flatness = float(librosa.feature.spectral_flatness(y=y).mean())
    tonnetz = float(librosa.feature.tonnetz(y=y, sr=sr).mean())
    zcr = float(librosa.feature.zero_crossing_rate(y).mean())
    mfcc = librosa.feature.mfcc(y=y, sr=sr).mean(axis=1).tolist()

    # Loudness: mean of the A-weighted power spectrogram, in dB
    S = np.abs(librosa.stft(y))
    freqs = librosa.fft_frequencies(sr=sr)
    loudness = float(librosa.perceptual_weighting(S**2, freqs).mean())

    # Key estimate: the pitch class with the highest mean chroma energy
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    chroma_mean = chroma.mean(axis=1)
    key_index = chroma_mean.argmax()
    key = librosa.hz_to_note(librosa.midi_to_hz(key_index + 12))  # Convert to a note name; MIDI 12 is C0, so the octave digit is always 0

    # Calculate danceability
    danceability = calculate_danceability(y, sr)

    # Calculate instrumentalness
    instrumentalness = calculate_instrumentalness(y, sr)

    # Calculate energy
    energy = calculate_energy(y)

    features = {
        'Tempo': tempo,
        'Loudness': loudness,
        'Key': key,
        'Danceability': danceability,
        'Instrumentalness': instrumentalness,
        'Energy': energy
    }

    # Convert any remaining NumPy arrays to lists and ensure all values are standard Python types
    # (the loop variable is called name so it does not shadow the key variable above)
    for name, value in features.items():
        if isinstance(value, np.ndarray):
            features[name] = value.tolist()
        elif isinstance(value, np.floating):
            features[name] = float(value)
        elif isinstance(value, np.integer):
            features[name] = int(value)

    return features

# Example usage
file_path = r'C:\Users\shoir\is310-coding-assignments\inclass-code\Zane Little - Always and Forever.mp3'  # Using raw string
# or
# file_path = 'C:\\Users\\shoir\\is310-coding-assignments\\inclass-code\\Zane Little - Always and Forever.mp3'  # Using escaped backslashes

audio_features = extract_audio_features(file_path)
print("Extracted Audio Features:")
print(json.dumps(audio_features, indent=4))

Extracted Audio Features:
{
    "Tempo": [
        114.84375
    ],
    "Loudness": -22.791458468304516,
    "Key": "G0",
    "Danceability": [
        0.6703728687516529
    ],
    "Instrumentalness": 0.8519929647445679,
    "Energy": 0.08817879110574722
}


### How did we find the Tempo?

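We used the librosa.beat.beat_track function, which estimates the tempo in beats per minute (BPM) and detects the positions of the beats in the audio signal. Tempo appears as a one-item list in the output above because the function returned it as a one-element array; Danceability is computed from the tempo, so it is shown as a list for the same reason.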
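
Because the tempo can arrive as a one-element array, it may help to convert it to a plain float before using it in later calculations. The following is a minimal sketch; tempo_as_float is a hypothetical helper introduced here for illustration and is not part of librosa.

```python
import numpy as np

def tempo_as_float(tempo):
    """Return the tempo as a plain Python float.

    Works whether tempo is a scalar or a one-element array.
    (Hypothetical helper, not a librosa function.)
    """
    return float(np.atleast_1d(tempo)[0])

print(tempo_as_float(np.array([114.84375])))  # array form, as in the output above
print(tempo_as_float(120.0))                  # scalar form
```

Applying a conversion like this right after the beat_track call would make both Tempo and Danceability print as single numbers rather than lists.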

### How did we find danceability?

To estimate danceability we combined four features: tempo, beat strength (how prominent the beats are, measured as the mean onset strength), spectral contrast (the difference in amplitude between the peaks and valleys of the spectrum; we treat a higher contrast as a sign of more dynamic, rhythmic music), and spectral flatness (how noise-like the sound is; we treat a lower flatness as more danceable). The inputs to these functions are the audio time series (y) and the sampling rate (sr). We scaled each feature to the range 0 to 1, then combined them with weights of 0.3 for tempo, 0.4 for beat strength, and 0.2 for spectral contrast, subtracted 0.1 times the spectral flatness, and clipped the result to get a score between 0 and 1 for how danceable the song is.
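
The weighting step can be tried on its own, without loading any audio. This sketch repeats the scaling and weights from calculate_danceability; the function name combine_danceability and the input values are made up for illustration.

```python
def combine_danceability(tempo, beat_strength, spectral_contrast, spectral_flatness):
    """Combine raw feature values into a 0-1 danceability score."""
    # Scale each feature to 0-1. The ceilings (200 BPM, contrast of 50)
    # are the same assumptions the notebook code makes.
    tempo_norm = min(tempo / 200, 1.0)
    beat_norm = min(beat_strength, 1.0)
    contrast_norm = min(spectral_contrast / 50, 1.0)
    flatness_norm = min(spectral_flatness, 1.0)

    # Weighted sum: flatness counts against danceability
    score = 0.3 * tempo_norm + 0.4 * beat_norm + 0.2 * contrast_norm - 0.1 * flatness_norm
    return max(0.0, min(score, 1.0))  # Clip to the range 0-1

# Hypothetical feature values, for illustration only
print(combine_danceability(tempo=120, beat_strength=1.5,
                           spectral_contrast=20, spectral_flatness=0.05))
```

With these inputs the beat strength saturates at 1.0, so it contributes its full 0.4 on its own. Because beat strength is not naturally limited to 1, tracks with strong onsets may all receive the same contribution from that term.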

### How did we find Loudness?

To calculate the loudness we used the perceptual_weighting function, which reflects how humans perceive loudness. We first computed the short-time Fourier transform (STFT), which converts the audio signal from the time domain to the frequency domain and gives a spectrogram representing the amplitude of each frequency over time. We then squared the STFT magnitudes to get the power spectrogram. The perceptual weighting function converts this to decibels and adjusts each frequency band according to the sensitivity of human hearing (A-weighting by default). Finally, we took the mean with .mean() to get a single loudness value.
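
To make the weighting step more concrete, here is a rough NumPy-only sketch of A-weighting applied to a power spectrogram. It uses the standard analog A-weighting formula with rounded constants and is an approximation written for this explanation; librosa's own implementation may differ in constants and clipping. The function names and the toy spectrogram are assumptions.

```python
import numpy as np

def a_weighting_db(freqs, min_db=-80.0):
    """Approximate A-weighting curve in dB for frequencies given in Hz."""
    f2 = np.asarray(freqs, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # Floor the ratio so that 0 Hz does not produce log10(0)
    weights = 20.0 * np.log10(np.maximum(num / den, 1e-20)) + 2.0
    return np.maximum(weights, min_db)

def mean_weighted_loudness(power_spec, freqs):
    """Mean of the A-weighted power spectrogram in dB (rows are frequency bins)."""
    power_db = 10.0 * np.log10(np.maximum(power_spec, 1e-10))
    return float((power_db + a_weighting_db(freqs)[:, None]).mean())

# Toy power spectrogram: 3 frequency bins x 2 frames, equal power everywhere
freqs = np.array([100.0, 1000.0, 5000.0])
power = np.full((3, 2), 0.01)
print(a_weighting_db(freqs))             # 100 Hz is attenuated, 1000 Hz is near 0 dB
print(mean_weighted_loudness(power, freqs))
```

The weighting is close to 0 dB around 1 kHz and strongly negative at low frequencies, which is why bass-heavy energy counts for less in the final mean.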

### How did we find the key of the song?

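To get the key, we used the chroma feature, which shows the intensity of each of the 12 pitch classes. By averaging over the whole track we determined the pitch class with the highest average energy, which is most likely the main note of the key. The octave number in the reported value (the 0 in G0) only comes from converting the index to a note name and carries no meaning, and this approach does not distinguish major from minor keys.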
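
The averaging and argmax step can be illustrated with a toy chroma matrix. This sketch assumes the chroma rows are in librosa's default order, starting at C; the name strongest_pitch_class is hypothetical, and it returns a plain pitch class with no octave number.

```python
import numpy as np

# Assumed row order of the chroma matrix (librosa's default starts at C)
PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def strongest_pitch_class(chroma):
    """Return the pitch class with the highest mean energy.

    chroma is expected to have shape (12, n_frames).
    """
    chroma = np.asarray(chroma, dtype=float)
    return PITCH_CLASSES[int(chroma.mean(axis=1).argmax())]

# Toy chroma matrix: row 7 (G) carries the most energy
toy = np.full((12, 4), 0.1)
toy[7] = 0.9
print(strongest_pitch_class(toy))  # G
```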

### How did we get the Instrumentalness?

We found the instrumentalness in three steps. First, we used Harmonic-Percussive Source Separation (HPSS) to separate the signal into its harmonic and percussive components. Second, we calculated the energy of each component by summing its squared amplitudes. Third, we calculated the ratio of harmonic energy to total energy, which gives the instrumentalness.
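
Steps two and three can be sketched with two small arrays standing in for the separated components. The function name harmonic_share and the sample values are made up for illustration; the guard for silent input is an addition that avoids dividing by zero.

```python
import numpy as np

def harmonic_share(y_harmonic, y_percussive):
    """Fraction of the total energy that sits in the harmonic component."""
    harmonic_energy = float(np.sum(np.square(y_harmonic)))
    percussive_energy = float(np.sum(np.square(y_percussive)))
    total = harmonic_energy + percussive_energy
    if total == 0.0:  # Silent input: no energy to compare
        return 0.0
    return harmonic_energy / total

# Stand-ins for the two HPSS outputs
harm = np.array([0.3, -0.3, 0.3, -0.3])  # energy 0.36
perc = np.array([0.1, -0.1, 0.1, -0.1])  # energy 0.04
print(harmonic_share(harm, perc))        # about 0.9
```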

### How did we get Energy?

We used the librosa.feature.rms function, which calculates the root-mean-square (RMS) energy of an audio signal frame by frame. RMS is typically used to represent the perceived energy of the signal. We averaged it over all frames and capped the result at 1.
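
For intuition, frame-wise RMS can be written out in a few lines of NumPy. This is a simplified sketch: it does not pad the signal, so its framing differs slightly from librosa's, and the name mean_frame_rms and the default frame sizes are assumptions.

```python
import numpy as np

def mean_frame_rms(y, frame_length=2048, hop_length=512):
    """Mean RMS over sliding frames (no padding)."""
    y = np.asarray(y, dtype=float)
    starts = range(0, max(len(y) - frame_length, 0) + 1, hop_length)
    rms = [np.sqrt(np.mean(y[s:s + frame_length] ** 2)) for s in starts]
    return float(np.mean(rms))

# One second of a 440 Hz sine wave with amplitude 0.5
sr = 22050
t = np.arange(sr) / sr
sine = 0.5 * np.sin(2 * np.pi * 440 * t)
print(mean_frame_rms(sine))  # close to 0.5 / sqrt(2), about 0.354
```

A full-scale sine wave has an RMS of about 0.707, so real music rarely gets near 1. That helps explain why the Energy value in the output above is small.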