# Chorus Prediction Model Feature Description

## Temporal Features

### Tempo:
- **Average Tempo**: Mean tempo across chorus sections.
- **Tempo Variability**: Standard deviation of tempo across the track.

### Onset Envelope:
- **Mean Onset Strength**: Average amplitude of onset envelope within chorus sections.
- **Onset Rate**: Number of onsets per second (mean, std).

### Beat Interval:
- **Mean Beat Interval**: Average time between beats in chorus sections.
- **Beat Interval Variability**: Standard deviation of time between beats.

## Rhythmic Features
- **Rhythmic Entropy**: Shannon entropy calculated from the beat histogram in chorus sections.
- **Beat Strength**: Average strength of detected beats.
- **Beat Density**: Number of beats per unit time in chorus sections.
- **Rhythmic Complexity**: Measures of syncopation or beat irregularity.
- **Beat Periodicity/Fluctuation**: Variability in the periodicity of beats.

## Spectral Features

### Spectral Contrast:
- **Mean Contrast**: Average spectral contrast over chorus sections.
- **Contrast Variability**: Standard deviation of spectral contrast.

### Spectral Centroids:
- **Mean Centroid**: Average spectral centroid in chorus sections.
- **Centroid Variability**: Standard deviation of spectral centroids.

### Spectral Flatness:
- **Mean Flatness**: Average spectral flatness in chorus sections.
- **Flatness Variability**: Standard deviation of spectral flatness.

### Spectral Flux:
- **Mean Flux**: Average spectral flux in chorus sections.
- **Flux Variability**: Standard deviation of spectral flux.

### Spectral Rolloff:
- **Mean Rolloff**: Average frequency below which a certain percentage of the total spectral energy is contained within chorus sections.
- **Rolloff Variability**: Standard deviation of spectral rolloff.

### Spectral Bandwidth:
- **Mean Bandwidth**: Average spectral bandwidth in chorus sections.
- **Bandwidth Variability**: Standard deviation of spectral bandwidth.

## Harmonic Features

### Tonal Centroid (tonnetz):
- **Mean Position**: Average position in tonal centroid space during chorus.
- **Position Variability**: Standard deviation within tonnetz features.

### Chroma:
- **Mean Chroma Energy**: Average energy of each chroma feature in chorus sections.
- **Chroma Variance**: Variance of chroma features across chorus sections.

### Key Clarity:
- **Average Key Clarity**: Average clarity of the perceived key in chorus sections.

### Cadence Detection:
- **Cadence Frequency**: Frequency of cadential progressions in chorus sections.

### Harmonic Change Detection:
- **Change Frequency**: Number of harmonic changes per unit time in chorus sections.

### Harmony to Melody Ratio:
- **Average Ratio**: Ratio of harmonic to melodic content in chorus sections.

### Harmonic-to-Noise Ratio (HNR):
- **Mean HNR**: Average harmonic-to-noise ratio in chorus sections.

## Loudness and Energy Features

### Loudness (RMS Energy):
- **Mean Loudness**: Average RMS energy in chorus sections.
- **Loudness Variability**: Standard deviation of RMS energy.

## Timbral Features

### Mel-frequency Cepstral Coefficients (MFCCs):
- **Mean MFCCs**: Average value of each MFCC across chorus sections.
- **MFCC Variability**: Standard deviation of each MFCC across the track.

### Spectral Harmonicity:
- **Mean Harmonicity**: Average spectral harmonicity within chorus sections.

### Zero-Crossing Rate:
- **Mean ZCR**: Average zero-crossing rate in chorus sections.
- **ZCR Variability**: Standard deviation of zero-crossing rate.

## High-Level Descriptors

### Dynamic Range:
- **Dynamic Range**: Difference between the loudest and quietest parts of chorus sections.

### Temporal Evolution of Timbral Features:
- **Change Over Time**: Measurement of how MFCCs and other timbral features evolve over the course of the chorus.

### Self-Similarity Matrix:
- **Degree of Self-Similarity**: Quantification of the repetitiveness within chorus sections.

### Repetition Detection:
- **Repetition Count**: Number of repeated motifs or fragments within chorus sections.

## Beat-Synchronous Features

### Beat Synchronous Chroma:
- **Mean BSC**: Aggregate chroma features synchronized to the beat intervals within chorus sections.

### Beat Synchronous MFCCs:
- **Mean BSMFCCs**: Aggregate MFCCs synchronized to the beat intervals within chorus sections.

## Summary Statistics for Each Feature
For each of the features, calculate the following summary statistics to capture the essence of the data:

- **Mean**: The average value indicating central tendency.
- **Standard Deviation (std)**: A measure of the spread ofthe values.
- **Minimum (min) and Maximum (max)**: The lowest and highest values, showing the range.
- **Median**: The middle value, less affected by outliers.
- **Quantiles**: Values such as the 25th and 75th percentiles that help understand data distribution.
- **Skewness**: Indicates asymmetry of the distribution.
- **Kurtosis**: A measure of the "tailedness" of the distribution.

## Granularity of Features

### Song-Normalized Segment Features:
These features are normalized based on the entire song to contextualize the segment (verse, chorus, bridge) within the song's overall structure.

### Segment-Level Features:
Describe the properties of individual segments, such as a chorus, providing insights into the intrinsic characteristics that may define its role in the song.

### Song-Level Features:
Characterize the overall properties of the song, providing a backdrop for understanding segment-level features and the song's structure as a whole.

## Import Libraries

In [85]:
# Required imports
import os
import numpy as np
import pandas as pd
import librosa
from tqdm import tqdm
from pydub import AudioSegment
from scipy.stats import skew, kurtosis
from scipy.signal import welch

## Define Paths

In [59]:
# Define path to directories
audio_dir = '../data/audio_files/processed'                 # Directory containing processed the audio files for analysis
audio_export_dir = '../data/audio_files/segmented/training' # Directory containing exported SALAMI-annotated audio segments 
data_dir = '../data/dataframes'                             # Directory containing dataframes and other reference data

## Import Cleaned Dataframe with Chorus & Metadata

In [60]:
df = pd.read_csv(os.path.join(data_dir,'chorus_and_metadata.csv'))
df.head()

Unnamed: 0,song_id,annotator,chorus_start,chorus_end,chorus_duration,SONG_TITLE,ARTIST,CLASS,GENRE,Genre_itunes,Album,file_path
0,1003,2,89.722358,118.502698,28.78034,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3
1,1003,1,107.144059,118.560272,11.416213,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3
2,1003,1,153.137029,176.392086,23.255057,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3
3,1003,2,210.783991,239.428345,28.644354,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3
4,1004,1,74.257415,112.743107,38.485692,Fearless,Big_Whiskey,Live_Music_Archive,,,,../data/audio_files/processed/1004.mp3


## Define Functions for Audio Analysis and Feature Extraction

In [89]:
def load_audio(file_path, sr=None):
    """
    Loads an audio file.

    Parameters:
    - file_path: The path to the audio file.
    - sr: The sampling rate to use. Default None uses the original sampling rate.

    Returns:
    A tuple with the audio signal and the sampling rate, or (None, None) if loading fails.
    """
    try:
        audio_signal, sr = librosa.load(file_path, sr=sr)
        return audio_signal, sr
    except Exception as e:
        print(f"Error loading {file_path}: {e}")
        return None, None

# Function to compute rhythm features for an audio segment
def extract_rhythm_features(y, sr, start_time, end_time):
    # Convert the start and end times to sample indices
    start_sample = librosa.time_to_samples(start_time, sr=sr)
    end_sample = librosa.time_to_samples(end_time, sr=sr)
    # Extract the specific segment
    segment = y[start_sample:end_sample]

    # Compute the tempogram
    tempogram = librosa.feature.tempogram(y=segment, sr=sr)
    # Calculate tempo-related features from tempogram
    tempogram_mean = np.mean(tempogram)
    tempogram_variance = np.var(tempogram)
    tempogram_median = np.median(tempogram)
    tempogram_skewness = skew(tempogram.flatten())
    tempogram_kurtosis = kurtosis(tempogram.flatten())

    # Compute the Fourier tempogram
    fourier_tempogram = np.abs(np.fft.rfft(tempogram, axis=0))
    fourier_tempogram_mean = np.mean(fourier_tempogram)
    # Calculate dominant rhythmic frequency (assuming the tempo is in BPM)
    d_freq = np.fft.rfftfreq(tempogram.shape[0], d=1./sr)
    dominant_rhythmic_freq = d_freq[np.argmax(fourier_tempogram, axis=0)]
    rhythmic_freq_variance = np.var(fourier_tempogram)
    # Rhythmic entropy calculation (normalize the tempogram before computing entropy)
    normalized_tempogram = fourier_tempogram / fourier_tempogram.sum(axis=0, keepdims=True)
    rhythmic_entropy = -np.sum(normalized_tempogram * np.log(normalized_tempogram + 1e-15), axis=0)

    # Calculate tempogram ratio features
    # Here we use the ratio of the Fourier tempogram mean to the tempogram mean as a simple example
    ratio_mean = fourier_tempogram_mean / tempogram_mean
    ratio_variance = np.var(ratio_mean)
    ratio_max_peak = np.max(ratio_mean)

    features = {
        'tempogram_mean': tempogram_mean,
        'tempogram_variance': tempogram_variance,
        'tempogram_median': tempogram_median,
        'tempogram_skewness': tempogram_skewness,
        'tempogram_kurtosis': tempogram_kurtosis,
        'dominant_rhythmic_freq': dominant_rhythmic_freq,
        'rhythmic_freq_variance': rhythmic_freq_variance,
        'rhythmic_entropy': rhythmic_entropy,
        'ratio_mean': ratio_mean,
        'ratio_variance': ratio_variance,
        'ratio_max_peak': ratio_max_peak
    }
    
    return features

{'tempogram_mean': 0.19968825424744466, 'tempogram_variance': 0.05160212836427607, 'tempogram_median': 0.09269396844228582, 'tempogram_skewness': 0.9423384783119517, 'tempogram_kurtosis': -0.20994183992537074, 'dominant_rhythmic_freq': array([0., 0., 0., ..., 0., 0., 0.]), 'rhythmic_freq_variance': 46.74188340471569, 'rhythmic_entropy': array([4.41060872, 4.41034909, 4.41007256, ..., 3.79281283, 3.79161214,
       3.7904425 ]), 'ratio_mean': 10.048110761796623, 'ratio_variance': 0.0, 'ratio_max_peak': 10.048110761796623}


In [112]:
# Process the sample song
sample_song = df.iloc[0]
y, sr = librosa.load(sample_song['file_path'], sr=None)

start_time = pd.to_numeric(sample_song['chorus_start'], errors='coerce')
end_time = pd.to_numeric(sample_song['chorus_end'], errors='coerce')
start_sample = librosa.time_to_samples(start_time, sr=sr)
end_sample = librosa.time_to_samples(end_time, sr=sr)
segment = y[start_sample:end_sample]
tempogram = librosa.feature.tempogram(y=segment, sr=sr)
tempogram_mean = np.mean(tempogram)
tempogram_variance = np.var(tempogram)
tempogram_median = np.median(tempogram)
tempogram_skewness = skew(tempogram.flatten())
tempogram_kurtosis = kurtosis(tempogram.flatten())
print(tempogram_mean, tempogram_variance, tempogram_median, tempogram_skewness, tempogram_kurtosis)

0.19968825424744466 0.05160212836427607 0.09269396844228582 0.9423384783119517 -0.20994183992537074


In [113]:
# Compute the tempogram
tempogram = librosa.feature.tempogram(y=y, sr=sr)

# Check for negative values, which should not be present
assert not (tempogram < 0).any(), "Tempogram should not contain negative values."

# Plot the tempogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(tempogram, sr=sr, x_axis='time', y_axis='tempo')
plt.colorbar()
plt.title('Tempogram')
plt.tight_layout()
plt.show()

AssertionError: Tempogram should not contain negative values.

In [96]:
features = extract_rhythm_features(y, sr, sample_song['chorus_start'], sample_song['chorus_end'])
print(features)

NameError: name 'tempogram' is not defined

In [95]:
# Assuming features['dominant_rhythmic_freq'] is your array
d_r_freq_series = pd.Series(features['dominant_rhythmic_freq'])

# Now you can use the describe() method
summary_stats = d_r_freq_series.describe()
print(summary_stats)

count    2479.0
mean        0.0
std         0.0
min         0.0
25%         0.0
50%         0.0
75%         0.0
max         0.0
dtype: float64


In [82]:
help(librosa.beat.tempo())

	This function was moved to 'librosa.feature.rhythm.tempo' in librosa version 0.10.0.
	This alias will be removed in librosa version 1.0.
  help(librosa.beat.tempo())


ParameterError: Either y or onset_envelope must be provided

In [77]:
audio_signal, sr = librosa.load(first_song['file_path'], sr=sr)
audio_signal

array([-0.00151792, -0.00187289, -0.00149319, ..., -0.00375323,
       -0.00490353, -0.00278475], dtype=float32)

In [68]:
extract_tempo_features(temp, df)

Error loading ../data/audio_files/processed/1003.mp3\1003.mp3: [Errno 2] No such file or directory: '../data/audio_files/processed/1003.mp3\\1003.mp3'
Failed to load audio from ../data/audio_files/processed/1003.mp3\1003.mp3
Error loading ../data/audio_files/processed/1003.mp3\1003.mp3: [Errno 2] No such file or directory: '../data/audio_files/processed/1003.mp3\\1003.mp3'
Failed to load audio from ../data/audio_files/processed/1003.mp3\1003.mp3
Error loading ../data/audio_files/processed/1003.mp3\1003.mp3: [Errno 2] No such file or directory: '../data/audio_files/processed/1003.mp3\\1003.mp3'
Failed to load audio from ../data/audio_files/processed/1003.mp3\1003.mp3
Error loading ../data/audio_files/processed/1003.mp3\1003.mp3: [Errno 2] No such file or directory: '../data/audio_files/processed/1003.mp3\\1003.mp3'
Failed to load audio from ../data/audio_files/processed/1003.mp3\1003.mp3
Error loading ../data/audio_files/processed/1003.mp3\1004.mp3: [Errno 2] No such file or directory: 

  audio_signal, sr = librosa.load(file_path, sr=sr)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)


Unnamed: 0,song_id,annotator,chorus_start,chorus_end,chorus_duration,SONG_TITLE,ARTIST,CLASS,GENRE,Genre_itunes,Album,file_path,avg_tempo_chorus,avg_tempo_song,std_dev_tempo_chorus,std_dev_tempo_song
0,1003,2,89.722358,118.502698,28.780340,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3,,,,
1,1003,1,107.144059,118.560272,11.416213,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3,,,,
2,1003,1,153.137029,176.392086,23.255057,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3,,,,
3,1003,2,210.783991,239.428345,28.644354,Im_Moving_On,Big_Water,Live_Music_Archive,,,,../data/audio_files/processed/1003.mp3,,,,
4,1004,1,74.257415,112.743107,38.485692,Fearless,Big_Whiskey,Live_Music_Archive,,,,../data/audio_files/processed/1004.mp3,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1209,996,1,70.024830,96.832653,26.807823,A_Lack_of_Color,Ben_Gibbard,Live_Music_Archive,,,,../data/audio_files/processed/996.mp3,,,,
1210,998,1,67.999637,83.384626,15.384989,They_Love_Each_Other,Big_Blue,Live_Music_Archive,,,,../data/audio_files/processed/998.mp3,,,,
1211,998,2,119.129546,134.513946,15.384399,They_Love_Each_Other,Big_Blue,Live_Music_Archive,,,,../data/audio_files/processed/998.mp3,,,,
1212,998,2,272.412063,287.641882,15.229819,They_Love_Each_Other,Big_Blue,Live_Music_Archive,,,,../data/audio_files/processed/998.mp3,,,,


## Extract Features for Each Chorus Segment and Entire Song

In [26]:
# Initialize empty DataFrame to hold song-level features
song_features_df = pd.DataFrame()

# Get the unique song ids
unique_song_ids = df['song_id'].unique()

# Initialize the progress bar
for song_id in tqdm(unique_song_ids, desc="Processing songs", unit="song"):
    song_data = df[df['song_id'] == song_id]
    song_row = song_data.iloc[0]
    audio_path = os.path.join(audio_dir, str(song_row['song_id']) + '.mp3')
    
    # Extract song-level features
    song_features = get_song_features(audio_path)
    song_features['song_id'] = song_id
    
    # Convert song_features to a DataFrame and concatenate
    song_features_df = pd.concat([song_features_df, pd.DataFrame([song_features])], ignore_index=True)

# Extract segment-level features and add them to the original DataFrame
df_segment_features = df.apply(lambda row: get_segment_features(row, audio_dir), axis=1)
df_with_segment_features = pd.concat([df, df_segment_features], axis=1)

# Join song-level features with the segment-level features DataFrame
df_final = df_with_segment_features.merge(song_features_df, on='song_id', suffixes=('_segment', '_song'))

Processing songs: 100%|██████████| 337/337 [08:56<00:00,  1.59s/song]


In [41]:
df_final.columns

Index([               'song_id',              'annotator',
                 'chorus_start',             'chorus_end',
              'chorus_duration',             'SONG_TITLE',
                       'ARTIST',                  'CLASS',
                        'GENRE',           'Genre_itunes',
                        'Album',                        0,
                   'mfccs_mean', 'spectral_centroid_mean'],
      dtype='object')

In [49]:
import librosa
import numpy as np
import pandas as pd

# Function to extract features
def extract_features(audio_path):
    y, sr = librosa.load(audio_path, duration=30)  # Load the first 30 seconds of the audio file
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    spectral_contrast = np.mean(librosa.feature.spectral_contrast(S=librosa.stft(y=y), sr=sr), axis=1)
    spectral_flatness = np.mean(librosa.feature.spectral_flatness(y=y))
    
    # Create a dictionary with feature names and values
    features = {
        'tempo': tempo,
        'spectral_contrast': spectral_contrast,
        'spectral_flatness': spectral_flatness
    }
    return features

# Replace this with the actual path to your audio file
audio_file_path = r'C:\Users\denni\OneDrive\Desktop\Springboard\MusicAnnotator\data\audio_files\processed\12.mp3'

# Extract features from the audio file
extracted_features = extract_features(audio_file_path)

# Convert the dictionary of features to a DataFrame
features_df = pd.DataFrame([extracted_features])

# Add column names for spectral contrast features
contrast_columns = [f'spectral_contrast_{i}' for i in range(len(extracted_features['spectral_contrast']))]
features_df[contrast_columns] = pd.DataFrame([extracted_features['spectral_contrast']], index=features_df.index)

# Now we can safely drop the 'spectral_contrast' column since its values have been split into separate columns
features_df = features_df.drop(columns='spectral_contrast')

print(features_df)

        tempo  spectral_flatness  spectral_contrast_0  spectral_contrast_1  \
0  135.999178           0.013946           105.697099           104.655369   

   spectral_contrast_2  spectral_contrast_3  spectral_contrast_4  \
0            99.503758            92.353778             89.64094   

   spectral_contrast_5  spectral_contrast_6  
0             87.78099            86.375312  


  valley[..., k, :] = np.mean(sortedr[..., :idx, :], axis=-2)
  peak[..., k, :] = np.mean(sortedr[..., -idx:, :], axis=-2)
