### **The collected data is an imbalanced dataset from GTZAN**
***This dataset has 10 genres ('Jazz', 'Pop', 'Classical', 'Hip-hop', and others)***

***Each genre folder contains 100 .wav files, each holding 30 seconds of audio***

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from glob import glob
import librosa
import librosa.display
import IPython.display as ipd

from itertools import cycle

# Seaborn visualization setup
sns.set_theme(style="white", palette=None)
color_pal = plt.rcParams["axes.prop_cycle"].by_key()["color"]
color_cycle = cycle(plt.rcParams["axes.prop_cycle"].by_key()["color"])

In [None]:
# Path to the nested folder structure (sorted so that index-based selection is reproducible)
audio_files = sorted(glob('genres_original/*/*.wav'))

## **Oversample Minority Classes**

***Note: you can skip this part; I used the GTZAN data as-is.***

***Oversampling the minority classes balances the dataset and prevents bias towards the majority classes during model training.***

***Balanced datasets help models learn more effectively from minority classes, which leads to better generalization.***


In [None]:
# # Oversample files
# oversampled_files = []
# for genre, count in genre_file_count.items():
#     genre_path = os.path.join(base_path, genre)
#     genre_files = glob(os.path.join(genre_path, '*.wav'))
#     if count < max_count:
#         oversampled_files.extend(genre_files)
#         # Duplicate files to match the max_count
#         oversampled_files.extend(resample(genre_files, replace=True, n_samples=max_count - count))
#     else:
#         oversampled_files.extend(genre_files)

# print(f"Total files after oversampling: {len(oversampled_files)}")
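The commented-out cell above relies on names that are not defined in this notebook (`genre_file_count`, `base_path`, `max_count`, `resample`). As a minimal, self-contained sketch of the same idea, the snippet below duplicates randomly chosen files in each smaller genre until every genre matches the largest one. It uses Python's `random.choices` in place of `sklearn.utils.resample`, and the helper name `oversample_by_genre` and the toy file names are hypothetical.

```python
import random


def oversample_by_genre(files_by_genre, seed=0):
    """Return a flat file list in which every genre is as large as the largest one.

    Assumes every genre has at least one file to sample from.
    """
    rng = random.Random(seed)  # seeded so the duplicates are reproducible
    max_count = max(len(files) for files in files_by_genre.values())
    oversampled = []
    for genre, files in files_by_genre.items():
        oversampled.extend(files)
        deficit = max_count - len(files)
        if deficit > 0:
            # Sample with replacement to fill the gap up to max_count
            oversampled.extend(rng.choices(files, k=deficit))
    return oversampled


# Hypothetical toy input: genre name -> list of file paths
files_by_genre = {
    "jazz": ["jazz.00000.wav", "jazz.00001.wav", "jazz.00002.wav"],
    "pop": ["pop.00000.wav"],
}
balanced = oversample_by_genre(files_by_genre)
print(f"Total files after oversampling: {len(balanced)}")
```

Duplicating files only rebalances the class counts; it adds no new information, which is one reason to pair it with augmentation.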

## **Loading an example audio file**

In [None]:
y, sr = librosa.load(audio_files[855])
print(f'y: {y[:10]}')
print(f'shape y: {y.shape}')
print(f'Type of y: {type(y)}')
print(f'sr: {sr}')

## ***Apply data augmentation***

In [None]:
def augment_audio(y, sr):
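    # Pitch shifting (sr and n_steps must be passed by keyword in librosa >= 0.10)
    y_pitch_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)
    # Time stretching (rate > 1 speeds the audio up, so the output is shorter)
    y_time_stretched = librosa.effects.time_stretch(y, rate=1.5)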
    # Adding noise
    noise = np.random.randn(len(y))
    y_noisy = y + 0.005 * noise
    return [y_pitch_shifted, y_time_stretched, y_noisy]

# Plot the first audio file
# y, sr = librosa.load(audio_files[101])
# print(f'y: {y[:10]}')
# print(f'shape y: {y.shape}')
# print(f'sr: {sr}')

## **Plotting Raw audio** 

In [None]:
pd.Series(y).plot(figsize=(10, 5),lw=1, title='Raw Audio Example', color=color_pal[0])
plt.show()

## **Trim leading/trailing silence**
***Here we remove the silence at the start and end of the audio (anything more than 20 dB below the peak), which can otherwise cause trouble***

In [None]:
y_trimmed, _ = librosa.effects.trim(y, top_db=20)
pd.Series(y_trimmed).plot(figsize=(10, 5), lw=1, title='Raw Audio Trimmed', color=color_pal[1])
plt.show()

## **Zoomed-in view of raw audio**

In [None]:

pd.Series(y[30000:35000]).plot(figsize=(10, 5), lw=1, title='Raw Audio Zoomed In', color=color_pal[2])
plt.show()

## **Spectrogram**
***Transforming the time-domain signal (the raw audio waveform) into the frequency domain helps us understand the different frequency components present in the audio.***


The Short-Time Fourier Transform (STFT) is used here to analyze the signal in short, overlapping time segments.

In [None]:
# Extraction
D = librosa.stft(y)
D_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
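To make the "short overlapping segments" concrete, the sketch below estimates the shape of `D` without running the transform. It assumes librosa's documented defaults (`n_fft=2048`, `hop_length=512`, `center=True`) and a full 30-second clip loaded at the default 22050 Hz; the helper `stft_shape` is hypothetical and not part of librosa.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=512):
    """Expected (frequency bins, frames) for a centered STFT of a real signal."""
    n_bins = 1 + n_fft // 2                  # non-negative frequencies only
    n_frames = 1 + n_samples // hop_length   # one frame every hop_length samples
    return n_bins, n_frames


sr_default = 22050            # librosa.load resamples to this rate by default
n_samples = 30 * sr_default   # an exactly 30-second clip (assumption)
print(stft_shape(n_samples))
```

Each window covers 2048 samples and advances by 512, so consecutive windows overlap by 75%. Real GTZAN clips are not all exactly 30 seconds long, so the frame count reported by `D.shape` may differ slightly.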

In [None]:
# Visualisation
fig, ax = plt.subplots(figsize=(10, 5))
img = librosa.display.specshow(D_db, x_axis='time', y_axis='log', ax=ax)
ax.set_title('Spectrogram Waveform', fontsize=20)
fig.colorbar(img, ax=ax, format='%0.2f')
plt.show()

## **Mel Spectrogram and MFCC**
Feature extraction

In [None]:
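# Extraction
# melspectrogram returns a power spectrogram by default (power=2.0)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
# amplitude_to_db assumes amplitude input, so on a power spectrogram it doubles the dB values; kept for comparison
S_db_mel = librosa.amplitude_to_db(S, ref=np.max)
# power_to_db is the conversion that matches a power spectrogram
S_power_deb_mel = librosa.power_to_db(S, ref=np.max)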
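The two conversions above differ only in the factor applied to the logarithm. The NumPy sketch below illustrates this on a toy power array: treating power values as amplitudes doubles every dB value. The helpers `power_db` and `amplitude_db` are simplified, hypothetical stand-ins for the librosa functions; they skip the clipping that librosa applies (`amin`, `top_db`).

```python
import numpy as np


def power_db(S):
    """dB relative to the maximum, for power values: 10 * log10(S / max)."""
    return 10.0 * np.log10(S / S.max())


def amplitude_db(A):
    """dB relative to the maximum, for amplitude values: 20 * log10(A / max)."""
    return 20.0 * np.log10(A / A.max())


# Toy "power spectrogram" (made-up values)
S_toy = np.array([[1.0, 0.1], [0.01, 0.001]])
correct = power_db(S_toy)      # appropriate for power input
doubled = amplitude_db(S_toy)  # same input misread as amplitudes
print(correct)
print(doubled)
```

Since `melspectrogram` returns power by default, `S_power_deb_mel` is the version on the conventional dB scale.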

In [None]:
# Visualisation
fig, ax = plt.subplots(figsize=(10, 5))
# Plotting the mel spectrogram (amplitude_to_db version) on a mel frequency axis
img = librosa.display.specshow(S_db_mel, sr=sr, x_axis='time', y_axis='mel', ax=ax)
ax.set_title('Mel Spectrogram (amplitude_to_db)', fontsize=20)
fig.colorbar(img, ax=ax, format='%0.2f')
plt.show()

In [None]:
# Visualisation
fig, ax = plt.subplots(figsize=(10, 5))
# Plotting the mel spectrogram (power_to_db version) on a mel frequency axis
img = librosa.display.specshow(S_power_deb_mel, sr=sr, x_axis='time', y_axis='mel', ax=ax)
# img = librosa.display.specshow(S_power_deb_mel)
ax.set_title('Mel Spectrogram (power_to_db)', fontsize=20)
fig.colorbar(img, ax=ax, format='%0.2f')
plt.show()

In [None]:
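# When S is given, mfcc expects a log-power mel spectrogram, so pass the dB-scaled version
mfcc = librosa.feature.mfcc(S=S_power_deb_mel)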
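As a rough illustration of what this step computes: MFCCs are commonly obtained by applying a discrete cosine transform (DCT) along the mel axis of a log-power mel spectrogram and keeping the first few coefficients. The NumPy sketch below builds an orthonormal type-II DCT matrix by hand, which to my understanding matches librosa's default settings (`dct_type=2`, `norm='ortho'`, `n_mfcc=20`). The helper names and the random toy input are hypothetical.

```python
import numpy as np


def dct2_ortho_matrix(n_in, n_out):
    """First n_out rows of the orthonormal type-II DCT matrix for length-n_in input."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # extra scaling that makes the k = 0 row unit length
    return basis


def mfcc_from_log_mel(log_mel, n_mfcc=20):
    """Apply the DCT along the mel axis (axis 0) and keep n_mfcc coefficients."""
    return dct2_ortho_matrix(log_mel.shape[0], n_mfcc) @ log_mel


# Toy log-mel input: 128 mel bands x 5 frames of random values
rng = np.random.default_rng(0)
log_mel_toy = rng.normal(size=(128, 5))
mfcc_toy = mfcc_from_log_mel(log_mel_toy)
print(mfcc_toy.shape)
```

The first coefficient is proportional to the sum of the log-mel values in each frame, so it mostly tracks overall loudness.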

In [None]:
# plt.figure(figsize=(20,10))
sns.heatmap(mfcc)

The arrays computed above are `D`, `D_db`, `S`, `S_db_mel`, `S_power_deb_mel`, and `mfcc`; the cells below print their shapes.

In [None]:
D.shape

In [None]:
D_db.shape

In [None]:
S.shape

In [None]:
S_db_mel.shape

In [None]:
S_power_deb_mel.shape

In [None]:
mfcc.shape