# Mel Spectrogram 3-Second Windows

We can extend our dataset by slicing each mel spectrogram into windows. Each spectrogram is cut into 10 windows of 3 seconds each, which grows the dataset from 1,000 samples to 10,000 and gives the model more examples to learn from.

This is a form of data augmentation, a common machine learning technique for enlarging the training set. Each window is treated as a new training sample, which can help the model generalize to unseen data.

## Slicing Mel Spectrograms into 3 Second Windows

In [14]:
import cv2
import os
import numpy as np
import shutil

In [None]:
BASE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset\images_original"
SAVE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset\images_3_sec_split"

First we need to remove the white border around each mel spectrogram so that only the plot area remains. If the border stayed in, the outermost slices would include blank margin. We use OpenCV (`cv2`) to threshold the image and crop it to the bounding box of the non-white pixels.

In [7]:
def crop_white_borders(img):
    # Convert to grayscale to find the boundaries
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Inverse threshold: pixels brighter than 254 (pure white) become 0,
    # everything else becomes 255, so the mask marks the non-white content
    _, thresh = cv2.threshold(gray, 254, 255, cv2.THRESH_BINARY_INV)

    # Find the coordinates of all non-zero (non-white) pixels
    coords = cv2.findNonZero(thresh)

    # An all-white image has no content to crop to, so return it unchanged
    if coords is None:
        return img

    # Get the bounding box of those coordinates
    x, y, w, h = cv2.boundingRect(coords)

    # Crop the original image to that bounding box
    return img[y:y+h, x:x+w]
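
To make the cropping idea concrete, here is a minimal NumPy-only sketch run on a synthetic image. The function name `crop_white_borders_np` and the `white_threshold` parameter are hypothetical and not part of the notebook. It tests each color channel directly instead of converting to grayscale, so it may not match the OpenCV version pixel for pixel on real spectrograms.

```python
import numpy as np

def crop_white_borders_np(img, white_threshold=254):
    """Hypothetical NumPy-only variant of the border crop (illustration only)."""
    # A pixel counts as content if any channel is at or below the threshold
    content = (img <= white_threshold).any(axis=2)

    # Indices of the rows and columns that contain at least one content pixel
    rows = np.flatnonzero(content.any(axis=1))
    cols = np.flatnonzero(content.any(axis=0))

    # An all-white image has nothing to crop to, so return it unchanged
    if rows.size == 0:
        return img

    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Synthetic test image: white canvas with a colored block in the middle
canvas = np.full((20, 30, 3), 255, dtype=np.uint8)
canvas[5:15, 4:24] = (10, 60, 200)

cropped = crop_white_borders_np(canvas)
print(cropped.shape)  # (10, 20, 3)
```

The cropped array keeps only the 10 rows and 20 columns covered by the colored block, which is the same bounding-box behavior the OpenCV function aims for.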

We are now ready to slice each cropped spectrogram into 3-second windows. Because the image is loaded as a NumPy array, slicing along the width axis gives 10 equal-width windows per spectrogram. Each window is resized to 128x128 and saved under its genre folder, which takes the dataset from 1,000 images to 10,000.

In [8]:
def slice_spectrograms(source_dir, output_dir):
    os.makedirs(output_dir, exist_ok=True)

    for root, dirs, files in os.walk(source_dir):
        for file in files:
            if file.endswith(".png"):
                img_path = os.path.join(root, file)
                img = cv2.imread(img_path)
                
                # Skip files that OpenCV could not read
                if img is None:
                    continue

                # Crop the image to remove the white border
                cropped_img = crop_white_borders(img)

                # Get dimensions
                h, w, _ = cropped_img.shape

                # Calculate width of one 3-second slice; integer division drops
                # up to 9 leftover pixel columns at the right edge
                slice_width = w // 10
                
                # Get genre name from the folder
                genre = os.path.basename(root)
                genre_out_path = os.path.join(output_dir, genre)
                os.makedirs(genre_out_path, exist_ok=True)

                for i in range(10):
                    start_x = i * slice_width
                    end_x = (i + 1) * slice_width
                    
                    # Slice the image [y_start:y_end, x_start:x_end]
                    slice_img = cropped_img[:, start_x:end_x]
                    
                    # Resize to your model's target (e.g., 128x128)
                    slice_img = cv2.resize(slice_img, (128, 128))
                    
                    # Construct name from the source file name,
                    # e.g. blues00000.png -> blues00000.0.png, blues00000.1.png
                    base_name = os.path.splitext(file)[0]
                    new_filename = f"{base_name}.{i}.png"
                    
                    cv2.imwrite(os.path.join(genre_out_path, new_filename), slice_img)
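
The column arithmetic inside the loop can be checked in isolation. The sketch below uses a hypothetical helper, `slice_columns`, on a synthetic array whose width of 1003 pixels is an arbitrary choice that does not divide evenly by 10. It shows that integer division yields equal-width slices and silently drops the leftover columns.

```python
import numpy as np

def slice_columns(img, n_slices=10):
    """Hypothetical helper: split an image into equal-width vertical slices."""
    slice_width = img.shape[1] // n_slices
    return [img[:, i * slice_width:(i + 1) * slice_width] for i in range(n_slices)]

# Synthetic stand-in for a cropped spectrogram (height 288, width 1003)
spec = np.zeros((288, 1003, 3), dtype=np.uint8)

slices = slice_columns(spec)
print(len(slices), slices[0].shape)  # 10 (288, 100, 3)

# Columns left over at the right edge after integer division
dropped = spec.shape[1] - sum(s.shape[1] for s in slices)
print(dropped)  # 3
```

Dropping a few columns is usually harmless at this resolution, but it does mean the last window ends slightly before the end of the clip.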


Let's call the function to create the slices and write them to a new folder for our augmented data.

In [11]:
slice_spectrograms(BASE_PATH,SAVE_PATH)

Next we build a folder holding train and test splits of the sliced data, which the other notebooks in this project use. We use the `os` library to create the folder structure, with one subfolder per genre under both `train` and `test`, and `shutil` to copy the sliced spectrograms into it.

In [19]:
#Paths to the raw dataset
#A separate name is used so the earlier BASE_PATH (images_original) is not overwritten
DATASET_PATH = r'C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset'
IMAGES_PATH = os.path.join(DATASET_PATH,"images_3_sec_split")

#Path to where we will move the split data
SPLIT_BASE_PATH = r'C:\Users\JTWit\Documents\ECE 579\Datasets\Split GTZAN Dataset3 Sec'
SPLIT_TRAIN_PATH = os.path.join(SPLIT_BASE_PATH, 'train')
SPLIT_TEST_PATH = os.path.join(SPLIT_BASE_PATH, 'test')


#Make the target base path and the train and test split directories
os.makedirs(SPLIT_BASE_PATH,exist_ok = True)
os.makedirs(SPLIT_TRAIN_PATH,exist_ok = True)
os.makedirs(SPLIT_TEST_PATH,exist_ok = True)

#Let's also include all the subfolders for train and test
for label in os.listdir(IMAGES_PATH):

    train_path = os.path.join(SPLIT_TRAIN_PATH,label)
    test_path = os.path.join(SPLIT_TEST_PATH,label)

    os.makedirs(train_path,exist_ok = True)
    os.makedirs(test_path,exist_ok = True)

In [24]:
images = {}
for root, dirs, files in os.walk(IMAGES_PATH):
    # Collect the PNG slices in this folder; skip folders that contain none
    image_paths = [os.path.join(root, file) for file in files if file.endswith(".png")]
    if not image_paths:
        continue

    # Key by genre, taken from the folder name (e.g. "blues")
    key = os.path.basename(root)
    images[key] = image_paths


In [25]:
for key in images.keys():
    np.random.shuffle(images[key])

    # The first 80% of the shuffled slices go to train, the rest to test
    split_index = int(0.8 * len(images[key]))

    for i, image in enumerate(images[key]):
        target_root = SPLIT_TRAIN_PATH if i < split_index else SPLIT_TEST_PATH
        destination_path = os.path.join(target_root, key, os.path.basename(image))
        shutil.copyfile(image, destination_path)
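
Because the shuffle above works on individual slices, windows cut from the same track can end up in both train and test. If that overlap is a concern, one option is to split by track instead. The sketch below is an illustration, not the notebook's method: `split_by_track` is a hypothetical helper, the seed is arbitrary, and it assumes slice names follow the `blues00000.3.png` pattern (track name, slice index, extension).

```python
import os
import random
from collections import defaultdict

def split_by_track(paths, train_fraction=0.8, seed=0):
    """Hypothetical helper: keep all slices of a track on the same side of the split."""
    groups = defaultdict(list)
    for path in paths:
        # "blues00000.3.png" -> "blues00000" (drop slice index and extension)
        track_id = os.path.basename(path).rsplit(".", 2)[0]
        groups[track_id].append(path)

    # Shuffle the tracks, not the slices, with a fixed seed for repeatability
    track_ids = sorted(groups)
    random.Random(seed).shuffle(track_ids)

    n_train = int(train_fraction * len(track_ids))
    train = [p for t in track_ids[:n_train] for p in groups[t]]
    test = [p for t in track_ids[n_train:] for p in groups[t]]
    return train, test

# Synthetic file names: 10 tracks with 10 slices each
demo = [f"blues{t:05d}.{i}.png" for t in range(10) for i in range(10)]
train, test = split_by_track(demo)
print(len(train), len(test))  # 80 20
```

The split ratio is applied to tracks, so the 80/20 proportion of slices holds only when every track contributes the same number of slices.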
    

    

## Connecting 3 Second Windows to Labels

The dataset also includes a CSV file, `features_3_sec.csv`, with one row per 3-second audio segment, along with its label. To use the sliced images for training, each window has to be linked to its row in that file. We use the `pandas` library to read the CSV and build that link from image file name to row.

In [26]:
import pandas as pd

In [27]:
LIBROSA_PATH = r'C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset\features_3_sec.csv'

In [34]:
df = pd.read_csv(LIBROSA_PATH)

In [29]:
print(df.head())

            filename  length  chroma_stft_mean  chroma_stft_var  rms_mean  \
0  blues.00000.0.wav   66149          0.335406         0.091048  0.130405   
1  blues.00000.1.wav   66149          0.343065         0.086147  0.112699   
2  blues.00000.2.wav   66149          0.346815         0.092243  0.132003   
3  blues.00000.3.wav   66149          0.363639         0.086856  0.132565   
4  blues.00000.4.wav   66149          0.335579         0.088129  0.143289   

    rms_var  spectral_centroid_mean  spectral_centroid_var  \
0  0.003521             1773.065032          167541.630869   
1  0.001450             1816.693777           90525.690866   
2  0.004620             1788.539719          111407.437613   
3  0.002448             1655.289045          111952.284517   
4  0.001701             1630.656199           79667.267654   

   spectral_bandwidth_mean  spectral_bandwidth_var  ...  mfcc16_var  \
0              1972.744388           117335.771563  ...   39.687145   
1              2010.05

To link images to rows, we add an `image_name` column to the dataframe by replacing the `.wav` extension in `filename` with `.png`. Looking up a window's file name in this column then returns its row, which we can use when building the training and testing datasets.

In [35]:
#regex=False treats '.wav' as a literal string rather than a regular expression
df['image_name'] = df['filename'].str.replace('.wav', '.png', regex=False)

#Print the head of the dataframe to make sure the replacement was successful
print(df.head())

            filename  length  chroma_stft_mean  chroma_stft_var  rms_mean  \
0  blues.00000.0.wav   66149          0.335406         0.091048  0.130405   
1  blues.00000.1.wav   66149          0.343065         0.086147  0.112699   
2  blues.00000.2.wav   66149          0.346815         0.092243  0.132003   
3  blues.00000.3.wav   66149          0.363639         0.086856  0.132565   
4  blues.00000.4.wav   66149          0.335579         0.088129  0.143289   

    rms_var  spectral_centroid_mean  spectral_centroid_var  \
0  0.003521             1773.065032          167541.630869   
1  0.001450             1816.693777           90525.690866   
2  0.004620             1788.539719          111407.437613   
3  0.002448             1655.289045          111952.284517   
4  0.001701             1630.656199           79667.267654   

   spectral_bandwidth_mean  spectral_bandwidth_var  ...  mfcc17_mean  \
0              1972.744388           117335.771563  ...    -3.241280   
1              2010.
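
One detail to watch when matching: the CSV names put a dot after the genre (`blues.00000.0.wav`), while the slice files listed at the end of this section do not (`blues00000.0.png`). Swapping the extension alone would leave the two out of step. The sketch below shows one way to convert between them. `csv_name_to_image_name` is a hypothetical helper, and it assumes every CSV name follows the `genre.track.slice.wav` pattern.

```python
import os

def csv_name_to_image_name(filename):
    """Hypothetical helper: map a CSV file name to the slice image naming scheme."""
    # "blues.00000.0.wav" -> "blues.00000.0"
    stem, _ = os.path.splitext(filename)

    # Split off the genre at the first dot: "blues", "00000.0"
    genre, rest = stem.split(".", 1)

    # Rejoin without the dot and use the image extension
    return f"{genre}{rest}.png"

print(csv_name_to_image_name("blues.00000.0.wav"))  # blues00000.0.png
```

With pandas this could be applied as `df['image_name'] = df['filename'].map(csv_name_to_image_name)`, provided the image files on disk really use the dotless form.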

A simple loop demonstrates how to match a file to its corresponding row in the CSV file. For each image in the split folders, it prints the file name followed by the values from the matching row. The same lookup lets us attach the correct CSV data to each 3-second window when building the training and testing datasets.

In [None]:
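for root, dirs, files in os.walk(SPLIT_BASE_PATH):
    for file in files:
        print(file)
        # Select the row whose image_name matches this file, flatten it,
        # and keep columns 2 through 58; the result is empty if no row matches
        print(df.loc[df['image_name'] == file].values.reshape(-1)[2:59])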
        
    

blues00000.0.1.png
blues00000.0.png
blues00000.1.9.png
blues00000.2.4.png
blues00000.2.png
blues00000.3.1.png
blues00000.3.9.png
blues00000.4.2.png
blues00000.4.6.png
blues00000.4.7.png
blues00000.5.0.png
blues00000.5.1.png
blues00000.5.3.png
blues00000.5.7.png
blues00000.5.9.png
blues00000.6.1.png
blues00000.6.4.png
blues00000.7.7.png
blues00000.8.6.png
blues00000.9.1.png
blues00000.9.2.png
blues00000.9.8.png
blues00000.9.9.png
blues00001.0.6.png
blues00001.0.png
blues00001.1.1.png
blues00001.1.3.png
blues00001.2.0.png
blues00001.2.7.png
blues00001.3.4.png
blues00001.3.5.png
blues00001.3.8.png
blues00001.3.png
blues00001.4.1.png
blues00001.4.5.png
blues00001.5.6.png
blues00001.5.8.png
blues00001.6.1.png
blues00001.7.6.png
blues00001.7.9.png
blues00001.9.2.png
blues00001.9.5.png
blues00001.9.7.png
blues00002.0.0.png
blues00002.0.6.png
blues00002.0.7.png
blues00002.1.0.png
blues00002.1.2.png
blues00002.2.3.png
blues00002.2.7.png
blues00002.3.3.png
blues00002.4.2.png
blues00002.4.7.png
b