# 1. Collect Diverse Song Samples

A diverse set of song samples was collected from a GitHub repository.

# 2. Preprocess Audio Files

Build a function that takes an audio file as input and:
1. normalizes the audio to a given threshold and returns the normalized wave tensor,
2. removes its non-vocal sections and returns the wave tensor of the vocal section,
3. splits it into tensors of equal shape,
4. saves the split wave tensors as .wav files in a new directory (Dataset).
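
As a rough illustration of steps 1 and 3, the sketch below peak-normalizes a waveform and cuts it into overlapping fixed-size windows. The names `peak_normalize` and `split_fixed_windows` are hypothetical and are not the notebook's `normalize` or `split_into_equal_tensors` helpers, whose implementations are not shown here. The dBFS-based target and the dropping of a trailing partial window are assumptions.

```python
import torch


def peak_normalize(wave: torch.Tensor, target_dbfs: float = -1.0) -> torch.Tensor:
    """Scale `wave` so its loudest sample sits at `target_dbfs` (dB relative to full scale).

    Assumption: the real helper may use a different loudness definition.
    """
    peak = wave.abs().max()
    if peak == 0:  # silent input: nothing to scale
        return wave
    target_amplitude = 10.0 ** (target_dbfs / 20.0)
    return wave * (target_amplitude / peak)


def split_fixed_windows(wave: torch.Tensor, window_size: int, hop_length: int) -> torch.Tensor:
    """Cut a (channels, samples) tensor into (n_chunks, channels, window_size).

    Assumption: a trailing partial window is dropped rather than padded.
    """
    if wave.shape[1] < window_size:
        return wave.new_empty((0, wave.shape[0], window_size))
    # unfold along the time axis gives (channels, n_chunks, window_size)
    chunks = wave.unfold(1, window_size, hop_length)
    # move the chunk axis to the front so each item is one (channels, window_size) clip
    return chunks.permute(1, 0, 2).contiguous()
```

Every chunk produced this way has the same shape, which is what allows the chunks to be stacked into batches later on.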

## A. Import all the necessary Libraries

In [11]:
import os
import torchaudio
import torch
from typing import List

from IPython.display import Audio

print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

PyTorch Version: 2.4.1
CUDA Available: True


In [12]:
device = "cuda" if torch.cuda.is_available() else "cpu"

## B. Import the helper functions to normalize, remove non-vocal sections, split into chunks, and save the chunks to the desired directory

In [13]:
from helper_functions import convert_to_wav
from helper_functions import normalize
from helper_functions import remove_non_vocals
from helper_functions import separate_sources
from helper_functions import split_into_equal_tensors
from helper_functions import save_tensors_to_directory

In [14]:
def make_preprocessed_dataset(raw_dataset_path: str, output_dir: str, class_names: List[str], dB: int, window_size: int, hop_length: int, device: str):
    """Normalize, vocal-separate, chunk, and save every .wav file found under each class folder."""
    class_dirs = [os.path.join(raw_dataset_path, class_name) for class_name in class_names]

    for class_dir, class_name in zip(class_dirs, class_names):
        print(f"Class: {class_name}")
        for file_name in os.listdir(class_dir):
            file_path = os.path.join(class_dir, file_name)

            # Only .wav files are processed; anything else in the folder is skipped.
            if os.path.splitext(file_name)[1].lower() == ".wav":
                wave_tensor, sr = torchaudio.load(uri=file_path)
                wave_tensor = normalize(wave_tensor, dB=dB, device=device)

                # Source separation expects a batch dimension, hence unsqueeze(0).
                wave_tensor = separate_sources(mix=wave_tensor.unsqueeze(0),
                                               segment=5,
                                               overlap=3,
                                               device=device)

                # window_size and hop_length are given in samples, not seconds.
                wave_chunk = split_into_equal_tensors(wave_tensor,
                                                      window_size=window_size,
                                                      hop_length=hop_length,
                                                      device=device)

                save_tensors_to_directory(wave_chunk, sr=sr, input_path=file_path, output_dir=output_dir, class_name=class_name)
        print(f"All Files of class \"{class_name}\" have been preprocessed and saved in the Output Directory \"{os.path.basename(output_dir)}\"")
        print()
    print(f"Preprocessing of audio files from Dataset \"{os.path.basename(raw_dataset_path)}\" is finished")

In [15]:
raw_dataset_path = r"D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Song-Language-Classifier\DataBase"

def get_classes(input_directory):
    # Each sub-directory of the raw dataset is treated as one class label.
    return [label for label in os.listdir(input_directory) if os.path.isdir(os.path.join(input_directory, label))]
get_classes(raw_dataset_path)

output_directory = r"D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset"


make_preprocessed_dataset(raw_dataset_path=raw_dataset_path,
                          output_dir=output_directory,
                          class_names=get_classes(raw_dataset_path),
                          dB=60,
                          window_size=int(10*44100),
                          hop_length=int(5*44100),
                          device=device)

Class: English
Saved chunks of Dabin - Alive in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\English
Saved chunks of OneRepublic - If I Lose Myself in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\English
Saved chunks of Said The Sky - Potions in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\English
Saved chunks of Seven Lions - First Time in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\English
Saved chunks of Shallou - You and Me in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\English
All Files of class "English" have been preprocessed and saved in the Output Directory "Dataset"

Class: French
Saved chunks of Aya Nakamura - Djadja (Clip officiel) in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\French
Saved chunks of Indila - Dernière Danse (Clip Officiel) in D:\Sarvesh\VIT Stuff\2024-25 Fall Sem\Song Language Detector\Dataset\French
Saved chunks
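
The call above uses `window_size=int(10*44100)` and `hop_length=int(5*44100)`, which is a 10-second window with a 5-second hop at 44.1 kHz, so consecutive chunks overlap by half. The arithmetic below estimates how many chunks one song yields. `count_chunks` is a hypothetical helper, and it assumes the trailing partial window is dropped; `split_into_equal_tensors` may pad it instead.

```python
def count_chunks(num_samples: int, window_size: int, hop_length: int) -> int:
    """Number of full windows that fit when sliding by `hop_length` samples."""
    if num_samples < window_size:
        return 0
    return 1 + (num_samples - window_size) // hop_length


SAMPLE_RATE = 44_100
window_size = 10 * SAMPLE_RATE  # 10-second chunks
hop_length = 5 * SAMPLE_RATE    # a new chunk starts every 5 seconds

three_minute_song = 180 * SAMPLE_RATE
n_chunks = count_chunks(three_minute_song, window_size, hop_length)
overlap_fraction = 1 - hop_length / window_size
print(n_chunks, overlap_fraction)
```

Under these assumptions a three-minute track yields 35 chunks with 50% overlap between neighbours.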

# 3. Extract Relevant Features

### &nbsp; &nbsp; a. Mel-frequency Cepstral Coefficients (MFCCs)

### &nbsp; &nbsp; b. Spectrograms

# 4. Label Data
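
Because each class folder (for example `English` or `French`) is named after the language, labels can be derived from the folder names. The sketch below shows one way to do this; `build_label_map` is a hypothetical helper, and sorting the names is an assumption made so the mapping stays stable across runs.

```python
from typing import Dict, List


def build_label_map(class_names: List[str]) -> Dict[str, int]:
    """Map each class name to an integer index, in sorted order for reproducibility."""
    return {name: index for index, name in enumerate(sorted(class_names))}


label_map = build_label_map(["French", "English"])
print(label_map)
```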

# 5. Split Dataset
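
Since overlapping chunks of the same song are very similar, it may be safer to split by song rather than by chunk, so that no song contributes chunks to both the training and the validation set. The sketch below assumes each chunk is available as a `(song_id, chunk_path)` pair; that pairing, the function name `split_by_song`, and the 80/20 ratio are all hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def split_by_song(
    chunks: List[Tuple[str, str]], val_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Split chunk paths into train/validation lists without sharing songs between them."""
    by_song = defaultdict(list)
    for song_id, chunk_path in chunks:
        by_song[song_id].append(chunk_path)

    songs = sorted(by_song)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(songs)

    n_val = max(1, round(len(songs) * val_fraction))
    val_songs = set(songs[:n_val])

    train = [path for song in songs if song not in val_songs for path in by_song[song]]
    val = [path for song in songs if song in val_songs for path in by_song[song]]
    return train, val
```

Applying the function once per class folder would also keep the language balance similar in both splits.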

# 6. Data Augmentation (Optional)
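
One simple waveform-level augmentation is to add Gaussian noise at a chosen signal-to-noise ratio. The sketch below is an illustration only: `add_noise` is a hypothetical helper, it assumes a CPU tensor, and the choice of noise type and SNR is not taken from this project.

```python
from typing import Optional

import torch


def add_noise(
    wave: torch.Tensor, snr_db: float, generator: Optional[torch.Generator] = None
) -> torch.Tensor:
    """Add white Gaussian noise so that the result has the requested SNR in dB."""
    signal_power = wave.pow(2).mean()
    noise = torch.randn(wave.shape, generator=generator)
    noise_power = noise.pow(2).mean()
    # Scale the noise so that signal_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = torch.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return wave + scale * noise
```

Augmentation like this would normally be applied only to the training split, so the validation data stays representative of clean input.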