In [5]:
from keras.preprocessing import sequence
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
import os
import librosa
import numpy as np

## Data Loading and Preprocessing

### **Flowchart**
<div style="text-align:center;">
<img src="./flowchart.png" width="500"/>
</div>

### **Explanation**
The first step is to load the data. The dataset is stored locally under the path *'./charaNet'*. It was downloaded from Kaggle, and its source is linked at the end of this paper.

We define three directories, one each for the train, test, and validation sets, which are already predefined by the data source. Each of these folders contains subfolders named after a bird, and each subfolder holds the audio recordings of that bird.
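
Before loading any audio, it can help to confirm that a split folder really has this layout. The snippet below is a minimal sketch using only the standard library; `count_files_per_class` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
import os

def count_files_per_class(split_dir):
    """Map each class subfolder name to the number of files it contains."""
    counts = {}
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        # Skip stray files; only subfolders represent classes
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts
```

Calling `count_files_per_class('./charaNet/train')` should return one entry per bird folder, which also gives a quick view of how balanced the classes are.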

Next, we define a function that loads and preprocesses the data. Preprocessing the data is important for two reasons:
1. The data is in audio format but we want it to 

The function `load_data(data_dir)` loads audio data from the directory given by `data_dir`. It expects that directory to contain subdirectories, each representing one class of audio data. The function iterates through these subdirectories and loads every audio file in them with the librosa library, resampling to `sampling_rate` and reading at most `duration` seconds per file. Shorter clips are then zero-padded at the end to a uniform length using `pad_sequences` from `sequence.data_utils`.

Each loaded clip is appended to a list `X`, and an integer label derived from the position of its folder in the directory listing is appended to a separate list `y`. Finally, the function returns `X` and `y` as numpy arrays, where `X` contains the audio data and `y` the corresponding labels, ready for a classification model.
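
The padding step can be illustrated on its own. The sketch below is a plain-NumPy stand-in for the `pad_sequences` call, not the code used in this notebook; `pad_or_trim` is a hypothetical name. Note that Keras's `pad_sequences` casts to `int32` unless a `dtype` is passed, so floating-point audio needs an explicit float dtype.

```python
import numpy as np

def pad_or_trim(audio, target_len):
    """Return a float32 array of exactly target_len samples.

    Longer clips are cut at the end; shorter clips get trailing zeros.
    """
    audio = np.asarray(audio, dtype=np.float32)[:target_len]
    padded = np.zeros(target_len, dtype=np.float32)
    padded[:len(audio)] = audio
    return padded
```

For example, `pad_or_trim([0.5, -0.25], 4)` yields `[0.5, -0.25, 0.0, 0.0]`, so every clip ends up with the same number of samples regardless of its original length.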

In [7]:

# Define the path to your data directory
input_dir = './charaNet/train'
test_dir = './charaNet/test'
val_dir = './charaNet/val'

# Define the sampling rate and duration of audio samples
sampling_rate = 44100
duration = 2  # Duration in seconds

# Function to load audio files and their corresponding labels
def load_data(data_dir):
    X = []  # List to store audio data
    y = []  # List to store corresponding labels
    
    # Iterate over the folders in sorted order so label indices are reproducible across runs
    for label, category in enumerate(sorted(os.listdir(data_dir))):
        category_dir = os.path.join(data_dir, category)
        
        # Check if it's a directory
        if os.path.isdir(category_dir):
            # Iterate over the files in the category directory
            for file in os.listdir(category_dir):
                file_path = os.path.join(category_dir, file)
                
                # Load audio data using librosa
                audio_data, _ = librosa.load(file_path, sr=sampling_rate, duration=duration)
                
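                # Pad audio data with trailing zeros to ensure uniform length.
                # dtype='float32' is needed: pad_sequences defaults to int32, which would truncate the waveform to zeros.
                audio_data = sequence.data_utils.pad_sequences([audio_data], maxlen=sampling_rate * duration, dtype='float32', padding='post')[0]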
                
                # Append the audio data to X
                X.append(audio_data)
                
                # Assign the label based on the folder name
                y.append(label)
    
    return np.array(X), np.array(y)

# Load data and labels
X, y = load_data(input_dir)

# Convert labels to one-hot encoded vectors
num_classes = len(np.unique(y))
y = to_categorical(y, num_classes=num_classes)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Print shapes for verification
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)

X_train shape: (4325, 88200)
y_train shape: (4325, 41)
X_test shape: (1082, 88200)
y_test shape: (1082, 41)
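
The printed shapes can be cross-checked by hand. The short sketch below reproduces the arithmetic, assuming that `train_test_split` rounds the test portion up when `test_size` is a fraction; `expected_split` is a hypothetical helper, and the total clip count is taken from the output above.

```python
import math

sampling_rate = 44100    # samples per second, as in the cell above
duration = 2             # seconds, as in the cell above
n_samples = 4325 + 1082  # total clips, read off the printed shapes

def expected_split(n_samples, test_size):
    """Return (n_train, n_test) with the test share rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

features_per_clip = sampling_rate * duration
print(features_per_clip)               # 88200
print(expected_split(n_samples, 0.2))  # (4325, 1082)
```

The second dimension of `y_train` and `y_test` (41) is the number of distinct labels found in the training directory.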
