# Mann's Planet

The objective is the classify given audio files into one of the following categories:
1) strathea_iv
2) aegir_27
3) solmara_vi
4) zephyrion_9
5) veyrah_theta
6) xyphos_1

## The approach:

A Mel spectrogram is a representation of the audio signal in the time-frequency domain. It is calculated by applying the Short-Time Fourier Transform (STFT) to the audio signal, followed by a transformation to the Mel scale, which mimics the way humans perceive pitch.
Steps Involved:
1) Load Audio: The audio file (e.g., .wav) is loaded into memory. This can be done using libraries like librosa in Python.

2) Short-Time Fourier Transform (STFT): The audio signal is divided into small overlapping frames, and the Fourier Transform is applied to each frame. This process extracts the frequency information for each frame.

3) Mel Filter Bank: The frequency bins from the STFT are mapped to the Mel scale using a filter bank. This scale approximates the human ear’s response to different frequencies, emphasizing lower frequencies and compressing higher frequencies.

4) Logarithmic Compression: The Mel spectrogram is often compressed logarithmically to reduce the dynamic range and make the features more suitable for neural network-based learning.

5) Resulting Output: The output is a 2D matrix, where the x-axis represents time and the y-axis represents frequency (in Mel scale). Each cell contains the amplitude at that particular time and frequency.



### Imports

In [164]:
import h5py
from keras import Input
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
import librosa
import numpy as np
import os
import optuna
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Input, GlobalAveragePooling2D, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

## Data pre-processing

In [46]:
train_dir = "train/" 
test_dir = "test/"

#### Constants

In [11]:
SR = 16000  # Sampling rate
TARGET_DURATION = 30  # Target audio length in seconds
MAX_PAD_LENGTH = 128  # Max time steps for Mel spectrogram
FEATURE_SIZE = 128 * MAX_PAD_LENGTH  # Size of flattened Mel spectrogram

#### Checking the training dataset for empty/corrupted files

The empty files are substituted with placeholder mel spectrograms to account for empty files in the test dataset

In [14]:
valid_files = 0
empty_files = 0

for category in os.listdir(train_dir):
    category_path = os.path.join(train_dir, category)
    if os.path.isdir(category_path):
        for file in os.listdir(category_path):
            file_path = os.path.join(category_path, file)
            try:
                y, sr = librosa.load(file_path, sr=16000)
                if len(y) == 0:
                    print(f"Empty file: {file_path}")
                    empty_files += 1
                else:
                    valid_files += 1
            except Exception as e:
                print(f"Error loading {file_path}: {e}")

print(f"✅ Valid audio files: {valid_files}")
print(f"❌ Empty files: {empty_files}")

Empty file: train/strathea_iv\strathea_iv_32.wav
Empty file: train/strathea_iv\strathea_iv_34.wav
Empty file: train/strathea_iv\strathea_iv_35.wav
Empty file: train/strathea_iv\strathea_iv_36.wav
Empty file: train/strathea_iv\strathea_iv_38.wav
Empty file: train/strathea_iv\strathea_iv_39.wav
Empty file: train/strathea_iv\strathea_iv_40.wav
Empty file: train/strathea_iv\strathea_iv_41.wav
Empty file: train/strathea_iv\strathea_iv_42.wav
Empty file: train/strathea_iv\strathea_iv_70.wav
Empty file: train/strathea_iv\strathea_iv_71.wav
Empty file: train/strathea_iv\strathea_iv_78.wav
✅ Valid audio files: 468
❌ Empty files: 12


#### Function to extract mel spectrogram features of the audio files

The tasks performed by this function are:
1) Check length of audio file, if 0 mark as empty file and set zeroed np array
2) To make it so the CNN is trained on uniform data each spectrogram is made using only 30 seconds of audio data, if the length of the audio clip is greater than 30 seconds, the function splits the audio clip into segments. For example a 40 second long audio clip would be split into 30 seconds and 10 seconds segments, a mel spectrogram would be made on the 30 second segment and the 10 second segment gets padded up to 30 seconds and then another spectrogram is made on that
3) If length of audio clip is less than 30 seconds, the clip is padded upto 30 seconds and then the mel spectrogram is extracted

In [17]:
def extract_mel_spectrogram(file_path):
    """
    Extracts Mel spectrograms from audio, handles empty files with a placeholder.
    """
    try:
        y, sr = librosa.load(file_path, sr=16000)

        if len(y) == 0:
            print(f"⚠️ Empty file: {file_path}")
            return [np.zeros((FEATURE_SIZE,))]  # Placeholder for empty files

        max_length = sr * 30 
        num_segments = max(1, len(y) // max_length)

        features = []
        for i in range(num_segments):
            start, end = i * max_length, min((i + 1) * max_length, len(y))
            segment = y[start:end]

            if len(segment) < max_length:
                segment = np.pad(segment, (0, max_length - len(segment)))

            mel_spec = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=128)
            mel_spec = librosa.power_to_db(mel_spec, ref=np.max)
            features.append(mel_spec.flatten())

        print(f"✅ {file_path}: {len(features)} spectrograms extracted")
        return features

    except Exception as e:
        print(f"❌ Error processing {file_path}: {e}")
        return [np.zeros((FEATURE_SIZE,))]  # Return placeholder on error

#### This function loops through the following directory structure:
##### Manns-Planet.ipynb
##### train/
##### ├── strathea_iv/
##### │    ├── strathea_iv_1.wav
##### │    ├── strathea_iv_2.wav
##### │    ├── ...
##### │    └── strathea_iv_80.wav
##### ├── aegir_27/
##### │    ├── aegir_27_1.wav
##### │    ├── aegir_27_2.wav
##### │    ├── ...
##### │    └── aegir_27_80.wav
##### ├── solmara_vi/
##### │    ├── solmara_vi_1.wav
##### │    ├── solmara_vi_2.wav
##### │    ├── ...
##### │    └── solmara_vi_80.wav
##### ├── zephyrion_9/
##### │    ├── zephyrion_9_1.wav
##### │    ├── zephyrion_9_2.wav
##### │    ├── ...
##### │    └── zephyrion_9_80.wav
##### ├── veyrah_theta/
##### │    ├── veyrah_theta_1.wav
##### │    ├── veyrah_theta_2.wav
##### │    ├── ...
##### │    └── veyrah_theta_80.wav
##### └── xyphos_1/
#####      ├── xyphos_1_1.wav
#####      ├── xyphos_1_2.wav
#####      ├── ...
#####      └── xyphos_1_80.wav

In the loop this function passes each audio file to the extract_mel_spectrogram(file_path) function while also creating a list of features and labels which is then added into the output file - tain_dataset.csv

In [20]:
def process_audio_dataset(data_dir, save_path):
    data = []
    labels = []

    for category in os.listdir(data_dir):
        category_path = os.path.join(data_dir, category)
        if os.path.isdir(category_path):
            label = category  
            for file in os.listdir(category_path):
                file_path = os.path.join(category_path, file)
                mel_features = extract_mel_spectrogram(file_path)
                
                for feature in mel_features:
                    data.append(feature)
                    labels.append(label)
    
    df = pd.DataFrame(data)
    df["label"] = labels  

    if len(df) > 0:
        print("📊 First row of dataset:", df.iloc[0])  
        df.to_csv(save_path, index=False)
        print(f"✅ Dataset saved to {save_path} with {len(df)} samples.")
    else:
        print("❌ No data to save! Check audio processing.")

#### Processes the training data and outputs train_dataset.csv

In [22]:
process_audio_dataset(train_dir, "train_dataset.csv")

✅ train/aegir_27\aegir_27_0.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_1.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_10.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_11.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_12.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_13.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_14.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_15.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_16.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_17.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_18.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_19.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_2.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_20.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_21.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_22.wav: 1 spectrograms extracted
✅ train/aegir_27\aegir_27_23.wav: 1 spectrograms extracted


#### Due to large number of features, csv files become inefficient so the csv file is converted to .h5 (HDF5) file using h5py library

Advantages:
1) Optimized for large datasets: HDF5 is very efficient for storing large amounts of data, especially multi-dimensional arrays.
2) Supports compression: You can compress data in HDF5, which can help reduce storage requirements.
3) Efficient reading/writing: HDF5 is designed for efficient random access to large datasets, which is great for training.
4) Widely supported in machine learning frameworks: Libraries like TensorFlow and Keras work well with HDF5 files, making it easy to integrate with your CNN training pipeline.

In [25]:
df = pd.read_csv("train_dataset.csv")

In [26]:
with h5py.File('train_dataset.h5', 'w') as hf:
    hf.create_dataset('features', data=df.drop(columns=['label']).values)
    hf.create_dataset('labels', data=df['label'].values)

In [27]:
file_path = 'train_dataset.h5' 
with h5py.File(file_path, 'r') as file:
    print("Keys in the file:", list(file.keys()))

    dataset_name = 'labels'  
    if dataset_name in file:
        dataset = file[dataset_name]
        print(f"Dataset shape: {dataset.shape}")
        print(f"Dataset dtype: {dataset.dtype}")
        print(f"Dataset contents (first 5 elements): {dataset[:5]}")

Keys in the file: ['features', 'labels']
Dataset shape: (528,)
Dataset dtype: object
Dataset contents (first 5 elements): [b'aegir_27' b'aegir_27' b'aegir_27' b'aegir_27' b'aegir_27']


#### We'll reshape the dataset to better suit the CNN training process

The shape of the features is now (528, 256, 469), which means there are 528 samples, each represented by a 2D array of size (256, 469). This structure is suitable for input into a CNN.

The labels are in binary form (e.g. [b'aegir_27...]). The labels are correctly decoded to strings, and we have an array of labels (such as 'aegir_27') corresponding to each sample.

In [31]:
with h5py.File(file_path, 'r') as file:
    features_dataset = file['features']
    print(f"Features dataset shape: {features_dataset.shape}")
    print(f"Features dataset dtype: {features_dataset.dtype}")
    
    reshaped_features = features_dataset[:].reshape(-1, 256, 469)  
    print(f"Reshaped features shape: {reshaped_features.shape}")

    labels_dataset = file['labels']
    print(f"Labels dataset shape: {labels_dataset.shape}")
    print(f"Labels dataset dtype: {labels_dataset.dtype}")
    
    decoded_labels = [label.decode('utf-8') for label in labels_dataset]
    print(f"Decoded labels (first 5): {decoded_labels[:5]}")

Features dataset shape: (528, 120064)
Features dataset dtype: float64
Reshaped features shape: (528, 256, 469)
Labels dataset shape: (528,)
Labels dataset dtype: object
Decoded labels (first 5): ['aegir_27', 'aegir_27', 'aegir_27', 'aegir_27', 'aegir_27']


### Final steps before building CNN

#### Processing training set

In [33]:
file_path = "train_dataset.h5"
with h5py.File(file_path, 'r') as file:
    X = np.array(file["features"]) 
    y = np.array(file["labels"])

In [34]:
unique_labels = np.unique(y)
label_to_index = {label: i for i, label in enumerate(unique_labels)}
y = np.array([label_to_index[label] for label in y])

In [186]:
label_to_index

{b'aegir_27': 0,
 b'solmara_vi': 1,
 b'strathea_iv': 2,
 b'veyrah_theta': 3,
 b'xyphos_1': 4,
 b'zephyrion_9': 5}

In [182]:
unique_labels

array([b'aegir_27', b'solmara_vi', b'strathea_iv', b'veyrah_theta',
       b'xyphos_1', b'zephyrion_9'], dtype=object)

One hot encoding the labels

In [36]:
y = to_categorical(y, num_classes=len(unique_labels))

In [180]:
y

array([[1., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0.],
       ...,
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 1.]])

Add channels dimension

In [38]:
X = X.reshape(X.shape[0], 256, 469, 1) 
print(f"Final shape of X: {X.shape}")  

Final shape of X: (528, 256, 469, 1)


Checking for NaN values and Infinity

In [88]:
X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)

In [90]:
print("NaN count after replacement:", np.isnan(X).sum())
print("Inf count after replacement:", np.isinf(X).sum())

NaN count after replacement: 0
Inf count after replacement: 0


Scaling using standard scaler

In [92]:
X_reshaped = X.reshape(X.shape[0], -1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_reshaped).reshape(X.shape)

In [95]:
print("NaN count after scaling:", np.isnan(X_scaled).sum())
print("Inf count after scaling:", np.isinf(X_scaled).sum())

NaN count after scaling: 0
Inf count after scaling: 0


In [85]:
variances = np.var(X_reshaped, axis=0)
print("Zero variance columns:", np.where(variances == 0)[0])

Zero variance columns: []


#### Train-test splitting

In [99]:
X_train, X_val, y_train, y_val = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

print(f"Training set: {X_train.shape}, Validation set: {X_val.shape}")

Training set: (422, 256, 469, 1), Validation set: (106, 256, 469, 1)


#### Processing testing set

In [198]:
def extract_test_mel_spectrogram(file_path):
    """
    Extracts Mel spectrograms from audio, handles empty files with a placeholder.
    For files longer than 30 seconds, only the first segment is considered.
    """
    try:
        y, sr = librosa.load(file_path, sr=16000)

        if len(y) == 0:
            print(f"⚠️ Empty file: {file_path}")
            return [np.zeros((FEATURE_SIZE,))]  # Placeholder for empty files

        max_length = sr * 30  # 30 seconds
        num_segments = max(1, len(y) // max_length)

        features = []
        
        # If the file is longer than 30 seconds, only process the first segment
        num_segments = min(num_segments, 1)

        for i in range(num_segments):
            start, end = i * max_length, min((i + 1) * max_length, len(y))
            segment = y[start:end]

            if len(segment) < max_length:
                segment = np.pad(segment, (0, max_length - len(segment)))

            mel_spec = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=128)
            mel_spec = librosa.power_to_db(mel_spec, ref=np.max)
            features.append(mel_spec.flatten())

        print(f"✅ {file_path}: {len(features)} spectrograms extracted")
        return features

    except Exception as e:
        print(f"❌ Error processing {file_path}: {e}")
        return [np.zeros((FEATURE_SIZE,))]  # Return placeholder on error

In [200]:
def process_test_dataset(data_dir, save_path):
    data = []
    
    for file in os.listdir(data_dir):
        file_path = os.path.join(data_dir, file)
        mel_features = extract_test_mel_spectrogram(file_path)
        for feature in mel_features:
            data.append(feature)
    
    df = pd.DataFrame(data)

    if len(df) > 0:
        print("📊 First row of dataset:", df.iloc[0])  
        df.to_csv(save_path, index=False)
        print(f"✅ Dataset saved to {save_path} with {len(df)} samples.")
    else:
        print("❌ No data to save! Check audio processing.")

In [202]:
process_test_dataset(test_dir, "test_dataset.csv")

✅ test/sample_001.wav: 1 spectrograms extracted
✅ test/sample_002.wav: 1 spectrograms extracted
✅ test/sample_003.wav: 1 spectrograms extracted
✅ test/sample_004.wav: 1 spectrograms extracted
✅ test/sample_005.wav: 1 spectrograms extracted
✅ test/sample_006.wav: 1 spectrograms extracted
✅ test/sample_007.wav: 1 spectrograms extracted
✅ test/sample_008.wav: 1 spectrograms extracted
✅ test/sample_009.wav: 1 spectrograms extracted
✅ test/sample_010.wav: 1 spectrograms extracted
✅ test/sample_011.wav: 1 spectrograms extracted
✅ test/sample_012.wav: 1 spectrograms extracted
✅ test/sample_013.wav: 1 spectrograms extracted
✅ test/sample_014.wav: 1 spectrograms extracted
✅ test/sample_015.wav: 1 spectrograms extracted
✅ test/sample_016.wav: 1 spectrograms extracted
✅ test/sample_017.wav: 1 spectrograms extracted
✅ test/sample_018.wav: 1 spectrograms extracted
✅ test/sample_019.wav: 1 spectrograms extracted
✅ test/sample_020.wav: 1 spectrograms extracted
⚠️ Empty file: test/sample_021.wav
✅ tes

In [204]:
tdf = pd.read_csv("test_dataset.csv")

In [206]:
with h5py.File('test_dataset.h5', 'w') as hf:
    hf.create_dataset('features', data=tdf.values)

In [208]:
file_path = 'test_dataset.h5' 
with h5py.File(file_path, 'r') as file:
    X_test = np.array(file["features"])
    print("Keys in the file:", list(file.keys()))
    print(X_test.shape)

    dataset_name = 'features'  
    if dataset_name in file:
        dataset = file[dataset_name]
        print(f"Dataset shape: {dataset.shape}")
        print(f"Dataset dtype: {dataset.dtype}")
        print(f"Dataset contents (first 5 elements): {dataset[:5]}")

Keys in the file: ['features']
(120, 120064)
Dataset shape: (120, 120064)
Dataset dtype: float64
Dataset contents (first 5 elements): [[-35.37368011 -40.78717041 -44.33540726 ... -56.90691757 -45.9742012
  -44.61586761]
 [-46.02172852 -45.13357544 -46.42724609 ... -60.46498871 -65.21768188
  -56.02202606]
 [-34.56647491 -33.95949554 -32.14589691 ... -41.91710281 -45.1299324
  -51.14447403]
 [-38.57144928 -38.9967041  -42.0949707  ... -47.98118973 -49.95967102
  -46.78057861]
 [-39.484478   -41.15600586 -42.4562912  ... -80.         -80.
  -80.        ]]


Reshaping to fit the training set shape

In [210]:
X_test = X_test.reshape(X_test.shape[0], 256, 469, 1)
X_test.shape

(120, 256, 469, 1)

Checking for and getting rid of NaN and infinity

In [212]:
X_test = np.nan_to_num(X_test, nan=0.0, posinf=0.0, neginf=0.0)
np.isnan(X_test).sum()

0

Scaling values using standard scaler

In [214]:
X_test_reshaped = X_test.reshape(X_test.shape[0], -1)
scaler = StandardScaler()
X_test_scaled = scaler.fit_transform(X_test_reshaped).reshape(X_test.shape)

In [216]:
print("NaN count after scaling:", np.isnan(X_test_scaled).sum())
print("Inf count after scaling:", np.isinf(X_test_scaled).sum())

NaN count after scaling: 0
Inf count after scaling: 0


In [218]:
test_variances = np.var(X_test_reshaped, axis=0)
print("Zero variance columns:", np.where(test_variances == 0)[0])

Zero variance columns: []


# Building the CNN

In [168]:
def residual_block(x, filters, kernel_size=(3, 3), downsample=False, dropout_rate=0.3):
    shortcut = x  
    stride = (2, 2) if downsample else (1, 1)

    # First convolution layer with L2 regularization
    x = layers.Conv2D(filters, kernel_size, strides=stride, padding='same',
                      kernel_regularizer=regularizers.l2(0.0001))(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Second convolution layer with L2 regularization
    x = layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='same',
                      kernel_regularizer=regularizers.l2(0.0001))(x)
    x = layers.BatchNormalization()(x)

    # Match the shortcut dimensions when downsampling
    if shortcut.shape[-1] != filters or downsample:
        shortcut = layers.Conv2D(filters, (1, 1), strides=stride, padding='same',
                                 kernel_regularizer=regularizers.l2(0.0001))(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)

    # Add the shortcut and apply activation
    x = layers.Add()([x, shortcut])
    x = layers.ReLU()(x)

    # Dropout for regularization
    x = layers.Dropout(dropout_rate)(x)

    return x

In [170]:
def build_resnet(input_shape, num_classes):
    inputs = layers.Input(shape=input_shape)

    # Initial Convolutional Layer
    x = layers.Conv2D(32, kernel_size=3, strides=2, padding='same',
                      kernel_regularizer=regularizers.l2(0.0001))(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Residual Blocks with Downsampling and Dropout
    x = residual_block(x, 32)
    x = residual_block(x, 64, downsample=True)
    x = residual_block(x, 128, downsample=True)
    x = residual_block(x, 256, downsample=True)

    # Global Average Pooling
    x = layers.GlobalAveragePooling2D()(x)

    # Dropout for additional regularization before the output layer
    x = layers.Dropout(0.4)(x)

    # Output Layer
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(inputs, outputs)

    return model

In [158]:
input_shape = (256, 469, 1)  
num_classes = 6
model = build_resnet(input_shape, num_classes)

model.summary()

In [172]:
batch_size = 16  
epochs = 50  

# Learning rate adjustments & early stopping
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-5)
early_stopping = EarlyStopping(monitor='val_loss', patience=7, restore_best_weights=True)

# Compile Model
model.compile(optimizer=Adam(learning_rate=0.0005), loss='categorical_crossentropy', metrics=['accuracy'])

# Train Model
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=epochs,
    batch_size=batch_size,
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

Epoch 1/50
[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 747ms/step - accuracy: 0.6566 - loss: 1.7010 - val_accuracy: 0.1509 - val_loss: 3.7187 - learning_rate: 5.0000e-04
Epoch 2/50
[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 737ms/step - accuracy: 0.7768 - loss: 1.3903 - val_accuracy: 0.2075 - val_loss: 4.4759 - learning_rate: 5.0000e-04
Epoch 3/50
[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 731ms/step - accuracy: 0.7960 - loss: 1.2867 - val_accuracy: 0.1604 - val_loss: 5.2197 - learning_rate: 5.0000e-04
Epoch 4/50
[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 747ms/step - accuracy: 0.7975 - loss: 1.2365 - val_accuracy: 0.1792 - val_loss: 7.0214 - learning_rate: 5.0000e-04
Epoch 5/50
[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 743ms/step - accuracy: 0.9154 - loss: 0.9934 - val_accuracy: 0.1792 - val_loss: 4.1520 - learning_rate: 2.5000e-04
Epoch 6/50
[1m27/27[0m [32m━━━━━━━━━━━━━━━

In [220]:
predictions = model.predict(X_test_scaled, batch_size=16)
predicted_classes = np.argmax(predictions, axis=1)

[1m8/8[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 159ms/step


In [224]:
class_dict = {
    'aegir_27': 0,
    'solmara_vi': 1,
    'strathea_iv': 2,
    'veyrah_theta': 3,
    'xyphos_1': 4,
    'zephyrion_9': 5
}
reversed_dict = {v: k for k, v in class_dict.items()}

file_names = [f"sample_{str(i).zfill(3)}.wav" for i in range(1, 121)]

predicted_class_names = [reversed_dict[class_label] for class_label in predicted_classes]

submission_data = {'file_name': file_names[:len(predicted_class_names)], 'label': predicted_class_names}

submission_df = pd.DataFrame(submission_data)

submission_df.to_csv('submission.csv', index=False)

print(submission_df.head())

        file_name         label
0  sample_001.wav      aegir_27
1  sample_002.wav   zephyrion_9
2  sample_003.wav   strathea_iv
3  sample_004.wav  veyrah_theta
4  sample_005.wav      xyphos_1
