# Multi-Input Neural Network 

In this notebook we will experiment with neural networks that can handle multiple types of input data. We will use TensorFlow and Keras to build a model that takes both numerical and image data as inputs.

## Librosa Features

The motivation behind this notebook is to create a neural network that can take both numerical data (e.g., features extracted from audio files using Librosa) and image data (e.g., spectrograms or other visual representations of audio) as inputs. This can be useful in scenarios where we want to leverage both types of information for tasks such as classification or regression.

In [1]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split

In [22]:
LIBROSA_DATA = r"C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset\features_30_sec.csv"
BASE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\Split GTZAN Dataset"
TEST_PATH = os.path.join(BASE_PATH,'test')
TRAIN_PATH = os.path.join(BASE_PATH,'train')

UN_SPLIT_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset\images_original"

Let's load the librosa data using pandas.

In [3]:
librosa_df = pd.read_csv(LIBROSA_DATA)

librosa_df["filename"] = librosa_df["filename"].str.replace(".", "",1, regex=False)
librosa_df["filename"] = librosa_df["filename"].str.replace(".wav",'.png')

FEATURE_COLS = librosa_df.columns.difference(["filename", "label"])
librosa_df["label_id"] = librosa_df["label"].astype("category").cat.codes

train_df, val_df = train_test_split(
    librosa_df,
    test_size=0.2,
    stratify=librosa_df["label_id"],  # keeps class balance
    random_state=42
)

def build_image_path(row, base_dir):
    return os.path.join(base_dir, row["label"], row["filename"])

train_df["image_path"] = train_df.apply(
    lambda r: build_image_path(r, UN_SPLIT_PATH), axis=1
)

val_df["image_path"] = val_df.apply(
    lambda r: build_image_path(r, UN_SPLIT_PATH), axis=1
)


In [4]:
def filter_existing_images(df, unsplit_base):
    def image_exists(row):
        # Extract basename and class from current image_path
        basename = os.path.basename(row["image_path"])
        # Split on '0', adjust this logic as needed for your class extraction!
        class_name = basename.split('0')[0]
        candidate_path = os.path.join(unsplit_base, class_name, basename)
        return os.path.exists(candidate_path)
    
    mask = df.apply(image_exists, axis=1)
    return df[mask].reset_index(drop=True)

# Usage:
train_df = filter_existing_images(train_df, UN_SPLIT_PATH)
val_df = filter_existing_images(val_df, UN_SPLIT_PATH)

## Neural Network Archetecture

In [20]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate, Flatten
from tensorflow.keras.models import Model

Let's start by defining the two input branches of our neural network: one for numerical data and another for image data.

In [21]:
librosa_shape = (len(FEATURE_COLS),)
image_shape = (432,288,3)

# Input 1: e.g., a numerical feature vector (shape depends on your data)
input1 = Input(librosa_shape, name='librosa_input') 

# Input 2: e.g., image data (shape depends on your image dimensions and channels)
input2 = Input(image_shape, name='image_input') 

Next we can create specificic branches for each input type. 

In [41]:
# Branch 1 for numerical data
x1 = Dense(32, activation='relu')(input1)
x1 = Dense(16, activation='relu')(x1)

# Branch 2 for image data (using placeholder layers for illustration)
x2 = tf.keras.layers.Conv2D(16, (3, 3), activation='relu')(input2)
x2 = tf.keras.layers.MaxPooling2D((2, 2))(x2)
x2 = Flatten()(x2) # Flatten the output of the CNN branch
x2 = Dense(16, activation='relu')(x2)

The next step is to combine the outputs of these branches and add some dense layers to process the combined information.

In [None]:
# Concatenate the outputs of the two branches
concatenated = Concatenate()([x1, x2])

# Add more layers after concatenation as needed
y = Dense(32, activation='relu')(concatenated)

Finally we need to define the output layer of the model.

In [43]:
# Final output layer
output = Dense(1, activation='linear')(y) # Example for a regression task

# Create the final model
model = Model(inputs=[input1, input2], outputs=output)

In [44]:
model.summary()

## Training Preliminary Work

In [10]:
import os
import numpy as np
import tensorflow as tf
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from datetime import datetime as datetime

Now that we have an archetecture defined, we can prepare for training the model. This includes compiling the model with an appropriate loss function and optimizer. These steps are the same as the standard model training process.

Let's start by defining some base paths that we will use for loading and saving data.

In [11]:
BASE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\Split GTZAN Dataset"
TEST_PATH = os.path.join(BASE_PATH,'test')
TRAIN_PATH = os.path.join(BASE_PATH,'train')

SAVE_PATH = os.path.join(r"C:\Users\JTWit\Documents\ECE 579","Custom DNN Models")

#Make the save path for the neural network just in case it does not yet exist
os.makedirs(SAVE_PATH,exist_ok = True)

checkpoint_dir = os.path.join(r"C:\Users\JTWit\Documents\ECE 579",'Training Checkpoints')
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.weights.h5")

# Create the directory if it doesn't exist
os.makedirs(checkpoint_dir, exist_ok=True)

Next, we will compile the model.

In [96]:
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

We can also add some optimizers and metrics to monitor during training.

In [46]:
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2, 
    patience=2)

earlystop = EarlyStopping(
    monitor='val_acc',
    mode="max", 
    patience=3)

checkpoint = ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True,  
    monitor='val_loss',      
    save_best_only=False,    
    verbose=1                
)


callbacks = [reduce_lr,earlystop] 

We should now specify some of the hyperparameters for training the model, such as the number of epochs, thie initial learning rate, and batch size. We will also set the target size for the images and the name that we will use to save the trained model.

In [None]:
LEARNING_RATE = 1e-5
EPOCHS = 30

BATCH_SIZE = 16

TARGET_SIZE = (432,288)

NETWORK_NAME = "GTZAN Multi-Input"

Next, let's define the function to train the model using both types of input data.

In [12]:
IMG_SIZE = (432, 288)
BATCH_SIZE = 16

def load_sample(image_path, num_features, label):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.resize(img, IMG_SIZE)
    img = tf.cast(img, tf.float32) / 255.0
    return (num_features, img), label


In [13]:
def make_dataset(df, training=True):
    ds = tf.data.Dataset.from_tensor_slices(
        (
            df["image_path"].values,
            df[FEATURE_COLS].values.astype("float32"),
            df["label_id"].values
        )
    )

    if training:
        ds = ds.shuffle(1024)

    ds = ds.map(load_sample, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
    return ds


In [46]:
train_ds = make_dataset(train_df, training=True)
val_ds   = make_dataset(val_df, training=False)


## Training the Model

Now that we are finished with the preliminary work, we can train the model using the training data generator.

In [47]:
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS,
    callbacks=callbacks
)

accuracy = history.history['accuracy'][-1]
date_str = datetime.today().strftime('%Y-%m-%d')
name_string = f"{NETWORK_NAME}(accuracy = {accuracy:.4f})(date = {date_str}).keras"

save_file = os.path.join(SAVE_PATH,name_string)

model.save(save_file)

Epoch 1/30


  return self.fn(y_true, y_pred, **self._fn_kwargs)


[1m50/50[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m17s[0m 286ms/step - accuracy: 0.0751 - loss: 5.3637e-07 - val_accuracy: 0.0850 - val_loss: 5.3644e-07 - learning_rate: 0.0010
Epoch 2/30


  current = self.get_monitor_value(logs)


[1m50/50[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 282ms/step - accuracy: 0.0751 - loss: 5.3637e-07 - val_accuracy: 0.0850 - val_loss: 5.3644e-07 - learning_rate: 0.0010
Epoch 3/30
[1m50/50[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 266ms/step - accuracy: 0.0751 - loss: 5.3637e-07 - val_accuracy: 0.0850 - val_loss: 5.3644e-07 - learning_rate: 0.0010
Epoch 4/30
[1m50/50[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 198ms/step - accuracy: 0.0751 - loss: 5.3637e-07 - val_accuracy: 0.0850 - val_loss: 5.3644e-07 - learning_rate: 2.0000e-04
Epoch 5/30
[1m50/50[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 199ms/step - accuracy: 0.0751 - loss: 5.3637e-07 - val_accuracy: 0.0850 - val_loss: 5.3644e-07 - learning_rate: 2.0000e-04
Epoch 6/30
[1m50/50[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 207ms/step - accuracy: 0.0751 - loss: 5.3637e-07 - val_accuracy: 0.0850 - val_loss: 5.3644e-07 - learning_rate: 4.0000e-05
Epoch 7/30
[1m50/50[0m

## Evaluating the Model

Now that we have trained the model, we can evaluate its performance on a test dataset containing both numerical and image data. Let's start by making a generator that we can use to load the test data in batches.

## Scratch Work

In [15]:
import os
import pandas as pd
import numpy as np
import cv2 

In [None]:
LIBROSA_DATA = r"C:\Users\JTWit\Documents\ECE 579\Datasets\GTZAN Dataset\features_30_sec.csv"

BASE_PATH = r"C:\Users\JTWit\Documents\ECE 579\Datasets\Split GTZAN Dataset"
TEST_PATH = os.path.join(BASE_PATH,'test')
TRAIN_PATH = os.path.join(BASE_PATH,'train')

In [45]:
df = pd.read_csv(LIBROSA_DATA)

df['image_name'] = df['filename'].str.replace('.','',1)
df['image_name'] = df['image_name'].str.replace('.wav','.png')

arr = df.loc[df['image_name'] == 'blues00000.png'].values

print(arr.reshape(-1)[2:59])



[0.3500881195068359 0.0887565687298774 0.1302279233932495
 0.0028266964945942 1784.165849538755 129774.06452515082
 2002.4490601176965 85882.76131549841 3805.8396058403423 901505.425532842
 0.0830448206689868 0.000766945654594 -4.5297241740627214e-05
 0.0081722820177674 7.783231922076084e-06 0.0056981821544468 123.046875
 -113.57064819335938 2564.20751953125 121.57179260253906 295.913818359375
 -19.168142318725582 235.57443237304688 42.36642074584961
 151.10687255859375 -6.364664077758789 167.93479919433594
 18.623498916625977 89.18083953857422 -13.704891204833984
 67.66049194335938 15.34315013885498 68.93257904052734 -12.274109840393066
 82.2042007446289 10.976572036743164 63.38631057739258 -8.326573371887207
 61.773094177246094 8.803791999816895 51.24412536621094 -3.672300100326538
 41.21741485595703 5.747994899749756 40.55447769165039 -5.162881851196289
 49.775421142578125 0.752740204334259 52.4209098815918 -1.6902146339416504
 36.524070739746094 -0.4089791774749756 41.5971031188964

In [None]:
X_TRAIN = list()

for root, dirs, files in os.walk(TRAIN_PATH):
    for dir in dirs:

        temp_dict = {}
        for file in os.listdir(os.path.join(root,dir)):
            

            file_path = os.path.join(root,dir,file)
            temp_dict['image'] = cv2.imread(file_path)
            temp_dict['librosa_data'] = (df.loc[df['image_name'] == file].values).reshape(-1)[2:59]
            temp_dict['label'] = dir

            X_TRAIN.append(temp_dict)

(799,)


In [50]:
X_TRAIN = np.array(X_TRAIN)
print(X_TRAIN.shape)
print(X_TRAIN[0].keys())

(799,)
dict_keys(['image', 'librosa_data', 'label'])
