## Introduction
First, we import the libraries needed for this lab. We use [`numpy`](http://www.numpy.org/) for linear algebra operations, [`matplotlib`](https://matplotlib.org/) for plotting, [`scipy`](https://docs.scipy.org/doc/scipy/reference/) for scientific and numerical computation, and [`tensorflow`](https://www.tensorflow.org/) for building and training the neural networks. We collaborated through our [GitHub repository](https://github.com/CoeWorl/ML-Lab-04), which holds all the code and datasets. Instructions for installing the required libraries are in the [GitHub repository](https://github.com/dibgerge/ml-coursera-python-assignments) of Python assignments for Andrew Ng's machine learning course, and the [tensorflow](https://www.tensorflow.org/install) installation guide covers TensorFlow itself.

In [5]:
# used for manipulating directory paths
import os

# Scientific and vector computation for python
import numpy as np

# Plotting library
from matplotlib import pyplot

# Optimization module in scipy
from scipy import optimize

# Computing neural network layers
import tensorflow as tf

# Parsing the .tsv/.csv files
import pandas as pd

# scikit-learn: train/test splitting
from sklearn.model_selection import train_test_split

# Keras: model definition, layers, and training callbacks
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping


#### Importing the data
We use pandas to load the dataset into a dataframe, then slice out the feature matrix `X` and the label vector `y`.

In [6]:
df = pd.read_csv('localify_music_genre - song_dataset_short_limited.csv')

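# Uncomment to inspect the first rows of the dataframe
# print(df.head())

# Features: columns 2-9 (acousticness through tempo); the slice 2:10 stops before valence.
# Labels: column 12, the integer genre code.
X, y = df.iloc[:, 2:10], df.iloc[:, 12]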
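
Positional slicing with `iloc` depends on the exact column order in the CSV. As a minimal sketch, assuming the column names shown by `df.head()`, the same selection can be written by name. The two-row `demo` frame and the `feature_cols` and `label_col` names are hypothetical stand-ins for illustration, not part of the lab code.

```python
import pandas as pd

# Hypothetical two-row frame that mimics the column layout of the dataset.
demo = pd.DataFrame({
    "song_name": ["All Mixed Up", "Backfired"],
    "artist_name": ["311", "408"],
    "acousticness": [0.0104, 0.0767],
    "danceability": [0.76, 0.496],
    "energy": [0.702, 0.937],
    "instrumentalness": [0.0, 0.0],
    "liveness": [0.38, 0.123],
    "loudness": [-9.404, -2.628],
    "speechiness": [0.0716, 0.132],
    "tempo": [92.323998, 178.020996],
    "valence": [0.659, 0.182],
    "genre_name": ["alternative", "punk"],
    "Unnamed: 12": [0, 8],
})

# Assumed feature list: the eight columns that iloc[:, 2:10] picks in this layout.
feature_cols = [
    "acousticness", "danceability", "energy", "instrumentalness",
    "liveness", "loudness", "speechiness", "tempo",
]
# Assumed label column: the integer genre code at position 12.
label_col = "Unnamed: 12"

X_demo = demo[feature_cols]
y_demo = demo[label_col]

# In this layout, selecting by name and by position gives the same data.
assert X_demo.equals(demo.iloc[:, 2:10])
assert y_demo.equals(demo.iloc[:, 12])
```

Selecting by name makes it explicit which features are used (here, `valence` is left out) and keeps working if columns are reordered.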


In [7]:
df.head()

| | song_name | artist_name | acousticness | danceability | energy | instrumentalness | liveness | loudness | speechiness | tempo | valence | genre_name | Unnamed: 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | All Mixed Up | 311 | 0.0104 | 0.76 | 0.702 | 0.0 | 0.38 | -9.404 | 0.0716 | 92.323998 | 0.659 | alternative | 0 |
| 1 | Don't Tread On Me | 311 | 4.5e-05 | 0.574 | 0.919 | 0.0028 | 0.116 | -6.418 | 0.0737 | 176.531998 | 0.704 | alternative | 0 |
| 2 | Beautiful Disaster | 311 | 0.000387 | 0.576 | 0.738 | 0.00131 | 0.13 | -8.122 | 0.0353 | 168.132004 | 0.675 | alternative | 0 |
| 3 | Backfired | 408 | 0.0767 | 0.496 | 0.937 | 0.0 | 0.123 | -2.628 | 0.132 | 178.020996 | 0.182 | punk | 8 |
| 4 | Manic | 408 | 0.00137 | 0.436 | 0.98 | 0.0 | 0.638 | -3.491 | 0.222 | 200.097 | 0.674 | punk | 8 |
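
The preview above shows features on very different scales: `tempo` runs from roughly 90 to 200, `loudness` is negative, and `acousticness` is well below 1. Standardizing each feature to zero mean and unit variance is a common preprocessing step for neural networks. The sketch below is not part of the original lab; the helper names `fit_standardizer` and `standardize` are hypothetical, and the three-row array reuses `acousticness`, `loudness`, and `tempo` values from the preview as stand-in training data.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and standard deviation from training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Guard against division by zero for constant features.
    std[std == 0] = 1.0
    return mean, std

def standardize(X, mean, std):
    """Apply statistics computed on the training data to any split."""
    return (X - mean) / std

# Stand-in training data: acousticness, loudness, tempo for three preview rows.
X_train_demo = np.array([
    [0.0104, -9.404, 92.323998],
    [0.000045, -6.418, 176.531998],
    [0.0767, -2.628, 178.020996],
])

mean, std = fit_standardizer(X_train_demo)
X_scaled = standardize(X_train_demo, mean, std)
```

The statistics are computed on the training rows only and then reused for the validation rows, so no information from the held-out data leaks into preprocessing.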


In [8]:

import random

# Start with every row in the training set
trainData = df.copy()

# Target size of the validation set: 20% of the rows
lenValidation = int(len(df) * 0.2)

# Empty dataframe with the same columns, to collect validation rows
validationData = df.iloc[0:0]

# Move whole artists into validation until it holds 20% of the rows,
# so that no artist appears in both sets.
# Note: the loop ends only if the remaining artists can fill the target exactly.
while len(validationData) < lenValidation:
    # Pick a random remaining row and look up its artist
    randomIndex = random.randint(0, len(trainData) - 1)
    artist_name = trainData.iloc[randomIndex]['artist_name']
    # Collect every remaining song by that artist
    artist_rows = trainData[trainData['artist_name'] == artist_name]
    # Move the artist only if all of their songs still fit
    if lenValidation - len(validationData) >= len(artist_rows):
        validationData = pd.concat([validationData, artist_rows], ignore_index=True)
        trainData = trainData[trainData['artist_name'] != artist_name]

print(len(trainData))
print(len(validationData))

8000
2000
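
The loop above keeps all songs by one artist on the same side of the split. scikit-learn provides `GroupShuffleSplit` for this kind of group-aware split. The sketch below is an alternative, not the method used in this lab, and it runs on a hypothetical toy frame (`toy`, 10 artists with 5 songs each) rather than the real dataset. One difference to keep in mind: `test_size` applies to the number of groups (artists), so the fraction of rows is only approximately 20% when artists have different numbers of songs.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy frame: 10 artists with 5 songs each.
toy = pd.DataFrame({
    "artist_name": np.repeat([f"artist_{i}" for i in range(10)], 5),
    "tempo": np.linspace(80.0, 200.0, 50),
})

# Hold out 20% of the artists; random_state makes the split reproducible.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(toy, groups=toy["artist_name"]))

train_toy = toy.iloc[train_idx]
val_toy = toy.iloc[val_idx]

# No artist should appear on both sides of the split.
shared_artists = set(train_toy["artist_name"]) & set(val_toy["artist_name"])
print(len(train_toy), len(val_toy), len(shared_artists))
```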


In [9]:
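# Random train/test split (scikit-learn's default holds out 25% of the rows).
# Note: this split does not use the artist-grouped trainData/validationData built in the previous cell.
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Define the deep learning model: three hidden ReLU layers and a 10-way softmax output
model = Sequential()
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(10, activation='softmax'))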

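# EarlyStopping callback: stop after 100 epochs without improvement in val_loss
# and restore the weights from the best epoch
earlystopping = EarlyStopping(monitor='val_loss', patience=100, restore_best_weights=True)

# ModelCheckpoint callback: save the weights whenever val_accuracy improves
checkpoint = ModelCheckpoint(filepath='best_model_weights2.weights.h5', save_weights_only=True, monitor='val_accuracy', save_best_only=True, verbose=1)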

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=1000, batch_size=124, validation_data=(X_test, y_test), callbacks=[checkpoint, earlystopping])



Epoch 1/1000
48/61 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.1064 - loss: 4.6969
Epoch 1: val_accuracy improved from -inf to 0.13400, saving model to best_model_weights2.weights.h5
61/61 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.1079 - loss: 4.3230 - val_accuracy: 0.1340 - val_loss: 2.2711
Epoch 2/1000
45/61 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.1415 - loss: 2.2760
Epoch 2: val_accuracy improved from 0.13400 to 0.16040, saving model to best_model_weights2.weights.h5
61/61 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.1455 - loss: 2.2754 - val_accuracy: 0.1604 - val_loss: 2.2598
Epoch 3/1000
57/61 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.1573 - loss: 2.2613
Epoch 3: val_accuracy improved from 0.16040 to 0.17040, saving model to best_model_weights2.weights.h5
61/61 ━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x15b6f1f90>
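
A quick sanity check on the log: with 10 output classes (the final `Dense(10)` softmax layer), a model that assigns equal probability to every class has a cross-entropy of ln 10 ≈ 2.30, and random guessing gives about 10% accuracy if the classes are balanced. The validation losses in the first epochs above (2.27, 2.26) are only slightly better than that baseline. A small NumPy sketch of the calculation, where `n_classes = 10` is taken from the model definition and the balanced-class assumption is ours:

```python
import numpy as np

# Number of output classes, taken from the Dense(10) softmax layer.
n_classes = 10

# A model with no information predicts the same probability for every class.
uniform_probs = np.full(n_classes, 1.0 / n_classes)

# Cross-entropy for one example is -log(probability assigned to the true class).
true_label = 3  # arbitrary; every class gets the same probability here
chance_loss = -np.log(uniform_probs[true_label])

print(round(float(chance_loss), 4))  # 2.3026
```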