# Deep Learning

## We'll be using scikit-learn and TensorFlow.

### Goals with DL:
 The strategy is to apply deep learning (DL) to predict a song's popularity. A song counts as popular when its popularity score is above 60, so the target is 1 for popular and 0 for not popular.
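
 The labeling rule can be sketched as follows. This is a minimal illustration on a made-up frame: the column names `popularity` and `popular` match the ones used later in this notebook, while the sample values and the `POPULARITY_THRESHOLD` name are assumptions.

```python
import pandas as pd

# Hypothetical sample; the real values come from all_data.csv
sample_df = pd.DataFrame({"popularity": [35, 60, 82]})

POPULARITY_THRESHOLD = 60  # assumed name for the cutoff described above

# Strictly above the threshold -> 1 (popular), otherwise 0 (not popular)
sample_df["popular"] = (sample_df["popularity"] > POPULARITY_THRESHOLD).astype(int)

print(sample_df["popular"].tolist())
```

 Note that a score of exactly 60 falls on the not-popular side, since the rule is "above 60".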

### Preprocessing:

 Before applying any DL methods, we first had to prepare the dataset.
 Columns that hold information unique to each song would not help predict its popularity. Therefore, fields such as song name, playlist name, URI, and more were removed.
 
 Bucketing and one-hot encoding are not used on this dataset, as we seek to predict popularity from the songs' features alone.

In [97]:
# Import our dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from tensorflow.keras.callbacks import ModelCheckpoint

import os
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

In [107]:
# Load the CSV file with all the information merged together
full_ml_df = pd.read_csv('./Resources/all_data.csv', index_col=False)

# Every song has a unique track_uri, so drop_duplicates keeps one row per track.
unique_ml_df = full_ml_df.drop_duplicates(subset=['track_uri'])

# Columns to drop: fields specific to each song (like name and URI)
# and other fields that are not used as features
columns_to_drop = ['Unnamed: 0', 'followers', 'songs',
                   'playlist_uri', 'track_uri', 'artist_name',
                   'song_name', 'analysis_url', 'id', 'uri',
                   'time_signature', 'playlist_name', 'track_href',
                   'type', 'mode', 'genre_2', 'genre_1']

# Drop them from the full dataset
full_ml_df = full_ml_df.drop(columns=columns_to_drop)
# Now for the unique songs df.
unique_ml_df = unique_ml_df.drop(columns=columns_to_drop)

## Attempt: Full dataset

The dataset was created by selecting 100 playlists with a variety of genres. Because really popular music may be on more than one playlist, keeping the duplicates could help the model pick up the feature values that popular songs tend to have.
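
To get a feel for how much weight the duplicates add, one option is to count how many times each `track_uri` appears. The sketch below uses a made-up four-row frame; only the column names `track_uri` and `playlist_name` come from this notebook.

```python
import pandas as pd

# Hypothetical rows: track "a" appears on three playlists, track "b" on one
toy_df = pd.DataFrame({
    "track_uri": ["a", "b", "a", "a"],
    "playlist_name": ["p1", "p1", "p2", "p3"],
})

# Number of rows (playlist appearances) per track
appearances = toy_df["track_uri"].value_counts()

# Row count of the full dataset vs. the unique-songs dataset
n_full = len(toy_df)
n_unique = toy_df["track_uri"].nunique()

print(appearances.to_dict(), n_full, n_unique)
```

In this toy case, track "a" contributes three rows to the full dataset and a single row to the unique one.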

In [108]:
# Splitting the preprocessed data into features and targets

X = full_ml_df.drop(columns=['popular', 'popularity'])
y = full_ml_df['popular'].values

# Training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y) # Test size is 25%

In [109]:
# Scaling the dataset
scaler = StandardScaler()
X_scaler = scaler.fit(X_train)

X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
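
The scaler is fitted on the training rows only, and the same statistics are then applied to the test rows. A small sketch with made-up numbers (the `toy_` arrays are assumptions, not data from this project) shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrices with two features each
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[2.0, 40.0]])

# Fit on the training rows only, then reuse those statistics for the test rows
toy_scaler = StandardScaler().fit(toy_train)
toy_train_scaled = toy_scaler.transform(toy_train)
toy_test_scaled = toy_scaler.transform(toy_test)

print(toy_train_scaled.mean(axis=0))  # approximately 0 per column
print(toy_train_scaled.std(axis=0))   # approximately 1 per column
print(toy_test_scaled)                # test rows are not forced to mean 0
```

After scaling, roughly half of the training values in each column are negative, which is relevant to the choice of activation functions.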

### Activation function:

The reasoning behind tanh, ReLU, and sigmoid is:
 - Tanh: The scaled dataset holds negative values, and tanh preserves their sign, which is useful in the first layer.
 - ReLU: Its output starts at zero, which makes it a good middle layer after the signed tanh outputs.
 - Sigmoid: Its output lies between 0 and 1, which suits the binary output of the last layer.
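
The output ranges of the three functions can be checked with a few lines of NumPy. This is only an illustration of the math; the model itself uses the Keras built-ins selected by the `activation` argument.

```python
import numpy as np

def tanh(x):
    # Output in (-1, 1); the sign of the input is preserved
    return np.tanh(x)

def relu(x):
    # Negative inputs become 0, positive inputs pass through
    return np.maximum(0.0, x)

def sigmoid(x):
    # Output in (0, 1), usable as a probability for the positive class
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([-2.0, 0.0, 2.0])
print(tanh(inputs))
print(relu(inputs))
print(sigmoid(inputs))
```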


In [124]:
# Define the model - deep neural net, i.e., the number of input features and hidden nodes for each layer.
# Adopting a ratio to generate neurons according to the number of features (columns) given.

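number_input_features = X_train_scaled.shape[1]
# Floor division by a float returns a float, so cast to int for the Dense layer's units
hidden_nodes_layer1 = int(number_input_features // 0.75)
hidden_nodes_layer2 = int(number_input_features // 1.20)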

nn = tf.keras.models.Sequential()

# First hidden layer
nn.add(tf.keras.layers.Dense(units=hidden_nodes_layer1, input_dim=number_input_features, activation='tanh'))

# Second hidden layer
nn.add(tf.keras.layers.Dense(units=hidden_nodes_layer2, activation='relu'))

# Output layer
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compiling the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Check the structure of the model
nn.summary()

Model: "sequential_16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_50 (Dense)             (None, 14)                168       
_________________________________________________________________
dense_51 (Dense)             (None, 9)                 135       
_________________________________________________________________
dense_52 (Dense)             (None, 1)                 10        
Total params: 313
Trainable params: 313
Non-trainable params: 0
_________________________________________________________________
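
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 11 input features, a number inferred from the first layer's 168 parameters rather than stated in the notebook, and `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 11  # inferred: (11 + 1) * 14 = 168

# Same ratios as in the model definition
layer1_units = int(n_features // 0.75)
layer2_units = int(n_features // 1.20)

counts = [
    dense_params(n_features, layer1_units),    # first hidden layer
    dense_params(layer1_units, layer2_units),  # second hidden layer
    dense_params(layer2_units, 1),             # output layer
]
print(layer1_units, layer2_units, counts, sum(counts))
```

The three counts add up to the 313 trainable parameters reported in the summary.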


### Saving DL Info

Saving the training progress (as checkpoints) and the model.

In [125]:
# Creating a checkpoint save for this specific setup

os.makedirs("checkpoint_dl_fulldata/", exist_ok=True)
checkpoint_path = "checkpoint_dl_fulldata/weights.{epoch:02d}.hdf5"

# Create a callback that saves the model's weights every 12 epochs
# (`period` is deprecated in newer TensorFlow releases in favor of `save_freq`)
cp_callback = ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq='epoch',
    period=12)
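
To see which files this setup should leave on disk, the file name template can be expanded in plain Python. This is only a sketch of the naming pattern, not of Keras internals; `save_every` and `total_epochs` are assumed names that mirror `period=12` and the 120 epochs used for training.

```python
checkpoint_template = "checkpoint_dl_fulldata/weights.{epoch:02d}.hdf5"
total_epochs = 120
save_every = 12  # mirrors period=12 in the callback

# Epochs are numbered from 1 in the file names
saved_files = [
    checkpoint_template.format(epoch=epoch)
    for epoch in range(1, total_epochs + 1)
    if epoch % save_every == 0
]

print(len(saved_files))
print(saved_files[0], saved_files[-1])
```

The `02d` format pads to at least two digits, so three-digit epochs such as 108 and 120 are written without truncation.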



### Training the model.

In [126]:
# Train the model
fit_model = nn.fit(X_train_scaled,y_train,epochs=120,callbacks=[cp_callback])

Epoch 1/120
Epoch 2/120
Epoch 3/120
Epoch 4/120
Epoch 5/120
Epoch 6/120
Epoch 7/120
Epoch 8/120
Epoch 9/120
Epoch 10/120
Epoch 11/120
Epoch 12/120

Epoch 00012: saving model to checkpoint_dl_fulldata/weights.12.hdf5
Epoch 13/120
Epoch 14/120
Epoch 15/120
Epoch 16/120
Epoch 17/120
Epoch 18/120
Epoch 19/120
Epoch 20/120
Epoch 21/120
Epoch 22/120
Epoch 23/120
Epoch 24/120

Epoch 00024: saving model to checkpoint_dl_fulldata/weights.24.hdf5
Epoch 25/120
Epoch 26/120
Epoch 27/120
Epoch 28/120
Epoch 29/120
Epoch 30/120
Epoch 31/120
Epoch 32/120
Epoch 33/120
Epoch 34/120
Epoch 35/120
Epoch 36/120

Epoch 00036: saving model to checkpoint_dl_fulldata/weights.36.hdf5
Epoch 37/120
Epoch 38/120
Epoch 39/120
Epoch 40/120
Epoch 41/120
Epoch 42/120
Epoch 43/120
Epoch 44/120
Epoch 45/120
Epoch 46/120
Epoch 47/120
Epoch 48/120

Epoch 00048: saving model to checkpoint_dl_fulldata/weights.48.hdf5
Epoch 49/120
Epoch 50/120
Epoch 51/120
Epoch 52/120
Epoch 53/120
Epoch 54/120
Epoch 55/120
Epoch 56/120
Epoch

Epoch 76/120
Epoch 77/120
Epoch 78/120
Epoch 79/120
Epoch 80/120
Epoch 81/120
Epoch 82/120
Epoch 83/120
Epoch 84/120

Epoch 00084: saving model to checkpoint_dl_fulldata/weights.84.hdf5
Epoch 85/120
Epoch 86/120
Epoch 87/120
Epoch 88/120
Epoch 89/120
Epoch 90/120
Epoch 91/120
Epoch 92/120
Epoch 93/120
Epoch 94/120
Epoch 95/120
Epoch 96/120

Epoch 00096: saving model to checkpoint_dl_fulldata/weights.96.hdf5
Epoch 97/120
Epoch 98/120
Epoch 99/120
Epoch 100/120
Epoch 101/120
Epoch 102/120
Epoch 103/120
Epoch 104/120
Epoch 105/120
Epoch 106/120
Epoch 107/120
Epoch 108/120

Epoch 00108: saving model to checkpoint_dl_fulldata/weights.108.hdf5
Epoch 109/120
Epoch 110/120
Epoch 111/120
Epoch 112/120
Epoch 113/120
Epoch 114/120
Epoch 115/120
Epoch 116/120
Epoch 117/120
Epoch 118/120
Epoch 119/120
Epoch 120/120

Epoch 00120: saving model to checkpoint_dl_fulldata/weights.120.hdf5


In [127]:
# Evaluating the model with our test data

model_loss, model_accuracy = nn.evaluate(X_test_scaled,y_test,verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

98/98 - 0s - loss: 0.5066 - accuracy: 0.7630
Loss: 0.5066050291061401, Accuracy: 0.7629629373550415
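
The accuracy value comes from turning the sigmoid outputs into 0/1 labels with a 0.5 cutoff and comparing them with the true labels. A minimal sketch with made-up probabilities (none of these numbers come from the model) looks like this:

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels, for illustration only
predicted_probs = np.array([0.91, 0.40, 0.62, 0.08, 0.77])
true_labels = np.array([1, 0, 0, 0, 1])

# Outputs above 0.5 are counted as class 1 (popular)
predicted_labels = (predicted_probs > 0.5).astype(int)

# Accuracy is the share of rows where prediction and label agree
accuracy = (predicted_labels == true_labels).mean()

print(predicted_labels.tolist(), accuracy)
```

With the real model, `predicted_probs` would be the flattened result of `nn.predict(X_test_scaled)`.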


In [128]:
# Export our model to HDF5 file
nn.save("fulldata_dl.h5")

## Attempt: Unique songs dataset

Observing the effect of not having repeated songs, and how the model responds to the popularity score > 60 threshold.

In [136]:
# Splitting the preprocessed data into features and targets

Xu = unique_ml_df.drop(columns=['popular', 'popularity'])
yu = unique_ml_df['popular'].values

# Training and testing set
Xu_train, Xu_test, yu_train, yu_test = train_test_split(Xu, yu) # Test size is 25%

In [137]:
# Scaling the dataset
scaler = StandardScaler()
Xu_scaler = scaler.fit(Xu_train)

Xu_train_scaled = Xu_scaler.transform(Xu_train)
Xu_test_scaled = Xu_scaler.transform(Xu_test)

In [138]:
# Define the model - deep neural net, i.e., the number of input features and hidden nodes for each layer.
# Adopting a ratio to generate neurons according to the number of features (columns) given.

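number_input_features = Xu_train_scaled.shape[1]
# Floor division by a float returns a float, so cast to int for the Dense layer's units
hidden_nodes_layer1 = int(number_input_features // 0.75)
hidden_nodes_layer2 = int(number_input_features // 1.20)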

nn = tf.keras.models.Sequential()

# First hidden layer
nn.add(tf.keras.layers.Dense(units=hidden_nodes_layer1, input_dim=number_input_features, activation='tanh'))

# Second hidden layer
nn.add(tf.keras.layers.Dense(units=hidden_nodes_layer2, activation='relu'))

# Output layer
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compiling the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Check the structure of the model
nn.summary()

Model: "sequential_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_56 (Dense)             (None, 14)                168       
_________________________________________________________________
dense_57 (Dense)             (None, 9)                 135       
_________________________________________________________________
dense_58 (Dense)             (None, 1)                 10        
Total params: 313
Trainable params: 313
Non-trainable params: 0
_________________________________________________________________


In [139]:
# Creating a checkpoint save for this specific setup

os.makedirs("checkpoint_dl_uniquedata/", exist_ok=True)
checkpoint_path = "checkpoint_dl_uniquedata/weights.{epoch:02d}.hdf5"

# Create a callback that saves the model's weights every 12 epochs
# (`period` is deprecated in newer TensorFlow releases in favor of `save_freq`)
cp_callback = ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq='epoch',
    period=12)





In [140]:
# Train the model
fit_model = nn.fit(Xu_train_scaled,yu_train,epochs=120,callbacks=[cp_callback])

Epoch 1/120
Epoch 2/120
Epoch 3/120
Epoch 4/120
Epoch 5/120
Epoch 6/120
Epoch 7/120
Epoch 8/120
Epoch 9/120
Epoch 10/120
Epoch 11/120
Epoch 12/120

Epoch 00012: saving model to checkpoint_dl_uniquedata/weights.12.hdf5
Epoch 13/120
Epoch 14/120
Epoch 15/120
Epoch 16/120
Epoch 17/120
Epoch 18/120
Epoch 19/120
Epoch 20/120
Epoch 21/120
Epoch 22/120
Epoch 23/120
Epoch 24/120

Epoch 00024: saving model to checkpoint_dl_uniquedata/weights.24.hdf5
Epoch 25/120
Epoch 26/120
Epoch 27/120
Epoch 28/120
Epoch 29/120
Epoch 30/120
Epoch 31/120
Epoch 32/120
Epoch 33/120
Epoch 34/120
Epoch 35/120
Epoch 36/120

Epoch 00036: saving model to checkpoint_dl_uniquedata/weights.36.hdf5
Epoch 37/120
Epoch 38/120
Epoch 39/120
Epoch 40/120
Epoch 41/120
Epoch 42/120
Epoch 43/120
Epoch 44/120
Epoch 45/120
Epoch 46/120
Epoch 47/120
Epoch 48/120

Epoch 00048: saving model to checkpoint_dl_uniquedata/weights.48.hdf5
Epoch 49/120
Epoch 50/120
Epoch 51/120
Epoch 52/120
Epoch 53/120
Epoch 54/120
Epoch 55/120
Epoch 56/1

Epoch 76/120
Epoch 77/120
Epoch 78/120
Epoch 79/120
Epoch 80/120
Epoch 81/120
Epoch 82/120
Epoch 83/120
Epoch 84/120

Epoch 00084: saving model to checkpoint_dl_uniquedata/weights.84.hdf5
Epoch 85/120
Epoch 86/120
Epoch 87/120
Epoch 88/120
Epoch 89/120
Epoch 90/120
Epoch 91/120
Epoch 92/120
Epoch 93/120
Epoch 94/120
Epoch 95/120
Epoch 96/120

Epoch 00096: saving model to checkpoint_dl_uniquedata/weights.96.hdf5
Epoch 97/120
Epoch 98/120
Epoch 99/120
Epoch 100/120
Epoch 101/120
Epoch 102/120
Epoch 103/120
Epoch 104/120
Epoch 105/120
Epoch 106/120
Epoch 107/120
Epoch 108/120

Epoch 00108: saving model to checkpoint_dl_uniquedata/weights.108.hdf5
Epoch 109/120
Epoch 110/120
Epoch 111/120
Epoch 112/120
Epoch 113/120
Epoch 114/120
Epoch 115/120
Epoch 116/120
Epoch 117/120
Epoch 118/120
Epoch 119/120
Epoch 120/120

Epoch 00120: saving model to checkpoint_dl_uniquedata/weights.120.hdf5


In [141]:
# Evaluating the model with our test data

model_loss, model_accuracy = nn.evaluate(Xu_test_scaled,yu_test,verbose=2)
print(f"Loss: {model_loss}, Accuracy: {model_accuracy}")

53/53 - 0s - loss: 0.5884 - accuracy: 0.6880
Loss: 0.5883817672729492, Accuracy: 0.6879810690879822
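
The progress counters in the two evaluation outputs (98/98 and 53/53) count batches, not rows. Assuming the Keras default batch size of 32, since no `batch_size` was passed to `evaluate`, the size of each test set and the accuracy gap can be estimated as below; `rows_for_batches` is a hypothetical helper.

```python
DEFAULT_BATCH_SIZE = 32  # Keras default when batch_size is not passed

def rows_for_batches(n_batches, batch_size=DEFAULT_BATCH_SIZE):
    """Smallest and largest row count that yields exactly n_batches batches."""
    return (n_batches - 1) * batch_size + 1, n_batches * batch_size

full_range = rows_for_batches(98)    # "98/98" in the full-dataset run
unique_range = rows_for_batches(53)  # "53/53" in the unique-songs run

# Accuracy values as printed by the two evaluation cells
accuracy_gap = round(0.7630 - 0.6880, 4)

print(full_range, unique_range, accuracy_gap)
```

Under that assumption, removing duplicates cuts the test set roughly in half, and accuracy drops by about 7.5 percentage points.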


In [142]:
# Export our model to HDF5 file
nn.save("uniquedata_dl.h5")