Neural Networks

This is the notebook taking care of the task no.4 : feature prediction.

In our analysis we have decided very different models to see how they differ from eachother in term of performance and also their explainability to see if they make sense.

For this notebook we decided to explore the nature of NeuralNetworks on the dataset.

First a preparation is due to be used.

## dataset preparation



In [15]:
%run ../task4_machine_learning/preprocessing.py
import pandas as pd
import os

races_final_path = path.join('..','dataset', 'engineered_races.csv')
cyclists_final_path = path.join('..','dataset', 'cyclists_final_enhanced.csv')


cyclists_data = pd.read_csv(cyclists_final_path)
races_data = pd.read_csv(races_final_path)
X_dev,y_dev,X_test,y_test,columns_to_keep=get_train_test_data()

First we binarize the columns

In [16]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

std_scaler=StandardScaler()

X_dev=std_scaler.fit_transform(X_dev)
X_test=std_scaler.fit_transform(X_test)

X_train,X_val,Y_train,Y_val=train_test_split(
    X_dev,y_dev,
    test_size=0.2,
    random_state=42,
    stratify=y_dev
    )

We decided to try neural networks as to test a different family of models, in this case we don't expect very nice results since the features are noisy and unbalanced thus we don't expect very good performances.

For this model we used early stopping to avoid overfitting, the training is done using stratification and with and 80-20 hold out setting for the development set.

For this task we used he and glorot which should counter the exploding/vanishing gradient problems,for optizimization we employed Adam which should be having good performances and beahve well with noisy data that is not too sparse given the preprocessing used and is faster to converge. The loss function we used is a binary cross entropy which is indicated for binary classification tasks.

In [17]:
import tensorflow as tf

from keras import layers, models, initializers
from keras.optimizers import Adamax, SGD
import itertools as it
from keras.callbacks import EarlyStopping
from keras.initializers import GlorotUniform, GlorotNormal,HeNormal,HeUniform
initializer=initializers.HeNormal()


def get_device_auto():
    gpus_list=tf.config.list_physical_devices('GPU')
    device = None
    if len(gpus_list) != 0:
        device=gpus_list[0]
    else:
        device=tf.config.list_physical_devices('CPU')[0]
    return device

def create_ff_nn(
        optimizer=Adamax(),
        num_layers=2,
        num_units=64,
        input_dim=256,
        hidden_activation='relu',
        output_activation='sigmoid',
        loss_function='binary_crossentropy',
        metrics=['accuracy','f1_score','binary_crossentropy'],
        learning_rate=0.001
        ):
    model=models.Sequential()
    optimizer = Adamax()
    model.add(layers.Dense(num_units,input_dim=input_dim,activation=hidden_activation))
    for _ in range(num_layers -1):
        model.add(layers.Dense(
            num_units,
            activation=hidden_activation,
            kernel_initializer=HeNormal()
            ))
    model.add(layers.Dense(
        1,
        activation=output_activation,
        kernel_initializer=GlorotNormal()
        ))
    optimizer.learning_rate=learning_rate
    model.compile(
        optimizer=optimizer,
        loss=loss_function,
        metrics=metrics
    )
    return model

def hyperparams_iterator(hyperparams):
    return map(
        lambda comb:  {k:v for k,v in zip(hyperparams.keys(),comb)},
        it.product(*hyperparams.values())
    )

early_stopping=EarlyStopping(
    monitor='f1_score',
    patience=5,
    verbose=1,
    restore_best_weights=True
)
hyperparams={
    'num_layers':[10,15,20,30],
    'learning_rate':[0.001,0.0001,0.00001],
    'num_units':[1024]
}

device=get_device_auto()
batch_size=1024
tf.random.set_seed(42)
best_val=float('-inf')

In [18]:
results=[]

with tf.device(device.device_type):
    for params in hyperparams_iterator(hyperparams):
        model=create_ff_nn(**params,input_dim=X_train.shape[1])
        model.fit(
            X_train,Y_train,
            batch_size=batch_size,
            validation_data=(X_val,Y_val),
            callbacks=[early_stopping]
            )
        new_row=params

        eval_results=model.evaluate(X_train,Y_train,batch_size=batch_size,return_dict=True)
        f1_score,accuracy,bin_cross_ent=eval_results['f1_score'],eval_results['accuracy'],eval_results['binary_crossentropy']
        new_row|={
            'f1_score_train':f1_score,
            'accuracy_train':accuracy,
            'bin_cross_ent_train':bin_cross_ent,
            }
        eval_results=model.evaluate(X_val,Y_val,batch_size=batch_size,return_dict=True)
        f1_score,accuracy,bin_cross_ent=eval_results['f1_score'],eval_results['accuracy'],eval_results['binary_crossentropy']
        new_row|={
            'f1_score_val':f1_score,
            'accuracy_val':accuracy,
            'bin_cross_ent_val':bin_cross_ent,
            }
        if bin_cross_ent < best_val:
            best_val = bin_cross_ent
            model.save('weights/best_ff_nn.h5')
        print(new_row)
        results.append(new_row)
pd_results=pd.DataFrame(results)

pd_results.sort_values(by='bin_cross_ent_val')


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m434/434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m620s[0m 1s/step - accuracy: 0.8386 - binary_crossentropy: 0.4381 - f1_score: 0.2885 - loss: 0.4381 - val_accuracy: 0.8445 - val_binary_crossentropy: 0.3839 - val_f1_score: 0.2894 - val_loss: 0.3839
Restoring model weights from the end of the best epoch: 1.
[1m434/434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m170s[0m 393ms/step - accuracy: 0.8469 - binary_crossentropy: 0.3791 - f1_score: 0.2883 - loss: 0.3791
[1m109/109[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m42s[0m 384ms/step - accuracy: 0.8446 - binary_crossentropy: 0.3837 - f1_score: 0.2900 - loss: 0.3837
{'num_layers': 10, 'learning_rate': 0.001, 'num_units': 1024, 'f1_score_train': 0.289430171251297, 'accuracy_train': 0.8466071486473083, 'bin_cross_ent_train': 0.38013291358947754, 'f1_score_val': 0.28942960500717163, 'accuracy_val': 0.8445424437522888, 'bin_cross_ent_val': 0.383856862783432}
[1m434/434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m632s[0

Unnamed: 0,num_layers,learning_rate,num_units,f1_score_train,accuracy_train,bin_cross_ent_train,f1_score_val,accuracy_val,bin_cross_ent_val
10,30,0.0001,1024,0.28943,0.84777,0.378067,0.28943,0.845309,0.38307
7,20,0.0001,1024,0.28943,0.847586,0.378185,0.28943,0.844957,0.383234
4,15,0.0001,1024,0.28943,0.846984,0.379108,0.28943,0.844903,0.383785
0,10,0.001,1024,0.28943,0.846607,0.380133,0.28943,0.844542,0.383857
3,15,0.001,1024,0.28943,0.846695,0.379666,0.28943,0.843848,0.383898
6,20,0.001,1024,0.28943,0.846948,0.380031,0.28943,0.844272,0.384069
1,10,0.0001,1024,0.28943,0.846005,0.380074,0.28943,0.843596,0.384319
9,30,0.001,1024,0.28943,0.845674,0.381488,0.28943,0.843641,0.384579
11,30,1e-05,1024,0.28943,0.844102,0.384151,0.28943,0.841783,0.388526
8,20,1e-05,1024,0.28943,0.842991,0.387503,0.28943,0.840953,0.390892


After doing the training we can see that results are very poor, which is expected for such family of models on this dataset. From an analysis of the results we see that 30 layers with 1024 units seem to be the best indicating that other configurations perform similarly however we have that 20 layers perform better while deeper models have worst performances this could be due to overfitting in general,as expected, we have very poor performances as expected of this family of models.

As expected we have a very poor perofrmance and we see that even among the top ones there is no real difference aside from a slight discrepancy in the binary cross entropy loss given by the increasing complexity there is no real improvement in the classification metric, we can still try to use the most performing model on the test set as to obtain a result that is comparable to the others, further we can observe that most model have validation binary cross entropy that is above the training one meaning we are not overfitting however it possible to improve the model without expecting very good results given the nature of this family of models.

In this case it seems reasonableto pick the most performing model by validation binary cross entropy.

In [19]:
from sklearn.metrics import classification_report
def report_scores(test_label, test_pred):
    print(classification_report(test_label,
                            test_pred,
                            target_names=['0', '1']))

In [20]:
best_idx=0

best_params=pd_results.sort_values(by='bin_cross_ent_val').iloc[best_idx]

best_params

Unnamed: 0,10
num_layers,30.0
learning_rate,0.0001
num_units,1024.0
f1_score_train,0.28943
accuracy_train,0.84777
bin_cross_ent_train,0.378067
f1_score_val,0.28943
accuracy_val,0.845309
bin_cross_ent_val,0.38307


In [22]:
model=create_ff_nn(
    num_layers=int(best_params['num_layers']),
    num_units=int(best_params['num_units']),
    learning_rate=best_params['learning_rate'],
    input_dim=X_train.shape[1]
)

model.fit(
    X_dev,y_dev,
    batch_size=batch_size,
    validation_data=(X_val,Y_val),
    callbacks=[early_stopping]
)

y_pred_prob = model.predict(X_test)  # Get probabilities
y_test_pred = (y_pred_prob > 0.5).astype(int).flatten()  # Convert probabilities to binary labels

report_scores(y_test,y_test_pred)



  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m542/542[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2342s[0m 4s/step - accuracy: 0.8383 - binary_crossentropy: 0.4027 - f1_score: 0.2899 - loss: 0.4027 - val_accuracy: 0.8408 - val_binary_crossentropy: 0.3840 - val_f1_score: 0.2894 - val_loss: 0.3840
Restoring model weights from the end of the best epoch: 1.
[1m1107/1107[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m78s[0m 70ms/step
              precision    recall  f1-score   support

           0       0.86      0.99      0.92     30219
           1       0.52      0.05      0.08      5187

    accuracy                           0.85     35406
   macro avg       0.69      0.52      0.50     35406
weighted avg       0.81      0.85      0.80     35406

