# task no. 4

This is the notebook taking care of the task no.4 : feature prediction.

In our analysis we have decided very different models to see how they differ from eachother in term of performance and also their explainability to see if they make sense.

For this notebook we decided to explore the nature of NeuralNetworks on the dataset.

First a preparation is due to be used.

## dataset preparation



In [1]:
import pandas as pd
import os


RACES_PATH=os.path.join("..","dataset","engineered_races.csv")

races_df=pd.read_csv(RACES_PATH)

First we binarize the columns

In [None]:
from sklearn.model_selection import train_test_split

races_df['position']=(races_df['position']>20).astype(int)
split_idx= pd.to_datetime(races_df['date']).dt.year <= 2022
races_df['date']=pd.to_datetime(races_df['date']).astype('int64')

one_hot=pd.get_dummies(races_df.select_dtypes(include=['bool','object']))
numeric=races_df.select_dtypes(include=['number'])
std_numeric= (numeric-numeric.mean())/(numeric.max()-numeric.min()).drop(columns='position')


data_set=pd.concat([one_hot,std_numeric],axis=1)

X_test_set=data_set.loc[~split_idx]

X_dev_set=data_set[split_idx]
Y_dev_set=data_set.loc[split_idx,'position']

X_train_set,X_val_set,Y_train_set,Y_val_set=train_test_split(
    X_dev_set,
    Y_dev_set,
    test_size=0.2,
    stratify=Y_dev_set,
    random_state=42
)

X_train_set.shape[1]

7317

: 

first a stratification can only help the generization capabilities.

Now we have to setup the task, for this kind of setting the binary cross entropy is the most appropriate given we just want to classify stuff and we are not doing any regression whatsoever.

A first test using a simple NN might be usefull in this case to see the most basic algorithm.

In [None]:
import tensorflow as tf

from keras import layers, models, initializers
from keras.optimizers import Adam, SGD
import itertools as it
from keras.callbacks import EarlyStopping
from keras.initializers import GlorotUniform, GlorotNormal,HeNormal,HeUniform
initializer=initializers.HeNormal()


def get_device_auto():
    gpus_list=tf.config.list_physical_devices('GPU')
    device = None
    if len(gpus_list) != 0:
        device=gpus_list[0]
    else:
        device=tf.config.list_physical_devices('CPU')[0]
    return device

def create_ff_nn(
        optimizer=SGD(),
        num_layers=2,
        num_units=64,
        input_dim=256,
        hidden_activation='relu',
        output_activation='sigmoid',
        loss_function='binary_crossentropy',
        metrics=['accuracy','f1-score','binary_crossentropy'],
        learning_rate=0.001
        ):
    model=models.Sequential()

    model.add(layers.Dense(num_units,input_dim=input_dim,activation=hidden_activation))
    for _ in range(num_layers -1):
        model.add(layers.Dense(
            num_units,
            activation=hidden_activation,
            kernel_initializer=HeNormal()
            ))
    model.add(layers.Dense(
        1,
        activation=output_activation,
        kernel_initializer=GlorotNormal()
        ))
    optimizer.learning_rate=learning_rate
    model.compile(
        optimizer=optimizer,
        loss=loss_function,
        metrics=metrics
    )
    return model

def hyperparams_iterator(hyperparams):
    return map(
        lambda comb:  {k:v for k,v in zip(hyperparams.keys(),comb)},
        it.product(*hyperparams.values())
    )

early_stopping=EarlyStopping(
    monitor='val_loss',
    patience=5,
    verbose=1,
    restore_best_weights=True
)
hyperparams={
    'num_layers':[5,10,15,20],
    'learning_rate':[0.001,0.0001],
    'num_units':[1024,2048,4096]
}

device=get_device_auto()
batch_size=1
tf.random.set_seed(42)
best_val=float('-inf')

results=[]

with tf.device(device.device_type):
    for params in hyperparams_iterator(hyperparams):
        model=create_ff_nn(**params,input_dim=X_train_set.shape[1])
        model.fit(
            X_train_set,Y_train_set,
            batch_size=batch_size,
            validation_data=(X_val_set,Y_val_set),
            callbacks=[early_stopping]
            )
        new_row=params
        
        f1_score,accuracy,bin_cross_ent=model.evaluate(X_train_set,Y_train_set,batch_size=batch_size)
        new_row|={
            'f1_score_train':f1_score,
            'accuracy_train':accuracy,
            'bin_cross_ent_train':bin_cross_ent,
            }
        
        
        f1_score,accuracy,bin_cross_ent=model.evaluate(X_val_set,Y_val_set,batch_size=batch_size)
        new_row|={
            'f1_score_val':f1_score,
            'accuracy_val':accuracy,
            'bin_cross_ent_val':bin_cross_ent,
            }
        if bin_cross_ent < best_val:
            best_val = bin_cross_ent
            model.save('weights/best_ff_nn.h5')
        print(new_row)
        results.append(new_row)
pd_results=pd.DataFrame(results)

pd_results.sort_values(by='bin_cross_ent_val')


2024-12-06 00:44:57.991965: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-12-06 00:44:58.159597: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-06 00:44:58.204666: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-06 00:44:58.216370: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-06 00:44:58.308271: I tensorflow/core/platform/cpu_feature_guar

In [None]:
model.summary()