### Comments

**HAS BEEN DONE**
* Jaccard vs Cosine vs Euclidean similarity for features
* Simple and tuned logistic regression + graph filters
* Try different architectures -> tested combinations of linear and GCNN / APPNP layers => GCNN with one linear layer before and two linear layers after = best


**DONE 28.12.2019**

* Transform some features because they are skewed OK
* Use genre as a feature as well OK 
* Grid search: learning rate, first_layer_size, hidden_size --> compare by F1 score on the validation set  OK
* See/tune the impact of weight decay OK


**TO DO**

* Skip connections?
* Compute a random-classifier baseline: run it 20 times and look at its F1 score, or decide whether the current baseline is enough
* Recheck standardization based on the F1 score
* Explore different architectures?
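The random-classifier baseline from the TO DO list can be sketched as below. It is a minimal sketch on synthetic labels: the ~18% class-1 rate is an assumption chosen to resemble this dataset, and `random_f1_baseline` is a hypothetical helper. A classifier that predicts class 1 at random with probability p has an expected precision equal to the class-1 rate π and an expected recall of p, so its F1 is roughly 2πp / (π + p), i.e. about π when p = π.

```python
import numpy as np
from sklearn.metrics import f1_score


def random_f1_baseline(y_true, p_positive, n_repeats=20, seed=0):
    """Mean and std of the class-1 F1 score over n_repeats random classifiers.

    Each random classifier ignores the features and predicts class 1 with
    probability p_positive (hypothetical helper, for illustration only).
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        y_pred = (rng.random(len(y_true)) < p_positive).astype(int)
        scores.append(f1_score(y_true, y_pred))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic labels with ~18% positives (assumption, not the real dataset)
rng = np.random.default_rng(42)
y_true = (rng.random(5000) < 0.18).astype(int)
prior = y_true.mean()

mean_f1, std_f1 = random_f1_baseline(y_true, p_positive=prior)
print('Random baseline F1 (class 1): {:.3f} +/- {:.3f}'.format(mean_f1, std_f1))
```

Setting `p_positive=0.5` instead gives an F1 of roughly π / (π + 0.5), a second reference point for the models below.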



# Machine Learning

In [None]:
import numpy as np
import pandas as pd
import pickle

import sklearn.metrics
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import precision_score, precision_recall_fscore_support  # confusion_matrix is called as sklearn.metrics.confusion_matrix, since the name is reused by a plotting helper
from sklearn.utils import resample
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

import random

import dgl.function as fn
from dgl import DGLGraph
import dgl.nn.pytorch as dgl_nn
import dgl.transform as dgl_transform

import time

import torch
import torch.nn as nn
import torch.nn.functional as F

import matplotlib.pyplot as plt

Data_path = 'Data/'

### Loading the similarity matrix and generating the graph

In [None]:
# Load similarity matrix
with open(Data_path + 'Adjacency_matrix_all025.pickle', 'rb') as file:
    adj_mat = pickle.load(file)

# Generate graph
G = DGLGraph(graph_data=adj_mat)
G = dgl_transform.add_self_loop(G)  # add self-loops (A + I), as in the standard GCN formulation, so each node also keeps its own features

### Loading features and labels

In [None]:
# Features
features_df = pd.read_csv(Data_path + 'features.csv',index_col=0).drop(columns = ['title'])

# To use only the numeric features (without genres), uncomment this line:
#features_df = features_df[['budget', 'popularity', 'revenue', 'runtime', 'vote_average', 'vote_count']]


# Labels
labels_df = pd.read_csv(Data_path + 'labels.csv',index_col=0).drop(columns = ['title', ])
IMDB_nom = labels_df['Nominations'].copy()
IMDB_nom.loc[IMDB_nom > 0] = 1
# Checking class imbalance
IMDB_nom.value_counts() # 18.263 % of CLASS 1

In [None]:
# If you don't want to transform the features, don't run the following section and uncomment this instead:

#transformed_feat = features_df.copy()

### Feature engineering

In [None]:
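def feat_transform(x):
    """Log-transform strictly positive values (to reduce skewness); values <= 0 are kept unchanged."""
    x = np.asarray(x, dtype=float)  # positional access, independent of the DataFrame index labels
    transformed_x = x.copy()
    positive = x > 0
    transformed_x[positive] = np.log(x[positive])  # could also try np.sqrt
    return transformed_x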
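A quick way to check that the log transform actually reduces skewness is to compare the sample skewness before and after it. This sketch uses synthetic log-normal values (an assumption standing in for a skewed column such as `budget`), sets 5% of them to 0 to mimic missing values, and applies the same rule as `feat_transform` through a hypothetical copy named `log_positive`; `skewness` is also a hypothetical helper.

```python
import numpy as np


def log_positive(x):
    """Same rule as feat_transform: log of strictly positive values, values <= 0 unchanged."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    positive = x > 0
    out[positive] = np.log(x[positive])
    return out


def skewness(x):
    """Sample skewness (third standardized moment); hypothetical helper."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered ** 3) / np.std(x) ** 3)


rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # synthetic right-skewed column
raw[rng.random(raw.size) < 0.05] = 0.0                  # mimic missing values stored as 0

print('skewness before: {:.2f}'.format(skewness(raw)))
print('skewness after:  {:.2f}'.format(skewness(log_positive(raw))))
```

Values in (0, 1) become negative after the log while zeros stay at 0, so `np.log1p` is a common alternative when both occur.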

In [None]:
# Only budget, popularity, revenue and vote_count are strongly skewed
feat_names = ['budget', 'popularity', 'revenue', 'vote_count']
transformed_feat = features_df.copy()
for name in feat_names:
    transformed_feat[name] = feat_transform(features_df[name])
    

### Generate masks and split train, val, test

In [None]:
# Hold out 20% of the samples as the test set, then 20% of the remaining samples as the validation set
sss1 = StratifiedShuffleSplit(n_splits=1, train_size=0.8) #random_state=0
sss2 = StratifiedShuffleSplit(n_splits=1, train_size=0.8) #random_state=0

prov_mask, test_mask = next(sss1.split(transformed_feat, IMDB_nom.values))

# sss2 returns positions inside prov_mask: map them back to positions in the full dataset
train_index, val_index = next(sss2.split(transformed_feat.iloc[prov_mask], IMDB_nom.values[prov_mask]))
train_mask = prov_mask[train_index]
val_mask = prov_mask[val_index]
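Because the second `StratifiedShuffleSplit` works on the subset selected by `prov_mask`, the indices it returns are positions inside that subset and have to be mapped back through `prov_mask` before they can index the full dataset. The sketch below checks this on synthetic labels (an assumption: 1000 samples, ~18% positives): after the mapping, the three index sets are disjoint, cover every sample, and keep roughly the same class ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.18).astype(int)  # synthetic binary labels
X = rng.normal(size=(len(y), 3))           # dummy features (only the number of rows matters)

sss1 = StratifiedShuffleSplit(n_splits=1, train_size=0.8, random_state=0)
sss2 = StratifiedShuffleSplit(n_splits=1, train_size=0.8, random_state=0)

prov_idx, test_idx = next(sss1.split(X, y))
train_pos, val_pos = next(sss2.split(X[prov_idx], y[prov_idx]))

# Map positions inside prov_idx back to positions in the full dataset
train_idx, val_idx = prov_idx[train_pos], prov_idx[val_pos]

for name, idx in [('train', train_idx), ('val', val_idx), ('test', test_idx)]:
    print('{:5s} size={:4d} positive rate={:.3f}'.format(name, len(idx), y[idx].mean()))
```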

### Create tensors 

In [None]:
# features
tensor_data = torch.FloatTensor(transformed_feat.values)

# labels
tensor_labels = torch.LongTensor(IMDB_nom.values)

### Standardize 

In [None]:
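# Standardizing the data increases performance (73 -> 82); the error distribution becomes slightly more
# imbalanced, but performance on class 1 samples still increases

# TODO: recheck this in terms of F1 score

# Note: tensor_data is built from transformed_feat in the cell above, so rebuild it after standardizing
#transformed_feat = transformed_feat.astype(float)  # avoid writing standardized floats into integer columns
#scaler = StandardScaler()
#transformed_feat.iloc[train_mask] = scaler.fit_transform(transformed_feat.iloc[train_mask].to_numpy())
#transformed_feat.iloc[val_mask] = scaler.transform(transformed_feat.iloc[val_mask].to_numpy())
#transformed_feat.iloc[test_mask] = scaler.transform(transformed_feat.iloc[test_mask].to_numpy())
#tensor_data = torch.FloatTensor(transformed_feat.values)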
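Fitting the scaler on the training rows only keeps validation and test statistics out of the preprocessing. This sketch assumes a small synthetic DataFrame with hypothetical `budget` and `runtime` columns and index arrays shaped like `train_mask` / `val_mask`; it casts to float first so that writing standardized values into an integer column does not run into dtype issues.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({'budget': rng.integers(0, 100, size=50),   # hypothetical integer column
                   'runtime': rng.normal(110, 20, size=50)})  # hypothetical float column
train_idx, val_idx = np.arange(0, 40), np.arange(40, 50)

scaled = df.astype(float)  # avoid writing floats into integer columns
scaler = StandardScaler()
# Mean and standard deviation are estimated on the training rows only
scaled.iloc[train_idx] = scaler.fit_transform(scaled.iloc[train_idx].to_numpy())
# Validation rows reuse the training statistics
scaled.iloc[val_idx] = scaler.transform(scaled.iloc[val_idx].to_numpy())

print(scaled.iloc[train_idx].mean().round(6))
```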

### Building Logistic Regression

In [None]:
# Fit a plain logistic regression on the node features only (no graph information)
X_train = transformed_feat.iloc[train_mask].to_numpy()
X_test = transformed_feat.iloc[test_mask].to_numpy()
y_train = IMDB_nom.values[train_mask]
y_test = IMDB_nom.values[test_mask]

clf = LogisticRegression(C=1, random_state=0, solver='lbfgs').fit(X_train, y_train)
train_pred = clf.predict(X_train)
test_pred = clf.predict(X_test)

def print_scores(name, y_true, y_pred):
    # sklearn expects (y_true, y_pred); index 1 selects the "Nominated" class
    pre, rec, f1, sup = precision_recall_fscore_support(y_true, y_pred)
    print(name + ' set:')
    print('>>> Precision: {:0.4}'.format(pre[1]))
    print('>>> Recall: {:0.4}'.format(rec[1]))
    print('>>> F1: {:0.4}'.format(f1[1]))
    print('>>> Support: {:}'.format(sup[1]))
    print('')

print_scores('Training', y_train, train_pred)
print_scores('Test', y_test, test_pred)
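`precision_recall_fscore_support` expects `(y_true, y_pred)`. Swapping the two arguments exchanges precision and recall (F1 is unchanged because it is symmetric in them) and makes `support` count predicted instead of true samples. A tiny hand-checkable example with made-up labels:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([1, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0])  # one true positive, two false negatives, no false positive

pre, rec, f1, sup = precision_recall_fscore_support(y_true, y_pred)
pre_sw, rec_sw, f1_sw, sup_sw = precision_recall_fscore_support(y_pred, y_true)  # wrong argument order

print('correct order: precision={:.3f} recall={:.3f} support={}'.format(pre[1], rec[1], sup[1]))
print('swapped order: precision={:.3f} recall={:.3f} support={}'.format(pre_sw[1], rec_sw[1], sup_sw[1]))
```

With the correct order, class 1 has precision 1.0 (the only positive prediction is right), recall 1/3 and support 3; the swapped call reports these the other way round, with support 1.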

In [None]:
# Confusion Matrix on validation classification
#disp = sklearn.metrics.plot_confusion_matrix(clf, transformed_feat.iloc[val_mask],IMDB_nom.values[val_mask],cmap=plt.cm.Blues,display_labels = ['Not Nominated','Nominated'],normalize='true')


### Building Logistic Regression & Graph Filtering

In [None]:
# Plot a precomputed confusion matrix (useful when only the matrix, not a fitted sklearn classifier, is available)
def confusion_matrix(matrix):
    class_names = ['Not Nominated', 'Nominated']
    figure = plt.figure()
    axes = figure.add_subplot(111)
    image = axes.matshow(matrix, cmap=plt.cm.Blues)
    axes.set_xticks(range(len(class_names)))
    axes.set_yticks(range(len(class_names)))
    axes.set_xticklabels(class_names, style='italic')
    axes.set_yticklabels(class_names, style='italic')
    axes.set_ylabel('True Label')
    axes.set_xlabel('Predicted Label')
    figure.colorbar(image)
    # ndenumerate yields ((row, column), value); text() takes (x=column, y=row)
    for (j, i), value in np.ndenumerate(matrix):
        axes.text(i, j, np.round(value, 3), ha='center', va='center', color='grey')
    plt.show()

In [None]:
class LaplacianPolynomial(nn.Module):
    def __init__(self,
                 in_feats: int,
                 out_feats: int,
                 k: int,
                 dropout_prob: float,
                 norm=True):
        super().__init__()
        self._in_feats = in_feats
        self._out_feats = out_feats
        self._k = k
        self._norm = norm
        # Contains the weights learned by the Laplacian polynomial
        self.pol_weights = nn.Parameter(torch.Tensor(self._k + 1))
        # Contains the weights learned by the logistic regression (without bias)
        self.logr_weights = nn.Parameter(torch.Tensor(in_feats, out_feats))
        self.dropout = nn.Dropout(p=dropout_prob)
        self.reset_parameters()

    def reset_parameters(self):
        """Reinitialize learnable parameters."""
        torch.manual_seed(0)
        torch.nn.init.xavier_uniform_(self.logr_weights, gain=0.01)
        torch.nn.init.normal_(self.pol_weights, mean=0.0, std=1e-3)

    def forward(self, graph, feat):
        r"""Compute graph convolution.

        Notes
        -----
        * Input shape: :math:`(N, *, \text{in_feats})` where * means any number of additional
          dimensions, :math:`N` is the number of nodes.
        * Output shape: :math:`(N, *, \text{out_feats})` where all but the last dimension are
          the same shape as the input.

        Parameters
        ----------
        graph (DGLGraph) : The graph.
        feat (torch.Tensor): The input feature

        Returns
        -------
        (torch.Tensor) The output feature
        """
        feat = self.dropout(feat)
        graph = graph.local_var()
        
        # D^(-1/2)
        norm = torch.pow(graph.in_degrees().float().clamp(min=1), -0.5)
        shp = norm.shape + (1,) * (feat.dim() - 1)
        norm = torch.reshape(norm, shp)

        # Multiply by W first to reduce the feature size before aggregation
        feat = torch.matmul(feat, self.logr_weights)  # X @ Theta

        result = self.pol_weights[0] * feat.clone()  # first term: a_0 * L^0 * X @ Theta

        for i in range(1, self._k + 1):  # add the next terms a_1*L^1, a_2*L^2, ..., a_k*L^k
            old_feat = feat.clone()
            if self._norm:
                feat = feat * norm
            graph.ndata['h'] = feat
            # feat is not modified in place
            graph.update_all(fn.copy_src(src='h', out='m'),
                             fn.sum(msg='m', out='h'))  # each node sums the 'h' features of its in-neighbours (copy_src message, sum reduce)
            if self._norm:
                graph.ndata['h'] = graph.ndata['h'] * norm

            # With norm=True this is L @ feat = feat - D^(-1/2) A D^(-1/2) feat (symmetric normalized Laplacian);
            # with norm=False it is (I - A) @ feat
            feat = old_feat - graph.ndata['h']
            result += self.pol_weights[i] * feat

        return result

    def extra_repr(self):
        """Set the extra representation of the module,
        which will come into effect when printing the model.
        """
        summary = 'in={_in_feats}, out={_out_feats}, k={_k}'
        summary += ', normalization={_norm}'
        return summary.format(**self.__dict__)
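To sanity-check the message-passing loop in `LaplacianPolynomial.forward`, the same filter can be written densely. With `norm=True`, each iteration replaces `feat` by L @ feat with L = I - D^(-1/2) A D^(-1/2), so the output is sum_i a_i L^i X Theta. The NumPy sketch below uses a small random graph, random X, Theta and coefficients (all assumptions) and checks that the iterative update and explicit matrix powers agree. It assumes a binary, symmetric adjacency with self-loops (the `copy_src` / `sum` messages use no edge weights), ignores dropout, and clamps degrees at 1 as the module does.

```python
import numpy as np


def normalized_laplacian(A):
    """I - D^(-1/2) A D^(-1/2), with degrees clamped at 1 as in the module."""
    d = np.clip(A.sum(axis=1), 1, None) ** -0.5
    return np.eye(len(A)) - d[:, None] * A * d[None, :]


def poly_filter_iterative(A, X, theta, coeffs):
    """Mimics the forward loop: feat <- feat - D^(-1/2) A D^(-1/2) feat at every step."""
    d = (np.clip(A.sum(axis=1), 1, None) ** -0.5)[:, None]
    feat = X @ theta
    result = coeffs[0] * feat
    for a_i in coeffs[1:]:
        feat = feat - d * (A @ (d * feat))
        result = result + a_i * feat
    return result


def poly_filter_dense(A, X, theta, coeffs):
    """Explicit sum_i a_i L^i X theta."""
    L = normalized_laplacian(A)
    return sum(a_i * np.linalg.matrix_power(L, i) @ X @ theta for i, a_i in enumerate(coeffs))


rng = np.random.default_rng(0)
n, in_feats, out_feats, k = 6, 4, 2, 3
upper = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = upper + upper.T + np.eye(n)  # binary, symmetric, with self-loops like G
X = rng.normal(size=(n, in_feats))
theta = rng.normal(size=(in_feats, out_feats))
coeffs = rng.normal(size=k + 1)

out_iter = poly_filter_iterative(A, X, theta, coeffs)
out_dense = poly_filter_dense(A, X, theta, coeffs)
print(np.abs(out_iter - out_dense).max())
```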

In [None]:
def train(model, g, features, labels, train_mask, loss_fcn, optimizer):
    """ 
    DESCRIPTION : Run one training step (forward pass, loss, backward pass, optimizer update) on the training set
    INPUT:
        |--- model: [nn.Module] classification model to train
        |--- g: [DGLGraph] DGL (Deep Graph Library) graph object
        |--- features: [FloatTensor] 2D tensor containing samples' features
        |--- labels: [LongTensor] 1D tensor containing samples' labels (0-1)
        |--- train_mask: [np.array] indices of training set
        |--- loss_fcn: PyTorch loss function used for training
        |--- optimizer: PyTorch optimizer
    OUTPUT:
        |--- loss: [torch.Tensor] scalar training loss computed before the update (use loss.item() for a float)
    """
    model.train()  
    
    pred = model(g, features)[train_mask] # prediction
    loss = loss_fcn(pred, labels[train_mask])
    optimizer.zero_grad()    
    loss.backward()

    optimizer.step()
    #_, indices = torch.max(pred, dim=1)
    #correct = torch.sum(indices == labels[train_mask])
    #acc = correct.item() * 1.0 / len(labels[train_mask]) #not the best metric
    
    #C = sklearn.metrics.confusion_matrix(tensor_labels[train_mask], indices.numpy(), labels=[0,1], sample_weight=None, normalize='true')

    #return loss, acc, C
    return loss
    
def evaluate(model, g, features, mask, labels):
    """ 
    DESCRIPTION : Evaluate model classification performance on a subset of nodes (e.g. the validation set)
    INPUT:
        |--- model: [nn.Module] classification model to evaluate
        |--- g: [DGLGraph] DGL (Deep Graph Library) graph object
        |--- features: [FloatTensor] 2D tensor containing samples' features
        |--- mask: [np.array] indices of the samples to evaluate on
        |--- labels: [LongTensor] 1D tensor containing samples' labels (0-1)
    OUTPUT:
        |--- precision: [float] precision for class 1 (Nominated)
        |--- recall: [float] recall for class 1
        |--- f1: [float] F1 score for class 1
        |--- support: [int] number of class-1 samples in the subset
        |--- C: [2x2 np.ndarray] confusion matrix (rows: true labels, columns: predicted labels)
    """
    model.eval()

    with torch.no_grad():
        pred = model(g, features)[mask]  # the forward pass uses the whole graph; keep only the evaluated nodes
        y_true = labels[mask].numpy()
        y_pred = torch.argmax(pred, dim=1).numpy()

        # labels=[0, 1] keeps both classes in the output even if one of them is never predicted
        pre, rec, f1, sup = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])
        C = sklearn.metrics.confusion_matrix(y_true, y_pred, labels=[0, 1])

    return pre[1], rec[1], f1[1], sup[1], C
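The class-1 precision, recall and F1 returned by `evaluate` can also be read off the 2x2 confusion matrix `C` it returns (sklearn layout: rows are true labels, columns are predicted labels). A small sketch with a made-up matrix; `class1_scores` is a hypothetical helper:

```python
import numpy as np


def class1_scores(C):
    """Precision, recall and F1 for class 1 from a 2x2 confusion matrix.

    Assumes sklearn's layout: C[i, j] = number of samples with true label i predicted as j.
    """
    tn, fp, fn, tp = np.asarray(C).ravel()
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0  # harmonic mean of precision and recall
    return precision, recall, f1


C = np.array([[700, 100],   # true class 0: 700 correct, 100 predicted as class 1
              [ 60, 120]])  # true class 1: 60 missed, 120 correct
precision, recall, f1 = class1_scores(C)
print('precision={:.3f} recall={:.3f} F1={:.3f}'.format(precision, recall, f1))
```

Here precision = 120/220, recall = 120/180 and F1 = 240/400 = 0.6.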

##### Results of optimisation:

Optimisation was performed by looking at the validation accuracy and at the distribution of errors across classes (using the confusion matrix).
- Polynomial order: increasing the order tends to make performance less stable across epochs, with no strong impact on the final filter shape -> complexity/stability trade-off at 3
- Learning rate: with a small learning rate, errors shift towards being very unbalanced; a higher learning rate gave a better trade-off between accuracy and error distribution -> 0.2
- Number of epochs: episodes of strong instability occur across trials whatever the parameters; strong instability gives a better balance of errors but lower accuracy; a period of stable learning occurs around 1500 epochs
- Dropout: increasing it creates instability; best trade-off between accuracy and error distribution at 0.
=> The final filter almost always has the same kind of shape, as shown below

### Tuning of Laplacian Polynomial

In [None]:
n_epochs = 500

# To tune
learning_rate = [1e-2,1e-3,1e-4,1e-5]
pol_order = [2,3,4] 
p_dropout = [0.2,0.3,0.4] 
weight_decay = [0,5e-5,5e-6] # by default = 0


n_classes = 2
in_feats=tensor_data.shape[1]

true_ratio = IMDB_nom.mean() # <-- fraction of nominated movies (class 1), computed from the labels
weights_loss = torch.FloatTensor([true_ratio, 1-true_ratio]) # to rebalance
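With `weight=weights_loss`, PyTorch's `CrossEntropyLoss` (default `reduction='mean'`) computes a weighted average: each sample's negative log-probability of its true class is multiplied by that class's weight, and the sum is divided by the sum of those weights. The NumPy sketch below reproduces that formula on made-up logits to show how weights [π, 1 - π] up-weight mistakes on the minority class; π = 0.18 is an assumption.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def weighted_cross_entropy(logits, targets, class_weights):
    """Same formula as torch.nn.CrossEntropyLoss(weight=..., reduction='mean')."""
    logp = log_softmax(logits)
    w = class_weights[targets]                     # weight of each sample's true class
    nll = -logp[np.arange(len(targets)), targets]  # per-sample negative log-likelihood
    return float((w * nll).sum() / w.sum())


pi = 0.18                               # assumed fraction of class 1
class_weights = np.array([pi, 1 - pi])  # same ordering as weights_loss
logits = np.array([[2.0, 0.0]] * 5)     # the model always favours class 0
targets = np.array([0, 0, 0, 0, 1])     # so it is wrong only on the class-1 sample

unweighted = weighted_cross_entropy(logits, targets, np.ones(2))
weighted = weighted_cross_entropy(logits, targets, class_weights)
print('unweighted: {:.4f} | weighted: {:.4f}'.format(unweighted, weighted))
```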

In [None]:
def grid_search_LP(learning_rate, pol_order, p_dropout, weight_decay):
    """Train a LaplacianPolynomial model for every hyperparameter combination and return
    [best validation F1, best lr, best polynomial order, best dropout, best weight decay].
    The score of a combination is the validation F1 after the last epoch."""
    performances = torch.zeros(len(learning_rate), len(pol_order), len(p_dropout), len(weight_decay))
    for l, lr_ in enumerate(learning_rate):
        for p, p_order in enumerate(pol_order):
            for d, dropout in enumerate(p_dropout):
                for w, weight in enumerate(weight_decay):
                    print('lr={} | order={} | dropout={} | weight_decay={}'.format(lr_, p_order, dropout, weight))
                    model = LaplacianPolynomial(in_feats, n_classes, p_order, dropout)
                    loss_fcn = torch.nn.CrossEntropyLoss(weight=weights_loss)
                    optimizer = torch.optim.Adam(model.parameters(), lr=lr_, weight_decay=weight)

                    for epoch in range(n_epochs):
                        loss = train(model, G, tensor_data, tensor_labels, train_mask, loss_fcn, optimizer)
                        if (epoch + 1) % 200 == 0 or epoch + 1 == n_epochs:
                            pre, rec, f1, sup, C = evaluate(model, G, tensor_data, val_mask, tensor_labels)
                            print("Epoch {:05d} | Train Loss {:.4f} | Val precision {:.4%} | Val recall {:.4%} | Val F1 {:.4%}".format(
                                epoch + 1, loss.item(), pre, rec, f1))
                    performances[l, p, d, w] = f1  # validation F1 at the last epoch

    # Pick the (first) combination with the highest validation F1
    best_idx = np.unravel_index(torch.argmax(performances).item(), performances.shape)
    best_performance = torch.max(performances).item()
    best_lr = learning_rate[best_idx[0]]
    best_p_order = pol_order[best_idx[1]]
    best_dropout = p_dropout[best_idx[2]]
    best_weight = weight_decay[best_idx[3]]

    return [best_performance, best_lr, best_p_order, best_dropout, best_weight]
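Both grid searches follow the same pattern: loop over every combination of hyperparameters, store one validation score per combination, and keep the best one. A compact, self-contained sketch of that pattern with `itertools.product`; `grid_search` and `toy_score` are hypothetical, and `toy_score` stands in for "train the model with this configuration and return its validation F1".

```python
import itertools

import numpy as np


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid and return the best one.

    param_grid: dict mapping parameter name -> list of candidate values
    score_fn:   callable taking the parameters as keyword arguments, returning a score to maximize
    """
    names = list(param_grid)
    best_score, best_params = -np.inf, None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # strict '>' keeps the first of several tied combinations
            best_score, best_params = score, params
    return best_score, best_params


# Toy score function (assumption): best at lr=1e-3, pol_order=3, no weight decay
def toy_score(lr, pol_order, weight_decay):
    return -abs(np.log10(lr) + 3) - abs(pol_order - 3) - 1e3 * weight_decay


grid = {'lr': [1e-2, 1e-3, 1e-4], 'pol_order': [2, 3, 4], 'weight_decay': [0, 5e-6, 5e-5]}
best_score, best_params = grid_search(grid, toy_score)
print(best_score, best_params)
```

Storing only the running best avoids unravelling an index into a 4-D tensor; it does not address run-to-run noise, which the notes below report as significant.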

In [None]:
results_LP = grid_search_LP(learning_rate, pol_order,p_dropout,weight_decay)
print(results_LP)

In [None]:
def polynomial_graph_filter_response(coeff: np.array, lam: np.ndarray):
    """ 
        DESCRIPTION : Compute response of filtering using a polynomial filter 
        INPUT:
            |--- coeff: [np.array] coefficients of polynomial filter
            |--- lam: [np.ndarray] eigenvalues 
        OUTPUT:
            |--- response: [np.ndarray] response[i] is the spectral response at frequency lam[i]
    """
    V = np.vander(lam,coeff.shape[0],increasing=True)
    response = V@coeff
    return response
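As a worked example, `np.vander(lam, N, increasing=True)` builds the matrix whose columns are lam^0, ..., lam^(N-1), so multiplying it by the coefficients evaluates g(λ) = sum_i a_i λ^i at every eigenvalue. With the hypothetical low-pass coefficients [1, -1], i.e. g(λ) = 1 - λ, and eigenvalues 0, 0.5, 1 and 2, the response is 1, 0.5, 0 and -1:

```python
import numpy as np


def polynomial_graph_filter_response(coeff, lam):
    V = np.vander(lam, coeff.shape[0], increasing=True)  # V[i, j] = lam[i] ** j
    return V @ coeff


coeff = np.array([1.0, -1.0])         # g(lambda) = 1 - lambda
lam = np.array([0.0, 0.5, 1.0, 2.0])  # eigenvalues of a normalized Laplacian lie in [0, 2]
response = polynomial_graph_filter_response(coeff, lam)
print(response)
```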

In [None]:
def spectral_decomposition(laplacian: np.ndarray):
    """ 
        DESCRIPTION : Compute spectral decomposition of a graph using the graph Laplacian
        INPUT:
            |--- laplacian: [np.ndarray] graph laplacian 
        OUTPUT:
            |--- lamb: [np.ndarray] containing graph eigenvalues
            |--- U: [np.ndarray] containing corresponding graph eigenvectors
    """
    # compute the eigenvalues and eigenvectors
    if np.allclose(laplacian, laplacian.T, 1e-12):
        lamb, U = np.linalg.eigh(laplacian)
    else:
        lamb, U = np.linalg.eig(laplacian)
        #sort them
        idx = np.argsort(lamb, axis=0)
        lamb = lamb[idx]
        U = U[:,idx]
    
    return lamb, U

In [None]:
def compute_laplacian(adjacency: np.ndarray, normalize: bool):
    """ 
        DESCRIPTION : Compute the Laplacian of a graph from its adjacency matrix
        INPUT:
            |--- adjacency: [np.ndarray] adjacency matrix of the graph
            |--- normalize: [bool] whether to return the symmetric normalized Laplacian
        OUTPUT:
            |--- L: [n x n ndarray] combinatorial Laplacian D - A, or symmetric normalized Laplacian D^(-1/2) (D - A) D^(-1/2) if normalize is True
    """
    # degrees
    I = np.identity(adjacency.shape[0])
    degree = np.sum(adjacency, axis=1)
    # Compute laplacian
    D = I.copy()
    np.fill_diagonal(D, degree)
    L = D - adjacency
    # normalized if requested 
    if normalize:
        D12 = np.where(D > 0, np.power(D, -0.5, where=D>0), 0)
        L = D12 @ L @ D12
        
    return L
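Putting the three helpers together: filtering a signal in the spectral domain, U g(Λ) U^T x, gives the same result as applying the matrix polynomial sum_i a_i L^i directly to x, which is why a learned polynomial filter can be inspected through its spectral response. The sketch below checks this on a small random weighted graph (an assumption), using compact inline versions of `compute_laplacian(..., normalize=True)`, `np.linalg.eigh` and the `np.vander` response; the coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
upper = np.triu(rng.random((n, n)), 1)
A = upper + upper.T                          # weighted, symmetric adjacency, no self-loops
d = A.sum(axis=1) ** -0.5
L = np.eye(n) - d[:, None] * A * d[None, :]  # symmetric normalized Laplacian

lam, U = np.linalg.eigh(L)                   # L = U diag(lam) U^T
coeff = np.array([0.5, -0.3, 0.1])           # g(lambda) = 0.5 - 0.3 lambda + 0.1 lambda^2
g = np.vander(lam, len(coeff), increasing=True) @ coeff

x = rng.normal(size=n)
spectral = U @ (g * (U.T @ x))               # filter in the spectral domain
vertex = sum(a * np.linalg.matrix_power(L, i) @ x for i, a in enumerate(coeff))  # filter in the vertex domain
print(np.abs(spectral - vertex).max())
```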

### Building Graph Neural Network

In [None]:
# Model: one linear layer, then a stack of GraphConv layers, then two fully connected layers --> seems less stable over epochs
class Linear_GNN(nn.Module):
    def __init__(self, in_feats: int, out_feats: int, first_layer_size: int, hidden_size: int):
        super().__init__()
        self._in_feats = in_feats
        self._out_feats = out_feats
        self._first_layer_size = first_layer_size
        self._hidden_size = hidden_size

        # Layers: as many GraphConv layers as the graph diameter, so information can reach every node
        layer_size = 128
        n_gcn_layers = 11
        self.linear = nn.Linear(self._in_feats, self._first_layer_size)
        self.gcns = nn.ModuleList(
            [dgl_nn.conv.GraphConv(self._first_layer_size, layer_size, activation=F.relu)]
            + [dgl_nn.conv.GraphConv(layer_size, layer_size, activation=F.relu) for _ in range(n_gcn_layers - 1)])
        self.linear1 = nn.Linear(layer_size, self._hidden_size)
        self.linear2 = nn.Linear(self._hidden_size, self._out_feats)

    def forward(self, graph, feat):
        h = F.relu(self.linear(feat))
        for gcn in self.gcns:
            h = gcn(graph, h)
        h = F.relu(self.linear1(h))
        h = self.linear2(h)
        return F.log_softmax(h, dim=1)

# Model: only GraphConv layers --> seems more stable
class Pure_GNN(nn.Module):
    def __init__(self, in_feats: int, out_feats: int, hidden_size: int):
        super().__init__()
        self._in_feats = in_feats
        self._out_feats = out_feats
        self._hidden_size = hidden_size  # currently unused: the hidden width is fixed to 32 below

        # Layers: 10 GraphConv layers with ReLU, then a final GraphConv producing the class scores
        width = 32
        self.gcns = nn.ModuleList(
            [dgl_nn.conv.GraphConv(self._in_feats, width, activation=F.relu)]
            + [dgl_nn.conv.GraphConv(width, width, activation=F.relu) for _ in range(9)]
            + [dgl_nn.conv.GraphConv(width, self._out_feats, activation=None)])

    def forward(self, graph, feat):
        h = feat
        for gcn in self.gcns:
            h = gcn(graph, h)
        return F.log_softmax(h, dim=1)

# Model: an APPNP layer with k=7 (the network diameter) followed by 2 fully connected layers
class Simple_APPNP(nn.Module):
    def __init__(self, in_feats: int, out_feats: int, hidden_size: int, k: int):
        super().__init__()
        self._k = k
        self._in_feats = in_feats
        self._out_feats = out_feats
        self._hidden_size = hidden_size
        
        # Layers
        self.appnpconv1 = dgl_nn.conv.APPNPConv(self._k, 0.1, 0)  # teleport probability alpha = 0.1 (cf. paper), no edge dropout
        # APPNPConv keeps the feature dimension, so the first linear layer takes in_feats inputs
        self.linear1 = nn.Linear(self._in_feats, self._hidden_size)
        self.linear2 = nn.Linear(self._hidden_size, self._out_feats)
        
    def forward(self, graph, feat):
        h = self.appnpconv1(graph, feat)
        h = self.linear1(h)
        h = F.relu(h)
        h = self.linear2(h)
        h = F.log_softmax(h, dim=1)
        return h 
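For reference, APPNP propagation (as described in the APPNP paper) starts from H^0 = X and repeats H^(t+1) = (1 - α) Â H^t + α H^0 for k steps, where Â is the symmetrically normalized adjacency; as k grows this approaches the personalized-PageRank solution α (I - (1 - α) Â)^(-1) X. The recurrence never changes the feature dimension, which is why a linear layer placed after `APPNPConv` receives `in_feats` inputs. The NumPy sketch below is an illustration of the recurrence, not DGL's implementation; it assumes a small binary graph that already contains self-loops (as `G` does) and α = 0.1, and `appnp_propagate` is a hypothetical helper.

```python
import numpy as np


def appnp_propagate(A, X, k, alpha):
    """k steps of H <- (1 - alpha) * A_hat @ H + alpha * X, with A_hat = D^(-1/2) A D^(-1/2)."""
    d = A.sum(axis=1) ** -0.5
    A_hat = d[:, None] * A * d[None, :]
    H = X.copy()
    for _ in range(k):
        H = (1 - alpha) * A_hat @ H + alpha * X
    return H, A_hat


rng = np.random.default_rng(0)
n, in_feats = 6, 3
upper = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
A = upper + upper.T + np.eye(n)  # binary, symmetric, with self-loops (every degree >= 1)
X = rng.normal(size=(n, in_feats))
alpha = 0.1

H_k, A_hat = appnp_propagate(A, X, k=300, alpha=alpha)
H_ppr = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_hat, X)  # closed-form limit
print(H_k.shape, np.abs(H_k - H_ppr).max())
```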


In [None]:
# Fixed values
in_feats = tensor_data.shape[1]
out_feats = 2

k = 11 # number of propagation hops (how far to look); according to the paper, the network diameter is usually the best choice

# Parameters not tuned
n_epochs = 500

# To tune in the grid search
learning_rate = [1e-3,1e-4,1e-5]
first_layer_size = [16,32,64]
hidden_size = [256,512]
weight_decay = [0,5e-5,5e-6] # by default = 0


#lr2 = 2e-5
#lr3 = 8e-6

#p_dropout = 0 # for now not doing it



In [None]:
def grid_search_NN(learning_rate, first_layer_size, hidden_size, weight_decay):
    """Train a Linear_GNN model for every hyperparameter combination and return
    [best validation F1, best lr, best first layer size, best hidden size, best weight decay].
    The score of a combination is the validation F1 after the last epoch."""
    performances = torch.zeros(len(learning_rate), len(first_layer_size), len(hidden_size), len(weight_decay))
    for l, lr_ in enumerate(learning_rate):
        for f, f_layer in enumerate(first_layer_size):
            for h, hidden in enumerate(hidden_size):
                for w, weight in enumerate(weight_decay):
                    print('lr={} | first_layer_size={} | hidden_size={} | weight_decay={}'.format(lr_, f_layer, hidden, weight))
                    model = Linear_GNN(in_feats, out_feats, f_layer, hidden)
                    loss_fcn = torch.nn.CrossEntropyLoss(weight=weights_loss)
                    optimizer = torch.optim.Adam(model.parameters(), lr=lr_, weight_decay=weight)

                    for epoch in range(n_epochs):
                        loss = train(model, G, tensor_data, tensor_labels, train_mask, loss_fcn, optimizer)
                        if (epoch + 1) % 200 == 0 or epoch + 1 == n_epochs:
                            pre, rec, f1, sup, C = evaluate(model, G, tensor_data, val_mask, tensor_labels)
                            print("Epoch {:05d} | Train Loss {:.4f} | Val precision {:.4%} | Val recall {:.4%} | Val F1 {:.4%}".format(
                                epoch + 1, loss.item(), pre, rec, f1))
                    performances[l, f, h, w] = f1  # validation F1 at the last epoch

    # Pick the (first) combination with the highest validation F1
    best_idx = np.unravel_index(torch.argmax(performances).item(), performances.shape)
    best_performance = torch.max(performances).item()
    best_lr = learning_rate[best_idx[0]]
    best_first_layer = first_layer_size[best_idx[1]]
    best_hidden_layer = hidden_size[best_idx[2]]
    best_weight = weight_decay[best_idx[3]]

    return [best_performance, best_lr, best_first_layer, best_hidden_layer, best_weight]

In [None]:
results_NN = grid_search_NN(learning_rate, first_layer_size,hidden_size,weight_decay)
print(results_NN)


In [None]:
#print()
#print('Test:')
#pre, rec, f1, sup, C = evaluate(model, G, tensor_data, test_mask, tensor_labels)
#confusion_matrix(C)
#print("Precision {:.4%} | Recall {:.4%} | F1 {:.4%}". format(pre, rec, f1))

### Results of Graph CNN optimisation:

- APPNP conv: with or without an added linear layer, the loss decreases linearly and the accuracy decreases linearly with it; errors are poorly distributed between classes and the model struggles to learn
- GCNN without linear layers: more unstable results, no learning 
- GCNN with linear layers: a linear layer at the beginning helps stabilize training and learn, while a second linear layer at the beginning has no significant impact. Without a linear layer at the end there is no learning; a second linear layer at the end reduces learning and gives less balanced errors; performance around 75%
- Number of graph convolution layers: adding layers helps stabilize learning across epochs. With 4/5 layers, training is less stable, giving ~66% with more balanced errors; with 10/11 layers, training is very stable after 200 epochs, but errors are not balanced at all
- Dropout: increases instability; when at a maximum -> high accuracy but poor error distribution, when at a minimum, the opposite => removed 
- Adding an AvgPooling layer: no significant improvement in accuracy or error distribution 
- Hidden layer size for the GCNN: increasing the layer size creates instability but, to a certain extent, balances the errors across classes; around 60
- Hidden layer size for the final linear layer: 30, as a trade-off between error balance and accuracy
- Cross-entropy + softmax gives very unstable results across trials, converging to all-0 or all-1 predictions
- NLL loss with log_softmax gives very unstable results across epochs, but the model does learn
- Adding class weights to the loss function improves results 
- BCE loss is not appropriate
- Number of epochs: no need to go above 250, training stabilizes around 200/250

=> If stable, accuracy around 75% with very unbalanced errors
=> If unstable (fewer layers, dropout ...), accuracy around 60% with more balanced errors 
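On the loss choice in these notes: `CrossEntropyLoss` applies `log_softmax` to its input before the negative log-likelihood. Feeding it log_softmax outputs is harmless because log_softmax is idempotent (the exponentials of a log_softmax output already sum to 1), so with the models above, which return `log_softmax`, it behaves like `NLLLoss`. Feeding it softmax probabilities instead squashes them a second time: for 2 classes the resulting probabilities can never leave [1/(1+e), e/(1+e)], roughly [0.27, 0.73]. A NumPy check on random logits (an assumption):

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def softmax(z):
    return np.exp(log_softmax(z))


rng = np.random.default_rng(0)
logits = 5 * rng.normal(size=(4, 2))

once = log_softmax(logits)
twice = log_softmax(once)            # what CrossEntropyLoss computes on a log_softmax output layer
squashed = softmax(softmax(logits))  # what it effectively uses when given softmax probabilities

print(np.abs(once - twice).max())
print(squashed.min(), squashed.max())
```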


### Not used

In [None]:
# PCA without feature selection had no significant impact
#pca = PCA(n_components=transformed_feat.shape[1])
#transformed_feat.iloc[train_mask] = pca.fit_transform(transformed_feat.iloc[train_mask].to_numpy())
#transformed_feat.iloc[val_mask] = pca.transform(transformed_feat.iloc[val_mask].to_numpy())
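PCA is sensitive to feature scale, so without standardization the leading component is dominated by large-valued columns such as `budget` or `revenue`. A minimal sketch that chains `StandardScaler` and `PCA` with `make_pipeline` and fits both on the training rows only; the synthetic columns and their scales are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic, independent features on very different scales (budget-like, rating-like, runtime-like)
X = np.column_stack([rng.normal(5e7, 2e7, 200), rng.normal(6.5, 1.0, 200), rng.normal(100, 15, 200)])
train_idx, val_idx = np.arange(160), np.arange(160, 200)

raw_pca = PCA(n_components=X.shape[1]).fit(X[train_idx])
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=X.shape[1])).fit(X[train_idx])

X_val = scaled_pca.transform(X[val_idx])  # scaling and projection learned from the training rows only
print('raw PCA, variance ratio:   ', raw_pca.explained_variance_ratio_.round(3))
print('scaled PCA, variance ratio:', scaled_pca.named_steps['pca'].explained_variance_ratio_.round(3))
```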