# Jupyter magic commands

Use %lsmagic to see all magic commands.

Use %command to apply a magic command to a single line.

Use %%command to apply it across the whole cell.

Use ?command to find out what that command does.

# Import libraries

In [None]:
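import numpy as np
import pandas as pd  # used below to handle missing data
import matplotlib.pyplot as plt  # the plotting interface lives in matplotlib.pyplot
%matplotlib inline
import sklearn as sk
import torch
from torch import nn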

# Preparing the dataset

## Replace missing data by the mean of the column 

In [None]:
# Method for pandas
empty = data.isnull().sum()  # number of missing values per column; only for inspection, not necessary
data['Column'] = data['Column'].fillna(data['Column'].mean())  # fill the gaps with the column mean

In [None]:
# Method for Numpy arrays
# --> I have still to come across it / develop it
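
One possible approach for NumPy arrays, as a minimal sketch: it assumes missing values are encoded as `np.nan` and that every column has at least one observed value. The toy array `X` is hypothetical.

```python
import numpy as np

# Hypothetical toy array; NaN marks a missing value
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# Per-column mean computed while ignoring the NaNs
col_means = np.nanmean(X, axis=0)

# Wherever X is NaN, take the mean of that column (broadcast along the rows)
X_filled = np.where(np.isnan(X), col_means, X)
```

`np.where` returns a new array, so the original `X` is left untouched.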

## Divide into training and test partitions

In [None]:
from sklearn.model_selection import train_test_split
# X holds all the data: rows = n_samples, columns = features. Y holds the associated labels (shape = (n_samples,))
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
# Both the observations and the labels have to be split

## Data normalization

In [None]:
from sklearn.preprocessing import StandardScaler

transformer = StandardScaler() # Define the object
transformer.fit(X_train)  # fit does not transform anything; it only learns the mean and std from the training data
X_train_norm = transformer.transform(X_train)
X_test_norm = transformer.transform(X_test)

**IMPORTANT**: both test and training data are normalized with the same mean and std, those of the training data, so that both sets are normalized in the same way.
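
As a quick check of this point, the following sketch compares the scaler output on the test data with a manual normalization that uses only the training statistics. The random data is hypothetical and only serves as an illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # hypothetical training data
X_test = rng.normal(loc=5.0, scale=2.0, size=(40, 3))    # hypothetical test data

scaler = StandardScaler().fit(X_train)  # statistics come from the training data only
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

# Same normalization by hand, using the training mean and std
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
same = np.allclose(X_test_norm, manual)
```

Note that the normalized test data will generally not have exactly zero mean and unit std, since its own statistics were never used.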

## Model evaluation
- **Training MSE**: $$MSE_{train} = \frac{1}{N_{train}} \sum_{i=1}^{N_{train}} \left(y^{(i)}-f({\bf x_{train}}^{(i)})\right)^2$$

- **Test MSE**: $$MSE_{test} = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} \left(y^{(i)}-f({\bf x_{test}}^{(i)})\right)^2$$

Note that we are interested in evaluating how well our model **generalizes to data it has never seen**. Therefore, **the test set should NEVER be used** at any stage of the training, nor during the selection of the hyperparameters.
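
The MSE formulas above can be computed directly, as in this small sketch. The arrays `y_true` and `y_pred` are hypothetical values standing in for $y^{(i)}$ and $f({\bf x}^{(i)})$.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # hypothetical targets y
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # hypothetical predictions f(x)

# Direct translation of the formula: mean of the squared errors
mse_manual = np.mean((y_true - y_pred) ** 2)

# Same quantity with scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)
```

The same code gives the training or the test MSE, depending on which partition the targets and predictions come from.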

## Data cross validation
In cross validation, the training data is repeatedly split into a training subset and a validation subset, each time changing which subset is used for validation. The results are then averaged.

This is repeated for different (hyper)parameter values in order to find the optimal ones.

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
parameters_dictionary = {'n_neighbors' : np.arange(1,40)}
model = KNeighborsRegressor() # model whose hyperparameters we want to optimize
# cv = 10 means that a 10 fold cross validation is performed
# That is, the training data is divided in 10 subsets and each is used as validation once, over 10 different trials
cross_val = GridSearchCV(model,parameters_dictionary,cv=10,scoring='neg_mean_squared_error')
# Up to here we have only defined it. With .fit, it iterates over the data
cross_val.fit(X_train,Y_train) # this executes the cross-validation
# The results are the following:
optimal_estimator = cross_val.best_estimator_ # model retrained on all the training data with the optimal parameters
dict_parameters = cross_val.best_params_ # dictionary containing the optimal values of the parameters
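
To make the procedure more concrete, this is a rough sketch of the same idea written by hand for a few candidate values of `n_neighbors`. The synthetic data and the helper `cv_mse` are hypothetical; this illustrates the idea and is not how `GridSearchCV` is implemented internally.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))                   # hypothetical features
Y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)  # hypothetical targets

def cv_mse(n_neighbors, n_splits=10):
    """Average validation MSE of a KNN regressor over n_splits folds."""
    fold_mse = []
    for train_idx, val_idx in KFold(n_splits=n_splits).split(X_train):
        model = KNeighborsRegressor(n_neighbors=n_neighbors)
        model.fit(X_train[train_idx], Y_train[train_idx])   # train on 9 subsets
        pred = model.predict(X_train[val_idx])              # validate on the remaining one
        fold_mse.append(mean_squared_error(Y_train[val_idx], pred))
    return float(np.mean(fold_mse))                         # average over the folds

# Evaluate a few candidate values and keep the one with the lowest average MSE
scores = {k: cv_mse(k) for k in (1, 5, 20)}
best_k = min(scores, key=scores.get)
```

Only the training partition is used here; the test partition stays untouched until the final evaluation.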

## Create a KNN regressor

In [None]:
from sklearn.neighbors import KNeighborsRegressor
neigh = KNeighborsRegressor(n_neighbors = 5)
neigh.fit(X_train,Y_train)
y_test_predicted = neigh.predict(X_test)

# Training a Neural Network


In [None]:
# First, we need to define our NN as a class, inheriting from nn.Module
from torch import nn
from torch import optim

class LR(nn.Module):
    def __init__(self,dimx):
        super().__init__() # needed to inherit        
        # Define the nn.Parameters, i.e. the values to be optimized
        self.weights = nn.Parameter(torch.randn(dimx,1),requires_grad = True)        
        self.bias = nn.Parameter(torch.randn(1,1),requires_grad = True)
        
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):  # define the forward function. In this example, a simple sigmoid
        # Pass the input tensor through each of our operations
        p = self.sigmoid(torch.matmul(x,self.weights)+self.bias)
        return p
    
# x is the data (n_samples, n_features) and y the binary labels (n_samples,), both assumed to be NumPy arrays
x_tensor = torch.tensor(x, dtype=torch.float32) # same dtype as the parameters
y_tensor = torch.tensor(y, dtype=torch.float32).reshape(-1,1) # same shape as the output
my_classifier = LR(x.shape[1]) # Instantiate the NN.
# Remember that the __init__ method requires dimx, the dimension (number of features) of the data
criterion = nn.BCELoss() # define a binary cross entropy as loss function
output = my_classifier(x_tensor) # calling the module runs forward
loss = criterion(output,y_tensor)  # it is a scalar value
# But it contains the information to compute the gradients of the operations and parameters that led to such value
loss.backward() # Through the backward operator, the gradient for each parameter is computed and stored in its .grad attribute
# If we were to call .backward again, gradients would get added (not overwritten)
# So it is necessary to set them to zero before using it again:
my_classifier.zero_grad()

# Now we would need to iterate, updating the parameters according to the gradient
# That is done with an optimizer from torch.optim (imported above), calling its .step() method after each backward pass
# Of course, this has to be repeated in a for loop
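
The loop described in the comments above could look like the following sketch. It makes several assumptions: the toy data is synthetic, `nn.Sequential(nn.Linear, nn.Sigmoid)` stands in for the `LR` class so that the example is self-contained, and plain SGD with a learning rate of 0.1 and 200 iterations are arbitrary choices.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical toy data: 2 features, label is 1 when the features sum to > 0
x = torch.randn(200, 2)
y = (x.sum(dim=1, keepdim=True) > 0).float()  # shape (200, 1), same as the output

# Stand-in for the LR class: a linear layer followed by a sigmoid
model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(200):
    optimizer.zero_grad()        # reset the gradients accumulated in the previous step
    output = model(x)            # forward pass
    loss = criterion(output, y)  # scalar loss
    loss.backward()              # compute the gradient of the loss for each parameter
    optimizer.step()             # update the parameters using those gradients
    losses.append(loss.item())
```

Storing `loss.item()` at each iteration makes it easy to plot the training curve afterwards and check that the loss decreases.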