# Problem 5.1

In [None]:
'''
You can start from the same code as in HW3. Some things to consider:

- We now use the skorch library, which lets us do parameter tuning in the same way as for
other classifiers in sklearn.
- The code to add L1 regularization is already implemented.


'''

In [None]:
'''
Import packages 
'''

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from sklearn.metrics import accuracy_score
import torch.nn.functional as F

# Device configuration - If you have CUDA configured, use it (pass device=device to the skorch net). Try training on the CPU and observe what happens
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


'''
Step - Import the MNIST training and test sets as we did for HW3, but do not use DataLoader;
skorch will do that step for us.

Step - Normalize the data so the pixel values are between 0 and 1.
     - Make sure the shape of the data tensor is (n_observations, 784), not (n_observations, 28, 28)
     - Also make sure the data has a float dtype and the targets an integer dtype
     - Name the data X_train and X_test and the labels y_train and y_test
     
'''
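
The reshaping and scaling step above can be sketched as follows. This is only an illustration on random stand-in data: the helper name `flatten_and_scale` and the fake tensors are assumptions, not part of the assignment code, and it assumes the raw images are uint8 with values in 0..255.

```python
import torch

def flatten_and_scale(images: torch.Tensor, labels: torch.Tensor):
    """Hypothetical helper: (n, 28, 28) uint8 images -> (n, 784) floats in [0, 1]."""
    # Flatten every image to one row, convert to float32, then divide by the max pixel value
    X = images.reshape(images.shape[0], -1).float() / 255.0
    # Integer (int64) class labels, as expected by nn.NLLLoss
    y = labels.long()
    return X, y

# Random stand-in data with the same layout as raw MNIST images (assumption: uint8, 0..255)
fake_images = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)
fake_labels = torch.tensor([3, 1, 4, 1, 5])

X_demo, y_demo = flatten_and_scale(fake_images, fake_labels)
print(X_demo.shape, X_demo.dtype, y_demo.dtype)
```

With torchvision, the raw tensors of an MNIST dataset object are available as its `.data` and `.targets` attributes, so the same helper could be applied to those to build X_train, y_train, X_test and y_test.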


In [None]:
#Adding L1 regularization

from skorch import NeuralNet

class MyModule(nn.Module):
    
    '''
    
    Step - Fill in the neural network here. Use the same architecture as in HW3, but now add a new hidden layer
    and dropout.
    Remember to use ReLU activations in the hidden layers.
    
    
    
    '''
    
    pass

class RegularizedNet(NeuralNet):
    
    ''''''
    
    def __init__(self, *args, lambda1=0.01, **kwargs):
        super().__init__(*args, **kwargs)
        self.lambda1 = lambda1
    
    ''' *** - What is the following method doing? Explain in detail in the main pdf ***'''
    
    def get_loss(self, y_pred, y_true, X=None, training=False):
        loss = super().get_loss(y_pred, y_true, X=X, training=training)
        loss += self.lambda1 * sum([w.abs().sum() for w in self.module_.parameters()])
        return loss
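
As a small worked example of the penalty term used in get_loss, the sketch below evaluates the same expression on a tiny layer with hand-picked weights. The layer, its weights and the lambda1 value are made up for illustration. Note that the sum runs over all parameters, so the bias is included, exactly as in the expression above.

```python
import torch
import torch.nn as nn

# Tiny layer with hand-picked parameters (illustrative values only)
layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[1.0, -2.0]]))
    layer.bias.copy_(torch.tensor([0.5]))

lambda1 = 0.01

# Same expression as in get_loss: sum of absolute values over all parameters
l1_norm = sum(p.abs().sum() for p in layer.parameters())   # |1| + |-2| + |0.5| = 3.5
penalty = lambda1 * l1_norm

# The gradient of the penalty is lambda1 * sign(parameter): a constant-size pull towards zero
penalty.backward()
print(l1_norm.item(), layer.weight.grad)
```

Because the gradient has the same magnitude for every nonzero weight, small weights are pushed towards zero as strongly as large ones, which is where the L1 and L2 penalties differ.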

    

In [None]:
'''
Here we define the RegularizedNet. Make sure you use nn.NLLLoss. This means you have to use the correct last activation
in the forward method of your network (nn.NLLLoss expects log-probabilities)

We can specify different parameters such as the learning rate (lr), the optimizer (start with standard SGD; in 5.3 we will
try other ones), the batch size, etc.
To set the architecture parameters of MyModule, pass them as module__<name of your parameter> = ....

Since we first have to train it with L2 regularization, lambda1 should be equal to 0
'''
new_net = RegularizedNet(module=MyModule, criterion=torch.nn.NLLLoss, 
                        optimizer=..., lr = 0.001, lambda1 = 0,  module__dropout = ...,
                        optimizer__weight_decay = ...)
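
To illustrate the module__<name> convention: an argument such as module__dropout is forwarded to the module's constructor, so the module has to accept a parameter with that name. The sketch below is a hypothetical module, not the required HW architecture. The class name `SketchModule` and the layer sizes are assumptions chosen only to show the pattern, together with a log-softmax output that matches nn.NLLLoss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchModule(nn.Module):
    """Hypothetical example module (sizes are arbitrary, not the HW answer)."""

    def __init__(self, n_inputs=784, hidden1=64, hidden2=32, n_classes=10, dropout=0.5):
        super().__init__()
        self.fc1 = nn.Linear(n_inputs, hidden1)
        self.fc2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, n_classes)
        self.dropout = nn.Dropout(dropout)   # set via module__dropout = ... in the net

    def forward(self, X):
        X = self.dropout(F.relu(self.fc1(X)))
        X = self.dropout(F.relu(self.fc2(X)))
        # nn.NLLLoss expects log-probabilities, hence log_softmax as the last activation
        return F.log_softmax(self.out(X), dim=-1)

# Quick check on random inputs; eval() switches dropout off
module = SketchModule(dropout=0.2).eval()
log_probs = module(torch.rand(4, 784))
loss = nn.NLLLoss()(log_probs, torch.tensor([0, 1, 2, 3]))
print(log_probs.shape, loss.item())
```

With a module like this, module__hidden1 = 128 or module__dropout = 0.3 in the RegularizedNet call would change the corresponding constructor arguments.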
    

In [None]:
'''Step - train the network'''

new_net.fit(X_train, y_train)
y_pred_probs = new_net.predict(X_test)
'''
Watch how the loss goes down and the validation accuracy goes up during training
'''

In [None]:
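'''Step - Predict for the test set and print the final accuracy score. The validation accuracy obtained in the previous
cell should be similar to the accuracy on the test set
'''
y_pred = new_net.predict(X_test)
# Note: with a plain skorch NeuralNet, predict returns the module output (one row of scores per sample),
# so reduce it to class labels before scoring if needed
if y_pred.ndim > 1:
    y_pred = y_pred.argmax(axis=1)
accuracy_score(y_test, y_pred)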

In [None]:
'''The goal is to get more than 98% accuracy, so try different parameters as requested in the main pdf.
The fit method already reports a validation error, which can be used to compare different parameter settings.

For the final submission, leave the best parameters in your RegularizedNet(...)
'''
'''
Instead of doing it manually, skorch allows us to use GridSearchCV from sklearn
'''
from sklearn.model_selection import GridSearchCV

'''
Step - Define a grid with some parameters that you think may give good results, and
the code will do the rest for you

* In particular, include the parameters we ask you to tune:
learning rate, regularization parameter, and number of nodes

'''

grid = {
    'lr': [0.001, 0.1,...],
    'other parameter': [....]
}

'''
It is important that you keep refit = True
'''
gs = GridSearchCV(new_net, grid, refit=True, cv=5, scoring='accuracy')


'''
Finally fit
'''
gs.fit(X_train, y_train)

#Report Best Parameters
print(gs.best_score_, gs.best_params_)
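
Before launching a grid search it helps to estimate how many trainings it implies. The sketch below counts them for a small example grid. The weight-decay values and the `module__hidden1` key are hypothetical placeholders (the latter assumes MyModule takes a `hidden1` argument). The keys follow the same double-underscore naming used when constructing the net.

```python
from sklearn.model_selection import ParameterGrid

# Example grid only: values and the module__hidden1 key are illustrative assumptions
grid_sketch = {
    'lr': [0.001, 0.1],
    'optimizer__weight_decay': [0.0, 1e-4],
    'module__hidden1': [64, 128],   # hypothetical MyModule constructor argument
}

cv_folds = 5
n_candidates = len(ParameterGrid(grid_sketch))   # every combination of the listed values
n_fits = n_candidates * cv_folds + 1             # +1 for the final refit on all training data
print(n_candidates, n_fits)                      # prints: 8 41
```

Each extra value in any list multiplies the number of trainings, so it is usually worth starting with a coarse grid and refining around the best region.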


# Problem 5.2

In [None]:
'''
Step - Now we are going to train the network with L1 regularization instead of L2 and dropout.
We create a new network with a nonzero lambda1 parameter.
- Keep the rest of the parameters you used in the previous network, but set the dropout and L2 parameters to 0
'''

new_net_l1 = RegularizedNet(module=MyModule, criterion=torch.nn.NLLLoss, 
                        optimizer= ..., lr = 0.001, lambda1 = ...,  module__dropout = 0,
                        optimizer__weight_decay = 0)

In [None]:
#Refer to https://skorch.readthedocs.io/en/stable/user/save_load.html

import pickle

#Transfer learning -
#The following code transfers the weights from the L2-trained network to initialize the new network before L1 training

'''

Notes - This assumes you have trained your L2 network with skorch (NeuralNetClassifier, or the RegularizedNet defined above)
        and that your trained model object is called "new_net"

'''

#Step - 1 - Save weights from L2 network

new_net.save_params(f_params='some-file.pkl') #This comes after new_net.fit(). It saves the model weights to a pickle file


#Step - 2 - Initialize the L1 network and load the saved weights into it

new_net_l1.initialize()
new_net_l1.load_params(f_params='some-file.pkl')
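
The idea behind this weight transfer can be sketched in plain PyTorch. This is an analogy under assumptions, not skorch's actual implementation: two small stand-in layers replace the real networks, and an in-memory buffer replaces the file on disk.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
source_model = nn.Linear(4, 2)   # stands in for the trained L2 network
target_model = nn.Linear(4, 2)   # stands in for the freshly initialized L1 network

# Save the source weights (here to an in-memory buffer instead of 'some-file.pkl')
buffer = io.BytesIO()
torch.save(source_model.state_dict(), buffer)

# Load them into the target; this only works if both architectures match
buffer.seek(0)
target_model.load_state_dict(torch.load(buffer))

same_weights = torch.equal(source_model.weight, target_model.weight)
same_bias = torch.equal(source_model.bias, target_model.bias)
# The target holds a copy: training it later does not modify the source network
independent = source_model.weight.data_ptr() != target_model.weight.data_ptr()
print(same_weights, same_bias, independent)
```

The same constraint applies to the cell above: new_net_l1 must use the same module architecture as new_net, otherwise the saved parameters will not fit.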



In [None]:
'''
Step - Train the network with the weights transferred from new_net, and perform a grid search over the lambda1 parameter
'''

In [None]:
'''
Step - Train the network with the default initialization.
We can simply create the network with the same code as before (make sure to use the same parameters)

Perform a grid search over the lambda1 parameter as in the previous cell
'''

new_net_l1 = RegularizedNet(module=MyModule, criterion=torch.nn.NLLLoss, 
                        optimizer= ..., lr = 0.001, lambda1 = ...,  module__dropout = 0,
                        optimizer__weight_decay = 0)




# Problem 5.3

In [None]:
'''
Keeping all the parameters that gave you the best results so far,
try different optimizers.

Basically, create the same new_net_l1 or new_net, but train it with the optimizers requested in the pdf

GridSearch is not required, but you can use it to tune the optimizer-specific parameters if you want

Notice that you already trained it with SGD in the previous problems
'''
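
To get a feel for how the optimizers differ before plugging them into skorch, the sketch below runs each one on a toy quadratic in plain PyTorch. The helper name `minimize`, the learning rates and the momentum value are illustrative assumptions, not recommended settings for the homework.

```python
import torch

def minimize(optimizer_cls, steps=200, **opt_kwargs):
    """Hypothetical helper: minimize f(w) = sum(w^2) and return the final loss."""
    w = torch.tensor([3.0, -2.0], requires_grad=True)   # initial loss is 13.0
    opt = optimizer_cls([w], **opt_kwargs)
    for _ in range(steps):
        opt.zero_grad()
        loss = (w ** 2).sum()
        loss.backward()
        opt.step()
    return (w ** 2).sum().item()

# Illustrative hyperparameters only
results = {
    'SGD + momentum': minimize(torch.optim.SGD, lr=0.01, momentum=0.9),
    'AdaGrad': minimize(torch.optim.Adagrad, lr=0.1),
    'Adam': minimize(torch.optim.Adam, lr=0.1),
}
for name, final_loss in results.items():
    print(name, final_loss)
```

In the skorch net, optimizer-specific arguments follow the same prefix convention as optimizer__weight_decay, for example optimizer=torch.optim.SGD together with optimizer__momentum = 0.9.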



In [None]:
# Adam (note how the optimizer is defined)

new_net_l1 = RegularizedNet(module=MyModule, criterion=torch.nn.NLLLoss, 
                        optimizer= torch.optim.Adam, lr = 0.001, lambda1 = ...,  module__dropout = 0,
                        optimizer__weight_decay = 0)

'''
Step - Now fit it and print the accuracy as in Problem 5.1
'''

In [None]:
# SGD with momentum 

In [None]:
# AdaGrad