# Model Selection Practical Notebook

In this notebook you will demonstrate what you have learnt in the lesson and tackle the following challenges:

1. Write your own "Randomised Selection" code to select the optimal configuration for your linear model.
2. Modify the `SimpleNeuralNetwork()` code in the lesson notebook so that it runs a classification model.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris, load_diabetes, load_breast_cancer
from sklearn.linear_model import LinearRegression, Ridge, Lasso

from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [None]:
# Preparing the data

diabetes = load_diabetes()
X = diabetes['data']
y = diabetes['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

**Q1.** Implement your own Randomised Selection function.

Recall that randomised selection searches for a good configuration of a linear regression model by fitting it on randomly sampled subsets of the features and keeping the subset with the best score.

_Hint: you will need a sampling function that proposes candidate feature subsets; consider using `np.random.choice`. Remember that AIC and BIC should be calculated on the training sample only, whereas MSE can also be calculated on the test sample._
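
The sampling step in the hint could be sketched roughly as follows. This covers only the candidate generator, not the full selection loop, and `sample_feature_subset` is a hypothetical helper name rather than something from the lesson. It assumes every non-empty subset size is equally likely, which is one possible choice among several.

```python
import numpy as np

def sample_feature_subset(n_features):
    """Return a sorted array of randomly chosen column indices (hypothetical helper)."""
    # Pick the subset size first: from 1 feature up to all of them
    # (the upper bound of np.random.randint is exclusive).
    size = np.random.randint(1, n_features + 1)
    # Draw that many distinct column indices; replace=False avoids duplicates.
    idx = np.random.choice(n_features, size=size, replace=False)
    return np.sort(idx)

np.random.seed(0)  # fixed seed so the candidates are reproducible
candidates = [sample_feature_subset(10) for _ in range(5)]
```

Each candidate can then be used to slice the predictors, for example `X_train[:, candidates[0]]`, before fitting and scoring a model on that subset.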

In [None]:
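# AIC and BIC functions

def AIC(X, y, lm):
    """Compute the AIC score of a fitted linear model (lower is better)."""
    N = X.shape[0]
    y_pred = lm.predict(X)
    # Gaussian negative log-likelihood, up to an additive constant
    neg_ll = (N/2)*np.log(mean_squared_error(y, y_pred))
    # Number of fitted coefficients (the intercept is not counted)
    k = len(lm.coef_)
    
    return 2*k + 2*neg_ll

def BIC(X, y, lm):
    """Compute the BIC score of a fitted linear model (lower is better)."""
    N = X.shape[0]
    y_pred = lm.predict(X)
    # Gaussian negative log-likelihood, up to an additive constant
    neg_ll = (N/2)*np.log(mean_squared_error(y, y_pred))
    # Number of fitted coefficients (the intercept is not counted)
    k = len(lm.coef_)
    
    return k*np.log(N) + 2*neg_ll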

In [None]:
# Your code here ...



**Q2.** Modify the regression code in `SimpleNeuralNetwork` into a classification model.

_Hint: Modify the `model` object defined in the `.fit()` method, and change the loss function from mean squared error loss to binary cross-entropy loss._
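
To make the change of loss concrete, here is a small NumPy illustration of what binary cross-entropy computes. It is not the PyTorch code the exercise asks for; the helper names `sigmoid` and `binary_cross_entropy` and the toy values are assumptions made for this sketch. In PyTorch the corresponding building blocks are `torch.nn.Sigmoid` with `torch.nn.BCELoss`, or `torch.nn.BCEWithLogitsLoss` applied to raw outputs.

```python
import numpy as np

def sigmoid(z):
    """Map raw model outputs (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, y, eps=1e-12):
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    # Clip so that log(0) never occurs for very confident predictions.
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy logits and labels, shaped (n, 1) like the reshaped cancer target.
logits = np.array([[2.0], [-1.0], [0.0]])
labels = np.array([[1.0], [0.0], [1.0]])

probs = sigmoid(logits)
loss = binary_cross_entropy(probs, labels)
```

The loss shrinks as the predicted probability for the true class approaches 1, which is why the network's final layer needs to produce a probability (or a logit) rather than an unbounded regression output.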

In [None]:
# Prepare binary classification data

cancer = load_breast_cancer()
X, y = cancer['data'], cancer['target'].reshape(-1, 1)

In [None]:
import torch

# Regression code to modify
class SimpleNeuralNetwork():

    def __init__(self, h1, epoch, verbose=False):
        """
        args:
            h1: number of nodes in hidden layer 1
            epoch: number of gradient updates
        """
        self.epoch = epoch
        self.h1 = h1
        self.verbose = verbose

    def fit(self, X, y):
        """
        args:
            X, y: predictor and target
        """
        n, p = X.shape
        inputs = torch.from_numpy(X).float()
        targets = torch.from_numpy(y).float()

        # Create the neural network model
        model = torch.nn.Sequential(torch.nn.Linear(p, self.h1),
                                    torch.nn.ReLU(),
                                    torch.nn.Linear(self.h1, 1))
        loss_fn = torch.nn.MSELoss()
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)

        for rd in range(self.epoch):
            pred = model(inputs)
            loss = loss_fn(pred, targets)

            # Report the training loss every 300 updates, only when verbose is set
            if self.verbose and rd % 300 == 0:
                print("loss: %.2f" % loss.item())

            loss.backward()
            opt.step()
            opt.zero_grad()

        self.model = model

    def predict(self, x_test):
        inputs = torch.from_numpy(x_test).float()
        # detach() drops the autograd graph before converting to a NumPy array
        return self.model(inputs).detach().numpy()


