# Tutorial session Week 3: Regression
**Lecturer: Dr Maria Deprez**

**Small tutorial groups:**
* **Lindsay:** BEng A-B, BEng C-F
* **Mariana:** BEng G-J, BEng K-L
* **Cher:** BEng M-N, BEng P-T
* **Maria:** BEng V-Z, MSc/MRes/MEng

## Program 
### 10:00-10:15 am: Introduction and Q&A
Given by the lecturer in the main channel **General**
### 10:15-11:45 am: Tutorial
In your small group tutorial channel with your TA/Lecturer
### 11:45 am-12:00 pm: Q&A
Return to the main channel for final remarks and Q&A about the tutorial

**Note: solutions to exercises 1-4 are available as videos on KEATS.**

## Content
* **Exercises 1 and 2**: Multivariate linear regression
* **Exercise 3**: Penalised regression
* **Exercise 4**: Non-linear regression
* **Exercise 5**: Your own kernel ridge regression

## Tuning Ridge regression in sklearn

We will recap how to tune model hyperparameters in `sklearn`, using `Ridge` regression as the example.

### Load dataset with 86 features

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

def CreateFeaturesTargets(filename):
    
    df = pd.read_csv(filename,header=None)
    
    # convert from DataFrame to numpy array
    data = df.values

    # features are in columns 1 onwards
    X = data[:,1:]

    # standardise features to zero mean and unit variance
    X = StandardScaler().fit_transform(X)

    # targets are in column 0
    y = data[:,0]

    # return features and targets
    return X, y

X,y = CreateFeaturesTargets('datasets/GA-brain-volumes-86-features.csv')
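
The loader above scales all samples at once, so the scaler sees the data that later ends up in the validation folds. A possible alternative, sketched here under assumptions, is to put the scaler and the model into one pipeline so that the scaling is refitted on the training part of each fold. The synthetic data from `make_regression` and the names `X_demo`, `y_demo`, `pipeline` and `pipeline_scores` are illustrative stand-ins, not part of the tutorial dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in with 86 features (assumption: NOT the brain-volume data)
X_demo, y_demo = make_regression(n_samples=100, n_features=86,
                                 noise=10.0, random_state=0)

# scaler and model in one estimator: the scaler is refitted inside every fold
pipeline = make_pipeline(StandardScaler(), Ridge())

# cross-validate the whole pipeline on the unscaled features
pipeline_scores = cross_val_score(pipeline, X_demo, y_demo)
print('R2 score:', round(pipeline_scores.mean(), 2))
```

For standardisation the difference to scaling up front is often small, but the pipeline form keeps the validation folds untouched during fitting.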

### Un-tuned Ridge model

In [None]:
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# choose ridge regression model
model = Ridge()

# evaluate the model with cross-validation (R2 is the default score for regressors)
scores = cross_val_score(model,X,y)
print('R2 score:',round(scores.mean(),2))
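
The call above relies on the defaults of `cross_val_score`. As a minimal sketch, assuming scikit-learn 0.22 or later (where the default is 5 folds), the explicit call below should give the same scores for a regressor such as `Ridge`. The synthetic data and the names `X_demo`, `y_demo`, `default_scores` and `explicit_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# synthetic stand-in with 86 features (assumption: NOT the brain-volume data)
X_demo, y_demo = make_regression(n_samples=100, n_features=86,
                                 noise=10.0, random_state=0)

# defaults: 5 folds, scored with the estimator's own score method (R2 for Ridge)
default_scores = cross_val_score(Ridge(), X_demo, y_demo)

# the same settings written out explicitly
explicit_scores = cross_val_score(Ridge(), X_demo, y_demo, cv=5, scoring='r2')

print('scores agree:', np.allclose(default_scores, explicit_scores))
```

Writing `cv` and `scoring` out makes it easier to see what the reported number means, and to swap in another metric later.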

### Tune Ridge regression model

In [None]:
# grid of candidate values for the hyperparameter alpha (regularisation strength)
parameters = {"alpha": np.logspace(-3,3,7)}
print('parameter grid:', parameters)
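
To see what the grid contains, the short sketch below builds the same array on its own. `np.logspace(-3,3,7)` returns seven values spaced evenly on a log scale, from 0.001 to 1000, each ten times larger than the previous one. The name `alphas` is only used for this illustration.

```python
import numpy as np

# 7 values from 10**-3 to 10**3, evenly spaced in the exponent
alphas = np.logspace(-3, 3, 7)
print(alphas)

# equivalent construction: 10 raised to the exponents -3, -2, ..., 3
same_alphas = 10.0 ** np.arange(-3, 4)
print(np.allclose(alphas, same_alphas))
```

A logarithmic grid is a common choice for regularisation strengths, because useful values can differ by several orders of magnitude.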

In [None]:
# create ridge model
model = Ridge()

In [None]:
from sklearn.model_selection import GridSearchCV

# perform grid search
grid_search = GridSearchCV(model, parameters)
grid_search.fit(X, y)

# print the best mean cross-validated R2 score
print('R2 score:',round(grid_search.best_score_,2))
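
Besides the best score, a fitted `GridSearchCV` stores the result for every candidate in `cv_results_`. The sketch below shows one way to view it as a table, which can help to judge how sensitive the model is to `alpha`. It uses synthetic data, and the names `X_demo`, `y_demo`, `demo_search` and `results` are illustrative assumptions, so the numbers will differ from those for the brain-volume dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# synthetic stand-in with 86 features (assumption: NOT the brain-volume data)
X_demo, y_demo = make_regression(n_samples=100, n_features=86,
                                 noise=10.0, random_state=0)

# same grid as in the tutorial cell
demo_search = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 7)})
demo_search.fit(X_demo, y_demo)

# one row per candidate alpha, with mean and spread of the fold scores
results = pd.DataFrame(demo_search.cv_results_)
print(results[["param_alpha", "mean_test_score",
               "std_test_score", "rank_test_score"]])
```

The row with `rank_test_score` equal to 1 corresponds to `best_score_` and `best_params_`.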

### Compare original and tuned alpha

In [None]:
# GridSearchCV tunes clones of the model, so model keeps its default alpha
print('Original alpha:', model.alpha)
print('Tuned alpha:', grid_search.best_estimator_.alpha)
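
The comparison works because `GridSearchCV` fits clones of the estimator it is given and, with the default `refit=True`, refits the best candidate on all the data as `best_estimator_`. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo`, `base_model` and `demo_search` are hypothetical names used only here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# synthetic stand-in with 86 features (assumption: NOT the brain-volume data)
X_demo, y_demo = make_regression(n_samples=100, n_features=86,
                                 noise=10.0, random_state=0)

base_model = Ridge()
demo_search = GridSearchCV(base_model, {"alpha": np.logspace(-3, 3, 7)})
demo_search.fit(X_demo, y_demo)

# the estimator passed in is left unfitted, with its default alpha
print('base alpha:', base_model.alpha)
print('base model fitted:', hasattr(base_model, 'coef_'))

# the tuned alpha is available from best_params_ and from best_estimator_
print('tuned alpha:', demo_search.best_params_['alpha'])
print('best estimator fitted:', hasattr(demo_search.best_estimator_, 'coef_'))
```

For predictions on new data, use `best_estimator_` (or call `predict` on the search object itself) rather than the original model.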