# Supervised Learning with Linear Regression
## Practical


Aims of this practical:

- Supervised learning approach
- Linear regression as an ML regressor
- Out-of-sample validation
- L1 and L2 regularisation
 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

Lasso regression is often described as more robust to outliers in the data than standard linear regression:
- Generate a simple data set with one explanatory variable, add some outliers, and test whether this is true

In [None]:
np.random.seed(46)
n_samples = 100
X = np.linspace(0, 1, n_samples)
y = 2*X + 0.7*np.random.randn(n_samples)

# add some outliers to the dataset
outliers_indices = [60, 65, 70, 75, 80, 45, 50, 61]
y[outliers_indices] = y[outliers_indices] + 16

Create a scatter plot with `plt.scatter` to visualise the data, including the outliers

In [None]:
# Your code here
plt.figure(figsize=(12, 12))
plt.scatter(X, y)
plt.show()


Use `LinearRegression` and run `.fit()` to find the optimal parameters for the standard linear regression.

- Remember to use `X.reshape(-1, 1)`: `sklearn` expects a 2-D array of shape `(n_samples, n_features)`, and we only have 1 explanatory variable.
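
The effect of the reshape can be checked on a small array. This is a minimal sketch; the array `x_demo` is made up for illustration and is not part of the practical's data.

```python
import numpy as np

# Hypothetical 1-D array standing in for a single explanatory variable
x_demo = np.linspace(0, 1, 5)
print(x_demo.shape)   # 1-D array with 5 entries

# -1 lets NumPy infer the number of rows; 1 fixes a single column
x_col = x_demo.reshape(-1, 1)
print(x_col.shape)    # 2-D array: 5 rows (samples), 1 column (feature)
```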

In [None]:
# Your code here
lr = LinearRegression()
lr.fit(X.reshape(-1, 1), y)


Create a scatter plot of the observations in the data and plot the predictions.
- Use `.predict()` to obtain the predictions for each observation in the data set.
- Remember to use `X.reshape(-1, 1)` because we only have 1 explanatory variable.

In [None]:
# Your code here
plt.figure(figsize=(12, 12))
plt.scatter(X, y)
plt.plot(X, lr.predict(X.reshape(-1, 1)), color='red')
plt.show()


We can see that the standard linear regression model is pulled towards the outliers.

We can apply Lasso regression with an additional penalty term to improve the fit of this model.
- Use `Lasso` from `sklearn` and run `.fit()` to obtain the optimal parameters.
- Set the regularisation term $\lambda$ with the parameter `alpha=0.1`
- Remember to use `X.reshape(-1, 1)` because we only have 1 explanatory variable.
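
For reference, the `sklearn` documentation gives the objective that `Lasso` minimises as the following, where $n$ is the number of samples, $w$ is the vector of coefficients and `alpha` plays the role of $\lambda$ (the intercept is not penalised):

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The first term is the usual least-squares loss; the second term is the L1 penalty, which grows with the absolute size of the coefficients.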


In [None]:
# Your code here
lasso = Lasso(alpha=0.1)
lasso.fit(X.reshape(-1, 1), y)


Let's visualise the results of both the linear and the lasso regression model predictions on the entire data set.

In [None]:
plt.figure(figsize=(12, 12))
plt.scatter(X, y)
plt.plot(X, lr.predict(X.reshape(-1, 1)), color='red', label='Linear Regression')
plt.plot(X, lasso.predict(X.reshape(-1, 1)), color='green', label='Lasso Regression')
plt.legend()
plt.show()

We can see that the Lasso regression line is less steep: the penalty shrinks the slope, so the outliers tilt the fit less than they do for standard linear regression.
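
One way to quantify the difference is to compare the fitted slopes and intercepts directly. The sketch below regenerates the same synthetic data so that it runs on its own; the names `X_demo`, `y_demo`, `lr_demo` and `lasso_demo` are illustrative and not part of the practical.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Recreate the synthetic data set with the same seed and outliers as above
np.random.seed(46)
n_samples = 100
X_demo = np.linspace(0, 1, n_samples)
y_demo = 2 * X_demo + 0.7 * np.random.randn(n_samples)
outliers_indices = [60, 65, 70, 75, 80, 45, 50, 61]
y_demo[outliers_indices] += 16

lr_demo = LinearRegression().fit(X_demo.reshape(-1, 1), y_demo)
lasso_demo = Lasso(alpha=0.1).fit(X_demo.reshape(-1, 1), y_demo)

# The data were generated with a slope of 2 before the outliers were added
print("Linear slope:", lr_demo.coef_[0], "intercept:", lr_demo.intercept_)
print("Lasso slope: ", lasso_demo.coef_[0], "intercept:", lasso_demo.intercept_)
```

Comparing both slopes against the value of 2 used to generate the data gives a rough sense of how much each fit is distorted by the outliers.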

Change the strength of the regularisation term $\lambda$ to `alpha=2`.
- What do you observe when you create the same scatter plot with the new Lasso regression model?

In [None]:
# Your code here
lasso = Lasso(alpha=2)
lasso.fit(X.reshape(-1, 1), y)


In [None]:
plt.figure(figsize=(12, 12))
plt.scatter(X, y)
plt.plot(X, lr.predict(X.reshape(-1, 1)), color='red', label='Linear Regression')
plt.plot(X, lasso.predict(X.reshape(-1, 1)), color='green', label='Lasso Regression')
plt.legend()
plt.show()

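The regularisation parameter $\lambda$ (`alpha` in `sklearn`) controls the strength of the regularisation applied: the larger it is, the more the slope is shrunk towards 0. With a sufficiently large value the slope becomes exactly 0 and the model predicts a constant, which shows up as a horizontal line.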
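
The effect of the penalty strength can be explored by fitting the model for several values of `alpha` and printing the resulting slope. This is a sketch only: the list of `alpha` values is an arbitrary choice for illustration, and the data are regenerated so the snippet is self-contained.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Same synthetic data as in the first part of the practical
np.random.seed(46)
n_samples = 100
X_demo = np.linspace(0, 1, n_samples)
y_demo = 2 * X_demo + 0.7 * np.random.randn(n_samples)
y_demo[[60, 65, 70, 75, 80, 45, 50, 61]] += 16

# Illustrative grid of penalty strengths, from weak to strong
alphas = [0.01, 0.1, 0.5, 2]
slopes = []
for alpha in alphas:
    model = Lasso(alpha=alpha).fit(X_demo.reshape(-1, 1), y_demo)
    slopes.append(model.coef_[0])
    print(f"alpha={alpha}: slope={model.coef_[0]:.3f}")
```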

For the next example we'll use the California Housing dataset, which is also available through `sklearn`, and apply linear, lasso and ridge regression. We will also evaluate these models based on the MSE on the test set.

#### Data information

- Number of observations: 20640
- Target variable: the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).
- Number of explanatory variables: 8 numeric, predictive attributes, plus the target

- Attribute information:
  - `MedInc`: median income in block group
  - `HouseAge`: median house age in block group
  - `AveRooms`: average number of rooms per household
  - `AveBedrms`: average number of bedrooms per household
  - `Population`: block group population
  - `AveOccup`: average number of household members
  - `Latitude`: block group latitude
  - `Longitude`: block group longitude

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

We've provided a copy of the dataset in the `data/` folder. Let's load it with pandas:

In [None]:
df = pd.read_csv("data/housing.csv")

You can run `.head()`, `.describe()` and `.info()` to get an overview of the data

In [None]:
df.head()

Create a train (80%) and test (20%) split
- We drop the target variable `Price` from the explanatory variables

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df.drop('Price', axis=1), 
                                                    df['Price'], test_size=0.2, random_state=42)

We can view the number of observations per split with `.shape`

In [None]:
print('Training input data Shape:', X_train.shape)
print('Training output data Shape:', y_train.shape)
print('Testing input data Shape:', X_test.shape)
print('Testing output data Shape:', y_test.shape)

Call `LinearRegression`, `Ridge` and `Lasso` and run `.fit()` for each model on the training set.
- Use the regularisation term $\lambda$ as parameter `alpha=1.0` for both Ridge and Lasso regression.

In [None]:
# Your code here
lr = LinearRegression()
lr.fit(X_train, y_train)

ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)


We can view the coefficients for the linear, ridge and lasso regression to compare
- We add the column names for the explanatory variables with `.columns`
- We add the model name as the index

In [None]:
pd.DataFrame([lr.coef_, ridge.coef_, lasso.coef_],
             index=['linear', 'ridge', 'lasso'],
             columns=df.drop('Price', axis=1).columns)

We can see that lasso regression shrunk `AveRooms`, `AveBedrms`, `AveOccup`, `Latitude` and `Longitude` all the way to 0.
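
The features that Lasso has removed can also be listed programmatically rather than read off the table. The sketch below uses a small made-up data set (the column names `signal`, `noise_a` and `noise_b` are hypothetical) in which only one column actually drives the target, so it runs without the housing file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Hypothetical data: only `signal` influences the target
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)),
                      columns=["signal", "noise_a", "noise_b"])
y_demo = 3 * X_demo["signal"] + rng.normal(scale=0.5, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# Pair each coefficient with its column name, then keep the exact zeros
coefs = pd.Series(lasso_demo.coef_, index=X_demo.columns)
dropped = coefs[coefs == 0].index.tolist()
print(coefs)
print("Shrunk to zero:", dropped)
```

The same two lines that build `coefs` and `dropped` should work with `lasso.coef_` and the housing column names.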

Let's create predictions for the test set for each model and calculate the $R^2$ score and the MSE
- We will use `mean_squared_error` and `r2_score` from `sklearn.metrics`
- Linear and ridge regression have been set up and all that's left is to add lasso regression
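
To see what the two metrics measure, they can be computed by hand on a tiny example and compared with the `sklearn` functions. The MSE is the mean of the squared residuals, and $R^2$ is 1 minus the ratio of the residual sum of squares to the total sum of squares. The arrays `y_true` and `y_hat` below are made-up values for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 1.0, 2.0, 4.0])
y_hat = np.array([2.5, 1.0, 2.0, 5.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))  # both 0.3125
print(r2_manual, r2_score(y_true, y_hat))             # both 0.75
```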

In [None]:
y_pred_lr = lr.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)
r2_lr = r2_score(y_test, y_pred_lr)

y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)

# Your code here 
y_pred_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)


Let's print out the results for each model

In [None]:
print("Linear Regression MSE: {:.3f}, R-squared: {:.3f}".format(mse_lr, r2_lr))
print("Ridge Regression MSE: {:.3f}, R-squared: {:.3f}".format(mse_ridge, r2_ridge))
print("Lasso Regression MSE: {:.3f}, R-squared: {:.3f}".format(mse_lasso, r2_lasso))

The Lasso regression with $\lambda=1.0$ did not perform as well on this data set as the other two models.

There is another regression technique that we can implement in `sklearn`:
- A random forest regression 
- This technique is based on lots of decision trees working together to get a prediction for the test set
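
To make the idea of trees working together concrete: in `sklearn`, the prediction of a `RandomForestRegressor` is the average of the predictions of its individual trees. The sketch below checks this on a small synthetic data set; the data, the number of trees and the names `X_demo`, `y_demo`, `rf_demo` and `X_new` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical non-linear data with two features
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(200, 2))
y_demo = np.sin(4 * X_demo[:, 0]) + X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# A small forest of 10 trees
rf_demo = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Predict with every tree separately, then average across trees
X_new = rng.uniform(0, 1, size=(5, 2))
tree_preds = np.array([tree.predict(X_new) for tree in rf_demo.estimators_])

print(tree_preds.mean(axis=0))   # average of the individual trees
print(rf_demo.predict(X_new))    # prediction of the forest
```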

In [None]:
from sklearn.ensemble import RandomForestRegressor

We can run the model with the exact same methods as linear regression
- we still use `.fit()` to train the model

In [None]:
# Instantiate the model (random_state is fixed so the results are reproducible)
rf = RandomForestRegressor(random_state=42)

# Train the model on training data
rf.fit(X_train, y_train)

We can run the model with the exact same functions as linear regression
- we still use `.predict()` to get the predictions for the test set

In [None]:
# Use the forest's predict method on the test data
y_pred_randomf = rf.predict(X_test)

We can evaluate the model with the exact same function as linear regression
- we still use `mean_squared_error()` to obtain the MSE for the test set

In [None]:
mse_randomf = mean_squared_error(y_test, y_pred_randomf)
print(mse_randomf)

The MSE is much lower for the random forest regression model than it is for the linear, ridge and lasso regression models.