# Machine Learning with Python

In [None]:
import matplotlib.pyplot as plt
import numpy as np

## 2.2 Regression

In this notebook we will discuss some examples of supervised learning algorithms applied to regression. 


### Example data

We will use the *diabetes* dataset as an example:


In [None]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
print(diabetes.DESCR)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, random_state=42)

### Linear Model

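Linear regression uses Ordinary Least Squares (OLS) to find the optimal model: it chooses the coefficients that minimise the sum of squared differences between the predictions and the observed targets.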
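
As a rough illustration of what the least-squares fit computes, the sketch below solves the same problem directly with `np.linalg.lstsq` and compares the result with `LinearRegression`. The names `A`, `w`, `coef_manual`, `intercept_manual` and `ols` are introduced here for illustration only; the data and split are the same as in this notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data and split as in the notebook.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

# Append a column of ones so that the last weight acts as the intercept.
A = np.hstack([X_train, np.ones((X_train.shape[0], 1))])

# Least-squares solution of A @ w ~ y_train (minimises the squared residuals).
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
coef_manual, intercept_manual = w[:-1], w[-1]

# Reference fit with scikit-learn.
ols = LinearRegression().fit(X_train, y_train)

print("Coefficients agree:", np.allclose(coef_manual, ols.coef_))
print("Intercepts agree:  ", np.isclose(intercept_manual, ols.intercept_))
```

Both approaches should agree up to floating-point error, since they minimise the same objective.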


In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Use only one feature: column 2, the body mass index ("bmi")
X_train_1 = X_train[:, np.newaxis, 2]
X_test_1 = X_test[:, np.newaxis, 2]

lm_1 = LinearRegression()
lm_1.fit(X_train_1, y_train)

In [None]:
y_pred = lm_1.predict(X_train_1)

print("Coefficients: \n", lm_1.coef_)

plt.scatter(X_train_1, y_train, c='m' )
plt.plot(X_train_1, y_pred, c="k")
plt.title("linear model + training data")
plt.xlabel(diabetes.feature_names[2])  # label must match the feature used (column 2)
plt.ylabel("target")
plt.show()

We evaluate on the test data:

In [None]:
y_pred = lm_1.predict(X_test_1)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))

plt.scatter(X_test_1, y_test, c='c')
plt.plot(X_test_1, y_pred, c='k')
plt.title("linear model + test data")
plt.xlabel(diabetes.feature_names[2])  # label must match the feature used (column 2)
plt.ylabel("target")
plt.show()
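
To make the two metrics used above more concrete, here is a minimal sketch that computes them by hand and compares them with `mean_squared_error` and `r2_score`. The arrays `y_true` and `y_hat` are small made-up values for illustration, not results from the diabetes data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small made-up example values (not taken from the diabetes data).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# MSE: the average squared residual.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares) / (total sum of squares around the mean).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))

# A model that always predicts the mean of the targets scores R^2 = 0.
r2_baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
print(r2_baseline)
```

This also gives a reference point for reading the scores in this notebook: an r2 near 0 means the model is doing no better than predicting the mean target.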

We can improve the model performance by including all the features: 

In [None]:
lm = LinearRegression()
lm.fit(X_train, y_train)

In [None]:
print("Coefficients: \n", lm.coef_)

In [None]:
y_pred = lm.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))


### Regularisation

OLS places no restrictions on the values of the coefficients. 

However, to avoid overfitting we may prefer to find a model that still fits the training data well, but has coefficients that are (mostly) small. This idea is called *regularisation*.


In [*ridge regression*](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification), we add an L2 penalty (the sum of the squared coefficients, scaled by `alpha`) to the squared error.
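
The sketch below illustrates the penalised objective under the assumption that the intercept is left unpenalised, which is handled here by centring the data. It solves the closed-form ridge equations with NumPy, compares the result with `Ridge`, and then prints how the coefficient norm changes with `alpha`. The names `w_manual`, `ridge_ref` and `norms`, and the list of `alpha` values, are chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Same data and split as in the notebook.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

alpha = 0.1  # same penalty strength as used in this notebook

# Centre the data so that the intercept is not penalised.
X_c = X_train - X_train.mean(axis=0)
y_c = y_train - y_train.mean()

# Closed-form minimiser of ||y_c - X_c w||^2 + alpha * ||w||^2.
n_features = X_c.shape[1]
w_manual = np.linalg.solve(X_c.T @ X_c + alpha * np.eye(n_features), X_c.T @ y_c)

ridge_ref = Ridge(alpha=alpha).fit(X_train, y_train)
print("Matches Ridge:", np.allclose(w_manual, ridge_ref.coef_))

# A larger alpha shrinks the coefficient vector towards zero.
norms = []
for a in [0.01, 0.1, 1.0, 10.0]:
    coef = Ridge(alpha=a).fit(X_train, y_train).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={a:<5} ||w||_2 = {norms[-1]:.1f}")
```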

In [None]:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.1)
ridge.fit(X_train,y_train)

In [None]:
print("Coefficients: \n", ridge.coef_)

In [None]:
y_pred = ridge.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))



In [*LASSO regression*](https://scikit-learn.org/stable/modules/linear_model.html#lasso), we use an L1 penalty instead (the sum of the absolute values of the coefficients). This tends to set some coefficients to exactly 0, so the corresponding features can be removed from the model. LASSO can therefore be used as an automated form of feature selection.
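
As a hedged sketch of the feature-selection view, the code below lists which features keep a non-zero coefficient after a LASSO fit, and counts the non-zero coefficients for a few penalty strengths. The names `lasso_demo`, `selected`, `dropped` and `n_nonzero`, and the chosen `alpha` values, are illustrative assumptions; the last value is deliberately large enough to remove every feature.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Same data and split as in the notebook.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

lasso_demo = Lasso(alpha=0.5).fit(X_train, y_train)

# Features with a non-zero coefficient are the ones the model still uses.
kept = lasso_demo.coef_ != 0
selected = [name for name, keep in zip(diabetes.feature_names, kept) if keep]
dropped = [name for name, keep in zip(diabetes.feature_names, kept) if not keep]
print("Selected features:", selected)
print("Dropped features: ", dropped)

# Count the non-zero coefficients for several penalty strengths.
n_nonzero = []
for a in [0.1, 0.5, 1.0, 10.0]:
    coef = Lasso(alpha=a).fit(X_train, y_train).coef_
    n_nonzero.append(int(np.count_nonzero(coef)))
    print(f"alpha={a:<5} non-zero coefficients: {n_nonzero[-1]}")
```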

In [None]:
from sklearn.linear_model import Lasso
lasso = Lasso(alpha=0.5)
lasso.fit(X_train,y_train)

In [None]:
print("Coefficients: \n", lasso.coef_)

In [None]:
y_pred = lasso.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))


### Nonlinear regression

Note that many supervised learning algorithms can be used for both classification and regression with only minor adaptations. The following subsections apply several of them to the diabetes data.
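
As a small illustration of such an adaptation, the sketch below fits a k-nearest-neighbours classifier and regressor on the same inputs. The classifier takes a majority vote over the neighbours' labels, while the regressor averages their target values. The toy arrays `X_toy`, `y_class`, `y_value` and the query point `x_new` are made up for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Tiny made-up one-dimensional data set (illustrative values only).
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])               # discrete labels
y_value = np.array([1.0, 1.5, 2.0, 8.0, 9.0, 10.0])  # continuous targets

x_new = np.array([[1.2]])

# Classification: majority vote among the 3 nearest neighbours.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_class)
# Regression: mean target value of the same 3 nearest neighbours.
reg = KNeighborsRegressor(n_neighbors=3).fit(X_toy, y_value)

# The same regression prediction computed by hand.
nearest = np.argsort(np.abs(X_toy[:, 0] - x_new[0, 0]))[:3]
manual_mean = y_value[nearest].mean()

print("Predicted class:", clf.predict(x_new)[0])
print("Predicted value:", reg.predict(x_new)[0], "manual:", manual_mean)
```

Both estimators share the same `fit` and `predict` interface; only the way the neighbours' targets are combined differs.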

### k-Nearest Neighbours

In [None]:
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor()
knn.fit(X_train,y_train)

In [None]:
y_pred = knn.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))


### Decision tree

In [None]:
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train,y_train)

In [None]:
y_pred = tree.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))


### Random forest

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=100,random_state=0)
rf.fit(X_train,y_train)

In [None]:
y_pred = rf.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))



### Neural network

In [None]:
from sklearn.neural_network import MLPRegressor
nn = MLPRegressor(hidden_layer_sizes=(100,), max_iter=10000)  # (100,) is a one-element tuple: one hidden layer of 100 units
nn.fit(X_train,y_train)

In [None]:
y_pred = nn.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))



### Support Vector Machine

In [None]:
from sklearn.svm import SVR
svr = SVR(kernel='linear')
svr.fit(X_train,y_train)

In [None]:
y_pred = svr.predict(X_test)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))



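Performance here is unexpectedly bad - this is no better than using the mean target value for all predictions.

The explanation is that SVMs are very sensitive to the scale of the features. The diabetes features are already mean-centred, but they are scaled to a very small standard deviation, as the next two cells show.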

In [None]:
np.mean(X_train,axis=0)

In [None]:
np.std(X_train,axis=0)

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
np.mean(X_train_scaled,axis=0)

In [None]:
np.std(X_train_scaled,axis=0)

In [None]:
svr2 = SVR(kernel='linear')
svr2.fit(X_train_scaled,y_train)

In [None]:
y_pred = svr2.predict(X_test_scaled)

# The mean squared error
print("Mean squared error, MSE = %.2f" % mean_squared_error(y_test, y_pred))
# The coefficient of determination: 1 is perfect prediction
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred))
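
The scaling and fitting steps above can also be combined into a single estimator. The sketch below uses `make_pipeline` so that the scaler is fitted on the training data only and then applied automatically at prediction time. The names `svr_pipe` and `y_pred_pipe` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same data and split as in the notebook.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, random_state=42
)

# Standardise the features, then fit a linear-kernel SVR on the scaled data.
svr_pipe = make_pipeline(StandardScaler(), SVR(kernel="linear"))
svr_pipe.fit(X_train, y_train)

# predict() applies the training-set scaling to X_test before the SVR.
y_pred_pipe = svr_pipe.predict(X_test)
print("Coefficient of determination, r2 = %.2f" % r2_score(y_test, y_pred_pipe))

# The scaler inside the pipeline has only seen the training data.
fitted_scaler = svr_pipe.named_steps["standardscaler"]
print(np.allclose(fitted_scaler.mean_, X_train.mean(axis=0)))
```

Keeping the scaler inside the pipeline also avoids accidentally fitting it on the test data.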


## Exercise

Train a regressor of your choice on the `wine-quality-white` dataset.

In [None]:
from sklearn.datasets import fetch_openml
w = fetch_openml(name='wine-quality-white',version=1)

Evaluate your model on the test data.

Does your model do better than Ordinary Least Squares?