This is a simple notebook to:

1. Generate linear data with some (non-Gaussian) scatter, and do linear fits with different loss functions.

2. Implement some simple flavors of Gradient Descent.

Author: Viviana Acquaviva

License: [BSD-3-clause](https://opensource.org/license/bsd-3-clause/).

In [None]:
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib
import matplotlib.pyplot as plt
import sklearn
from sklearn import metrics
from sklearn.model_selection import train_test_split, cross_validate, cross_val_predict
from sklearn.model_selection import KFold
from sklearn import linear_model #New!

%matplotlib inline

font = {'size'   : 16}
matplotlib.rc('font', **font)
matplotlib.rc('xtick', labelsize=14) 
matplotlib.rc('ytick', labelsize=14) 
matplotlib.rcParams.update({'figure.autolayout': False})
matplotlib.rcParams['figure.dpi'] = 300

#### We begin by generating some data.

In [None]:
np.random.seed(16) #set seed for reproducibility purposes

x = np.arange(100) 

yp = 3*x + 3 + 5*(np.random.poisson(3*x+3,100)-(3*x+3)) #linear model plus Poisson scatter: draws with expected value 3x+3,
                                                    #shifted to be centered around 0 and scaled by 5

In [None]:
#Let's take a look!

plt.scatter(x, yp);

#### Here comes the linear regression model ;) 

In [None]:
model = linear_model.LinearRegression()

In [None]:
model

I can fit the model (right now, I will do it using the entire data set just to compare with the analytic solution). When only one predictor is present, I need to reshape it to column form, because scikit-learn expects the input to have shape (number of instances, number of features).

In [None]:
x.shape

In [None]:
x.reshape(-1,1).shape

In [None]:
model.fit(x.reshape(-1,1),yp) 
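
As a quick illustration of why the reshape is needed, here is a minimal sketch on a small made-up array. The names `x_demo`, `y_demo`, and `demo_model` are hypothetical and not part of this notebook. It shows that a 1D input is rejected, while the column form is accepted.

```python
import numpy as np
from sklearn import linear_model

x_demo = np.arange(5)            # 1D array, shape (5,)
y_demo = 2 * x_demo + 1          # toy target lying exactly on a line

X_col = x_demo.reshape(-1, 1)    # column form, shape (5, 1): 5 instances, 1 feature
X_alt = x_demo[:, np.newaxis]    # an equivalent way to add the feature axis

demo_model = linear_model.LinearRegression()

try:
    demo_model.fit(x_demo, y_demo)   # 1D input: scikit-learn raises a ValueError
    raised = False
except ValueError:
    raised = True

demo_model.fit(X_col, y_demo)        # 2D input works
print(X_col.shape, raised, demo_model.coef_, demo_model.intercept_)
```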

The fitted model has the attributes "coef_" (an array with one slope per feature) and "intercept_".

In [None]:
slope, intercept = model.coef_, model.intercept_

In [None]:
print(slope, intercept)

We can plot the data, the fitted line, and the true regression line.

In [None]:
plt.figure(figsize = (10,6))
plt.scatter(x,yp, s = 20, c = 'gray', label = 'Data')
plt.plot(x, slope*x + intercept , c ='k', label = 'Ordinary least squares fit')
plt.plot(x, 3*x + 3, c = 'r', label = 'True regression line')
plt.legend(fontsize = 14)
plt.xlabel('X')
plt.ylabel('Y')

What are the analytic predictions for the coefficients?

In [None]:
#Predictions - fill in the analytic formula

theta1 = 

theta0 = 

In [None]:
print('Theta_0, Theta_1:', theta0, theta1)

#### We can (and should!) do cross validation and all the nice things we have learned to do for classification problems.

In [None]:
cv = KFold(n_splits = 5 , shuffle = True , random_state = 10)

In [None]:
scores = cross_validate(model, x.reshape(-1,1), yp, cv = cv, return_train_score = True)

In [None]:
scores

In [None]:
print('{:.3f}'.format(scores['test_score'].mean()), '{:.3f}'.format(scores['test_score'].std()))
print('{:.3f}'.format(scores['train_score'].mean()), '{:.3f}'.format(scores['train_score'].std()))

### Questions: 

- What are the scores that are being printed out? 

- How good are the scores? 

- Does the model suffer from high variance? High bias? 

- What would happen to the scores if we increased the scatter (noise)?


### <font color='green'> Scoring in regression problems. </font>

### Here is a way to list all the available scorers.

In [None]:
print(sorted(sklearn.metrics.get_scorer_names())) #sklearn.metrics.SCORERS was removed in recent scikit-learn versions

### Do you recognize some of them?

Let's see if we can find the MSE.

In [None]:
scores = cross_validate(model, x.reshape(-1,1), yp, cv = cv, scoring = 'neg_mean_squared_error', return_train_score = True)

In [None]:
print('{:.3f}'.format(scores['test_score'].mean()), '{:.3f}'.format(scores['test_score'].std()))
print('{:.3f}'.format(scores['train_score'].mean()), '{:.3f}'.format(scores['train_score'].std()))
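
The `neg_mean_squared_error` scorer returns the MSE with the sign flipped, so that larger is always better. The sketch below, on made-up data with Gaussian scatter (an assumption, as are all the `_demo` names), shows one way to recover the MSE and the RMSE per fold.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(0)
x_demo = np.arange(50)
y_demo = 3 * x_demo + 3 + rng.normal(0, 5, size=50)   # toy data, not the notebook's data

cv_demo = KFold(n_splits=5, shuffle=True, random_state=10)
scores_demo = cross_validate(linear_model.LinearRegression(),
                             x_demo.reshape(-1, 1), y_demo,
                             cv=cv_demo, scoring='neg_mean_squared_error')

mse_per_fold = -scores_demo['test_score']   # flip the sign to recover the MSE
rmse_per_fold = np.sqrt(mse_per_fold)       # same units as y
print(mse_per_fold.mean(), rmse_per_fold.mean())
```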

### Custom scores

We might prefer a scorer based on percentage error instead. Here is how to build a custom scorer:

In [None]:
from sklearn.metrics import make_scorer

In [None]:
def mape(true,pred): #Modified (symmetric-style) Mean Absolute Percentage Error: the error is divided by the mean of true and predicted values
    return np.mean(np.abs((true-pred)/(0.5*(true+pred))))

mape_scorer = make_scorer(mape, greater_is_better = False)

In [None]:
scores = cross_validate(model, x.reshape(-1,1), yp, cv = cv, scoring = mape_scorer, return_train_score = True)

In [None]:
scores

In [None]:
print('{:.3f}'.format(scores['test_score'].mean()), '{:.3f}'.format(scores['test_score'].std()))
print('{:.3f}'.format(scores['train_score'].mean()), '{:.3f}'.format(scores['train_score'].std()))
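
With `greater_is_better = False`, `make_scorer` returns the metric with its sign flipped, which is why the scores printed for an error metric come out negative. Here is a minimal sketch of that behavior, using a hypothetical metric `mae_demo` and made-up data rather than this notebook's variables.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import make_scorer

def mae_demo(true, pred):   # hypothetical example metric: mean absolute error
    return np.mean(np.abs(true - pred))

mae_scorer_demo = make_scorer(mae_demo, greater_is_better=False)

x_demo = np.arange(10).reshape(-1, 1)
y_demo = np.array([1., 3., 2., 5., 4., 7., 6., 9., 8., 11.])   # toy data

demo_model = linear_model.LinearRegression().fit(x_demo, y_demo)

raw_value = mae_demo(y_demo, demo_model.predict(x_demo))      # positive error
scorer_value = mae_scorer_demo(demo_model, x_demo, y_demo)    # same number, sign flipped
print(raw_value, scorer_value)
```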

#### Note: as we already discussed, so far we have changed neither the loss function (MSE) nor, as a consequence, the coefficients of the model. We have only looked at different evaluation metrics.

#### <font color = 'green'> Question 1: would the best fit line change if we optimize a different loss function? </font>

#### <font color = 'green'> Question 2: How can we implement that without an analytic solution? </font>

#### We can add some outliers to our data to make them more interesting.

In [None]:
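np.random.seed(12) #set seed for reproducibility
out = np.random.choice(100,15) #select 15 outlier indices (sampled with replacement, so duplicates are possible)
yp_wo = np.copy(yp)
np.random.seed(12) #set the seed again before drawing the outlier amplitudes
yp_wo[out] = yp_wo[out] + 5*np.random.rand(15)*yp[out]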

In [None]:
plt.scatter(x,yp_wo, label = 'Data + outliers')
plt.scatter(x,yp, label = 'Original data')
plt.legend();

We can see the effect for the MSE loss right away:

In [None]:
model.fit(x.reshape(-1,1),yp_wo)

slope, intercept = model.coef_, model.intercept_

print(slope, intercept)

### Let's now implement the simplest forms of gradient descent: batch, stochastic, and mini-batch, one by one.

First, we add the bias term (a constant feature of value = 1); this is merely to write the prediction of the linear model in matrix multiplication form.

In [None]:
X = np.c_[np.ones((100, 1)), x] # add x0 = 1 to each instance; this is the bias term

print(X.shape) #shape is number of instances x number of parameters
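
To see that the matrix form gives the same predictions as the usual "intercept plus slope times x" expression, here is a small sketch with made-up coefficients. `theta_demo` and the other `_demo` names are hypothetical.

```python
import numpy as np

x_demo = np.arange(5)
X_demo = np.c_[np.ones((5, 1)), x_demo]   # columns: [1, x]
theta_demo = np.array([[3.0], [2.0]])     # hypothetical [[theta_0], [theta_1]]

pred_matrix = X_demo.dot(theta_demo)                         # shape (5, 1)
pred_direct = theta_demo[0, 0] + theta_demo[1, 0] * x_demo   # shape (5,)
print(pred_matrix.ravel(), pred_direct)
```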


Then, we save the coefficients obtained from the normal equation, together with the corresponding loss.

In [None]:
theta_ne = np.array([[...],[...]])

In [None]:
loss_ne = np.mean((X.dot(theta_ne) - yp_wo.reshape(-1,1))**2)

In [None]:
loss_ne
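
For reference, the normal equation can be sketched on a small toy data set as follows. This is one possible way to compute it, not necessarily the one intended for the exercise. All `_demo` names and the data are made up, and the result is cross-checked against NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
x_demo = np.linspace(0, 10, 30)
y_demo = 3 * x_demo + 3 + rng.normal(0, 1, size=30)   # toy data, not the notebook's data

X_demo = np.c_[np.ones((30, 1)), x_demo]   # add the bias column
y_col = y_demo.reshape(-1, 1)

# Normal equation: solve (X^T X) theta = X^T y instead of inverting X^T X explicitly
theta_demo = np.linalg.solve(X_demo.T.dot(X_demo), X_demo.T.dot(y_col))

# Cross-check against NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X_demo, y_col, rcond=None)

loss_demo = np.mean((X_demo.dot(theta_demo) - y_col) ** 2)   # MSE at the solution
print(theta_demo.ravel(), loss_demo)
```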

### Batch GD

Fill in the gaps!

In [None]:
np.random.seed(10) #same initial conditions for all

eta = 
n_iterations = 
N = 100 #number of points

theta_path_bgd = []

theta = #initialize how you like it!

for iteration in range(n_iterations):
    gradients = 
    theta = 
    theta_path_bgd.append()

theta_path_bgd = np.array(theta_path_bgd) #save the path

theta_bgd = theta #final result

#### Questions

Are you finding the same coefficients and loss?

What is the percentage difference with the loss derived from the normal equation?

#### If you have time, you can also implement stochastic and/or mini-batch GD.
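
As a starting point, here is a minimal sketch of mini-batch GD for the MSE loss. It is one possible approach under stated assumptions, not a reference solution. It uses made-up data with x rescaled to [0, 1], so that a fairly large learning rate is stable. The values of `eta_demo`, `n_epochs`, and `batch_size` are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(10)

# Toy data (assumption): x rescaled to [0, 1], small Gaussian scatter
x_demo = np.linspace(0, 1, 100)
y_demo = (3 * x_demo + 3 + rng.normal(0, 0.05, size=100)).reshape(-1, 1)
X_demo = np.c_[np.ones((100, 1)), x_demo]   # add the bias column

eta_demo = 0.1        # learning rate (hypothetical choice)
n_epochs = 500        # passes over the data (hypothetical choice)
batch_size = 10       # batch_size = 1 gives stochastic GD, batch_size = 100 gives batch GD
N_demo = 100

theta_mb = rng.normal(size=(2, 1))   # random initialization
theta_path_mb = []

for epoch in range(n_epochs):
    order = rng.permutation(N_demo)              # reshuffle the data every epoch
    for start in range(0, N_demo, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X_demo[idx], y_demo[idx]
        # gradient of the MSE loss, evaluated on the mini-batch only
        gradients = 2 / len(idx) * Xb.T.dot(Xb.dot(theta_mb) - yb)
        theta_mb = theta_mb - eta_demo * gradients
        theta_path_mb.append(theta_mb.ravel().copy())

theta_path_mb = np.array(theta_path_mb)   # save the path
print(theta_mb.ravel())
```

With the unscaled x used in this notebook (values up to 99), a much smaller learning rate would be needed for the updates to remain stable.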