# Linear Regression

- `DummyRegressor` creates a baseline for regression.
    - It makes predictions according to the chosen strategy.
    - The strategy is based on a statistical property of the training set or on a user-specified value.
    - Strategy:
        - mean
        - median
        - quantile
        - constant

In [6]:
# Placeholder lists so that the cells below can reference these names; replace them with real data

X_train, X_test, y_train, y_test = [], [], [], []

In [None]:
from sklearn.dummy import DummyRegressor

dummy_regr = DummyRegressor(strategy='mean')
dummy_regr.fit(X_train, y_train)
dummy_regr.predict(X_test)
dummy_regr.score(X_test, y_test)
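
As a minimal sketch of how the four strategies differ, the cell below fits one `DummyRegressor` per strategy on a small made-up dataset. The arrays `X_toy` and `y_toy`, the quantile of 0.25 and the constant of 0.0 are illustrative assumptions, not values from these notes.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Hypothetical toy data: DummyRegressor ignores the feature values
X_toy = np.arange(5).reshape(-1, 1)
y_toy = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

baselines = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile": DummyRegressor(strategy="quantile", quantile=0.25),
    "constant": DummyRegressor(strategy="constant", constant=0.0),
}

baseline_preds = {}
for name, model in baselines.items():
    model.fit(X_toy, y_toy)
    # Every sample receives the same prediction, so one value is enough
    baseline_preds[name] = float(model.predict(X_toy[:1])[0])
```

Any real model should beat these baselines; if it does not, the features probably carry little usable signal.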

### How is Linear Regression Model Trained?

_Step 1_: Instantiate an object of a suitable linear regression estimator, using one of the following two options:
- Normal equation
- Iterative optimisation

In [4]:
# Normal Equation

from sklearn.linear_model import LinearRegression
linear_regr = LinearRegression()

In [None]:
# Iterative Optimisation

from sklearn.linear_model import SGDRegressor
linear_regr = SGDRegressor()

_Step 2_: Call the `fit` method on the linear regression object with the training feature matrix and label vector as arguments.

In [None]:
from sklearn.linear_model import LinearRegression
# fit must be called on an instance, not on the class itself
linear_regr = LinearRegression()
linear_regr.fit(X_train, y_train)

## SGDRegressor Estimator

- Implements stochastic gradient descent.
- Suited to large training sets (> 10k samples).
- Provides greater control over the optimization process through its hyperparameters.
- Loss parameters:
    - `loss = 'squared_error'`
    - `loss = 'huber'`
- Penalty parameters:
    - `penalty = 'l1'`
    - `penalty = 'l2'`
    - `penalty = 'elasticnet'`
- Learning rate:
    - `learning_rate = 'constant'`
    - `learning_rate = 'optimal'`
    - `learning_rate = 'invscaling'`
    - `learning_rate = 'adaptive'`
- Stopping:
    - `early_stopping = True`
    - `early_stopping = False`

- It is good practice to set `random_state` so that results are reproducible.
    - `random_state = 42`

### How to perform feature scaling for SGDRegressor?

SGD is sensitive to feature scaling, so it is highly recommended to scale the input feature matrix.

In [None]:
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

sgd = Pipeline([
    ('feature_scaling', StandardScaler()),
    ('sgd_regressor', SGDRegressor())
])

sgd.fit(X_train, y_train)
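
A small sketch of the same pipeline on synthetic data may make the effect of scaling concrete. The data (`X_demo`, `y_demo`), with two features on very different scales, is a made-up assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: two features on very different scales
X_demo = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y_demo = 3.0 * X_demo[:, 0] + 0.002 * X_demo[:, 1] + 1.0

scaled_sgd = Pipeline([
    ("feature_scaling", StandardScaler()),
    ("sgd_regressor", SGDRegressor(random_state=42)),
])
scaled_sgd.fit(X_demo, y_demo)

# After scaling, each column has mean ~0 and standard deviation ~1
X_scaled = scaled_sgd.named_steps["feature_scaling"].transform(X_demo)
r2_scaled = scaled_sgd.score(X_demo, y_demo)
```

Because the scaler is a pipeline step, the statistics learned during `fit` are reused automatically at prediction time.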

### How to shuffle training data after each epoch in SGDRegressor?

In [8]:
from sklearn.linear_model import SGDRegressor
linear_regr = SGDRegressor(shuffle=True)

### How to set the learning rate in SGDRegressor?

In [None]:
from sklearn.linear_model import SGDRegressor
# Defaults: learning_rate='invscaling', eta0=0.01, power_t=0.25
linear_regr = SGDRegressor(random_state=42)

### How to set a constant learning rate?

In [None]:
from sklearn.linear_model import SGDRegressor
linear_regr = SGDRegressor(learning_rate='constant', eta0=1e-2)

### How to set an adaptive learning rate?

In [9]:
from sklearn.linear_model import SGDRegressor
linear_regr = SGDRegressor(learning_rate='adaptive', eta0=1e-2)

### How to set #epochs in SGDRegressor?

- Set `max_iter` to the desired #epochs. The default value is 1000.
- Remember that one epoch is one full pass over the training data.
- SGD converges after observing approximately $10^6$ training samples. Thus, a reasonable first guess for the number of iterations for a training set of n samples is:
    - `max_iter = np.ceil(10**6 / n)`
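
The rule of thumb above can be sketched as a small helper. The function name `suggested_max_iter` and the two sample counts are illustrative assumptions. Note that Python writes powers as `**`; `^` is bitwise XOR.

```python
import numpy as np

def suggested_max_iter(n_samples: int) -> int:
    """First guess for max_iter: ceil(10**6 / n_samples)."""
    return int(np.ceil(10**6 / n_samples))

epochs_small = suggested_max_iter(20_000)    # 10**6 / 20_000 = 50
epochs_large = suggested_max_iter(300_000)   # 10**6 / 300_000 = 3.33..., rounded up
```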

### How to set stopping criteria in SGDRegressor?

- Option 1:
    - `tol`, `n_iter_no_change`, `max_iter`
- Option 2:
    - `early_stopping`, `validation_fraction`

In [10]:
from sklearn.linear_model import SGDRegressor

linear_regr = SGDRegressor(loss='squared_error', max_iter=500, tol= 3e-3, n_iter_no_change=5)

In [11]:
from sklearn.linear_model import SGDRegressor

linear_regr = SGDRegressor(loss='squared_error', early_stopping=True, validation_fraction=0.2, max_iter=500, tol= 3e-3, n_iter_no_change=5)
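
To see which stopping rule fired, one option is to inspect `n_iter_` after fitting. The sketch below reuses the settings from the cell above on made-up data (`X_demo`, `y_demo` are assumptions).

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))   # hypothetical, already standardized
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)

sgd_early = SGDRegressor(
    early_stopping=True, validation_fraction=0.2,
    max_iter=500, tol=3e-3, n_iter_no_change=5, random_state=42,
)
sgd_early.fit(X_demo, y_demo)

# n_iter_ is the number of epochs actually run before training stopped
epochs_run = sgd_early.n_iter_
```

If `epochs_run` equals `max_iter`, training hit the epoch limit rather than the tolerance-based criterion.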

### How to use different loss functions in SGDRegressor?

In [12]:
from sklearn.linear_model import SGDRegressor

linear_regr = SGDRegressor(loss='squared_error')

### How to use averaged SGD?

Averaged SGD replaces the weight vector with the average of the weights from previous updates.

- Option #1: Set `average=True` to average across all updates.
- Option #2: Set `average` to an int value.
    - Averaging begins once the total number of samples seen reaches `average`.
    - Setting `average=10` starts averaging after seeing 10 samples.
    - Averaged SGD works best with a larger number of features and a higher `eta0`.

In [15]:
#option1
from sklearn.linear_model import SGDRegressor
linear_regr = SGDRegressor(average=True)

#option2
from sklearn.linear_model import SGDRegressor
linear_regr = SGDRegressor(average=10)


### How do we initialize SGD with weight vector of the previous run?

Set `warm_start=True`.

In [18]:
from sklearn.linear_model import SGDRegressor
import numpy as np
linear_regressor = SGDRegressor(warm_start=True)

In [21]:
# from sklearn.metrics import mean_squared_error
#
# sgd_reg = SGDRegressor(
#     max_iter=1,
#     tol=None,  # disable the tol-based stopping check
#     warm_start=True,
#     penalty=None,
#     learning_rate="constant",
#     eta0=0.0005
# )

# val_errors = []
# for epoch in range(1000):
#     sgd_reg.fit(X_train, y_train)  # continues from the previous weights
#     val_pred = sgd_reg.predict(X_val)
#     val_errors.append(mean_squared_error(y_val, val_pred))
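
A runnable sketch of the warm-start loop, under stated assumptions: the data is synthetic, the validation split comes from `train_test_split`, the loop runs 50 epochs instead of 1000, and `tol=None` is used to switch off the tolerance-based stopping check.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 1))   # hypothetical data
y_all = 2.0 * X_all[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
X_tr, X_val, y_tr, y_val = train_test_split(X_all, y_all, random_state=42)

sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,
                       learning_rate="constant", eta0=0.0005, random_state=42)

val_errors = []
for epoch in range(50):
    sgd_reg.fit(X_tr, y_tr)   # one epoch, starting from the previous weights
    val_errors.append(mean_squared_error(y_val, sgd_reg.predict(X_val)))
```

Recording the validation error per epoch this way makes it possible to pick the epoch with the lowest error afterwards.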



## Model Inspection

### Accessing the Weights of a Trained Linear Regression Model

For a linear regression model, the prediction is given by:

$$
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_m x_m = \mathbf{w}^T \mathbf{x}
$$

---

### Model Weights (Coefficients)

The weights  

$$
w_1, w_2, \dots, w_m
$$

are stored in the **`coef_`** attribute of the trained model.

```python
linear_regressor.coef_
```

The intercept $w_0$ is stored in the **`intercept_`** attribute.

## Model inference

### How to make predictions on new data in Linear Regression model?

- **Step 1:** Arrange data for prediction in a **feature matrix** of shape `(#samples, #features)` or in sparse matrix format.

- **Step 2:** Call the `predict` method on the **linear regression object** with the **feature matrix** as an argument.

```python
# Predict labels for feature matrix X_test
linear_regressor.predict(X_test)
```
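
A minimal end-to-end sketch of inspection and inference, assuming noise-free made-up data generated from $y = 1 + 2x_1 + 3x_2$, so the fitted weights can be checked by eye. The names `X_demo`, `y_demo`, `demo_regressor` and `X_new` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 3.0 * X_demo[:, 1]

demo_regressor = LinearRegression().fit(X_demo, y_demo)

weights = demo_regressor.coef_      # w_1, ..., w_m
bias = demo_regressor.intercept_    # w_0

# New samples must also have shape (#samples, #features)
X_new = np.array([[3.0, 2.0]])
y_new = demo_regressor.predict(X_new)   # 1 + 2*3 + 3*2 = 13
```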

## Model Evaluation

### General Steps in model evaluation

- STEP 1: Split data into train and test
- STEP 2: Fit linear regression estimator on training set.
- STEP 3: Calculate training error (a.k.a. empirical error)
- STEP 4: Calculate test error (a.k.a. generalization error)

In [25]:
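from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = [[4], [5], [4], [3], [6]]  # feature matrix: one row per sample
y = [3, 4, 5, 2, 4]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit on the training split; predictions must match y_test in length
linear_regr = LinearRegression().fit(X_train, y_train)
y_predicted = linear_regr.predict(X_test)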

### How to evaluate a trained Linear Regression model?
- Use the `score` method on the linear regression object.
- The score is $R^2$, the coefficient of determination: $R^2 = 1 - \frac{u}{v}$
    - $u = (\mathbf{X}\mathbf{w} - \mathbf{y})^T (\mathbf{X}\mathbf{w} - \mathbf{y})$ is the residual sum of squares.
    - $v = (\mathbf{y} - \bar{y})^T (\mathbf{y} - \bar{y})$ is the total sum of squares, where $\bar{y}$ is the mean of $\mathbf{y}$.
- The best possible $R^2$ is 1.0.
- A constant model that always predicts the expected value of y would get a score of 0.0.
- The score can be negative because the model can be arbitrarily worse than that constant model.

In [None]:
linear_regr.score(X_test, y_test)
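
As a sanity check of the definition, the sketch below computes $R^2$ by hand from the residual and total sums of squares and compares it with `score`. The synthetic data and the names `u`, `v`, `r2_manual` and `r2_sklearn` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))   # hypothetical data
y_demo = X_demo @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.3, size=50)

model = LinearRegression().fit(X_demo, y_demo)
y_hat = model.predict(X_demo)

u = np.sum((y_demo - y_hat) ** 2)           # residual sum of squares
v = np.sum((y_demo - y_demo.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - u / v
r2_sklearn = model.score(X_demo, y_demo)
```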

### Evaluation Metrics

sklearn provides a bunch of regression metrics to evaluate performance of the trained estimator on the evaluation set.

`mean_absolute_error`

In [None]:
from sklearn.metrics import mean_absolute_error
eval_score = mean_absolute_error(y_test, y_predicted)

`mean_squared_error`

In [None]:
from sklearn.metrics import mean_squared_error
eval_score = mean_squared_error(y_test, y_predicted)

`mean_squared_log_error`

In [None]:
from sklearn.metrics import mean_squared_log_error
eval_score = mean_squared_log_error(y_test, y_predicted)

`mean_absolute_percentage_error`

In [None]:
from sklearn.metrics import mean_absolute_percentage_error
eval_score = mean_absolute_percentage_error(y_test, y_predicted)

### How to evaluate regression model on worst case error?

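Use `max_error`.
The worst-case error on the training set can be calculated as follows.

In [None]:
from sklearn.metrics import max_error
# Compare training labels with predictions on the training set (same length)
train_error = max_error(y_train, linear_regr.predict(X_train))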
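
A short worked example may help connect the metrics to their definitions. The labels and predictions below are made up, and both arrays must have the same length.

```python
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error

# Hypothetical labels and predictions of equal length
y_true_demo = [3.0, 5.0, 2.0, 7.0]
y_pred_demo = [2.5, 5.0, 4.0, 8.0]

# Absolute errors per sample: 0.5, 0.0, 2.0, 1.0
mae = mean_absolute_error(y_true_demo, y_pred_demo)   # (0.5 + 0 + 2 + 1) / 4
mse = mean_squared_error(y_true_demo, y_pred_demo)    # (0.25 + 0 + 4 + 1) / 4
worst = max_error(y_true_demo, y_pred_demo)           # largest absolute error
```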

You can convert an error metric into a score (where higher is better) by using the `neg_` prefix in the `scoring` parameter:
`mean_squared_error` -> `neg_mean_squared_error`

## Model Performance and Data Splits

In case we get **comparable performance on train and test** with this split, is this performance guaranteed on other splits too?

* **Is the test set sufficiently large?**
    * In case it is small, the test error obtained may be **unstable** and would not reflect the true test error on a large test set.
* **What is the chance that the easiest examples were kept aside as test by chance?**
    * If this happens, it would lead to an **optimistic estimation** of the true test error.

We can use **Cross-Validation**

### K-fold

- With `cv=5`, the `KFold` cross-validation iterator divides the training data into 5 folds.
- In each run, it uses 4 folds for training and 1 for evaluation.

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

linear_regr = LinearRegression()
score = cross_val_score(linear_regr, X, y, cv=5)
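
For a regressor, an integer `cv` corresponds to a `KFold` splitter. The sketch below makes the 4-to-1 split explicit on a made-up dataset of 20 samples (`X_demo` is an assumption).

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(20).reshape(-1, 1)   # hypothetical: 20 samples, 1 feature

kfold = KFold(n_splits=5)
# Record (number of training samples, number of test samples) per fold
fold_sizes = [(len(train_idx), len(test_idx))
              for train_idx, test_idx in kfold.split(X_demo)]
```

Each of the 5 runs trains on 16 samples and evaluates on the remaining 4.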

### Leave one out

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

linear_regr = LinearRegression()
loocv = LeaveOneOut()
score = cross_val_score(linear_regr, X, y, cv=loocv)
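
Leave-one-out produces one score per sample, each computed on a single held-out point. Since $R^2$ is not well defined on one sample, this sketch assumes an error-based scorer (`neg_mean_absolute_error`) and made-up data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 1))   # hypothetical data
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=12)

# One fit per sample: train on 11 points, test on the remaining one
loo_scores = cross_val_score(LinearRegression(), X_demo, y_demo,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_absolute_error")
```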


### ShuffleSplit

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

linear_regr = LinearRegression()
shufflesplit = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
score = cross_val_score(linear_regr, X, y, cv=shufflesplit)

### How to obtain test scores from different folds?

In [32]:
# from sklearn.model_selection import cross_validate

# cv = ShuffleSplit(n_splits=40, test_size=0.3, random_state=0)
# cv_results = cross_validate(regressor, data, target, cv=cv, scoring="neg_mean_absolute_error")
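
A runnable sketch of the same idea, assuming synthetic `data` and `target`, a `LinearRegression` estimator, and 10 splits instead of 40. `cross_validate` returns a dictionary, and the per-split test scores live under the `test_score` key.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))   # hypothetical feature matrix
target = data @ np.array([1.0, 2.0]) + rng.normal(scale=0.2, size=100)

cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
cv_results = cross_validate(LinearRegression(), data, target, cv=cv,
                            scoring="neg_mean_absolute_error")

# One score per split; negate to recover the mean absolute error
test_mae = -cv_results["test_score"]
```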

# Polynomial Regression

- Step 1: Apply polynomial transformation on the feature matrix.
- Step 2: Learn linear regression model (via normal equation or SGD) on the transformed feature matrix.

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Each pipeline step is a (name, estimator) tuple
poly_model = Pipeline([
    ('polynomial_transformation', PolynomialFeatures(degree=2, interaction_only=True)),
    ('linear_regr', LinearRegression())
])

poly_model.fit(X_train, y_train)
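
To see what Step 1 does to the feature matrix, the sketch below transforms a single made-up sample with and without `interaction_only=True`.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[2.0, 3.0]])   # one sample with x1=2, x2=3

# Full degree-2 expansion: 1, x1, x2, x1^2, x1*x2, x2^2
full = PolynomialFeatures(degree=2).fit_transform(X_demo)

# interaction_only=True drops the pure powers: 1, x1, x2, x1*x2
interactions = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X_demo)
```

With `interaction_only=True` the model cannot represent curvature in a single feature, so that flag is best left off when squared terms matter.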

### Regularization

How to perform ridge regularization with a specific regularization rate?

- Option #1
    - Step 1: Instantiate an object of the `Ridge` estimator.
    - Step 2: Set the parameter `alpha` to the required regularization rate.
- Option #2
    - Step 1: Instantiate an object of the `SGDRegressor` estimator.
    - Step 2: Set the parameter `alpha` to the required regularization rate and `penalty = 'l2'`.

In [None]:
# option 1
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1e-3)

# option 2
from sklearn.linear_model import SGDRegressor
sgd = SGDRegressor(alpha=1e-3, penalty='l2')
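
The effect of the regularization rate can be sketched by comparing the size of the learned weights for a small and a large `alpha`. The data and the two `alpha` values below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 5))   # hypothetical data
y_demo = X_demo @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=60)

weak = Ridge(alpha=1e-3).fit(X_demo, y_demo)     # almost no regularization
strong = Ridge(alpha=100.0).fit(X_demo, y_demo)  # heavy regularization

# A larger alpha shrinks the weight vector toward zero
weak_norm = np.linalg.norm(weak.coef_)
strong_norm = np.linalg.norm(strong.coef_)
```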


### How to perform ridge regularization in polynomial regression?

In [None]:
poly_model = Pipeline([
    ('polynomial_transformation', PolynomialFeatures()),
    ('Ridge', Ridge(alpha=1e-3))
])

poly_model.fit(X_train,y_train)

Same for `Lasso`

In [None]:
from sklearn.linear_model import Lasso

poly_model = Pipeline([
    ('polynomial_transformation', PolynomialFeatures()),
    ('lasso', Lasso(alpha=1e-3))
])

poly_model.fit(X_train,y_train)

Use both the L1 (`Lasso`) and L2 (`Ridge`) penalties with elastic net

In [None]:
from sklearn.linear_model import SGDRegressor

poly_model = Pipeline([
    ('polynomial_transformation', PolynomialFeatures(degree=2)),
    # l1_ratio=0.3 mixes 30% L1 penalty with 70% L2 penalty
    ('elastic_net', SGDRegressor(penalty='elasticnet', l1_ratio=0.3))
])

poly_model.fit(X_train,y_train)

### Hyper Parameter Tuning

#### How to set these hyperparameters?

Hyper parameter search consists of
- an estimator (regressor or classifier);
- a parameter space;
- a method for searching or sampling candidates;
- a cross-validation scheme; and
- a score function.

#### Two generic HPT approaches implemented in sklearn are:

- GridSearchCV exhaustively considers all parameter combinations for specified values.

In [None]:
param_grid = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']}]
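
In a regression setting, the five ingredients listed above might be put together as in the following sketch. The `Ridge` estimator, the `alpha` grid, the scorer and the synthetic data are assumptions chosen for illustration; they are not taken from the grid shown above, which uses SVM-style parameters.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 3))   # hypothetical data
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=80)

ridge_grid = {"alpha": [1e-3, 1e-2, 1e-1, 1.0]}   # parameter space
search = GridSearchCV(
    Ridge(),                            # estimator
    ridge_grid,                         # every listed value is tried
    cv=5,                               # cross-validation scheme
    scoring="neg_mean_squared_error",   # score function
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["alpha"]
```

After fitting, `search.cv_results_` holds the mean test score for each candidate, and `search.best_estimator_` is refit on the full data by default.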