### Regression

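A **linear regression** is a linear approximation of the relationship between a dependent variable and one or more (quantitative) independent variables. A good fit indicates association; it does not by itself establish causation.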

**Independent variables (predictors): X1, X2, X3, …, Xn**

**Dependent variable (predictand): Y**

$$
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
$$

- y - dependent variable
- x₁, x₂, …, xₙ - independent variables
- b₀ - intercept, the value of y when all independent variables are zero
- b₁, b₂, …, bₙ - regression coefficients of the independent variables
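
As a minimal sketch of how the coefficients in the equation above can be estimated, the following fits b₀, b₁ and b₂ by ordinary least squares with NumPy. The synthetic data, the "true" coefficient values and all variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): y = 1.5 + 2.0*x1 - 0.5*x2 + noise
n_samples = 200
X = rng.normal(size=(n_samples, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Prepend a column of ones so the intercept b0 is estimated with the slopes
X_design = np.column_stack([np.ones(n_samples), X])

# Least-squares solution of X_design @ b ≈ y
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = b
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

The estimates should land close to the values used to generate the data, which is the sense in which the fitted line approximates the underlying relationship.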

In [1]:
import math
import numpy as np
import pandas as pd

from sklearn.preprocessing import PolynomialFeatures 
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Ridge, Lasso

from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

In [2]:
# fetch data 
housing = fetch_california_housing()
print(type(housing))
print(type(housing.target))

# keep only the first two features (MedInc and HouseAge)
data = housing.data[:, 0:2]
target = housing.target

# create dataframe 
df = pd.DataFrame(data, columns=["feature_0","feature_1"])
df["target"] = target
print(df.shape)
df.head()

<class 'sklearn.utils._bunch.Bunch'>
<class 'numpy.ndarray'>
(20640, 3)


|   | feature_0 | feature_1 | target |
|---|-----------|-----------|--------|
| 0 | 8.3252 | 41.0 | 4.526 |
| 1 | 8.3014 | 21.0 | 3.585 |
| 2 | 7.2574 | 52.0 | 3.521 |
| 3 | 5.6431 | 52.0 | 3.413 |
| 4 | 3.8462 | 52.0 | 3.422 |


In [13]:
X = df[["feature_0","feature_1"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=42)

print(X_train.shape)
print(y_train.shape)


(14448, 2)
(14448,)


**Linear regression with two features**

In [5]:
regression = LinearRegression()
model = regression.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)


**R-squared (Coefficient of Determination)**

R-squared is a statistical measure of how close the data are to the fitted regression line: it is the proportion of the variance in the dependent variable that the model explains. It is also known as the **coefficient of determination**, or the coefficient of multiple determination for multiple regression.

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$

- $SS_{res}$ is the sum of squares of the residual errors.
- $SS_{tot}$ is the total sum of squares (proportional to the variance of the data).

$$
SS_{res} = \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

$$
SS_{tot} = \sum_{i=1}^n (y_i - \bar{y})^2
$$
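
The two sums above translate directly into a few lines of NumPy. This is a minimal sketch with made-up numbers; the helper name `r_squared` is hypothetical and is only meant to mirror the formula, not to replace `r2_score`.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed term by term."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y_true, y_true)                  # predictions equal the data
baseline = r_squared(y_true, [6.0, 6.0, 6.0, 6.0])   # always predict the mean
print(perfect, baseline)  # 1.0 0.0
```

A perfect fit gives 1, and a model that always predicts the mean of the data gives 0, which is why R² is read as the improvement over that baseline.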

In [6]:
r2_train = r2_score(y_train,y_pred_train)
print(f"R2 for train: {r2_train}")

r2_test = r2_score(y_test,y_pred_test)
print(f"R2 for test: {r2_test}")

R2 for train: 0.5092773054476163
R2 for test: 0.5087405786681392


**RMSE**
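
RMSE (root mean squared error) is the square root of the average squared residual, $\sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$, and is expressed in the same units as the target. The sketch below computes it by hand next to the mean absolute error (MAE) so the two are not confused; the numbers and helper names are illustrative assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean of squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean of absolute residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.0, 4.0, 6.0, 12.0]   # three exact predictions, one error of 4

print(rmse(y_true, y_pred))  # 2.0
print(mae(y_true, y_pred))   # 1.0
```

Because the residuals are squared before averaging, a single large error raises RMSE more than it raises MAE.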

In [8]:
from sklearn.metrics import mean_squared_error

# RMSE is the square root of the mean squared error (not of the mean absolute error)
rmse_train = math.sqrt(mean_squared_error(y_train, y_pred_train))
print(f"RMSE for train: {rmse_train}")

rmse_test = math.sqrt(mean_squared_error(y_test, y_pred_test))
print(f"RMSE for test: {rmse_test}")


In [10]:
# get the intercept 
print(f"Intercept : {model.intercept_}")

# get the model coefficients
print(f"Coeff : {model.coef_}")

Intercept : -0.10299578449607161
Coeff : [0.43176259 0.01743944]
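
To connect these numbers back to the regression equation, a prediction can be reproduced by hand. The sketch below copies the printed intercept and coefficients and applies them to the first row of the dataframe shown earlier (feature_0 = 8.3252, feature_1 = 41.0); the variable names are illustrative.

```python
import numpy as np

# Values copied from the printed intercept and coefficients
intercept = -0.10299578449607161
coef = np.array([0.43176259, 0.01743944])

# First row of the dataframe: feature_0 = 8.3252, feature_1 = 41.0
x = np.array([8.3252, 41.0])

# y_hat = b0 + b1*x1 + b2*x2
y_hat = intercept + coef @ x
print(f"Predicted target: {y_hat:.4f}")
```

The result is roughly 4.21, against an observed target of 4.526 for that row; the difference is the residual for this observation.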


- High R² on training and low on test data: Likely overfitting.
- Low R² on both training and test data: Likely underfitting.
- Low RMSE on training and high on test data: Likely overfitting.
- High RMSE on both training and test data: Likely underfitting.
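
The pattern in the bullets above can be explored on synthetic data by fitting polynomials of increasing degree and comparing train and test R². This is a rough sketch under assumed data (a noisy sine curve) using `np.polyfit`; the exact scores depend on the random seed, so no particular values are claimed.

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth relationship for the demo
    return np.sin(3 * x)

# Small training set, larger test set, both with the same noise level
x_train = np.sort(rng.uniform(-1, 1, size=20))
x_test = np.sort(rng.uniform(-1, 1, size=200))
y_train = true_fn(x_train) + rng.normal(scale=0.3, size=x_train.size)
y_test = true_fn(x_test) + rng.normal(scale=0.3, size=x_test.size)

scores = {}
for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    scores[degree] = (
        r2(y_train, np.polyval(coeffs, x_train)),  # train R^2
        r2(y_test, np.polyval(coeffs, x_test)),    # test R^2
    )
    print(f"degree={degree:2d}  train R2={scores[degree][0]:.3f}  test R2={scores[degree][1]:.3f}")
```

Train R² can only go up as the degree increases, so a widening gap between the train and test columns is the signal to look for when diagnosing overfitting.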

#### Interaction

- An interaction effect means that the effect of one feature on the response depends on the value of another feature, so the combined effect of the features differs from the sum of their individual effects.

- This effect is important to understand in regression because we study the effect of several variables on a single response variable.

$$
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon
$$

In [11]:
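# degree sets the maximum number of features multiplied together in a single interaction term.
# With only two input features the only possible interaction is feature_0 * feature_1, so degree=3 gives the same result as degree=2.
# interaction_only=True keeps only products of distinct features, not polynomial terms (a feature raised to an exponent).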

interaction = PolynomialFeatures(degree=3, include_bias=False, interaction_only=True)
features_interaction = interaction.fit_transform(X)

In [18]:
X_train, X_test, y_train, y_test = train_test_split(features_interaction, y , test_size= 0.3, random_state=42)

print(X_train.shape)
print(X_train[0])

(14448, 3)
[  4.1312  35.     144.592 ]


In [19]:
regression = LinearRegression()
model = regression.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

r2_train = r2_score(y_train,y_pred_train)
print(f"R2 for train: {r2_train}")

r2_test = r2_score(y_test,y_pred_test)
print(f"R2 for test: {r2_test}")


R2 for train: 0.5098224693975545
R2 for test: 0.5094987356605216


#### Polynomial linear regression

The general form of a polynomial regression equation is:
$$y=\beta_0+\beta_1x+\beta_2x^2+\beta_3x^3+\dots+\beta_nx^n$$

Where 
- $n$ represents the highest power of the independent variable in the equation.

The workflow is: take any data -> apply `PolynomialFeatures` -> train a linear model on the expanded features.
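
One way to sketch this workflow is with a scikit-learn pipeline, which chains the feature expansion and the linear model into a single estimator. The synthetic cubic data and the variable names below are assumptions for illustration; the notebook itself applies the two steps separately.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

# Synthetic data (assumption): y = 0.5*x^3 - x + noise
X = rng.uniform(-3, 3, size=(300, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 1 + step 2: expand features, then fit a linear model on them
poly_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
)
poly_model.fit(X_train, y_train)

# Plain linear model on the raw feature, for comparison
linear_model = LinearRegression().fit(X_train, y_train)

poly_r2 = poly_model.score(X_test, y_test)
linear_r2 = linear_model.score(X_test, y_test)
print(f"R2 linear: {linear_r2:.3f}  R2 polynomial: {poly_r2:.3f}")
```

The model is still linear in its coefficients; only the inputs are transformed, which is why `LinearRegression` can fit a curved relationship here.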

In [21]:
poly = PolynomialFeatures(degree=3, include_bias=False)
features_polynomial = poly.fit_transform(X)
features_polynomial[0]

array([8.32520000e+00, 4.10000000e+01, 6.93089550e+01, 3.41333200e+02,
       1.68100000e+03, 5.77010912e+02, 2.84166716e+03, 1.39946612e+04,
       6.89210000e+04])
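
The nine values follow a fixed order: the two original features, then all degree-2 terms, then all degree-3 terms. The sketch below rebuilds the row by hand from the first sample (8.3252, 41.0) and compares it with the printed array; the names `x0` and `x1` are illustrative.

```python
import numpy as np

x0, x1 = 8.3252, 41.0  # first row of X (feature_0, feature_1)

expanded = np.array([
    x0, x1,                                      # degree 1
    x0**2, x0 * x1, x1**2,                       # degree 2
    x0**3, x0**2 * x1, x0 * x1**2, x1**3,        # degree 3
])

# Values copied from the printed output of features_polynomial[0]
printed = np.array([
    8.32520000e+00, 4.10000000e+01, 6.93089550e+01, 3.41333200e+02,
    1.68100000e+03, 5.77010912e+02, 2.84166716e+03, 1.39946612e+04,
    6.89210000e+04,
])

matches = np.allclose(expanded, printed)
print(matches)  # True
```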

In [23]:
X_train, X_test, y_train, y_test = train_test_split(features_polynomial, y , test_size= 0.3, random_state=42)

print(X_train.shape)
print(X_train[0])


regression = LinearRegression()
model = regression.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

r2_train = r2_score(y_train,y_pred_train)
print(f"R2 for train: {r2_train}")

r2_test = r2_score(y_test,y_pred_test)
print(f"R2 for test: {r2_test}")


(14448, 9)
[4.13120000e+00 3.50000000e+01 1.70668134e+01 1.44592000e+02
 1.22500000e+03 7.05064197e+01 5.97338470e+02 5.06072000e+03
 4.28750000e+04]
R2 for train: 0.5400481580698149
R2 for test: 0.5358938035236791


### Regularization

- Regularization is a technique that penalizes large **coefficients**.
- In an overfit model, the **coefficients are generally inflated**. Regularization adds a **penalty** on the parameters to keep them from being weighted too heavily.
- A penalty term based on the size of the **coefficients** is added to the cost function of the linear model. Thus, if a coefficient inflates, **the cost function increases**, and the model has to trade off fitting the data against keeping the coefficients small in order to minimize the cost.
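
The effect of the penalty can be seen with a toy example. In the sketch below two coefficient vectors fit the data equally well (the two columns are identical, so only the sum of the coefficients matters), but the inflated one is charged far more by the penalty. The function name `penalized_cost` and the numbers are assumptions for illustration.

```python
import numpy as np

def penalized_cost(X, y, beta, lam, penalty="l2"):
    """Sum of squared errors plus an L1 or L2 penalty on the coefficients."""
    sse = np.sum((y - X @ beta) ** 2)
    if penalty == "l1":
        return sse + lam * np.sum(np.abs(beta))
    if penalty == "l2":
        return sse + lam * np.sum(beta ** 2)
    raise ValueError("penalty must be 'l1' or 'l2'")

# Two identical (perfectly collinear) columns; y = 2 * x
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])

small = np.array([1.0, 1.0])       # modest coefficients
inflated = np.array([11.0, -9.0])  # same predictions, inflated coefficients

for name, beta in [("small", small), ("inflated", inflated)]:
    l1 = penalized_cost(X, y, beta, lam=1.0, penalty="l1")
    l2 = penalized_cost(X, y, beta, lam=1.0, penalty="l2")
    print(f"{name:9s} L1 cost={l1:.1f}  L2 cost={l2:.1f}")
```

Without the penalty both vectors have zero error and are indistinguishable; with it, minimizing the cost favours the smaller coefficients.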

**Lasso Regularization (L1):**

- Also called **Least Absolute Shrinkage and Selection Operator**

$$
\sum_{i=1}^n\left(y_i-\sum_{j=1}^px_{ij}\beta_j\right)^2+\lambda\sum_{j=1}^p|\beta_j|
$$

When to use:

- You have many features, and you suspect that only a few of them are truly relevant to the response variable.
- You want to perform feature selection, i.e., drive some coefficients exactly to zero, thus eliminating them from the model.
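
The feature-selection behaviour can be sketched on synthetic data in which only three of ten features actually influence the target. The data, the coefficient values and `alpha=0.1` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

# Ten features, only the first three are relevant (assumption)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.1, size=n_samples)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

n_zero_ols = int(np.sum(ols.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("Exactly-zero coefficients, OLS:  ", n_zero_ols)
print("Exactly-zero coefficients, Lasso:", n_zero_lasso)
print("Lasso coefficients:", np.round(lasso.coef_, 3))
```

Ordinary least squares assigns a small non-zero weight to every irrelevant feature, while the L1 penalty sets most or all of them to exactly zero and slightly shrinks the relevant ones.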



In [27]:
# fetch data 
housing = fetch_california_housing()
print(type(housing))
print(type(housing.target))

# use all eight features this time
data = housing.data
target = housing.target

# create dataframe 
df = pd.DataFrame(data, columns=["feature_0","feature_1","feature_2",
                                 "feature_3","feature_4","feature_5","feature_6","feature_7"])
df["target"] = target
print(df.shape)

X = df.loc[:,df.columns != "target"]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=42)


regression = Lasso(alpha=0.1)

model = regression.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

r2_train = r2_score(y_train,y_pred_train)
print(f"R2 for train: {r2_train}")

r2_test = r2_score(y_test,y_pred_test)
print(f"R2 for test: {r2_test}")


<class 'sklearn.utils._bunch.Bunch'>
<class 'numpy.ndarray'>
(20640, 9)
R2 for train: 0.5461181785194866
R2 for test: 0.545117728367666


**Ridge Regularization (L2):**

$$
\sum_{i=1}^n(y_i-\sum_{j=1}^px_{ij}\beta_j)^2+\lambda\sum_{j=1}^p\beta_j^2
$$

- If $\lambda$ is zero, the penalty term vanishes and we get back ordinary least squares (OLS).

When to use: 

- You have many features, and most of them are believed to have a small effect on the response variable.
- Your goal is to handle multicollinearity by shrinking the coefficients of correlated variables.

- Does not perform feature selection; all coefficients are shrunk towards zero but none are eliminated.
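
These points can be checked with the closed-form ridge solution $\hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty$, which minimizes the penalized sum of squares above when no intercept is fitted. The sketch below uses synthetic data and a hypothetical helper `ridge_closed_form`; it is not how scikit-learn's `Ridge` is implemented internally.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)

# Synthetic data without an intercept (assumption)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Ordinary least squares for reference
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

betas = {lam: ridge_closed_form(X, y, lam) for lam in (0.0, 10.0, 1000.0)}
for lam, beta in betas.items():
    print(f"lambda={lam:7.1f}  beta={np.round(beta, 4)}  norm={np.linalg.norm(beta):.4f}")
```

With `lambda=0` the solution coincides with OLS; as lambda grows the coefficient norm shrinks towards zero, yet no individual coefficient becomes exactly zero.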

In [30]:
regression = Ridge(alpha=0.1)

model = regression.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

r2_train = r2_score(y_train,y_pred_train)
print(f"R2 for train: {r2_train}")

r2_test = r2_score(y_test,y_pred_test)
print(f"R2 for test: {r2_test}")


R2 for train: 0.6093459719471153
R2 for test: 0.5957750170158791


**Elastic net linear regression**
- Elastic net linear regression uses the penalties from both the lasso and ridge techniques to regularize regression models.

$$
L_{enet}(\hat{\beta})=\frac{\sum_{i=1}^{n}(y_{i}-x_{i}^{\prime}\hat{\beta})^{2}}{2n}+\lambda(\frac{1-\alpha}{2}\sum_{j=1}^{m}\hat{\beta}_{j}^{2}+\alpha\sum_{j=1}^{m}|\hat{\beta}_{j}|)
$$
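
In scikit-learn's `ElasticNet`, the overall strength $\lambda$ in this formula corresponds to the `alpha` argument and the mixing weight $\alpha$ corresponds to `l1_ratio` (default 0.5), so the naming is swapped relative to the equation. The sketch below, on assumed synthetic data, checks the limiting case `l1_ratio=1.0`, where the L2 term drops out and the model should match Lasso.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)

# Synthetic data (assumption): five features, two of them irrelevant
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=150)

# l1_ratio=1.0 -> pure L1 penalty, i.e. the Lasso objective
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
same_as_lasso = np.allclose(enet_l1.coef_, lasso.coef_)
print("l1_ratio=1.0 matches Lasso:", same_as_lasso)

# l1_ratio=0.5 -> equal mix of L1 and L2 penalties (the default)
enet_mixed = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("Mixed-penalty coefficients:", np.round(enet_mixed.coef_, 3))
```

Lowering `l1_ratio` moves the penalty towards ridge-like shrinkage, while raising it moves towards lasso-like sparsity.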

In [29]:
from sklearn.linear_model import ElasticNet

regression = ElasticNet(alpha=0.1)

model = regression.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

r2_train = r2_score(y_train,y_pred_train)
print(f"R2 for train: {r2_train}")

r2_test = r2_score(y_test,y_pred_test)
print(f"R2 for test: {r2_test}")

R2 for train: 0.5761912179593538
R2 for test: 0.5755216146456115
