# Ridge Regression
Ridge regression minimizes the following loss function:
$$ J(\mathbf\theta) = \frac{1}{2}(\mathbf{X\theta} - \mathbf{Y})^T(\mathbf{X\theta} - \mathbf{Y}) + \frac{1}{2}\alpha||\theta||_2^2 $$
$\alpha$ is a constant coefficient that has no fixed value and must be tuned, and $||\theta||_2$ is the **L2 norm** of $\theta$.

The goal is to choose a suitable $\alpha$ and then find the $\theta$ that minimizes $J(\theta)$. scikit-learn uses the **least squares** method to accomplish this.
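
To make the minimization concrete, here is a minimal NumPy sketch of the closed-form minimizer of the loss above. Setting the gradient $\mathbf{X}^T(\mathbf{X\theta} - \mathbf{Y}) + \alpha\theta$ to zero gives $(\mathbf{X}^T\mathbf{X} + \alpha\mathbf{I})\theta = \mathbf{X}^T\mathbf{Y}$. The helper `ridge_closed_form` and the synthetic `X_demo`, `y_demo` are hypothetical names used only for illustration; this is not necessarily how scikit-learn solves the problem internally.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize 0.5*||X theta - y||^2 + 0.5*alpha*||theta||^2 (illustrative helper)."""
    n_features = X.shape[1]
    # Zero gradient condition: (X^T X + alpha * I) theta = X^T y
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data, assumed here purely for demonstration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_theta = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_theta + 0.1 * rng.normal(size=50)

theta_hat = ridge_closed_form(X_demo, y_demo, alpha=1.0)

# The gradient of J at the solution should vanish (up to rounding)
grad = X_demo.T @ (X_demo @ theta_hat - y_demo) + 1.0 * theta_hat
print(theta_hat)
print(np.allclose(grad, 0.0))
```

Note that this sketch penalizes every entry of $\theta$ and has no intercept term, exactly as in the formula above; scikit-learn's `Ridge` fits an unpenalized intercept by default.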

## Import Data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv('./pp.csv')
data.head()

| | AT | V | AP | RH | PE |
|---|---|---|---|---|---|
| 0 | 8.34 | 40.77 | 1010.84 | 90.01 | 480.48 |
| 1 | 23.64 | 58.49 | 1011.4 | 74.2 | 445.75 |
| 2 | 29.74 | 56.9 | 1007.15 | 41.91 | 438.76 |
| 3 | 19.07 | 49.69 | 1007.22 | 76.79 | 453.09 |
| 4 | 11.8 | 40.66 | 1017.13 | 97.2 | 464.43 |


In [2]:
data.shape

(9568, 5)

> The dataset is also used in another project; an introduction to the data can be found [there](https://github.com/NUIST-elite/blog-code/blob/main/Linear%20Regression%20code/full%20workflow/workflow.ipynb).

## Split Dataset

In [4]:
# features
X = data[['AT', 'V', 'AP', 'RH']]

# outputs
y = data[['PE']]

In [5]:
from sklearn.model_selection import train_test_split

# random_state makes the result reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
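
As a small sanity check on what the split does, the sketch below runs `train_test_split` twice on toy arrays (`X_demo` and `y_demo` are made up for illustration). With no `test_size` given, scikit-learn holds out 25% of the samples by default, and a fixed `random_state` yields the same partition on every call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed only for this demonstration: 100 samples, 2 features
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

# Two independent calls with the same seed
a_train, a_test, ya_train, ya_test = train_test_split(X_demo, y_demo, random_state=1)
b_train, b_test, yb_train, yb_test = train_test_split(X_demo, y_demo, random_state=1)

print(a_train.shape, a_test.shape)     # default split is 75% train / 25% test
print(np.array_equal(a_test, b_test))  # same seed, same partition
```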

## Use Ridge Regression

To run ridge regression, we must specify the **hyperparameter α**. Here we pick an arbitrary value (say 1); later we will discuss how to use **cross-validation** to quickly select the best α from a set of candidate values.
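
To get a feel for what α controls, the following sketch fits `Ridge` with several values of `alpha` on synthetic data and prints the L2 norm of the coefficients. The data, the candidate `alphas`, and the variable names are assumptions made for illustration; the point is only that a larger α shrinks the coefficients more strongly.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem, assumed only for this demonstration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=200)

alphas = [0.01, 1, 100, 10000]  # hypothetical candidate values
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: coef={np.round(model.coef_, 3)}, L2 norm={norms[-1]:.3f}")
```

A very small α behaves almost like ordinary least squares, while a very large α pushes all coefficients toward zero, which is why the value has to be tuned.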

In [6]:
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)

Ridge(alpha=1)

In [7]:
print(ridge.coef_)
print(ridge.intercept_)

[[-1.97373209 -0.2323016   0.06935852 -0.15806479]]
[447.05552892]


The initial model we obtained is:
$$ PE = 447.05552892 - 1.97373209 \times AT - 0.2323016 \times V + 0.06935852 \times AP - 0.15806479 \times RH $$
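
As a worked example of reading this equation, the sketch below plugs the first row shown by `data.head()` into the fitted coefficients by hand. The variable names are illustrative, and the numbers are copied from the outputs above.

```python
import numpy as np

# Coefficients and intercept copied from the printed ridge.coef_ and ridge.intercept_
coef = np.array([-1.97373209, -0.2323016, 0.06935852, -0.15806479])
intercept = 447.05552892

# First row of the dataset, in the order AT, V, AP, RH
first_row = np.array([8.34, 40.77, 1010.84, 90.01])

# Linear model: intercept plus the dot product of coefficients and features
pe_pred = intercept + coef @ first_row
print(round(pe_pred, 2))
```

The result is roughly 477.01, compared with the observed PE of 480.48 for that row. Calling `ridge.predict` on a DataFrame with the same four columns applies the same formula to every row at once.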
