# **Linear Regression Model Theory + Scikit-Learn template**


## Theory
Linear regression is a fundamental model in machine learning used for predicting a continuous output variable based on input features. The model function for linear regression is represented as:

$$f_{w,b}(x) = wx + b$$

In this equation, $f_{w,b}(x)$ represents the predicted output, $w$ is the weight parameter, $b$ is the bias parameter, and $x$ is the input feature.

## Model Training

To train a linear regression model, we look for the parameter values $(w, b)$ that best fit the dataset, that is, the values that minimize the cost function defined below.

### Forward Pass

The forward pass is a step where we compute the linear regression output for the input data $X$ using the current weights and biases. It's essentially applying our model to the input data.
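
As a minimal sketch of the forward pass, assuming a single input feature and NumPy arrays, the model can be applied to all examples at once. The function name `forward` and the toy values are illustrative choices, not part of any library.

```python
import numpy as np

def forward(x, w, b):
    """Return f_{w,b}(x) = w * x + b for every input in x."""
    x = np.asarray(x, dtype=float)
    return w * x + b

# Toy inputs chosen purely for illustration
x = np.array([1.0, 2.0, 3.0])
predictions = forward(x, w=2.0, b=1.0)
print(predictions)  # [3. 5. 7.]
```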

### Cost Function

The cost function is used to measure how well our model is performing. It quantifies the difference between the predicted values and the actual values in our dataset. The cost function is defined as:

$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m}(f_{w,b}(x^{(i)}) - y^{(i)})^2$$

Here, $J(w, b)$ is the cost, $m$ is the number of training examples, $x^{(i)}$ is the input data for the $i$-th example, $y^{(i)}$ is the actual output for the $i$-th example, and $w$ and $b$ are the weight and bias parameters, respectively.
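
The cost formula can be written out directly in NumPy. This is a sketch under the same single-feature assumption; `compute_cost` is a hypothetical helper name and the data are made up so that $y = 2x + 1$ exactly.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Cost J(w, b): sum of squared errors divided by 2m."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[0]
    errors = (w * x + b) - y          # f_{w,b}(x^(i)) - y^(i) for each example
    return float(np.sum(errors ** 2) / (2 * m))

# Toy data generated from y = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

cost_perfect = compute_cost(x, y, w=2.0, b=1.0)  # parameters that fit exactly
cost_zero = compute_cost(x, y, w=0.0, b=0.0)     # (9 + 25 + 49) / 6
print(cost_perfect, cost_zero)
```

The cost is zero when the parameters reproduce the data exactly and grows as the predictions move away from the targets.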

### Backward Pass (Gradient Computation)

The backward pass computes the gradients of the cost function with respect to the weights and biases. These gradients are crucial for updating the model parameters during training. The gradient formulas are as follows:

$$
\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})
$$

$$
\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)}
$$
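
The two gradient formulas above translate almost line for line into NumPy. The sketch below assumes a single feature; `compute_gradients` is an illustrative name and the data are synthetic.

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Return (dJ/dw, dJ/db) for the squared-error cost with the 1/(2m) factor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[0]
    errors = (w * x + b) - y              # prediction error per example
    dj_dw = float(np.sum(errors * x) / m)
    dj_db = float(np.sum(errors) / m)
    return dj_dw, dj_db

# Toy data generated from y = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

dj_dw, dj_db = compute_gradients(x, y, w=0.0, b=0.0)
print(dj_dw, dj_db)  # both negative: increasing w and b lowers the cost
```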

## Training Process

The training process involves iteratively updating the weights and biases to minimize the cost function. This is typically done through an optimization algorithm like gradient descent. The update equations for parameters are:

$$w \leftarrow w - \alpha \frac{\partial J}{\partial w}$$

$$b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$

Here, $\alpha$ represents the learning rate, which controls the step size during parameter updates.

By iteratively performing the forward pass, computing the cost, performing the backward pass, and updating the parameters, the model learns to make better predictions and fit the data.
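
Putting the steps together, a minimal batch gradient descent loop might look like the sketch below. The learning rate `alpha=0.1`, the iteration count, and the zero initialization are assumptions picked for this toy dataset, not recommended defaults.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=2000):
    """Fit w and b by repeating forward pass, gradient computation, and update."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.shape[0]
    w, b = 0.0, 0.0                       # start from zero parameters
    for _ in range(n_iters):
        errors = (w * x + b) - y          # forward pass and error
        dj_dw = np.sum(errors * x) / m    # backward pass
        dj_db = np.sum(errors) / m
        w -= alpha * dj_dw                # parameter update
        b -= alpha * dj_db
    return float(w), float(b)

# Synthetic data generated from y = 2x + 1, so the best fit is known in advance
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = gradient_descent(x, y)
print(w, b)  # approaches w = 2, b = 1
```

If the learning rate is too large for the scale of the data, the updates can diverge, which is one reason the feature is often standardized before training.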


## **Model Evaluation**

### 1. Mean Squared Error (MSE)

**Formula:**
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2
$$

**Description:**
- **Mean Squared Error (MSE)** is a widely used metric for evaluating the accuracy of regression models.
- It measures the average squared difference between the predicted values ($y_{\text{pred}}$) and the actual target values ($y_{\text{true}}$).
- The squared differences are averaged across all data points in the dataset.

**Interpretation:**
- A lower MSE indicates a better fit of the model to the data, as it means the model's predictions are closer to the actual values.
- MSE is sensitive to outliers because the squared differences magnify the impact of large errors.
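
To make the outlier sensitivity concrete, the following sketch computes MSE by hand with NumPy on made-up values. The helper `mse` is illustrative and is not the scikit-learn function.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Two small errors of size 1
mse_small = mse(y_true, np.array([2.0, 5.0, 8.0, 9.0]))     # (1 + 0 + 1 + 0) / 4

# Same predictions, but the last one is off by 10
mse_outlier = mse(y_true, np.array([2.0, 5.0, 8.0, 19.0]))  # (1 + 0 + 1 + 100) / 4

print(mse_small, mse_outlier)  # 0.5 25.5
```

A single large error dominates the average because it enters the sum squared.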

### 2. Root Mean Squared Error (RMSE)

**Formula:**
$$
\text{RMSE} = \sqrt{\text{MSE}}
$$

**Description:**
- **Root Mean Squared Error (RMSE)** is a variant of MSE that provides the square root of the average squared difference between predicted and actual values.
- It is often preferred because it is in the same unit as the target variable, making it more interpretable.

**Interpretation:**
- Like MSE, a lower RMSE indicates a better fit of the model to the data.
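- RMSE is also sensitive to outliers, because the errors are squared before they are averaged. Taking the square root restores the original units but does not remove that sensitivity.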
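
A short sketch of the unit argument, using invented values: when every prediction is off by exactly 2 units, the MSE is 4 (squared units) while the RMSE is 2 (the same unit as the target).

```python
import numpy as np

# Made-up targets and predictions; every prediction is off by exactly 2
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 32.0, 38.0])

mse_value = float(np.mean((y_true - y_pred) ** 2))  # in squared target units
rmse_value = float(np.sqrt(mse_value))              # back in target units

print(mse_value, rmse_value)  # 4.0 2.0
```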

### 3. R-squared ($R^2$)

**Formula:**
$$
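R^2 = 1 - \frac{\text{SSR}}{\text{SST}}, \qquad \text{SSR} = \sum_{i=1}^{n} (y_{\text{true}_i} - y_{\text{pred}_i})^2, \qquad \text{SST} = \sum_{i=1}^{n} (y_{\text{true}_i} - \bar{y}_{\text{true}})^2
$$

**Description:**
- **R-squared ($R^2$)**, also known as the coefficient of determination, measures the proportion of the variance in the dependent variable ($y_{\text{true}}$) that is explained by the model's predictions ($y_{\text{pred}}$). Here SSR is the sum of squared residuals and SST is the total sum of squares around the mean of $y_{\text{true}}$.
- For a least-squares fit with an intercept, evaluated on its own training data, it lies between 0 and 1, where 0 indicates that the model does not explain any variance and 1 indicates a perfect fit. On other data, such as a test set, it can be negative when the model predicts worse than always predicting the mean.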

**Interpretation:**
- A higher $R^2$ value suggests that the model explains a larger proportion of the variance in the target variable.
- However, $R^2$ does not provide information about the goodness of individual predictions or whether the model is overfitting or underfitting.
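
The definition can be checked by hand with a small NumPy sketch. The helper name `r_squared` and the numbers are illustrative. The second call shows that $R^2$ becomes negative when predictions are worse than simply predicting the mean of the targets.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])

r2_good = r_squared(y_true, np.array([1.1, 1.9, 3.2, 3.8]))  # 1 - 0.1 / 5
r2_bad = r_squared(y_true, np.array([4.0, 3.0, 2.0, 1.0]))   # 1 - 20 / 5

print(r2_good, r2_bad)  # about 0.98 and -3.0
```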

## sklearn template [scikit-learn: LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)

### `class sklearn.linear_model.LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)`

| **Parameter**      | **Description**                                                                                                  | **Default**     |
|--------------------|------------------------------------------------------------------------------------------------------------------|-----------------|
| `fit_intercept`     | Whether to calculate the intercept for this model. If False, no intercept will be used in the calculations.     | `True`          |
| `normalize`         | No longer part of the signature: deprecated in version 1.0 and removed in 1.2. It used to normalize the data before fitting; scale the data beforehand instead, for example with `StandardScaler`. | removed         |
| `copy_X`            | Whether to copy the input matrix `X`. If False, `X` may be overwritten during fitting.                           | `True`          |
| `n_jobs`            | The number of jobs (CPU cores) to use for the computation. `-1` means using all processors.                      | `None`          |
| `positive`          | If True, it ensures that the coefficients of the linear model are positive.                                      | `False`         |
 

| **Attribute**       | **Description**                                                                                                  |
|---------------------|------------------------------------------------------------------------------------------------------------------|
| `coef_`             | The coefficients (weights) of the linear model after fitting.                                                   |
| `intercept_`        | The intercept of the linear model after fitting.                                                                |
| `n_features_in_`    | The number of features in the input data `X`.                                                                   |
| `feature_names_in_` | The names of features seen during fit (if applicable).                                                          |


| **Method**          | **Description**                                                                                                  |
|---------------------|------------------------------------------------------------------------------------------------------------------|
| `fit(X, y)`         | Fits the linear model to the data `X` (input) and `y` (output).                                                  |
| `predict(X)`        | Predicts the output using the input data `X` based on the fitted model.                                          |
| `score(X, y)`       | Returns the coefficient of determination (R² score) for the prediction.                                          |
| `get_params()`      | Gets the parameters of the linear regression model.                                                              |
| `set_params(**params)` | Sets the parameters of the linear regression model.                                                            |
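
As a hedged illustration of the attributes and methods listed above, the sketch below fits `LinearRegression` to a tiny synthetic dataset. The data ($y = 2x + 1$) are invented for the example and assume scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: one feature, targets generated from y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]            # learned weight, close to 2
intercept = model.intercept_      # learned bias, close to 1
r2 = model.score(X, y)            # R^2 on the data used for fitting
pred = model.predict(np.array([[5.0]]))
params = model.get_params()       # dict of constructor parameters

print(slope, intercept, r2, pred, model.n_features_in_)
```

Note that `feature_names_in_` is only set when `X` has string feature names, for example when it is a pandas DataFrame, so it is not available after fitting on a plain NumPy array as above.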




# Linear regression - Example

## Data loading

In [8]:
import pandas as pd

test_path = '/home/petar-ubuntu/Learning/ML_Theory/ML_Models/Linear_regression/data/test.csv'
train_path = '/home/petar-ubuntu/Learning/ML_Theory/ML_Models/Linear_regression/data/train.csv'

df_train = pd.read_csv(train_path)
df_test = pd.read_csv(test_path)


##  Data processing

In [9]:
# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# Set training data and targets
# scikit-learn expects X as a 2-D array of shape (n_samples, n_features),
# so the single feature column is reshaped into a column vector
X_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y']

# Set testing data and targets
X_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y']


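# Standardize the feature: fit the scaler on the training data only,
# then apply the same transformation to both the training and test sets
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)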

## Plotting data

In [None]:
import plotly.express as px

# Scatter plot of the raw (unscaled) training data
px.scatter(x=df_train['x'], y=df_train['y'], template='gridon')

## Model definition

In [None]:
from sklearn.linear_model import LinearRegression

# Create a linear regression instance (the arguments shown are the defaults)
model = LinearRegression(fit_intercept=True,
                         copy_X=True,
                         n_jobs=None,
                         positive=False)

# Fit the linear regression model to the training data and targets
model.fit(X_train, y_train)

## Model evaluation

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# Predict on the scaled test set; y_test holds the actual target values
predictions = model.predict(X_test)

# Calculate MSE
mse = mean_squared_error(y_test, predictions)

# Calculate RMSE
rmse = mse ** 0.5

# Calculate R²
r2 = r2_score(y_test, predictions)

# Print results
print(f"MSE: {mse}")
print(f"RMSE: {rmse}")
print(f"R²: {r2}")
