# Evaluating Linear Models

This section covers the metrics and techniques used to evaluate linear models. Evaluation estimates how well a model will perform on data it has not seen during training.

---

## Table of Contents

1. [Why Evaluate Linear Models?](#1-why-evaluate-linear-models)
2. [Train-Test Split](#2-train-test-split)
3. [Evaluation Metrics for Regression](#3-evaluation-metrics-for-regression)
    - Mean Absolute Error (MAE)
    - Mean Squared Error (MSE)
    - Root Mean Squared Error (RMSE)
    - R-squared (R²)
4. [Cross-Validation](#4-cross-validation)
5. [Regularization Impact on Evaluation](#5-regularization-impact-on-evaluation)

---

## 1. Why Evaluate Linear Models?

Evaluating linear models is essential for understanding how well a model generalizes to unseen data. Overfitting or underfitting can lead to poor performance in real-world scenarios. Proper evaluation gives insights into the model's predictive power and robustness.

---

## 2. Train-Test Split

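Before evaluating the model, split the dataset into training and testing sets. The model is fitted on the training set only, so the metrics computed on the testing set reflect performance on data the model has not seen.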


In [1]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

In [2]:
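# noise > 0 keeps the metrics below from being trivially perfect; random_state makes the data reproducible
X, y = make_regression(n_samples=1000, noise=10.0, random_state=42)
# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)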

## 3. Evaluation Metrics for Regression
### 3.1 Mean Absolute Error (MAE)
The Mean Absolute Error is the average of the absolute differences between the actual and predicted values. It gives an idea of how far off the predictions are from the actual values.

In [None]:
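from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Fit an ordinary least-squares model on the training set only
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out test set and compare with the true targets
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)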

print("Mean Absolute Error (MAE):", mae)
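
To make the definition concrete, the following minimal sketch computes the MAE by hand with NumPy. The arrays `y_true` and `y_hat` are hypothetical values chosen only for illustration; they are not taken from the dataset above.

```python
import numpy as np

# Hypothetical actual and predicted values, chosen only for illustration.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

abs_errors = np.abs(y_true - y_hat)  # per-sample absolute errors: 1, 0, 2, 1
mae_manual = abs_errors.mean()       # (1 + 0 + 2 + 1) / 4

print("Manual MAE:", mae_manual)
```

Because the errors are not squared, each sample contributes to the MAE in proportion to the size of its miss.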

### 3.2 Mean Squared Error (MSE)
The Mean Squared Error measures the average of the squared differences between actual and predicted values. Squaring the errors gives more weight to larger errors.

In [None]:
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
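
The effect of squaring can be seen in a small sketch. The two error vectors below are hypothetical and were picked so that both have the same MAE; only the MSE tells them apart.

```python
import numpy as np

# Two hypothetical sets of prediction errors with the same total absolute error.
errors_even = np.array([2.0, 2.0, 2.0, 2.0])   # four moderate misses
errors_spiky = np.array([0.0, 0.0, 0.0, 8.0])  # three exact hits and one large miss

mae_even = np.abs(errors_even).mean()
mae_spiky = np.abs(errors_spiky).mean()
mse_even = (errors_even ** 2).mean()
mse_spiky = (errors_spiky ** 2).mean()

print("MAE (even, spiky):", mae_even, mae_spiky)
print("MSE (even, spiky):", mse_even, mse_spiky)
```

Both vectors have an MAE of 2, but the single large miss makes the second MSE four times the first. This is why MSE is often preferred when large errors are especially costly, and why it is more sensitive to outliers.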

### 3.3 Root Mean Squared Error (RMSE)
The RMSE is the square root of the MSE. It provides an error metric in the same units as the target variable, making it more interpretable.

In [None]:
rmse = mse ** 0.5

print("Root Mean Squared Error (RMSE):", rmse)
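
The point about units can be illustrated with a short sketch. The prices below are hypothetical and assumed to be in thousands of dollars; they are not part of the dataset used in this section.

```python
import numpy as np

# Hypothetical house prices in thousands of dollars (illustrative values only).
price_true = np.array([200.0, 250.0, 300.0])
price_pred = np.array([210.0, 240.0, 310.0])

mse_prices = np.mean((price_true - price_pred) ** 2)  # units: (thousand dollars) squared
rmse_prices = np.sqrt(mse_prices)                     # units: thousand dollars

print("MSE:", mse_prices)
print("RMSE:", rmse_prices)
```

Every prediction here is off by 10, and the RMSE reports exactly that typical miss in the original units, whereas the MSE of 100 is expressed in squared units.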

### 3.4 R-squared (R²)
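R-squared measures the proportion of variance in the dependent variable that is predictable from the independent variables. A value of 1 indicates a perfect fit, and 0 means the model does no better than always predicting the mean of the target. On held-out data R² can also be negative, which means the model performs worse than that constant baseline.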

In [None]:
from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)

print("R-squared (R²):", r2)
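
As a rough illustration, R² can be computed from its definition, 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the targets from their mean. The helper `r2_manual` and the sample values below are hypothetical and written only for this sketch.

```python
import numpy as np

def r2_manual(y_true, y_hat):
    """Compute R² = 1 - SS_res / SS_tot directly from the definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)          # squared error of the predictions
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
r2_good = r2_manual(y_true, [1.5, 2.0, 3.0, 3.5])  # close predictions
r2_mean = r2_manual(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
r2_bad = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0])   # predictions in reversed order

print(r2_good, r2_mean, r2_bad)
```

The three cases give a value near 1, exactly 0, and a negative value, which shows that R² has no lower bound when the predictions are worse than the mean baseline.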

## 4. Cross-Validation
Cross-validation assesses model performance by dividing the data into k subsets (folds). Each fold is used as the test set exactly once while the model is trained on the remaining k-1 folds, and the k scores are then averaged. This gives a more stable estimate than a single train-test split.

In [None]:
from sklearn.model_selection import cross_val_score

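# cross_val_score clones and refits the estimator on every fold.
# scikit-learn scorers follow "higher is better", so the MSE is returned negated.
cv_scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')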

print("Cross-validation MSE scores:", -cv_scores)
print("Average cross-validation MSE:", -cv_scores.mean())
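
The fold-by-fold procedure described above can be sketched as an explicit loop with `KFold`. The dataset parameters (`n_samples=200`, `n_features=5`, `noise=10.0`) and the names `fold_mse` and `auto_mse` are assumptions made for this example, not values from the cells above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Small synthetic dataset; the parameters are illustrative assumptions.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# For a regressor, cv=5 corresponds to an unshuffled 5-fold split.
kf = KFold(n_splits=5)
fold_mse = []
for train_idx, test_idx in kf.split(X_demo):
    # Fit a fresh model on four folds, then score it on the held-out fold.
    fold_model = LinearRegression().fit(X_demo[train_idx], y_demo[train_idx])
    fold_pred = fold_model.predict(X_demo[test_idx])
    fold_mse.append(mean_squared_error(y_demo[test_idx], fold_pred))

# The same computation through the convenience function (scores are negated MSE).
auto_mse = -cross_val_score(LinearRegression(), X_demo, y_demo, cv=5,
                            scoring="neg_mean_squared_error")

print("Manual fold MSE:", np.round(fold_mse, 2))
print("Matches cross_val_score:", np.allclose(fold_mse, auto_mse))
```

Writing the loop out is mainly useful for understanding or for custom per-fold logic; for routine evaluation the one-line `cross_val_score` call is usually enough.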

## 5. Regularization Impact on Evaluation
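When evaluating regularized models like Ridge or Lasso, the regularization strength (`alpha` in scikit-learn) affects the model's performance. Larger values shrink the coefficients more strongly, which can reduce overfitting but leads to underfitting if pushed too far. Evaluate regularized models with the same metrics and the same test set as the unregularized baseline so that the results are comparable.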

In [None]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# alpha sets the strength of the L2 penalty; larger values shrink the coefficients more
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

ridge_pred = ridge_model.predict(X_test)

ridge_mse = mean_squared_error(y_test, ridge_pred)

print("Ridge Regression MSE:", ridge_mse)
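
One way to see the impact of the regularization strength is to refit the model over several `alpha` values and record the test error together with the size of the coefficient vector. The sketch below uses its own synthetic data; the dataset parameters, the `alphas` grid, and the names `test_mse` and `coef_norm` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with many features relative to samples (illustrative assumption).
X_r, y_r = make_regression(n_samples=200, n_features=50, noise=20.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_r, y_r, test_size=0.3, random_state=42
)

alphas = [0.01, 1.0, 100.0, 10000.0]  # hypothetical grid spanning weak to strong penalties
test_mse = []
coef_norm = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(Xr_train, yr_train)
    test_mse.append(mean_squared_error(yr_test, ridge.predict(Xr_test)))
    coef_norm.append(np.linalg.norm(ridge.coef_))  # L2 norm of the fitted coefficients

for alpha, mse_value, norm in zip(alphas, test_mse, coef_norm):
    print(f"alpha={alpha:>8}: test MSE={mse_value:10.2f}, ||coef||={norm:8.2f}")
```

The coefficient norm shrinks as `alpha` grows, while the test MSE depends on the data, so the best `alpha` is normally chosen by cross-validation on the training set rather than by inspecting the test set.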