## Exercise: Comparing Linear Regression, Polynomial Regression, and Ridge Regression

### Objective:
In this exercise, you will compare the performance of three different regression models on the housing dataset: Linear Regression, Polynomial Regression, and Ridge Regression. You will evaluate each model using several regression metrics and analyze the results.

### Instructions:

1. **Load the Housing Dataset:**
   - Import the necessary libraries and load a housing dataset from `sklearn.datasets` (for example, the California housing dataset via `fetch_california_housing`).

2. **Data Splitting:**
   - Split the dataset into training and testing sets using a 50%-50% ratio. You can use `train_test_split` from `sklearn.model_selection`.
   - Fix the random state of each split so that the results are reproducible.

3. **Model Training and Evaluation:**
   - Repeat the following steps 5 times, using a new train/test split with a different (but fixed) random state in each iteration:
     - **Train the Models:**
       - Train a Linear Regression model on the training set.
       - Train a Polynomial Regression model: expand the features with `PolynomialFeatures` from `sklearn.preprocessing`, then fit a Linear Regression model on the expanded features.
       - Train a Ridge Regression model (use `Ridge` from `sklearn.linear_model`).
     - **Make Predictions:**
       - Use the trained models to make predictions on the test set.
     - **Evaluate the Models:**
       - Calculate the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared (R²) Score, and Explained Variance Score for each model.
       - Store the results in a structured format (e.g., a list or a DataFrame).

4. **Results Comparison:**
   - After completing the evaluations for all 5 iterations, calculate the average values of the evaluation metrics for each model.
   - Create a summary table comparing the average performance of Linear Regression, Polynomial Regression, and Ridge Regression.

5. **Analysis:**
   - Analyze and discuss the results:
     - Which model performed the best based on the evaluation metrics?
     - How does polynomial regression compare to linear regression?
     - What effect does Ridge regression have on the model performance?
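
The evaluation and averaging steps above can be sketched with a small helper. This is a minimal illustration, not the reference solution: `evaluate_regression` is a hypothetical helper name, and the data comes from `make_regression` as a synthetic stand-in so the sketch runs without downloading the housing dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split


def evaluate_regression(y_true, y_pred):
    """Return the five metrics from step 3 as a dict (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_true, y_pred),
        "Explained Variance": explained_variance_score(y_true, y_pred),
    }


# Synthetic stand-in data (assumption: replaces the housing dataset here).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

rows = []
for seed in range(5):
    # A different but fixed seed per iteration keeps each split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    rows.append(evaluate_regression(y_test, model.predict(X_test)))

per_iteration = pd.DataFrame(rows)
# Mean and standard deviation of each metric across the five splits.
summary = per_iteration.agg(["mean", "std"])
print(summary)
```

Reporting the standard deviation next to the mean is a design choice beyond what the exercise asks for. It can help show whether a difference between two models is larger than the variation between splits.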


## Solution

# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, explained_variance_score

# Load the housing dataset
housing = datasets.fetch_california_housing()  # Using California housing dataset as an example
X = housing.data
y = housing.target

# Initialize a dictionary mapping each model name to a list of per-iteration metric dicts
results = {
    'Linear Regression': [],
    'Polynomial Regression': [],
    'Ridge Regression': []
}

# Number of iterations for model training and evaluation
n_iterations = 5

# Repeat the process for the specified number of iterations
for i in range(n_iterations):
    # Split the dataset (50% train, 50% test); seeding with the iteration index
    # gives a different but reproducible split in each iteration
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=i)

    # Train Linear Regression model
    linear_model = LinearRegression()
    linear_model.fit(X_train, y_train)
    y_pred_linear = linear_model.predict(X_test)

    # Evaluate Linear Regression model
    results['Linear Regression'].append({
        'MAE': mean_absolute_error(y_test, y_pred_linear),
        'MSE': mean_squared_error(y_test, y_pred_linear),
        'RMSE': np.sqrt(mean_squared_error(y_test, y_pred_linear)),
        'R2': r2_score(y_test, y_pred_linear),
        'Explained Variance': explained_variance_score(y_test, y_pred_linear)
    })

    # Train Polynomial Regression model
    polynomial_features = PolynomialFeatures(degree=2)  # Change degree as needed
    X_train_poly = polynomial_features.fit_transform(X_train)
    X_test_poly = polynomial_features.transform(X_test)

    polynomial_model = LinearRegression()
    polynomial_model.fit(X_train_poly, y_train)
    y_pred_poly = polynomial_model.predict(X_test_poly)

    # Evaluate Polynomial Regression model
    results['Polynomial Regression'].append({
        'MAE': mean_absolute_error(y_test, y_pred_poly),
        'MSE': mean_squared_error(y_test, y_pred_poly),
        'RMSE': np.sqrt(mean_squared_error(y_test, y_pred_poly)),
        'R2': r2_score(y_test, y_pred_poly),
        'Explained Variance': explained_variance_score(y_test, y_pred_poly)
    })

    # Train Ridge Regression model
    ridge_model = Ridge(alpha=1.0)  # You can adjust alpha for regularization strength
    ridge_model.fit(X_train, y_train)
    y_pred_ridge = ridge_model.predict(X_test)

    # Evaluate Ridge Regression model
    results['Ridge Regression'].append({
        'MAE': mean_absolute_error(y_test, y_pred_ridge),
        'MSE': mean_squared_error(y_test, y_pred_ridge),
        'RMSE': np.sqrt(mean_squared_error(y_test, y_pred_ridge)),
        'R2': r2_score(y_test, y_pred_ridge),
        'Explained Variance': explained_variance_score(y_test, y_pred_ridge)
    })

# Average each metric over all iterations, one DataFrame per model
summary = {}
for model in results:
    metrics = pd.DataFrame(results[model])
    summary[model] = metrics.mean().to_dict()

# Convert summary to DataFrame for better visualization
summary_df = pd.DataFrame(summary).T
summary_df


| Model | MAE | MSE | RMSE | R2 | Explained Variance |
|---|---|---|---|---|---|
| Linear Regression | 0.532601 | 0.533678 | 0.730518 | 0.599681 | 0.599705 |
| Polynomial Regression | 0.485704 | 9.187896 | 2.146816 | -5.876317 | -5.875145 |
| Ridge Regression | 0.53261 | 0.533626 | 0.730482 | 0.59972 | 0.599745 |
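
In the averaged results, Polynomial Regression has the lowest MAE but by far the highest MSE and a negative R². One plausible reading is that a small number of its test predictions are very far off, because squared-error metrics weight large misses much more heavily than MAE does. The sketch below shows that pattern with made-up numbers (not the housing data). All values and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical targets: 100 samples, all equal to zero for simplicity.
y_true = np.zeros(100)

# Model A: a moderate error of 0.5 on every sample.
pred_a = np.full(100, 0.5)

# Model B: a smaller error of 0.3 on 99 samples, plus one extreme miss of 20.
pred_b = np.full(100, 0.3)
pred_b[0] = 20.0

mae_a = mean_absolute_error(y_true, pred_a)
mae_b = mean_absolute_error(y_true, pred_b)
mse_a = mean_squared_error(y_true, pred_a)
mse_b = mean_squared_error(y_true, pred_b)

# Model B has the lower MAE but a much higher MSE because of the one outlier.
print(f"MAE  A={mae_a:.4f}  B={mae_b:.4f}")
print(f"MSE  A={mse_a:.4f}  B={mse_b:.4f}")
```

To check whether this explanation applies to the housing results, one could inspect the largest absolute residuals of `y_pred_poly` in a single iteration.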
