<a href="https://colab.research.google.com/github/VidushiSharma31/Machine-Learning/blob/main/1-Regression/1-Linear/feature_scaling_comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear Regression with Feature Scaling on the California Housing Dataset

This notebook demonstrates how feature scaling affects the output of a linear regression model.

Dataset used: California housing prices from the scikit-learn library.

### 1. Import Libraries

Import the necessary libraries for data manipulation, model building, and evaluation.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

### 2. Load the Dataset

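Load the California Housing dataset from scikit-learn. It contains 20,640 samples with 8 numeric features; the target is the median house value of a district, in units of $100,000.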

In [2]:
housing = fetch_california_housing()

### 3. Split the Data

Split the dataset into training (80%) and test (20%) sets. Fixing `random_state` makes the split reproducible.

In [3]:
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
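
As a rough illustration of what the split does, the sketch below shuffles row indices and holds out 20% of them. It uses NumPy only and a small synthetic array. The helper name `simple_train_test_split` is hypothetical, and the sketch does not reproduce the exact shuffle that `train_test_split` performs for `random_state=42`.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle rows, then hold out a fraction for testing."""
    n_samples = X.shape[0]
    n_test = int(np.ceil(test_size * n_samples))  # number of held-out rows
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)          # shuffled row indices
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in data (assumption): 10 samples, 2 features
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo, test_size=0.2)
print(X_tr.shape, X_te.shape)
```

Applied to the 20,640 rows of the housing data, a 20% test fraction corresponds to 16,512 training rows and 4,128 test rows.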

### 4. Train and Evaluate Linear Regression Model

Train a linear regression model on the original data and evaluate its performance using Mean Squared Error (MSE) and R-squared.

In [4]:
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 0.5558915986952422
R-squared: 0.5757877060324524
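
For reference, both metrics can be computed by hand. The sketch below is a minimal NumPy version of the standard definitions (MSE as the mean of squared residuals, R-squared as one minus the ratio of residual to total sum of squares). The function names `mse_manual` and `r2_manual` and the four-point toy data are assumptions made for illustration.

```python
import numpy as np

def mse_manual(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r2_manual(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy values (assumption), not taken from the housing data
y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]

mse_demo = mse_manual(y_true_demo, y_pred_demo)
r2_demo = r2_manual(y_true_demo, y_pred_demo)
print(mse_demo)  # 0.375
print(r2_demo)   # approximately 0.9486
```

Read this way, the R-squared of about 0.576 reported above means the model accounts for roughly 58% of the variance of the target on the test set.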


### 5. Scale the Data

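Standardize each feature to zero mean and unit variance using `StandardScaler`. The scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into training.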

In [5]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
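
The transformation itself is simple enough to write out. The following sketch mirrors the fit/transform pattern with plain NumPy on synthetic data (the arrays and their scales are assumptions): the mean and standard deviation come from the training rows and are reused unchanged for the test rows. `StandardScaler` uses the population standard deviation, which corresponds to NumPy's default `ddof=0`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features on very different scales (assumption, not the housing data)
X_train_demo = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(200, 2))
X_test_demo = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(50, 2))

# "fit": statistics are estimated from the training rows only
mean_ = X_train_demo.mean(axis=0)
std_ = X_train_demo.std(axis=0)  # ddof=0, i.e. population standard deviation

# "transform": the same statistics are applied to both sets
X_train_std = (X_train_demo - mean_) / std_
X_test_std = (X_test_demo - mean_) / std_

print(X_train_std.mean(axis=0))  # close to 0 for each feature
print(X_train_std.std(axis=0))   # close to 1 for each feature
print(X_test_std.mean(axis=0))   # near 0, but not exactly 0
```

The test set ends up only approximately standardized, which is expected: its statistics were deliberately not used.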

### 6. Train and Evaluate Scaled Linear Regression Model

Train a linear regression model on the scaled training data and generate predictions for the scaled test set. The metrics are computed in the next section.

In [6]:
scaled_model = LinearRegression()
scaled_model.fit(X_train_scaled, y_train)
scaled_y_pred = scaled_model.predict(X_test_scaled)

### 7. Compare Results

Compare the performance metrics (MSE and R-squared) of the model trained on the original data versus the scaled data.

In [8]:
mse_scaled = mean_squared_error(y_test, scaled_y_pred)
r2_scaled = r2_score(y_test, scaled_y_pred)
print("Scaled Mean Squared Error:", mse_scaled)
print("Scaled R-squared:", r2_scaled)

Scaled Mean Squared Error: 0.5558915986952442
Scaled R-squared: 0.575787706032451


### Why Scaling Didn't Affect Linear Regression

Scaling the features with `StandardScaler` left the Mean Squared Error (MSE) and R-squared values unchanged: the two runs differ only around the 15th significant digit, which is floating-point rounding. This is expected. `LinearRegression` fits ordinary least squares with an intercept, so it searches over all linear functions of the features plus a constant. Standardization is an invertible, per-feature affine transformation (subtract the mean, divide by the standard deviation), so every function that is linear in the original features is also linear in the scaled features, and vice versa. The best-fitting hyperplane is therefore the same function in both cases, and the predictions are identical. Only the parameterization changes: each coefficient is multiplied by the standard deviation of its feature, and the intercept shifts to absorb the subtracted means.

Scaling matters more for methods whose results or convergence depend on feature magnitudes: models trained with gradient descent, where poorly scaled features slow convergence, and distance- or variance-based methods such as Support Vector Machines (SVMs), K-Nearest Neighbors (KNN), and Principal Component Analysis (PCA).
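
The equivalence can be checked numerically. The sketch below fits least squares with an intercept on raw and on standardized synthetic features, using `np.linalg.lstsq` rather than scikit-learn. The helper `fit_ols` and the generated data are assumptions for illustration. It also checks how the two sets of parameters relate: the scaled coefficients equal the raw coefficients times the per-feature standard deviation.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: least-squares fit with an intercept column."""
    A = np.column_stack([X, np.ones(X.shape[0])])
    solution, *_ = np.linalg.lstsq(A, y, rcond=None)
    return solution[:-1], solution[-1]  # (coefficients, intercept)

rng = np.random.default_rng(0)

# Synthetic data (assumption): two features on very different scales
X = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(200, 2))
y = 3.0 * X[:, 0] - 0.01 * X[:, 1] + 4.0 + rng.normal(scale=0.5, size=200)

# Standardize with the data's own mean and population standard deviation
mean_, std_ = X.mean(axis=0), X.std(axis=0)
Z = (X - mean_) / std_

coef_raw, b_raw = fit_ols(X, y)
coef_std, b_std = fit_ols(Z, y)

pred_raw = X @ coef_raw + b_raw
pred_std = Z @ coef_std + b_std

print(np.allclose(pred_raw, pred_std))               # same predictions
print(np.allclose(coef_std, coef_raw * std_))        # coefficients rescaled by std
print(np.isclose(b_std, b_raw + coef_raw @ mean_))   # intercept absorbs the means
```

The coefficients themselves differ between the two fits, which is why scaled coefficients are often easier to compare across features even though the predictions do not change.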
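
To illustrate the distance-metric case, here is a small NumPy sketch of a nearest-neighbor lookup on made-up points whose second feature is numerically much larger than the first. The points, the query, and the helper `nearest_index` are all assumptions chosen for illustration. On raw values the large-magnitude feature dominates the Euclidean distance; after standardization both features contribute, and the nearest neighbor changes.

```python
import numpy as np

def nearest_index(points, query):
    """Hypothetical helper: index of the point closest to query (Euclidean)."""
    distances = np.linalg.norm(points - query, axis=1)
    return int(np.argmin(distances))

# Made-up points (assumption): feature 0 is small, feature 1 is large
points = np.array([
    [1.1, 1500.0],
    [5.0, 1010.0],
    [3.0, 2000.0],
    [2.0, 500.0],
    [4.0, 1200.0],
])
query = np.array([1.0, 1000.0])

# Standardize points and query with the points' statistics
mean_, std_ = points.mean(axis=0), points.std(axis=0)
points_std = (points - mean_) / std_
query_std = (query - mean_) / std_

raw_nn = nearest_index(points, query)
scaled_nn = nearest_index(points_std, query_std)
print(raw_nn, scaled_nn)  # 1 0
```

The linear regression above had no such dependence on distances, which is why its metrics were unaffected.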