In [1]:
# Ordinary least squares (OLS) regression is a good starting point, but it has problems:
# it assumes the features are not strongly correlated with each other, and it becomes
# unstable when there are many features relative to the number of samples.

# So in practice you have to deal with dimensionality problems: many features,
# some of which may be correlated, or one of which contributes too much.

# Key idea: apply regularization, a penalty that shrinks the weights in the model equation.
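
To make the motivation concrete, here is a minimal sketch on made-up data (the variables `x1`, `x2` and the noise levels are assumptions chosen for illustration, not part of the original notebook). Two nearly identical features are fed to plain OLS and to Ridge; with such collinearity OLS coefficients tend to be large and unstable, while the ridge penalty keeps them small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumption): x2 is an almost exact copy of x1
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=1.0, size=n)

# Fit an unregularized model and a ridge model on the same data
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))
```

The ridge coefficient vector is never longer than the OLS one on the same data, which is the shrinkage effect the rest of this notebook relies on.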


In [None]:
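# 1. Ridge Regression (L2 regularization):

# minimize  ||Xw - y||_2^2 + α * ||w||_2^2   (least-squares loss plus a squared L2 penalty on the weights)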
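
As a sanity check on the ridge objective (least-squares loss plus α times the squared L2 norm of the weights, which is the formulation scikit-learn's `Ridge` uses), the sketch below compares the textbook closed-form solution with the fitted model. The data and the value `alpha = 0.5` are assumptions made up for this example. The intercept is not penalized, so the data is centered before solving.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 3 features with known weights plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=40)
alpha = 0.5

# Center X and y so the intercept stays out of the penalty
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Closed form: w = (Xc^T Xc + alpha * I)^(-1) Xc^T yc
w_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b_closed = y_mean - X_mean @ w_closed

model = Ridge(alpha=alpha).fit(X, y)
print("Closed-form weights:", w_closed)
print("sklearn weights:    ", model.coef_)
print("Closed-form intercept:", b_closed, "sklearn intercept:", model.intercept_)
```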

# The alpha parameter is the key setting:
 

The regularization parameter α in Ridge Regression helps balance the tradeoff between fitting the training data well (low bias) and maintaining the model's ability to perform well on new, unseen data (low variance). Here's how α influences this:

High α: Increasing α enhances the regularization effect, which means that the coefficients are driven to smaller absolute values, closer to zero. This generally decreases variance but increases bias. When α is too large, the model becomes too simple and underfits the data, not capturing the underlying patterns effectively.
Low α: Decreasing α reduces the impact of the regularization, allowing the coefficients more freedom to fit the data more closely. This can lead to lower bias but higher variance, as the model starts to fit noise in the data rather than just the signal. When α is too low, it may lead to overfitting, particularly in scenarios with many features.
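
The shrinkage described above can be seen directly by fitting the same data with increasing α. This is a small sketch on synthetic data (the true weights and the list of α values are assumptions picked for illustration).

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 5 features, one of them irrelevant
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=60)

alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: coefficients={np.round(model.coef_, 3)}")

# The L2 norm of the coefficient vector shrinks as alpha grows
print("Coefficient norms:", np.round(coef_norms, 4))
```

Note that ridge pulls coefficients toward zero but does not usually make them exactly zero.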
Bias-Variance Tradeoff
The bias-variance tradeoff is a fundamental concept in machine learning that describes the tension between the error introduced by approximating a real-world problem (bias) and the error from sensitivity to fluctuations in the training set (variance).

Bias: Error due to erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss relevant relations between features and target outputs (underfitting).
Variance: Error from sensitivity to small fluctuations in the training dataset. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
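
One way to observe the tradeoff is to compare training and test R² across several α values. The sketch below uses synthetic data with many features and few samples (all sizes, weights, and α values are assumptions for illustration). A very small α typically gives a high training score with a weaker test score (variance), while a very large α pushes both scores down (bias).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 40 features, only the first 5 carry signal
rng = np.random.default_rng(2)
n_samples, n_features = 80, 40
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_w + rng.normal(scale=2.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

alphas = [1e-3, 1.0, 10.0, 1e4]
train_scores, test_scores = [], []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    train_scores.append(model.score(X_train, y_train))
    test_scores.append(model.score(X_test, y_test))
    print(f"alpha={alpha:>8}: train R2={train_scores[-1]:.3f}, test R2={test_scores[-1]:.3f}")
```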
How α Manages the Tradeoff
Choosing the right α is about finding a sweet spot where both bias and variance are kept to acceptable levels. This choice often requires empirical testing:

Cross-validation: One common method to select α is to use cross-validation. Here, different values of α are tested, and the one that results in the best performance on a validation set (or through a cross-validation process) is chosen.
Grid Search: Implementing a grid search over a range of α values with cross-validation helps in systematically finding the α that balances the bias and variance effectively.
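
The selection procedure can be written out by hand before reaching for helper classes. This is a minimal sketch using `cross_val_score` on synthetic data (the data, the 5-fold setting, and the α grid are assumptions for illustration); it scores each candidate α by its mean cross-validated R² and keeps the best one.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption): 8 features with random true weights
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=1.0, size=100)

# Candidate alphas on a log scale
alphas = np.logspace(-3, 3, 7)

mean_scores = []
for alpha in alphas:
    # 5-fold cross-validated R² for this alpha
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    mean_scores.append(scores.mean())

best_alpha = alphas[int(np.argmax(mean_scores))]
print("Mean CV R2 per alpha:", np.round(mean_scores, 4))
print("Best alpha:", best_alpha)
```

`RidgeCV` and `GridSearchCV` automate essentially this loop.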

In [2]:
import numpy as np
# Features: [number of customers, average temperature in Celsius]
cafe_data = np.array([
    [75, 22],   # 75 customers, 22°C
    [92, 25],   # 92 customers, 25°C
    [65, 20],   # 65 customers, 20°C
    [120, 30],  # 120 customers, 30°C
    [80, 22],   # 80 customers, 22°C
    [98, 28]    # 98 customers, 28°C
])

# Target: Daily earnings in dollars
daily_earnings = np.array([300, 350, 280, 400, 310, 360])

from sklearn import linear_model

# Fit a ridge model with a small regularization strength
reg = linear_model.Ridge(alpha=0.1)
reg.fit(cafe_data, daily_earnings)



In [3]:
print('Coefficients: {}\n'.format(repr(reg.coef_)))
print('Intercept: {}\n'.format(reg.intercept_))
# R² is computed on the same data the model was fit on, so it is a training score
r2 = reg.score(cafe_data, daily_earnings)
print('R2: {}\n'.format(r2))

Coefficients: array([1.63930537, 3.24201479])

Intercept: 109.09866321531234

R2: 0.9893861949356211



In [4]:
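from sklearn import linear_model

# RidgeCV selects alpha from the candidates by cross-validation
# (efficient leave-one-out by default when cv is not set)
alphas = [0.1, 0.2, 0.3]
reg = linear_model.RidgeCV(alphas=alphas)
reg.fit(cafe_data, daily_earnings)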

print('Coefficients: {}\n'.format(repr(reg.coef_)))
print('Intercept: {}\n'.format(reg.intercept_))
print('Chosen alpha: {}\n'.format(reg.alpha_))

Coefficients: array([1.6661642 , 3.10270991])

Intercept: 110.13910312281507

Chosen alpha: 0.3



In [5]:
r2 = reg.score(cafe_data, daily_earnings)
print('R2: {}\n'.format(r2))

R2: 0.9893703894362674



In [8]:
# OK, let's apply it to the data from the previous notebook.
# Case study: California Housing dataset

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Select alpha from a small candidate list by cross-validation
# (if the chosen value sits at the edge of this list, the range is probably too narrow)
alphas = [0.1, 0.2, 0.3]
model = RidgeCV(alphas=alphas)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
score = r2_score(y_test, y_pred)
print("R² score on test data:", score)

print('Coefficients: {}\n'.format(repr(model.coef_)))
print('Intercept: {}\n'.format(model.intercept_))
print('Chosen alpha: {}\n'.format(model.alpha_))




R² score on test data: 0.5758079284930553
Coefficients: array([ 4.48625634e-01,  9.72476999e-03, -1.23230439e-01,  7.82625658e-01,
       -2.03315267e-06, -3.52618018e-03, -4.19790853e-01, -4.33699916e-01])

Intercept: -37.02211572946161

Chosen alpha: 0.3



In [9]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score

# Load California housing data
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a more comprehensive range of alpha values to test
alpha_values = np.logspace(-6, 6, 13)  # From very small to very large values

# Create a Ridge regressor
ridge_model = Ridge()

# Setup GridSearchCV to find the best alpha (using 10-fold cross-validation)
grid_search = GridSearchCV(estimator=ridge_model, param_grid={'alpha': alpha_values}, scoring='r2', cv=10)
grid_search.fit(X_train, y_train)

# Best alpha found
best_alpha = grid_search.best_params_['alpha']
print('Best alpha:', best_alpha)

# Use the best alpha to create a final model
best_ridge_model = Ridge(alpha=best_alpha)
best_ridge_model.fit(X_train, y_train)

# Predict on the test data using the model with the best alpha
y_pred = best_ridge_model.predict(X_test)

# Calculate and print the R² score on the test data
score = r2_score(y_test, y_pred)
print("R² score on test data:", score)

# Print the model's coefficients and intercept
print('Coefficients:', best_ridge_model.coef_)
print('Intercept:', best_ridge_model.intercept_)

Best alpha: 10.0
R² score on test data: 0.5764371559180028
Coefficients: [ 4.47068597e-01  9.74130199e-03 -1.20293353e-01  7.66201258e-01
 -1.99123989e-06 -3.52184780e-03 -4.19720067e-01 -4.33421866e-01]
Intercept: -36.98384396377987


In [10]:
# 2. Lasso regularization: uses the L1 norm of the weights as the penalty term

# Lasso regularization tends to prefer linear models with fewer nonzero coefficients.
# This means that it will likely zero out some of the weight coefficients.
# This reduces the number of features that the model actually depends on
# (since some of the coefficients will now be 0),
# which can be beneficial when some features are completely irrelevant or duplicates of other features.
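
The zeroing-out behavior is easy to see on a small synthetic problem where only two of ten features matter (the data and the value `alpha=0.2` are assumptions for illustration; also note that scikit-learn scales the `Lasso` and `Ridge` losses differently, so the same alpha value is not directly comparable between them).

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only features 0 and 1 influence the target
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=0.2).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Nonzero Lasso coefficients:", np.count_nonzero(lasso.coef_))
print("Nonzero Ridge coefficients:", np.count_nonzero(ridge.coef_))
```

Lasso sets coefficients of uninformative features to exactly zero, while Ridge only makes them small.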

In [11]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load California housing data
housing = fetch_california_housing()
X = housing.data   # features
y = housing.target # target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Lasso regression model with alpha = 0.1
reg = Lasso(alpha=0.1)

# Fit the model to the training data
reg.fit(X_train, y_train)

# Print the coefficients and intercept
print('Coefficients:', reg.coef_)
print('Intercept:', reg.intercept_)

# Evaluate the model on the training data using R²
r2_train = reg.score(X_train, y_train)
print('R² on training data:', r2_train)

# Optionally, evaluate the model on the test data using R²
r2_test = reg.score(X_test, y_test)
print('R² on test data:', r2_test)


Coefficients: [ 3.92693362e-01  1.50810624e-02 -0.00000000e+00  0.00000000e+00
  1.64168387e-05 -3.14918929e-03 -1.14291203e-01 -9.93076483e-02]
Intercept: -7.698845419807457
R² on training data: 0.5489153425707493
R² on test data: 0.5318167610318159


In [13]:
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

# Load California housing data
housing = fetch_california_housing()
X = housing.data   # features
y = housing.target # target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize Lasso regression model with cross-validation
reg = LassoCV(cv=10, random_state=42, alphas=np.logspace(-6, 1, 100))

# Fit the model to the scaled training data
reg.fit(X_train_scaled, y_train)

# Print the coefficients and intercept
print('Coefficients:', reg.coef_)
print('Intercept:', reg.intercept_)

# Evaluate the model on the scaled training data using R²
r2_train = reg.score(X_train_scaled, y_train)
print('R² on training data:', r2_train)

# Evaluate the model on the scaled test data using R²
r2_test = reg.score(X_test_scaled, y_test)
print('R² on test data:', r2_test)

# Print the best alpha selected by cross-validation
print('Chosen alpha:', reg.alpha_)


Coefficients: [ 0.85023107  0.12318078 -0.28400419  0.32879452 -0.00132022 -0.04008463
 -0.88812076 -0.86052561]
Intercept: 2.0719469373786215
R² on training data: 0.6125249499944815
R² on test data: 0.5766436726098645
Chosen alpha: 0.0007924828983539169
