In Bayesian statistics, the main idea is to make assumptions about the probability distributions of a model's parameters before the model is fitted to data. These initial distributional assumptions are called the priors of the model's parameters.
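A minimal sketch of what updating a prior with data looks like, assuming the conjugate Gaussian setting that Bayesian ridge regression builds on: a zero-mean normal prior on the weights with precision `lam` and Gaussian noise with precision `alpha`, both held fixed at hypothetical values. The synthetic data and `true_w` are made up for illustration. The posterior variances come out smaller than the prior variances, which is the data sharpening the initial assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, fixed precisions (BayesianRidge would estimate these from data)
alpha = 4.0   # noise precision: noise std = 1 / sqrt(alpha) = 0.5
lam = 1.0     # weight precision: prior w ~ N(0, I / lam)

# Synthetic data generated from made-up "true" weights
true_w = np.array([1.5, -0.5])
X = rng.normal(size=(50, 2))
y = X @ true_w + rng.normal(scale=1 / np.sqrt(alpha), size=50)

# Prior covariance of the weights
prior_cov = np.eye(2) / lam

# Conjugate Gaussian posterior: Sigma = (lam*I + alpha*X^T X)^-1, mu = alpha * Sigma @ X^T y
post_cov = np.linalg.inv(lam * np.eye(2) + alpha * X.T @ X)
post_mean = alpha * post_cov @ X.T @ y

print("Posterior mean:", post_mean)
print("Prior variances:", np.diag(prior_cov))
print("Posterior variances:", np.diag(post_cov))
```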

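In a Bayesian ridge regression model, there are two hyperparameters to optimize: α and λ. In scikit-learn's BayesianRidge, both are estimated from the training data by maximizing the marginal likelihood rather than set by hand. Despite sharing a name, α does not play the same role as alpha in regular ridge regression: here it is the precision (inverse variance) of the noise in the target. The strength of the ridge penalty corresponds instead to the ratio λ/α.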
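One way to check the relationship between the two models, sketched on the café data used below: BayesianRidge returns the posterior mean of the weights, which for the final α and λ is the ridge solution with penalty λ/α. Refitting scikit-learn's `Ridge` with `alpha=lambda_ / alpha_` should therefore reproduce the same coefficients up to floating-point error. This is a consistency check under that assumption, not the iterative procedure BayesianRidge uses to estimate α and λ.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Ridge

# Same toy features as the notebook: [customers, temperature in Celsius]
X = np.array([[75, 22], [92, 25], [65, 20], [120, 30], [80, 22], [98, 28]])
y = np.array([300, 350, 280, 400, 310, 360], dtype=float)

bayes = BayesianRidge().fit(X, y)

# Ridge penalty implied by the estimated precisions
penalty = bayes.lambda_ / bayes.alpha_
ridge = Ridge(alpha=penalty).fit(X, y)

print("Implied ridge penalty:", penalty)
print("BayesianRidge coef:", bayes.coef_)
print("Ridge coef:        ", ridge.coef_)
```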

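The λ hyperparameter is the precision of the model's weights, that is, the inverse variance of their zero-mean normal prior. The smaller the λ value, the wider the prior, so the individual weights are allowed to spread out to larger magnitudes (weaker regularization); a larger λ pulls them more tightly toward zero.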
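A quick way to see what λ controls, assuming it is the precision of a zero-mean normal prior on each weight: draw weights from that prior for a small and a large λ and compare their spread. The λ values and the helper `sample_prior_weights` are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_prior_weights(lam, n=10_000):
    """Hypothetical helper: draw weights from N(0, 1/lam), the prior with precision lam."""
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(lam), size=n)

small_lam = sample_prior_weights(0.1)   # low precision -> wide prior
large_lam = sample_prior_weights(10.0)  # high precision -> narrow prior

print("std with lambda=0.1:", small_lam.std())  # close to 1/sqrt(0.1), about 3.16
print("std with lambda=10: ", large_lam.std())  # close to 1/sqrt(10), about 0.32
```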


In [1]:
# Assumption: the weights follow a zero-mean normal (Gaussian) prior
# alpha and lambda are each given a gamma prior
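In scikit-learn's BayesianRidge, the gamma priors from the comments above are set through four constructor parameters: `alpha_1`/`alpha_2` (shape and rate of the prior on α) and `lambda_1`/`lambda_2` (shape and rate of the prior on λ), all defaulting to 1e-6, which makes the priors nearly non-informative. A minimal sketch that inspects them; the "stronger" settings at the end are hypothetical values for illustration.

```python
from sklearn.linear_model import BayesianRidge

# Shape (alpha_1, lambda_1) and rate, i.e. inverse scale (alpha_2, lambda_2),
# of the gamma priors on alpha and lambda
params = BayesianRidge().get_params()
gamma_priors = {name: params[name] for name in ("alpha_1", "alpha_2", "lambda_1", "lambda_2")}
print(gamma_priors)

# Hypothetical values: a gamma(shape=10, rate=1) prior on lambda has mean 10,
# nudging the model toward a larger weight precision (stronger shrinkage)
stronger = BayesianRidge(lambda_1=10.0, lambda_2=1.0)
```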


In [2]:
import numpy as np
from sklearn import linear_model

reg = linear_model.BayesianRidge()
# Features: [number of customers, average temperature in Celsius]
cafe_data = np.array([
    [75, 22],   # 75 customers, 22°C
    [92, 25],   # 92 customers, 25°C
    [65, 20],   # 65 customers, 20°C
    [120, 30],  # 120 customers, 30°C
    [80, 22],   # 80 customers, 22°C
    [98, 28]    # 98 customers, 28°C
])

# Target: Daily earnings in dollars
daily_earnings = np.array([300, 350, 280, 400, 310, 360])


In [4]:
reg.fit(cafe_data, daily_earnings)

print("Coefficients: {}".format(reg.coef_))
print('Intercept: {}\n'.format(reg.intercept_))
print('R2: {}\n'.format(reg.score(cafe_data, daily_earnings)))
print('Alpha: {}\n'.format(reg.alpha_))
print('Lambda: {}\n'.format(reg.lambda_))

Coefficients: [1.95615832 1.58000306]
Intercept: 121.82927328535226

R2: 0.988201157561161

Alpha: 0.03927530560357717

Lambda: 0.22091939262965232
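Because the fitted model is a distribution over weights, `BayesianRidge.predict` can also return a predictive standard deviation via `return_std=True`. A minimal sketch on the same café data; the new day (85 customers, 24°C) is a hypothetical input, not part of the notebook's data.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

X = np.array([[75, 22], [92, 25], [65, 20], [120, 30], [80, 22], [98, 28]])
y = np.array([300, 350, 280, 400, 310, 360], dtype=float)
reg = BayesianRidge().fit(X, y)

# Hypothetical new day: 85 customers, 24 degrees Celsius
new_day = np.array([[85, 24]])
mean, std = reg.predict(new_day, return_std=True)

print(f"Predicted earnings: {mean[0]:.2f} +/- {std[0]:.2f} dollars")
# The predictive std includes the noise term, so it is at least 1 / sqrt(alpha_)
print("Noise std 1/sqrt(alpha_):", 1 / np.sqrt(reg.alpha_))
```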



In [5]:
import numpy as np
from sklearn import linear_model
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load California housing data
housing = fetch_california_housing()
X = housing.data   # features
y = housing.target # target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize a Bayesian ridge regression model
reg = linear_model.BayesianRidge()

# Fit the model to the scaled training data
reg.fit(X_train_scaled, y_train)

# Print the coefficients and intercept
print('Coefficients:', reg.coef_)
print('Intercept:', reg.intercept_)

# Evaluate the model on the scaled training data using R²
r2_train = reg.score(X_train_scaled, y_train)
print('R² on training data:', r2_train)

# Evaluate the model on the scaled test data using R²
r2_test = reg.score(X_test_scaled, y_test)
print('R² on test data:', r2_test)

# Print the estimated noise precision alpha (found by maximizing the marginal likelihood, not by cross-validation)
print('Chosen alpha:', reg.alpha_)

Coefficients: [ 0.8542899   0.12267475 -0.29407961  0.33884338 -0.00226554 -0.04083557
 -0.8956702  -0.86856698]
Intercept: 2.071946937378619
R² on training data: 0.6125510012819989
R² on test data: 0.5758339917659985
Chosen alpha: 1.9298152555250736
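The estimated precisions can be turned back into more familiar quantities: 1/√α is the estimated noise standard deviation of the target, and the diagonal of `sigma_` (the posterior covariance of the weights) gives a standard deviation for each coefficient. For the output above, 1/√1.9298 ≈ 0.72 in the target's units of $100,000. To stay self-contained without downloading the housing data, the sketch below uses a synthetic stand-in from `make_regression` with a known noise level; with the notebook's fitted `reg`, the same two expressions apply directly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

# Synthetic stand-in for the housing data: 8 features, Gaussian noise with std 5
X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
reg = BayesianRidge().fit(X, y)

# Estimated noise standard deviation from the noise precision alpha_
noise_std = 1 / np.sqrt(reg.alpha_)

# Posterior standard deviation of each coefficient from the covariance matrix sigma_
coef_std = np.sqrt(np.diag(reg.sigma_))

print("Estimated noise std:", noise_std)  # should be near the true value of 5
for i, (c, s) in enumerate(zip(reg.coef_, coef_std)):
    print(f"w{i}: {c:.3f} +/- {s:.3f}")
```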
