# Linear Regression
Using any programming language, implement linear least-squares regression with the penalty term $0.5\lambda w^T w$. Do not use any library, such as scikit-learn, that already implements linear regression or cross validation. Implementing linear regression and cross validation from scratch is a good exercise to make sure that you fully understand those algorithms. Feel free to use general libraries for array and matrix operations, such as numpy, and to verify the correctness of your implementation against existing libraries such as scikit-learn.
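For reference, here is where the closed form used by the `RidgeRegression` class comes from, with `alpha` in the code playing the role of λ. This assumes the data term is the halved sum of squared errors; the assignment only fixes the penalty. Under that assumption, the 1/2 factors cancel when the gradient is set to zero:

$$
E(w) = \tfrac{1}{2}\lVert Xw - y\rVert^2 + \tfrac{1}{2}\lambda\, w^T w
$$

$$
\nabla_w E = X^T (Xw - y) + \lambda w = 0 \quad\Longrightarrow\quad w = (X^T X + \lambda I)^{-1} X^T y
$$

With λ = 0 this reduces to the ordinary least-squares solution $w = (X^T X)^{-1} X^T y$.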

Download the dataset posted on the course web page. The output space is continuous (i.e., y ∈ ℝ).

Find the best λ by 10-fold cross validation. Draw a graph that shows the cross validation accuracy as λ increases from 0 to 4 in increments of 0.1.
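For a continuous target, "accuracy" has to mean some regression metric; the notebook below uses R². The sketch here makes two assumptions: mean squared error is the validation metric, and each of the ten provided partitions serves once as the validation fold (leave-one-partition-out). `ridge_closed_form` and `cross_validate_lambdas` are hypothetical helper names, and the data in the usage part is synthetic.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solves (X^T X + lam * I) w = X^T y; X is assumed to already contain a bias column."""
    A = X.T @ X + lam * np.identity(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def cross_validate_lambdas(x_parts, y_parts, lambdas):
    """Leave-one-partition-out CV over pre-split folds; returns the mean validation MSE per lambda."""
    k = len(x_parts)
    mse = np.zeros(len(lambdas))
    for fold in range(k):
        x_val, y_val = x_parts[fold], y_parts[fold]
        x_trn = np.concatenate([x_parts[j] for j in range(k) if j != fold])
        y_trn = np.concatenate([y_parts[j] for j in range(k) if j != fold])
        for i, lam in enumerate(lambdas):
            w = ridge_closed_form(x_trn, y_trn, lam)
            mse[i] += np.mean((x_val @ w - y_val) ** 2)
    return mse / k

# lambda grid 0.0, 0.1, ..., 4.0; linspace avoids arange's floating-point end-point issues
lambdas = np.linspace(0.0, 4.0, 41)

# usage on synthetic data shaped like the assignment's (200 samples, 2 features + bias, 10 partitions)
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.uniform(-0.5, 0.5, size=(200, 2))]
y = X @ np.array([0.1, 0.5, 1.5]) + rng.normal(scale=1.0, size=200)
x_parts, y_parts = np.split(X, 10), np.split(y, 10)

cv_mse = cross_validate_lambdas(x_parts, y_parts, lambdas)
best_lambda = lambdas[cv_mse.argmin()]
print(best_lambda)
```

When the metric is an error like MSE, the best λ minimizes it. With R², as in the notebook, it maximizes it.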

Dataset for linear regression: **regression-dataset.zip**
- Problem: this data corresponds to samples from a 2D surface that you can plot to visualize how linear regression is working.
- Format: there is one row per data instance and one column per attribute. The targets are real values.
- The training set is already divided into 10 subsets for 10-fold cross validation.

In [122]:
import numpy as np
import scipy as sp
import scipy.stats  # `import scipy` alone does not guarantee that sp.stats is loaded
import pandas as pd
import sklearn as skl
import random

In [2]:
! pip show numpy scipy sklearn pandas | grep -E 'Name|Version'

Name: numpy
Version: 1.19.5
Name: scipy
Version: 1.4.1
Name: sklearn
Version: 0.0
Name: pandas
Version: 1.1.5


## Utils

In [3]:
def read_csv(file_name_prefix, partition=''):
    return pd.read_csv(f'{file_name_prefix}{partition}.csv', header=None).values

def read_train_data(num_partitions):
    train_data = [read_csv('trainInput', i + 1) for i in range(num_partitions)]
    train_labels = [read_csv('trainTarget', i + 1)[:, 0] for i in range(num_partitions)]
    return train_data, train_labels

def read_test_data():
    test_data = read_csv('testInput')
    test_labels = read_csv('testTarget')[:, 0]
    return test_data, test_labels

def split_train_validation(x, y, fold, folds):
    """
    Splits input data into train and validation sets.
    Used in k-fold cross validation.

    If n is not divisible by folds, the last n % folds samples always stay in the training set.

    When splitting into train and validation sets, consider using stratified sampling:
    https://danilzherebtsov.medium.com/continuous-data-stratification-c121fc91964b
    """
    n = x.shape[0]
    fold_size = n // folds
    fold_start = fold * fold_size
    fold_end = (fold + 1) * fold_size
    xtr = np.concatenate([x[:fold_start], x[fold_end:]])
    ytr = np.concatenate([y[:fold_start], y[fold_end:]])
    xvl, yvl = x[fold_start:fold_end], y[fold_start:fold_end]
    return xtr, ytr, xvl, yvl

def norm(v):
    norm_v = np.linalg.norm(v, axis=1)[:, np.newaxis]
    norm_v[norm_v == 0] = 1
    return v / norm_v    

def center(v):
    """
    Centers variable. (subtracts the mean)
    The mean of the variable becomes 0 (centered)
    """
    return v - v.mean(axis=0)

def standardize(v):
    """
    Subtracts mean and divides by standard deviation. (centers and scales)
    - Predictors are conventionally standardized mainly so that the regression coefficients share the same units.
    - For ordinary (unpenalized) least squares, centering/scaling does not affect statistical inference:
      the estimates are adjusted appropriately and the 𝑝-values stay the same.
    - Ridge regression is not scale invariant: the penalty acts on the coefficient sizes, so scaling changes the fit.
    https://stats.stackexchange.com/a/29783
    """
    return center(v) / v.std(axis=0)

def scale(v, factor):
    """
    Not used below; only illustrates what scaling means.
    """
    return v / factor

def prepend_col(matrix, value):
    r, c = matrix.shape
    new_matrix = np.zeros([r, c + 1])
    new_matrix[:, 0] = value
    new_matrix[:, 1:] = matrix
    return new_matrix

## Stats Util

In [4]:
def mean_squared_error(y, y_pred):
    mse = np.average((y - y_pred) ** 2)
    return mse


def r2_score(y, y_pred):
    """
    "R squared" provides a measure of how well observed outcomes are replicated by the model, 
    based on the proportion of total variation of outcomes explained by the model.
    """
    numerator = ((y - y_pred) ** 2).sum(axis=0)
    denominator = ((y - np.average(y, axis=0)) ** 2).sum(axis=0)
    if numerator == 0:
        return 1
    if denominator == 0:
        return 0
    return 1 - numerator / denominator

# Plotting

In [5]:
import plotly.express as px
import plotly.graph_objects as pgo

import seaborn as sns

import matplotlib.pyplot as plt
# Axes3D import has side effects, it enables using projection='3d' in add_subplot
from mpl_toolkits.mplot3d import Axes3D

### plotly

In [6]:
! pip show plotly matplotlib seaborn | grep -E 'Name|Version'

Name: plotly
Version: 4.4.1
Name: matplotlib
Version: 3.2.2
Name: seaborn
Version: 0.11.1


In [103]:
def plot_3d_mesh(x, y, z, labels=[], figsize=(500, 500), title='Graph', opacity=.5, color='lightpink', **kwargs):
    fig = pgo.Figure(data=[pgo.Mesh3d(x=x, y=y, z=z, color=color, opacity=opacity)])
    fig.update_layout(
        title=title, 
        autosize=True,
        width=figsize[0], 
        height=figsize[1],
    )
    fig.show()

def plot_3d_linear_regression(X, y, lr_list, labels=[], figsize=(500, 500), title='Linear Regression', opacity=.2, colorscales=['blues'], plot_test=True, **kwargs):
    mesh_size = .02
    margin = 0

    # Create a mesh grid on which we will run our model
    x1 = X[:, 1]
    x2 = X[:, 2]
    x1_min, x1_max = x1.min() - margin, x1.max() + margin
    x2_min, x2_max = x2.min() - margin, x2.max() + margin
    x1_range = np.arange(x1_min, x1_max, mesh_size)
    x2_range = np.arange(x2_min, x2_max, mesh_size)
    xx1, xx2 = np.meshgrid(x1_range, x2_range)

    # Generate the plot
    fig = pgo.Figure()
    if plot_test:
        fig.add_trace(pgo.Scatter3d(x=x1, y=x2, z=y, name='Test', mode='markers', marker=dict(size=5, color='blue')))
        fig.add_trace(pgo.Mesh3d(x=x1, y=x2, z=y, opacity=opacity, color='blue'))
    for i, lr in enumerate(lr_list):
        # Run model
        xx_ = prepend_col(np.c_[xx1.ravel(), xx2.ravel()], 1)
        pred = lr.predict(xx_)
        pred = pred.reshape(xx1.shape)
        # plot model
        label = labels[i] if i < len(labels) else f'LR {i}'
        colorscale = colorscales[i] if i < len(colorscales) else 'sunset'
        fig.add_trace(pgo.Surface(x=x1_range, y=x2_range, z=pred, name=label, opacity=opacity + .5, colorscale=colorscale, showscale=False))
    fig.update_layout(title=title, autosize=True, width=figsize[0], height=figsize[1], showlegend=False)
    fig.show()

def plot_3d_scatter(x, y, z, labels=[], figsize=(500, 500), title=''):
    fig = pgo.Figure()
    for i in range(len(x)):
        # mode = [markers, lines, lines+markers]
        fig.add_trace(pgo.Scatter3d(x=x[i], y=y[i], z=z[i], name=labels[i], mode='markers', marker=dict(size=5)))
    fig.update_layout(title=title, autosize=True, width=figsize[0], height=figsize[1],)
    fig.show()

def plot_ridge_accuracy(alphas, scores, best_alpha, lr_score=None):
    ridge_scores_fig = pgo.Figure()
    ridge_scores_fig.add_trace(pgo.Scatter(x=alphas, y=scores, mode='lines', name='Ridge Accuracy'))
    ridge_scores_fig.add_trace(pgo.Scatter(x=[best_alpha], y=[scores.max()], mode='markers', name='Best Alpha'))
    if lr_score is not None:
        # ordinary LR is ridge regression with alpha = 0
        ridge_scores_fig.add_trace(pgo.Scatter(x=[0], y=[lr_score], mode='markers', name='Ordinary LR'))
    ridge_scores_fig.update_layout(title='Ridge Regression Accuracy', autosize=True, width=500, height=500,)
    ridge_scores_fig.update_xaxes(title_text='Alpha')
    ridge_scores_fig.update_yaxes(title_text='Accuracy (R square)')
    ridge_scores_fig.show()

### matplotlib based plotting...

In [8]:
def mplot_3d_surface(x, y, z, labels=[], figsize=(10, 10), **kwargs):
    fig = plt.figure(figsize=figsize)
    ax = fig.add_subplot(111, projection='3d')

    # X, Y = np.meshgrid(x, y)
    # Z = z.reshape(X.shape)
    # ax.plot_surface(X, Y, Z)

    ax.plot_trisurf(x, y, z, linewidth=0, antialiased=False, **kwargs)

    ax.set_xlabel(labels[0] if len(labels) > 0 else 'X')
    ax.set_ylabel(labels[1] if len(labels) > 1 else 'Y')
    ax.set_zlabel(labels[2] if len(labels) > 2 else 'Z')

    plt.show()

def mplot_3d_scatter(x, y, z, labels=[], figsize=(10, 10)):
    fig = plt.figure(figsize=figsize)
    ax = Axes3D(fig)
    for i in range(len(x)):
        ax.scatter(x[i], y[i], z[i], label=labels[i])
    plt.legend(loc='best')
    plt.show()

x_edges = np.array([[-.5, -.5], [-.5, .5], [.5, -.5], [.5, .5]])

def mplot_3d_linear_regression(x, y, lr_list, x_edges, labels, figsize=(10, 10), **kwargs):
    xp = np.concatenate([x, x_edges])

    fig = plt.figure(figsize=figsize)
    ax = Axes3D(fig)
    ax.set_zlim(-2, 2)
    # ax.scatter(xtr[:, 0], xtr[:, 1], ytr, color='black', label='Train')
    ax.plot_trisurf(x[:, 0], x[:, 1], y, linewidth=0, antialiased=False, alpha=.4, **kwargs)
    # ax.scatter(x[:, 0], x[:, 1], y, color='green', label='Test')
    for i, lr in enumerate(lr_list):
        # the models expect the bias column of ones as the first feature
        y_predp = lr.predict(prepend_col(xp, 1))
        # ax.scatter(xp[:, 0], xp[:, 1], y_predp, label=labels[i])
        surf = ax.plot_trisurf(xp[:, 0], xp[:, 1], y_predp, linewidth=1, antialiased=True, alpha=0.4, label=labels[i])
        surf._facecolors2d=surf._facecolors3d
        surf._edgecolors2d=surf._edgecolors3d
    plt.legend(loc='best')
    plt.show()

## Data

In [81]:
xtrp, ytrp = read_train_data(num_partitions=10)
xtr, ytr = np.concatenate(xtrp), np.concatenate(ytrp)
xte, yte = read_test_data()

# adds a 1 as first element of each xi to accommodate for w0
xtr_, xte_ = prepend_col(xtr, 1), prepend_col(xte, 1)

In [10]:
# Standardized (z-score); note: each partition and the test set are scaled with their own mean/std
xtrsp = [standardize(x) for x in xtrp]
xtrs = np.concatenate(xtrsp)
xtes = standardize(xte)

In [11]:
# Centered
xtrcp = [center(x) for x in xtrp]
xtrc = np.concatenate(xtrcp)
xtec = center(xte)

In [12]:
xtr.shape, ytr.shape, xte.shape, yte.shape

((200, 2), (200,), (100, 2), (100,))

In [13]:
xtr[:2, :]

array([[-0.243905,  0.00633 ],
       [-0.308713, -0.043968]])

In [14]:
ytr[:2]

array([-0.63708 , -1.370977])

In [15]:
sp.stats.describe(xtr), sp.stats.describe(ytr), sp.stats.describe(xtrs)

(DescribeResult(nobs=200, minmax=(array([-0.4918  , -0.498359]), array([0.498653, 0.499338])), mean=array([ 0.03404111, -0.01626617]), variance=array([0.08983553, 0.0814516 ]), skewness=array([-0.10501448,  0.06588417]), kurtosis=array([-1.32532621, -1.15687566])),
 DescribeResult(nobs=200, minmax=(-2.8718060000000003, 2.79705), mean=0.13663843499999997, variance=1.538453799337664, skewness=-0.26844152860305714, kurtosis=-0.6982110009273508),
 DescribeResult(nobs=200, minmax=(array([-1.95163294, -2.0511674 ]), array([1.79153929, 1.94057495])), mean=array([-8.8817842e-18,  4.4408921e-18]), variance=array([1.00502513, 1.00502513]), skewness=array([-0.09764032,  0.04835513]), kurtosis=array([-1.22763991, -1.05712457])))

In [16]:
sp.stats.describe(xte), sp.stats.describe(yte), sp.stats.describe(xtes)

(DescribeResult(nobs=100, minmax=(array([-0.493968, -0.49695 ]), array([0.475156, 0.492752])), mean=array([-0.0239291 , -0.02267433]), variance=array([0.08565048, 0.07263082]), skewness=array([ 0.19348207, -0.01646564]), kurtosis=array([-1.16935103, -1.10594633])),
 DescribeResult(nobs=100, minmax=(-2.412807, 2.433282), mean=-0.15687814000000003, variance=1.5712673881122632, skewness=0.18538635906677978, kurtosis=-1.1092645483068613),
 DescribeResult(nobs=100, minmax=(array([-1.61417756, -1.7686943 ]), array([1.71392617, 1.92215555])), mean=array([-1.55431223e-17,  3.33066907e-17]), variance=array([1.01010101, 1.01010101]), skewness=array([ 0.19348207, -0.01646564]), kurtosis=array([-1.16935103, -1.10594633])))

In [None]:
plot_3d_scatter(x=[xtr[:, 0], xte[:, 0]], y=[xtr[:, 1], xte[:, 1]], z=[ytr, yte], labels=['train', 'test'], figsize=(500, 500))

In [None]:
plot_3d_mesh(xtr[:, 0], xtr[:, 1], ytr, figsize=(500, 500), opacity=.3)

In [None]:
plot_3d_mesh(xte[:, 0], xte[:, 1], yte, opacity=.7)

# My implementation

In [123]:
class LinearRegression:
    """
    Ordinary Linear Regression (RidgeRegression for alpha = 0)
    """
    def __init__(self):
        self.w = None

    def fit(self, X, y):
        """
        Calculates the closed-form solution of ordinary LR: w = inv(X.T X) X.T y
        :param X: N x (p + 1) array of observations whose first column is 1 (accommodates w0)
        :param y: array with N labels
        :return: weight vector w of length p + 1
        """
        w = np.linalg.inv(X.T @ X) @ X.T @ y
        self.w = w
        return w

    def predict(self, X):
        """
        :param X: N x (p + 1) array of observations whose first column is 1 (accommodates w0)
        :return: array with N predictions
        """
        assert self.w is not None, 'Please train the model with fit(X, y) before making predictions.'
        predictions = self.w @ X.T
        return predictions

    def score(self, X, y):
        """
        :param X: N x (p + 1) array of observations whose first column is 1 (accommodates w0)
        :param y: array with N labels
        :return: R squared of the predictions for X
        """
        predictions = self.predict(X)
        return r2_score(y, predictions)

    def error(self, X, y):
        """
        :param X: N x (p + 1) array of observations whose first column is 1 (accommodates w0)
        :param y: array with N labels
        :return: mean squared error of the predictions for X
        """
        predictions = self.predict(X)
        return mean_squared_error(y, predictions)



class RidgeRegression:
    def __init__(self, algorithm='lu_factorization'):
        assert algorithm in ['inverse_matrix', 'lu_factorization']
        self.algorithm = algorithm
        self.w = None
        self.alpha = None
        self.alpha_scores = None

    def fit(self, X, y, alphas, folds=10):
        """
        Uses cross validation to find best alpha in alphas.
        Uses r squared score.
        :param X: N x p + 1 array of observations where the first predictor is set to 1 to accommodate for w0
        :param y: Array with N labels
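        :param alphas: 1-D array of candidate alphas (scalars and lists are converted to an array)
        :param folds: number of cross validation folds
        :return: None; sets self.alpha_scores, self.alpha, and self.w (refit on all of X, y with the best alpha)
        """
        alphas = np.atleast_1d(np.asarray(alphas, dtype=float))
        alpha_scores = np.zeros(alphas.shape[0])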

        for fold in range(folds):
            xtr, ytr, xvl, yvl = split_train_validation(X, y, fold, folds)
            for i_alpha in range(alphas.shape[0]):
                self.w = self.__fit(xtr, ytr, alphas[i_alpha])
                score = self.score(xvl, yvl)
                alpha_scores[i_alpha] += score

        self.alpha_scores = alpha_scores / folds
        self.alpha = alphas[alpha_scores.argmax()]
        self.w = self.__fit(X, y, self.alpha)

    def __fit(self, X, y, alpha):
        """
        Closed form solution.
        w = inv(X.T X + alpha * I) * X.T y
        Uses inverse matrix or solves system of linear eqs.

        linalg.solve: scipy/linalg/basic.py
        LU decomposition and solve: scipy/linalg/decomp_lu.py
        TODO: implement LU decomposition
        TODO: implement Gauss Jordan
        """
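        # Gram matrix X^T X (proportional to the covariance matrix only when X is centered)
        A = X.T @ X
        # penalizes every weight, including the bias w0 of the column of ones
        I = np.identity(X.shape[1])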
        Xy = X.T @ y
        if self.algorithm == 'inverse_matrix':
            w = np.linalg.inv(A + alpha * I) @ Xy
        elif self.algorithm == 'lu_factorization':
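            # np.linalg.solve factorizes A + alpha * I (LU, via LAPACK gesv) instead of
            # forming its inverse; this is generally faster and more numerically stable.
            # scikit-learn's 'cholesky' Ridge solver likewise solves the system directly.
            w = np.linalg.solve(A + alpha * I, Xy)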
        return w

    def predict(self, X):
        assert self.w is not None, 'Please train the model with fit(X, y) before making predictions.'
        predictions = self.w @ X.T
        return predictions

    def score(self, X, y):
        predictions = self.predict(X)
        return r2_score(y, predictions)

    def error(self, X, y):
        predictions = self.predict(X)
        return mean_squared_error(y, predictions)


# How Well Does the Model Fit the Data?
To evaluate the overall fit of a linear model, we use the **R-squared** value.

- R-squared is the proportion of variance explained
    - It is the proportion of variance in the observed data that is explained by the model, or the reduction in error over the null model
    - The null model just predicts the mean of the observed response, and thus it has an intercept and no slope (`y_null = mean(ytr)`)
- R-squared is at most 1; on the training data of an ordinary least-squares fit with an intercept, it lies between 0 and 1
    - Higher values are better because more variance is explained by the model
    - On held-out data it can be negative, which means the model predicts worse than the mean of the observed response
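The definition can be checked on a tiny hand-computable example. `r_squared` is a hypothetical helper that mirrors the notebook's `r2_score` without its zero-division guards. It shows that a perfect fit scores 1, the null model (always predicting the mean) scores exactly 0, and a model worse than the mean scores below 0.

```python
import numpy as np

def r_squared(y, y_pred):
    """1 - SSE/SST: fraction of the variance around mean(y) explained by the predictions."""
    sse = np.sum((y - y_pred) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

y = np.array([1.0, 2.0, 3.0, 4.0])
null_pred = np.full_like(y, y.mean())          # null model: always predict the mean
worse_pred = np.array([4.0, 3.0, 2.0, 1.0])    # predicts the reversed order

print(r_squared(y, y))           # 1.0  (perfect fit)
print(r_squared(y, null_pred))   # 0.0  (SSE equals SST)
print(r_squared(y, worse_pred))  # -3.0 (SSE = 20, SST = 5)
```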

# Cross validation data split

Stratifying by a categorical column is easy with scikit-learn:
- `sklearn.model_selection.train_test_split(..., stratify=data['variable'])`
- `sklearn.model_selection.StratifiedKFold`

(`sklearn.model_selection.KFold` splits without any stratification.)

For linear regression, however, the variable to stratify on is continuous.
- The solution is simple: break the data into bins (value ranges) and use the bins as the classes for categorical stratification.
- https://danilzherebtsov.medium.com/continuous-data-stratification-c121fc91964b
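Below is a minimal sketch of the binning idea, assuming equal-count bins whose size equals the number of folds; this variant is sometimes called sorted stratification. The samples are sorted by target, cut into consecutive bins of `folds` samples, and each bin deals one sample to every fold in random order. `stratified_fold_ids` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def stratified_fold_ids(y, folds=10, seed=0):
    """
    Returns one fold id (0..folds-1) per sample.
    Samples are sorted by target value and cut into consecutive bins of `folds`
    samples; within each bin the fold ids are shuffled, so every fold receives
    one sample from every bin, i.e. from every region of the target's range.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(y)
    fold_ids = np.empty(len(y), dtype=int)
    for start in range(0, len(y), folds):
        bin_idx = order[start:start + folds]
        # a shorter last bin just receives a random subset of the fold ids
        fold_ids[bin_idx] = rng.permutation(folds)[:len(bin_idx)]
    return fold_ids

rng = np.random.default_rng(1)
y = rng.normal(size=200)            # synthetic continuous target
ids = stratified_fold_ids(y, folds=10)
print(np.bincount(ids))             # [20 20 20 20 20 20 20 20 20 20]
```

Fold `k` would then be validated with `x[ids == k]` and trained on `x[ids != k]`.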


# Ordinary Linear Regression - Testing My implementation

In [133]:
lr = LinearRegression()

In [134]:
%timeit lr.fit(xtr_, ytr)

The slowest run took 14.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 30 µs per loop


In [135]:
%timeit lr.predict(xte_)

The slowest run took 34.39 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 1.96 µs per loop


In [136]:
lr.w

array([0.14667096, 0.44839178, 1.5551459 ])

In [87]:
y_pred_lr = lr.predict(xte_)

In [88]:
mse_lr = mean_squared_error(yte, y_pred_lr)

In [89]:
mse_lr

1.4357709782337664

In [90]:
lr.error(xte_, yte)

1.4357709782337664

### The coefficient of determination (R-squared): 1 is perfect prediction

In [124]:
lr_score = lr.score(xte_, yte)

In [92]:
lr_score

0.0770038719316789

In [95]:
plot_3d_linear_regression(xte_, yte, [lr], labels=['Ordinary LR'], figsize=(700, 600), opacity=.3, colorscales=['ice'])

# Ridge

In [137]:
ridge = RidgeRegression()
ridge_alphas = np.arange(0, 7.5, .1)

In [138]:
ridge.fit(xtr_, ytr, alphas=ridge_alphas)

In [139]:
plot_ridge_accuracy(ridge_alphas, ridge.alpha_scores, ridge.alpha, lr_score)

# Question: Is linear regression a high variance/low bias model, or a low variance/high bias model?

_Answer:_

- Low variance/high bias
- Under repeated sampling, the line will stay roughly in the same place (low variance)
    - But the average of those models won't do a great job capturing the true relationship (high bias)
- Note that low variance is a useful characteristic when you don't have a lot of training data
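The answer above can be illustrated with a small simulation under stated assumptions. The ground truth is assumed to be the nonlinear function sin(3x). Each trial draws a fresh training sample, and the prediction at one point `x0` is recorded. A straight line (degree 1) is compared with a more flexible degree-7 polynomial fit via `np.polyfit`, which is only a contrasting model for illustration. The line's predictions should show larger bias but smaller variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = 0.5                                   # point where predictions are compared

def true_f(x):
    return np.sin(3 * x)                   # assumed nonlinear ground truth

def predictions_at_x0(degree, trials=500, n=20, noise=0.3):
    """Fits a polynomial of the given degree on `trials` fresh samples; returns the predictions at x0."""
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(-1, 1, n)          # a new training sample each trial
        y = true_f(x) + rng.normal(scale=noise, size=n)
        coefs = np.polyfit(x, y, degree)   # least-squares polynomial fit
        preds[t] = np.polyval(coefs, x0)
    return preds

results = {}
for degree in (1, 7):
    p = predictions_at_x0(degree)
    bias, variance = p.mean() - true_f(x0), p.var()
    results[degree] = (bias, variance)
    print(f"degree={degree}: bias={bias:+.3f}  variance={variance:.4f}")
```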

In [140]:
ridge.alpha

2.9000000000000004

In [141]:
ridge.score(xte_, yte)

0.07520555709435717

In [142]:
ridge.error(xte_, yte)

1.438568355356634

In [132]:
%timeit ridge.fit(xtr_, ytr, alphas=ridge_alphas)

10 loops, best of 5: 42.8 ms per loop


### Standardizing
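- **For ordinary (unpenalized) least squares, centering/scaling does not affect statistical inference.**

- **The estimates are adjusted appropriately and the 𝑝-values will be the same. Ridge regression, however, is not scale invariant: the penalty acts on the size of the coefficients, so rescaling the inputs changes the best λ and the fitted model.**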
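A quick check of the distinction above on synthetic data. It assumes the same formulation as `RidgeRegression`: a bias column of ones is prepended and penalized too. The helpers `ridge_fit` and `ridge_predict` are hypothetical. With λ = 0, predictions on new data are identical whether or not the features are z-scored; with λ > 0 they differ. The sketch also z-scores new data with the training mean and standard deviation. The notebook, in contrast, standardizes each partition and the test set with their own statistics.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge on [1, X]; the bias weight is penalized too, as in RidgeRegression. lam=0 is OLS."""
    Xb = np.c_[np.ones(len(X)), X]
    return np.linalg.solve(Xb.T @ Xb + lam * np.identity(Xb.shape[1]), Xb.T @ y)

def ridge_predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(200, 2))           # synthetic stand-in for xtr
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.5, size=200)
X_new = rng.uniform(-0.5, 0.5, size=(50, 2))         # synthetic stand-in for xte

# z-score with the training mean/std and reuse them for new data
mu, sd = X.mean(axis=0), X.std(axis=0)
Z, Z_new = (X - mu) / sd, (X_new - mu) / sd

max_diff = {}
for lam in (0.0, 5.0):
    p_raw = ridge_predict(ridge_fit(X, y, lam), X_new)
    p_std = ridge_predict(ridge_fit(Z, y, lam), Z_new)
    max_diff[lam] = np.abs(p_raw - p_std).max()
    print(f"lambda={lam}: max |raw - standardized prediction| = {max_diff[lam]:.2e}")
```

The λ = 0 difference is at floating-point round-off level, while the λ = 5 difference is not. This is also why the best alpha found on z-scored inputs is on a different scale from the one found on raw inputs.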

In [None]:
ridge_z = RidgeRegression()
alphas_z = np.arange(26, 52, .1)
ridge_z.fit(xtrs, ytr, alphas=alphas_z)

In [None]:
plot_ridge_accuracy(alphas_z, ridge_z.alpha_scores, ridge_z.alpha)

In [None]:
ridge_z.alpha

39.40000000000019

In [None]:
ridge_z.score(standardize(xte), yte)

0.07149001074919548

In [None]:
ridge_z.error(standardize(xte), yte)

1.4443480909897946

### Centering

In [None]:
ridge_c = RidgeRegression()
alphas_c = np.arange(0, 8, .1)
ridge_c.fit(xtrc, ytr, alphas=alphas_c)

In [None]:
plot_ridge_accuracy(alphas_c, ridge_c.alpha_scores, ridge_c.alpha)

In [None]:
ridge_c.alpha

3.2

In [None]:
ridge_c.score(center(xte), yte)

0.06210671457838701

In [None]:
ridge_c.error(center(xte), yte)

1.4589443215833229

In [None]:
# ridge_c and ridge_z were fit on centered / standardized inputs without the bias column,
# so their weights cannot be evaluated on this raw-coordinate, bias-prepended mesh.
plot_3d_linear_regression(
    xte_,
    yte,
    [lr, ridge],
    labels=['Ordinary', 'Ridge'],
    figsize=(800, 600),
    opacity=.2,
    colorscales=['sunset', 'ice'],
    plot_test=False
)

In [None]:
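def build_ridge(X, y, alpha):
    ridge = RidgeRegression()
    # a single candidate alpha: cross validation only evaluates it, then the model is refit on all data
    ridge.fit(X, y, alphas=np.array([alpha], dtype=float))
    return ridge

alphas = np.arange(0, 400, 20)
ridge_lrs = [build_ridge(xtr_, ytr, alpha) for alpha in alphas]
labels = [str(alpha) for alpha in alphas]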

In [None]:
plot_3d_linear_regression(xte_, yte, ridge_lrs, labels=labels, figsize=(600, 600), plot_test=False)

### Solve comparison
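Three ways to solve the same ridge system are sketched below on synthetic data. The explicit inverse corresponds to `algorithm='inverse_matrix'`, and `np.linalg.solve` (LU with partial pivoting) to `algorithm='lu_factorization'`. The third, a Cholesky factorization via `scipy.linalg.cho_factor`/`cho_solve`, is an extra option not used in the notebook. It applies because A = XᵀX + λI is symmetric positive definite for λ > 0. For a 3 x 3 system, timing differences are dominated by call overhead, so the comparison here only checks that the solutions agree.

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.uniform(-0.5, 0.5, size=(200, 2))]   # synthetic, bias column first
y = rng.normal(size=200)
lam = 1.0

A = X.T @ X + lam * np.identity(X.shape[1])    # symmetric positive definite for lam > 0
b = X.T @ y

w_inv = np.linalg.inv(A) @ b                   # explicit inverse
w_lu = np.linalg.solve(A, b)                   # LU factorization (LAPACK gesv)
w_chol = scipy.linalg.cho_solve(scipy.linalg.cho_factor(A), b)   # Cholesky factorization

print(np.allclose(w_inv, w_lu), np.allclose(w_lu, w_chol))       # True True
```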

In [None]:
ridge_im = RidgeRegression(algorithm='inverse_matrix')
ridge_lu = RidgeRegression(algorithm='lu_factorization')

In [None]:
%timeit ridge_im.fit(xtr_, ytr, alphas=np.array([1.0]))

1000 loops, best of 5: 1.43 ms per loop


In [None]:
ridge_im.w

array([0.14501507, 0.43323265, 1.4661977 ])

In [None]:
%timeit ridge_lu.fit(xtr_, ytr, alphas=np.array([1.0]))

1000 loops, best of 5: 1.25 ms per loop


In [None]:
ridge_lu.w

array([0.14501507, 0.43323265, 1.4661977 ])

# scikit-learn comparison

In [None]:
! pip show scikit-learn

Name: scikit-learn
Version: 0.22.2.post1
Summary: A set of python modules for machine learning and data mining
Home-page: http://scikit-learn.org
Author: None
Author-email: None
License: new BSD
Location: /usr/local/lib/python3.7/dist-packages
Requires: scipy, numpy, joblib
Required-by: yellowbrick, umap-learn, textgenrnn, sklearn, sklearn-pandas, pynndescent, mlxtend, lucid, lightgbm, librosa, imbalanced-learn, fancyimpute


In [143]:
from sklearn import linear_model
from sklearn.metrics import mean_squared_error as mean_squared_error_skl, r2_score as r2_score_skl

In [144]:
lr_skl = linear_model.LinearRegression()

In [145]:
%timeit lr_skl.fit(xtr, ytr)

The slowest run took 13.15 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 5: 337 µs per loop


In [146]:
predictions_skl = lr_skl.predict(xte)

In [147]:
%timeit lr_skl.predict(xte)

The slowest run took 5.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 45.4 µs per loop


In [148]:
lr_skl.coef_, lr_skl.intercept_

(array([0.44839178, 1.5551459 ]), 0.14667095656891727)

In [149]:
lr.w

array([0.14667096, 0.44839178, 1.5551459 ])

In [None]:
mse_skl = mean_squared_error_skl(yte, predictions_skl)

In [None]:
mse_skl

1.4357709782337664

In [None]:
lr_skl_score = lr_skl.score(xte, yte)

In [None]:
lr_skl_score

0.0770038719316789

In [None]:
lr_skl_score - lr_score

0.0

In [None]:
(predictions_skl == y_pred_lr).sum(), (predictions_skl - y_pred_lr).max()

(5, 4.440892098500626e-16)

# Ridge SKL

In [None]:
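# solver='cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.
# fit_intercept=False plus the prepended column of ones matches our formulation, where w0 is penalized too.
ridge_skl = linear_model.Ridge(alpha=1, solver='cholesky', fit_intercept=False)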

In [None]:
ridge_skl.fit(prepend_col(xtr, 1), ytr)

Ridge(alpha=1, copy_X=True, fit_intercept=False, max_iter=None, normalize=False,
      random_state=None, solver='cholesky', tol=0.001)
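The setup above deliberately penalizes w0 so that it matches the notebook's formulation. scikit-learn's default `fit_intercept=True` instead leaves the intercept unpenalized. A minimal sketch of that alternative follows; it is not necessarily how scikit-learn implements it internally. The idea is to center X and y, solve for the slopes only, and recover the intercept from the means. `ridge_unpenalized_intercept` is a hypothetical helper, and the data is synthetic.

```python
import numpy as np

def ridge_unpenalized_intercept(X, y, lam):
    """
    Ridge regression whose penalty excludes the intercept.
    X: N x p array WITHOUT a column of ones; returns (intercept, coefficients).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                   # centering removes the intercept from the problem
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.identity(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef                # recovered from the means, never shrunk
    return intercept, coef

rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(200, 2))            # synthetic data with a large true intercept
y = 3.0 + X @ np.array([0.5, 1.5]) + rng.normal(scale=0.5, size=200)

b0_ols, coef_ols = ridge_unpenalized_intercept(X, y, lam=0.0)   # lam = 0 reduces to OLS
b0_big, coef_big = ridge_unpenalized_intercept(X, y, lam=1e6)   # slopes shrink toward 0 ...
print(b0_big, y.mean())                                          # ... but the intercept tends to mean(y), not 0
```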

In [None]:
ridge_skl.coef_

array([0.14501507, 0.43323265, 1.4661977 ])

In [None]:
ridge_lu.w  # fit with alpha = 1, like ridge_skl

array([0.14501507, 0.43323265, 1.4661977 ])

In [None]:
predictions_ridge_skl = ridge_skl.predict(prepend_col(xte, 1))

In [None]:
mse_ridge_skl = mean_squared_error_skl(yte, predictions_ridge_skl)

In [None]:
mse_ridge_skl

1.4358700176022388

In [None]:
ridge_skl_score = ridge_skl.score(prepend_col(xte, 1), yte)

In [None]:
ridge_skl_score

0.07694020373179744

In [None]:
ridge_skl_score - ridge_lu.score(prepend_col(xte, 1), yte)

0.0

In [None]:
predictions_ridge = ridge_lu.predict(prepend_col(xte, 1))
(predictions_ridge_skl == predictions_ridge).sum(), (predictions_ridge_skl - predictions_ridge).max()

(100, 0.0)