# Linear Regression
Using any programming language, implement linear least-squares regression with the penalty term $0.5\lambda w^\top w$. Do not use any library, such as scikit-learn, that already implements linear regression or cross validation. Implementing linear regression and cross validation from scratch is a good exercise to make sure that you fully understand those algorithms. Feel free to use general libraries for array and matrix operations, such as numpy, and to verify the correctness of your implementation with existing libraries such as scikit-learn.

Download the dataset posted on the course web page. The output space is continuous (i.e., y ∈ ℝ).

Find the best λ by 10-fold cross validation. Draw a graph that shows the cross-validation accuracy as λ increases from 0 to 4 in increments of 0.1.

Dataset for linear regression: **regression-dataset.zip**
- Problem: this data corresponds to samples from a 2D surface that you can plot to visualize how linear regression is working.
- Format: there is one row per data instance and one column per attribute. The targets are real values.
- The training set is already divided into 10 subsets for 10-fold cross validation.
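For reference, the closed-form solution used later in `solve_w` follows from setting the gradient of the penalized loss to zero. This derivation assumes the squared-error term is also scaled by ½ (the assignment only states the penalty term) and that $X$ already contains the prepended column of ones:

$$
J(w) = \frac{1}{2}\lVert y - Xw \rVert^2 + \frac{1}{2}\lambda\, w^\top w,
\qquad
\nabla_w J(w) = -X^\top (y - Xw) + \lambda w = 0
\;\Longrightarrow\;
w^\star = \left(X^\top X + \lambda I\right)^{-1} X^\top y
$$

With $\lambda = 0$ this reduces to ordinary least squares. Because $X$ includes the column of ones, $\lambda I$ also shrinks the intercept in this formulation.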

# How Well Does the Model Fit the Data?
To evaluate the overall fit of a linear model, we use the **R-squared** value:

- R-squared is the proportion of variance explained
    - It is the proportion of variance in the observed data that is explained by the model, i.e. the reduction in error over the null model
    - The null model just predicts the mean of the observed response, so it has an intercept and no slope
- For a least-squares fit with an intercept, R-squared on the training data lies between 0 and 1
    - Higher values are better because more variance is explained by the model
    - On held-out data it can be negative, when the model does worse than simply predicting the mean
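A minimal NumPy sketch of the definition above, comparing the model's squared error with that of the null (mean-only) model. The helper name `r_squared` and the toy values are illustrative only:

```python
import numpy as np


def r_squared(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot (the same quantity as r2_score in this notebook)."""
    y, y_pred = np.ravel(y), np.ravel(y_pred)
    ss_res = np.sum((y - y_pred) ** 2)    # squared error of the model
    ss_tot = np.sum((y - y.mean()) ** 2)  # squared error of the null model
    return 1.0 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                          # 1.0: perfect predictions
print(r_squared(y, np.full_like(y, y.mean())))  # 0.0: the null model itself
print(r_squared(y, y[::-1]))                    # -3.0: worse than the null model
```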

# Question: Is linear regression a high variance/low bias model, or a low variance/high bias model?

_Answer:_

- Low variance/high bias
- Under repeated sampling, the line will stay roughly in the same place (low variance)
    - But the average of those models won't do a great job capturing the true relationship (high bias)
- Note that low variance is a useful characteristic when you don't have a lot of training data
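The answer can be checked with a small simulation: fit a straight line to many independent training sets drawn from a curved function, then compare how much the fitted lines disagree (variance) with how far their average is from the truth (squared bias). The ground truth `sin(3x)`, the sample size of 50, the noise level and the number of repetitions are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(-1, 1, 101)
true_f = np.sin(3 * x_test)  # hypothetical non-linear ground truth

preds = []
for _ in range(200):  # 200 independent training sets
    x = rng.uniform(-1, 1, 50)
    y = np.sin(3 * x) + rng.normal(0, 0.1, 50)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
    preds.append(slope * x_test + intercept)
preds = np.array(preds)  # shape (200, 101): one row per fitted line

variance = preds.var(axis=0).mean()                    # spread of the lines around their mean
bias_sq = ((preds.mean(axis=0) - true_f) ** 2).mean()  # error of the average line
print(f"variance={variance:.4f}  bias^2={bias_sq:.4f}")
```

The lines barely move between training sets, yet their average still misses the curvature, which is the low-variance/high-bias behaviour described above.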

# Numpy -> TensorFlow migration

0. Convert numpy array from `pd.read_csv(...).values` to `tf.Tensor`
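1. Used `tf.pad` to prepend a column of 1s (the intercept term) to the x tensors, since tensors are immutable and do not support NumPy-style item assignment.
```python
padding = [[0, 0], [1, 0]]  # dimension 0 (rows): no padding; dimension 1 (columns): one column before
x_with_ones = tf.pad(x, padding, constant_values=1)
```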

2. Matrix inverse, using `tf.linalg.inv` instead of `np.matrix.I`
3. In NumPy, transposes are memory-efficient constant-time operations, as they simply return a new view of the same data with adjusted strides.
TensorFlow does not support strides, so `tf.transpose` returns a new tensor with the items permuted.
```python
tf.transpose(tensor)
```
Precompute if used multiple times!
4. TF does not support NumPy mask assignment; use `tf.where` instead.
```python
norm_v = tf.where(norm_v == 0, 1, norm_v)
# instead of the NumPy-style in-place assignment:
# norm_v[norm_v == 0] = 1
```
5. Not TF-related: y is stored as a tensor of shape (n, 1) instead of (n,).
6. Error: `'retval_' has dtype int32 in the main branch, but dtype float64 in the else branch.`
Changed the return values so that every branch returns float64:
```python
    if numerator == 0:
        return tnp.float64(1)
    if denominator == 0:
        return tnp.float64(0)
```
7. Retracing...
```
WARNING:tensorflow:5 out of the last 12 calls to <function tf_device.<locals>.decorator.<locals>.applicator at 0x7f5518092e60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
```
8. Decorators and tracing: a bad combination
```python
def tensorflow_f(device=None, device_num=0):
    """
    Bad idea, messes up the trace cache
    """
    # Order of precedence: TPU, GPU and CPU
    default_device = "tpu" if tf.config.list_logical_devices("TPU") else "gpu" if tf.config.list_physical_devices("GPU") else "cpu"

    def decorator(f):
        device_type = device or default_device
        tf_device = f"/{device_type}:{device_num}"
        print(f'Defining function <{f.__name__}> to run on device: {tf_device}')
        @tf.function
        def applicator(*args, **kwargs):
            with tf.device(tf_device):
                return f(*args, **kwargs)

        return applicator
    return decorator
```
9. NumPy-like methods are not available on tensors by default:
```
AttributeError: 
        'EagerTensor' object has no attribute 'ravel'.
        If you are looking for numpy-related methods, please run the following:
        import tensorflow.python.ops.numpy_ops.np_config
        np_config.enable_numpy_behavior()
```
10. Inside a `@tf.function`, use `tf.shape(x)` (dynamic, runtime) instead of `x.shape` (build time)
11. Inside a `@tf.function`, use `tf.print()` to print info during graph execution; regular `print(...)` is only called when tracing/retracing.
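Notes 7, 10 and 11 can be seen together in a small sketch: with an `input_signature` of unknown shape, inputs of different sizes reuse a single trace, `tf.shape` gives the run-time size, Python `print` fires only while tracing, and `tf.print` fires on every call. The names `count_rows` and `traces` are hypothetical; `autograph=False` is set only because the body has no control flow and the Python side effect should run as plain Python during tracing:

```python
import tensorflow as tf

traces = []  # Python-side log, appended to only while TensorFlow traces the function


@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float64)],
             autograph=False)
def count_rows(x):
    traces.append(x.shape)  # static shape: unknown at trace time because shape=None
    print("print: runs only while tracing")
    tf.print("tf.print: runs on every call, rows =", tf.shape(x)[0])
    return tf.shape(x)[0]  # dynamic shape, evaluated at run time


a = count_rows(tf.zeros((3, 2), dtype=tf.float64))
b = count_rows(tf.zeros((5, 4), dtype=tf.float64))
print(int(a), int(b), len(traces))  # 3 5 1
```

Without the `input_signature`, each new input shape would trigger its own trace, which is the retracing warning quoted in note 7.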

In [3]:
import logging
import sys
import numpy as np
import scipy as sp
import scipy.stats  # `import scipy` alone does not guarantee that `sp.stats` is loaded
import pandas as pd
import sklearn as skl

import tensorflow as tf
import tensorflow.experimental.numpy as tnp
from tensorflow.python.ops.numpy_ops import np_config
import tensorboard

import plotly.express as px
import plotly.graph_objects as pgo

## Enable numpy-related methods

In [5]:
np_config.enable_numpy_behavior()

In [6]:
%load_ext tensorboard

# GPU
- First TF call using GPU will take longer, since it initializes the GPU.
- It's a good idea to "warm up" the GPU with some initialization call

In [7]:
tf.config.list_physical_devices()

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

# TPU initialization
- TPUs are typically Cloud TPU workers, which are different from the local process running the user's Python program. Thus, you need to do some initialization work to connect to the remote cluster and initialize the TPUs.
- Note that the `tpu=''` argument to `tf.distribute.cluster_resolver.TPUClusterResolver` is a special address just for Colab.
- If you are running your code on Google Compute Engine (GCE), you should instead pass in the name of your Cloud TPU.

In [8]:
try:
    # tpu='' is a special address that only works on Colab.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
    tf.config.experimental_connect_to_cluster(resolver)
    # This is the TPU initialization code that has to be at the beginning.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    print(tf.config.list_logical_devices('TPU'))
except Exception as e:
    print("Unable to initialize TPU!\n", repr(e))

Unable to initialize TPU!
 ValueError('Please provide a TPU Name to connect to.')


In [9]:
! pip show tensorflow keras numpy scipy sklearn pandas plotly | grep -E 'Name|Version'

Name: tensorflow
Version: 2.5.0
Name: Keras
Version: 2.4.3
Name: numpy
Version: 1.19.5
Name: scipy
Version: 1.4.1
Name: sklearn
Version: 0.0
Name: pandas
Version: 1.1.5
Name: plotly
Version: 4.4.1


In [10]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('notebook')

## Utils

In [36]:
def read_csv(file_name_prefix, partition=''):
    return pd.read_csv(f'{file_name_prefix}{partition}.csv', header=None).values

def read_train_data(num_partitions):
    train_data = [read_csv('trainInput', i + 1) for i in range(num_partitions)]
    train_labels = [read_csv('trainTarget', i + 1) for i in range(num_partitions)]
    return train_data, train_labels

def read_test_data():
    test_data = read_csv('testInput')
    test_labels = read_csv('testTarget')
    return test_data, test_labels

@tf.function(
    input_signature=[
                    tf.TensorSpec(shape=None, dtype=tf.float64), 
                    tf.TensorSpec(shape=None, dtype=tf.float64), 
                    tf.TensorSpec(shape=(), dtype=tf.int32),
                    tf.TensorSpec(shape=(), dtype=tf.int32),
                    ]
            )
def split_train_validation_tf(x, y, fold, folds):
    """
    Splits input data into train and validation sets.
    Used in k-fold cross validation.
    """
    print("tracing split_train_validation_tf.")
    n = tf.shape(x)[0]
    fold_size = n // folds
    fold_start = fold * fold_size
    fold_end = (fold + 1) * fold_size
    train_x = tf.concat([x[:fold_start], x[fold_end:]], axis=0)
    train_y = tf.concat([y[:fold_start], y[fold_end:]], axis=0)
    validation_x, validation_y = x[fold_start:fold_end], y[fold_start:fold_end]
    return train_x, train_y, validation_x, validation_y

def norm(v):
    norm_v = tf.norm(v, axis=1)[:, tnp.newaxis]
    norm_v = tf.where(norm_v == 0, 1, norm_v)
    return v / norm_v    

def center(v):
    """
    Centers variable. (subtracts the mean)
    The mean of the variable becomes 0 (centered)
    """
    return v - v.mean(axis=0)

def scale(v, factor):
    """
    This is not really a useful method... just to illustrate what scaling is...
    """
    return v * factor

def standardize(v):
    """
    Subtracts mean and divides by standard deviation. (centers and scales)
    - The convention that you standardize predictors primarily exists so that the units of the regression coefficients are the same.
    - Centering/scaling does not affect your statistical inference in regression models.
    - The estimates are adjusted appropriately and the 𝑝-values will be the same.
    https://stats.stackexchange.com/a/29783
    """
    return scale(center(v), 1 / v.std(axis=0))

def tf_default_device(device_num=0):
    """
    Order of precedence: TPU, GPU and CPU
    """
    device_type = "tpu" if tf.config.list_logical_devices("TPU") else "gpu" if tf.config.list_physical_devices("GPU") else "cpu"
    return f'{device_type}:{device_num}'
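As a cross-check for the TensorFlow code, here is a NumPy-only sketch of the same procedure: contiguous folds as in `split_train_validation_tf`, the closed-form ridge solution with a penalized intercept as in `solve_w`, and the mean validation R-squared for each λ on the assignment's grid from 0 to 4 in steps of 0.1. The helper names and the synthetic data are hypothetical; with the real data, contiguous folds of 20 rows line up with the ten provided subsets only if each file holds 20 of the 200 training rows:

```python
import numpy as np


def add_ones(X):
    """Prepend a column of ones (same role as prepend_col(X, 1))."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def ridge_weights(X, y, lam):
    """w = (X^T X + lam * I)^-1 X^T y; like solve_w, the intercept is penalized too."""
    Xb = add_ones(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)


def r_squared(y, y_pred):
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)


def cross_validate(X, y, lambdas, folds=10):
    """Mean validation R^2 for each lambda, using contiguous folds."""
    n = X.shape[0]
    fold_size = n // folds
    scores = np.zeros(len(lambdas))
    for k in range(folds):
        val = np.arange(k * fold_size, (k + 1) * fold_size)  # k-th block of rows
        train = np.setdiff1d(np.arange(n), val)               # all remaining rows
        for i, lam in enumerate(lambdas):
            w = ridge_weights(X[train], y[train], lam)
            scores[i] += r_squared(y[val], add_ones(X[val]) @ w)
    return scores / folds


# Synthetic stand-in for the concatenated trainInput*/trainTarget* files.
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(200, 2))
y = X @ np.array([[0.5], [1.5]]) + rng.normal(0, 1, size=(200, 1))

lambdas = np.linspace(0, 4, 41)  # 0.0, 0.1, ..., 4.0
scores = cross_validate(X, y, lambdas)
best_lambda = lambdas[scores.argmax()]
```

`np.linspace` is used instead of `np.arange` so the grid has exactly 41 points without floating-point surprises at the endpoint.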

# Plotting

In [37]:
def plot_3d_mesh(x, y, z, labels=[], figsize=(500, 500), title='Graph', opacity=.5, color='lightpink', **kwargs):
    fig = pgo.Figure(data=[pgo.Mesh3d(x=x, y=y, z=z, color=color, opacity=opacity)])
    fig.update_layout(
        title=title, 
        autosize=True,
        width=figsize[0], 
        height=figsize[1],
    )
    fig.show()

def plot_3d_linear_regression(X, y, lr_list, labels=[], figsize=(500, 500), title='Linear Regression', opacity=.2, colorscales=['blues'], plot_Xy=True, **kwargs):
    mesh_size = .02
    margin = 0

    # Create a mesh grid on which we will run our model
    x1 = X[:, 0]
    x2 = X[:, 1]
    x1_min, x1_max = x1.min() - margin, x1.max() + margin
    x2_min, x2_max = x2.min() - margin, x2.max() + margin
    x1_range = np.arange(x1_min, x1_max, mesh_size)
    x2_range = np.arange(x2_min, x2_max, mesh_size)
    xx1, xx2 = np.meshgrid(x1_range, x2_range)

    # Generate the plot
    fig = pgo.Figure()
    fig.update_layout(title=title, autosize=True, width=figsize[0], height=figsize[1])
    if plot_Xy:
        z = y[:, 0]
        fig.add_trace(pgo.Scatter3d(x=x1, y=x2, z=z, name='Train', mode='markers', marker=dict(size=5, color='blue')))
        fig.add_trace(pgo.Mesh3d(x=x1, y=x2, z=z, opacity=opacity, color='blue'))
    for i, lr in enumerate(lr_list):
        # Run model
        pred = lr.predict(np.c_[xx1.ravel(), xx2.ravel()])
        pred = pred.reshape(xx1.shape)
        # plot model
        label = labels[i] if i < len(labels) else f'LR {i}'
        colorscale = colorscales[i] if i < len(colorscales) else 'sunset'
        fig.add_trace(pgo.Surface(x=x1_range, y=x2_range, z=pred, name=label, opacity=opacity + .5, colorscale=colorscale, showscale=False))
    fig.show()

def plot_3d_scatter(x, y, z, labels=[], figsize=(500, 500), title=''):
    fig = pgo.Figure()
    for i in range(len(x)):
        # mode = [markers, lines, lines+markers]
        fig.add_trace(pgo.Scatter3d(x=x[i], y=y[i], z=z[i], name=labels[i], mode='markers', marker=dict(size=5)))
    fig.update_layout(title=title, autosize=True, width=figsize[0], height=figsize[1],)
    fig.show()

def plot_ridge_accuracy(alphas, scores, lr_score=None):
    argmax = scores.numpy().argmax()
    ridge_scores_fig = pgo.Figure()
    ridge_scores_fig.add_trace(pgo.Scatter(x=alphas, y=scores, mode='lines', name='Ridge Accuracy'))
    ridge_scores_fig.add_trace(pgo.Scatter(x=[alphas[argmax]], y=[scores[argmax]], mode='markers', name='Best Alpha'))
    if lr_score is not None:
        ridge_scores_fig.add_trace(pgo.Scatter(x=[alphas[argmax]], y=[lr_score], mode='markers', name='Ordinary LR'))
    ridge_scores_fig.update_layout(title='Ridge Regression Accuracy', autosize=True, width=500, height=500,)
    ridge_scores_fig.update_xaxes(title_text='Alpha')
    ridge_scores_fig.update_yaxes(title_text='Accuracy (R-squared)')
    ridge_scores_fig.show()

## Data

In [38]:
# read data
xtrp, ytrp = read_train_data(num_partitions=10)
xtr, ytr = np.concatenate(xtrp), np.concatenate(ytrp)

xte, yte = read_test_data()

In [39]:
xtr.shape, ytr.shape, xte.shape, yte.shape

((200, 2), (200, 1), (100, 2), (100, 1))

In [40]:
sp.stats.describe(xtr), sp.stats.describe(ytr)

(DescribeResult(nobs=200, minmax=(array([-0.4918  , -0.498359]), array([0.498653, 0.499338])), mean=array([ 0.03404111, -0.01626617]), variance=array([0.08983553, 0.0814516 ]), skewness=array([-0.10501448,  0.06588417]), kurtosis=array([-1.32532621, -1.15687566])),
 DescribeResult(nobs=200, minmax=(array([-2.871806]), array([2.79705])), mean=array([0.13663843]), variance=array([1.5384538]), skewness=array([-0.26844153]), kurtosis=array([-0.698211])))

In [41]:
sp.stats.describe(xte), sp.stats.describe(yte)

(DescribeResult(nobs=100, minmax=(array([-0.493968, -0.49695 ]), array([0.475156, 0.492752])), mean=array([-0.0239291 , -0.02267433]), variance=array([0.08565048, 0.07263082]), skewness=array([ 0.19348207, -0.01646564]), kurtosis=array([-1.16935103, -1.10594633])),
 DescribeResult(nobs=100, minmax=(array([-2.412807]), array([2.433282])), mean=array([-0.15687814]), variance=array([1.57126739]), skewness=array([0.18538636]), kurtosis=array([-1.10926455])))

# My implementation

In [None]:
class LinearRegression:
    def __init__(self):
        self.tf_device = tf_default_device()
        self.w = None

    def fit(self, X, y):
        self.w = self.fit_tf(X, y)
        return self.w

    @tf.function
    def fit_tf(self, X, y):
        with tf.device(self.tf_device):
            x = prepend_col(X, 1)
            xt = tf.transpose(x)
            w = tf.linalg.inv(xt @ x) @ xt @ y
            return w

    def predict(self, X):
        self.assert_trained()
        return self.predict_tf(X, self.w)

    @tf.function
    def predict_tf(self, X, w):
        with tf.device(self.tf_device):
            x = prepend_col(X, 1)
            predictions = x @ w
            return predictions

    def score(self, X, y):
        self.assert_trained()
        return self.score_tf(X, y, self.w)
    
    @tf.function
    def score_tf(self, X, y, w):
        """
        R-squared score
        """
        with tf.device(self.tf_device):
            predictions = self.predict_tf(X, w)
            return r2_score(y, predictions)

    def assert_trained(self):
        assert self.w is not None, 'Please train the model with fit(X, y) before making predictions.'

class RidgeRegression:
    """
    Ordinary LinearRegression = RidgeRegression for alpha = 0
    """
    def __init__(self, algorithm='lu_factorization', device=None):
        assert algorithm in ['inverse_matrix', 'lu_factorization']
        self.w = None
        self.alpha = None
        self.alpha_scores = None
        self.algorithm = algorithm
        self.device = device or tf_default_device()
        
    def fit(self, X, y, alphas, folds=10):
        self.alpha, self.w, self.alpha_scores = self.fit_tf(X, y, alphas, folds)

    @tf.function(
        input_signature=[
                         tf.TensorSpec(shape=None, dtype=tf.float64), 
                         tf.TensorSpec(shape=None, dtype=tf.float64), 
                         tf.TensorSpec(shape=None, dtype=tf.float64),
                         tf.TensorSpec(shape=(), dtype=tf.int32),
                         ]
                 )
    def fit_tf(self, X, y, alphas, folds):
        """
        Trains model using cross validation.
        TF based implementation is faster than numpy.
        """
        alphas_size = tf.size(alphas)
        scores = tf.TensorArray(tf.float64, size=alphas_size)
        scores = scores.unstack(tf.zeros_like(alphas))
                                                                                                                             
        for fold in tf.range(folds, dtype=tf.int32):
            # splits train and validation sets
            xtr, ytr, xvl, yvl = split_train_validation_tf(X, y, fold, folds)
            for i in tf.range(alphas_size):
                # fits model using training set
                w = self.solve_w(xtr, ytr, alphas[i]) 
                # calculates score using validation set
                score = self.score_tf(xvl, yvl, w)
                # accumulates score of each alpha over all folds
                scores = scores.write(i, scores.read(i) + score)
        
        alpha_scores = scores.stack() / folds
        alpha = alphas[tf.argmax(alpha_scores)]
        # trains with best alpha using all train data
        w = self.solve_w(X, y, alpha)
        return alpha, w, alpha_scores

    @tf.function(
        input_signature=[
                         tf.TensorSpec(shape=None, dtype=tf.float64), 
                         tf.TensorSpec(shape=None, dtype=tf.float64), 
                         tf.TensorSpec(shape=(), dtype=tf.float64)
                         ]
                 )
    def solve_w(self, X, y, alpha):
        """
        Closed form solution.
        w = inv(X.T X + alpha * I) * X.T y
        Uses inverse matrix or solves system of linear eqs.
        """
        print('tracing fit')
        with tf.device(self.device):
            # tf.print(tf.shape(X)[0])
            x = prepend_col(X, 1)
            xt = tf.transpose(x)
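            # Gram matrix X^T X (proportional to the covariance matrix only if X is centered)
            A = xt @ x
            # Identity in the same dtype as A; the intercept (column of ones) is penalized too
            I = tf.eye(tf.shape(xt)[0], dtype=xt.dtype)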
            if self.algorithm == 'inverse_matrix':
                w = tf.linalg.inv(A + alpha * I) @ xt @ y
            elif self.algorithm == 'lu_factorization':
                # Uses LU factorization.
                w = tf.linalg.solve(A + alpha * I, xt @ y)
            return w

    def predict(self, X):
        self.assert_trained()
        return self.predict_tf(X, self.w)

    @tf.function(
        input_signature=[
                         tf.TensorSpec(shape=None, dtype=tf.float64), 
                         tf.TensorSpec(shape=None, dtype=tf.float64)
                         ]
                 )
    def predict_tf(self, X, w):
        print('tracing predict')
        with tf.device(self.device):
            x = prepend_col(X, 1)
            predictions = x @ w
            return predictions

    def score(self, X, y):
        self.assert_trained()
        return self.score_tf(X, y, self.w)

    @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float64), tf.TensorSpec(shape=None, dtype=tf.float64), tf.TensorSpec(shape=None, dtype=tf.float64)])
    def score_tf(self, X, y, w):
        print('tracing score')
        with tf.device(self.device):
            predictions = self.predict_tf(X, w)
            return r2_score(y, predictions)

    def error(self, X, y):
        self.assert_trained()
        return self.error_tf(X, y, self.w)

    @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float64), tf.TensorSpec(shape=None, dtype=tf.float64), tf.TensorSpec(shape=None, dtype=tf.float64)])
    def error_tf(self, X, y, w):
        with tf.device(self.device):
            predictions = self.predict_tf(X, w)
            return mean_squared_error(y, predictions)

    def assert_trained(self):
        assert self.w is not None, 'Please train the model with fit(X, y) before making predictions.'


def mean_squared_error(y, y_pred):
    mse = tnp.average((y - y_pred) ** 2)
    return mse


def r2_score(y, y_pred):
    """
    "R squared" provides a measure of how well observed outcomes are replicated by the model, 
    based on the proportion of total variation of outcomes explained by the model.
    """
    numerator = tnp.sum((y - y_pred) ** 2)
    denominator = tnp.sum((y - tnp.average(y)) ** 2)
    score = tf.where(
        numerator == 0,
        tnp.float64(1),
        tf.where(
            denominator == 0,
            tnp.float64(0),
            1. - numerator / denominator
        )
    )
    return score


def prepend_col(matrix, value):
    # padding = [[p11, p12], [p21, p22]]
    # p11 - rows added before the existing rows (dimension 0)
    # p12 - rows added after the existing rows (dimension 0)
    # p21 - columns added before the existing columns (dimension 1)
    # p22 - columns added after the existing columns (dimension 1)
    padding = [[0,0],[1,0]] 
    return tf.pad(matrix, padding, constant_values=value)

# Ordinary Linear Regression - Testing My implementation
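A quick sanity check of the normal-equation approach used in `LinearRegression.fit_tf`, sketched in NumPy on synthetic data (the coefficients and noise level are hypothetical): the explicit inverse should agree with NumPy's reference least-squares solver on a well-conditioned problem.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-0.5, 0.5, size=(200, 2))  # synthetic stand-in data
y = 0.15 + X @ np.array([[0.45], [1.55]]) + rng.normal(0, 1, size=(200, 1))

Xb = np.hstack([np.ones((X.shape[0], 1)), X])     # same role as prepend_col(X, 1)
w_normal = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y    # normal equations, as in fit_tf
w_lstsq, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # reference least-squares solver
print(np.allclose(w_normal, w_lstsq))             # True
```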

In [49]:
lr = LinearRegression()

In [50]:
%time lr.fit(xtr, ytr)

CPU times: user 67.8 ms, sys: 295 µs, total: 68.1 ms
Wall time: 72.2 ms


<tf.Tensor: shape=(3, 1), dtype=float64, numpy=
array([[0.14667096],
       [0.44839178],
       [1.5551459 ]])>

In [51]:
print(lr.fit_tf.pretty_printed_concrete_signatures())

fit_tf(X, y)
  Args:
    X: float64 Tensor, shape=(200, 2)
    y: float64 Tensor, shape=(200, 1)
  Returns:
    float64 Tensor, shape=(3, 1)


In [52]:
lr.w.numpy()

array([[0.14667096],
       [0.44839178],
       [1.5551459 ]])

In [53]:
%time y_pred_lr = lr.predict(xte)

CPU times: user 43.1 ms, sys: 2.81 ms, total: 45.9 ms
Wall time: 47.5 ms


In [54]:
print(lr.predict_tf.pretty_printed_concrete_signatures())

predict_tf(X, w)
  Args:
    X: float64 Tensor, shape=(100, 2)
    w: float64 Tensor, shape=(3, 1)
  Returns:
    float64 Tensor, shape=(100, 1)


### The coefficient of determination (R-squared): 1 is perfect prediction

In [55]:
%time lr_score = lr.score(xte, yte)

CPU times: user 69.8 ms, sys: 1.98 ms, total: 71.8 ms
Wall time: 72.5 ms


In [56]:
print(lr.score_tf.pretty_printed_concrete_signatures())

score_tf(X, y, w)
  Args:
    X: float64 Tensor, shape=(100, 2)
    y: float64 Tensor, shape=(100, 1)
    w: float64 Tensor, shape=(3, 1)
  Returns:
    float64 Tensor, shape=()


In [57]:
lr_score.numpy()

0.0770038719316789

In [58]:
plot_3d_linear_regression(xte, yte, [lr], labels=['Ordinary LR'], figsize=(700, 600), opacity=.3, colorscales=['ice'])

# Ridge

In [59]:
ridge = RidgeRegression()

In [60]:
ridge_alphas = np.arange(0, 7.5, .1)
ridge.fit(xtr, ytr, alphas=ridge_alphas, folds=10)

tracing fit
tracing score
tracing predict


In [61]:
ridge.alpha.numpy()[0], ridge.alpha_scores.numpy().max()

(2.9000000000000004, 0.07823787906800092)

In [62]:
ridge_score = ridge.score(xte, yte)
ridge_score.numpy()

0.07520555709435717

In [63]:
ridge_error = ridge.error(xte, yte)
ridge_error.numpy()

1.438568355356634

In [64]:
plot_ridge_accuracy(ridge_alphas, ridge.alpha_scores, lr_score)

In [65]:
%timeit ridge.fit(xtr, ytr, alphas=ridge_alphas, folds=10)

1 loop, best of 5: 829 ms per loop


In [66]:
print(ridge.fit_tf.pretty_printed_concrete_signatures())
print(ridge.solve_w.pretty_printed_concrete_signatures())
print(ridge.score_tf.pretty_printed_concrete_signatures())
print(ridge.predict_tf.pretty_printed_concrete_signatures())
print(split_train_validation_tf.pretty_printed_concrete_signatures())

fit_tf(X, y, alphas, folds)
  Args:
    X: float64 Tensor, shape=<unknown>
    y: float64 Tensor, shape=<unknown>
    alphas: float64 Tensor, shape=<unknown>
    folds: int32 Tensor, shape=()
  Returns:
    (<1>, <2>, <3>)
      <1>: float64 Tensor, shape=<unknown>
      <2>: float64 Tensor, shape=(None, None)
      <3>: float64 Tensor, shape=<unknown>
solve_w(X, y, alpha)
  Args:
    X: float64 Tensor, shape=<unknown>
    y: float64 Tensor, shape=<unknown>
    alpha: float64 Tensor, shape=()
  Returns:
    float64 Tensor, shape=(None, None)
score_tf(X, y, w)
  Args:
    X: float64 Tensor, shape=<unknown>
    y: float64 Tensor, shape=<unknown>
    w: float64 Tensor, shape=<unknown>
  Returns:
    float64 Tensor, shape=()
predict_tf(X, w)
  Args:
    X: float64 Tensor, shape=<unknown>
    w: float64 Tensor, shape=<unknown>
  Returns:
    float64 Tensor, shape=<unknown>
tracing split_train_validation_tf.
split_train_validation_tf(x, y, fold, folds)
  Args:
    x: float64 Tensor, shape=<u

In [67]:
ridge_train = ridge.fit_tf.get_concrete_function(tf.constant(xtr), tf.constant(ytr), tf.constant(ridge_alphas), tf.constant(10))

for node in ridge_train.graph.as_graph_def().node:
    print(f'{node.input} -> {node.name}')

[] -> X
[] -> y
[] -> alphas
[] -> folds
['alphas'] -> Size
[] -> TensorArrayV2/element_shape
['TensorArrayV2/element_shape', 'Size'] -> TensorArrayV2
['alphas'] -> zeros_like
[] -> TensorArrayUnstack/TensorListFromTensor/element_shape
['zeros_like', 'TensorArrayUnstack/TensorListFromTensor/element_shape'] -> TensorArrayUnstack/TensorListFromTensor
[] -> range/start
[] -> range/delta
['range/start', 'folds', 'range/delta'] -> range
['folds', 'range/start'] -> sub
['sub', 'range/delta'] -> floordiv
['sub', 'range/delta'] -> mod
[] -> zeros_like_1
['mod', 'zeros_like_1'] -> NotEqual
['NotEqual'] -> Cast
['floordiv', 'Cast'] -> add
[] -> zeros_like_2
['add', 'zeros_like_2'] -> Maximum
[] -> while/maximum_iterations
[] -> while/loop_counter
['while/loop_counter', 'while/maximum_iterations', 'range/start', 'TensorArrayUnstack/TensorListFromTensor', 'folds', 'X', 'y', 'Size', 'alphas', 'range/delta'] -> while
[] -> TensorArrayV2Stack/TensorListStack/element_shape
['while:3', 'TensorArrayV2St

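### Standardizing (this can be wrong...)
- **For ordinary least squares, centering/scaling does not affect your statistical inference in regression models**

- **The estimates are adjusted appropriately and the 𝑝-values will be the same.**

- Ridge regression is not scale-invariant: the penalty depends on the units of each coefficient, and `solve_w` also penalizes the intercept, so centering or standardizing can change both the selected alpha and the fit (compare the alphas selected below with the 2.9 found on the raw data).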
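One likely source of error in the cells below is that `standardize(xte)` and `center(xte)` use the test set's own mean and standard deviation. A common alternative, sketched here on synthetic stand-in arrays, is to estimate the statistics on the training set only and reuse them for every split, so train and test share the same units:

```python
import numpy as np

rng = np.random.default_rng(2)
X_train = rng.uniform(-0.5, 0.5, size=(200, 2))  # synthetic stand-ins for xtr / xte
X_test = rng.uniform(-0.5, 0.5, size=(100, 2))

# Estimate the statistics on the training set only...
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
# ...and reuse them for every split.
Z_train = (X_train - mu) / sigma
Z_test = (X_test - mu) / sigma  # not exactly zero-mean, which is expected
```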

In [68]:
ridge_z = RidgeRegression()
alphas_z = np.arange(26, 52, .1)
ridge_z.fit(standardize(xtr), ytr, alphas=alphas_z)

tracing fit
tracing score
tracing predict


In [69]:
plot_ridge_accuracy(alphas_z, ridge_z.alpha_scores)

In [70]:
ridge_z.alpha

<tf.Tensor: shape=(1,), dtype=float64, numpy=array([33.5])>

In [71]:
ridge_z.score(standardize(xte), yte)

<tf.Tensor: shape=(), dtype=float64, numpy=0.07171920839096024>

In [72]:
ridge_z.error(standardize(xte), yte)

<tf.Tensor: shape=(), dtype=float64, numpy=1.4439915615176568>

### Centering (this can be wrong...)

In [73]:
ridge_c = RidgeRegression()
alphas_c = np.arange(0, 8, .1)
ridge_c.fit(center(xtr), ytr, alphas=alphas_c)

tracing fit
tracing score
tracing predict


In [74]:
plot_ridge_accuracy(alphas_c, ridge_c.alpha_scores)

In [75]:
ridge_c.alpha

<tf.Tensor: shape=(1,), dtype=float64, numpy=array([3.])>

In [76]:
ridge_c.score(center(xte), yte)

<tf.Tensor: shape=(), dtype=float64, numpy=0.06376230521634274>

In [77]:
ridge_c.error(center(xte), yte)

<tf.Tensor: shape=(), dtype=float64, numpy=1.4563689597616138>

In [78]:
plot_3d_linear_regression(
    xte, 
    yte, 
    [lr, ridge, ridge_c, ridge_z], 
    labels=['Ordinary', 'Ridge', 'Ridge Centered', 'Ridge Z-score'], 
    figsize=(800, 600), 
    opacity=.2, 
    colorscales=['sunset', 'ice', 'rainbow', 'greens'],
    plot_Xy=True
)

In [79]:
def build_ridge(X, y, alpha):
    ridge = RidgeRegression()
    ridge.fit(X, y, alphas=alpha)
    return ridge
alphas = np.arange(0, 400, 20)
ridge_lrs = [build_ridge(xtr, ytr, [alpha]) for alpha in alphas]
labels = [str(alpha) for alpha in alphas]

tracing fit
tracing score
tracing predict




tracing fit
tracing score
tracing predict




tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict
tracing fit
tracing score
tracing predict


In [80]:
plot_3d_linear_regression(xte, yte, ridge_lrs, labels=labels, figsize=(600, 600), plot_Xy=True)

### Solve comparison
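The two `algorithm` options in `solve_w` should give the same weights on a well-conditioned system; solving the linear system directly is generally preferred over forming an explicit inverse for numerical stability. A NumPy sketch on synthetic data with alpha = 1 (NumPy's `np.linalg.solve` uses LAPACK `gesv`, an LU factorization with partial pivoting, so it plays the role of the `'lu_factorization'` option here):

```python
import numpy as np

rng = np.random.default_rng(3)
Xb = np.hstack([np.ones((200, 1)), rng.uniform(-0.5, 0.5, size=(200, 2))])
y = rng.normal(size=(200, 1))
A = Xb.T @ Xb + 1.0 * np.eye(3)  # alpha = 1, as in the cells below

w_inv = np.linalg.inv(A) @ Xb.T @ y     # 'inverse_matrix'
w_solve = np.linalg.solve(A, Xb.T @ y)  # LU-based solve
print(np.allclose(w_inv, w_solve))      # True
print(np.linalg.cond(A))                # condition number of the regularized system
```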

In [81]:
ridge_im = RidgeRegression(algorithm='inverse_matrix')

In [82]:
%timeit ridge_im.fit(xtr, ytr, alphas=[1])

tracing fit
tracing score
tracing predict
The slowest run took 24.49 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 5: 19.2 ms per loop


In [83]:
ridge_im.w.numpy()

array([[0.14501507],
       [0.43323265],
       [1.4661977 ]])

In [84]:
%timeit ridge.fit(xtr, ytr, alphas=[1])

10 loops, best of 5: 20.5 ms per loop


In [85]:
ridge.w.numpy()

array([[0.14501507],
       [0.43323265],
       [1.4661977 ]])

# scikit-learn comparison

In [86]:
! pip show scikit-learn

Name: scikit-learn
Version: 0.22.2.post1
Summary: A set of python modules for machine learning and data mining
Home-page: http://scikit-learn.org
Author: None
Author-email: None
License: new BSD
Location: /usr/local/lib/python3.7/dist-packages
Requires: numpy, joblib, scipy
Required-by: yellowbrick, sklearn, sklearn-pandas, mlxtend, lightgbm, librosa, imbalanced-learn


In [87]:
from sklearn import linear_model
from sklearn.metrics import mean_squared_error as mean_squared_error_skl, r2_score as r2_score_skl

In [88]:
lr_skl = linear_model.LinearRegression()

In [89]:
%timeit lr_skl.fit(xtr, ytr)

The slowest run took 90.66 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 5: 264 µs per loop


In [90]:
predictions_skl = lr_skl.predict(xte)

In [91]:
%timeit lr_skl.predict(xte)

10000 loops, best of 5: 39.1 µs per loop


In [92]:
lr_skl.coef_, lr_skl.intercept_

(array([[0.44839178, 1.5551459 ]]), array([0.14667096]))

In [93]:
lr.w

<tf.Tensor: shape=(3, 1), dtype=float64, numpy=
array([[0.14667096],
       [0.44839178],
       [1.5551459 ]])>

In [94]:
mse_skl = mean_squared_error_skl(yte, predictions_skl)

In [95]:
mse_skl

1.4357709782337664

In [96]:
lr_skl_score = lr_skl.score(xte, yte)

In [97]:
lr_skl_score

0.0770038719316789

In [98]:
lr_skl_score - lr_score

<tf.Tensor: shape=(), dtype=float64, numpy=0.0>

In [99]:
predictions_lr = y_pred_lr.numpy()  # predictions of my LinearRegression on xte (cell In [53])
(predictions_skl == predictions_lr).sum(), (predictions_skl - predictions_lr).max()

# Ridge SKL
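Two details matter when comparing with `sklearn.linear_model.Ridge`. First, minimizing ½‖y − Xw‖² + ½λwᵀw has the same minimizer as scikit-learn's ‖y − Xw‖² + α‖w‖², so `alpha` corresponds to λ. Second, with `fit_intercept=True` scikit-learn does not penalize the intercept; it is equivalent to centering X and y, solving ridge for the slopes, and recovering the intercept. That is why the cells below use `fit_intercept=False` with `prepend_col` to mimic `solve_w`, which penalizes the intercept. A sketch of the unpenalized variant on synthetic data (the coefficients and noise are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.uniform(-0.5, 0.5, size=(200, 2))  # synthetic stand-in data
y = 0.15 + X @ np.array([0.45, 1.55]) + rng.normal(0, 1, size=200)
alpha = 1.0

# Intercept not penalized: center X and y, solve for the slopes, recover the intercept.
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc = X - x_mean
slopes = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(2), Xc.T @ (y - y_mean))
intercept = y_mean - x_mean @ slopes

skl = Ridge(alpha=alpha, fit_intercept=True).fit(X, y)
print(np.allclose(skl.coef_, slopes), np.isclose(skl.intercept_, intercept))  # True True
```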

In [None]:
# solver='cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.
ridge_skl = linear_model.Ridge(alpha=1, solver='cholesky', fit_intercept=False)

In [None]:
ridge_skl.fit(prepend_col(xtr, 1), ytr)

In [None]:
ridge_skl.coef_

In [None]:
ridge.w

In [None]:
predictions_ridge_skl = ridge_skl.predict(prepend_col(xte, 1))

In [None]:
mse_ridge_skl = mean_squared_error_skl(yte, predictions_ridge_skl)

In [None]:
mse_ridge_skl

In [None]:
ridge_skl_score = ridge_skl.score(prepend_col(xte, 1), yte)

In [None]:
ridge_skl_score

In [None]:
ridge_skl_score - ridge.score(xte, yte)  # compare at alpha = 1; ridge_score above used the cross-validated alpha

In [None]:
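predictions_ridge = ridge.predict(xte).numpy()  # `ridge` was last refit with alphas=[1] (cell In [84])
(predictions_ridge_skl == predictions_ridge).sum(), (predictions_ridge_skl - predictions_ridge).max()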