# Linear Regression


In [None]:
import math
from typing import List, Tuple

import numpy as np


Correlation tells us the strength of a linear relationship between two variables. But what if we want to predict the value of one variable given the value of another? For example, suppose we want to predict the price of a house given its size. We can do this using linear regression.

We begin by hypothesising the existence of a linear model:

$$
y_i = \beta x_i + \alpha
$$

where $y_i$ is the price of house $i$, $x_i$ is its size, $\beta$ is the slope of the line, and $\alpha$ is the intercept. Which parameters give the best-fit line? The least squares method answers this by minimising the **_sum of the squared errors_** (or residuals), which serves as our cost function, $S$:

$$
S = \sum_{i=1}^n (y_i - \hat{y_i})^2
$$

where $\hat{y_i}$ is the predicted value of $y_i$.

We can substitute our model for the predicted value of $y_i$:

$$
\begin{align}
S &= \sum_{i=1}^n (y_i - (\beta x_i + \alpha))^2 \\
&= \sum_{i=1}^n (y_i - \beta x_i - \alpha)^2
\end{align}
$$

To minimise our cost function $S$, we must find the values of $\alpha$ and $\beta$ at which both partial derivatives of $S$ are equal to 0. Because $S$ is a sum of squares, it is a convex function of $\alpha$ and $\beta$, so a point where both derivatives vanish is a minimum. Let’s start with the partial derivative with respect to $\alpha$.

$$
\frac{\partial S}{\partial \alpha} = \frac{\partial}{\partial \alpha} \sum_{i=1}^n (y_i - \beta x_i - \alpha)^2 = -2 \sum_{i=1}^n (y_i - \beta x_i - \alpha)
$$

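We set this derivative to zero and solve for $\alpha$, writing $\bar{x}$ and $\bar{y}$ for the means of the $x_i$ and $y_i$:

$$
\sum_{i=1}^n (y_i - \beta x_i - \alpha) = 0 \quad\Longrightarrow\quad \alpha = \bar{y} - \beta \bar{x}
$$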
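
The same procedure applies to $\beta$. As a sketch of the remaining steps, write $\bar{x}$ and $\bar{y}$ for the sample means and substitute $\alpha = \bar{y} - \beta \bar{x}$, which is the condition that follows from $\partial S / \partial \alpha = 0$:

$$
\begin{align}
\frac{\partial S}{\partial \beta} &= -2 \sum_{i=1}^n x_i (y_i - \beta x_i - \alpha) = 0 \\
0 &= \sum_{i=1}^n x_i \left( (y_i - \bar{y}) - \beta (x_i - \bar{x}) \right) \\
\beta &= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
\end{align}
$$

The last step uses the facts that $\sum_i (y_i - \bar{y}) = 0$ and $\sum_i (x_i - \bar{x}) = 0$, so replacing the leading $x_i$ with $x_i - \bar{x}$ changes neither sum. It also assumes the $x_i$ are not all equal; otherwise the denominator is zero and the slope is undefined.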


If we have determined $\alpha$ and $\beta$, we can make predictions like so:


In [None]:
def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha


## Deriving $\alpha$ and $\beta$


Any choice of $\alpha$ and $\beta$ gives us a predicted output for each input $x_i$. Since we know the actual output $y_i$, we can compute the error for each pair:


In [None]:
def error(alpha: float, beta: float, x_i: float, y_i: float) -> float:
    """
    The error from predicting beta * x_i + alpha when the actual value is y_i
    """
    return predict(alpha, beta, x_i) - y_i

We’d like to know the total error over the entire dataset. But we don’t want to just add the errors — if the prediction for $x_1$ is too high and the prediction for $x_2$ is too low, the errors may just cancel out. So instead we add up the squared errors:


In [None]:
def sum_of_sqerrors(alpha: float, beta: float, x: List[float], y: List[float]) -> float:
    return sum(error(alpha, beta, x_i, y_i) ** 2 for x_i, y_i in zip(x, y))
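
To see why squaring matters, here is a small self-contained sketch. The values of `x`, `y`, `alpha`, and `beta` are made up for illustration and do not come from a real dataset. The two raw errors cancel exactly, while the squared errors do not:

```python
from typing import List


def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha


def error(alpha: float, beta: float, x_i: float, y_i: float) -> float:
    return predict(alpha, beta, x_i) - y_i


# Hypothetical data: the line y = x over-predicts the first point by 1
# and under-predicts the second point by 1.
x: List[float] = [1.0, 2.0]
y: List[float] = [0.0, 3.0]
alpha, beta = 0.0, 1.0

errors = [error(alpha, beta, x_i, y_i) for x_i, y_i in zip(x, y)]
total_error = sum(errors)
total_squared_error = sum(e ** 2 for e in errors)

print(errors)               # [1.0, -1.0]
print(total_error)          # 0.0
print(total_squared_error)  # 2.0
```

A total error of zero would suggest a perfect fit even though both predictions are off by one, which is why the squared version is used as the cost.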


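The least squares solution is to choose the $\alpha$ and $\beta$ that make `sum_of_sqerrors` as small as possible. Using calculus (or tedious algebra), the error-minimising $\alpha$ and $\beta$ can be written in terms of the mean, standard deviation, and correlation of the data, so we first define those helpers: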


In [None]:
def dot(v: List[float], w: List[float]) -> float:
    """Computes v_1 * w_1 + ... + v_n * w_n"""
    assert len(v) == len(w), "vectors must be same length"

    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def sum_of_squares(v: List[float]) -> float:
    """Returns v_1 * v_1 + ... + v_n * v_n"""
    return dot(v, v)


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def de_mean(xs: List[float]) -> List[float]:
    """Translate xs by subtracting its mean (so the result has mean 0)"""
    x_bar = mean(xs)
    return [x - x_bar for x in xs]


def covariance(xs: List[float], ys: List[float]) -> float:
    assert len(xs) == len(ys), "xs and ys must have same number of elements"

    return dot(de_mean(xs), de_mean(ys)) / (len(xs) - 1)


def variance(xs: List[float]) -> float:
    """Almost the average squared deviation from the mean"""
    assert len(xs) >= 2, "variance requires at least two elements"

    deviations = de_mean(xs)
    return sum_of_squares(deviations) / (len(xs) - 1)


def standard_deviation(xs: List[float]) -> float:
    """The standard deviation is the square root of the variance"""
    return math.sqrt(variance(xs))


def correlation(xs: List[float], ys: List[float]) -> float:
    """Measures how much xs and ys vary in tandem about their means"""
    stdev_x = standard_deviation(xs)
    stdev_y = standard_deviation(ys)
    if stdev_x > 0 and stdev_y > 0:
        return covariance(xs, ys) / stdev_x / stdev_y
    else:
        return 0  # if no variation, correlation is zero


In [None]:
def least_squares_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """
    Given two vectors x and y, find the least-squares values of alpha and beta
    """
    beta = correlation(x, y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha, beta
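
One way to build confidence in the closed-form solution is to check that nudging either parameter away from the fitted values makes the sum of squared errors larger. The sketch below is self-contained: `fit` and `sse` are hypothetical helper names that mirror `least_squares_fit` and `sum_of_sqerrors`, rewritten with NumPy for brevity, and the five data points are the small example dataset used in this notebook.

```python
import numpy as np

# The small example dataset used in this notebook.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 7.0, 9.0]


def fit(x, y):
    """Least-squares alpha and beta via correlation * stdev(y) / stdev(x)."""
    r = np.corrcoef(x, y)[0, 1]
    beta = r * np.std(y, ddof=1) / np.std(x, ddof=1)
    alpha = np.mean(y) - beta * np.mean(x)
    return alpha, beta


def sse(alpha, beta):
    """Sum of squared errors of the line beta * x + alpha on the data."""
    return sum((beta * x_i + alpha - y_i) ** 2 for x_i, y_i in zip(x, y))


alpha, beta = fit(x, y)
best = sse(alpha, beta)

# Nudge each parameter up and down; every nudge should increase the error.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
nudged_errors = [sse(alpha + da, beta + db) for da, db in nudges]

print(all(e > best for e in nudged_errors))  # True
```

This is only a spot check in four directions, not a proof, but it is consistent with the fitted point being the minimum of the cost function.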

In [None]:
X = np.array([1, 2, 3, 4, 5])  # Independent variable (feature)
y = np.array([2, 4, 5, 7, 9])  # Dependent variable (target)

In [None]:
least_squares_fit(X, y)

(0.3000000000000007, 1.6999999999999997)
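
As an independent cross-check, the same line can be fitted with NumPy's `np.polyfit`, which performs a least-squares polynomial fit. This is a sketch for comparison only and is not part of the derivation; the variable names `slope` and `intercept` are chosen here for readability.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 7, 9])

# np.polyfit returns coefficients from the highest degree to the lowest,
# so for deg=1 the order is (slope, intercept).
slope, intercept = np.polyfit(X, y, deg=1)

# Compare against the closed-form result up to floating-point rounding.
print(np.isclose(slope, 1.7), np.isclose(intercept, 0.3))  # True True
```

Note that `least_squares_fit` returns `(alpha, beta)`, that is, intercept first, whereas `np.polyfit` returns the slope first.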

In [None]:
X_mean = np.mean(X)
y_mean = np.mean(y)

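In linear regression, the slope $\beta$ equals the correlation between $x$ and $y$ multiplied by the standard deviation of $y$ and divided by the standard deviation of $x$. This is equivalent to the covariance of $x$ and $y$ divided by the variance of $x$, which is the form computed directly here (with `m` playing the role of $\beta$ and `b` the role of $\alpha$):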


In [None]:
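# Slope: sum of cross-deviations divided by sum of squared x-deviations
numerator = np.sum((X - X_mean) * (y - y_mean))
denominator = np.sum((X - X_mean) ** 2)
m = numerator / denominator  # corresponds to beta

# Intercept: the fitted line passes through (X_mean, y_mean)
b = y_mean - m * X_mean  # corresponds to alpha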
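
The correlation form of the slope and the sums-of-deviations form are algebraically the same: the correlation is the covariance divided by both standard deviations, so multiplying by $s_y / s_x$ leaves the covariance over the variance of $x$, and the $n - 1$ factors cancel. A short numerical sketch of that equivalence, using the same data (the variable names are illustrative):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 7, 9])

# Form 1: correlation times the ratio of sample standard deviations.
r = np.corrcoef(X, y)[0, 1]
slope_from_correlation = r * np.std(y, ddof=1) / np.std(X, ddof=1)

# Form 2: sum of cross-deviations over sum of squared x-deviations.
x_dev = X - X.mean()
y_dev = y - y.mean()
slope_from_sums = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)

print(np.isclose(slope_from_correlation, slope_from_sums))  # True
```

The two results can differ in the last few digits because of floating-point rounding, which is why the comparison uses `np.isclose` rather than `==`.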

In [None]:
y_pred = m * X + b


In [None]:
print(f"Slope (m): {m}")
print(f"Y-intercept (b): {b}")
print(f"Predicted values (y_pred): {y_pred}")


Slope (m): 1.7
Y-intercept (b): 0.3000000000000007
Predicted values (y_pred): [2.  3.7 5.4 7.1 8.8]
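
Setting the two partial derivatives of $S$ to zero implies two properties of the fitted line: the residuals $y_i - \hat{y_i}$ sum to zero, and they are orthogonal to the inputs $x_i$. The following sketch checks both numerically on the same data; `residual_sum` and `residual_dot_x` are names introduced here for illustration.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 7, 9])

# Closed-form least-squares slope and intercept.
m = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
b = y.mean() - m * X.mean()

# Residuals: actual values minus predicted values.
residuals = y - (m * X + b)

# Both sums should vanish up to floating-point rounding.
residual_sum = np.sum(residuals)
residual_dot_x = np.sum(X * residuals)

print(np.isclose(residual_sum, 0.0), np.isclose(residual_dot_x, 0.0))  # True True
```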
