# Chapter 14: Simple Linear Regression



Goal: given a set of features $X$ and outputs $Y$, we want to find parameters $\alpha$ and $\beta$ such that $y_i = \beta x_i + \alpha + \varepsilon_i$, where $\varepsilon_i$ is an error term. We will find these parameters by minimizing the sum of squared errors, defined below:

$SQE = \displaystyle \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

where $\hat{y}_i = \beta x_i + \alpha$ is the predicted value.

We will use gradient descent to find these parameters. For simplicity, we will write $\theta = [\alpha, \beta]$.
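Gradient descent needs the partial derivatives of the squared error with respect to $\alpha$ and $\beta$. As a short worked derivation for a single data point (the symbol $e_i$ for the residual is introduced here only for illustration), applying the chain rule gives the two expressions that `squared_error_gradient` returns:

```latex
e_i = y_i - (\beta x_i + \alpha)

\frac{\partial}{\partial \alpha} e_i^2 = 2 e_i \cdot \frac{\partial e_i}{\partial \alpha} = -2 e_i

\frac{\partial}{\partial \beta} e_i^2 = 2 e_i \cdot \frac{\partial e_i}{\partial \beta} = -2 e_i x_i
```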

In [1]:
# Import NotebookLoader
%run -i AddNBL.py
import Chapter8 as Ch8

def predict(alpha, beta, x_i):
    return beta * x_i + alpha

def error(alpha, beta, x_i, y_i):
    """ error from predicting beta * x_i + alpha when actual is y_i """
    return y_i - predict(alpha, beta, x_i)

def sum_of_squared_error(alpha, beta, x, y):
    return sum(error(alpha, beta, x_i, y_i) ** 2 for x_i, y_i in zip(x, y))

def squared_error(x_i, y_i, theta):
    alpha, beta = theta
    return error(alpha, beta, x_i, y_i) ** 2

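def squared_error_gradient(x_i, y_i, theta):
    """ gradient of the squared error at (x_i, y_i) with respect to [alpha, beta] """
    alpha, beta = theta
    residual = error(alpha, beta, x_i, y_i)  # compute once, reuse for both partials
    return [-2 * residual,       # alpha partial derivative
            -2 * residual * x_i] # beta partial derivative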

importing Jupyter notebook from Chapter8.ipynb
importing Jupyter notebook from Chapter4.ipynb
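One way to gain confidence in a hand-derived gradient is to compare it against a finite-difference approximation. The sketch below is self-contained, so it restates the per-point functions from the cell above; `numerical_gradient` and its step `h` are hypothetical helpers introduced for this check, not part of the notebook or of Chapter 8.

```python
def predict(alpha, beta, x_i):
    return beta * x_i + alpha

def squared_error(x_i, y_i, theta):
    alpha, beta = theta
    return (y_i - predict(alpha, beta, x_i)) ** 2

def squared_error_gradient(x_i, y_i, theta):
    alpha, beta = theta
    residual = y_i - predict(alpha, beta, x_i)
    return [-2 * residual, -2 * residual * x_i]

def numerical_gradient(x_i, y_i, theta, h=1e-6):
    """Central-difference estimate of the gradient (illustrative helper)."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h      # nudge the j-th parameter up
        down[j] -= h    # and down
        slope = (squared_error(x_i, y_i, up) - squared_error(x_i, y_i, down)) / (2 * h)
        grad.append(slope)
    return grad

# Compare both gradients at x_i = 2, y_i = 2, theta = [1, 2]
analytic = squared_error_gradient(2, 2, [1, 2])
numeric = numerical_gradient(2, 2, [1, 2])
print(analytic, numeric)
```

At this point the residual is $2 - 5 = -3$, so the analytic gradient is `[6, 12]`, and the numerical estimate should agree with it up to floating-point rounding.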


### Run on some sample data

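Generate noise-free data according to $y = 2x + 4$ (so the true parameters are $\alpha = 4$ and $\beta = 2$) and check that gradient descent recovers them.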

In [2]:
squared_error(2, 2, [1, 2])

9

In [3]:
if __name__ == "__main__":
    import random
    random.seed(0)
    truth_alpha, truth_beta = [4, 2]
    
    x_truth = [10*random.random() for _ in range(10)]
    y_truth = [predict(truth_alpha, truth_beta, x_i) for x_i in x_truth]
    
    
    # choose random value to start
    theta = [random.random(), random.random()]
    
    alpha, beta = Ch8.minimize_stochastic(squared_error, 
                                          squared_error_gradient,
                                          x_truth,
                                          y_truth,
                                          theta, 
                                          .001
                                          )
    print("Truth alpha is {0}, solved is {1}".format(truth_alpha, alpha) +
          "\nTruth beta is {0}, solved is {1}".format(truth_beta, beta))

Truth alpha is 4, solved is 3.9999999999997886
Truth beta is 2, solved is 2.000000000000036
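`Ch8.minimize_stochastic` is defined in another notebook and is not shown here. To illustrate the general idea, here is a minimal sketch of plain stochastic gradient descent with a fixed step size. The function `sgd_fit`, its parameters, and the evenly spaced sample data are assumptions made for this example; the Chapter 8 routine may differ, for instance in how it adjusts the step size or decides when to stop.

```python
import random

def squared_error_gradient(x_i, y_i, theta):
    alpha, beta = theta
    residual = y_i - (beta * x_i + alpha)
    return [-2 * residual, -2 * residual * x_i]

def sgd_fit(xs, ys, theta_0, step_size=0.001, epochs=5000, seed=0):
    """Hypothetical stand-in for a stochastic minimizer: one update per data point."""
    rng = random.Random(seed)
    theta = list(theta_0)
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # visit the points in a fresh random order each epoch
        for x_i, y_i in data:
            grad = squared_error_gradient(x_i, y_i, theta)
            # step against the gradient of this single point's squared error
            theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta

# Noise-free data from y = 2x + 4 on evenly spaced x values (assumed for this sketch)
xs = [float(i) for i in range(10)]
ys = [2 * x + 4 for x in xs]

alpha_hat, beta_hat = sgd_fit(xs, ys, [0.0, 0.0])
print(alpha_hat, beta_hat)
```

Because the data contain no noise, every per-point gradient vanishes at the true parameters, so a fixed step size is enough for the estimates to settle close to 4 and 2. With noisy data, a shrinking step size is the more usual choice.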
