# Linear Regression - Notes 

## 1. Define Z-Score function -- Scale the data

$$z = \frac{X - \mu}{\sigma}$$

Where:  

- z = standardized score, scaled value  

- X = A single observation of independent variable X  

- $\mu$ = mean of all X values  

- $\sigma$ = standard deviation of all X values  


In [1]:
from pylab import *
import numpy as np
%matplotlib inline

In [2]:
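def scaleData(data):
    # Standardize each column: subtract its mean, then divide by its standard deviation
    mean = data.mean(axis=0)
    stdDev = data.std(axis=0)
    scaled = (data - mean) / stdDev
    return scaled, mean, stdDev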

In [3]:
# Data: Small example dataset (house price, # bedrooms, # square feet)
# This is the sample data which will be used to train the model.

trainData = np.array([[100000,2,1600],
                     [200000,4,2500],
                     [250000,4,3000],
                     [150000,3,2000]])

In [4]:
# First, convert every column of the data into z-scores. This is called scaling the data.
trainData, mean, stdDev = scaleData(trainData)
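As a quick sanity check on the scaling step, every scaled column should end up with a mean of about 0 and a standard deviation of about 1, and the original values should be recoverable from the stored mean and standard deviation. The block below is a minimal, self-contained sketch of that check; `scale_data`, `raw`, `scaled` and `restored` are illustrative names, not part of the notebook.

```python
import numpy as np

def scale_data(data):
    # Column-wise z-score: (value - column mean) / column standard deviation
    mean = data.mean(axis=0)
    std_dev = data.std(axis=0)
    return (data - mean) / std_dev, mean, std_dev

# Same small dataset as above: house price, # bedrooms, # square feet
raw = np.array([[100000, 2, 1600],
                [200000, 4, 2500],
                [250000, 4, 3000],
                [150000, 3, 2000]], dtype=float)

scaled, mean, std_dev = scale_data(raw)

# After scaling, each column has mean ~0 and standard deviation ~1
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

# Undo the scaling to get back the original units
restored = scaled * std_dev + mean
```

Keeping `mean` and `std_dev` around matters because any prediction made on scaled data is itself a z-score and has to be converted back the same way.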

## 2. Split the dataset -- y & X

$$\text{Linear Regression:}$$
$$f(x) = \theta_{0} + \theta_{1} X_{1} + \theta_{2} X_{2} + \dots + \theta_{n} X_{n}$$

y = f(x) = the dependent variable that we intend to predict, in this case 'house price'  

X = the array of independent variables that we will train on  

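$\theta_{0}$ = the constant (intercept) term of the linear regression formula. To include it in the matrix form, a column $X_{0}$ must be added to our dataset, and by convention its values are all set to '1', so that $\theta_{0}$ is multiplied by 1 for every sample.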
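Once a column of ones is placed in front of the features, the whole formula reduces to a single matrix product, $\hat{y} = X\theta$. The sketch below illustrates this with made-up numbers; `features`, `X_demo` and `theta_demo` are hypothetical values chosen only for the example and are not fitted parameters.

```python
import numpy as np

# Two samples with two (already scaled) features each; values are made up
features = np.array([[0.5, -1.0],
                     [1.5,  2.0]])

# Prepend the column of ones that pairs with theta_0
X_demo = np.hstack([np.ones((features.shape[0], 1)), features])

# Arbitrary parameters [theta_0, theta_1, theta_2] as a column vector
theta_demo = np.array([[0.1], [2.0], [3.0]])

# One matrix product gives the prediction for every sample at once
y_hat = X_demo.dot(theta_demo)

# The first prediction, written out term by term as in the formula
first_by_hand = 0.1 + 2.0 * 0.5 + 3.0 * (-1.0)
```

The first row of `y_hat` matches `first_by_hand`, which is the formula evaluated by hand for the first sample.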

In [5]:
# Slice the first column (house prices), and store appropriately as y
y = np.matrix(trainData[:,0])

# By default, y will be a row vector; transpose it so that it is a column vector instead
y = y.T

# Slice the rest of the columns (# bedrooms, # square feet), and store appropriately as X
X = np.matrix(trainData[:,1:])

In [6]:
# m = # of training samples in X ; Store this value 
m = y.size

# Now prepend a new column of m 1's to X. This column is multiplied by the constant theta_0 described above.
constTheta = np.ones(shape=(m, 1))
X = np.append(constTheta,X,1)

In [8]:
# Examine X for the added column of ones. We should see a matrix of (1, z-score # bedrooms, z-score # sq feet)
X

matrix([[ 1.        , -1.50755672, -1.28280871],
        [ 1.        ,  0.90453403,  0.4276029 ],
        [ 1.        ,  0.90453403,  1.37783158],
        [ 1.        , -0.30151134, -0.52262577]])

## 3. Cost Function

The cost function measures how far the predictions made with the current $\theta$s are from the actual values. Our goal is to minimize this function: the closer the predicted values are to the actual values, the smaller 'J' becomes and the more accurate the model is on the training data.

$$J = \frac{1}{2m} \sum_{i=1}^{m}( \hat{y}^{(i)} - y^{(i)})^2 $$ 

Where:

- J = cost function value, aka squared error function

- y = actual value in our training set (house price)

- $\hat{y}$ = the predicted value of y, based on the current $\theta$s

- m = number of training samples


In [11]:
# Define a function to calculate the cost function (J)
# Note: the sum over samples is vectorized as a matrix product instead of a loop
def compute_cost(X, y, theta):
    m = y.size
    y_hat = X.dot(theta)
    # 1/(2m) needs explicit parentheses: 1.0/2*m would evaluate to 0.5*m
    J = (1.0/(2*m)) * (y_hat - y).T.dot(y_hat - y)
    return J
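A minimal usage sketch of the cost, assuming J is 1/(2m) times the sum of squared errors exactly as in the formula above. The function is restated with plain NumPy arrays so the block runs on its own; `X_demo`, `y_demo`, `theta_zero` and `theta_ls` are illustrative names, and the `np.linalg.lstsq` solve is used only as a reference point for comparison, not as the minimization method of these notes.

```python
import numpy as np

def compute_cost(X, y, theta):
    # J = 1/(2m) * sum of squared differences between prediction and target
    m = y.size
    residual = X.dot(theta) - y
    return (residual.T.dot(residual) / (2.0 * m)).item()

# Same dataset as above, scaled column-wise to z-scores
raw = np.array([[100000, 2, 1600],
                [200000, 4, 2500],
                [250000, 4, 3000],
                [150000, 3, 2000]], dtype=float)
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

y_demo = scaled[:, [0]]                                  # column vector of prices
X_demo = np.hstack([np.ones((4, 1)), scaled[:, 1:]])     # ones + two features

# With all thetas at zero every prediction is 0, so J = 1/(2m) * sum(y^2).
# For a standardized y that sum equals m, which makes the cost 0.5.
theta_zero = np.zeros((3, 1))
cost_zero = compute_cost(X_demo, y_demo, theta_zero)

# Reference: a direct least-squares solve should give a lower cost
theta_ls, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
cost_ls = compute_cost(X_demo, y_demo, theta_ls)
```

A starting cost of 0.5 at $\theta = 0$ is a handy check that the 1/(2m) factor is applied correctly when y has been standardized.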

## 4. Gradient Descent

This step minimizes the cost function 'J' defined above. We use a specific algorithm for this minimization known as 'batch gradient descent'.

The following update is repeated until convergence:

$$\theta = \theta-\alpha\frac{\partial}{\partial\theta}J(\theta)$$

Where:

- 