This notebook is based on Andrew Ng's Machine Learning course on Coursera.

## 1. Linear regression with one variable

Import packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Import the dataset. Note: place 'ex1data1.txt' in the same directory as this Jupyter notebook, or set `path` below to the location of your data file.


In [None]:
path = 'ex1data1.txt'  # specify your data file location
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
data.head()  # data overview

In [None]:
data.describe()

Scatter plot of the data

In [None]:
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12,8))
plt.show()

Now let's use gradient descent to fit the linear regression model by minimizing the cost function $J(\theta)$.

First, let's build the cost function with parameter $\theta$:
$$J\left( \theta  \right)=\frac{1}{2m}\sum\limits_{i=1}^{m}{{{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}^{2}}}$$
where $$h_{\theta}\left( x \right)=\theta^{T}x=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+...+\theta_{n}x_{n}$$ and $x_{0}=1$.
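
For reference, one possible vectorized sketch of this cost is shown here. The name `compute_cost_sketch` and the toy data are illustrative assumptions and not part of the course material. The sketch assumes `theta` is a 1 x n row and that `X` already contains the column of ones.

```python
import numpy as np

def compute_cost_sketch(X, y, theta):
    """Return J(theta) = 1/(2m) * sum((X @ theta.T - y)^2) as a float."""
    # Convert to plain ndarrays so the function accepts np.matrix or lists.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)          # m x 1
    theta = np.asarray(theta, dtype=float).reshape(1, -1)  # 1 x n
    residuals = X @ theta.T - y                            # m x 1
    return float(np.sum(residuals ** 2) / (2 * len(X)))

# Hypothetical toy data: y = x exactly, with a leading column of ones.
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([[1.0], [2.0], [3.0]])

cost_at_zero = compute_cost_sketch(X_toy, y_toy, np.zeros((1, 2)))
cost_at_fit = compute_cost_sketch(X_toy, y_toy, np.array([[0.0, 1.0]]))
```

With the perfect parameters the residuals vanish and the cost is zero, which makes this a convenient sanity check before running on the real data.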

In [None]:
def computeCost(X, y, theta):
    # your code here (approx. 2 lines)
    # theta: 1 x 2 matrix
    # X: m x 2 matrix, where m is the number of data points
    # y: m x 1 matrix
    # returns: the scalar cost J(theta)


    return None

Add a column of ones so that the intercept term $\theta_0$ is handled by the matrix product.

In [None]:
data.insert(0, 'Ones', 1)

Preprocess the data to get training data and target variable

In [None]:
# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:,0:cols-1]
y = data.iloc[:,cols-1:cols]

In [None]:
X.head()

In [None]:
y.head()

Convert X and y to matrices and initialize theta.

In [None]:
X = np.matrix(X.values)
y = np.matrix(y.values)
# your code here (approx. 1 line)
# theta is a 1 x 2 matrix
theta = None

Make sure that the shapes are correct

In [None]:
X.shape, theta.shape, y.shape

Test the cost function with theta initialized to all zeros.

In [None]:
computeCost(X, y, theta)

## 2. Batch gradient descent
$${{\theta }_{j}}:={{\theta }_{j}}-\alpha \frac{\partial }{\partial {{\theta }_{j}}}J\left( \theta  \right)$$
where
$$\frac{\partial J}{\partial \theta_{j}} = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_{j}^{(i)}$$
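
One way to realize this update rule is the vectorized sketch below. It updates every $\theta_j$ at once through a matrix product instead of looping over $j$ as the exercise template does, so treat it as an alternative formulation rather than the intended solution. The name `gradient_descent_sketch`, the toy data, and the chosen `alpha` and `iters` are illustrative assumptions.

```python
import numpy as np

def gradient_descent_sketch(X, y, theta, alpha, iters):
    """Batch gradient descent; returns (theta as 1 x n row, cost per iteration)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)                 # m x 1
    theta = np.asarray(theta, dtype=float).reshape(1, -1).copy()  # 1 x n
    m = len(X)
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta.T - y           # m x 1, h(x) - y for every sample
        gradient = (error.T @ X) / m      # 1 x n, dJ/dtheta_j for every j
        theta = theta - alpha * gradient  # simultaneous update of all theta_j
        cost[i] = np.sum((X @ theta.T - y) ** 2) / (2 * m)
    return theta, cost

# Hypothetical toy data: y = 2 * x exactly, with a leading column of ones.
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([[2.0], [4.0], [6.0]])

theta_fit, cost_history = gradient_descent_sketch(
    X_toy, y_toy, np.zeros((1, 2)), alpha=0.1, iters=5000
)
```

On this toy problem the fitted row should approach an intercept of 0 and a slope of 2, and the recorded cost should decrease across iterations.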


In [None]:
def gradientDescent(X, y, theta, alpha, iters):
    parameters = int(theta.shape[1])
    cost = np.zeros(iters)

    for i in range(iters):
        # your code here (approx. 1 line)


        for j in range(parameters):
            # your code here (approx. 2 lines)


        # your code here (approx. 1 line)


    return theta, cost

Initialize the learning rate 'alpha' and the number of iterations 'iters'

In [None]:
alpha = 0.01
iters = 1000

Now let's use gradient descent to estimate the parameter $\theta$ based on the training data.


In [None]:
g, cost = gradientDescent(X, y, theta, alpha, iters)
g

Now we can calculate the cost based on the estimated $\hat{\theta}$

In [None]:
computeCost(X, y, g)

Let's plot the result and see how well the model fits the data

In [None]:
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)

fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()

Let's plot and see the cost of each iteration during the training process

In [None]:
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()

## 3. Linear regression with multiple variables

In [None]:
path =  'ex1data2.txt'
data2 = pd.read_csv(path, header=None, names=['Size', 'Bedrooms', 'Price'])
data2.head()

Preprocessing: feature normalization

In [None]:
data2 = (data2 - data2.mean()) / data2.std()
data2.head()

Let's repeat the steps from parts 1 and 2 and train a new linear regression model on the new dataset

In [None]:
# add ones column
data2.insert(0, 'Ones', 1)

# set X (training data) and y (target variable)
cols = data2.shape[1]
X2 = data2.iloc[:,0:cols-1]
y2 = data2.iloc[:,cols-1:cols]

# convert to matrices and initialize theta
X2 = np.matrix(X2.values)
y2 = np.matrix(y2.values)
theta2 = np.matrix(np.array([0.0,0.0,0.0]))

# perform linear regression on the data set
g2, cost2 = gradientDescent(X2, y2, theta2, alpha, iters)

# get the cost (error) of the model
computeCost(X2, y2, g2)

Let's plot and see how the cost changes during the gradient descent training process

In [None]:
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters), cost2, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()

## 4. Normal equation

Besides gradient descent, we can also calculate $\theta$ directly by solving the equations $\frac{\partial }{\partial {{\theta }_{j}}}J\left( \theta \right)=0$ for every $j$.

Let's assume our feature matrix is X (including $x_{0}=1$) and our target variable vector is y. Then $\theta ={{\left( {{X}^{T}}X \right)}^{-1}}{{X}^{T}}y$

Given that the time complexity of computing the inverse is $O(n^3)$, where $n$ is the number of features, the normal equation is not computationally efficient when the number of features is large, especially when $n>10000$. Also, the normal equation can only be applied to the linear regression model.
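
A minimal sketch of the closed-form solution follows. Instead of forming the inverse explicitly, it solves the linear system $X^{T}X\,\theta = X^{T}y$ with `np.linalg.solve`, which is a swapped-in technique that gives the same $\theta$ when $X^{T}X$ is invertible. The name `normal_eqn_sketch` and the toy data are illustrative assumptions.

```python
import numpy as np

def normal_eqn_sketch(X, y):
    """Solve (X^T X) theta = X^T y and return theta as an n x 1 column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)  # m x 1
    # Raises numpy.linalg.LinAlgError if X^T X is singular.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data: y = 2 * x exactly, with a leading column of ones.
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([[2.0], [4.0], [6.0]])

theta_closed_form = normal_eqn_sketch(X_toy, y_toy)
```

Note that this sketch returns $\theta$ as an n x 1 column, whereas the gradient descent code in this notebook keeps it as a 1 x n row, so transpose one of them before comparing.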


In [None]:
# Normal equation
def normalEqn(X, y):
    # your code here  (appro ~ 1 lines)

    return theta

In [None]:
final_theta2 = normalEqn(X, y)
final_theta2

In [None]:
# For comparison, the result of gradient descent is: matrix([[-3.24140214,  1.1272942 ]])