# Introduction
This is a summary (or perhaps a cheat sheet) of week 2.

**We will start by importing some libraries.**

In [1]:
# used for manipulating directory paths
import os

# Scientific and vector computation for python
import numpy as np

# Plotting library
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D  # needed to plot 3-D surfaces

# Linear regression with multiple variables

## Loading data with multiple features

In [8]:
data = np.loadtxt(os.path.join('Data', 'ex1data2.txt'), delimiter=',')

We then put the features in `X` (in this case, the size of the house and the number of bedrooms).

In [9]:
X = data[:, :2]

Then we put the expected outcome in `y` (in this case, the price of the house).

In [10]:
y = data[:, 2]

Now let's see what the data looks like.

In [11]:
# m represents the number of training examples 
m = y.size

# print out some data points
print('{:>8s}{:>8s}{:>10s}'.format('X[:,0]', 'X[:, 1]', 'y'))
print('-'*26)
for i in range(10):
    print('{:8.0f}{:8.0f}{:10.0f}'.format(X[i, 0], X[i, 1], y[i]))

  X[:,0] X[:, 1]         y
--------------------------
    2104       3    399900
    1600       3    329900
    2400       3    369000
    1416       2    232000
    3000       4    539900
    1985       4    299900
    1534       3    314900
    1427       3    198999
    1380       3    212000
    1494       3    242500


## Feature normalization

### Purpose
Normalizing the features puts them on a similar scale, which helps gradient descent converge faster on the cost function J.

### Normalizing features
The function that normalizes the features takes as input:
- `X`: the features dataset

and returns as output:

- `X_norm`: the normalized version of the features
- `mu`: an array holding the mean of each feature
- `sigma`: an array holding the standard deviation of each feature

In [19]:
def featureNormalize(X):
    # start by initializing the output arrays
    X_norm = X.copy()
    mu = np.zeros(X.shape[1])
    sigma = np.zeros(X.shape[1])

    # for each feature, compute the mean and standard deviation,
    # then normalize that feature
    for curr_feat in range(X.shape[1]):
        mu[curr_feat] = np.mean(X[:, curr_feat])
        sigma[curr_feat] = np.std(X[:, curr_feat])
        X_norm[:, curr_feat] = (X_norm[:, curr_feat] - mu[curr_feat]) / sigma[curr_feat]

    return X_norm, mu, sigma

## Outcome

In [20]:
X_norm, mu, sigma = featureNormalize(X)

print('Computed mean:', mu)
print('Computed standard deviation:', sigma)

Computed mean: [2000.68085106    3.17021277]
Computed standard deviation: [7.86202619e+02 7.52842809e-01]
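One way to sanity-check a normalization like this is to confirm that every normalized column ends up with mean 0 and standard deviation 1. The sketch below is only illustrative: it uses the ten rows printed earlier rather than the full dataset (so its mean and standard deviation differ from the values above), and `normalize_features` is a hypothetical vectorized stand-in, not the notebook's `featureNormalize`.

```python
import numpy as np

# The ten rows printed earlier: house size and number of bedrooms.
X_sample = np.array([
    [2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4],
    [1985, 4], [1534, 3], [1427, 3], [1380, 3], [1494, 3],
], dtype=float)


def normalize_features(X):
    """Hypothetical vectorized version: one mean and one std per column."""
    mu = X.mean(axis=0)     # per-feature mean
    sigma = X.std(axis=0)   # per-feature standard deviation (ddof=0, like np.std)
    return (X - mu) / sigma, mu, sigma


X_check, mu_check, sigma_check = normalize_features(X_sample)

# After normalization each column should have mean ~0 and std ~1.
print('column means close to 0:', np.allclose(X_check.mean(axis=0), 0.0))
print('column stds close to 1: ', np.allclose(X_check.std(axis=0), 1.0))
```

Broadcasting subtracts `mu` and divides by `sigma` column by column, so no explicit loop over features is needed.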


Now that `featureNormalize` has been tested, we add the intercept term to `X_norm`. More explicitly, we prepend a column of ones so that multiplying `X` by `theta` also includes the `theta_0` term.

In [21]:
X = np.concatenate([np.ones((m, 1)), X_norm], axis=1)
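To see why the column of ones matters, here is a small sketch with made-up numbers. The names `X_norm_demo` and `theta_demo` are hypothetical and are not part of the exercise; the point is only that the matrix product reproduces `theta_0 + theta_1*x_1 + theta_2*x_2` for each row.

```python
import numpy as np

# Two already-normalized examples with two features each (made-up values).
X_norm_demo = np.array([[0.5, -1.0],
                        [-0.5, 1.0]])
# Made-up parameters in the order [theta_0, theta_1, theta_2].
theta_demo = np.array([10.0, 2.0, 3.0])

# Prepend the column of ones so theta_0 is multiplied by 1 in every row.
ones = np.ones((X_norm_demo.shape[0], 1))
X_demo = np.concatenate([ones, X_norm_demo], axis=1)

# Matrix form versus writing the intercept out by hand.
predictions = X_demo @ theta_demo
by_hand = theta_demo[0] + X_norm_demo @ theta_demo[1:]

print(predictions)
print(by_hand)
```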

## Gradient descent 
<div class="alert alert-block alert-warning">
**Implementation Note:** In the multivariate case, the cost function can
also be written in the following vectorized form:

$$ J(\theta) = \frac{1}{2m}(X\theta - \vec{y})^T(X\theta - \vec{y}) $$

where 

$$ X = \begin{bmatrix}
          - (x^{(1)})^T - \\
          - (x^{(2)})^T - \\
          \vdots \\
          - (x^{(m)})^T -
        \end{bmatrix} \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
</div>

### Computing the cost function J
For this, we implement the function `computeCostMulti`.

In [23]:
def computeCostMulti(X, y, theta):
    m = y.shape[0]  # number of training examples

    # predictions of the hypothesis function for every training example
    h = X @ theta

    # vectorized cost: sum of squared residuals divided by 2m
    J = ((h - y).T @ (h - y)) / (2 * m)

    return J
    

## Implementing gradient descent

The objective of linear regression is to minimize the cost function

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^m \left( h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$

where the hypothesis $h_\theta(x)$ is given by the linear model
$$ h_\theta(x) = \theta^Tx$$
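The summation form of $J(\theta)$ and the vectorized form $\frac{1}{2m}(X\theta - \vec{y})^T(X\theta - \vec{y})$ describe the same quantity. The following sketch checks this on a tiny made-up dataset; all names ending in `_small` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Tiny made-up dataset: the first column is the intercept term.
X_small = np.array([[1.0, -1.0],
                    [1.0,  0.0],
                    [1.0,  1.0]])
y_small = np.array([1.0, 2.0, 4.0])
theta_small = np.array([2.0, 1.0])
m_small = y_small.size

# Summation form: loop over the training examples one by one.
J_sum = 0.0
for i in range(m_small):
    h_i = theta_small @ X_small[i]          # h_theta(x^(i)) = theta^T x^(i)
    J_sum += (h_i - y_small[i]) ** 2
J_sum /= 2 * m_small

# Vectorized form: one matrix-vector product and one dot product.
residual = X_small @ theta_small - y_small
J_vec = (residual @ residual) / (2 * m_small)

print(J_sum, J_vec)
```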

Recall that the parameters of your model are the $\theta_j$ values. These are
the values you will adjust to minimize cost $J(\theta)$. One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update

$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad \text{simultaneously update } \theta_j \text{ for all } j$$

With each step of gradient descent, provided the learning rate $\alpha$ is small enough, your parameters $\theta_j$ come closer to the optimal values that achieve the lowest cost $J(\theta)$.
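The update rule can also be written for all $\theta_j$ at once as $\theta = \theta - \frac{\alpha}{m} X^T(X\theta - \vec{y})$, which makes the simultaneous update automatic. Below is a minimal sketch of that vectorized variant on made-up data where `y = 1 + 2x` exactly. The function name `gradient_descent_vectorized` and the toy data are assumptions for illustration, not the exercise's reference solution.

```python
import numpy as np


def gradient_descent_vectorized(X, y, theta, alpha, num_iters):
    """Hypothetical vectorized batch gradient descent for linear regression."""
    m = y.size
    theta = theta.copy()                      # leave the caller's array untouched
    J_history = []
    for _ in range(num_iters):
        # All parameters are updated together from the same old theta.
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        residual = X @ theta - y
        J_history.append((residual @ residual) / (2 * m))  # cost after the update
    return theta, J_history


# Made-up data: a single zero-mean feature and targets y = 1 + 2x.
x_toy = np.array([-1.5, -0.5, 0.5, 1.5])
X_toy = np.column_stack([np.ones(x_toy.size), x_toy])
y_toy = 1.0 + 2.0 * x_toy

theta_toy, J_toy = gradient_descent_vectorized(
    X_toy, y_toy, np.zeros(2), alpha=0.1, num_iters=400)

print('theta:', theta_toy)
print('first and last cost:', J_toy[0], J_toy[-1])
```

Because the targets are an exact linear function of the feature, the recovered parameters should approach the intercept and slope used to build `y_toy`, and the recorded cost should shrink toward zero.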

In [26]:
def gradientDescentMulti(X, y, theta, alpha, num_iters):

    m = y.shape[0]  # number of training examples

    theta_copy = theta.copy()

    n_plus = X.shape[1]  # number of features plus the intercept term

    J_history = []

    for i in range(num_iters):
        # residuals are computed once per iteration, so every theta_j
        # is updated from the same (old) theta
        error = X @ theta - y
        for cur_feat in range(n_plus):
            theta_copy[cur_feat] = theta[cur_feat] - (alpha / m) * np.dot(error, X[:, cur_feat])
        theta = theta_copy.copy()
        # record the cost after this update
        J_history.append(computeCostMulti(X, y, theta))

    return theta, J_history

### Testing

In [27]:
alpha = 0.1
num_iters = 400

# init theta and run gradient descent
theta = np.zeros(3)
theta, J_history = gradientDescentMulti(X, y, theta, alpha, num_iters)

print('theta computed from gradient descent: {:s}'.format(str(theta)))

# normalize the new example with the training mean and standard deviation
normalized_x = [1, ((1650-mu[0])/sigma[0]), ((3-mu[1])/sigma[1])]
price = np.dot(normalized_x, theta)

print('Predicted price of a 1650 sq-ft, 3 br house (using gradient descent): ${:.0f}'.format(price))