# Linear regression with multiple features/variables
Let's extend the idea of linear regression to work with multiple independent variables (Multivariate Linear Regression).

## Problem context:
We want to sell houses and, based on some of their characteristics, want to know what a good market price would be.
We have gathered information on the recent houses that were sold in the area.

It is your job to predict housing prices based on these features.

The file ex1data2.txt contains a training set of housing prices in Portland, Oregon.

The data is organized as follows:
- 1st column: size of the house (in square feet)
- 2nd column: number of bedrooms
- 3rd column: price of the house

The only difference from the previous example is that we now have more than one independent variable (the concepts you learnt in the previous section apply here as well).

In [42]:
# #################################################
# FILL IN THE NECESSARY CODE.
# Import the necessary packages
# ps: We need to be able to manipulate dataframes and perform matrix calculation
import numpy as np
import pandas as pd

In [43]:
# #################################################
# FILL IN THE NECESSARY CODE.
# Read the data from the file "ex1data2.txt" (available on Canvas) and display the top 5 rows of this data.
data = pd.read_csv('Data/ex1data2.txt', header=None, delimiter=',')
data.head()

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 2104 | 3 | 399900 |
| 1 | 1600 | 3 | 329900 |
| 2 | 2400 | 3 | 369000 |
| 3 | 1416 | 2 | 232000 |
| 4 | 3000 | 4 | 539900 |


Fill in the following cell. Make sure that X and y are NumPy arrays so that we can continue with our calculations.

In [44]:
# #################################################
# FILL IN THE NECESSARY CODE.
# We need
#       X to hold all the features
#       y to hold the labels
#       m to hold the number of samples

X = data.iloc[:, 0:2].to_numpy()  # first two columns (size, bedrooms) as features
# Equivalent alternative: X = data.iloc[:, 0:-1].to_numpy()  # every column except the last

y = data.iloc[:, -1].to_numpy()  # third/last column (price) as labels
m = len(y)  # number of training samples


## Feature Normalization
Looking at the data, house sizes are roughly 1000 times larger than the number of bedrooms.
When features differ by orders of magnitude like this, it is best to scale (normalize) them first,
which lets gradient descent converge much more quickly.

We will
  - calculate the mean of each feature (house size and number of bedrooms)
  - subtract the mean of each feature from that feature's values
  - divide each feature's values by that feature's standard deviation
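
The three steps above amount to z-score standardization. As a minimal sketch, the snippet below applies them to a small made-up array (the values and the helper name `zscore` are illustrative assumptions, not part of the exercise data) and checks that every standardized column ends up with mean 0 and standard deviation 1:

```python
import numpy as np

def zscore(X):
    """Standardize each column of X: subtract its mean, divide by its std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0), the NumPy default
    return (X - mean) / std, mean, std

# Made-up toy data: column 0 mimics house size, column 1 the bedroom count
X_toy = np.array([[1000.0, 2.0],
                  [2000.0, 3.0],
                  [3000.0, 4.0]])

X_toy_norm, toy_mean, toy_std = zscore(X_toy)
print(X_toy_norm.mean(axis=0))  # approximately 0 for both columns
print(X_toy_norm.std(axis=0))   # approximately 1 for both columns
```

The helper also returns the mean and standard deviation, because the same values have to be reused to scale any new input at prediction time.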

In [45]:
# #################################################
# FILL IN THE NECESSARY CODE.
# Normalize X
# Should we normalize y?
# Convert X into the variable X_norm
# Print out both the mean of X and the normalized output 
X_std = np.std(X, axis=0)
X_mean = np.mean(X,axis=0)
print('X_mean: \n', X_mean)
X_norm = (X - X_mean) / X_std
print('X_norm: \n', X_norm)

X_mean: 
 [2000.68085106    3.17021277]
X_norm: 
 [[ 1.31415422e-01 -2.26093368e-01]
 [-5.09640698e-01 -2.26093368e-01]
 [ 5.07908699e-01 -2.26093368e-01]
 [-7.43677059e-01 -1.55439190e+00]
 [ 1.27107075e+00  1.10220517e+00]
 [-1.99450507e-02  1.10220517e+00]
 [-5.93588523e-01 -2.26093368e-01]
 [-7.29685755e-01 -2.26093368e-01]
 [-7.89466782e-01 -2.26093368e-01]
 [-6.44465993e-01 -2.26093368e-01]
 [-7.71822042e-02  1.10220517e+00]
 [-8.65999486e-04 -2.26093368e-01]
 [-1.40779041e-01 -2.26093368e-01]
 [ 3.15099326e+00  2.43050370e+00]
 [-9.31923697e-01 -2.26093368e-01]
 [ 3.80715024e-01  1.10220517e+00]
 [-8.65782986e-01 -1.55439190e+00]
 [-9.72625673e-01 -2.26093368e-01]
 [ 7.73743478e-01  1.10220517e+00]
 [ 1.31050078e+00  1.10220517e+00]
 [-2.97227261e-01 -2.26093368e-01]
 [-1.43322915e-01 -1.55439190e+00]
 [-5.04552951e-01 -2.26093368e-01]
 [-4.91995958e-02  1.10220517e+00]
 [ 2.40309445e+00 -2.26093368e-01]
 [-1.14560907e+00 -2.26093368e-01]
 [-6.90255715e-01 -2.26093368e-01]
 [ 6.

## Adding the intercept term and initializing parameters

In [46]:
# #################################################
# FILL IN THE NECESSARY CODE.
# Augment the matrix X so that X*theta can be computed in one go,
# including theta0 (the intercept).
# Store the augmented version of X_norm in the variable X_aug.

ones = np.ones((m, 1))
X_aug = np.hstack((ones, X_norm))  # prepend a column of ones for x0
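
To see why the extra column of ones helps, the following sketch checks that a single matrix product on the augmented matrix gives the same result as adding the intercept by hand. The toy numbers and the names `X_small` and `theta_small` are made up for illustration:

```python
import numpy as np

# Made-up toy inputs: 3 samples, 2 (already scaled) features
X_small = np.array([[0.5, -1.0],
                    [-0.2, 0.3],
                    [1.5, 0.7]])
theta_small = np.array([[10.0], [2.0], [-3.0]])  # [intercept, weight 1, weight 2]

# Prepend a column of ones so that theta_small[0] acts as the intercept
X_small_aug = np.hstack((np.ones((X_small.shape[0], 1)), X_small))

pred_one_go = X_small_aug @ theta_small                     # single matrix product
pred_explicit = theta_small[0] + X_small @ theta_small[1:]  # intercept added by hand

print(np.allclose(pred_one_go, pred_explicit))  # True
```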

In [47]:
# #################################################
# FILL IN THE NECESSARY CODE.
# Initialize the hyperparameters (alpha, num_iters) and the parameters (theta) of our model
alpha = 0.03
num_iters = 100
theta = np.zeros((3, 1))
y = y[:, np.newaxis]  # reshape y into a column vector of shape (m, 1)

## Computing the cost
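
For reference, the quantity computed in this section is the usual squared-error cost for linear regression, shown here in both summation and vectorized form. The notation ($X$ for the augmented feature matrix, $\theta$ for the parameter vector, $m$ for the number of samples) is assumed from the variable names used in the code cells:

$$
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2, \qquad
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m}(X\theta - y)^T(X\theta - y)
$$

With $\theta = 0$ this reduces to $\frac{1}{2m}\sum_{i=1}^{m} \left(y^{(i)}\right)^2$, which is why the initial cost is so large when prices are in the hundreds of thousands.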

In [48]:
# #################################################
# FILL IN THE NECESSARY CODE.
# Calculate the cost: half the mean squared error.
def computeCostMulti(X, y, theta):
    m = len(y)  # number of samples, taken from y instead of the global m
    temp = np.dot(X, theta) - y
    return np.sum(np.power(temp, 2)) / (2 * m)

In [49]:
J = computeCostMulti(X_aug, y, theta)
print(J) 

65591548106.45744


<b>Check your calculation with default initialization</b>

Cost :  65591548106.45744

## Finding the optimal parameters using Gradient Descent

In [50]:
# #################################################
# FILL IN THE NECESSARY CODE.
#    X : the multivariate input (all samples)
#    y : the labels
#    theta : the initial parameters
#    alpha : the learning rate
#    iterations : the number of iterations
#    returns theta : the parameters of our linear model
def gradientDescentMulti(X, y, theta, alpha, iterations):
    m = len(y)
    for _ in range(iterations):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - (alpha / m) * temp
    return theta
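
One way to check that the chosen learning rate actually makes the cost go down is to record the cost at every iteration. The sketch below is a variant of the function above, not the exercise's required solution. The function name `gradient_descent_with_history` and the synthetic data are assumptions made for illustration:

```python
import numpy as np

def gradient_descent_with_history(X, y, theta, alpha, iterations):
    """Batch gradient descent that also records the cost after every update."""
    m = len(y)
    history = []
    for _ in range(iterations):
        error = X @ theta - y                        # residuals, shape (m, 1)
        theta = theta - (alpha / m) * (X.T @ error)  # simultaneous update of all parameters
        history.append(float(np.sum((X @ theta - y) ** 2) / (2 * m)))
    return theta, history

# Synthetic, noise-free data (made up): y = 4 + 3*x, with x standardized
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 1))
x = (x - x.mean()) / x.std()
X_demo = np.hstack((np.ones((50, 1)), x))
y_demo = 4.0 + 3.0 * x

theta_demo, cost_history = gradient_descent_with_history(
    X_demo, y_demo, np.zeros((2, 1)), alpha=0.1, iterations=200)

print(cost_history[0] > cost_history[-1])  # True: the cost went down
print(theta_demo.ravel())                  # close to [4, 3]
```

Plotting `cost_history` against the iteration number is a common follow-up: a curve that flattens out suggests convergence, while a rising curve suggests the learning rate is too large.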

In [51]:
# Note: theta is overwritten here, so re-running this cell continues from the previous result
theta = gradientDescentMulti(X_aug, y, theta, alpha, num_iters)
print(theta)

[[339642.90450832]
 [105377.32631522]
 [ -2514.95084841]]


<pre>
(results with alpha = 0.03)
   
Theta with    100 iterations : [[3.24225184e+05]  [1.16949487e+05]  [5.37867886e+01]]
Theta with    200 iterations : [[3.39642905e+05]  [1.44952956e+05]  [5.98800848e+01]]
Theta with    400 iterations : [[3.40410919e+05]  [1.53263978e+05]  [4.64918053e+01]]
Theta with   1000 iterations : [[3.40412660e+05]  [1.53769414e+05]  [-6.77111420e+00]]
Theta with   5000 iterations : [[340412.65957447] [153769.70114155] [-363.65654995]]
Theta with  10000 iterations : [[340412.65957447] [153769.94033767] [-809.74547603]]
</pre>
We now have the optimized value of theta. Use these values in the above cost function.

In [52]:
J = computeCostMulti(X_aug, y, theta)
print(J)

2050854463.919625


<b>Check your calculation with optimized theta's (learning rate 0.01)</b>


<pre>
Cost with    100 iterations :   11987463482.983088
Cost with    200 iterations :   3937401740.3302574
Cost with    400 iterations :   2202721589.853654
Cost with   1000 iterations :   2058558366.0358064
Cost with   5000 iterations :   2058132543.3909454
Cost with  10000 iterations :   2058132101.1394727
</pre>

## Inference step
What would be the price for a 2480 square foot, 3 bedroom house?

How confident are you about this prediction?

In [54]:
# #################################################
# FILL IN THE NECESSARY CODE.
# There is a reasonable amount of information available for houses of this size,
# so confidence in this prediction is fairly high.

X_pred = np.array([2480, 3])
X_pred_norm = (X_pred - X_mean) / X_std  # scale with the training mean and std
X_pred_aug = np.hstack((1, X_pred_norm))  # prepend 1 for the intercept
y_pred = np.dot(X_pred_aug, theta)
print(y_pred)

[404456.24236419]


What would be the price for a 5500 square foot, 6 bedroom house?

How confident are you about this prediction?


In [55]:
# #################################################
# FILL IN THE NECESSARY CODE.
# There is only a limited amount of information available for houses of this size with 6 bedrooms,
# so the confidence level for this prediction will be lower.
X_pred = np.array([5500, 6])
X_pred_norm = (X_pred - X_mean) / X_std  # scale with the training mean and std
X_pred_aug = np.hstack((1, X_pred_norm))  # prepend 1 for the intercept
y_pred = np.dot(X_pred_aug, theta)
print(y_pred)

[799214.97202752]
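
One rough way to reason about confidence is to check whether the query lies inside the range of feature values seen during training. A linear model that extrapolates beyond that range has no data to support its prediction. The sketch below uses a small made-up training array and a hypothetical helper `within_training_range`. With the real data, `X` from the cells above would take the place of `X_train_toy`:

```python
import numpy as np

def within_training_range(X_train, x_query):
    """Return, per feature, whether x_query lies between the training min and max."""
    x_query = np.asarray(x_query, dtype=float)
    return (x_query >= X_train.min(axis=0)) & (x_query <= X_train.max(axis=0))

# Made-up training features: [size in square feet, number of bedrooms]
X_train_toy = np.array([[1000.0, 2.0],
                        [2100.0, 3.0],
                        [3000.0, 4.0]])

print(within_training_range(X_train_toy, [2480, 3]))  # both features inside the toy range
print(within_training_range(X_train_toy, [5500, 6]))  # both features outside the toy range
```

This is only a coarse check: a query can sit inside the range of each feature and still be far from any observed combination of features.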
