## 3 Linear regression with multiple variables

In this part, you will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recently sold houses and build a model of housing prices.

The file ``ex1data2.txt`` contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

The ``ex1_multi.m`` script has been set up to help you step through this exercise.

In [1]:
%% Load Data
data = load('/home/roger/Workspace/MOOC/Coursera/MLearning/Week 2/Assignment/ex1/ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

### 3.1 Feature Normalization

The ``ex1_multi.m`` script will start by loading and displaying some values from this dataset. By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.

Your task here is to complete the code in ``featureNormalize.m`` to
  * Subtract the mean value of each feature from the dataset.
  * After subtracting the mean, additionally scale (divide) the feature values by their respective "standard deviations."

In [73]:
function [X_norm, mu, sigma] = featureNormalize(X)
    % Instructions: First, for each feature dimension, compute the mean
    %               of the feature and subtract it from the dataset,
    %               storing the mean value in mu. Next, compute the 
    %               standard deviation of each feature and divide
    %               each feature by its standard deviation, storing
    %               the standard deviation in sigma. 
    %
    %               Note that X is a matrix where each column is a 
    %               feature and each row is an example. You need 
    %               to perform the normalization separately for 
    %               each feature. 
    %
    % Hint: You might find the 'mean' and 'std' functions useful.
    X_norm = X;
    mu = mean(X);
    sigma = std(X);

    for i = 1:size(X, 2)
        X_norm(:,i) = X(:,i) - mu(i);
        X_norm(:,i) = X_norm(:,i) / sigma(i);
    end
end
[X, mu, sigma] = featureNormalize(X);
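
The stored `mu` and `sigma` matter later: a new house has to be scaled with the training-set statistics before a prediction is made. A minimal sketch, assuming a small made-up feature matrix (`X_demo` and `x_new` are hypothetical values, not taken from ``ex1data2.txt``) and relying on automatic broadcasting, which Octave supports and MATLAB provides from R2016b onward:

```octave
% Hypothetical toy data: 3 houses, columns are [size, bedrooms]
X_demo = [1000 2; 2000 3; 3000 4];

mu_demo    = mean(X_demo);   % 1 x 2 row vector of column means
sigma_demo = std(X_demo);    % 1 x 2 row vector of column standard deviations

% Broadcasting applies the row vectors to every row of X_demo
X_demo_norm = (X_demo - mu_demo) ./ sigma_demo;

% A new example is scaled with the *training* mu and sigma,
% not with statistics computed from the new example itself
x_new      = [2500 3];
x_new_norm = (x_new - mu_demo) ./ sigma_demo;
```

This computes the same result as the column-by-column loop above, just without the explicit `for`.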

### 3.2 Gradient Descent

Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.
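
For reference, the hypothesis and the update rule can be written out as follows. The notation is the usual one from the course and is assumed here: $n$ is the number of features, $x_0 = 1$ is the intercept term, and $\alpha$ is the learning rate. The last line is the equivalent vectorized form.

$$h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \quad \text{(simultaneously for all } j\text{)}$$

$$\theta := \theta - \frac{\alpha}{m}X^T(X\theta - y)$$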

You should complete the code in ``computeCostMulti.m`` and ``gradientDescentMulti.m`` to implement the cost function and gradient descent for linear regression with multiple variables. If your code in the previous part (single variable) already supports multiple variables, you can use it here too.

Make sure your code supports any number of features and is well-vectorized. You can use ``size(X, 2)`` to find out how many features are present in the dataset.

In [85]:
function J = computeCostMulti(X, y, theta)
    %COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
    %   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
    %   parameter for linear regression to fit the data points in X and y

    m = length(y);                     % number of training examples
    predictions = X * theta;           % hypothesis predictions for all m examples
    sqrErrors = (predictions - y).^2;  % squared error of each example

    J = 1/(2*m) * sum(sqrErrors);
end

computeCostMulti(X, y, theta)

ans =    6.5592e+10


Implementation Note: In the multivariate case, the cost function can also be written in the following vectorized form:

$$J(\theta) = \frac{1}{2m}(X\theta-y)^T(X\theta-y)$$

The vectorized version is efficient when you’re working with numerical computing tools like Octave/MATLAB. If you are an expert with matrix operations, you can prove to yourself that the two forms are equivalent.
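
One way to see the equivalence: the $i$-th entry of the vector $X\theta - y$ is the prediction error on example $i$, so the inner product of that vector with itself adds up the squared errors. Writing $x^{(i)}$ for the $i$-th row of $X$ as a column vector (notation assumed here):

$$(X\theta - y)^T(X\theta - y) = \sum_{i=1}^{m}\left((X\theta - y)_i\right)^2 = \sum_{i=1}^{m}\left(\theta^T x^{(i)} - y^{(i)}\right)^2$$

Multiplying both sides by $\frac{1}{2m}$ gives the summation form of $J(\theta)$ on the right and the vectorized form on the left.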

In [83]:
function J = computeCostMulti(X, y, theta)
    %COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
    %   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
    %   parameter for linear regression to fit the data points in X and y

    m = length(y);            % number of training examples
    errors = X * theta - y;   % prediction error of each example
    J = (errors' * errors) / (2 * m);   % inner product sums the squared errors
end

computeCostMulti(X, y, theta)

ans =    6.5592e+10


In [None]:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
    %GRADIENTDESCENTMULTI Performs gradient descent to learn theta
    %   theta = GRADIENTDESCENTMULTI(X, y, theta, alpha, num_iters) updates theta by
    %   taking num_iters gradient steps with learning rate alpha

    % Initialize some useful values
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters
        predictions = X * theta;       % predictions of hypothesis on all m examples
        errors = (predictions - y);    %errors
        delta = 1/(m) * X' * errors;
        theta = theta - alpha * delta;

        % Save the cost J in every iteration    
        J_history(iter) = computeCostMulti(X, y, theta);
    end
end
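
``J_history`` is what lets you check that gradient descent is actually converging: with a suitable learning rate the cost should fall on every iteration. A minimal, self-contained sketch on made-up data repeats the same update inline and compares the result with the least-squares solution from the backslash operator. The toy matrix, `alpha = 0.1` and `num_iters = 500` are assumptions chosen for illustration, not values from the exercise.

```octave
% Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2
X_raw = [1 2; 2 1; 3 4; 4 3];
y_toy = 1 + 2 * X_raw(:, 1) + 3 * X_raw(:, 2);
m_toy = length(y_toy);

% Normalize the features, then add the intercept column
X_toy = [ones(m_toy, 1), (X_raw - mean(X_raw)) ./ std(X_raw)];

alpha     = 0.1;    % assumed learning rate
num_iters = 500;    % assumed iteration count
theta_gd  = zeros(size(X_toy, 2), 1);
J_hist    = zeros(num_iters, 1);

for iter = 1:num_iters
    errors       = X_toy * theta_gd - y_toy;                  % residuals
    theta_gd     = theta_gd - (alpha / m_toy) * (X_toy' * errors);
    J_hist(iter) = sum((X_toy * theta_gd - y_toy) .^ 2) / (2 * m_toy);
end

theta_ls = X_toy \ y_toy;   % least-squares reference solution

fprintf('cost fell on every iteration: %d\n', all(diff(J_hist) < 0));
fprintf('max |theta_gd - theta_ls| = %g\n', max(abs(theta_gd - theta_ls)));
```

If the cost ever rises between iterations, the learning rate is probably too large; plotting `J_hist` against the iteration number makes this easy to spot.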

### 3.3 Normal Equations

In the lecture videos, you learned that the closed-form solution to linear regression is:

$$\theta = (X^TX)^{-1}X^Ty$$
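
As a brief sketch of where this comes from, assuming $X^TX$ is invertible: the gradient of the cost $J(\theta) = \frac{1}{2m}(X\theta-y)^T(X\theta-y)$ is zero at the minimum, which gives

$$\nabla_\theta J(\theta) = \frac{1}{m}X^T(X\theta - y) = 0 \;\Longrightarrow\; X^TX\theta = X^Ty \;\Longrightarrow\; \theta = (X^TX)^{-1}X^Ty$$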

Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no "loop until convergence" like in gradient descent. Complete the code in `normalEqn.m` to use the formula above to calculate $\theta$. Remember that while you don't need to scale your features, you still need to add a column of 1's to the X matrix to have an intercept term ($\theta_0$).

The code in `ex1_multi.m` will add the column of 1's to X for you.

In [2]:
function [theta] = normalEqn(X, y)
    %NORMALEQN Computes the closed-form solution to linear regression 
    %   NORMALEQN(X,y) computes the closed-form solution to linear 
    %   regression using the normal equations.
    % pinv (pseudo-inverse) still returns a solution when X'*X is singular
    theta = pinv(X' * X) * X' * y;
end

theta = normalEqn(X, y)

theta =

   1.4086e+02
   1.6978e+04

