# Linear Regression

## Model Representation

A linear model makes a "hypothesis" about the true nature of the underlying function: that it is linear. In the univariate case we express this hypothesis as 
$h_\theta(x) = ax + b$, where $a$ is the slope and $b$ is the intercept.

Our simple example above was an example of "univariate regression", i.e. just one variable (or "feature"): the number of hours studied. Below we will have more than one feature ("multivariate regression"), in which case the hypothesis for all data points at once is given by 
$h_\theta(X) = Xa$

Here $a$ is a vector of learned parameters, and $X$ is the "design matrix", with one row per data point. In this formulation the intercept term has been added to the design matrix as the first column (of all ones), so the first entry of $a$ plays the role of $b$. For a single data point $x$ (one row of $X$, written as a column vector), this reduces to $h_\theta(x) = a^T x$.
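As a rough illustration of the design-matrix formulation, the sketch below builds a design matrix with a leading column of ones and computes one prediction per row. It uses plain Python lists to stay dependency-free; the function names, the data, and the parameter values are all made up for the example.

```python
def add_intercept_column(features):
    """Prepend a constant 1.0 to every row so the first parameter acts as the intercept."""
    return [[1.0] + list(row) for row in features]


def predict(design_matrix, params):
    """Return one prediction per row: the dot product of that row with the parameters."""
    return [sum(x_j * a_j for x_j, a_j in zip(row, params)) for row in design_matrix]


# Hypothetical data: hours studied, one feature per data point.
hours = [[1.0], [2.0], [3.0]]
X = add_intercept_column(hours)  # [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]

# Illustrative (not learned) parameters: intercept 0.5, slope 2.0.
a = [0.5, 2.0]
predictions = predict(X, a)      # [2.5, 4.5, 6.5]
```

Because the intercept lives in the first column of the design matrix, the same `predict` function works unchanged for any number of features.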

<img src="../lr/1.png" />

<img src="../lr/2.png" />

To establish notation for future use, we’ll use $x^{(i)}$ to denote the “input” variables (living area in this example), also called input features, and $y^{(i)}$ to denote the “output” or target variable that we are trying to predict (price). A pair $(x^{(i)}, y^{(i)})$ is called a training example, and the dataset that we’ll be using to learn, a list of $m$ training examples $(x^{(i)}, y^{(i)})$ for $i = 1, \ldots, m$, is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use $\mathcal{X}$ to denote the space of input values, and $\mathcal{Y}$ to denote the space of output values. In this example, $\mathcal{X} = \mathcal{Y} = \mathbb{R}$.

## Cost Function

<img src="../lr/4.png" />

We can measure the accuracy of our hypothesis function by using a cost function. It takes an average (actually a slightly fancier version of an average) of the squared differences between the hypothesis's predictions on the inputs $x$ and the actual outputs $y$.


<img src="../lr/8.png" />

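To break it apart, the cost is $\frac{1}{2}\bar{x}$, where $\bar{x}$ is the mean of the squares of $h_\theta(x_i) - y_i$, the difference between the predicted value and the actual value. Written out for $m$ training examples: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x_i) - y_i\right)^2$.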

This function is otherwise called the "squared error function" or "mean squared error". The mean is halved ($\frac{1}{2}$) as a convenience for computing gradient descent: differentiating the square produces a factor of 2, which cancels the $\frac{1}{2}$.
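The halved mean of squared errors can be sketched in a few lines of Python. This is only an illustration for the univariate case: the names `hypothesis`, `cost`, `theta0`, and `theta1` are assumptions chosen for the example, and the data points are made up.

```python
def hypothesis(x, theta0, theta1):
    """Univariate linear hypothesis: intercept theta0 plus slope theta1 times x."""
    return theta0 + theta1 * x


def cost(xs, ys, theta0, theta1):
    """Half the mean of the squared differences between predictions and targets."""
    m = len(xs)
    squared_errors = [(hypothesis(x, theta0, theta1) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)


# Made-up data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect_fit = cost(xs, ys, 0.0, 1.0)  # every error is zero, so the cost is 0.0
flat_line = cost(xs, ys, 0.0, 0.0)    # errors are -1, -2, -3, so the cost is (1 + 4 + 9) / 6
```

A lower cost means the line passes closer to the data, which is why the perfect fit scores zero and the flat line scores worse.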

<img src="../lr/5.png" />

# Multivariate Linear Regression

Linear regression with multiple variables is also known as "multivariate linear regression".

We now introduce notation for equations where we can have any number of input variables.

<img src="../lr/11.png" />

The multivariable form of the hypothesis function accommodating these multiple features is as follows:

<img src="../lr/9.png" />

In order to develop intuition about this function, we can think about $\theta_0$ as the basic price of a house, $\theta_1$ as the price per square meter, $\theta_2$ as the price per floor, etc. $x_1$ will be the number of square meters in the house, $x_2$ the number of floors, etc.
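To make the house-price intuition concrete, here is a small worked example in Python. The parameter values and the house are invented purely for illustration, and `predict_price` is a hypothetical helper name; it assumes the usual convention that $x_0 = 1$ so that $\theta_0$ is added once as the base price.

```python
def predict_price(theta, features):
    """Compute theta_0 * 1 + theta_1 * x_1 + ... + theta_n * x_n for one house."""
    x = [1.0] + list(features)  # x_0 = 1 pairs with the base price theta_0
    return sum(t * x_j for t, x_j in zip(theta, x))


# Invented parameters: base price, price per square meter, price per floor.
theta = [50000.0, 1000.0, 5000.0]

# Invented house: 120 square meters, 2 floors.
house = [120.0, 2.0]

price = predict_price(theta, house)  # 50000 + 1000 * 120 + 5000 * 2 = 180000.0
```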

# Gradient Descent for Multiple Variables

The gradient descent equation itself generally has the same form; we just have to repeat it for each of our $n$ features:

<img src="../lr/12.png" />

In other words: 

<img src="../lr/13.png" />
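The sketch below shows one way the update could be implemented in plain Python. It assumes the standard batch update, in which every parameter $\theta_j$ is moved by $\alpha \cdot \frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$ and all parameters are updated simultaneously. The function names, learning rate, iteration count, and data are assumptions made for the example, and each row of `X` is assumed to start with a 1 for the intercept term.

```python
def predict(theta, row):
    """Hypothesis for one example: dot product of the parameters with the feature row."""
    return sum(t * x for t, x in zip(theta, row))


def gradient_descent_step(X, y, theta, alpha):
    """Apply one simultaneous update to every parameter theta_j."""
    m = len(X)
    # Compute all errors first so every theta_j is updated from the same old theta.
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    gradient = [
        sum(err * row[j] for err, row in zip(errors, X)) / m
        for j in range(len(theta))
    ]
    return [t - alpha * g for t, g in zip(theta, gradient)]


def fit(X, y, alpha=0.1, iterations=5000):
    """Run a fixed number of gradient descent steps starting from all zeros."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_descent_step(X, y, theta, alpha)
    return theta


# Made-up data generated exactly by y = 1 + 2 * x1 + 3 * x2.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]

first_step = gradient_descent_step(X, y, [0.0, 0.0, 0.0], 0.1)
theta = fit(X, y)  # should approach [1.0, 2.0, 3.0] on this noise-free data
```

The fixed iteration count keeps the sketch short; a practical implementation would more likely stop once the cost stops decreasing, and the learning rate usually needs tuning for the scale of the features.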