# Training Models
* Generally, training a model means setting its parameters so that the model best fits the training set.

# A. Linear Regression:

* Generally, a linear model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the **bias** term (also called the **intercept** term):

    * $\hat{y}\ =\ \theta_0\ +\ \theta_1x_1\ +\ ...\ +\ \theta_nx_n$ where:
        * $\hat{y}$ = predicted value.
        * $n$ = number of features.
        * $x_i$ = $i^{th}$ feature value.
        * $\theta_j$ = $j^{th}$ model parameter, including the bias term $\theta_0$ and the feature weights $\theta_1,\theta_2,...,\theta_n$.

    * In vectorized form: $\hat{y}\ =\ h_\theta(x)\ =\ \theta\cdot x$, where:
        * $h_\theta$ = hypothesis function, using the model parameters $\theta$.
        * $\theta$ = model's parameter vector, containing the bias term $\theta_0$ and the feature weights $\theta_1$ to $\theta_n$.
        * $x$ = instance's feature vector, containing $x_0$ to $x_n$, with $x_0$ always equal to 1.
        * $\theta\cdot x$ = dot product of the vectors $\theta$ and $x$, which is equal to $\theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_nx_n$. Because $x_0 = 1$, this is exactly the weighted sum $\theta_0 + \theta_1x_1 + \dots + \theta_nx_n$ given above.
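
As an illustration (a minimal sketch, not taken from the original notes), the two forms above give the same prediction once a constant feature $x_0 = 1$ is prepended to the instance. The function names `predict_loop` and `predict_dot` and the parameter and feature values are hypothetical.

```python
import numpy as np

def predict_loop(theta, x):
    """Weighted sum: y_hat = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    y_hat = theta[0]  # bias term theta_0
    for j in range(1, len(theta)):
        y_hat += theta[j] * x[j - 1]  # x holds x_1..x_n (no x_0 yet)
    return y_hat

def predict_dot(theta, x):
    """Vectorized form: prepend x_0 = 1, then take the dot product theta . x."""
    x_with_bias = np.concatenate(([1.0], x))  # x_0 = 1
    return np.dot(theta, x_with_bias)

# Hypothetical parameters: theta_0 = 4, theta_1 = 3, theta_2 = -2
theta = np.array([4.0, 3.0, -2.0])
# Hypothetical instance with n = 2 features
x = np.array([1.5, 0.5])

print(predict_loop(theta, x))  # 4 + 3*1.5 - 2*0.5 = 7.5
print(predict_dot(theta, x))   # 7.5
```

Prepending $x_0 = 1$ lets the bias be handled like any other weight, so a single dot product replaces the explicit loop.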

* In machine learning, vectors are often represented as column vectors, which are 2D arrays with a single column.
    * If $\theta$ and $x$ are column vectors, then the prediction is $\hat{y}=\theta^Tx$, where $\theta^T$ is the transpose of $\theta$ (a row vector instead of a column vector) and $\theta^Tx$ is the matrix multiplication of $\theta^T$ and $x$. Strictly speaking, the result is a $1 \times 1$ matrix containing the single predicted value.
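
A small NumPy sketch of the column-vector convention, with hypothetical values and with $x_0 = 1$ already included as the first entry of `x`, shows that $\theta^Tx$ yields a $1 \times 1$ matrix rather than a plain scalar:

```python
import numpy as np

# Hypothetical column vectors; x already includes x_0 = 1 as its first entry.
theta = np.array([[4.0], [3.0], [-2.0]])  # shape (3, 1)
x = np.array([[1.0], [1.5], [0.5]])       # shape (3, 1)

# (1, 3) @ (3, 1) -> (1, 1) matrix holding the prediction
y_hat = theta.T @ x
print(y_hat.shape)   # (1, 1)
print(y_hat[0, 0])   # 7.5
```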

* How to train a regression model?
    * Measure how well (or poorly) the model fits the training data. The most common performance measure of a regression model is the root mean square error (RMSE).
    * To train a linear regression model, find the value of $\theta$ that minimizes the RMSE.
        * In practice, it is simpler to minimize the mean squared error (MSE) than the RMSE, and it leads to the same result: the square root is monotonically increasing, so the value that minimizes a non-negative function also minimizes its square root.
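    * The MSE of a linear regression hypothesis $h_\theta$ on a training set $X$ is calculated using:
        * $MSE(X,h_\theta)\ =\ \frac{1}{m}\sum_{i=1}^{m}(\theta^Tx^{(i)}\ -\ y^{(i)})^2$, where $m$ is the number of instances in the training set, $x^{(i)}$ is the feature vector of the $i^{th}$ instance (with $x_0^{(i)} = 1$), and $y^{(i)}$ is its target value.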
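
The sketch below computes the MSE formula and its square root (RMSE) with NumPy on a hypothetical toy training set whose targets follow $y = 1 + 2x_1$ exactly. It then checks, by brute force over a small grid of candidate $\theta$ values, that minimizing the MSE and minimizing the RMSE select the same parameters. The grid search is only there to illustrate that equivalence. It is not how linear regression models are actually trained, and the functions `mse` and `rmse` are hypothetical helpers.

```python
import numpy as np

def mse(X, y, theta):
    """MSE(X, h_theta) = (1/m) * sum_i (theta^T x^(i) - y^(i))^2."""
    m = X.shape[0]              # number of training instances
    errors = X @ theta - y      # one prediction error per instance
    return (errors @ errors) / m

def rmse(X, y, theta):
    return np.sqrt(mse(X, y, theta))

# Hypothetical toy data: column 0 is x_0 = 1, column 1 is x_1.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 1 + 2*x_1

# Brute-force search over a small grid of candidate parameter vectors
candidates = [np.array([t0, t1])
              for t0 in np.linspace(-2, 4, 13)
              for t1 in np.linspace(-1, 3, 9)]
best_by_mse = min(candidates, key=lambda t: mse(X, y, t))
best_by_rmse = min(candidates, key=lambda t: rmse(X, y, t))

print(mse(X, y, np.zeros(2)))     # (1 + 9 + 25 + 49) / 4 = 21.0
print(best_by_mse, best_by_rmse)  # both [1. 2.]
```

Both criteria pick $\theta = (1, 2)$. The square root only rescales the error values without changing their order, so the minimizer stays the same.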