# Understanding Deep Learning  
## Chapter 2 - Supervised Learning
---

##### Supervised Learning Model

A **supervised learning** model defines a mapping from one or more inputs to one or more outputs.

##### Example

- **Input**: `car_year`, `car_mileage` - These variables describe the year and mileage of the car.
- **Output**: `car_value` - This is the predicted value of the car.

`car_mileage` and `car_year` are variables used in an equation to determine the `car_value`.

##### Inference

Inference is the process by which the model computes an output based on the input. The model equation represents a family of potential relationships between inputs and outputs, where the **parameters** define the specific relationship.
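
As a minimal sketch of inference, assume the car example uses a linear equation. This is an assumption; the text does not specify the model. The hypothetical function `predict_car_value` computes `car_value` from `car_year` and `car_mileage`. The made-up parameter values show how one equation describes a whole family of relationships.

```python
def predict_car_value(car_year, car_mileage, phi):
    """Inference: compute an output (car_value) from the inputs using parameters phi.

    Hypothetical model, assumed for illustration only:
        car_value = phi[0] + phi[1] * car_year + phi[2] * car_mileage
    """
    return phi[0] + phi[1] * car_year + phi[2] * car_mileage

# Two hypothetical parameter settings: same equation, two different relationships.
phi_a = (-2_000_000, 1_000, -0.05)
phi_b = (-1_000_000, 500, -0.1)

print(predict_car_value(2018, 60_000, phi_a))
print(predict_car_value(2018, 60_000, phi_b))
```

Both settings receive the same inputs, a 2018 car with 60,000 miles. They predict roughly 15,000 and 3,000, so the parameters alone decide which relationship the model expresses.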

##### Parameters

Parameters are the adjustable values inside the model's equation that determine which specific input/output relationship it represents. They are adjusted during training so that the model's predictions match the training data as closely as possible.

##### Parameters Example

Consider a model designed to identify landscapes in pictures. The RGB color values of the pixels are the model's *input*, not its parameters. The parameters are learned values that control how strongly features such as large green regions (grass is typically green) push the model toward a "landscape" decision.

##### Training

Training a model involves discovering the parameters that best describe the actual relationship between inputs and outputs. The training process involves an algorithm iterating over a set of input/output pairs, adjusting the parameters to minimize the difference between the predicted and actual outputs.
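
The text does not name a specific training algorithm. Gradient descent is one common choice, sketched minimally below in plain Python. The sketch makes three assumptions:

- a straight-line model $y = \phi_0 + \phi_1 x$ (the model used in Section 2.2),
- a sum-of-squared-errors loss,
- a small hypothetical dataset.

The names `train`, `learning_rate`, and `steps` are illustrative and do not come from the source.

```python
# Hypothetical training data: four (x, y) pairs that lie exactly on y = 1 + 2x.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def train(data, learning_rate=0.01, steps=1000):
    """Fit y = phi0 + phi1 * x by gradient descent on the sum of squared errors."""
    phi0, phi1 = 0.0, 0.0  # arbitrary starting guess
    for _ in range(steps):
        grad0, grad1 = 0.0, 0.0
        # Iterate over every input/output pair and accumulate the loss gradient.
        for x, y in data:
            error = (phi0 + phi1 * x) - y  # predicted minus actual output
            grad0 += 2 * error
            grad1 += 2 * error * x
        # Step the parameters slightly downhill to reduce the loss.
        phi0 -= learning_rate * grad0
        phi1 -= learning_rate * grad1
    return phi0, phi1

phi0, phi1 = train(data)
print(round(phi0, 3), round(phi1, 3))
```

The hypothetical points lie exactly on $y = 1 + 2x$, so the parameters settle very close to $\phi_0 = 1$ and $\phi_1 = 2$. The learning rate must be small enough that each update reduces the loss instead of overshooting. The value 0.01 works for this particular dataset.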


### 2.1 Overview of Supervised Learning

Consider the task of predicting the price of a Toyota Prius based on its year and mileage:
- **Input ($x$)**: A numerical vector describing the year and mileage.
- **Output ($y$)**: A numerical prediction of the vehicle's value. In some cases, like this one, the output may be a single number.
- **Data Structure**: The data is structured, with each vector having a consistent order and number of elements, which is crucial for supervised learning models.

Our supervised learning model takes the form of an equation where $f$ denotes the model, often referred to as a function in mathematics:

$$ y = f[x] $$

This means that $y$ (the predicted price of the Prius) is a function of $x$ (the year and mileage).

**Inference** is the step where the model computes $f$ with an input of $x$ to get $y$.

When we incorporate model parameters, denoted by $\phi$ (Greek letter phi), our model's equation becomes:

$$ y = f[x, \phi] $$

Here, $y$ is the output generated by transforming the input $x$ using the model's internal parameters $\phi$.

**Training the Model**: The model learns or 'trains' to find the best parameters ($\phi$) that produce accurate outputs from the inputs. This is done using a training dataset of $I$ input-output example pairs $\{(x_i, y_i)\}$. $I$ represents the number of examples in the training set. For instance, if $I=10$, then $(x_5, y_5)$ represents the fifth pair in our dataset.
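
One practical detail: the math indexes examples from 1 to $I$, while Python lists start at 0. This sketch stores a hypothetical training set of $I = 10$ pairs as a list of `(x_i, y_i)` tuples. Each input is a `(car_year, car_mileage)` vector, and all values are made up.

```python
# A hypothetical training set of I = 10 input/output pairs (x_i, y_i).
# Each input x_i is a (car_year, car_mileage) vector; each output y_i is a price.
training_set = [
    ((2010 + i, 100_000 - 8_000 * i), 5_000 + 1_500 * i) for i in range(10)
]

I = len(training_set)  # number of training examples
# The math uses 1-based indices (x_1 ... x_I); Python lists are 0-based,
# so the fifth pair (x_5, y_5) is stored at index 4.
x_5, y_5 = training_set[5 - 1]
print(I, x_5, y_5)
```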

**Loss Function ($L$)**: The loss $L[\phi]$ returns a single number (a scalar) that measures how badly the model predicts the training outputs from their inputs when it uses the parameters $\phi$. Training searches for the parameters that make this loss as small as possible:

$$ \hat{\phi} = \operatorname*{argmin}_{\phi} L[\phi] $$

- The hat over $\phi$ (written $\hat{\phi}$) denotes the estimated parameter values that minimize $L[\phi]$.
- The $\operatorname{argmin}$ operator returns the value of $\phi$ that gives the lowest loss, not the lowest loss itself.
- $L[\phi]$, the loss function, reflects the cost or error associated with a specific choice of $\phi$.
- Square brackets enclose function arguments, distinguishing them from other uses of brackets in mathematical expressions.
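
The sketch below makes the $\operatorname{argmin}$ concrete. It computes the loss for every candidate on a coarse grid of parameter values and keeps the best one. For illustration only, it assumes:

- a two-parameter straight-line model,
- a sum-of-squared-errors loss,
- hypothetical data.

Exhaustive search like this works only for a handful of parameters. Practical training therefore relies on optimization algorithms instead.

```python
import itertools

# Hypothetical training pairs lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def loss(phi):
    """Sum of squared errors L[phi] for the assumed model phi0 + phi1 * x."""
    phi0, phi1 = phi
    return sum((phi0 + phi1 * x - y) ** 2 for x, y in zip(xs, ys))

# Candidate parameter values: a coarse grid from -3.0 to 3.0 in steps of 0.5.
grid = [k * 0.5 for k in range(-6, 7)]
candidates = itertools.product(grid, grid)

# argmin: keep the candidate phi that gives the smallest loss.
phi_hat = min(candidates, key=loss)
print(phi_hat, loss(phi_hat))
```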

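A lower loss means the model fits the training data more closely. After training, the model is evaluated on its ability to **generalize**, that is, to make accurate predictions for new, unseen data. A low loss on the training data alone does not guarantee this.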


### 2.2 Linear regression example

#### 2.2.1 1D linear regression model
A 1D linear regression model describes the relationship between input $x$ and output $y$ as a straight line: $y = f[x,\phi]$

<center> Which mathematically translates to... </center>

$$
y = \phi_0+\phi_1x
$$

The formula structure for a straight line is: $y=mx+b$
- $y$ stands for the vertical coordinate or the dependent variable.
- $m$ is the slope of the line, which tells you how steep the line is.
- $x$ represents the horizontal coordinate or the independent variable.
- $b$ is the y-intercept, which is the value of $y$ when $x$ equals 0.

<center>With phi as our parameters, the model's equation becomes...</center>

$$
y=\phi_1x+\phi_0
$$

Our parameters are the $y$-intercept ($\phi_0$) and the slope ($\phi_1$) of the line. Our input is $x$, the horizontal coordinate. If we change the parameters, we get different lines:
<div>
    <center><img src="../images/linear-parameters.png" width="300"/></center>
</div>
The goal is to find parameters whose line passes as close as possible to the data points when they are plotted on the same graph.
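
A figure like the one above can be sketched with matplotlib, which this notebook already imports. This sketch draws three lines from illustrative parameter choices next to a few hypothetical data points. The specific values are assumptions, chosen so that one line passes exactly through the points.

```python
import matplotlib.pyplot as plt

# Hypothetical data points to compare the lines against.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Three illustrative parameter choices (phi0 = intercept, phi1 = slope).
parameter_choices = [(1.0, 2.0), (0.0, 1.0), (4.0, -0.5)]

fig, ax = plt.subplots()
ax.scatter(xs, ys, color="black", label="data")
for phi0, phi1 in parameter_choices:
    line_x = [0.0, 3.0]  # two endpoints are enough to draw a straight line
    line_y = [phi0 + phi1 * x for x in line_x]
    ax.plot(line_x, line_y, label=f"phi0={phi0}, phi1={phi1}")
ax.set_xlabel("x (input)")
ax.set_ylabel("y (output)")
ax.legend()
plt.show()
```

Only the line with $\phi_0 = 1$ and $\phi_1 = 2$ passes through every data point. The other two parameter choices give lines that miss the data.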

#### 2.2.2 Loss
**Loss** - The numerical value assigned to each choice of parameters that quantifies the degree of mismatch between the model and the data.  

The *lower* the loss, the *better* the line fits the data, and the *better* the model predicts the training outputs.

This is the formula used to calculate the loss for our linear regression model (the *least squares* loss):

$$ L[\phi] = \sum_{i=1}^{I} \left(f[x_i, \phi] - y_i\right)^2 $$

- $L[\phi]$ is the loss as a function of the model parameters $\phi$.
- $\sum_{i=1}^{I}$ means we sum the term that follows over every training example, starting at $i=1$ and ending at $I$, the total number of input/output pairs in the training set.
- $(f[x_i, \phi] - y_i)^2$ is the model's prediction for input $x_i$ (computed with parameters $\phi$) minus the correct output $y_i$, squared.
    - Put simply: the predicted output minus the correct output, squared. Squaring makes every term non-negative, so positive and negative errors cannot cancel out.

So the loss of the model is the sum of these squared differences.
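
Here is a minimal sketch of this loss in plain Python. It uses hypothetical training data and two illustrative parameter settings. The helper names `f` and `least_squares_loss` are assumptions and do not come from the source.

```python
def f(x, phi):
    """1D linear regression model y = phi0 + phi1 * x."""
    phi0, phi1 = phi
    return phi0 + phi1 * x

def least_squares_loss(phi, xs, ys):
    """L[phi]: sum over all I training pairs of (prediction - true output) squared."""
    return sum((f(x_i, phi) - y_i) ** 2 for x_i, y_i in zip(xs, ys))

# Hypothetical training data (I = 3 pairs).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 5.0]

print(least_squares_loss((0.0, 1.5), xs, ys))  # a line close to the data
print(least_squares_loss((0.0, 1.0), xs, ys))  # a line further from the data
```

For $\phi = (0, 1.5)$ the predictions are 1.5, 3.0, and 4.5, so the loss is $(1.5-2)^2 + (3-3)^2 + (4.5-5)^2 = 0.25 + 0 + 0.25 = 0.5$. For $\phi = (0, 1)$ the loss is $1 + 1 + 4 = 6$, so the first line fits this data better.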

Example




```python
import matplotlib.pyplot as plt
```
