# Linear and Logistic Regression, AWS Machine Learning: Data Scientist

## Linear Regression via Least Squares Minimization

### Types of Machine Learning

* **Supervised**. Learns, from labeled examples, a function that approximates the relationship between input data and output data.
    * **Classification**. Predicts discrete labels/categories. Example: Logistic Regression.
    * **Regression**. Predicts continuous values. Example: Ordinary Least Squares.
* **Unsupervised**. Finds structure in data without using explicit labels.

### Introduction to Linear Models

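* **What is it?** In linear modeling, the output is a weighted sum of the input variables, so the relationship between each individual input variable and the output (with the other inputs held fixed) is a straight line. The slopes of these lines are the coefficients of the linear equation.
* **An example of linear notation**. $y = a_0 + a_1x_1 + a_2x_2 + \dots$, where $a_0$ is the intercept and $a_1, a_2, \dots$ are the slopes.
* **Why use linear models?**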
    * Interpretable
    * Low complexity
    * Scalable
    * Good baseline
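
As a minimal sketch, assuming plain Python lists for the inputs, a prediction from the linear model is the intercept $a_0$ plus a weighted sum of the features. The function name `linear_predict` is hypothetical, chosen only for illustration.

```python
def linear_predict(coefs, features):
    """Evaluate y = a_0 + a_1*x_1 + a_2*x_2 + ... for one observation.

    coefs:    [a_0, a_1, ..., a_k]  (intercept first)
    features: [x_1, ..., x_k]
    """
    if len(coefs) != len(features) + 1:
        raise ValueError("need one coefficient per feature plus an intercept")
    intercept, slopes = coefs[0], coefs[1:]
    # Weighted sum of the features, one slope per feature.
    return intercept + sum(a * x for a, x in zip(slopes, features))


# a_0 = 1, a_1 = 2, a_2 = -0.5 and x_1 = 3, x_2 = 4  ->  1 + 6 - 2 = 5
print(linear_predict([1.0, 2.0, -0.5], [3.0, 4.0]))  # 5.0
```

Each coefficient $a_j$ is the slope along $x_j$: increasing $x_j$ by one unit while holding the other inputs fixed changes the prediction by exactly $a_j$, which is what makes these models easy to interpret.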

### Definition of One-Dimensional Ordinary Least Squares (OLS)

* $i$ - an observation. Assume $i = 1, 2, ..., N$.
* $x$ - independent variable, feature
* $y$ - dependent variable, output 

The linear relationship looks like this

\begin{equation*}
y = ax + b
\end{equation*}

where $a, b$ are constants.  

The OLS problem is to find the values of $a$ and $b$ that best fit $y_i = ax_i + b + \epsilon_i$, where $\epsilon_i$ is noise. There is always some noise/error: no matter how well we choose $a$ and $b$, the independent variable never fully explains the dependent variable. We usually drop $\epsilon_i$ and instead write

\begin{equation*}
\hat{y}_i = ax_i + b
\end{equation*}

where  $\hat{y}_i$ are our estimates of $y_i$.

We solve for $a$ and $b$ by defining a loss function, which measures how far the estimates $\hat{y}_i$ are from the observed values $y_i$; the best $a$ and $b$ are the ones that minimize it. For linear regression, the loss is the sum of squared errors (residuals):

\begin{equation*}
L = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2
\end{equation*}
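
A direct translation of $L$ into Python, as a minimal sketch assuming the observations are stored in two equal-length lists `xs` and `ys` (the names `ols_loss`, `xs`, and `ys` are illustrative, not from the source):

```python
def ols_loss(a, b, xs, ys):
    """Sum of squared residuals L = sum_i (y_i - (a*x_i + b))**2."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # exactly y = 2x + 1, no noise
print(ols_loss(2.0, 1.0, xs, ys))  # 0.0  (perfect fit)
print(ols_loss(2.0, 0.0, xs, ys))  # 4.0  (every residual is 1)
```

Squaring makes every residual count as positive and penalizes large errors more heavily than small ones.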

### Solution of Ordinary Least Squares (OLS)

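We find the minimum of $L$ by setting its partial derivatives with respect to $a$ and $b$ to zero and solving for $a$ and $b$. Because $L$ is a convex quadratic in $a$ and $b$, this stationary point is the minimum (provided the $x_i$ are not all equal).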

\begin{equation*}
L = \sum_{i=1}^{N}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{N}(y_i - ax_i - b)^2
\end{equation*}

Set $\frac{\partial L}{\partial a} = 0$ and $\frac{\partial L}{\partial b} = 0$.

Let's do it

\begin{align*}
&\frac{\partial L}{\partial b} = \sum_{i=1}^{N} 2(y_i - ax_i - b)(-1) = -2\sum_{i=1}^{N} (y_i - ax_i - b) = 0 \\
&\sum_{i=1}^{N} y_i- a\sum_{i=1}^{N}x_i - bN = 0 \\
& \boxed{b = \frac{1}{N} \sum_{i=1}^{N} y_i - \frac{a}{N} \sum_{i=1}^{N}x_i}
\end{align*}
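
A quick numerical check of the boxed expression for $b$, as a minimal sketch in plain Python on a small made-up dataset (the helper names `best_intercept` and `ols_loss` are illustrative). For a fixed slope $a$, the condition $\partial L/\partial b = 0$ means the residuals sum to zero, and nudging $b$ away from its optimal value can only increase $L$.

```python
def ols_loss(a, b, xs, ys):
    """Sum of squared residuals for the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def best_intercept(a, xs, ys):
    """Optimal b for a fixed slope a: b = mean(y) - a * mean(x)."""
    n = len(xs)
    return sum(ys) / n - a * sum(xs) / n


xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 7.0, 8.0]
a = 1.5  # any fixed slope works; this one is arbitrary
b = best_intercept(a, xs, ys)

# dL/db = 0 is equivalent to the residuals summing to zero.
residual_sum = sum(y - (a * x + b) for x, y in zip(xs, ys))
print(b, residual_sum)  # 0.5 0.0

# Moving b in either direction increases the loss.
for delta in (-0.1, 0.1):
    print(delta, ols_loss(a, b + delta, xs, ys) > ols_loss(a, b, xs, ys))
```

This holds for whatever slope $a$ is eventually chosen, which is why the boxed $b$ can be substituted into the $\partial L/\partial a$ condition.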


\begin{align*}
& \frac{\partial L}{\partial a} = \sum_{i=1}^{N} 2(y_i - ax_i - b)(-x_i) = -2\sum_{i=1}^{N} (y_i - ax_i - b)x_i = 0\\
& \sum_{i=1}^{N} x_iy_i - a\sum_{i=1}^{N}x_i^2 - b\sum_{i=1}^{N}x_i = 0 \\
& \sum_{i=1}^{N} x_iy_i = a\sum_{i=1}^{N}x_i^2 + \big( \frac{1}{N} \sum_{i=1}^{N} y_i - \frac{a}{N} \sum_{i=1}^{N}x_i  \big) \sum_{i=1}^{N}x_i \\
& \boxed{a = \frac{\sum_{i} x_iy_i - \frac{1}{N}(\sum_{i}x_i)(\sum_{i}y_i)}{\sum_{i}x_i^2 - \frac{1}{N}(\sum_{i}x_i)^2}} 
\end{align*}
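
As a cross-check of the boxed slope, the two stationarity conditions $\sum_i y_i - a\sum_i x_i - bN = 0$ and $\sum_i x_iy_i - a\sum_i x_i^2 - b\sum_i x_i = 0$ can be written as a $2 \times 2$ linear system in $a$ and $b$, often called the normal equations. Solving it with Cramer's rule gives the same slope, with numerator and denominator both multiplied by $N$:

\begin{align*}
a\sum_{i}x_i^2 + b\sum_{i}x_i &= \sum_{i}x_iy_i \\
a\sum_{i}x_i + bN &= \sum_{i}y_i \\
\Rightarrow \quad a &= \frac{N\sum_{i}x_iy_i - (\sum_{i}x_i)(\sum_{i}y_i)}{N\sum_{i}x_i^2 - (\sum_{i}x_i)^2}
\end{align*}

The denominator is zero only when all $x_i$ are equal, in which case no unique slope exists.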

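So the closed-form solutions to the minimization problem are as follows, where $\bar{x}$ and $\bar{y}$ are the sample means and $\mathrm{Cov}$ and $\mathrm{Var}$ are the sample covariance and variance (their normalizing factors cancel in the ratio):

\begin{align*}
a &= \frac{\sum_{i} x_iy_i - \frac{1}{N}(\sum_{i}x_i)(\sum_{i}y_i)}{\sum_{i}x_i^2 - \frac{1}{N}(\sum_{i}x_i)^2} \qquad \Big[= \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} \Big] \\
b &= \frac{1}{N}\sum_{i}y_i - \frac{a}{N}\sum_{i}x_i \qquad \big[= \bar{y} - a\bar{x}\big]
\end{align*}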

### Interpretation

* Interpreting parameters in an OLS equation
* Limitations of using OLS through Anscombe’s Quartet
* Multivariate OLS models and an example problem
* Using the matrix formulation to solve the OLS problem
* The pros and cons of using OLS regression