# Regression Analyses

Regression is the modelling of a *target variable* based on one or more *individual predictors*.

### Simple Linear Regression:
Simple linear regression applies when there is one independent predictor variable that is linearly related to the dependent target variable.

<img src="images/linear-regression.png" width="50%">


We can approximate the relationship between the predictor and the target using a line of best fit with the equation $y = a_0 + a_1 x$, where $a_0$ is the intercept and $a_1$ is the slope.

The aim of the linear regression algorithm is to find the values of $a_0$ and $a_1$ that give this line of best fit. It does this by minimising a loss (cost) function such as the *mean squared error (MSE)*: $L = \frac{1}{n} \sum_{i=1}^{n} (\texttt{prediction}_i - y_i)^2$, where each $i$ in $1 \leq i \leq n$ indexes a single datapoint $(x_i, y_i)$ and $\texttt{prediction}_i = a_0 + a_1 x_i$ is the model's prediction at the same x-value.
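As a minimal sketch of the MSE formula above, the snippet below evaluates the loss for one candidate line. The data points and the chosen values of `a_0` and `a_1` are made up purely for illustration.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    n = len(targets)
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / n


# Hypothetical data and a hypothetical candidate line y = a_0 + a_1 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
a_0, a_1 = 0.0, 2.0

# prediction_i = a_0 + a_1 * x_i for every datapoint.
predictions = [a_0 + a_1 * x for x in xs]
loss = mean_squared_error(predictions, ys)
print(loss)
```

Here the line passes exactly through the first two points and misses the third by 1, so the loss is the single squared error of 1 averaged over three points.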

We can use *gradient descent* to force $a_0$ and $a_1$ to converge on values that will minimise $L=\frac{1}{n}\sum_{i=1}^{n} \big( a_0 + a_1 x_i - y_i \big)^2$.

Computing the partial derivatives with respect to $a_0$ and $a_1$:

$$\frac{\partial L}{\partial a_0}=\frac{2}{n} \sum_{i=1}^{n} \big( a_0 + a_1 x_i - y_i \big),$$

$$\frac{\partial L}{\partial a_1}=\frac{2}{n} \sum_{i=1}^{n} \big( a_0 + a_1 x_i - y_i \big)\cdot x_i.$$

A derivative like $\frac{\partial L}{\partial a_0}$ summarises how much $L$ changes when you 'nudge' the value of $a_0$ by an infinitesimally small amount.
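The 'nudge' interpretation can be checked numerically. The sketch below compares the analytic expression for $\frac{\partial L}{\partial a_0}$ with a finite-difference estimate obtained by nudging $a_0$ by a small `eps`. The data, the starting values and `eps` are illustrative assumptions.

```python
def mse(a_0, a_1, xs, ys):
    """MSE loss of the line y = a_0 + a_1 * x on the data."""
    n = len(xs)
    return sum((a_0 + a_1 * x - y) ** 2 for x, y in zip(xs, ys)) / n


def grad_a0(a_0, a_1, xs, ys):
    """Analytic partial derivative of the MSE with respect to a_0."""
    n = len(xs)
    return 2.0 / n * sum(a_0 + a_1 * x - y for x, y in zip(xs, ys))


# Hypothetical data and parameter values.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
a_0, a_1 = 0.5, 1.5

# Central finite difference: nudge a_0 up and down by eps and
# measure how much the loss changes per unit of nudge.
eps = 1e-6
numerical = (mse(a_0 + eps, a_1, xs, ys) - mse(a_0 - eps, a_1, xs, ys)) / (2 * eps)
analytic = grad_a0(a_0, a_1, xs, ys)

print(numerical, analytic)
```

The two values should agree closely, which is a quick sanity check on a hand-derived gradient.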

Now, to apply gradient descent, we repeatedly apply the following updates, recomputing both partial derivatives over the dataset at each iteration, where $\eta$ is the learning rate (step size):
$$a_0 := a_0 - \eta \frac{\partial L}{\partial a_0},$$
$$a_1 := a_1 - \eta \frac{\partial L}{\partial a_1}.$$

Provided the learning rate $\eta$ is small enough, these repeated updates move $a_0$ and $a_1$ towards the values that minimise $L$.
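The update rules above can be sketched in plain Python as follows. The function name `fit_line`, the learning rate, the iteration count and the toy data are all assumptions chosen for illustration, not tuned values.

```python
def fit_line(xs, ys, learning_rate=0.05, n_iterations=5000):
    """Fit y = a_0 + a_1 * x by gradient descent on the MSE loss."""
    n = len(xs)
    a_0, a_1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(n_iterations):
        # Residuals a_0 + a_1 * x_i - y_i for the current parameters.
        residuals = [a_0 + a_1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to a_0 and a_1.
        grad_a0 = 2.0 / n * sum(residuals)
        grad_a1 = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        # Step against the gradient, scaled by the learning rate.
        a_0 -= learning_rate * grad_a0
        a_1 -= learning_rate * grad_a1
    return a_0, a_1


# Hypothetical noiseless data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a_0, a_1 = fit_line(xs, ys)
print(a_0, a_1)
```

Because the data lie exactly on a line, the fitted values should end up very close to the intercept 1 and slope 2 used to generate them. A learning rate that is too large for the data would make the updates diverge instead.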

<hr />


### Logistic Regression:
<img src="images/logistic-vs-linear-regression.png" width="50%">

The logistic regression algorithm involves passing the weighted sum (plus bias) $s$ through the logistic sigmoid activation function $\sigma(s) = \frac{1}{1 + e^{-s}}$, mapping the weighted sum to a number between 0 and 1 that can be interpreted as a probability.
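A minimal sketch of this mapping is shown below. The two-branch form is a common way to avoid overflow in `math.exp` for large negative inputs. The weights, bias and inputs are hypothetical values.

```python
import math


def sigmoid(s):
    """Logistic sigmoid 1 / (1 + e^(-s)), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    # For negative s, use the equivalent form e^s / (1 + e^s).
    e = math.exp(s)
    return e / (1.0 + e)


# Hypothetical weights, bias and input features.
w1, w2, b = 0.8, -0.4, 0.1
x1, x2 = 2.0, 1.0

weighted_sum = w1 * x1 + w2 * x2 + b
probability = sigmoid(weighted_sum)
print(probability)
```

A weighted sum of 0 maps to 0.5, large positive sums approach 1, and large negative sums approach 0.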


The following is a computational graph of the logistic regression algorithm:

Forward propagation: calculating the weighted sum, plus bias, passing it through the logistic sigmoid function, then calculating the error with cross-entropy.

Backward propagation: computing the derivative of the loss with respect to the activation output, $\frac{\partial L}{\partial z}$, and the derivative of the activation function with respect to the weighted sum, $\frac{\partial z}{\partial s}$, then using the chain rule to proportionally attribute the error to each weight $w_1$, $w_2$ and the bias $b$.

<img src="images/logistic-regression-backprop-diagram.png">
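The forward and backward passes described above can be sketched for a single training example with two inputs. This assumes that $s$ denotes the weighted sum plus bias, that $z$ denotes the sigmoid output, and that the loss is binary cross-entropy. The function names, learning rate and input values are hypothetical.

```python
import math


def forward(x1, x2, w1, w2, b, y):
    """Forward pass: weighted sum -> sigmoid -> cross-entropy loss."""
    s = w1 * x1 + w2 * x2 + b          # weighted sum plus bias
    z = 1.0 / (1.0 + math.exp(-s))     # logistic sigmoid activation
    loss = -(y * math.log(z) + (1 - y) * math.log(1 - z))
    return s, z, loss


def backward(x1, x2, z, y):
    """Backward pass: chain rule from the loss back to w1, w2 and b."""
    dL_dz = -(y / z) + (1 - y) / (1 - z)   # derivative of cross-entropy
    dz_ds = z * (1 - z)                    # derivative of the sigmoid
    dL_ds = dL_dz * dz_ds                  # algebraically equal to z - y
    # ds/dw1 = x1, ds/dw2 = x2, ds/db = 1
    return dL_ds * x1, dL_ds * x2, dL_ds


# Hypothetical single training example with label y = 1.
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.0, 0.0, 0.0
learning_rate = 0.1

_, z, loss_before = forward(x1, x2, w1, w2, b, y)
dw1, dw2, db = backward(x1, x2, z, y)

# One gradient descent step on each parameter.
w1 -= learning_rate * dw1
w2 -= learning_rate * dw2
b -= learning_rate * db

_, _, loss_after = forward(x1, x2, w1, w2, b, y)
print(loss_before, loss_after)
```

The gradient for each weight is the shared error term `dL_ds` scaled by that weight's input, which is how the error is attributed proportionally. After one update the loss on this example should be lower than before.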

### Resources:
- <a href="https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a">Linear regression</a>