# 1. Introduction
In the past we have looked at logistic regression and binary classification. There, we would collect some data and then try to predict one of two possible labels. For example, if we were dealing with an e-commerce site, we could collect _**time spent on site**_ and _**number of pages viewed**_, and then try to predict whether someone is going to buy something on the site.

In this case, we only have 2 dimensions. We can plot the data and then try to use a straight line to separate the classes (buy or not buy):

$$\sigma \big( w_1 (\text{time spent on site}) + w_2 (\text{number of pages viewed})\big)$$

If we are able to find a line that goes between the classes, they are _linearly separable_. When we are dealing with data that is linearly separable, logistic regression is fine, since it is a linear classifier. So, linearly separable data can be separated by a line in 2 dimensions, a plane in 3 dimensions, and a hyperplane in 4+ dimensions. The point is, no matter how many dimensions we are dealing with, our decision boundary is going to be straight, not curved.
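To see why the boundary is straight in the two-feature example, a short derivation helps. Writing $x_1$ for time spent on site and $x_2$ for number of pages viewed (shorthand assumed here for brevity), the classifier switches from one class to the other where the sigmoid output equals 0.5:

$$\sigma(z) = \frac{1}{1 + e^{-z}} = 0.5 \iff e^{-z} = 1 \iff z = 0$$

$$z = w_1 x_1 + w_2 x_2 = 0 \iff x_2 = -\frac{w_1}{w_2} x_1 \quad (w_2 \neq 0)$$

This is the equation of a straight line in the $(x_1, x_2)$ plane. The sigmoid only squashes $z$ into a probability; it does not bend the boundary.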

## 1.1 Neural Networks Add Non-linearity
Now, as we get into the realm of neural networks, things begin to change. We can handle data that is not linearly separable, such as:

<img src="images2/nonlinear-data.png" width="400">

Logistic regression would _not_ be appropriate for this, while neural networks would! Recall, a linear function has the form:

$$w_1x_1 + w_2x_2+...+w_nx_n$$

$$w^T x$$

Where, just a reminder, in the vector notation $w^T x$, the weights are transposed because by convention vectors are stored as column vectors. Transposing $w$ into a row vector lets us perform matrix-vector multiplication (equivalent to the dot product in this case) with the input vector $x$.
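As a quick check that the summation form and the vector form $w^T x$ agree, here is a minimal NumPy sketch. The weight and input values are made up purely for illustration.

```python
import numpy as np

# Hypothetical weights and inputs, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])
x = np.array([2.0, 1.0, 0.5])

# Summation form: w_1*x_1 + w_2*x_2 + ... + w_n*x_n
explicit_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))

# Dot product of two 1-D arrays.
dot = w @ x

# Explicit column vectors: (1 x n) times (n x 1) gives a 1 x 1 matrix.
w_col = w.reshape(-1, 1)
x_col = x.reshape(-1, 1)
matrix_form = (w_col.T @ x_col).item()

print(explicit_sum, dot, matrix_form)
```

All three expressions compute the same scalar; the transpose only matters once the vectors are stored explicitly as columns.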

So, we can see that anything that cannot be simplified into $w^Tx$ is nonlinear. Now, $x^2$ and $x^3$ are both nonlinear, but neural networks are nonlinear in a _very specific way_. Neural Networks achieve nonlinearity by:

> _Being a combination of multiple logistic regression units (neurons) put together._

That is going to be the focus of this section: determining how we can build a nonlinear classifier (in this case a neural network) by combining logistic regression units (neurons). We will then use this nonlinear classifier to make _**predictions**_.

---

# 2. Logistic Regression $\rightarrow$ Neural Networks
We are now ready to start the transition from logistic regression to neural networks. Recall that a logistic regression unit can be viewed as a single neuron, and we are going to connect many of them together to make a network of neurons. The most basic way to do this is the _**feed forward method**_. For logistic regression, we have a weight corresponding to every input:

<img src="images2/logistic-reg-unit.png" width="500">

This is seen clearly in the image above. We have two input features, $x_1$ and $x_2$, but of course there can be many more. Each input feature has a corresponding weight, $w_1$ and $w_2$. In order to determine the output $y$, we multiply each input by its weight, sum them all together, add a bias term, and put it through a sigmoid function:

$$z = x_1w_1 + x_2w_2 + bw_0$$

$$y = p(y = 1 \mid x) = \frac{1}{1 + e^{-z}}$$

$$\text{prediction} = \text{round} \big( p(y = 1 \mid x)\big)$$

If the predicted probability is greater than 0.5, we predict class 1; otherwise we predict class 0.
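The steps above (weighted sum, bias, sigmoid, threshold) can be sketched in a few lines of NumPy. This is a minimal illustration rather than a definitive implementation: the function names `sigmoid` and `logistic_unit`, the example values, and the choice to represent the bias as a single scalar `b` are all assumptions made here.

```python
import numpy as np

def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    """Forward pass of one logistic regression unit (hypothetical helper).

    x: input features, w: one weight per feature,
    b: bias, treated here as a single scalar (an assumption).
    """
    z = np.dot(w, x) + b        # weighted sum of inputs plus bias
    p = sigmoid(z)              # probability of class 1
    prediction = int(p > 0.5)   # class 1 if p > 0.5, otherwise class 0
    return p, prediction

# Made-up example values for two input features.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1

p, prediction = logistic_unit(x, w, b)
print(p, prediction)
```

The threshold is written as `p > 0.5` to match the rule stated above, so a probability of exactly 0.5 maps to class 0.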

## 2.1 Extend to a Neural Network