# Pre-Lesson 1: Conceptual example of Deep Learning

By Francesco Civilini, revised by Zhuocheng Jiang

## 1. Introduction to neural nets
A neural net is a set of connections uniting inputs and operations, usually displayed as a schematic of circles and lines. 

<img src="dl_intro_illustrator_edits\2L_nn.ai.png" style="width: 400px;">

There are many types of neural networks. This is a "feedforward network", which reads from left to right. Circles and lines correspond to neurons and synapses respectively. 

Each column of neurons is called a layer. There are three kinds:
- Input layer (3 neurons in the above example)
- Computation or "hidden" layer (2 neurons)
- Output layer (1 neuron)

The number of layers in the network is 2. The input layer does not count towards the total because it does not contain any trainable parameters. 
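As a rough sketch of the counting rule above, the network can be described by its layer sizes. The name `layer_sizes` is an illustrative choice, and the synapse count assumes every neuron connects to every neuron in the next layer, as the schematic suggests.

```python
# Layer sizes from the example figure: input, hidden, output.
layer_sizes = [3, 2, 1]

# The input layer is not counted, so this network has 2 layers.
num_layers = len(layer_sizes) - 1

# Assuming full connectivity, count the lines (synapses) feeding each counted layer.
synapses_per_layer = [
    n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(num_layers)          # 2
print(synapses_per_layer)  # [6, 2]
```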

These models are inspired by the brain, where neurons are connected by dendrites (which bring input into the neuron) and axons (which carry output away from the neuron). 


## 2. A closer look

<img src="dl_intro_illustrator_edits\1in_1hid.png" style="width: 400px;">

Each synapse has a strength called a **weight**, and each neuron has an offset called a **bias**. Weights and biases together are known by the general term **parameters**. 

Components:
- Input ($x$): Input value, for example, the value of a pixel in an image. 
- Weight ($w$): Scales the input
- Bias ($b$): Translates or shifts the weighted input
- Computational node ($z$): Linear function combining inputs with weights and biases
- Activation function ($a$): Non-linear function that affects the amplitude of the signal exiting a computation neuron. Without it, the network could only represent linear maps. 

There are three main steps for deep learning modeling using neural nets:

**(1) Define structure:**

For this neural net, the computational node $z$ is the product of the input and the weight, added to the bias, or $z=x*w + b$. With several inputs, the product becomes a dot product between the inputs and the weights.

There are many different types of activation functions that can be used. A popular one is the sigmoid. Our output would then be as follows:

output $= a(z)=\frac{1}{1+e^{-z}}=\frac{1}{1+e^{-(xw + b)}}$
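The formula above can be sketched in a few lines of Python. The function names and the numeric values of $x$, $w$, and $b$ are made up for illustration only.

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b):
    """One input, one neuron: linear step z = x*w + b, then the activation."""
    z = x * w + b
    return sigmoid(z)

# Hypothetical values chosen so that z = 0, where the sigmoid is exactly 0.5.
print(neuron_output(x=0.0, w=1.5, b=0.0))  # 0.5
```

Whatever values are chosen, the output stays strictly between 0 and 1, which is why the sigmoid is often read as a soft yes/no.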


**(2) Error function:**

To know how well our model fits, we need to compare its outputs with known target values. An error function (also called a loss function) that is pertinent to the problem is selected to measure the mismatch. 
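The text does not name a specific error function, so the sketch below uses the mean squared error as one common choice, not necessarily the one used later in this course. The predictions and targets are invented numbers.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between model outputs and known answers."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_diffs = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_diffs) / len(squared_diffs)

# Hypothetical model outputs compared with the true labels.
predictions = [0.9, 0.2, 0.6]
targets = [1.0, 0.0, 1.0]
error = mean_squared_error(predictions, targets)
print(error)  # roughly 0.07; a perfect fit would give 0.0
```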


**(3) Parameter update:**

Continuously update (modify) the values of $w$ and $b$ so that the error decreases. In deep learning, this is done by **stochastic gradient descent** and **backpropagation**. These processes use the gradient of the error to step toward a minimum of the loss surface, which is not guaranteed to be the global minimum. 
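Here is a minimal sketch of the update step for the one-input sigmoid neuron from step (1). It assumes a squared error, a made-up learning rate, and a tiny invented data set. For a single neuron, backpropagation reduces to the chain rule written out by hand. A fuller implementation would also shuffle the examples each pass.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(data, w, b):
    """Mean squared error of the neuron over all (x, y) examples."""
    return sum((sigmoid(x * w + b) - y) ** 2 for x, y in data) / len(data)

def sgd_step(x, y, w, b, learning_rate):
    """Update w and b using the gradient of (a - y)^2 for one example."""
    a = sigmoid(x * w + b)
    # Chain rule: d(loss)/dz = d(loss)/da * da/dz
    dloss_dz = 2.0 * (a - y) * a * (1.0 - a)
    # dz/dw = x and dz/db = 1, so step against each gradient.
    return w - learning_rate * dloss_dz * x, b - learning_rate * dloss_dz

# Hypothetical training set: negative inputs are labelled 0, positive inputs 1.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

w, b = 0.0, 0.0
loss_before = mean_loss(data, w, b)
for _ in range(50):          # 50 passes over the data
    for x, y in data:        # one example per update
        w, b = sgd_step(x, y, w, b, learning_rate=0.5)
loss_after = mean_loss(data, w, b)

print(loss_before, loss_after)  # the error should have decreased
```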





## 3. Conceptual example


Let's take a very simple yet valid type of neural net called a **perceptron**. A perceptron is a neural net in which the output is binary. It returns True (output = 1) or False (output = 0). 

To make a clear example, we will simplify our neural net even further by (1) removing the activation function and (2) setting the bias as a threshold describing whether or not the neuron will fire.

<img src="dl_intro_illustrator_edits\perceptron.png" style="width: 400px;">

where:

if $x*w \le b$ then output = 0

if $x*w > b$ then output = 1


Assume that this is a model describing whether or not I want to go see a movie in a movie theater (remember when we used to do that?). In this example, $x$ is the only movie playing at the theater, $w$ describes how much I want to go see the movie, and $b$ is my "laziness threshold". If my desire ($w$) to see the movie ($x$) is greater than my laziness threshold ($b$), then I go to the movie theater. If not, I stay home. 

So what does $w$ mean in practice? Assume that $x$ is the movie *The Avengers*. The weight $w$ could represent how much I like superhero movies. If I like them a lot ($w$ is high), then it'll be more likely that I go to the theater. However, if I don't like superhero movies ($w$ is low), I'll probably stay home. 
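The one-movie perceptron can be sketched as a single comparison. Encoding the movie as $x = 1$ (it is playing) is an assumption made here for illustration, as are the numeric values for $w$ and $b$.

```python
def go_to_movie(x, w, b):
    """Simplified perceptron: fire (1) only if the weighted input exceeds the threshold b."""
    return 1 if x * w > b else 0

x = 1                # assumed encoding: The Avengers is playing
laziness = 0.5       # hypothetical threshold b

print(go_to_movie(x, w=0.8, b=laziness))  # 1: I like superhero movies, so I go
print(go_to_movie(x, w=0.2, b=laziness))  # 0: I don't, so I stay home
```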


Let's take a look at a two-movie example. 
<img src="dl_intro_illustrator_edits\2in_1hid.png" style="width: 350px;">

In this case:

if $x_1*w_1 + x_2*w_2 \le b$ then we stay home

if $x_1*w_1 + x_2*w_2 > b$ then we go to the movies

If $x_1$ is *The Avengers* and $x_2$ is *A Dog's Purpose*, $w_1$ could be how much I like superheroes and $w_2$ could be how much I like dogs. If I really dislike superheroes ($w_1$ is low) but I really love dogs ($w_2$ is high), I might still go to the movie theater. 
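The two-movie case is the same comparison applied to a weighted sum. In this sketch, each input is assumed to be 1 if that movie is playing and 0 otherwise, and the weights and threshold are invented numbers.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

weights = [0.1, 0.9]   # hypothetical: dislike superheroes (w1), love dogs (w2)
laziness = 0.5         # hypothetical threshold b

# Both movies playing: 0.1 + 0.9 exceeds 0.5, so we go.
print(perceptron([1, 1], weights, laziness))  # 1
# Only The Avengers playing: 0.1 does not exceed 0.5, so we stay home.
print(perceptron([1, 0], weights, laziness))  # 0
```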

**The point:**
If we have enough examples of movies playing in the movie theater and knowledge of whether or not I went to the movies (i.e. a training set), we can determine how much I like superheroes ($w_1$), dogs ($w_2$), and how lazy I am ($b$). Once we know these parameters, the neural net can be used to predict the outcome for a set of different movies. **This is the foundation of deep learning.**
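To make the point concrete, here is a small sketch that recovers the parameters from examples using the classic perceptron learning rule. This rule is swapped in for illustration and is not the gradient-based training described in Section 2. The training set, the learning rate, and all names are hypothetical.

```python
def predict(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def train_perceptron(examples, learning_rate=0.5, max_epochs=100):
    """Nudge the weights and threshold after every wrong prediction."""
    weights = [0.0] * len(examples[0][0])
    threshold = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, went in examples:
            error = went - predict(inputs, weights, threshold)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                # Raising the threshold makes firing harder, so it moves the opposite way.
                threshold -= learning_rate * error
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, threshold

# Hypothetical history: ([Avengers playing, dog movie playing], did I go?)
training_set = [([1, 0], 0), ([0, 1], 1), ([0, 0], 0)]
weights, threshold = train_perceptron(training_set)

# Predict an evening that was not in the training set: both movies playing.
print(predict([1, 1], weights, threshold))  # 1
```

The learned numbers are only as good as the examples. With so few evenings observed, many different parameter values would explain the same history.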

*A short note:*

The inputs don't necessarily need to be related. For example, $x_1$ could be *The Avengers* and $x_2$ could be popcorn. Maybe I am not particularly excited about watching the movie, but I really want some popcorn, so I still go to the movie theater. Additionally, $x_1$ and $x_2$ could be vectors of values (i.e., lists of movies and concession snacks). 
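When $x_1$ and $x_2$ are vectors, each product becomes a dot product, and the decision rule is otherwise unchanged. The sketch below assumes a 1/0 encoding for what is available tonight, and the weights are invented.

```python
def dot(values, weights):
    """Dot product of two equal-length lists."""
    return sum(v * w for v, w in zip(values, weights))

def go_out(movies, movie_weights, snacks, snack_weights, threshold):
    """Perceptron with two vector inputs: sum both dot products, then compare."""
    total = dot(movies, movie_weights) + dot(snacks, snack_weights)
    return 1 if total > threshold else 0

movie_weights = [0.2, 0.9]   # hypothetical: [The Avengers, A Dog's Purpose]
snack_weights = [0.6, 0.1]   # hypothetical: [popcorn, candy]
laziness = 0.5

# Only The Avengers is playing, but popcorn and candy are on sale: I still go.
print(go_out([1, 0], movie_weights, [1, 1], snack_weights, laziness))  # 1
# Same movie, no snacks: I stay home.
print(go_out([1, 0], movie_weights, [0, 0], snack_weights, laziness))  # 0
```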
