# Full Stack Deep Learning - W1 Notes
> A short summary of DL fundamentals.

- toc: true 
- badges: true
- comments: true
- categories: [deep-learning]
- image: images/chart-preview.png

# About

This notebook contains my notes for [FSDL Week 1: DL Fundamentals](https://fullstackdeeplearning.com/spring2021/lecture-1/).

Note: I have added some information beyond what the lectures cover.

## What is a perceptron?

- Perceptrons were originally brain models created to understand how the brain works. The perceptron as we know it encodes several principles about how the brain works, and it later evolved into an algorithm for supervised binary classification.

In the 1960s, Frank Rosenblatt published the book `Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms`. Curiously, the author saw this fundamental building block of AI as a tool for understanding the human brain rather than for pattern recognition (even though he encouraged that use as well):

> For this writer, the perceptron program is _not_ primarily concerned with the invention of devices for "artificial intelligence", but rather with investigating the physical structures and neurodynamic principles which underlie "natural intelligence". *A perceptron is first and foremost a brain model, not an invention for pattern recognition* [emphasis added] (p. viii).

In other words, the perceptron is actually a simplification and abstraction which has allowed us to discover principles for how the brain works. These same principles were then also used to create pattern recognition machines.

Rosenblatt explicitly recognizes that his model is a direct descendant of the model created by McCulloch and Pitts, and influenced by the theories of Hebb and Hayek. 

Main components of a perceptron in Rosenblatt's book:
- **Environment**: The environment generates the information that is initially passed on to the perceptron.
- **Signal generating units**: Each unit receives a signal and generates an output signal.
- **Signal propagation functions**: These are rules that define how signals are generated and transmitted.
- **Memory functions**: These are rules that define how properties of the perceptron can be changed in response to certain activity.

The definition of a single neuron evolves from the ideas above.

A neuron takes values from its environment (x1, x2, x3), and each of them is multiplied by a stored parameter (w1, w2, w3). The sum of these products is then passed through an activation function.

In other words, it's as if we are trying to pass a signal through the neuron and all of these components work together to establish how the signal is transmitted. 

In the example below, we randomly initialize the parameters. The activation function is a step function with a threshold of 5. This means that the neuron outputs 1 only if the weighted sum is larger than the threshold, and 0 otherwise.

In [5]:
import numpy as np

w1 = np.random.randint(1,10)
w2 = np.random.randint(1,10)
w3 = np.random.randint(1,10)

print(f'W1 is equal to: {w1}')
print(f'W2 is equal to: {w2}')
print(f'W3 is equal to: {w3}\n')

x1 = 5
x2 = 3
x3 = 2

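b = 2  # bias term, added to the weighted sum

# Step function: output 1 only if the signal exceeds the threshold of 5
activation_function = lambda x: 1 if x > 5 else 0

output = activation_function(x1*w1 + x2*w2 + x3*w3 + b)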

print('Output:', output)

W1 is equal to: 1
W2 is equal to: 6
W3 is equal to: 9

Output: 1
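
The weighted sum above can also be written as a dot product, which is how it usually appears in practice. The following is only a minimal sketch: the helper name `perceptron` and its `threshold` parameter are illustrative choices, and the weights are copied from the sample run above rather than drawn at random.

```python
import numpy as np

def perceptron(x, w, b=0.0, threshold=5.0):
    """Hypothetical helper: step activation applied to the weighted sum w.x + b."""
    signal = np.dot(w, x) + b
    return 1 if signal > threshold else 0

x = np.array([5, 3, 2])  # inputs from the environment
w = np.array([1, 6, 9])  # weights taken from the sample run above
print('Output:', perceptron(x, w, b=2))
```

Fixing the weights makes the result reproducible, which is convenient when experimenting with different thresholds.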


## What is an activation function?

Going back to Rosenblatt's book, activation functions are essentially signal propagation functions. 

When a neuron receives a signal, the activation function decides if the signal is passed on and how strong the output signal becomes. 

We already learned about the step function. This activation is often not ideal for multiple reasons.

First of all, the result is binary. Sometimes we are also interested in the degree of certainty, so a probability might be better.

We might also want to have a wider range of values. When predicting age, for example, binary values of 0 or 1 will be of little value. 

A lot of different activation functions have been developed by researchers. Three common activation functions mentioned in the lecture are:

**Sigmoid function**
Its outputs lie between 0 and 1 and can therefore be interpreted as probabilities. The function is also smooth and easy to differentiate, which makes learning easier.

It can be problematic when the input signal has very large positive or negative values. At those points the derivative is very close to 0 and learning becomes very slow.
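
To make the saturation problem concrete, here is a small sketch of the sigmoid and its derivative. The function names `sigmoid` and `sigmoid_grad` are my own, and the input values are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s), largest at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-10.0, 0.0, 10.0):
    print(f'z={z:6.1f}  sigmoid={sigmoid(z):.6f}  gradient={sigmoid_grad(z):.6f}')
```

The gradient peaks at 0.25 for an input of 0 and shrinks towards 0 as the input moves far away from 0 in either direction.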

**Hyperbolic tangent (tanh)**
This is another smooth function, with outputs between -1 and 1. This is useful because a signal can sometimes have a negative effect on the output, and the hyperbolic tangent allows us to include this type of relationship in the network.
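
As a quick illustration, the sketch below evaluates the hyperbolic tangent with NumPy's `np.tanh` on a few arbitrary inputs. It also checks the identity tanh(z) = 2·sigmoid(2z) − 1, which shows that tanh is a rescaled and shifted sigmoid. The `sigmoid` helper is defined here only for that comparison.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 3.0])
out = np.tanh(z)
print(out)  # negative inputs give negative outputs, positive give positive

# tanh is a rescaled, shifted sigmoid
print(np.allclose(out, 2.0 * sigmoid(2.0 * z) - 1.0))
```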

**Rectified linear unit (ReLU)**
This function is very simple and fast to compute. This allows us to work with larger and more complex models that, given enough data, can produce better results overall.
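
A minimal sketch of ReLU and its gradient shows why it is cheap: both reduce to a comparison with zero. The names `relu` and `relu_grad` are illustrative, and setting the gradient at exactly 0 to 0 is a common convention that I assume here, not something the lecture specifies.

```python
import numpy as np

def relu(z):
    # Keeps positive values unchanged and sets the rest to 0
    return np.maximum(0.0, z)

def relu_grad(z):
    # 1 for positive inputs, 0 otherwise (value at z = 0 is a convention)
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))
print(relu_grad(z))
```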

## What is a neural network?

## What is universality?