## Definitions 
- **neural network**: computational graph, a network of interconnected nodes through which data flows. Begins with input nodes, has some number of hidden or intermediate nodes, and ends with output nodes
- **node**: often called a neuron, a point in a neural network through which data flows; typically has a weight for each input, a bias, and an activation function. Input nodes simply pass data into the network, and output nodes hold the network's final result. In fully connected layers, each node is connected to every node in the previous and next layers.
- **weights**: factors by which inputs are multiplied; each input to a node has its own weight, and these values are adjusted during training
- **bias**: a number added to the weighted sum of the inputs; it shifts the point at which the activation function fires, so a neuron can still pass on a useful output even when all of its inputs are zero. Like weights, biases are adjusted during training
- **weighted sum**: multiply each input by its corresponding weight, add up the products, then add the bias: `z = w1*x1 + w2*x2 + ... + wn*xn + b`
- **activation functions**: function applied to the weighted sum to transform it into the node's output, which determines whether (and how strongly) a neuron fires or passes on its value. Common choices:
    - **ReLU**: if input < 0, output = 0; otherwise the output is the unmodified input, i.e. `max(0, x)`. It has largely replaced Sigmoid and TanH in hidden layers because it mitigates the vanishing gradient problem (gradients shrinking toward 0 as they are propagated back through many layers)
    - **TanH**: squashes any input into an output between -1 and 1
    - **Sigmoid**: squashes any input into an output between 0 and 1
    - **Softmax**: converts a list of values into probabilities that each lie between 0 and 1 and together sum to 1; typically used in the final layer of classification models
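As a rough illustration of the weighted sum and the activation functions above, here is a minimal Python sketch of a single node. The function names (`weighted_sum`, `relu`, `sigmoid`, `softmax`) and the input, weight, and bias values are hypothetical choices for illustration; TanH comes straight from Python's `math.tanh`.

```python
import math

def weighted_sum(inputs, weights, bias):
    """z = w1*x1 + w2*x2 + ... + wn*xn + b"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def relu(z):
    # 0 for negative inputs, the unmodified input otherwise
    return max(0.0, z)

def sigmoid(z):
    # Squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Subtract the max before exponentiating to avoid overflow; outputs sum to 1
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical node with two inputs
z = weighted_sum([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
print(z, relu(z), math.tanh(z), sigmoid(z))
print(softmax([2.0, 1.0, 0.1]))  # three class scores -> three probabilities
```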

## Common neural network structures

### Single Layer Feed Forward
- Inputs are weighted, summed, and passed through an activation function directly at the output nodes
- No hidden layers
- Often called a (single-layer) perceptron
- Often trained with the **delta rule**: calculate the difference between the expected and actual output, then adjust the weights to minimize that difference. It is a form of gradient descent (a smaller difference means the output is closer to the correct answer)
- Supervised learning
<br>

- **Example**: simple image recognition and classification where the classes are linearly separable (one input layer connected directly to one output layer)
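The delta rule described above can be sketched in plain Python for a single sigmoid output node. This is a minimal illustration, not a reference implementation: the names `train_delta_rule` and `predict`, the learning rate, the epoch count, and the logical AND dataset are all assumptions chosen because AND is small and linearly separable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_delta_rule(samples, n_inputs, lr=0.5, epochs=5000):
    """Single-layer network with one sigmoid output node, trained with the delta rule."""
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            # Error (expected - actual), scaled by the sigmoid's slope out * (1 - out)
            delta = (target - out) * out * (1.0 - out)
            # Nudge each weight in the direction that shrinks the error (gradient descent)
            weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
            bias += lr * delta
    return weights, bias

def predict(weights, bias, inputs):
    # Threshold the sigmoid output at 0.5 to get a class label
    return 1 if sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias) >= 0.5 else 0

# Logical AND is linearly separable, so a single layer is enough
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_delta_rule(AND_SAMPLES, n_inputs=2)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

Because there are no hidden layers, the same code could not learn a non-linearly-separable function such as XOR.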

### Multi Layer Feed Forward
- More complex version of a perceptron with one or more hidden layers of interconnected nodes (each node is connected to every node in the next layer)
- Most common training algorithm is **back-propagation**: like the delta rule, it compares actual and expected outputs to compute an error, then propagates that error backward through the layers to work out how much each weight contributed, adjusts the weights, and repeats during training
- Objective: minimize the error, using gradient descent (or a variant of it) on an error surface that is non-linear because of the activation functions
- Supervised learning
- Better at complex, non-linear problems, but slower to train as more hidden layers are added
<br>

- **Example**: Image recognition and classification (multiple hidden layers)
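To make back-propagation concrete, here is a minimal pure-Python sketch of a network with one sigmoid hidden layer, trained on XOR with squared error and per-sample gradient descent. The class name `TinyMLP`, the layer sizes, the seed, the learning rate, and the epoch count are hypothetical; real projects would normally use a library such as PyTorch or JAX instead of hand-written loops.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Hypothetical minimal MLP: n_in inputs -> one sigmoid hidden layer -> 1 sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are repeatable
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden layer: weighted sum + sigmoid for every hidden node
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output layer: weighted sum of hidden activations + sigmoid
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Output error term for the loss 0.5 * (y - target)^2
        delta_out = (y - target) * y * (1.0 - y)
        # Back-propagate: each hidden node's share of the error (uses w2 before it is updated)
        delta_h = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(self.w2, h)]
        # Gradient descent: step every weight against its gradient
        self.w2 = [w - lr * delta_out * hi for w, hi in zip(self.w2, h)]
        self.b2 -= lr * delta_out
        for j, d in enumerate(delta_h):
            self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * d

    def loss(self, samples):
        return sum(0.5 * (self.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

# XOR is not linearly separable, so a single-layer perceptron cannot learn it
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = TinyMLP(n_in=2, n_hidden=4, seed=0)
initial_loss = net.loss(XOR)
for _ in range(5000):
    for x, t in XOR:
        net.train_step(x, t, lr=0.5)
final_loss = net.loss(XOR)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Gradient descent on this non-linear error surface can occasionally settle in a local minimum, which is one reason practical training uses random restarts or optimizers with momentum.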

### Radial Basis Function (RBF)
- Structured like a perceptron, but with a hidden layer whose neurons use radial basis activation functions
- Activation functions are typically **Gaussian**: each hidden neuron has a center (its weight vector) and fires maximally when the input is close to that center, with its output decaying as the distance grows
- Excellent at detecting anomalies, since an input far from every center produces only weak activations, but not so good at extrapolation
- Well suited to classification problems
- Supervised learning
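The Gaussian behaviour described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the two `CENTERS`, the shared `width`, and the output weights are hypothetical; in practice centers are usually chosen from the training data (for example by clustering) and the output weights are learned.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian radial basis activation: 1.0 at the center, decaying toward 0 with distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_forward(x, centers, width, out_weights, out_bias=0.0):
    """Hidden layer of Gaussian neurons followed by a linear output node."""
    hidden = [gaussian_rbf(x, c, width) for c in centers]
    output = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return hidden, output

# Hypothetical centers, e.g. one representative point for each of two classes
CENTERS = [(0.0, 0.0), (1.0, 1.0)]
near_hidden, score = rbf_forward((0.1, 0.0), CENTERS, width=0.5, out_weights=[1.0, -1.0])
far_hidden, _ = rbf_forward((5.0, 5.0), CENTERS, width=0.5, out_weights=[1.0, -1.0])
print(near_hidden, score)  # first neuron fires strongly, so the score leans toward class 0
print(max(far_hidden))     # tiny: the input is far from every center (a possible anomaly)
```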

### Convolutional Neural Network
- Structured like a multi-layer perceptron, but with hidden layer(s) that apply convolution and pooling operations
- Pooling (and convolution without padding) shrinks the data passed to later layers, transforming inputs into smaller representations
- Convolutions slide small filters over the input to detect simple local features (e.g. edges); later layers combine these into progressively more complex patterns (baby steps)
- Excellent at image recognition and classification
- Can be prone to **overfitting**: the model learns noise or quirks of the training data (deducing patterns where there are none) and loses sight of the actual pattern we are trying to find, so it looks accurate and overconfident on training data but performs poorly on new data
- Supervised learning
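Here is a small pure-Python sketch of the convolution and pooling steps described above, showing a 6x6 input shrinking to 4x4 and then 2x2. The function names, the toy image, and the hand-made edge filter are hypothetical; a real CNN learns its filter values during training and typically uses a library implementation.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (stride 1, no padding). Like most deep learning libraries,
    this does not flip the kernel, so strictly speaking it is cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [
        [max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(feature_map[0]) - size + 1, size)]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# 6x6 toy image: dark left half (0), bright right half (1), i.e. a vertical edge
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical hand-made vertical-edge filter
kernel = [[-1, 0, 1] for _ in range(3)]

features = conv2d(image, kernel)  # 6x6 -> 4x4
pooled = max_pool2d(features)     # 4x4 -> 2x2
print(features)  # [[0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0]]
print(pooled)    # [[3, 3], [3, 3]]
```

The filter responds only where brightness changes from left to right, which is the kind of simple local feature early convolution layers pick up.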

### Recurrent Neural Network
- Not feed-forward: connections loop back, so the state from one step feeds into the next; hidden layers often contain LSTM (long short-term memory) cells
- LSTM cells retain some memory (state), so each output depends on both the current input and the state carried over from earlier inputs
- Act somewhat like a multi-layer perceptron with cycles
- Excellent at text- and speech-related tasks, especially when inputs and outputs have different lengths
- Supervised or reinforcement learning (depending on how you build the network)
<br>

- **Example**: Speech Recognition and language translation
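The way an LSTM cell carries state from one input to the next can be sketched with scalars in Python. This follows the standard LSTM gate equations (forget, input, output, candidate), but everything else is a hypothetical simplification: real cells use vectors and weight matrices, and the `PARAMS` values here are hand-picked rather than learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-picked (input weight, recurrent weight, bias) for each gate.
# A real LSTM learns these values and uses vectors/matrices instead of single numbers.
PARAMS = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

def lstm_step(x, h, c, p=PARAMS):
    """One time step of a scalar LSTM cell; returns the new (hidden state, cell state)."""
    def gate(name, activation):
        w, u, b = p[name]
        return activation(w * x + u * h + b)

    f = gate("forget", sigmoid)       # how much of the old memory to keep
    i = gate("input", sigmoid)        # how much new information to write
    o = gate("output", sigmoid)       # how much of the memory to expose
    g = gate("candidate", math.tanh)  # the new information itself
    c = f * c + i * g                 # updated memory (cell state)
    h = o * math.tanh(c)              # output depends on the input and the carried state
    return h, c

def run_lstm(sequence):
    """Feed a sequence of any length through the cell, carrying state between steps."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c)
    return h

print(run_lstm([1.0, 0.0, 0.0]))  # the first input still influences the final output
print(run_lstm([0.0, 0.0, 0.0]))  # 0.0
```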

### Modular Neural Network
- Separated into two or more independent modules managed by an intermediary
- Each module processes its input separately, does not interact with the other modules, and usually performs one specific task
- The intermediary takes the outputs from each module and puts them together without modifying them
- Can be efficient, better at each individual task, and less prone to failure, since the modules are independent of one another
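A minimal Python sketch of the modular layout: each module only sees its own input, and the intermediary just collects the outputs. The module names (`vision`, `audio`), the `make_module` helper, and the single-neuron modules are hypothetical stand-ins for what would normally be separate, specialised sub-networks.

```python
from typing import Callable, Dict, List

Module = Callable[[List[float]], float]

def make_module(weights: List[float], bias: float) -> Module:
    """Hypothetical module: a single ReLU neuron standing in for a specialised sub-network."""
    def module(x: List[float]) -> float:
        return max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)
    return module

def intermediary(modules: Dict[str, Module], inputs: Dict[str, List[float]]) -> Dict[str, float]:
    """Route each input to its own module and collect the outputs unchanged."""
    # Modules never see each other's inputs or outputs
    return {name: module(inputs[name]) for name, module in modules.items()}

modules = {
    "vision": make_module([1.0, 1.0], 0.0),
    "audio": make_module([2.0], -1.0),
}
result = intermediary(modules, {"vision": [0.5, 0.25], "audio": [1.0]})
print(result)  # {'vision': 0.75, 'audio': 1.0}
```

Because the modules share nothing, one could be retrained or replaced without touching the others.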

### Sequence to Sequence Model
- Network in 3 parts: encoder, an intermediate encoded representation, and decoder
- The encoder converts the whole input into some internal format, often starting from a map that assigns a value to each possible part of an input (e.g. each word in a vocabulary)
- The decoder, typically an RNN, converts that encoded representation into readable output, one step at a time
- Excellent when the output is a different kind of sequence from the input, or when input and output have different lengths
- Supervised or reinforcement learning
- Good at language translation
<br>

- **Example**: image captioning, language translation
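The encode-then-decode control flow can be sketched in Python with pluggable step functions. In a real sequence-to-sequence model, `encoder_step` and `decoder_step` would be RNN (e.g. LSTM) cells with learned weights; here they are hypothetical, non-neural toy stand-ins (a tiny `VOCAB` map and a stack that reverses the input) chosen only so the flow is easy to follow. The `<sos>`/`<eos>` start and end tokens and `max_len` are also assumptions.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")

def seq2seq(
    tokens: List[str],
    encoder_step: Callable[[State, str], State],
    decoder_step: Callable[[State, str], Tuple[State, str]],
    init_state: State,
    start_token: str = "<sos>",
    end_token: str = "<eos>",
    max_len: int = 20,
) -> List[str]:
    # Encoder: read the whole input before producing any output
    state = init_state
    for tok in tokens:
        state = encoder_step(state, tok)
    # Decoder: start from the encoded state and emit one token per step
    output, prev = [], start_token
    for _ in range(max_len):
        state, prev = decoder_step(state, prev)
        if prev == end_token:  # the decoder decides when to stop, so lengths can differ
            break
        output.append(prev)
    return output

# Hypothetical toy map assigning a value to each possible input token
VOCAB = {"<eos>": 0, "hello": 1, "world": 2}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def toy_encoder_step(state, tok):
    return state + (VOCAB[tok],)  # "encode" by appending the token's value

def toy_decoder_step(state, prev):
    if not state:
        return state, "<eos>"
    return state[:-1], ID_TO_TOKEN[state[-1]]  # emit the most recently stored value first

print(seq2seq(["hello", "world"], toy_encoder_step, toy_decoder_step, init_state=()))
# ['world', 'hello']
```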