# **Fully Connected Neural Network Architecture**
## **Purpose:** This section covers how to choose the number of hidden layers and the number of neurons in each layer.

## ($) 1./ Neural networks are usually represented without the learnable parameters.
![image.png](attachment:f470e8ce-fd09-4527-9651-07285c18f916.png)
### (@) In this example, the hidden layer has four neurons.
### (@) The output layer has one neuron.

## ($) 2./ We can make multi-class predictions using neural networks.
![image.png](attachment:32c1fbf3-6b2a-4ebd-a3cb-a3a26ecde2e2.png)
### (@) We just add more neurons to the output layer.
### (@) The process can be thought of as just replacing the output layer with a softmax function.
### (@) Here the output layer has three neurons for three classes.
![image.png](attachment:eae69e5f-7000-47ee-9e70-a4ddd8b3ed92.png)
### (@) For a given input, we obtain an output for each neuron.
### (@) We choose the class according to the index of the neuron that has the largest value.
### (@) In this case, neuron 2 has the largest value, so the output of our model is 2.
![image.png](attachment:ab4b5e4a-5357-423a-95c3-40e1af9271eb.png)
### (@) We have five neurons in the output layer.
### (@) One for each class: Dog, Cat, and so on.
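The class-selection step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: the score values are made up, the function names `softmax` and `predict_class` are hypothetical, and neurons are assumed to be indexed from 0.

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores):
    """Return the index of the output neuron with the largest value."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical raw outputs of a three-neuron output layer (neurons 0, 1, 2).
scores = [0.5, 1.2, 3.1]
probs = softmax(scores)
predicted = predict_class(scores)
print(probs, predicted)
```

Softmax preserves the ordering of the scores, so taking the largest probability selects the same class as taking the largest raw score.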

# **I./ Deep Neural Networks**
## **Description:** We can add more hidden layers. A neural network with more than one hidden layer is called a deep neural network.
## **Possible problem:** More neurons, or more layers, may lead to overfitting.
![image.png](attachment:b39681cd-87a4-4c91-9f1c-ab8b3e4553a2.png)

# **II./ Neurons & Dimensions**
## **Description:** The output, or activation, of each layer has as many dimensions as the layer has neurons.
![image.png](attachment:57cdb1d4-b011-4687-9062-c3a31193b422.png)
### (@) This layer has three neurons. (Left)
### (@) The output or activation has three dimensions. (Right)

## ($) 3./ Each neuron is like a linear classifier.
![image.png](attachment:190b2b18-6654-4c71-96b7-3602368d8d2b.png)
### (@) Therefore, each neuron must have as many inputs as there are neurons in the previous layer.
### (@) In this case, the previous layer has three neurons, so this neuron has three inputs.
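As a rough sketch of the idea above, a single neuron can be written as a weighted sum plus a bias, followed by an activation. The sigmoid activation, the function name `neuron`, and all numeric values here are assumptions chosen for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, then a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("a neuron needs exactly one weight per input")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical activations of a previous layer with three neurons.
previous_activations = [0.2, 0.7, 0.1]
# The neuron therefore needs three weights (values chosen arbitrarily).
weights = [0.5, -1.0, 2.0]
bias = 0.1

activation = neuron(previous_activations, weights, bias)
print(activation)
```

The length check makes the constraint explicit: the number of weights in a neuron is fixed by the size of the layer that feeds it.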

# **III./ Prediction in Neural Networks**
![image.png](attachment:d636f946-6102-42a1-8ae2-5eb8bf036628.png)
### (@) Consider the following input vector with four dimensions.
### (@) Each neuron in the first layer has four inputs. As there are three neurons, the activation has a dimension of 3.
### (@) Each neuron in the next layer has an input dimension of 3. As there are two neurons in the second layer, the output activation has a dimension of 2.
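The dimension bookkeeping above (4 inputs, then 3 neurons, then 2 neurons) can be checked with a small forward pass. This is only a sketch: the helper names `make_layer` and `forward`, the sigmoid activation, the random weights, and the input values are all assumptions, not taken from the lecture.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_inputs, n_neurons, rng):
    """Create one fully connected layer: one weight row and one bias per neuron."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def forward(layer, inputs):
    """Return the layer's activation: one value per neuron."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)  # fixed seed so the run is repeatable
layer1 = make_layer(n_inputs=4, n_neurons=3, rng=rng)
layer2 = make_layer(n_inputs=3, n_neurons=2, rng=rng)

x = [1.0, 0.5, -0.5, 2.0]   # hypothetical 4-dimensional input
a1 = forward(layer1, x)     # 3 neurons, so the activation has dimension 3
a2 = forward(layer2, a1)    # 2 neurons, so the activation has dimension 2
print(len(x), len(a1), len(a2))
```

Each weight row has the length of the previous layer's activation, which is why the second layer is built with `n_inputs=3`.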

# **IV./ Selecting a Fully Connected NN Architecture**
## **Method:** Use the validation data.
![image.png](attachment:b2e3a826-e745-4e1f-b5a0-c602760d59b0.png)
### (@) Consider the following network architectures: A, B, and C.
### (@) We select the architecture with the best performance on the validation data; in this case, C.
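The selection step itself is simple once each candidate has a validation score. In the sketch below the accuracy numbers are invented placeholders; in practice each one would come from training that architecture and evaluating it on held-out validation data.

```python
# Hypothetical validation accuracies for three candidate architectures.
# These numbers are made up for illustration only.
validation_accuracy = {"A": 0.81, "B": 0.86, "C": 0.91}

# Pick the architecture whose validation accuracy is highest.
best_architecture = max(validation_accuracy, key=validation_accuracy.get)
print(best_architecture)
```

The comparison uses the validation data rather than the training data, because a larger network can score well on the training data simply by overfitting it.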

# **V./ Vanishing Gradient**
## **Reason:** To perform gradient descent and obtain the learnable parameters, we have to calculate the gradient. But the deeper the network, the smaller the gradient tends to get. This is called the **vanishing gradient.**
![image.png](attachment:d37d3cb5-1221-4a6e-9f94-5052a5515882.png)
### (@) Deep networks work better, but are hard to train.
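### (@) Because the gradient shrinks as it is propagated backward through the layers, the layers farthest from the output receive the smallest updates and are the hardest to train.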

## **Sigmoid Function's Drawback**
## ($) 4./ One of the main drawbacks of the sigmoid activation function is the vanishing gradient.
![image.png](attachment:c7bbdca5-44ef-46b7-bba6-3ec20436d5cd.png)
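A back-of-the-envelope sketch of why the sigmoid leads to vanishing gradients: its derivative is never larger than 0.25, and backpropagation multiplies one such factor per layer. The code below ignores the weights entirely (a simplifying assumption) and shows only the best-case product of the activation derivatives.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The sigmoid derivative peaks at z = 0, where it equals 0.25.
peak = sigmoid_derivative(0.0)

# One derivative factor per layer: even in the best case the product
# shrinks geometrically as the number of layers grows.
bounds = {depth: peak ** depth for depth in (1, 5, 10, 20)}
for depth, bound in bounds.items():
    print(depth, bound)
```

With 20 sigmoid layers, the product of the activation derivatives alone is already below one part in a trillion.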
## **Rectified Linear Unit function, or ReLU function**
### **Notice:** ReLU is only used in the hidden layers.
![image.png](attachment:0015724a-3eae-4fb0-93f4-603f7996a8a5.png)
### (@) The value of the ReLU function is zero when its input is less than zero.
### (@) If the input z is larger than zero, the output of the function equals its input.
![image.png](attachment:866f9c0b-2d39-49df-9a56-f2838bdc4ab6.png)
### (@) If the input z equals 5, the output equals 5.
### (@) If the input z equals 10, the output equals 10.
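The behaviour described above fits in a couple of lines. The function names are illustrative; the derivative is included as an aside to show that ReLU's slope is 1 for positive inputs, so it does not shrink the gradient there the way the sigmoid does.

```python
def relu(z):
    """Rectified Linear Unit: 0 for negative input, the input itself otherwise."""
    return max(0.0, z)

def relu_derivative(z):
    """Slope of ReLU: 0 for negative input, 1 for positive input.

    The slope at exactly z = 0 is undefined; returning 0 there is a
    common convention, assumed here.
    """
    return 1.0 if z > 0 else 0.0

for z in (-3.0, 0.0, 5.0, 10.0):
    print(z, relu(z), relu_derivative(z))
```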

# **VI./ Networks have layers that help with training**
![image.png](attachment:2babbe34-dae9-4da3-804a-658ae43f2043.png)
### (@) We will skip the details, but some layers, like dropout, help prevent overfitting.
### (@) Batch normalization helps with training.
### (@) Skip connections let you train deeper networks by connecting a layer's input directly to a deeper layer.

# **VII./ The hidden layers of neural networks replace the kernels in SVMs**
![image.png](attachment:b7ec3693-df6c-4522-bcb3-50504b2e812e.png)
### (@) We can use the raw image or features like HOG.

# **Training neural networks is more of an art than a science**
![image.png](attachment:b9fc8897-e673-4839-b3ee-53e41a87760b.png)
### (@) So we will generally use the lab to try out different methods.