# Simple Neural Network in PyTorch

Neural networks are a class of algorithms that attempt to recognize underlying relationships in a set of data through a process loosely inspired by the way the human brain operates.

A generic neural network architecture consists of the following components:

1. **Input layer**: Data enters the network through the input layer. The number of neurons in the input layer equals the number of features in the data. The input layer is usually not counted as one of the network's layers because no computation happens there.
2. **Hidden layer**: The layers between the input and output layers are called hidden layers. A network can have any number of hidden layers; the more hidden layers there are, the more complex the network.
3. **Output layer**: The output layer produces the network's prediction.
4. **Neurons**: Each layer consists of a collection of neurons that are connected to neurons in adjacent layers.
5. **Activation function**: Applies a non-linear transformation to each neuron's weighted input, which lets the model learn complex patterns from the data.
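The components listed above map onto PyTorch modules roughly as follows. This is a minimal sketch, not the tutorial's own code: the class name `SimpleNet`, the layer sizes (4 features, 8 hidden neurons, 3 outputs), and the choice of ReLU as the activation are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleNet(nn.Module):
    """Hypothetical example network: 4 input features -> 8 hidden neurons -> 3 outputs."""

    def __init__(self, n_features: int = 4, n_hidden: int = 8, n_outputs: int = 3):
        super().__init__()
        # The input layer needs no module: it is simply the input tensor of n_features values
        self.hidden = nn.Linear(n_features, n_hidden)   # hidden layer: weights + biases
        self.activation = nn.ReLU()                     # non-linear activation function
        self.output = nn.Linear(n_hidden, n_outputs)    # output layer produces the prediction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.activation(self.hidden(x))
        return self.output(x)


model = SimpleNet()
batch = torch.randn(5, 4)   # 5 samples, 4 features each
print(model(batch).shape)   # torch.Size([5, 3])
```

Each `nn.Linear` holds the weights and biases connecting one layer to the next, so this particular model has 4·8 + 8 + 8·3 + 3 = 67 trainable parameters.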

![Three-layer neural network](attachment:image.png)

Source: https://www.datacamp.com/tutorial/pytorch-tutorial-building-a-simple-neural-network-from-scratch

## How a Neural Network Works

### Initializations

**Weight initialization**

Weight initialization is the first step in training a neural network. The initial weights define the starting point for the optimization process, and a poor choice can slow down or prevent learning.
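In PyTorch, every `nn.Linear` layer is already randomly initialized when it is created; the `nn.Linear` documentation states that its weights are drawn from U(-√k, √k) with k = 1/in_features. The sketch below checks that range and then overrides it with a chosen scheme via `Module.apply`. The helper `init_weights`, the layer sizes, and the choice of Xavier-uniform as the replacement scheme are hypothetical, for illustration only.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Default initialization: nn.Linear documents weights ~ U(-sqrt(k), sqrt(k)), k = 1 / in_features
layer = nn.Linear(100, 50)
bound = 1 / math.sqrt(layer.in_features)
print(layer.weight.abs().max().item() <= bound + 1e-6)   # True


def init_weights(module: nn.Module) -> None:
    """Hypothetical helper: re-initialize every nn.Linear with Xavier-uniform weights and zero biases."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)


model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
model.apply(init_weights)   # calls init_weights on every submodule, recursively
```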

**Zero initialization**

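Zero initialization means that all weights are set to zero. This is a poor choice: every neuron in a layer then computes the same output and receives the same gradient update, so the neurons never become different from one another. The network fails to break symmetry and cannot learn useful features.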
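The symmetry problem can be observed directly in a tiny network. This is a minimal sketch under illustrative assumptions: a 3 → 4 → 1 network with a tanh hidden layer, mean-squared-error loss, and random toy inputs and targets. It zero-initializes all parameters, runs one backward pass, and inspects the gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny 3 -> 4 -> 1 network with a tanh hidden layer (sizes are illustrative)
net = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))

# Zero initialization: every weight and bias starts at 0
for layer in net:
    if isinstance(layer, nn.Linear):
        nn.init.zeros_(layer.weight)
        nn.init.zeros_(layer.bias)

x = torch.randn(8, 3)   # random toy inputs
y = torch.randn(8, 1)   # random toy targets
loss = F.mse_loss(net(x), y)
loss.backward()

print(net[0].weight.grad.abs().max().item())   # 0.0  (hidden-layer weights get no signal)
print(net[2].weight.grad.abs().max().item())   # 0.0  (hidden activations are all tanh(0) = 0)
print(net[2].bias.grad)                         # the only non-zero gradient
```

Because every hidden neuron outputs tanh(0) = 0 and every outgoing weight is 0, plain gradient descent can only adjust the output bias in this setup; the hidden neurons stay identical at every step.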

**Random initialization**

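Random initialization breaks this symmetry, so it is better than zero initialization. However, the scale of the random values matters: weights that are too small make signals and gradients shrink toward zero as they pass through many layers, while weights that are too large can push activations into saturation or make them explode.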
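One factor that matters with random initialization is the scale of the weights. The sketch below is an illustrative experiment, not taken from the tutorial: the helper `final_activation_std`, the depth (10 layers), the width (256), and the two standard deviations tried are all assumptions. It pushes random data through a stack of bias-free tanh layers with weights drawn from a normal distribution and reports how large the final activations are.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def final_activation_std(weight_std: float, depth: int = 10, width: int = 256) -> float:
    """Pass random data through `depth` tanh layers whose weights are drawn from
    N(0, weight_std**2) and return the standard deviation of the last layer's output."""
    x = torch.randn(512, width)
    for _ in range(depth):
        layer = nn.Linear(width, width, bias=False)
        nn.init.normal_(layer.weight, mean=0.0, std=weight_std)
        x = torch.tanh(layer(x))
    return x.std().item()


with torch.no_grad():
    tiny = final_activation_std(0.01)    # signal shrinks by about 0.01 * sqrt(256) = 0.16x per layer
    large = final_activation_std(1.0)    # pre-activations are huge, so tanh saturates near +/-1
print(tiny, large)
```

With the small scale the activations all but vanish after 10 layers; with the large scale almost every tanh unit is saturated, which leaves its gradient close to zero.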

**Xavier/Glorot initialization**

Xavier initialization (also called Glorot initialization) is a heuristic that sets the scale of random weights from a layer's number of inputs (fan-in) and outputs (fan-out), so that the variance of activations and gradients stays roughly constant from layer to layer. It is commonly used when a tanh or sigmoid activation function is applied to the weighted sum.
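A minimal sketch of Xavier-uniform initialization with `torch.nn.init.xavier_uniform_`, assuming an illustrative 256 → 128 layer followed by tanh. The weights are drawn from U(-a, a) with a = gain · √(6 / (fan_in + fan_out)); the `gain` argument is PyTorch's extension, and `gain=1` gives the original Glorot formula. Here the gain comes from `nn.init.calculate_gain("tanh")`.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

fan_in, fan_out = 256, 128          # illustrative layer sizes
layer = nn.Linear(fan_in, fan_out)

# Xavier/Glorot uniform: W ~ U(-a, a), a = gain * sqrt(6 / (fan_in + fan_out))
gain = nn.init.calculate_gain("tanh")   # PyTorch's recommended gain for tanh (5/3)
nn.init.xavier_uniform_(layer.weight, gain=gain)
nn.init.zeros_(layer.bias)

a = gain * math.sqrt(6.0 / (fan_in + fan_out))
# A U(-a, a) distribution has standard deviation a / sqrt(3)
print(f"bound a = {a:.4f}, max |w| = {layer.weight.abs().max().item():.4f}")
print(f"expected std = {a / math.sqrt(3):.4f}, observed std = {layer.weight.std().item():.4f}")
```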

**He/Kaiming initialization**

He initialization (also called Kaiming initialization) is another heuristic. Unlike Xavier initialization, it scales the weights by the fan-in alone, using a variance of 2/fan-in. The extra factor of 2 compensates for ReLU setting roughly half of its inputs to zero, which is why He initialization is the usual choice for ReLU-based networks.
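The sketch below illustrates He initialization with `torch.nn.init.kaiming_normal_`, which draws weights from N(0, 2 / fan_in) when `nonlinearity="relu"` and `mode="fan_in"`. It then compares He and Xavier-normal initialization in a deep ReLU stack. The helper `mean_square_after`, the width (512), the depth (20), and the omission of biases are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# 1) The He/Kaiming normal formula: W ~ N(0, 2 / fan_in) for ReLU
w = torch.empty(512, 512)                                   # (fan_out, fan_in)
nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="relu")
print(f"expected std {math.sqrt(2.0 / 512):.4f}, observed {w.std().item():.4f}")


# 2) Why the factor of 2 matters in a deep ReLU stack
def mean_square_after(init_fn, depth: int = 20, width: int = 512) -> float:
    """Return E[x^2] of the activations after `depth` bias-free ReLU layers."""
    x = torch.randn(1024, width)
    for _ in range(depth):
        weight = torch.empty(width, width)
        init_fn(weight)
        x = torch.relu(x @ weight.t())
    return x.pow(2).mean().item()


he = mean_square_after(lambda t: nn.init.kaiming_normal_(t, nonlinearity="relu"))
xavier = mean_square_after(nn.init.xavier_normal_)   # variance 2 / (fan_in + fan_out) = 1 / 512
print(f"He: {he:.3f}   Xavier: {xavier:.2e}")
```

With He initialization the signal keeps roughly the same magnitude after 20 layers, while with Xavier (gain 1) its mean square halves at each ReLU layer and ends up near 0.5^20 ≈ 1e-6.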

### Forward propagation

Each neuron computes a weighted sum of its inputs plus a bias term, then applies an activation function to add a non-linear transformation. In the weighted sum, each weight determines the importance of the corresponding feature (i.e., how much it contributes to predicting the output).

$$z = X_1 W_1 + X_2 W_2 + \dots + X_n W_n + b = \sum_{i=1}^{n} X_i W_i + b$$

In the formula above:

*   $z$ is the weighted sum of the neuron's inputs plus the bias
*   $W_1, \dots, W_n$ are the weights
*   $X_1, \dots, X_n$ are the input features (independent variables)
*   $b$ is the bias term.
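The weighted-sum formula can be checked numerically for a single neuron. This is a minimal sketch with made-up values (n = 3 inputs, weights 0.5, -1.0, 0.25, bias 0.1, and a sigmoid activation, all chosen for illustration), so z = 0.5 - 2.0 + 0.75 + 0.1 = -0.65. It computes z term by term, as a dot product, and with `nn.Linear`, which computes `x @ weight.T + bias`.

```python
import torch
import torch.nn as nn

# Toy values chosen for illustration: one neuron with n = 3 inputs
x = torch.tensor([1.0, 2.0, 3.0])      # X_1..X_3
w = torch.tensor([0.5, -1.0, 0.25])    # W_1..W_3
b = torch.tensor(0.1)                  # bias

# z = X_1*W_1 + X_2*W_2 + X_3*W_3 + b, written out term by term
z_manual = x[0] * w[0] + x[1] * w[1] + x[2] * w[2] + b
# The same quantity as a dot product
z_dot = torch.dot(x, w) + b

# nn.Linear(3, 1) computes x @ weight.T + bias; load the same parameters into it
neuron = nn.Linear(3, 1)
with torch.no_grad():
    neuron.weight.copy_(w.unsqueeze(0))   # weight has shape (out_features, in_features) = (1, 3)
    neuron.bias.fill_(0.1)
z_linear = neuron(x)                      # shape (1,)

# Applying an activation function to z gives the neuron's output
a = torch.sigmoid(z_dot)
print(z_manual.item(), z_dot.item(), z_linear.item(), a.item())
```

For a batch of inputs stacked as rows of a matrix, `nn.Linear` applies the same formula to every row at once, which is why a layer is normally written as a single module rather than a loop over neurons.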
