# Steps Involved In Building An MLP Neural Network:

### **1.Preprocess The Data**:

> - As the goal requires, clean the data: remove duplicates and, for textual data, strip unnecessary elements such as URLs.
> - Convert textual data into numerical vectors (if dealing with textual data).
> - Normalize or standardize the features, using statistics computed on the training set only, so that all inputs are on a comparable scale.
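
As a rough illustration of the normalization step, the sketch below standardizes features with NumPy. The helper names `fit_standardizer` and `standardize` and the tiny arrays are hypothetical, chosen only for this example; the statistics are computed on the training data and then reused for the test data.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and standard deviation from the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardize(X, mean, std):
    """Shift and scale each feature with previously fitted statistics."""
    return (X - mean) / std

# Hypothetical data: two features on very different scales.
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
X_test = np.array([[2.0, 300.0]])

mean, std = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, mean, std)
X_test_scaled = standardize(X_test, mean, std)  # reuse the training statistics
```

After this step each training feature has zero mean and unit standard deviation, while the test data is transformed with the same parameters rather than its own.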

### **2.Choosing Architecture**:

> - As the goal requires:
>   - Select an appropriate number of layers for the MLP.
>   - Select an appropriate number of neurons in each layer.
>   - Both can be chosen by hyperparameter tuning, i.e. by comparing validation performance across several candidate layer counts and layer widths.
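
To make the idea of candidate architectures concrete, here is a minimal NumPy sketch that builds an MLP from a list of layer sizes and runs a forward pass. It assumes ReLU in the hidden layers, a linear output layer, and a small random normal initialization; the function names and the two candidate size lists are hypothetical.

```python
import numpy as np

def build_mlp(layer_sizes, seed=0):
    """Create one (weights, bias) pair per pair of consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 0.1, size=(n_in, n_out))  # assumed simple init
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, X):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    a = X
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        a = np.maximum(z, 0.0) if i < len(params) - 1 else z
    return a

def count_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)

# Two hypothetical candidates: input size 4, output size 1.
candidates = [[4, 8, 1], [4, 16, 16, 1]]
for sizes in candidates:
    params = build_mlp(sizes)
    out = forward(params, np.ones((5, 4)))
    print(sizes, "->", out.shape, count_parameters(params), "parameters")
```

In a real tuning loop each candidate would be trained and compared on a validation set; the parameter count is only a quick indicator of model capacity.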

### **3.Weight Initialization**:

> - Select an appropriate weight initialization scheme (hyperparameter tuning can help), such as:
>   - All weights = 0 (avoid for hidden layers: every neuron in a layer then receives the same gradient and learns the same thing)
>   - Uniform Initialization (suitable with Sigmoid activation function)
>   - Xavier/Glorot Initialization (commonly paired with Sigmoid or Tanh):
>     - Uniform
>     - Normal
>   - He Initialization (commonly paired with ReLU):
>     - Uniform
>     - Normal
>   - For more weight initialization schemes, check Keras documentation [here](https://keras.io/api/layers/initializers/).
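
The Glorot and He schemes differ only in how the spread of the random weights depends on the layer's fan-in and fan-out. The NumPy sketch below writes out the usual formulas; the function names are hypothetical, and the normal variants here sample from a plain normal distribution, whereas the Keras initializers of the same name use a truncated normal.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng):
    """Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def glorot_normal(n_in, n_out, rng):
    """Normal with stddev = sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

def he_uniform(n_in, n_out, rng):
    """Uniform in [-limit, limit] with limit = sqrt(6 / fan_in)."""
    limit = np.sqrt(6.0 / n_in)
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out, rng):
    """Normal with stddev = sqrt(2 / fan_in)."""
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_in, n_out))

rng = np.random.default_rng(42)
W_glorot = glorot_uniform(256, 128, rng)  # e.g. for a Tanh layer
W_he = he_normal(256, 128, rng)           # e.g. for a ReLU layer
```

Wider layers get smaller initial weights under both schemes, which keeps the scale of activations roughly stable from layer to layer.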

### **4.Choosing Activation Function**:

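> - Select an appropriate activation function (hyperparameter tuning can help), such as:
>   - Sigmoid (avoid in the hidden layers of deep networks because of the 'Vanishing Gradients' problem; still the usual choice for a binary classification output)
>   - Tanh (also prone to the 'Vanishing Gradients' problem in deep networks)
>   - ReLU (the common default for hidden layers)
>   - Softmax (used in the output layer for multi-class classification)
>   - For a regression output, a linear (identity) activation is the usual choice.
>   - For more activation functions, check Keras documentation [here](https://keras.io/api/layers/activations/).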
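
For reference, the four activation functions listed above can be written in a few lines of NumPy. This is a minimal sketch with hypothetical function names; the softmax subtracts the row maximum before exponentiating, a common trick to avoid overflow.

```python
import numpy as np

def sigmoid(z):
    """Squashes values into (0, 1); its derivative is at most 0.25."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid, small for large |z| (vanishing gradients)."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh(z):
    """Squashes values into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Passes positive values through and zeroes out negative ones."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turns each row of scores into probabilities that sum to 1."""
    shifted = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1]])
probs = softmax(scores)
```

Because the sigmoid derivative never exceeds 0.25, multiplying many such factors during backpropagation shrinks the gradient, which is the intuition behind the 'Vanishing Gradients' problem.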


### **5.Choosing Optimizer**:

> - Plain SGD (without momentum) is often avoided in deep learning, unlike in classical machine learning, because it can slow down badly near 'Saddle points' and plateaus of the loss surface, where the gradients are close to zero.
> - Hence, adaptive optimization techniques are commonly selected, such as:
>   - Adam
>   - Adadelta
>   - Adagrad
>   - Adamax
>   - For more optimizers, check Keras documentation [here](https://keras.io/api/optimizers/).
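
To show what an adaptive optimizer does differently from plain SGD, here is a minimal NumPy sketch of a single Adam update, using the commonly cited default hyperparameters. The function name `adam_step` and the toy objective (minimizing the sum of squared weights) are assumptions made for this example.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(w) = sum(w^2), whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2.0 * w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
```

Dividing by the running root-mean-square of the gradient gives each weight its own effective step size, so the update stays meaningful even where the raw gradient is tiny.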

### **6.Using Batch Normalization**(Optional):

> - Batch normalization is generally used in very deep MLPs.
> - A small change in the weights of the early layers can lead to large changes in the inputs of the deeper layers.
> - Hence, we batch-normalize the inputs of the deeper layers: each feature is normalized over the mini-batch to zero mean and unit variance, then rescaled by a learnable scale and shift.
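
The normalization itself is a short computation. The NumPy sketch below covers only the training-time forward pass over one mini-batch; the names `batch_norm_forward`, `gamma` (scale) and `beta` (shift) are chosen for this example, and a full layer would also track running statistics for use at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical activations of a hidden layer: batch of 64, 4 features.
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
gamma = np.ones(4)   # learnable scale, initialized to 1
beta = np.zeros(4)   # learnable shift, initialized to 0
normalized = batch_norm_forward(activations, gamma, beta)
```

With `gamma` at 1 and `beta` at 0 the output is simply the standardized activations; during training the network can learn other values if a different scale or offset works better.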

### **7.Using Dropouts**(Optional):

> - Dropout refers to dropping out a fraction of the neurons in a layer during training, which regularizes the MLP.
> - At each training step, a random subset of neurons is turned off, so their outputs are not used in that step; at inference time all neurons are kept.
> - Regularization controls the 'Bias-Variance Trade-off': it reduces 'Overfitting' (high variance), while too much of it causes 'Underfitting' (high bias).
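
A minimal NumPy sketch of the mechanism follows. It uses the "inverted dropout" variant, which rescales the surviving activations during training so that nothing needs to change at inference time; the function name and the 20% rate are assumptions for this example.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero out a random fraction `rate` of activations during training."""
    if not training or rate == 0.0:
        return activations  # inference: keep every neuron
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where a neuron is kept
    return activations * mask / keep_prob  # rescale to preserve the expected value

rng = np.random.default_rng(0)
a = np.ones((1000, 10))
dropped = dropout(a, rate=0.2, rng=rng)               # training mode
unchanged = dropout(a, rate=0.2, rng=rng, training=False)  # inference mode
```

Roughly 20% of the entries in `dropped` are zero and the rest are scaled up to 1.25, so the average activation stays close to its original value.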

### **8.Choosing Loss Function**:

> - Some of the loss functions used to compute the quantity that a model should seek to minimize during training are:
>   - Cross-Entropy (preferred for classification tasks)
>   - Mean Squared Error (preferred for regression tasks)
>   - Mean Absolute Error
>   - For more loss functions, check Keras documentation [here](https://keras.io/api/losses/).
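
The three losses listed above are simple to write out. The NumPy sketch below uses hypothetical function names and assumes one-hot labels and already-normalized probabilities for the cross-entropy; the clipping constant only guards against taking the logarithm of zero.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors heavily."""
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; less sensitive to outliers."""
    return np.mean(np.abs(y_true - y_pred))

def categorical_cross_entropy(y_true, probs, eps=1e-12):
    """Cross-entropy between one-hot labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(probs), axis=1))

# Regression example with made-up values.
y_true = np.array([1.0, 2.0])
y_pred = np.array([1.0, 4.0])
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

# Classification example: one sample, two classes, an undecided prediction.
labels = np.array([[1.0, 0.0]])
predicted = np.array([[0.5, 0.5]])
ce = categorical_cross_entropy(labels, predicted)
```

The cross-entropy falls towards zero as the probability assigned to the correct class approaches 1, and grows without bound as that probability approaches 0.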

### **9.Gradient Clipping**(If need be):

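> - Deep networks can suffer from the 'Vanishing Gradients' problem and the 'Exploding Gradients' problem.
> - Gradient clipping addresses exploding gradients by capping the value or the norm of the gradients before each weight update. It does not help with vanishing gradients, which are better handled through the choice of activation function and weight initialization.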
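
One common form is clipping by global norm: if the combined norm of all gradients exceeds a threshold, every gradient is scaled down by the same factor. The NumPy sketch below is a minimal version of that idea; the function name and the threshold of 1.0 are assumptions for this example.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients together so their combined L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm  # already small enough, leave untouched
    scale = max_norm / global_norm
    return [g * scale for g in grads], global_norm

# Hypothetical gradients for two parameter arrays.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
```

Scaling every gradient by the same factor keeps the direction of the update and only shortens its length.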

### **10.Plot Graphs**:

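> - Plot 'Training Loss' and 'Validation (or Test) Loss' against 'Number of Epochs' to evaluate the model; a validation loss that starts rising while the training loss keeps falling indicates overfitting.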
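
A loss-versus-epochs plot can be produced with Matplotlib as sketched below. The function name `plot_loss_curves` is hypothetical and the loss values are made up purely for illustration; in practice they would come from the training history of the model.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt

def plot_loss_curves(train_loss, val_loss):
    """Plot training and validation loss against the epoch number."""
    epochs = range(1, len(train_loss) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_loss, label="Training loss")
    ax.plot(epochs, val_loss, label="Validation loss")
    ax.set_xlabel("Number of epochs")
    ax.set_ylabel("Loss")
    ax.legend()
    return fig, ax

# Made-up numbers for illustration only.
train_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val_loss = [0.95, 0.70, 0.58, 0.55, 0.57, 0.62]
fig, ax = plot_loss_curves(train_loss, val_loss)
```

The figure can then be written to disk with `fig.savefig(...)`, or displayed with `plt.show()` when an interactive backend is used instead of `Agg`.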