# Fashion MNIST Neural Network

This project tackles a classification task, with the aim of developing skills in building neural networks from scratch, image preprocessing, and data augmentation. It uses the Fashion MNIST dataset, in which each image must be classified into one of 10 classes representing different articles of clothing.

The key learning goals of this project are understanding backpropagation, loss functions, the training loop, and gradients. Other features will also be added to develop further skills.

**Main Objectives**:
- Importing the Data
- Cleaning the Data
- Train/Test Split
- Exploratory Data Analysis (EDA) and Visualization
- Preprocessing the Data
- Data Augmentation
- Training the Model
- Hyperparameter Tuning
- Test set evaluation
- Performance metrics: F1 score, precision, recall, and confusion matrix.
- Identifying which images correspond to the model's most confident wrong and correct predictions, as well as its most uncertain predictions.

**Extra**:
- Implement a **Neural Network** from scratch.
- The network must have 1 input layer, 2 hidden layers, and an output layer.
- Implement the forward propagation and backpropagation algorithms.
- Use mini-batch gradient descent.
- Implement the Adam optimizer, dropout, and layer normalization.
- Add a `zero_grad` step to reset gradients between updates.
- Regularization with weight decay.
- Softmax + Categorical Cross-Entropy
- Early Stopping
- Learning rate scheduler and momentum
- Visualize loss curves
- Implement ReLU
- Modular design: Linear, ReLU, Dropout, and Softmax as separate classes.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import seaborn as sns
import pandas as pd

## Neural Network Implementation

This section implements a non-modular neural network.

This neural network will consist of an input layer, two hidden layers, and an output layer. Forward pass and backpropagation will be based on this model architecture, but this will change with the modular setup.

Mini-batch gradient descent with Adam will be used for optimization.
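As a rough illustration of the Adam update mentioned above, the sketch below applies one step to a single parameter array. The function name `adam_step` and the default hyperparameters (`lr`, `beta1`, `beta2`, `eps`) are assumptions made for this example, not values fixed by the project.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array (hypothetical helper).

    m and v are the running first- and second-moment estimates, and t is the
    1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
```

In a full network, one `(m, v)` pair would be kept per weight matrix and bias vector, and the same step would be applied to each after every mini-batch.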

Dropout and weight decay will be used as regularization.
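A minimal sketch of both regularizers follows, assuming inverted dropout and an L2 penalty. The helper names are hypothetical, and `drop_prob` here means the probability of zeroing a unit, which may or may not match how the `dropout` constructor argument is eventually interpreted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, drop_prob, training=True):
    """Inverted dropout: zero units with probability drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return h, np.ones_like(h)
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected activation unchanged,
    # so no rescaling is needed at prediction time.
    mask = (rng.random(h.shape) < keep_prob) / keep_prob
    return h * mask, mask

def l2_penalty(weights, lamb):
    """L2 term added to the loss: (lamb / 2) * sum of squared weights."""
    return 0.5 * lamb * sum(np.sum(W ** 2) for W in weights)

def l2_grad(W, lamb):
    """Gradient of the L2 term with respect to W, added to the data gradient."""
    return lamb * W

# Usage on a dummy activation matrix.
h = np.ones((4, 5))
h_drop, mask = dropout_forward(h, drop_prob=0.5)
h_eval, _ = dropout_forward(h, drop_prob=0.5, training=False)
penalty = l2_penalty([np.array([[1.0, 2.0]])], lamb=0.2)
```

The mask returned by `dropout_forward` would be reused in the backward pass, so that gradients flow only through the units that were kept.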

Early stopping, learning rate scheduler, and momentum will all be used to help convergence. Layer normalization will be added as well.
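The following sketch shows one possible shape for early stopping and a learning rate schedule. The `EarlyStopping` class, the `step_decay` function, and their parameters (`patience`, `min_delta`, `drop`, `every`) are illustrative assumptions; the project may end up using a different stopping rule or schedule.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return initial_lr * drop ** (epoch // every)

# Usage with a made-up sequence of validation losses.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
stopper = EarlyStopping(patience=3)
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
lr_at_epoch_25 = step_decay(0.01, epoch=25)
```

In practice the weights from the best epoch would typically be saved as well, so they can be restored once training stops.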

Lastly, the softmax and ReLU activation functions have been implemented.

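Dimensions (for the entire dataset), where $D$ is the number of input features, $N$ the number of samples, $L$ the number of hidden units, and $K$ the number of classes: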
- $X$ = D x N
- $W^{(1)}$ = L x D  
- $b^{(1)}$ = L x 1
- $m$ = L x N
- $h$ = L x N
- $W^{(2)}$ = K x L
- $b^{(2)}$ = K x 1
- $z$ = K x N
- $y$ = K x N <br>

The forward pass computes the following quantities, with the shapes listed above:

**Forward Propagation:**
$$m = W^{(1)}x + b^{(1)}$$
$$h = \mathrm{ReLU}(m)$$
$$z = W^{(2)}h + b^{(2)}$$
$$y = \mathrm{softmax}(z)$$
$$L = L_{CE}(y, t)$$<br>

**Backpropagation:**

For a batch of data, the forward pass and backpropagation will look like:
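One way to write this for the single-hidden-layer equations above is sketched below. It assumes the column layout from the dimensions list (each column of `X` is one sample), one-hot targets `T` of shape K x N, and a loss averaged over the batch. The function name `forward_backward` is hypothetical. Note that the `NeuralNetwork` class stores samples as rows instead, so this sketch would need transposing to fit it.

```python
import numpy as np

def forward_backward(X, T, W1, b1, W2, b2):
    """Forward and backward pass for one batch.

    X: D x N inputs, T: K x N one-hot targets.
    Returns the mean cross-entropy loss and a dict of gradients.
    """
    N = X.shape[1]

    # Forward pass (same symbols as the equations above)
    m = W1 @ X + b1                                   # L x N
    h = np.maximum(0, m)                              # L x N
    z = W2 @ h + b2                                   # K x N
    e = np.exp(z - z.max(axis=0, keepdims=True))      # shift for stability
    y = e / e.sum(axis=0, keepdims=True)              # K x N, columns sum to 1
    loss = -np.sum(T * np.log(y + 1e-12)) / N

    # Backward pass: softmax + cross-entropy gives dL/dz = (y - t) / N
    dz = (y - T) / N                                  # K x N
    dW2 = dz @ h.T                                    # K x L
    db2 = dz.sum(axis=1, keepdims=True)               # K x 1
    dh = W2.T @ dz                                    # L x N
    dm = dh * (m > 0)                                 # ReLU passes gradient where m > 0
    dW1 = dm @ X.T                                    # L x D
    db1 = dm.sum(axis=1, keepdims=True)               # L x 1
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

# Usage on small random data (sizes chosen only for illustration).
rng = np.random.default_rng(0)
D, L, K, N = 6, 5, 3, 4
X = rng.normal(size=(D, N))
T = np.eye(K)[:, rng.integers(0, K, size=N)]          # one-hot columns
W1 = rng.normal(scale=0.1, size=(L, D))
b1 = np.zeros((L, 1))
W2 = rng.normal(scale=0.1, size=(K, L))
b2 = np.zeros((K, 1))
loss, grads = forward_backward(X, T, W1, b1, W2, b2)
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check before comparing against numerical gradients.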


In [1]:
class NeuralNetwork():
  """
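  Non-modular feed-forward network for Fashion MNIST classification.
  Most methods are placeholders that are still to be implemented.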
  """
  def __init__(self, dropout=0.8, learning_rate=0.01, regularization=None, lamb=0.2, batch_size=32, momentum=0.9):
    self.W1 = None
    self.W2 = None
    self.b1 = None
    self.b2 = None
    self.momentum = momentum
    self.batch_size = batch_size
    self.dropout = dropout
    self.learning_rate = learning_rate
    self.regularization = regularization
    self.lamb = lamb

  def fit(self, X, y):
    pass

  def forward_prop(self):
    pass

  def backwardpropagation(self):
    pass

  def gradient_descent(self):
    pass

  def compute_loss(self):
    pass

  def adam_optimizer(self):
    pass

  def predict(self):
    pass

  def relu(self, o):
    return np.maximum(0, o)

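  def softmax(self, o):
    # Assumes o has one sample per row. Subtracting the row-wise max before
    # exponentiating avoids overflow without changing the result;
    # keepdims=True keeps the reduced axis so the division broadcasts per row.
    e = np.exp(o - np.max(o, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)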

  def mini_batch(self, t, X, N):
    # Split X and the targets t into consecutive batches of size self.batch_size.
    batches = {}
    n_batches = N // self.batch_size
    for i in range(n_batches):
      start, end = i * self.batch_size, (i + 1) * self.batch_size
      batches[i] = [X[start:end], t[start:end]]

    # Any leftover samples form a final batch, even if it is smaller than batch_size.
    if N % self.batch_size != 0:
      batches[n_batches] = [X[n_batches * self.batch_size:], t[n_batches * self.batch_size:]]

    return batches

  def layer_normalization(self):
    pass




## Modular Implementation of a Neural Network

This section divides the previous implementation into several classes that can be combined to form the entire network. The main purpose is to start writing modular code and to create reusable classes for future projects.
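A minimal sketch of how such a modular design might look is given below, with each layer exposing `forward` and `backward`. The `Sequential` container, the He-style initialization, and the hidden sizes 128 and 64 are assumptions made for this example. Samples are stored as rows here, matching the slicing in `mini_batch`.

```python
import numpy as np

class Linear:
    """Fully connected layer: out = x @ W + b, with x of shape N x in_features."""

    def __init__(self, in_features, out_features, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # He-style initialisation (an assumption, chosen because ReLU follows).
        scale = np.sqrt(2.0 / in_features)
        self.W = rng.normal(scale=scale, size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x                       # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        self.dW = self.x.T @ grad_out    # gradient w.r.t. the weights
        self.db = grad_out.sum(axis=0)   # gradient w.r.t. the bias
        return grad_out @ self.W.T       # gradient w.r.t. the layer input

class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask

class Sequential:
    """Chains layers: forward in order, backward in reverse order."""

    def __init__(self, *layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad

# Usage: 784 inputs (28 x 28 pixels), two hidden layers, 10 output classes.
rng = np.random.default_rng(42)
model = Sequential(
    Linear(784, 128, rng), ReLU(),
    Linear(128, 64, rng), ReLU(),
    Linear(64, 10, rng),
)
logits = model.forward(np.zeros((32, 784)))
grad_in = model.backward(np.ones_like(logits))
```

Dropout and Softmax could follow the same pattern, each caching whatever its `backward` method needs during `forward`.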

## Importing the Data

## Train/Test Split

## Exploratory Data Analysis (EDA)

## Data Augmentation

Common techniques for augmenting images are the following:
- tbd
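While the list is still to be filled in, the sketch below illustrates two augmentations that are often applied to small grayscale images: random horizontal flips and small random translations. Both are assumptions for illustration and not choices confirmed by this project. The function names and the `max_shift` parameter are hypothetical.

```python
import numpy as np

def random_horizontal_flip(images, rng, p=0.5):
    """Flip each image left-right with probability p. images: N x H x W."""
    flip = rng.random(len(images)) < p
    out = images.copy()
    out[flip] = out[flip, :, ::-1]
    return out

def random_shift(images, rng, max_shift=2):
    """Translate each image by up to max_shift pixels per axis, padding with zeros."""
    n, h, w = images.shape
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift))
    padded = np.pad(images, pad)
    out = np.empty_like(images)
    for i in range(n):
        # Pick the top-left corner of an H x W window inside the padded image.
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out

# Usage on a dummy batch with the Fashion MNIST image size (28 x 28).
rng = np.random.default_rng(0)
batch = rng.random((8, 28, 28))
flipped = random_horizontal_flip(batch, rng, p=1.0)
shifted = random_shift(batch, rng, max_shift=2)
```

Whether a flip is appropriate depends on the classes involved, so any augmentation is best checked visually on a few samples before training.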

## Data Preprocessing

## Building and Training the Model

## Hyperparameter Tuning

## Test Set Evaluation

## Other Metrics

## Analysis of Model Prediction

## Testing of Implementations