# Fully Connected Networks

One of the most common and widely used architectures is the Fully Connected Network (FCN), which is a stack of fully connected layers. Throughout this class, we will use fully connected layers in combination with more sophisticated layers such as convolutional, recurrent, and attention layers. Therefore, it is crucial to understand fully connected layers and networks.

First, we implement the affine layer (a.k.a. dense or fully connected layer) in the ```nn``` folder. ```AffineLayer``` inherits from the ```Module``` class, which is responsible for gathering parameters and changing the mode of its child layers. Remember that when we assign an ```Array``` object to a module and that object is a parameter (```is_parameter``` is ```True```), the ```Module``` class (which all layers inherit from) saves it to its parameters for future use.
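
The parameter-gathering behaviour described above can be sketched as follows. This is only an illustration: the ```Array``` and ```Module``` classes here are simplified, hypothetical stand-ins, and the real classes in the ```nn``` folder may be organised differently.

```python
class Array:
    """Hypothetical stand-in for the course's Array type."""

    def __init__(self, value, is_parameter=False):
        self.value = value
        self.is_parameter = is_parameter


class Module:
    """Registers parameter Arrays and child modules as they are assigned."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the registries.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        if isinstance(value, Array) and value.is_parameter:
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._children[name] = value
        object.__setattr__(self, name, value)

    @property
    def parameters(self):
        # Own parameters first, then those of every child, recursively.
        params = list(self._parameters.values())
        for child in self._children.values():
            params.extend(child.parameters)
        return params


class TinyLayer(Module):
    def __init__(self):
        super().__init__()
        self.weight = Array([[0.0]], is_parameter=True)  # registered
        self.cache = Array([0.0])                        # not a parameter


class TinyNet(Module):
    def __init__(self):
        super().__init__()
        self.first = TinyLayer()
        self.second = TinyLayer()


net = TinyNet()
n_params = len(net.parameters)  # one weight per layer
```

Because registration happens on assignment, an optimizer can later be handed ```net.parameters``` without the network listing its weights by hand.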

Initialize the weights using [Glorot initialization](https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf) (Assume that we use ```relu``` activation). 

> Complete ```AffineLayer``` in ```layers.py```
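
As a rough guide, Glorot (Xavier) uniform initialization draws weights from a uniform distribution whose range depends on the layer's fan-in and fan-out. The sketch below uses plain NumPy rather than the course's ```Array``` type. The ```gain``` argument and the value `sqrt(2)` for ReLU are assumptions borrowed from common practice, not something the assignment specifies; plain Glorot corresponds to `gain=1.0`.

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, gain=1.0, rng=None):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    rng = np.random.default_rng() if rng is None else rng
    limit = gain * np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


rng = np.random.default_rng(0)
# Assumption: scale by sqrt(2) to account for the ReLU activation.
weights = glorot_uniform(784, 128, gain=np.sqrt(2.0), rng=rng)
bias = np.zeros(128)  # biases are commonly initialized to zero
limit = np.sqrt(2.0) * np.sqrt(6.0 / (784 + 128))
```

With this choice the weight variance is `gain**2 * 2 / (fan_in + fan_out)`, which keeps the scale of activations roughly constant from layer to layer.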

We can define a neural network for classification tasks using the activation functions and affine layers. You should use at least two affine layers. The number of hidden units and the number of layers are up to you for the rest of the neural networks.

> Complete ```FCN``` in ```layers.py``` (See ```ExampleFCN``` in ```layers.py```)
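
To make the structure concrete, here is a minimal NumPy sketch of the forward pass of a two-layer classifier: affine, activation, affine, then softmax. The hidden size of 64, the weight scale, and the helper names are arbitrary assumptions for illustration; the actual ```FCN``` should be built from ```AffineLayer``` objects and the autograd functions.

```python
import numpy as np


def affine(x, w, b):
    """Affine map: one row of x per sample."""
    return x @ w + b


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 64, 10  # hidden size chosen arbitrarily
w1, b1 = rng.normal(scale=0.05, size=(n_in, n_hidden)), np.zeros(n_hidden)
w2, b2 = rng.normal(scale=0.05, size=(n_hidden, n_out)), np.zeros(n_out)

x = rng.random((32, n_in))                   # a batch of 32 flattened images
hidden = np.maximum(affine(x, w1, b1), 0.0)  # ReLU activation
probs = softmax(affine(hidden, w2, b2))      # class probabilities per sample
```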

Second, we will experiment with multiple activation functions and compare them on the [Fashion-MNIST](https://www.kaggle.com/datasets/zalando-research/fashionmnist) dataset. Before we can start training, we need a data loader and a trainer class that takes care of logging, training, and evaluation.

> Complete ```DataLoader``` in ```loader.py```
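
A data loader of this kind typically shuffles the sample indices once per epoch and then yields fixed-size mini-batches. The sketch below is a hypothetical, simplified version (the class name ```MiniBatchLoader``` and the ```shuffle``` and ```rng``` arguments are assumptions); the interface expected in ```loader.py``` may differ.

```python
import numpy as np


class MiniBatchLoader:
    """Hypothetical loader that yields (data, labels) mini-batches."""

    def __init__(self, data, labels, batch_size, shuffle=True, rng=None):
        assert len(data) == len(labels)
        self.data, self.labels = data, labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng() if rng is None else rng

    def __len__(self):
        # Number of batches per epoch; the last one may be smaller.
        return int(np.ceil(len(self.data) / self.batch_size))

    def __iter__(self):
        order = np.arange(len(self.data))
        if self.shuffle:
            self.rng.shuffle(order)  # new order on every epoch
        for start in range(0, len(order), self.batch_size):
            batch = order[start:start + self.batch_size]
            yield self.data[batch], self.labels[batch]


data = np.arange(100, dtype=float).reshape(100, 1)
labels = np.arange(100)
loader = MiniBatchLoader(data, labels, batch_size=32,
                         rng=np.random.default_rng(0))
batch_sizes = [len(x) for x, _ in loader]
```

Indexing data and labels with the same index array keeps each sample paired with its label after shuffling.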

> Complete ```Train``` in ```trainer.py```

> Complete ```SGD``` in ```optimization.py```
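
For reference, one SGD step with L2 regularization can be written in a few lines of NumPy. This sketch assumes the penalty is `(l2_reg_coeff / 2) * ||w||^2`, so that its gradient is `l2_reg_coeff * w`; the function name and the plain-array interface are assumptions, and the ```SGD``` class in ```optimization.py``` works on ```Array``` parameters instead.

```python
import numpy as np


def sgd_step(params, grads, lr, l2_reg_coeff=0.0):
    """Update each parameter array in place: w <- w - lr * (g + l2 * w)."""
    for p, g in zip(params, grads):
        p -= lr * (g + l2_reg_coeff * p)


w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
sgd_step([w], [grad], lr=0.1, l2_reg_coeff=0.01)
# w is now approximately [0.949, -2.048]
```

With `l2_reg_coeff = 0.0`, as in the default hyperparameters of this notebook, the update reduces to plain gradient descent.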

![fashion-mnist](fashion-mnist-sprite.png)

### Activation Function Experiments

- We start by dividing the dataset into training, evaluation, and test sets.

- Run the ```train_fcn``` function for each activation function and save the logs.
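
For orientation, the four activation functions compared in this experiment can be written in NumPy as below. The leaky ReLU slope of `0.01` is an assumed default and may differ from the value used in ```autograd.functions```.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, slope=0.01):
    # Assumed slope for negative inputs; the course code may use another.
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    return np.tanh(x)


def sigmoid_grad(x):
    # Derivative of the sigmoid; it never exceeds 0.25.
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.array([-2.0, 0.0, 3.0])
outputs = {fn.__name__: fn(x) for fn in (relu, leaky_relu, sigmoid, tanh)}
```

The small derivative of the saturating functions is one thing to keep in mind when comparing the training curves.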


In [59]:
%load_ext autoreload
%autoreload 2



In [60]:
import numpy as np

from nn.layers import FCN, BatchNormFCN, DropoutFCN, MaxoutFCN
from nn.loader import DataLoader, FashionMnistDataset
from nn.logger import Logger
from nn.optimization import SGD
from nn.trainer import Train

import autograd.functions


data = FashionMnistDataset.load()

indices = np.random.permutation(len(data["train_data"]))
train_loader = DataLoader(data=data["train_data"][indices[:40000]],
                          labels=data["train_labels"][indices[:40000]],
                          batch_size=32)
eval_loader = DataLoader(data=data["train_data"][indices[40000:]],
                         labels=data["train_labels"][indices[40000:]],
                         batch_size=32)
test_loader = DataLoader(data=data["test_data"],
                         labels=data["test_labels"],
                         batch_size=32)


def train_fcn(network: FCN, log_name: str, logger: Logger, lr: float, l2_reg_coeff: float, epoch: int) -> Train:
    """ Fully Connected Network Trainer

    Args:
        network (FCN): FCN or a network object that inherits FCN
        log_name (str): Name of the log file
        logger (Logger): Logger object that shows the training progress
        lr (float): Learning rate
        l2_reg_coeff (float): L2 regularization coefficient
        epoch (int): Number of epochs to train the model

    Returns:
        Train: Trainer object
    """
    logger.reset()
    optimizer = SGD(network.parameters, lr, l2_reg_coeff)
    train = Train(network, optimizer)
    train.fit(train_data_loader=train_loader,
              eval_data_loader=eval_loader, epochs=epoch, logger=logger)
    logger.save_logs(f"logs/{log_name}.json")
    return train


logger = Logger(verbose=False, live_figure_update=True)
logger.render()



In [61]:
# Hyperparameters (You may need to tune them)
lr = 0.0025
l2_reg_coeff = 0.00


fn_names = ("relu", "tanh", "sigmoid", "leaky_relu")
for fn_name in fn_names:
    logger.reset()
    net = FCN(784, 10, activation_fn=getattr(autograd.functions, fn_name))
    train_fcn(net, fn_name, logger, lr, l2_reg_coeff, epoch=15)

You can start the experiments by running the cell above. Once all training runs are complete and the log files are saved, you can run the cell below to compare their accuracies.

In [62]:
Logger.compare({f"BN+{fn_name}": f"logs/{fn_name}.json" for fn_name in fn_names})

[Figure: train and eval accuracy over epochs 0 to 14 for each activation function, as drawn by ```Logger.compare```. The widget output is truncated; in the recorded run, the train accuracy of the `BN+relu` trace rises from 0.6152 at epoch 0 to 0.8394 at epoch 14.]

Keep the log files in your submissions.

## Report

Report your observations about the above experiment. Write your comments below.



### Layer Experiments

In this experiment, we will implement and compare Fully Connected Networks with BatchNorm layer(s), Dropout layer(s), and Maxout layer(s).

We start with batch normalization. See the [Batch Normalization paper](http://proceedings.mlr.press/v37/ioffe15.pdf) for details.

> Complete ```BatchNorm``` layer in ```layers.py```
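
The forward pass of batch normalization can be sketched in NumPy as follows. In training mode it normalizes with the statistics of the current batch and updates running averages; in evaluation mode it uses the running averages. The class name, the `momentum` convention, and the `training` flag are assumptions for this sketch; the ```BatchNorm``` layer in ```layers.py``` should follow the ```Module``` mode mechanism and work with ```Array``` objects so gradients can flow.

```python
import numpy as np


class BatchNorm1D:
    """Hypothetical forward-only batch norm over the batch axis."""

    def __init__(self, n_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(n_features)   # learnable scale
        self.beta = np.zeros(n_features)   # learnable shift
        self.running_mean = np.zeros(n_features)
        self.running_var = np.ones(n_features)
        self.momentum = momentum
        self.eps = eps
        self.training = True

    def forward(self, x):
        if self.training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Exponential moving averages used later in evaluation mode.
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
bn = BatchNorm1D(4)
train_out = bn.forward(x)   # per-feature mean ~0 and std ~1

bn.training = False
eval_out = bn.forward(x)    # uses running statistics instead
```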

Now we can use the BatchNorm layer in an FCN. We build a new FCN called ```BatchNormFCN```.

> Complete ```BatchNormFCN``` network in ```layers.py```

Run BatchNormFCN.


In [63]:
# We will use the same logger for the rest of the training runs in this experiment
logger = Logger(verbose=False, live_figure_update=True)
logger.render()


In [64]:
# Hyperparameters (You may need to tune them)
lr = 0.0025
l2_reg_coeff = 0.0
activation_fn = autograd.functions.leaky_relu

network = BatchNormFCN(784, 10, activation_fn)

train_fcn(network, "FCN+BN", logger, lr=lr, l2_reg_coeff=l2_reg_coeff, epoch=15)


Next, we will implement a dropout layer and build an FCN with batchnorm and dropout layers.

See the [Dropout paper](https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf) for details.

> Complete ```Dropout``` in ```layers.py```

> Complete ```DropoutFCN``` in ```layers.py```
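
A minimal NumPy sketch of dropout is shown below. It uses the "inverted" variant, which rescales the kept activations during training so that nothing needs to change at evaluation time; the original paper instead scales at test time, and the two are equivalent in expectation. The function signature is an assumption, and the ```Dropout``` layer in ```layers.py``` should read the mode from its ```Module``` base class rather than take a `training` argument.

```python
import numpy as np


def dropout(x, p_drop, training, rng):
    """Inverted dropout: zero each unit with probability p_drop."""
    if not training or p_drop == 0.0:
        return x  # identity in evaluation mode
    keep_mask = rng.random(x.shape) >= p_drop
    # Rescale so the expected activation matches evaluation mode.
    return x * keep_mask / (1.0 - p_drop)


rng = np.random.default_rng(0)
x = np.ones((1000, 100))
train_out = dropout(x, p_drop=0.1, training=True, rng=rng)
eval_out = dropout(x, p_drop=0.1, training=False, rng=rng)
```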



In [87]:
# Hyperparameters (You may need to tune them)
lr = 0.0025
l2_reg_coeff = 0.0
p_drop = 0.1  # Drop probability
activation_fn = autograd.functions.leaky_relu

network = DropoutFCN(784, 10, activation_fn, p_drop=p_drop)
train_fcn(network, "FCN+BN+Dropout", logger, lr=lr, l2_reg_coeff=l2_reg_coeff, epoch=15)


Finally, we will implement and train a Maxout network with batchnorm and dropout layers.

See the [Maxout paper](https://proceedings.mlr.press/v28/goodfellow13.pdf) for details.

> Complete ```MaxoutFCN``` in ```layers.py```
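
A maxout unit computes several affine maps of the same input and keeps the element-wise maximum, so it needs no separate activation function. The NumPy sketch below assumes the weights are stacked as `(k, n_in, n_out)`, where `k` plays the role of ```n_affine_outputs```; this layout and the function name are illustrative choices, not the required design of ```MaxoutFCN```.

```python
import numpy as np


def maxout(x, weights, biases):
    """x: (n, n_in), weights: (k, n_in, n_out), biases: (k, n_out)."""
    # z[n, k, o] is the output of the k-th affine map for sample n.
    z = np.einsum("ni,kio->nko", x, weights) + biases
    return z.max(axis=1)  # maximum over the k affine outputs


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
weights = rng.normal(size=(4, 5, 3))  # k = 4 affine maps
biases = rng.normal(size=(4, 3))
out = maxout(x, weights, biases)      # shape (8, 3)

# Special case: with an identity map and a zero map, maxout equals ReLU.
relu_like = maxout(x[:, :3],
                   np.stack([np.eye(3), np.zeros((3, 3))]),
                   np.zeros((2, 3)))
```

The special case illustrates why a maxout layer can be seen as a learned, piecewise-linear activation.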

In [None]:
# Hyperparameters (You may need to tune them)
lr = 0.0025
l2_reg_coeff = 0.0
p_drop = 0.1  # Drop probability.
n_affine_outputs = 5  # Number of affine outputs to maxout

network = MaxoutFCN(784, 10, p_drop=p_drop, n_affine_outputs=n_affine_outputs)
train_fcn(network, "FCN+BN+Dropout+Maxout", logger, lr=lr, l2_reg_coeff=l2_reg_coeff, epoch=15)

We can compare the accuracies of the networks you trained so far. Run the cell below.

In [225]:
log_names = ["FCN+BN", "FCN+BN+Dropout", "FCN+BN+Dropout+Maxout"]
Logger.compare({net_name: f"logs/{net_name}.json" for net_name in log_names})

[Figure: train and eval accuracy over epochs 0 to 14 for the compared networks, as drawn by ```Logger.compare```. The widget output is truncated; in the recorded run, the train accuracy of the `FCN+BN` trace rises from 0.42475 at epoch 0 to 0.7639 at epoch 14.]

Keep the log files in your submissions.

## Report

Report your observations about the above experiment. Write your comments below.



### Testing

Up to this point, we have not used the test data. After the experiments, you probably have a preferred model and its hyperparameters. You can now run that model on the test data using the test data loader and create a confusion matrix. This confusion matrix represents the final performance.
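
As a reminder of what the confusion matrix contains, here is a small NumPy sketch. It assumes that predictions and labels are integer class indices and that rows correspond to true classes and columns to predicted classes; ```trainer.confusion_matrix``` may use a different convention.

```python
import numpy as np


def confusion_matrix(predictions, labels, n_classes):
    """Entry [i, j] counts samples of true class i predicted as class j."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(matrix, (labels, predictions), 1)
    return matrix


labels = np.array([0, 0, 1, 1, 2, 2])
predictions = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(predictions, labels, n_classes=3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
```

Off-diagonal entries show which classes are confused with each other, which the overall accuracy alone does not reveal.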

In [None]:
logger = Logger(verbose=False, live_figure_update=True)
logger.render()

In [None]:
# Hyperparameters (You should tune these)
lr = 0.0025
l2_reg_coeff = 0.0

# Define a model
network = ...  # TODO: replace with your preferred model

trainer = train_fcn(network, "Final Model", logger, lr=lr, l2_reg_coeff=l2_reg_coeff, epoch=25)
predictions, labels = trainer.test(test_loader)
confusion_matrix = trainer.confusion_matrix(predictions, labels)

logger.render_confusion_matrix(confusion_matrix)



## Report

Explain why you chose the model and the hyperparameters you used for testing. Write your comments below.

