# MNS - Biologically Plausible Deep Learning
## Simple Backprop Baselines (DNNs and CNNs)

#### 1. Data Setup - Download and Loading
#### 2. PyTorch DNNs with Bayesian Optimization
#### 3. PyTorch CNNs with Bayesian Optimization
#### 4. Run Example 784/3072 - 500 - 10 Architecture on all Datasets 

In [None]:
!pip install -r requirements.txt --quiet

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [2]:
# Import Packages
import os
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

# Import tf for tensorboard monitoring of training
import tensorflow as tf

# Import Network Architectures
from models.DNN import DNN, eval_dnn
from models.CNN import CNN, eval_cnn

# Import log-helper/learning plot functions
from utils.helpers import *
from utils.logger import *

# Import Bayesian Optimization Module
from utils.bayesian_opt import BO_NN
from sklearn.model_selection import train_test_split

In [3]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    print("Torch Device: {}".format(torch.cuda.get_device_name(0)))
else:
    print("Torch Device: Local CPU")

Torch Device: GeForce GTX 1080


In [4]:
# Create all necessary directories if they do not exist
data_dir = os.path.join(os.getcwd(), "data")

if not os.path.exists(data_dir):
    os.makedirs(data_dir)
    print("Created New Data Directory")

# Create Log Directory or remove tensorboard log files in log dir
log_dir = os.getcwd() + "/logs"
if not os.path.exists(log_dir):
    os.makedirs(log_dir)
    print("Created New Log Directory")
else:
    filelist = [ f for f in os.listdir(log_dir) if f.startswith("events")]
    for f in filelist:
        os.remove(os.path.join(log_dir, f))
    print("Deleted Old TF/TensorBoard Log Files in Existing Log Directory")
    
models_dir = os.getcwd() + "/models"

Deleted Old TF/TensorBoard Log Files in Existing Log Directory


# Download, Import and Plot Datasets

In [5]:
download_data()

No download of MNIST needed.
No download of Fashion-MNIST needed.
No download of CIFAR-10 needed.


In [6]:
# MNIST dataset
X_mnist, y_mnist = get_data(num_samples=70000, dataset="mnist")
# Fashion-MNIST dataset
X_fashion, y_fashion = get_data(num_samples=70000, dataset="fashion")
# CIFAR-10 dataset
X_cifar10, y_cifar10 = get_data(num_samples=60000, dataset="cifar10")

# Simple Feedforward Neural Net

### Run a Simple DNN

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X_mnist, y_mnist,
                                                    stratify=y_mnist,
                                                    random_state=0)

# Batch size, number of epochs, and learning rate for training
batch_size = 100
num_epochs = 5
learning_rate = 0.001

# Instantiate the model: 784 inputs, one hidden layer with 500 units, 10 outputs
dnn_model = DNN(h_sizes=[784, 500], out_size=10)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(dnn_model.parameters(), lr=learning_rate)
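
For orientation, the size of this baseline can be checked by hand. The sketch below counts the trainable parameters of a fully connected 784 - 500 - 10 network and of the 3072 - 500 - 10 variant used for CIFAR-10. It assumes every layer is a plain linear layer with a bias term; whether `models.DNN` is built exactly this way is an assumption, and `count_dense_params` is a hypothetical helper that is not part of this repository.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    Assumes each consecutive pair (n_in, n_out) is one linear layer
    with n_in * n_out weights and n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


mnist_params = count_dense_params([784, 500, 10])    # MNIST / Fashion-MNIST input size
cifar_params = count_dense_params([3072, 500, 10])   # CIFAR-10 input size (32 * 32 * 3)

print(mnist_params)  # 397510
print(cifar_params)  # 1541510
```

Under these assumptions the CIFAR-10 network has roughly four times as many parameters, almost all of them in the first layer.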

In [8]:
model = train_model("dnn", dnn_model, num_epochs,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion,
                    log_freq=20000,
                    model_fname="temp_model_dnn_mnist",
                    verbose=True, logging=True)

# Evaluate accuracy on the held-out test set
score = get_test_error("dnn", device, model, X_test, y_test)
print("Test Accuracy: {}".format(score))

train| epoch  1| batch 20000/41996| acc: 0.9225| loss: 0.2695| time: 0.09
valid| epoch  1| batch 20000/41996| acc: 0.9190| loss: 0.2800| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 40000/41996| acc: 0.9477| loss: 0.1822| time: 0.12
valid| epoch  1| batch 40000/41996| acc: 0.9427| loss: 0.1993| time: 0.00
-------------------------------------------------------------------------
train| epoch  2| batch 20000/41996| acc: 0.9638| loss: 0.1296| time: 0.08
valid| epoch  2| batch 20000/41996| acc: 0.9577| loss: 0.1501| time: 0.00
-------------------------------------------------------------------------
train| epoch  2| batch 40000/41996| acc: 0.9714| loss: 0.1014| time: 0.10
valid| epoch  2| batch 40000/41996| acc: 0.9617| loss: 0.1289| time: 0.00
-------------------------------------------------------------------------
train| epoch  3| batch 20000/41996| acc: 0.9764| loss: 0.0818| time: 0.13
valid| epoch  3| batch 20000/41996| ac

### Compute Cross-Validation Accuracy for All 3 Datasets

In [9]:
# Run 3-fold cross-validation on specific architecture for MNIST
eval_dnn("mnist", batch_size, learning_rate,
         num_layers=1, h_l_1=500,
         num_epochs=5, k_fold=3, verbose=True)

Dataset: mnist
Batchsize: 100
Learning Rate: 0.001
Architecture of Cross-Validated Network:
	 Layer 0: 784 Units
	 Layer 1: 500 Units
Cross-Validation Score Fold 1: 0.9724892208251564
Cross-Validation Score Fold 2: 0.9754425623050814
Cross-Validation Score Fold 3: 0.9727827625760712


0.9735715152354363
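
The value printed after the per-fold scores is consistent with the arithmetic mean of the folds. A minimal check, using the fold scores copied from the cell outputs in this section (that `eval_dnn` averages the folds internally is inferred from the numbers, not from its source):

```python
from statistics import mean

# Per-fold scores copied from the eval_dnn outputs in this section
fold_scores = {
    "mnist": [0.9724892208251564, 0.9754425623050814, 0.9727827625760712],
    "fashion": [0.8804627249357326, 0.8805400771538793, 0.8412773253321904],
    "cifar10": [0.4405, 0.42475000000000007, 0.3507],
}

# Average the folds to get one cross-validation score per dataset
cv_scores = {name: mean(scores) for name, scores in fold_scores.items()}

for name, score in cv_scores.items():
    spread = max(fold_scores[name]) - min(fold_scores[name])
    print("{:8s} mean: {:.4f} | max-min spread: {:.4f}".format(name, score, spread))
```

The spread between folds is worth reporting alongside the mean: for Fashion-MNIST and CIFAR-10 the third fold is noticeably lower than the other two.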

In [10]:
# Run 3-fold cross-validation on specific architecture for Fashion-MNIST
eval_dnn("fashion", batch_size, learning_rate,
         num_layers=1, h_l_1=500,
         num_epochs=5, k_fold=3, verbose=True)

Dataset: fashion
Batchsize: 100
Learning Rate: 0.001
Architecture of Cross-Validated Network:
	 Layer 0: 784 Units
	 Layer 1: 500 Units
Cross-Validation Score Fold 1: 0.8804627249357326
Cross-Validation Score Fold 2: 0.8805400771538793
Cross-Validation Score Fold 3: 0.8412773253321904


0.8674267091406008

In [11]:
# Run 3-fold cross-validation on specific architecture for CIFAR-10
eval_dnn("cifar10", batch_size, learning_rate,
         num_layers=1, h_l_1=500,
         num_epochs=5, k_fold=3, verbose=True)

Dataset: cifar10
Batchsize: 100
Learning Rate: 0.001
Architecture of Cross-Validated Network:
	 Layer 0: 3072 Units
	 Layer 1: 500 Units
Cross-Validation Score Fold 1: 0.4405
Cross-Validation Score Fold 2: 0.42475000000000007
Cross-Validation Score Fold 3: 0.3507


0.4053166666666667

### Run Bayesian Optimization on DNN Hyperparameters

In [12]:
# Define Search Hyperspace for Bayesian Optimization on DNN architectures
hyper_space_dnn = {'batch_size': (50, 500),
                   'learning_rate': (0.0001, 0.05),
                   'num_layers': (1, 6),
                   'h_l_1': (30, 500),
                   'h_l_2': (30, 500),
                   'h_l_3': (30, 500),
                   'h_l_4': (30, 500),
                   'h_l_5': (30, 500),
                   'h_l_6': (30, 500)}

bo_iters = 50
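
The runs in this section use an upper confidence bound (UCB) acquisition function. As a reminder of what that means, the sketch below scores candidate points by `mean + kappa * std`, where `mean` and `std` would come from the surrogate model's posterior (typically a Gaussian process). The internals of `BO_NN` are not shown in this notebook, so the function names, the candidate values, and `kappa=2.0` are illustrative assumptions.

```python
def ucb(mean, std, kappa=2.0):
    """Upper confidence bound: predicted value plus an exploration bonus."""
    return mean + kappa * std


def pick_next(candidates, kappa=2.0):
    """Return the candidate with the highest UCB score.

    candidates: list of (label, posterior_mean, posterior_std) tuples.
    """
    return max(candidates, key=lambda c: ucb(c[1], c[2], kappa))


# Made-up posterior values, for illustration only
candidates = [
    ("explored region", 0.96, 0.01),    # UCB = 0.96 + 2 * 0.01 = 0.98
    ("unexplored region", 0.90, 0.06),  # UCB = 0.90 + 2 * 0.06 = 1.02
]

exploring = pick_next(candidates, kappa=2.0)
greedy = pick_next(candidates, kappa=0.0)

print(exploring[0])  # unexplored region
print(greedy[0])     # explored region
```

A larger `kappa` favours uncertain regions of the search space, which is one reason individual BO iterations can return low cross-validation accuracy while the best score so far keeps improving.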

In [13]:
# Run Bayesian Optimization (UCB-Acquisition Fct) on DNN for MNIST
opt_log = BO_NN(bo_iters, eval_dnn, "dnn", "mnist", hyper_space_dnn,
                num_epochs=10, k_fold=3, logging=True, verbose=True)

Loaded previously existing Log with 0 BO iterations.
Start Logging to ./logs/bo_logs_dnn_mnist.json
BO iter  1 | cv-acc: 0.9601 | best-acc: 0.9601 | time: 32.10
BO iter  2 | cv-acc: 0.9682 | best-acc: 0.9682 | time: 39.05
BO iter  3 | cv-acc: 0.9280 | best-acc: 0.9682 | time: 29.31
BO iter  4 | cv-acc: 0.5533 | best-acc: 0.9682 | time: 85.45
BO iter  5 | cv-acc: 0.9651 | best-acc: 0.9682 | time: 140.76
BO iter  6 | cv-acc: 0.9226 | best-acc: 0.9682 | time: 27.52
BO iter  7 | cv-acc: 0.2709 | best-acc: 0.9682 | time: 36.87
BO iter  8 | cv-acc: 0.9618 | best-acc: 0.9682 | time: 73.92
BO iter  9 | cv-acc: 0.9625 | best-acc: 0.9682 | time: 73.54
BO iter 10 | cv-acc: 0.9210 | best-acc: 0.9682 | time: 86.24
BO iter 11 | cv-acc: 0.8198 | best-acc: 0.9682 | time: 75.88
BO iter 12 | cv-acc: 0.9254 | best-acc: 0.9682 | time: 36.45
BO iter 13 | cv-acc: 0.2719 | best-acc: 0.9682 | time: 143.51
BO iter 14 | cv-acc: 0.8761 | best-acc: 0.9682 | time: 29.99
BO iter 15 | cv-acc: 0.9216 | best-acc: 0.96

In [14]:
# Run Bayesian Optimization (UCB-Acquisition Fct) on DNN for Fashion-MNIST
opt_log = BO_NN(bo_iters, eval_dnn, "dnn", "fashion", hyper_space_dnn,
                num_epochs=10, k_fold=3, logging=True, verbose=True)

Loaded previously existing Log with 0 BO iterations.
Start Logging to ./logs/bo_logs_dnn_fashion.json
BO iter  1 | cv-acc: 0.8405 | best-acc: 0.8405 | time: 29.19
BO iter  2 | cv-acc: 0.8640 | best-acc: 0.8640 | time: 36.04
BO iter  3 | cv-acc: 0.7612 | best-acc: 0.8640 | time: 26.61
BO iter  4 | cv-acc: 0.8412 | best-acc: 0.8640 | time: 67.01
BO iter  5 | cv-acc: 0.4402 | best-acc: 0.8640 | time: 80.03
BO iter  6 | cv-acc: 0.7906 | best-acc: 0.8640 | time: 40.50
BO iter  7 | cv-acc: 0.1000 | best-acc: 0.8640 | time: 142.16
BO iter  8 | cv-acc: 0.7846 | best-acc: 0.8640 | time: 30.98
BO iter  9 | cv-acc: 0.8423 | best-acc: 0.8640 | time: 73.80
BO iter 10 | cv-acc: 0.7821 | best-acc: 0.8640 | time: 32.36
BO iter 11 | cv-acc: 0.8426 | best-acc: 0.8640 | time: 33.78
BO iter 12 | cv-acc: 0.8413 | best-acc: 0.8640 | time: 75.44
BO iter 13 | cv-acc: 0.8144 | best-acc: 0.8640 | time: 35.91
BO iter 14 | cv-acc: 0.7836 | best-acc: 0.8640 | time: 27.72
BO iter 15 | cv-acc: 0.8395 | best-acc: 0.8

In [15]:
# Run Bayesian Optimization (UCB-Acquisition Fct) on DNN for CIFAR-10
opt_log = BO_NN(bo_iters, eval_dnn, "dnn", "cifar10", hyper_space_dnn,
                num_epochs=10, k_fold=3, logging=True, verbose=True)

Loaded previously existing Log with 0 BO iterations.
Start Logging to ./logs/bo_logs_dnn_cifar10.json
BO iter  1 | cv-acc: 0.1000 | best-acc: 0.1000 | time: 29.39
BO iter  2 | cv-acc: 0.2598 | best-acc: 0.2598 | time: 34.08
BO iter  3 | cv-acc: 0.1000 | best-acc: 0.2598 | time: 24.57
BO iter  4 | cv-acc: 0.4387 | best-acc: 0.4387 | time: 130.70
BO iter  5 | cv-acc: 0.4203 | best-acc: 0.4387 | time: 128.69
BO iter  6 | cv-acc: 0.4001 | best-acc: 0.4387 | time: 128.14
BO iter  7 | cv-acc: 0.1000 | best-acc: 0.4387 | time: 132.18
BO iter  8 | cv-acc: 0.4198 | best-acc: 0.4387 | time: 130.23
BO iter  9 | cv-acc: 0.3732 | best-acc: 0.4387 | time: 37.11
BO iter 10 | cv-acc: 0.4403 | best-acc: 0.4403 | time: 71.43
BO iter 11 | cv-acc: 0.4493 | best-acc: 0.4493 | time: 76.09
BO iter 12 | cv-acc: 0.4449 | best-acc: 0.4493 | time: 75.25
BO iter 13 | cv-acc: 0.4520 | best-acc: 0.4520 | time: 75.65
BO iter 14 | cv-acc: 0.3498 | best-acc: 0.4520 | time: 41.24
BO iter 15 | cv-acc: 0.1000 | best-acc:

# Simple Convolutional Neural Network

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X_mnist, y_mnist,
                                                    stratify=y_mnist,
                                                    random_state=0)

# ConvNet Parameters
batch_size = 100
ch_sizes = [1, 16, 32]
k_sizes = [5, 5]
stride = 1
padding = 2
out_size = 10
num_epochs = 2
learning_rate = 0.001

# Instantiate the model, loss function, and optimizer
cnn_model = CNN(ch_sizes, k_sizes,
                stride, padding, out_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=learning_rate)
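
With `k_sizes = [5, 5]`, `stride = 1` and `padding = 2`, each convolution preserves the spatial size of its input. The sketch below applies the standard output-size formula `floor((n + 2 * padding - kernel) / stride) + 1`. `conv_out_size` is a hypothetical helper, and the calculation ignores any pooling layers that `models.CNN` may apply between convolutions.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


# MNIST / Fashion-MNIST: 28 x 28 inputs, two conv layers with the settings above
size = 28
for kernel in [5, 5]:
    size = conv_out_size(size, kernel, stride=1, padding=2)
print(size)  # 28

# CIFAR-10: 32 x 32 inputs with the same settings
cifar_size = conv_out_size(32, 5, stride=1, padding=2)
print(cifar_size)  # 32
```

Other kernel, stride, and padding combinations shrink the feature maps, so the input size of the final fully connected layer has to be derived from the chosen settings.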

In [11]:
model = train_model("cnn", cnn_model, num_epochs,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion, log_freq=10000,
                    model_fname="temp_model_cnn",
                    verbose=False, logging=True)

# Evaluate accuracy on the held-out test set
score = get_test_error("cnn", device, model, X_test, y_test)
print("Test Accuracy: {}".format(score))

Test Accuracy: 0.9860571428571427


In [12]:
# Run 2-fold cross-validation on specific architecture
eval_cnn("mnist", batch_size, learning_rate, num_layers=2,
         ch_1=16, ch_2=32, k_1=5, k_2=5,
         stride=1, padding=2,
         k_fold=2, verbose=True)

Batchsize: 100
Learning Rate: 0.001
Architecture of Cross-Validated Network:
	 Layer 1: 16 Channels, 5 Kernel Size
	 Layer 2: 32 Channels, 5 Kernel Size
cuda
Cross-Validation Score Fold 1: 0.97680205655527
Cross-Validation Score Fold 2: 0.9743121463275223


0.9755571014413962

In [13]:
# Define Search Hyperspace for Bayesian Optimization on CNN architectures
hyper_space_cnn = {'batch_size': (50, 500),
                   'learning_rate': (0.0001, 0.05),
                   'num_layers': (1, 5),
                   'ch_1': (3, 64),
                   'ch_2': (3, 64),
                   'ch_3': (3, 64),
                   'ch_4': (3, 64),
                   'ch_5': (3, 64),
                   'k_1': (2, 10),
                   'k_2': (2, 10),
                   'k_3': (2, 10),
                   'k_4': (2, 10),
                   'k_5': (2, 10),
                   'stride': (1, 3),
                   'padding': (1, 3)}

bo_iters = 50
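
The search space above is continuous, while `eval_cnn` is called with integer layer counts, channel sizes, and kernel sizes. Some decoding step therefore has to sit between the optimizer and the network constructor. The sketch below shows one plausible way to do it: round every architecture parameter and keep only the first `num_layers` entries. `decode_cnn_config` and the example proposal are hypothetical and are not taken from `utils.bayesian_opt`.

```python
def decode_cnn_config(params):
    """Round a continuous proposal to an integer CNN configuration (hypothetical)."""
    num_layers = int(round(params["num_layers"]))
    layer_ids = range(1, num_layers + 1)
    return {
        "batch_size": int(round(params["batch_size"])),
        "learning_rate": float(params["learning_rate"]),  # stays continuous
        "num_layers": num_layers,
        # Only the first num_layers channel/kernel entries are used
        "ch_sizes": [int(round(params["ch_{}".format(i)])) for i in layer_ids],
        "k_sizes": [int(round(params["k_{}".format(i)])) for i in layer_ids],
        "stride": int(round(params["stride"])),
        "padding": int(round(params["padding"])),
    }


# Made-up proposal inside the bounds of the search space
proposal = {
    "batch_size": 127.6, "learning_rate": 0.0031, "num_layers": 2.2,
    "ch_1": 15.8, "ch_2": 32.3, "ch_3": 40.0, "ch_4": 8.0, "ch_5": 60.0,
    "k_1": 4.7, "k_2": 5.2, "k_3": 3.0, "k_4": 9.0, "k_5": 2.0,
    "stride": 1.3, "padding": 2.4,
}

config = decode_cnn_config(proposal)
print(config["ch_sizes"], config["k_sizes"])  # [16, 32] [5, 5]
```

One consequence of this kind of encoding is that parameters for unused layers (here `ch_3` to `ch_5` and `k_3` to `k_5`) still vary in the search space without affecting the score.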

In [22]:
# Run Bayesian Optimization (UCB-Acquisition Fct) on CNN for MNIST
opt_log = BO_NN(bo_iters, eval_cnn, "cnn", "mnist", hyper_space_cnn,
                num_epochs=10, k_fold=3, logging=True, verbose=True)

Merged JSONs - Total its: 26
Removed temporary log file.
Loaded previously existing Log with 26 BO iterations.
Start Logging to ./logs/bo_logs_cnn_mnist.json
BO iter 27 | cv-acc: 0.9746 | best-acc: 0.9874 | time: 122.81
BO iter 28 | cv-acc: 0.9853 | best-acc: 0.9874 | time: 107.32
BO iter 29 | cv-acc: 0.9880 | best-acc: 0.9880 | time: 78.56
BO iter 30 | cv-acc: 0.9779 | best-acc: 0.9880 | time: 84.03
BO iter 31 | cv-acc: 0.9671 | best-acc: 0.9880 | time: 72.48
BO iter 32 | cv-acc: 0.9659 | best-acc: 0.9880 | time: 72.05
BO iter 33 | cv-acc: 0.9709 | best-acc: 0.9880 | time: 77.33
BO iter 34 | cv-acc: 0.9867 | best-acc: 0.9880 | time: 83.39
BO iter 35 | cv-acc: 0.9818 | best-acc: 0.9880 | time: 103.17
BO iter 36 | cv-acc: 0.9891 | best-acc: 0.9891 | time: 134.62
BO iter 37 | cv-acc: 0.9828 | best-acc: 0.9891 | time: 66.65
BO iter 38 | cv-acc: 0.9738 | best-acc: 0.9891 | time: 69.37
BO iter 39 | cv-acc: 0.9904 | best-acc: 0.9904 | time: 128.27
BO iter 40 | cv-acc: 0.9849 | best-acc: 0.99

KeyboardInterrupt: 

In [23]:
# Run Bayesian Optimization (UCB-Acquisition Fct) on CNN for Fashion-MNIST
opt_log = BO_NN(bo_iters, eval_cnn, "cnn", "fashion", hyper_space_cnn,
                num_epochs=10, k_fold=3, logging=True, verbose=True)

Loaded previously existing Log with 0 BO iterations.
Start Logging to ./logs/bo_logs_cnn_fashion.json
BO iter  1 | cv-acc: 0.8738 | best-acc: 0.8738 | time: 67.24
BO iter  2 | cv-acc: 0.8378 | best-acc: 0.8738 | time: 36.99
BO iter  3 | cv-acc: 0.9020 | best-acc: 0.9020 | time: 84.46
BO iter  4 | cv-acc: 0.8587 | best-acc: 0.9020 | time: 73.96
BO iter  5 | cv-acc: 0.9090 | best-acc: 0.9090 | time: 100.30
BO iter  6 | cv-acc: 0.8785 | best-acc: 0.9090 | time: 37.44
BO iter  7 | cv-acc: 0.8761 | best-acc: 0.9090 | time: 40.28
BO iter  8 | cv-acc: 0.8598 | best-acc: 0.9090 | time: 47.36
BO iter  9 | cv-acc: 0.8945 | best-acc: 0.9090 | time: 112.12
BO iter 10 | cv-acc: 0.8796 | best-acc: 0.9090 | time: 41.00
BO iter 11 | cv-acc: 0.8792 | best-acc: 0.9090 | time: 42.57
BO iter 12 | cv-acc: 0.8885 | best-acc: 0.9090 | time: 56.85
BO iter 13 | cv-acc: 0.8942 | best-acc: 0.9090 | time: 54.09
BO iter 14 | cv-acc: 0.8889 | best-acc: 0.9090 | time: 97.76
BO iter 15 | cv-acc: 0.8789 | best-acc: 0.

In [26]:
# Run Bayesian Optimization (UCB-Acquisition Fct) on CNN for CIFAR-10
opt_log = BO_NN(bo_iters, eval_cnn, "cnn", "cifar10", hyper_space_cnn,
                num_epochs=10, k_fold=3, logging=True, verbose=True)

Loaded previously existing Log with 0 BO iterations.
Start Logging to ./logs/bo_logs_cnn_cifar10.json
BO iter  1 | cv-acc: 0.4114 | best-acc: 0.4114 | time: 76.55
BO iter  2 | cv-acc: 0.4905 | best-acc: 0.4905 | time: 44.21
BO iter  3 | cv-acc: 0.4727 | best-acc: 0.4905 | time: 111.66
BO iter  4 | cv-acc: 0.5002 | best-acc: 0.5002 | time: 66.17
BO iter  5 | cv-acc: 0.4077 | best-acc: 0.5002 | time: 100.36
BO iter  6 | cv-acc: 0.4560 | best-acc: 0.5002 | time: 76.43
BO iter  7 | cv-acc: 0.4286 | best-acc: 0.5002 | time: 67.64
BO iter  8 | cv-acc: 0.4915 | best-acc: 0.5002 | time: 74.41
BO iter  9 | cv-acc: 0.4737 | best-acc: 0.5002 | time: 64.61
BO iter 10 | cv-acc: 0.3810 | best-acc: 0.5002 | time: 73.67
BO iter 11 | cv-acc: 0.3558 | best-acc: 0.5002 | time: 71.98
BO iter 12 | cv-acc: 0.4082 | best-acc: 0.5002 | time: 63.59
BO iter 13 | cv-acc: 0.3024 | best-acc: 0.5002 | time: 61.24
BO iter 14 | cv-acc: 0.3865 | best-acc: 0.5002 | time: 58.96
BO iter 15 | cv-acc: 0.4678 | best-acc: 0.

### Train Full Models (10 Epochs, Logging Every 5000 Samples, All Data)

In [27]:
X_train, X_test, y_train, y_test = train_test_split(X_mnist, y_mnist,
                                                    stratify=y_mnist,
                                                    random_state=0)

# Batch size, number of epochs, and learning rate for training
batch_size = 100
num_epochs = 10
learning_rate = 0.001

# Instantiate the model: 784 inputs, one hidden layer with 500 units, 10 outputs
dnn_model = DNN(h_sizes=[784, 500], out_size=10)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(dnn_model.parameters(), lr=learning_rate)

model = train_model("dnn", dnn_model, num_epochs,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion,
                    log_freq=5000,
                    model_fname="temp_model_dnn_mnist",
                    verbose=True, logging=True)

train| epoch  1| batch 5000/41996| acc: 0.8682| loss: 0.4719| time: 0.07
valid| epoch  1| batch 5000/41996| acc: 0.8671| loss: 0.4712| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 10000/41996| acc: 0.9021| loss: 0.3535| time: 0.11
valid| epoch  1| batch 10000/41996| acc: 0.9020| loss: 0.3536| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 15000/41996| acc: 0.9132| loss: 0.3063| time: 0.11
valid| epoch  1| batch 15000/41996| acc: 0.9116| loss: 0.3076| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 20000/41996| acc: 0.9248| loss: 0.2665| time: 0.12
valid| epoch  1| batch 20000/41996| acc: 0.9233| loss: 0.2692| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 25000/41996| acc: 0.9309| loss: 0.2411| time: 0.09
valid| epoch  1| batch 25000/41996| acc:

train| epoch  5| batch 30000/41996| acc: 0.9902| loss: 0.0368| time: 0.09
valid| epoch  5| batch 30000/41996| acc: 0.9736| loss: 0.0889| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 35000/41996| acc: 0.9875| loss: 0.0416| time: 0.11
valid| epoch  5| batch 35000/41996| acc: 0.9704| loss: 0.0941| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 40000/41996| acc: 0.9901| loss: 0.0370| time: 0.11
valid| epoch  5| batch 40000/41996| acc: 0.9712| loss: 0.0939| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 5000/41996| acc: 0.9900| loss: 0.0357| time: 0.07
valid| epoch  6| batch 5000/41996| acc: 0.9708| loss: 0.0901| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 10000/41996| acc: 0.9897| loss: 0.0366| time: 0.09
valid| epoch  6| batch 10000/41996| acc:

train| epoch 10| batch 15000/41996| acc: 0.9959| loss: 0.0142| time: 0.11
valid| epoch 10| batch 15000/41996| acc: 0.9758| loss: 0.0877| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 20000/41996| acc: 0.9953| loss: 0.0158| time: 0.08
valid| epoch 10| batch 20000/41996| acc: 0.9754| loss: 0.0896| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 25000/41996| acc: 0.9973| loss: 0.0105| time: 0.10
valid| epoch 10| batch 25000/41996| acc: 0.9758| loss: 0.0853| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 30000/41996| acc: 0.9969| loss: 0.0135| time: 0.09
valid| epoch 10| batch 30000/41996| acc: 0.9741| loss: 0.0903| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 35000/41996| acc: 0.9950| loss: 0.0156| time: 0.07
valid| epoch 10| batch 35000/41996| ac

In [29]:
X_train, X_test, y_train, y_test = train_test_split(X_fashion, y_fashion,
                                                    stratify=y_fashion,
                                                    random_state=0)

# Instantiate the model: 784 inputs, one hidden layer with 500 units, 10 outputs
dnn_model = DNN(h_sizes=[784, 500], out_size=10)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(dnn_model.parameters(), lr=learning_rate)

model = train_model("dnn", dnn_model, 10,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion,
                    log_freq=5000,
                    model_fname="temp_model_dnn_fashion",
                    verbose=True, logging=True)

train| epoch  1| batch 5000/42000| acc: 0.7247| loss: 0.7390| time: 0.07
valid| epoch  1| batch 5000/42000| acc: 0.7216| loss: 0.7539| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 10000/42000| acc: 0.8015| loss: 0.5863| time: 0.12
valid| epoch  1| batch 10000/42000| acc: 0.7919| loss: 0.5980| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 15000/42000| acc: 0.8222| loss: 0.5210| time: 0.12
valid| epoch  1| batch 15000/42000| acc: 0.8097| loss: 0.5384| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 20000/42000| acc: 0.8191| loss: 0.5189| time: 0.12
valid| epoch  1| batch 20000/42000| acc: 0.8090| loss: 0.5402| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 25000/42000| acc: 0.8348| loss: 0.4778| time: 0.12
valid| epoch  1| batch 25000/42000| acc:

train| epoch  5| batch 30000/42000| acc: 0.8950| loss: 0.2869| time: 0.07
valid| epoch  5| batch 30000/42000| acc: 0.8750| loss: 0.3439| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 35000/42000| acc: 0.8895| loss: 0.3030| time: 0.07
valid| epoch  5| batch 35000/42000| acc: 0.8706| loss: 0.3649| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 40000/42000| acc: 0.8961| loss: 0.2851| time: 0.07
valid| epoch  5| batch 40000/42000| acc: 0.8759| loss: 0.3460| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 5000/42000| acc: 0.8929| loss: 0.2927| time: 0.13
valid| epoch  6| batch 5000/42000| acc: 0.8747| loss: 0.3549| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 10000/42000| acc: 0.8964| loss: 0.2801| time: 0.09
valid| epoch  6| batch 10000/42000| acc:

train| epoch 10| batch 15000/42000| acc: 0.9154| loss: 0.2313| time: 0.12
valid| epoch 10| batch 15000/42000| acc: 0.8868| loss: 0.3230| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 20000/42000| acc: 0.9110| loss: 0.2416| time: 0.11
valid| epoch 10| batch 20000/42000| acc: 0.8814| loss: 0.3431| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 25000/42000| acc: 0.9054| loss: 0.2530| time: 0.09
valid| epoch 10| batch 25000/42000| acc: 0.8750| loss: 0.3515| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 30000/42000| acc: 0.9164| loss: 0.2251| time: 0.10
valid| epoch 10| batch 30000/42000| acc: 0.8873| loss: 0.3250| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 35000/42000| acc: 0.9059| loss: 0.2492| time: 0.08
valid| epoch 10| batch 35000/42000| ac

In [30]:
X_train, X_test, y_train, y_test = train_test_split(X_cifar10, y_cifar10,
                                                    stratify=y_cifar10,
                                                    random_state=0)

# Instantiate the model: 3072 inputs, one hidden layer with 500 units, 10 outputs
dnn_model = DNN(h_sizes=[3072, 500], out_size=10)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(dnn_model.parameters(), lr=learning_rate)

model = train_model("dnn", dnn_model, 10,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion,
                    log_freq=5000,
                    model_fname="temp_model_dnn_cifar",
                    verbose=True, logging=True)

train| epoch  1| batch 5000/36000| acc: 0.2032| loss: 2.1506| time: 0.75
valid| epoch  1| batch 5000/36000| acc: 0.1987| loss: 2.1553| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 10000/36000| acc: 0.2755| loss: 1.9940| time: 1.05
valid| epoch  1| batch 10000/36000| acc: 0.2681| loss: 2.0003| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 15000/36000| acc: 0.2938| loss: 1.9479| time: 0.91
valid| epoch  1| batch 15000/36000| acc: 0.2927| loss: 1.9535| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 20000/36000| acc: 0.3113| loss: 1.9014| time: 1.09
valid| epoch  1| batch 20000/36000| acc: 0.3114| loss: 1.9123| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 25000/36000| acc: 0.3398| loss: 1.8631| time: 0.97
valid| epoch  1| batch 25000/36000| acc:

train| epoch  6| batch 15000/36000| acc: 0.4373| loss: 1.5746| time: 1.37
valid| epoch  6| batch 15000/36000| acc: 0.4110| loss: 1.6315| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 20000/36000| acc: 0.4591| loss: 1.5372| time: 1.30
valid| epoch  6| batch 20000/36000| acc: 0.4363| loss: 1.5947| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 25000/36000| acc: 0.4442| loss: 1.5578| time: 0.91
valid| epoch  6| batch 25000/36000| acc: 0.4213| loss: 1.6132| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 30000/36000| acc: 0.4575| loss: 1.5420| time: 0.86
valid| epoch  6| batch 30000/36000| acc: 0.4280| loss: 1.6031| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 35000/36000| acc: 0.4488| loss: 1.5491| time: 1.15
valid| epoch  6| batch 35000/36000| ac

In [7]:
# ConvNet Parameters
batch_size = 100
ch_sizes = [1, 16, 32]
k_sizes = [5, 5]
stride = 1
padding = 2
out_size = 10
num_epochs = 10
learning_rate = 0.001

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X_mnist, y_mnist,
                                                    stratify=y_mnist,
                                                    random_state=0)

# Instantiate the model, loss function, and optimizer
cnn_model = CNN(ch_sizes, k_sizes,
                stride, padding, out_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=learning_rate)

model = train_model("cnn", cnn_model, num_epochs,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion, log_freq=5000,
                    model_fname="mnist_cnn",
                    verbose=True, logging=True)

# Evaluate accuracy on the held-out test set
score = get_test_error("cnn", device, model, X_test, y_test)
print("Test Accuracy: {}".format(score))

train| epoch  1| batch 5000/41996| acc: 0.8965| loss: 0.4530| time: 0.48
valid| epoch  1| batch 5000/41996| acc: 0.8955| loss: 0.4779| time: 0.01
-------------------------------------------------------------------------
train| epoch  1| batch 10000/41996| acc: 0.9484| loss: 0.1844| time: 0.49
valid| epoch  1| batch 10000/41996| acc: 0.9483| loss: 0.1848| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 15000/41996| acc: 0.9513| loss: 0.1562| time: 0.45
valid| epoch  1| batch 15000/41996| acc: 0.9544| loss: 0.1531| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 20000/41996| acc: 0.9697| loss: 0.1012| time: 0.38
valid| epoch  1| batch 20000/41996| acc: 0.9666| loss: 0.1036| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 25000/41996| acc: 0.9752| loss: 0.0802| time: 0.44
valid| epoch  1| batch 25000/41996| acc:

train| epoch  5| batch 30000/41996| acc: 0.9901| loss: 0.0313| time: 0.54
valid| epoch  5| batch 30000/41996| acc: 0.9861| loss: 0.0426| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 35000/41996| acc: 0.9938| loss: 0.0233| time: 0.41
valid| epoch  5| batch 35000/41996| acc: 0.9899| loss: 0.0348| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 40000/41996| acc: 0.9930| loss: 0.0245| time: 0.43
valid| epoch  5| batch 40000/41996| acc: 0.9876| loss: 0.0400| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 5000/41996| acc: 0.9924| loss: 0.0264| time: 0.47
valid| epoch  6| batch 5000/41996| acc: 0.9873| loss: 0.0400| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 10000/41996| acc: 0.9927| loss: 0.0251| time: 0.39
valid| epoch  6| batch 10000/41996| acc:

train| epoch 10| batch 15000/41996| acc: 0.9935| loss: 0.0205| time: 0.36
valid| epoch 10| batch 15000/41996| acc: 0.9856| loss: 0.0454| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 20000/41996| acc: 0.9955| loss: 0.0160| time: 0.37
valid| epoch 10| batch 20000/41996| acc: 0.9885| loss: 0.0373| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 25000/41996| acc: 0.9966| loss: 0.0126| time: 0.33
valid| epoch 10| batch 25000/41996| acc: 0.9893| loss: 0.0331| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 30000/41996| acc: 0.9957| loss: 0.0147| time: 0.42
valid| epoch 10| batch 30000/41996| acc: 0.9879| loss: 0.0362| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 35000/41996| acc: 0.9955| loss: 0.0148| time: 0.35
valid| epoch 10| batch 35000/41996| ac

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X_fashion, y_fashion,
                                                    stratify=y_fashion,
                                                    random_state=0)

# Instantiate the model, loss function, and optimizer
cnn_model = CNN(ch_sizes, k_sizes,
                stride, padding, out_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=learning_rate)

model = train_model("cnn", cnn_model, num_epochs,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion, log_freq=5000,
                    model_fname="fashion_cnn",
                    verbose=True, logging=True)

# Evaluate accuracy on the held-out test set
score = get_test_error("cnn", device, model, X_test, y_test)
print("Test Accuracy: {}".format(score))

train| epoch  1| batch 5000/42000| acc: 0.7731| loss: 0.6494| time: 0.43
valid| epoch  1| batch 5000/42000| acc: 0.7710| loss: 0.6689| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 10000/42000| acc: 0.8107| loss: 0.5071| time: 0.43
valid| epoch  1| batch 10000/42000| acc: 0.8087| loss: 0.5035| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 15000/42000| acc: 0.8341| loss: 0.4451| time: 0.45
valid| epoch  1| batch 15000/42000| acc: 0.8342| loss: 0.4442| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 20000/42000| acc: 0.8497| loss: 0.4128| time: 0.49
valid| epoch  1| batch 20000/42000| acc: 0.8470| loss: 0.4137| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 25000/42000| acc: 0.8667| loss: 0.3751| time: 0.39
valid| epoch  1| batch 25000/42000| acc:

train| epoch  5| batch 30000/42000| acc: 0.9151| loss: 0.2381| time: 0.54
valid| epoch  5| batch 30000/42000| acc: 0.9007| loss: 0.2765| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 35000/42000| acc: 0.9140| loss: 0.2429| time: 0.51
valid| epoch  5| batch 35000/42000| acc: 0.8969| loss: 0.2874| time: 0.00
-------------------------------------------------------------------------
train| epoch  5| batch 40000/42000| acc: 0.9136| loss: 0.2365| time: 0.35
valid| epoch  5| batch 40000/42000| acc: 0.8955| loss: 0.2785| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 5000/42000| acc: 0.9145| loss: 0.2408| time: 0.41
valid| epoch  6| batch 5000/42000| acc: 0.8990| loss: 0.2903| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 10000/42000| acc: 0.9082| loss: 0.2524| time: 0.38
valid| epoch  6| batch 10000/42000| acc:

train| epoch 10| batch 15000/42000| acc: 0.9239| loss: 0.2023| time: 0.44
valid| epoch 10| batch 15000/42000| acc: 0.8990| loss: 0.2822| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 20000/42000| acc: 0.9299| loss: 0.1914| time: 0.41
valid| epoch 10| batch 20000/42000| acc: 0.9034| loss: 0.2684| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 25000/42000| acc: 0.9245| loss: 0.2023| time: 0.38
valid| epoch 10| batch 25000/42000| acc: 0.8979| loss: 0.2873| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 30000/42000| acc: 0.9313| loss: 0.1924| time: 0.47
valid| epoch 10| batch 30000/42000| acc: 0.9041| loss: 0.2749| time: 0.00
-------------------------------------------------------------------------
train| epoch 10| batch 35000/42000| acc: 0.9317| loss: 0.1914| time: 0.49
valid| epoch 10| batch 35000/42000| ac
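
The batch counter in the log above runs up to 42000, not the full dataset size. `train_test_split` is called without a `test_size`, so scikit-learn falls back to its default of 25% for the test split. The sketch below reproduces the resulting sample counts with plain arithmetic. Two things are assumptions inferred from the log totals, not confirmed by the notebook: that `train_model` holds out a further 20% of its input for validation, and that the datasets hold 70000 (Fashion-MNIST) and 60000 (CIFAR-10) samples because the official train and test sets were concatenated. `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, test_size=0.25, valid_size=0.2):
    """Return (train, valid, test) counts for a two-stage holdout.

    test_size=0.25 is the scikit-learn default for train_test_split.
    valid_size=0.2 is an ASSUMPTION about what train_model does internally.
    """
    # scikit-learn rounds a fractional test size up to a whole sample count
    n_test = math.ceil(test_size * n_samples)
    n_rest = n_samples - n_test
    # Assumed second-stage validation holdout inside the training helper
    n_valid = int(round(valid_size * n_rest))
    n_train = n_rest - n_valid
    return n_train, n_valid, n_test


# Assumed dataset sizes: official train and test sets concatenated
fashion = split_sizes(70000)
cifar10 = split_sizes(60000)

print("Fashion-MNIST (train, valid, test):", fashion)
print("CIFAR-10      (train, valid, test):", cifar10)
```

Under these assumptions the training counts come out to 42000 and 36000, which matches the denominators in the Fashion-MNIST and CIFAR-10 logs.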

In [10]:
# CIFAR-10 images are RGB, so the first conv layer takes 3 input channels
ch_sizes = [3, 16, 32]

X_train, X_test, y_train, y_test = train_test_split(X_cifar10, y_cifar10,
                                                    stratify=y_cifar10,
                                                    random_state=0)

# Instantiate the model with layer sizes, loss function, and optimizer
cnn_model = CNN(ch_sizes, k_sizes,
                stride, padding, out_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn_model.parameters(), lr=learning_rate)

model = train_model("cnn", cnn_model, num_epochs,
                    X_train, y_train, batch_size,
                    device, optimizer, criterion, log_freq=5000,
                    model_fname="cifar10_cnn",
                    verbose=True, logging=True)

# Evaluate accuracy on the held-out test set
score = get_test_error("cnn", device, model, X_test, y_test)
print("Test Accuracy: {}".format(score))

train| epoch  1| batch 5000/36000| acc: 0.2691| loss: 2.0108| time: 1.08
valid| epoch  1| batch 5000/36000| acc: 0.2681| loss: 2.0182| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 10000/36000| acc: 0.3194| loss: 1.9002| time: 1.23
valid| epoch  1| batch 10000/36000| acc: 0.3078| loss: 1.9136| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 15000/36000| acc: 0.3172| loss: 1.8941| time: 1.15
valid| epoch  1| batch 15000/36000| acc: 0.3103| loss: 1.9075| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 20000/36000| acc: 0.3483| loss: 1.8277| time: 1.22
valid| epoch  1| batch 20000/36000| acc: 0.3384| loss: 1.8469| time: 0.00
-------------------------------------------------------------------------
train| epoch  1| batch 25000/36000| acc: 0.3542| loss: 1.8162| time: 1.00
valid| epoch  1| batch 25000/36000| acc:

train| epoch  6| batch 15000/36000| acc: 0.4872| loss: 1.4475| time: 1.34
valid| epoch  6| batch 15000/36000| acc: 0.4478| loss: 1.5344| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 20000/36000| acc: 0.4908| loss: 1.4344| time: 1.33
valid| epoch  6| batch 20000/36000| acc: 0.4519| loss: 1.5275| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 25000/36000| acc: 0.5005| loss: 1.4103| time: 1.17
valid| epoch  6| batch 25000/36000| acc: 0.4627| loss: 1.5025| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 30000/36000| acc: 0.4842| loss: 1.4434| time: 1.23
valid| epoch  6| batch 30000/36000| acc: 0.4479| loss: 1.5429| time: 0.00
-------------------------------------------------------------------------
train| epoch  6| batch 35000/36000| acc: 0.5034| loss: 1.4018| time: 0.96
valid| epoch  6| batch 35000/36000| ac