# Mini-Project 1

Recap. session 03.04.2019: Discussion with teaching assistant. 

Do 3 different architectures:
1. Fully-Connected Networks (Deep & Shallow)
2. Convolution Neural Network 
3. Residual Neural Network

For each architecture: 
- output 0 or 1 (directly predict if number is smaller or not)
- output class (give digit of each image and do comparison)
- output 0 or 1 + class of digit (enables 3 different loss functions)

Note:
<br> - Do weight sharing: each image enters a different building block 

In [1]:
from helper_functions import *
from torch import nn
from torch.autograd import Variable

## Load datasets

In [2]:
N = 1000
train_input, train_target, train_classes, test_input, test_target, test_classes = generate_pair_sets(N)

Name | Tensor dimension | Type | Content
-----|-----|-----|-----
`train_input` | N × 2 × 14 × 14 | float32 | Images
`train_target` | N | int64 | Class to predict ∈ {0, 1}
`train_classes` | N × 2 | int64 | Classes of the two digits ∈ {0, . . . , 9}
`test_input` | N × 2 × 14 × 14 | float32 | Images
`test_target` | N | int64 | Class | to predict ∈ {0, 1}
`test_classes` | N × 2 | int64 | Classes of the two digits ∈ {0, . . . , 9}

# Fully-Connected Networks

## 1. Output 0 or 1

### Preprocess Data

In [3]:
train_input, test_input = reshape_data(train_input, test_input)

### Define Models

In [4]:
def create_shallow_model():
    return nn.Sequential(
        nn.Linear(196, 50),
        nn.BatchNorm1d(50),
        nn.ReLU(),
        nn.Linear(50, 1))

In [5]:
def create_deep_model():
    return nn.Sequential(
        nn.Linear(196, 4),
        nn.BatchNorm1d(4),
        nn.ReLU(),
        nn.Linear(4, 8),
        nn.BatchNorm1d(8),
        nn.ReLU(),
        nn.Linear(8, 16),
        nn.BatchNorm1d(16),
        nn.ReLU(),
        nn.Linear(16, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Linear(32, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Linear(64, 128),
        nn.BatchNorm1d(128),
        nn.ReLU(),
        nn.Linear(128, 1))

In [6]:
models = [create_shallow_model, create_deep_model]

### Test Models

In [7]:
# stds = [-1, 1e-3, 1e-2, 1e-1, 1e-0, 1e1]
# for model in models:
#     compute_errors(m=model, 
#                    train_input=train_input, 
#                    train_classes=, 
#                    test_input=test_input,
#                    test_classes=,
#                    stds=stds)

## 2. Output Classes

### Preprocess Data

In [8]:
train_input, test_input = reshape_data(train_input, test_input)
train_input, test_input, train_classes, test_classes = split_img_data(train_input, test_input, train_classes, test_classes)

train_input1 = train_input[0]
train_input2 = train_input[1]

test_input1 = test_input[0]
test_input2 = test_input[1]

train_classes1 = train_classes[0]
train_classes2 = train_classes[1]

test_classes1 = test_classes[0]
test_classes2 = test_classes[1]

test_input1 = 0.9*test_input1
test_input2 = 0.9*test_input2

train_input1 = 0.9*train_input1
train_input2 = 0.9*train_input2

test_input1 = 0.9*test_input1
test_input2 = 0.9*test_input2

train_classes1 = convert_to_one_hot_labels(train_input1, train_classes1)
train_classes2 = convert_to_one_hot_labels(train_input2, train_classes2)

test_classes1 = convert_to_one_hot_labels(test_input1, test_classes1)
test_classes2 = convert_to_one_hot_labels(test_input2, test_classes2)

train_input1, test_classes1 = normalize(train_input1, test_classes1)
train_input2, test_classes2 = normalize(train_input2, test_classes2)

In [9]:
train_input1, train_classes1 = Variable(train_input1), Variable(train_classes1)
test_input1, test_classes1 = Variable(test_input1), Variable(test_classes1)

### Define Models

In [10]:
def create_shallow_model():
    return nn.Sequential(
        nn.Linear(196, 50),
        nn.BatchNorm1d(50),
        nn.ReLU(),
        nn.Linear(50, 10))

In [11]:
def create_deep_model():
    return nn.Sequential(
        nn.Linear(196, 4),
        nn.BatchNorm1d(4),
        nn.ReLU(),
        nn.Linear(4, 8),
        nn.BatchNorm1d(8),
        nn.ReLU(),
        nn.Linear(8, 16),
        nn.BatchNorm1d(16),
        nn.ReLU(),
        nn.Linear(16, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Linear(32, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Linear(64, 128),
        nn.BatchNorm1d(128),
        nn.ReLU(),
        nn.Linear(128, 10))

In [12]:
models = [create_shallow_model, create_deep_model]

### Test Models

In [13]:
stds = [-1, 1e-3, 1e-2, 1e-1, 1e-0, 1e1]
for model in models:
    compute_errors(m=model, 
                   train_input=train_input1, 
                   train_classes=train_classes1, 
                   test_input=test_input1,
                   test_classes=test_classes1,
                   stds=stds,
                   one_hot_encoded=True)

std -1.000000 create_shallow_model train_error 0.00% test_error 14.10%
std 0.001000 create_shallow_model train_error 0.00% test_error 18.20%
std 0.010000 create_shallow_model train_error 0.00% test_error 16.40%
std 0.100000 create_shallow_model train_error 0.00% test_error 14.50%
std 1.000000 create_shallow_model train_error 4.40% test_error 20.80%
std 10.000000 create_shallow_model train_error 44.10% test_error 49.20%
std -1.000000 create_deep_model train_error 0.00% test_error 25.40%
std 0.001000 create_deep_model train_error 87.00% test_error 88.60%
std 0.010000 create_deep_model train_error 87.00% test_error 88.60%
std 0.100000 create_deep_model train_error 4.10% test_error 34.80%
std 1.000000 create_deep_model train_error 47.90% test_error 69.70%
std 10.000000 create_deep_model train_error 66.40% test_error 77.10%


In [14]:
# Xavier initialization
for model in models:
    compute_errors(m=model, 
                   train_input=train_input1, 
                   train_classes=train_classes1, 
                   test_input=test_input1,
                   test_classes=test_classes1,
                   stds=None,
                   one_hot_encoded=True)

Computed standard deviation according to 'Xavier initialization': 0.183
std 0.182574 create_shallow_model train_error 0.00% test_error 14.80%
Computed standard deviation according to 'Xavier initialization': 0.120
std 0.120386 create_deep_model train_error 26.70% test_error 47.40%


# Architecture: CNN 