# Mini-Project 1

Recap of the session on 03.04.2019: discussion with the teaching assistant.

Implement three different architectures:
1. Fully-Connected Networks (Deep & Shallow)
2. Convolutional Neural Network
3. Residual Neural Network

For each architecture, try three output variants:
- output 0 or 1 (directly predict whether the number is smaller or not)
- output class (predict the digit of each image, then do the comparison)
- output 0 or 1 + class of each digit (enables 3 different loss functions)
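
The third variant pairs the 0/1 target with the digit classes. Below is a minimal sketch of how the resulting losses might be combined. It assumes cross-entropy for every term and a hypothetical weighting factor `aux_weight`. The logits are random placeholders, not outputs of any model in this notebook, and the 0/1 labels are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder logits for a batch of 4 image pairs (hypothetical values).
target_logits = torch.randn(4, 2)     # 0/1 comparison output
digit_logits_a = torch.randn(4, 10)   # class scores for the first digit
digit_logits_b = torch.randn(4, 10)   # class scores for the second digit

# Illustrative labels only.
target = torch.tensor([0, 1, 1, 0])
classes = torch.tensor([[3, 1], [2, 7], [5, 5], [9, 0]])

criterion = nn.CrossEntropyLoss()

# Loss 1: comparison target. Losses 2 and 3: one per digit.
main_loss = criterion(target_logits, target)
aux_loss = (criterion(digit_logits_a, classes[:, 0])
            + criterion(digit_logits_b, classes[:, 1]))

aux_weight = 0.5  # assumed trade-off, not taken from the notebook
total_loss = main_loss + aux_weight * aux_loss
print(total_loss.item())
```

Setting `aux_weight` to 0 recovers the first variant, so the same training loop could cover both cases.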

Note:
- Do weight sharing: each image enters its own building block, and the blocks share the same weights
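
One possible reading of the weight-sharing note is a siamese layout: both images of a pair pass through the same sub-network, and a small head compares the two results. The class name `SiameseNet`, the layer sizes, and the concatenation head are assumptions made for this sketch, not the notebook's design.

```python
import torch
from torch import nn


class SiameseNet(nn.Module):
    """Hypothetical example: one shared branch applied to both images."""

    def __init__(self):
        super().__init__()
        # A single set of weights, reused for the first and second image.
        self.branch = nn.Sequential(
            nn.Linear(196, 50),
            nn.ReLU(),
            nn.Linear(50, 10))
        # Head that turns the two 10-dimensional digit scores into 2 logits.
        self.head = nn.Linear(20, 2)

    def forward(self, x):
        # x has shape N x 2 x 14 x 14; each image is flattened to 196 values.
        digits_a = self.branch(x[:, 0].flatten(1))
        digits_b = self.branch(x[:, 1].flatten(1))
        target = self.head(torch.cat((digits_a, digits_b), dim=1))
        return target, digits_a, digits_b


model = SiameseNet()
target_logits, digits_a, digits_b = model(torch.randn(8, 2, 14, 14))
print(target_logits.shape, digits_a.shape, digits_b.shape)
```

Because `self.branch` is called twice, gradients from both images accumulate in the same parameters, which is what halves the number of weights compared with two independent branches.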

In [1]:
from helper_functions import *
from torch import nn
from torch.autograd import Variable

## Load datasets

In [2]:
N = 1000
train_input, train_target, train_classes, test_input, test_target, test_classes = generate_pair_sets(N)

Name | Tensor dimension | Type | Content
-----|-----|-----|-----
`train_input` | N × 2 × 14 × 14 | float32 | Images
`train_target` | N | int64 | Class to predict ∈ {0, 1}
`train_classes` | N × 2 | int64 | Classes of the two digits ∈ {0, …, 9}
`test_input` | N × 2 × 14 × 14 | float32 | Images
`test_target` | N | int64 | Class to predict ∈ {0, 1}
`test_classes` | N × 2 | int64 | Classes of the two digits ∈ {0, …, 9}

# Fully-Connected Networks

## 1. Output 0 or 1

Name | Tensor dimension | Type | Content
-----|-----|-----|-----
`x_train` | N × 392 | float32 | Images
`y_train` | N | int64 | Class to predict ∈ {0, 1}
`x_test` | N × 392 | float32 | Images
`y_test` | N | int64 | Class to predict ∈ {0, 1}
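
The 392 columns come from flattening each 2 × 14 × 14 pair into one vector. The source of `preprocess_data` is not shown here, so the following is only a sketch of what the `reshape` and `normalized` options could do, under the assumption that normalization uses the training mean and standard deviation. `flatten_and_normalize` is a hypothetical name.

```python
import torch


def flatten_and_normalize(train_x, test_x):
    """Flatten N x 2 x 14 x 14 inputs to N x 392 and standardize them."""
    train_x = train_x.reshape(train_x.size(0), -1)
    test_x = test_x.reshape(test_x.size(0), -1)
    # Statistics come from the training set only and are reused for the test set.
    mean, std = train_x.mean(), train_x.std()
    return (train_x - mean) / std, (test_x - mean) / std


# Random stand-ins for the real image tensors.
fake_train = torch.rand(100, 2, 14, 14) * 255
fake_test = torch.rand(100, 2, 14, 14) * 255
flat_train, flat_test = flatten_and_normalize(fake_train, fake_test)
print(flat_train.shape, flat_test.shape)
```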

### Preprocess Data

In [3]:
reshape = True
one_hot_encoded = False
split = False
normalized = True

In [4]:
x_train = train_input
y_train = train_target
x_test = test_input
y_test = test_target

In [5]:
x_train, y_train, x_test, y_test = preprocess_data(x_train, 
                                                   y_train, 
                                                   x_test, 
                                                   y_test, 
                                                   reshape, 
                                                   one_hot_encoded, 
                                                   split, 
                                                   normalized)

In [6]:
# Since PyTorch 0.4, Variable(t) simply returns the tensor t, so this wrapping is optional.
x_train, y_train = Variable(x_train), Variable(y_train)
x_test, y_test = Variable(x_test), Variable(y_test)

In [7]:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

torch.Size([1000, 392])
torch.Size([1000, 1])
torch.Size([1000, 392])
torch.Size([1000, 1])


### Define Models

In [8]:
def create_shallow_model():
    return nn.Sequential(
        nn.Linear(392, 50),
        nn.BatchNorm1d(50),
        nn.ReLU(),
        nn.Linear(50, 2))
# TODO ask why output must be 2
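
A possible answer to the TODO, assuming the training loop in `compute_errors` uses `nn.CrossEntropyLoss` (not confirmed by the code shown here): that loss expects one logit per class, so a two-class problem needs two outputs. A single output would work with `nn.BCEWithLogitsLoss` instead. The sketch below shows that the two formulations give the same loss when the single logit is the difference of the two class logits.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder network outputs for 5 samples and their binary labels.
two_logits = torch.randn(5, 2)
labels = torch.tensor([0, 1, 1, 0, 1])

# Two outputs: softmax over both logits, then negative log-likelihood.
ce = nn.CrossEntropyLoss()(two_logits, labels)

# One output: sigmoid of (logit for class 1 minus logit for class 0).
one_logit = two_logits[:, 1] - two_logits[:, 0]
bce = nn.BCEWithLogitsLoss()(one_logit, labels.float())

print(ce.item(), bce.item())  # the two values agree up to rounding
```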

In [9]:
def create_deep_model():
    return nn.Sequential(
        nn.Linear(392, 4),
        nn.BatchNorm1d(4),
        nn.ReLU(),
        nn.Linear(4, 8),
        nn.BatchNorm1d(8),
        nn.ReLU(),
        nn.Linear(8, 16),
        nn.BatchNorm1d(16),
        nn.ReLU(),
        nn.Linear(16, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Linear(32, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Linear(64, 128),
        nn.BatchNorm1d(128),
        nn.ReLU(),
        nn.Linear(128, 2))

In [10]:
models = [create_shallow_model, create_deep_model]

### Test Models

In [11]:
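# Weight-initialization standard deviations to try.
# The handling of -1 is defined inside compute_errors (helper_functions).
stds = [-1, 1e-3, 1e-2, 1e-1, 1e0, 1e1]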
for model in models:
    compute_errors(model, 
                   x_train, 
                   y_train, 
                   x_test,
                   y_test,
                   stds,
                   one_hot_encoded)

std -1.000000 create_shallow_model train_error 0.00% test_error 24.60%
std 0.001000 create_shallow_model train_error 0.00% test_error 26.40%
std 0.010000 create_shallow_model train_error 0.00% test_error 25.80%
std 0.100000 create_shallow_model train_error 0.00% test_error 25.90%
std 1.000000 create_shallow_model train_error 14.10% test_error 26.00%
std 10.000000 create_shallow_model train_error 36.90% test_error 39.30%
std -1.000000 create_deep_model train_error 0.10% test_error 29.10%
std 0.001000 create_deep_model train_error 44.90% test_error 47.40%
std 0.010000 create_deep_model train_error 44.90% test_error 47.40%
std 0.100000 create_deep_model train_error 2.90% test_error 29.60%
std 1.000000 create_deep_model train_error 24.50% test_error 51.40%
std 10.000000 create_deep_model train_error 16.30% test_error 45.90%
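
The sweep above varies the standard deviation used to initialize the weights. The body of `compute_errors` is not shown, so the following is only a sketch of how such an initialization could be applied. `init_normal` is a hypothetical helper, and treating a non-positive `std` as "keep PyTorch's default initialization" is a guess at what `-1` stands for.

```python
import torch
from torch import nn


def init_normal(model, std):
    """Redraw every Linear weight from N(0, std^2)."""
    if std <= 0:
        # Assumption: a non-positive std leaves the default initialization.
        return model
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, mean=0.0, std=std)
    return model


torch.manual_seed(0)
demo_model = init_normal(
    nn.Sequential(nn.Linear(392, 50), nn.ReLU(), nn.Linear(50, 2)),
    std=0.1)
print(demo_model[0].weight.std().item())  # close to 0.1
```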


## 2. Output Classes

Name | Tensor dimension | Type | Content
-----|-----|-----|-----
`x_train` | N × 2 × 14 × 14 | float32 | Images
`y_train` | N × 2 | int64 | Classes of the two digits ∈ {0, …, 9}
`x_test` | N × 2 × 14 × 14 | float32 | Images
`y_test` | N × 2 | int64 | Classes of the two digits ∈ {0, …, 9}
`x_train1` | N × 196 | float32 | First image of each pair
`x_train2` | N × 196 | float32 | Second image of each pair
`y_train1` | N | int64 | Class of the first digit ∈ {0, …, 9}
`y_train2` | N | int64 | Class of the second digit ∈ {0, …, 9}
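
For orientation, here is a sketch of how a pair tensor could be split into the two per-image tensors listed above, with optional one-hot labels. It illustrates the idea only and is not the code of `preprocess_data`. `split_pairs` is a hypothetical name, and note that one-hot encoding turns each label vector of length N into an N × 10 matrix.

```python
import torch
import torch.nn.functional as F


def split_pairs(x, classes, one_hot=True):
    """Split N x 2 x 14 x 14 pairs into two N x 196 tensors plus labels."""
    n = x.size(0)
    x1 = x[:, 0].reshape(n, -1)   # first image of each pair
    x2 = x[:, 1].reshape(n, -1)   # second image of each pair
    y1, y2 = classes[:, 0], classes[:, 1]
    if one_hot:
        y1 = F.one_hot(y1, num_classes=10).float()
        y2 = F.one_hot(y2, num_classes=10).float()
    return (x1, x2), (y1, y2)


# Random stand-ins for the real data.
fake_x = torch.randn(6, 2, 14, 14)
fake_classes = torch.randint(0, 10, (6, 2))
(x1, x2), (y1, y2) = split_pairs(fake_x, fake_classes)
print(x1.shape, y1.shape)
```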

### Preprocess Data

In [12]:
reshape = True
one_hot_encoded = True
split = True
normalized = True

In [13]:
x_train = train_input
y_train = train_classes
x_test = test_input
y_test = test_classes

In [14]:
x_train, y_train, x_test, y_test = preprocess_data(x_train, 
                                                   y_train, 
                                                   x_test, 
                                                   y_test, 
                                                   reshape, 
                                                   one_hot_encoded, 
                                                   split, 
                                                   normalized)

In [15]:
x_train1, y_train1 = Variable(x_train[0]), Variable(y_train[0])
x_test1, y_test1 = Variable(x_test[0]), Variable(y_test[0])

In [16]:
print(x_train1.shape)

torch.Size([1000, 196])


### Define Models

In [17]:
def create_shallow_model():
    return nn.Sequential(
        nn.Linear(196, 50),
        nn.BatchNorm1d(50),
        nn.ReLU(),
        nn.Linear(50, 10))

In [18]:
def create_deep_model():
    return nn.Sequential(
        nn.Linear(196, 4),
        nn.BatchNorm1d(4),
        nn.ReLU(),
        nn.Linear(4, 8),
        nn.BatchNorm1d(8),
        nn.ReLU(),
        nn.Linear(8, 16),
        nn.BatchNorm1d(16),
        nn.ReLU(),
        nn.Linear(16, 32),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Linear(32, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Linear(64, 128),
        nn.BatchNorm1d(128),
        nn.ReLU(),
        nn.Linear(128, 10))

In [19]:
models = [create_shallow_model, create_deep_model]

### Test Models

In [20]:
stds = [-1, 1e-3, 1e-2, 1e-1, 1e-0, 1e1]
for model in models:
    compute_errors(model, 
                   x_train1, 
                   y_train1, 
                   x_test1,
                   y_test1,
                   stds,
                   one_hot_encoded)

std -1.000000 create_shallow_model train_error 0.00% test_error 14.10%
std 0.001000 create_shallow_model train_error 0.00% test_error 16.00%
std 0.010000 create_shallow_model train_error 0.00% test_error 15.90%
std 0.100000 create_shallow_model train_error 0.00% test_error 14.70%
std 1.000000 create_shallow_model train_error 6.10% test_error 18.60%
std 10.000000 create_shallow_model train_error 48.00% test_error 50.30%
std -1.000000 create_deep_model train_error 0.00% test_error 26.30%
std 0.001000 create_deep_model train_error 87.00% test_error 88.60%
std 0.010000 create_deep_model train_error 87.00% test_error 88.60%
std 0.100000 create_deep_model train_error 6.30% test_error 34.70%
std 1.000000 create_deep_model train_error 34.80% test_error 69.80%
std 10.000000 create_deep_model train_error 65.30% test_error 78.70%
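
The errors above are per-digit classification errors. To obtain the 0/1 answer for a pair, the two predicted digits still have to be compared. A minimal sketch follows, assuming the target is 1 when the first digit is less than or equal to the second. That convention is an assumption here and should be checked against `generate_pair_sets`. `compare_digits` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def compare_digits(logits_a, logits_b):
    """Return 1 where the predicted first digit <= predicted second digit."""
    pred_a = logits_a.argmax(dim=1)
    pred_b = logits_b.argmax(dim=1)
    return (pred_a <= pred_b).long()


# One-hot scores standing in for confident model outputs.
scores_a = F.one_hot(torch.tensor([3, 5, 9]), num_classes=10).float()
scores_b = F.one_hot(torch.tensor([7, 5, 0]), num_classes=10).float()
pair_prediction = compare_digits(scores_a, scores_b)
print(pair_prediction)  # tensor([1, 1, 0])
```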


In [21]:
# Xavier initialization
for model in models:
    compute_errors(model, 
                   x_train1, 
                   y_train1, 
                   x_test1,
                   y_test1,
                   None,
                   one_hot_encoded)

Computed standard deviation according to 'Xavier initialization': 0.183
std 0.182574 create_shallow_model train_error 0.00% test_error 14.90%
Computed standard deviation according to 'Xavier initialization': 0.120
std 0.120386 create_deep_model train_error 9.50% test_error 36.80%
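
The two printed values are consistent with the Xavier formula std = sqrt(2 / (fan_in + fan_out)) evaluated on the last linear layer of each model (50 → 10 for the shallow model, 128 → 10 for the deep one). Whether the helper applies that single value to every layer cannot be seen from the output. A quick check of the arithmetic, with `xavier_std` as a hypothetical helper name:

```python
import math


def xavier_std(fan_in, fan_out):
    """Standard deviation of Xavier (Glorot) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))


shallow_std = xavier_std(50, 10)    # last layer of create_shallow_model
deep_std = xavier_std(128, 10)      # last layer of create_deep_model
print(f"{shallow_std:.6f}")  # 0.182574
print(f"{deep_std:.6f}")     # 0.120386
```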
