This notebook contains experiments for:

* Loss functions
* Learning rate decay
* Weight initialization
* Optimizers
* Dropout

# `lincoln` imports

In [4]:
import numpy as np

In [3]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
import lincoln
from lincoln.layers import Dense
from lincoln.losses import SoftmaxCrossEntropy, MeanSquaredError
from lincoln.optimizers import Optimizer, SGD, SGDMomentum
from lincoln.activations import Sigmoid, Tanh, Linear, ReLU
from lincoln.network import NeuralNetwork
from lincoln.train import Trainer
from lincoln.utils import mnist
from lincoln.utils.np_utils import softmax

In [4]:
# Load the data with the MNIST loader from the lincoln library
X_train, y_train, X_test, y_test = mnist.load()

In [5]:
# Check the number of training labels
num_labels = len(y_train)
num_labels

60000

In [6]:
# One-hot encode the training labels.
# One-hot encoding turns each integer label into a binary vector, e.g. 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
num_labels = len(y_train)
train_labels = np.zeros((num_labels, 10))
for i in range(num_labels):
    train_labels[i][y_train[i]] = 1

# One-hot encode the test labels
num_labels = len(y_test)
test_labels = np.zeros((num_labels, 10))
for i in range(num_labels):
    test_labels[i][y_test[i]] = 1
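The two loops above can also be written as a single vectorized NumPy expression. A minimal sketch, assuming the labels are integer class indices from 0 to 9; the helper name `one_hot` is hypothetical and not part of `lincoln`:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an (n, num_classes) float array with a 1 in each row's label column."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing np.eye with the labels builds every row at once.
    return np.eye(num_classes)[labels]

# Usage on a few labels
print(one_hot([5, 0, 9]))
```

With the notebook's variables, `train_labels = one_hot(y_train)` would replace the first loop.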

# MNIST Demos

# Scale data to mean 0, variance 1

In [7]:
X_train, X_test = X_train - np.mean(X_train), X_test - np.mean(X_train)

In [8]:
np.min(X_train), np.max(X_train), np.min(X_test), np.max(X_test)

(-33.318421449829934,
 221.68157855017006,
 -33.318421449829934,
 221.68157855017006)

In [9]:
X_train, X_test = X_train / np.std(X_train), X_test / np.std(X_train)

In [10]:
np.min(X_train), np.max(X_train), np.min(X_test), np.max(X_test)

(-0.424073894391566, 2.821543345689335, -0.424073894391566, 2.821543345689335)
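Both cells above compute the mean and standard deviation on the training set only and reuse them for the test set, so no information from the test data leaks into preprocessing. A minimal sketch of the same step as one hypothetical helper, `standardize`, run on random stand-in data rather than MNIST:

```python
import numpy as np

def standardize(train, test):
    """Scale both arrays with the training set's global mean and std."""
    # Statistics come from the training data only; the test set reuses them.
    mean = np.mean(train)
    std = np.std(train)
    return (train - mean) / std, (test - mean) / std

rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(100, 784))
test = rng.uniform(0, 255, size=(20, 784))
train_s, test_s = standardize(train, test)
print(round(float(train_s.mean()), 6), round(float(train_s.std()), 6))
```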

In [5]:
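# Compute the model's accuracy (in percent) on a given test set
def calc_accuracy_model(model, test_set):
    # Run the model in inference mode to get predictions
    predictions = model.forward(test_set, inference=True)

    # argmax gives the index of the largest value in each prediction vector (the predicted class);
    # np.equal compares it with the true labels in the global y_test and returns a boolean array.
    # Summing the matches and dividing by the test set size gives the accuracy.
    accuracy = np.equal(np.argmax(predictions, axis=1), y_test).sum() * 100.0 / test_set.shape[0]

    # Print the accuracy (the function returns None)
    print(f'The model validation accuracy is: {accuracy:.2f}%')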
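`calc_accuracy_model` reads the true labels from the global `y_test`. A minimal sketch of the same computation as a pure function that takes predictions and labels explicitly (the name `accuracy_percent` is hypothetical):

```python
import numpy as np

def accuracy_percent(predictions, labels):
    """Percentage of rows whose argmax matches the integer label."""
    predicted_classes = np.argmax(predictions, axis=1)
    # Boolean array of matches; its mean is the fraction correct.
    return float(np.mean(predicted_classes == np.asarray(labels))) * 100.0

scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
labels = np.array([1, 0, 2, 1])
print(f"{accuracy_percent(scores, labels):.2f}%")  # 3 of 4 correct -> 75.00%
```

With the notebook's variables this would be `accuracy_percent(model.forward(X_test, inference=True), y_test)`.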

## Softmax cross entropy

### Trying sigmoid activation

In [12]:

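# Mean squared error (MSE):

# Chapter 3 used MSE as the loss function.
# MSE is convex: the farther the prediction is from the target, the steeper the initial gradient the loss sends back to the network's layers, and so the larger all the gradients the parameters receive.
# Classification and softmax cross entropy loss:

# For classification problems, we can do better than MSE.
# In classification, the network's outputs should be interpreted as probabilities: each value should lie between 0 and 1, and the probability vector for each observation should sum to 1.
# Advantages of softmax cross entropy loss:

# Softmax cross entropy loss exploits these properties and produces steeper gradients than MSE for the same inputs.
# It has two components: the softmax function and the cross entropy loss.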

model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh()),
            Dense(neurons=10,
                  activation=Sigmoid())],
    loss=MeanSquaredError(normalize=False),
    seed=20190119)

trainer = Trainer(model, SGD(0.1))
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);
print()
calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.611
Validation loss after 20 epochs is 0.424
Validation loss after 30 epochs is 0.388
Validation loss after 40 epochs is 0.372
Validation loss after 50 epochs is 0.364

The model validation accuracy is: 72.80%


Note: even if we normalize the outputs of a classification model trained with mean squared error loss, it still doesn't help:

In [13]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh()),
            Dense(neurons=10,
                  activation=Sigmoid())],
    loss=MeanSquaredError(normalize=True),
    seed=20190119)

trainer = Trainer(model, SGD(0.1))
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.952

Loss increased after epoch 20, final loss was 0.952, 
using the model from epoch 10
The model validation accuracy is: 41.73%


The reason is that we should be using softmax cross entropy loss!
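Softmax cross entropy has the two components named in the code comments of the first MSE experiment: softmax turns each row of raw outputs into probabilities, and cross entropy penalizes a low probability on the true class. A minimal NumPy sketch (our own, not `lincoln`'s `SoftmaxCrossEntropy` class), including the standard gradient of the mean loss with respect to the raw outputs, `(probs - targets) / n`:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, targets, eps=1e-9):
    """Mean cross entropy loss and its gradient w.r.t. the logits.

    `targets` is one-hot encoded and has the same shape as `logits`.
    """
    probs = softmax(logits)
    n = logits.shape[0]
    # Clip so that log never sees an exact zero.
    loss = -np.sum(targets * np.log(np.clip(probs, eps, 1.0))) / n
    grad = (probs - targets) / n
    return loss, grad

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
loss, grad = softmax_cross_entropy(logits, targets)
print(loss)
print(grad)
```

Because softmax is applied inside the loss in this sketch, the last layer only has to produce raw scores, which matches the `Linear()` output layers used with `SoftmaxCrossEntropy()` in the cells below.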

#### Trying sigmoid activation

In [14]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Sigmoid()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

trainer = Trainer(model, SGD(0.1))
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=130,
            eval_every=1,
            seed=20190119,
            batch_size=60);
print()
calc_accuracy_model(model, X_test)

Validation loss after 1 epochs is 1.285
Validation loss after 2 epochs is 0.970
Validation loss after 3 epochs is 0.836
Validation loss after 4 epochs is 0.763
Validation loss after 5 epochs is 0.712
Validation loss after 6 epochs is 0.679
Validation loss after 7 epochs is 0.651
Validation loss after 8 epochs is 0.631
Validation loss after 9 epochs is 0.617
Validation loss after 10 epochs is 0.599
Validation loss after 11 epochs is 0.588
Validation loss after 12 epochs is 0.576
Validation loss after 13 epochs is 0.568
Validation loss after 14 epochs is 0.557
Validation loss after 15 epochs is 0.550
Validation loss after 16 epochs is 0.544
Validation loss after 17 epochs is 0.537
Validation loss after 18 epochs is 0.533
Validation loss after 19 epochs is 0.529
Validation loss after 20 epochs is 0.523
Validation loss after 21 epochs is 0.517
Validation loss after 22 epochs is 0.512
Validation loss after 23 epochs is 0.507

Loss increased after epoch 24, final loss was 0.507, 
using the m

#### Trying ReLU activation

In [15]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=ReLU()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

trainer = Trainer(model, SGD(0.1))
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);
print()
calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 6.413

Loss increased after epoch 20, final loss was 6.413, 
using the model from epoch 10

The model validation accuracy is: 71.84%


#### Trying tanh activation

In [16]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

trainer = Trainer(model, SGD(0.1))
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);
print()
calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.631
Validation loss after 20 epochs is 0.580
Validation loss after 30 epochs is 0.561
Validation loss after 40 epochs is 0.560
Validation loss after 50 epochs is 0.552

The model validation accuracy is: 90.92%


## SGD Momentum
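With momentum, each update keeps a running velocity that accumulates past gradients, so steps along a consistent gradient direction grow while noisy directions partly cancel. A minimal sketch of one common formulation, assuming `momentum=0.9` is the factor by which the previous velocity decays each step; `SGDMomentum` may differ in detail, and `sgd_momentum_step` is a hypothetical name:

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update; returns the new (param, velocity)."""
    # Decay the old velocity and add the current scaled gradient.
    velocity = momentum * velocity + lr * grad
    # Move the parameter against the accumulated velocity.
    return param - velocity, velocity

# Minimize f(w) = sum(w**2), whose gradient is 2 * w.
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(300):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=0.1, momentum=0.9)
print(w)
```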

In [17]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Sigmoid()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optim = SGDMomentum(0.1, momentum=0.9)

trainer = Trainer(model, optim)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=1,
            seed=20190119,
            batch_size=60);

calc_accuracy_model(model, X_test)

Validation loss after 1 epochs is 0.615
Validation loss after 2 epochs is 0.489
Validation loss after 3 epochs is 0.445

Loss increased after epoch 4, final loss was 0.445, 
using the model from epoch 3
The model validation accuracy is: 91.97%


In [18]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optim = SGDMomentum(0.1, momentum=0.9)

trainer = Trainer(model, optim)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.387
Validation loss after 20 epochs is 0.333
Validation loss after 30 epochs is 0.316

Loss increased after epoch 40, final loss was 0.316, 
using the model from epoch 30
The model validation accuracy is: 95.36%


## Different learning rate decay
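The `final_lr` and `decay_type` arguments move the learning rate from its initial value toward `final_lr` over the course of training. A minimal sketch of the two schedules, assuming the rate changes once per epoch and reaches `final_lr` at the last epoch (how `SGDMomentum` spaces its updates internally is an assumption here; `lr_schedule` is a hypothetical name):

```python
import numpy as np

def lr_schedule(initial_lr, final_lr, epochs, decay_type="linear"):
    """Per-epoch learning rates from initial_lr down to final_lr."""
    steps = np.arange(epochs)
    if decay_type == "linear":
        # Subtract the same amount every epoch.
        per_epoch = (initial_lr - final_lr) / (epochs - 1)
        return initial_lr - per_epoch * steps
    if decay_type == "exponential":
        # Multiply by the same factor every epoch.
        factor = (final_lr / initial_lr) ** (1.0 / (epochs - 1))
        return initial_lr * factor ** steps
    raise ValueError(f"unknown decay_type: {decay_type}")

linear = lr_schedule(0.15, 0.05, 50, "linear")
exponential = lr_schedule(0.2, 0.05, 50, "exponential")
print(linear[[0, 25, 49]])
print(exponential[[0, 25, 49]])
```

For the same endpoints, the exponential schedule lies below the linear one at every intermediate epoch: it drops quickly at first and spends more epochs near `final_lr`.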

In [19]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.15, momentum=0.9, final_lr=0.05, decay_type='linear')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.389
Validation loss after 20 epochs is 0.307
Validation loss after 30 epochs is 0.290

Loss increased after epoch 40, final loss was 0.290, 
using the model from epoch 30
The model validation accuracy is: 95.98%


In [20]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh()),
            Dense(neurons=10,
                  activation=Linear())],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.2,
                        momentum=0.9,
                        final_lr=0.05,
                        decay_type='exponential')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.450
Validation loss after 20 epochs is 0.342
Validation loss after 30 epochs is 0.296

Loss increased after epoch 40, final loss was 0.296, 
using the model from epoch 30
The model validation accuracy is: 95.75%


## Changing weight init
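`weight_init="glorot"` scales the random initial weights by each layer's fan-in and fan-out so that activations and gradients keep a similar variance from layer to layer. A minimal sketch of the standard Glorot (Xavier) normal initializer, with standard deviation `sqrt(2 / (fan_in + fan_out))`, assuming 784 input features (flattened 28x28 images); `lincoln`'s exact scaling may differ, and `glorot_normal` is a hypothetical name:

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    """Weights drawn from a normal distribution with variance 2 / (fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=(fan_in, fan_out))

rng = np.random.default_rng(20190119)
W1 = glorot_normal(784, 89, rng)  # input pixels -> 89 hidden units
print(W1.shape, W1.std(), np.sqrt(2.0 / (784 + 89)))
```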

In [21]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh(),
                  weight_init="glorot"),
            Dense(neurons=10,
                  activation=Linear(),
                  weight_init="glorot")],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.15, momentum=0.9, final_lr=0.05, decay_type='linear')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60,
            early_stopping=True);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.299
Validation loss after 20 epochs is 0.226

Loss increased after epoch 30, final loss was 0.226, 
using the model from epoch 20
The model validation accuracy is: 96.69%
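The log lines above ("Loss increased after epoch 30 ... using the model from epoch 20") reflect a simple form of early stopping: validation loss is checked every `eval_every` epochs, and as soon as it stops decreasing, training halts and the last checkpoint is restored. A minimal sketch of that control flow, not the `Trainer`'s actual code; `train_with_early_stopping` is hypothetical, and the demo uses a dict as a stand-in model with made-up toy losses:

```python
import copy

def train_with_early_stopping(model, run_epoch, eval_loss, epochs, eval_every):
    """Evaluate every `eval_every` epochs; on the first non-improvement, roll back."""
    best_loss = float("inf")
    best_model = copy.deepcopy(model)
    for epoch in range(1, epochs + 1):
        run_epoch(model)
        if epoch % eval_every == 0:
            loss = eval_loss(model)
            if loss < best_loss:
                # Improvement: keep a copy of the model as the new checkpoint.
                best_loss, best_model = loss, copy.deepcopy(model)
                print(f"Validation loss after {epoch} epochs is {loss:.3f}")
            else:
                print(f"Loss increased after epoch {epoch}, "
                      f"using the model from epoch {epoch - eval_every}")
                return best_model, best_loss
    return model, best_loss

# Toy demo: the "model" just counts epochs; losses are scripted per epoch.
toy_losses = {10: 0.5, 20: 0.4, 30: 0.45, 40: 0.3, 50: 0.2}
state = {"epoch": 0}

def run_epoch(m):
    m["epoch"] += 1

def eval_loss(m):
    return toy_losses[m["epoch"]]

best, loss = train_with_early_stopping(state, run_epoch, eval_loss,
                                       epochs=50, eval_every=10)
print(best, loss)
```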


In [22]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh(),
                  weight_init="glorot"),
            Dense(neurons=10,
                  activation=Linear(),
                  weight_init="glorot")],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.2, momentum=0.9, final_lr=0.05, decay_type='exponential')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60,
            early_stopping=True);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.378
Validation loss after 20 epochs is 0.264
Validation loss after 30 epochs is 0.259

Loss increased after epoch 40, final loss was 0.259, 
using the model from epoch 30
The model validation accuracy is: 96.48%


## Dropout
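Dropout randomly zeroes some of a layer's outputs during training so that units cannot rely too heavily on one another. A minimal sketch, assuming `dropout=0.8` is the probability of keeping each unit (so about 20% are dropped) and that outputs are multiplied by that probability at inference so their expected value matches training; this is not `lincoln`'s own dropout code, and the function name is hypothetical:

```python
import numpy as np

def dropout(x, keep_prob, training, rng):
    """Apply dropout where `keep_prob` is the chance each unit survives."""
    if training:
        # Each unit is kept independently with probability keep_prob.
        mask = rng.binomial(1, keep_prob, size=x.shape)
        return x * mask
    # At inference nothing is dropped; scaling by keep_prob matches the
    # expected value of the training-time output.
    return x * keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 89))
train_out = dropout(x, 0.8, training=True, rng=rng)
test_out = dropout(x, 0.8, training=False, rng=rng)
print(train_out.mean(), test_out.mean())  # both close to 0.8
```

Many libraries instead use "inverted" dropout, dividing by `keep_prob` during training and leaving inference untouched; the two agree in expectation.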

In [23]:
model = NeuralNetwork(
    layers=[Dense(neurons=89,
                  activation=Tanh(),
                  weight_init="glorot",
                  dropout=0.8),
            Dense(neurons=10,
                  activation=Linear(),
                  weight_init="glorot")],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.2, momentum=0.9, final_lr=0.05, decay_type='exponential')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=50,
            eval_every=10,
            seed=20190119,
            batch_size=60,
            early_stopping=True);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.277
Validation loss after 20 epochs is 0.233
Validation loss after 30 epochs is 0.203
Validation loss after 40 epochs is 0.201

Loss increased after epoch 50, final loss was 0.201, 
using the model from epoch 40
The model validation accuracy is: 96.77%


## Deep Learning, with and without Dropout
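For context on the two experiments in this section, the deeper network has roughly twice as many trainable parameters as the single-hidden-layer models above, which gives it more capacity to overfit. A quick count, assuming 784 input features (flattened 28x28 images) and one weight matrix plus one bias vector per `Dense` layer; `count_dense_params` is a hypothetical helper:

```python
def count_dense_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

shallow = count_dense_params([784, 89, 10])
deep = count_dense_params([784, 178, 46, 10])
print(shallow, deep)  # 70765 148434
```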

In [24]:
model = NeuralNetwork(
    layers=[Dense(neurons=178,
                  activation=Tanh(),
                  weight_init="glorot",
                  dropout=0.8),
            Dense(neurons=46,
                  activation=Tanh(),
                  weight_init="glorot",
                  dropout=0.8),
            Dense(neurons=10,
                  activation=Linear(),
                  weight_init="glorot")],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.2, momentum=0.9, final_lr=0.05, decay_type='exponential')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=100,
            eval_every=10,
            seed=20190119,
            batch_size=60,
            early_stopping=True);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.315
Validation loss after 20 epochs is 0.283
Validation loss after 30 epochs is 0.258
Validation loss after 40 epochs is 0.225
Validation loss after 50 epochs is 0.204
Validation loss after 60 epochs is 0.197
Validation loss after 70 epochs is 0.175

Loss increased after epoch 80, final loss was 0.175, 
using the model from epoch 70
The model validation accuracy is: 97.34%


In [25]:
model = NeuralNetwork(
    layers=[Dense(neurons=178,
                  activation=Tanh(),
                  weight_init="glorot"),
            Dense(neurons=46,
                  activation=Tanh(),
                  weight_init="glorot"),
            Dense(neurons=10,
                  activation=Linear(),
                  weight_init="glorot")],
    loss=SoftmaxCrossEntropy(),
    seed=20190119)

optimizer = SGDMomentum(0.2, momentum=0.9, final_lr=0.05, decay_type='exponential')

trainer = Trainer(model, optimizer)
trainer.fit(X_train, train_labels, X_test, test_labels,
            epochs=100,
            eval_every=10,
            seed=20190119,
            batch_size=60,
            early_stopping=True);

calc_accuracy_model(model, X_test)

Validation loss after 10 epochs is 0.448
Validation loss after 20 epochs is 0.348
Validation loss after 30 epochs is 0.317
Validation loss after 40 epochs is 0.302
Validation loss after 50 epochs is 0.283
Validation loss after 60 epochs is 0.249

Loss increased after epoch 70, final loss was 0.249, 
using the model from epoch 60
The model validation accuracy is: 96.08%
