# DRIO5201A - Deep Learning

# TP1

In [2]:
# Import the "network.py" module
import network

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

In [3]:
# Load the MNIST data
import mnist_loader
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

In [4]:
#After loading the MNIST data, we'll set up a Network with 30 hidden neurons.
#We do this after importing the Python program listed above, which is named network.
net = network.Network([784, 30, 10])
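
To get a feel for the size of the `[784, 30, 10]` network, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per non-input neuron; `count_parameters` is a helper name introduced here for illustration and is not part of `network.py`.

```python
def count_parameters(sizes):
    """Return (weights, biases) for a fully connected network.

    Assumption: every neuron is connected to every neuron in the previous
    layer, and every non-input neuron has exactly one bias.
    """
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes[1:])
    return weights, biases


weights, biases = count_parameters([784, 30, 10])
print("%d weights + %d biases = %d parameters"
      % (weights, biases, weights + biases))
```

Under the same assumption, `count_parameters([784, 100, 10])` gives 79,400 weights and 110 biases, so the 100-neuron network has more than three times as many parameters to train.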

In [9]:
# Finally, we'll use stochastic gradient descent to learn from the MNIST training_data over 30 epochs,
# with a mini-batch size of 10, and a learning rate of η = 3.0.
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

Epoch 0: 9099 / 10000
Epoch 1: 9271 / 10000
Epoch 2: 9338 / 10000
Epoch 3: 9369 / 10000
Epoch 4: 9327 / 10000
Epoch 5: 9393 / 10000
Epoch 6: 9427 / 10000
Epoch 7: 9432 / 10000
Epoch 8: 9455 / 10000
Epoch 9: 9462 / 10000
Epoch 10: 9457 / 10000
Epoch 11: 9470 / 10000
Epoch 12: 9460 / 10000
Epoch 13: 9453 / 10000
Epoch 14: 9450 / 10000
Epoch 15: 9475 / 10000
Epoch 16: 9490 / 10000
Epoch 17: 9460 / 10000
Epoch 18: 9475 / 10000
Epoch 19: 9492 / 10000
Epoch 20: 9502 / 10000
Epoch 21: 9476 / 10000
Epoch 22: 9480 / 10000
Epoch 23: 9482 / 10000
Epoch 24: 9488 / 10000
Epoch 25: 9483 / 10000
Epoch 26: 9471 / 10000
Epoch 27: 9492 / 10000
Epoch 28: 9485 / 10000
Epoch 29: 9490 / 10000
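
The positional arguments to `net.SGD` above are the training data, the number of epochs, the mini-batch size, and the learning rate. The sketch below shows what those three numbers control, using a toy straight-line fit instead of the network. The function name `sgd_line_fit`, the toy data, and the fixed seed are assumptions made for illustration; this is not the code in `network.py`.

```python
import random


def sgd_line_fit(data, epochs, mini_batch_size, eta, seed=0):
    """Fit y = w*x + b by mini-batch SGD on the quadratic cost."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # new random order every epoch
        for k in range(0, len(data), mini_batch_size):
            batch = data[k:k + mini_batch_size]
            # Sum the per-example gradients over the mini-batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch)
            grad_b = sum((w * x + b - y) for x, y in batch)
            # Step against the average gradient, scaled by eta.
            w -= eta / len(batch) * grad_w
            b -= eta / len(batch) * grad_b
    return w, b


# Toy data lying exactly on the line y = 2x + 1.
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w_fit, b_fit = sgd_line_fit(points, epochs=200, mini_batch_size=5, eta=0.5)
print("w = %.4f, b = %.4f" % (w_fit, b_fit))
```

Each epoch visits every example once, so the number of parameter updates per epoch is the data size divided by the mini-batch size.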


In [10]:
#Let's rerun the above experiment, changing the number of hidden neurons to 100.
#As was the case earlier, if you're running the code as you read along,
#you should be warned that it takes quite a while to execute
#(on my machine this experiment takes tens of seconds for each training epoch),
#so it's wise to continue reading in parallel while the code executes.

net = network.Network([784, 100, 10])
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)

Epoch 0: 4686 / 10000
Epoch 1: 4865 / 10000
Epoch 2: 5640 / 10000
Epoch 3: 5618 / 10000
Epoch 4: 6572 / 10000
Epoch 5: 6615 / 10000
Epoch 6: 6670 / 10000
Epoch 7: 7539 / 10000
Epoch 8: 7570 / 10000
Epoch 9: 7579 / 10000
Epoch 10: 7608 / 10000
Epoch 11: 7597 / 10000
Epoch 12: 7615 / 10000
Epoch 13: 7618 / 10000
Epoch 14: 7627 / 10000
Epoch 15: 7612 / 10000
Epoch 16: 7624 / 10000
Epoch 17: 7622 / 10000
Epoch 18: 7623 / 10000
Epoch 19: 7642 / 10000
Epoch 20: 7645 / 10000
Epoch 21: 7641 / 10000
Epoch 22: 7641 / 10000
Epoch 23: 7649 / 10000
Epoch 24: 7650 / 10000
Epoch 25: 7648 / 10000
Epoch 26: 7644 / 10000
Epoch 27: 7637 / 10000
Epoch 28: 7647 / 10000
Epoch 29: 7658 / 10000


The reference result for this configuration is 96.59 percent, which would suggest that, at least in this case, using more hidden neurons helps us get better results. The run above did not reproduce it: it ended at 7658 / 10000 (76.58 percent), well below the 9490 / 10000 (94.90 percent) reached with 30 hidden neurons. Reader feedback indicates quite some variation in results for this experiment, and some training runs give results quite a bit worse. Using the techniques introduced in chapter 3 will greatly reduce the variation in performance across different training runs for our networks.

Of course, to obtain these accuracies I had to make specific choices for the number of epochs of training, the mini-batch size, and the learning rate, η. As I mentioned above, these are known as hyper-parameters for our neural network, in order to distinguish them from the parameters (weights and biases) learnt by our learning algorithm. If we choose our hyper-parameters poorly, we can get bad results. Suppose, for example, that we'd chosen the learning rate to be η = 0.001:

In [11]:
net = network.Network([784, 100, 10])
net.SGD(training_data, 30, 10, 0.001, test_data=test_data)

Epoch 0: 988 / 10000
Epoch 1: 998 / 10000
Epoch 2: 1071 / 10000
Epoch 3: 1127 / 10000
Epoch 4: 1151 / 10000
Epoch 5: 1197 / 10000
Epoch 6: 1237 / 10000
Epoch 7: 1289 / 10000
Epoch 8: 1339 / 10000
Epoch 9: 1379 / 10000
Epoch 10: 1414 / 10000
Epoch 11: 1438 / 10000
Epoch 12: 1475 / 10000
Epoch 13: 1519 / 10000
Epoch 14: 1572 / 10000
Epoch 15: 1619 / 10000
Epoch 16: 1655 / 10000
Epoch 17: 1698 / 10000
Epoch 18: 1734 / 10000
Epoch 19: 1780 / 10000
Epoch 20: 1820 / 10000
Epoch 21: 1853 / 10000
Epoch 22: 1894 / 10000
Epoch 23: 1956 / 10000
Epoch 24: 2008 / 10000
Epoch 25: 2042 / 10000
Epoch 26: 2076 / 10000
Epoch 27: 2118 / 10000
Epoch 28: 2162 / 10000
Epoch 29: 2194 / 10000
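
The slow climb in the run above is what a very small learning rate looks like. A minimal sketch of the same effect, assuming a toy one-dimensional cost C(x) = x² in place of the network's cost (the function name `gradient_descent` and the chosen rates are illustrative only):

```python
def gradient_descent(eta, steps=30, x0=1.0):
    """Run plain gradient descent on C(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x -= eta * 2.0 * x
    return x


for eta in (0.001, 0.1, 1.5):
    print("eta = %-5s -> x after 30 steps = %.6g" % (eta, gradient_descent(eta)))
```

With the smallest rate the iterate barely moves from its starting point in 30 steps, with the middle rate it gets close to the minimum at 0, and with the largest rate each step overshoots and the iterate diverges. The toy cost says nothing about which value is best for MNIST; it only illustrates why both too small and too large a rate can give bad results.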


# TP2

In [5]:
import network2
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
net.large_weight_initializer()
net.SGD(training_data, 30, 10, 0.5, evaluation_data=test_data, monitor_evaluation_accuracy=True)

Epoch 0 training complete
Accuracy on evaluation data: 9077 / 10000

Epoch 1 training complete
Accuracy on evaluation data: 9272 / 10000

Epoch 2 training complete
Accuracy on evaluation data: 9373 / 10000

Epoch 3 training complete
Accuracy on evaluation data: 9411 / 10000

Epoch 4 training complete
Accuracy on evaluation data: 9422 / 10000

Epoch 5 training complete
Accuracy on evaluation data: 9431 / 10000

Epoch 6 training complete
Accuracy on evaluation data: 9458 / 10000

Epoch 7 training complete
Accuracy on evaluation data: 9434 / 10000

Epoch 8 training complete
Accuracy on evaluation data: 9476 / 10000

Epoch 9 training complete
Accuracy on evaluation data: 9438 / 10000

Epoch 10 training complete
Accuracy on evaluation data: 9491 / 10000

Epoch 11 training complete
Accuracy on evaluation data: 9512 / 10000

Epoch 12 training complete
Accuracy on evaluation data: 9488 / 10000

Epoch 13 training complete
Accuracy on evaluation data: 9541 / 10000

Epoch 14 training complete
Acc

([],
 [9077,
  9272,
  9373,
  9411,
  9422,
  9431,
  9458,
  9434,
  9476,
  9438,
  9491,
  9512,
  9488,
  9541,
  9534,
  9498,
  9539,
  9543,
  9510,
  9524,
  9540,
  9519,
  9497,
  9526,
  9517,
  9526,
  9551,
  9526,
  9536,
  9539],
 [],
 [])
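
The cell above switches the cost to `network2.CrossEntropyCost`. A minimal sketch of why that can help, assuming a single sigmoid output neuron: with the quadratic cost the output error carries a factor σ'(z), which is tiny when the neuron is saturated, whereas with the cross-entropy cost that factor cancels. The function names below are introduced for illustration and are not taken from `network2.py`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_delta(z, y):
    """Output error for the quadratic cost: (a - y) * sigma'(z)."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)


def cross_entropy_delta(z, y):
    """Output error for the cross-entropy cost: a - y."""
    return sigmoid(z) - y


# A badly wrong, saturated neuron: output close to 1, target 0.
z, y = 5.0, 0.0
print("quadratic delta:     %.4f" % quadratic_delta(z, y))
print("cross-entropy delta: %.4f" % cross_entropy_delta(z, y))
```

For this saturated neuron the quadratic error signal is more than a hundred times smaller than the cross-entropy one, so learning from the mistake would be correspondingly slower.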

In [6]:
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost) 
net.large_weight_initializer()
net.SGD(training_data[:1000], 400, 10, 0.5, evaluation_data=test_data, monitor_evaluation_accuracy=True, monitor_training_cost=True)

Epoch 0 training complete
Cost on training data: 1.85777569795
Accuracy on evaluation data: 5738 / 10000

Epoch 1 training complete
Cost on training data: 1.3042378396
Accuracy on evaluation data: 6952 / 10000

Epoch 2 training complete
Cost on training data: 1.04570121369
Accuracy on evaluation data: 7243 / 10000

Epoch 3 training complete
Cost on training data: 0.896191737276
Accuracy on evaluation data: 7532 / 10000

Epoch 4 training complete
Cost on training data: 0.730838204236
Accuracy on evaluation data: 7751 / 10000

Epoch 5 training complete
Cost on training data: 0.630722935236
Accuracy on evaluation data: 7927 / 10000

Epoch 6 training complete
Cost on training data: 0.552494613584
Accuracy on evaluation data: 8000 / 10000

Epoch 7 training complete
Cost on training data: 0.472837290136
Accuracy on evaluation data: 8046 / 10000

Epoch 8 training complete
Cost on training data: 0.438506237424
Accuracy on evaluation data: 8104 / 10000

Epoch 9 training complete
Cost on trainin

KeyboardInterrupt: 

In [7]:
import mnist_loader 
training_data, validation_data, test_data = mnist_loader.load_data_wrapper() 
import network2 
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
net.large_weight_initializer()
net.SGD(training_data[:1000], 400, 10, 0.5,
        evaluation_data=test_data, lmbda=0.1,
        monitor_evaluation_cost=True, monitor_evaluation_accuracy=True,
        monitor_training_cost=True, monitor_training_accuracy=True)

Epoch 0 training complete
Cost on training data: 3.01171313621
Accuracy on training data: 669 / 1000
Cost on evaluation data: 2.24653679603
Accuracy on evaluation data: 5678 / 10000

Epoch 1 training complete
Cost on training data: 2.60169647596
Accuracy on training data: 762 / 1000
Cost on evaluation data: 1.96560088611
Accuracy on evaluation data: 6439 / 10000

Epoch 2 training complete
Cost on training data: 2.30962048631
Accuracy on training data: 825 / 1000
Cost on evaluation data: 1.72977951993
Accuracy on evaluation data: 6994 / 10000

Epoch 3 training complete
Cost on training data: 2.11462714929
Accuracy on training data: 851 / 1000
Cost on evaluation data: 1.5841266445
Accuracy on evaluation data: 7276 / 10000

Epoch 4 training complete
Cost on training data: 1.94563361911
Accuracy on training data: 894 / 1000
Cost on evaluation data: 1.5111658934
Accuracy on evaluation data: 7391 / 10000

Epoch 5 training complete
Cost on training data: 1.82147250306
Accuracy on training dat

KeyboardInterrupt: 

In [8]:
net.large_weight_initializer()
net.SGD(training_data, 30, 10, 0.5,
        evaluation_data=test_data, lmbda = 5.0,
        monitor_evaluation_accuracy=True, monitor_training_accuracy=True)

Epoch 0 training complete
Accuracy on training data: 45597 / 50000
Accuracy on evaluation data: 9123 / 10000

Epoch 1 training complete
Accuracy on training data: 46868 / 50000
Accuracy on evaluation data: 9357 / 10000

Epoch 2 training complete
Accuracy on training data: 47237 / 50000
Accuracy on evaluation data: 9425 / 10000

Epoch 3 training complete
Accuracy on training data: 47567 / 50000
Accuracy on evaluation data: 9472 / 10000

Epoch 4 training complete
Accuracy on training data: 47542 / 50000
Accuracy on evaluation data: 9431 / 10000

Epoch 5 training complete
Accuracy on training data: 47833 / 50000
Accuracy on evaluation data: 9513 / 10000

Epoch 6 training complete
Accuracy on training data: 48032 / 50000
Accuracy on evaluation data: 9545 / 10000

Epoch 7 training complete
Accuracy on training data: 48033 / 50000
Accuracy on evaluation data: 9525 / 10000

Epoch 8 training complete
Accuracy on training data: 47754 / 50000
Accuracy on evaluation data: 9491 / 10000

Epoch 9 tr

([],
 [9123,
  9357,
  9425,
  9472,
  9431,
  9513,
  9545,
  9525,
  9491,
  9598,
  9581,
  9587,
  9510,
  9590,
  9601,
  9588,
  9578,
  9552,
  9555,
  9587,
  9633,
  9562,
  9583,
  9618,
  9627,
  9587,
  9579,
  9580,
  9580,
  9502],
 [],
 [45597,
  46868,
  47237,
  47567,
  47542,
  47833,
  48032,
  48033,
  47754,
  48242,
  48273,
  48336,
  47943,
  48311,
  48348,
  48413,
  48248,
  48227,
  48198,
  48360,
  48556,
  48229,
  48359,
  48445,
  48494,
  48369,
  48215,
  48447,
  48200,
  48005])
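
The runs above use `lmbda = 0.1` with 1,000 training examples and `lmbda = 5.0` with all 50,000. A sketch of why the two values go together, assuming the usual L2-regularized SGD update in which each weight is first rescaled by 1 − ηλ/n (n being the training set size) and then moved against the averaged gradient. The helper names are hypothetical and only illustrate that rule.

```python
def weight_decay_factor(eta, lmbda, n):
    """Factor applied to each weight before the gradient step."""
    return 1.0 - eta * lmbda / n


def regularized_step(w, grad_sum, eta, lmbda, n, batch_size):
    """One L2-regularized update of a single weight.

    grad_sum is the sum of the unregularized cost gradients over the
    mini-batch (an assumption about how the gradient is accumulated).
    """
    return weight_decay_factor(eta, lmbda, n) * w - (eta / batch_size) * grad_sum


small = weight_decay_factor(eta=0.5, lmbda=0.1, n=1000)
full = weight_decay_factor(eta=0.5, lmbda=5.0, n=50000)
print("n = 1000,  lmbda = 0.1: decay factor %.5f" % small)
print("n = 50000, lmbda = 5.0: decay factor %.5f" % full)
```

Under this assumption both settings shrink the weights by the same factor per update, which is why λ is scaled up by 50 when the training set grows by 50.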

In [9]:
net = network2.Network([784, 100, 10], cost=network2.CrossEntropyCost)
net.large_weight_initializer()
net.SGD(training_data, 30, 10, 0.5, lmbda=5.0,
        evaluation_data=validation_data,
        monitor_evaluation_accuracy=True)

Epoch 0 training complete
Accuracy on evaluation data: 9406 / 10000

Epoch 1 training complete
Accuracy on evaluation data: 9507 / 10000

Epoch 2 training complete
Accuracy on evaluation data: 9638 / 10000

Epoch 3 training complete
Accuracy on evaluation data: 9683 / 10000

Epoch 4 training complete
Accuracy on evaluation data: 9695 / 10000

Epoch 5 training complete
Accuracy on evaluation data: 9699 / 10000

Epoch 6 training complete
Accuracy on evaluation data: 9738 / 10000

Epoch 7 training complete
Accuracy on evaluation data: 9737 / 10000

Epoch 8 training complete
Accuracy on evaluation data: 9696 / 10000

Epoch 9 training complete
Accuracy on evaluation data: 9698 / 10000

Epoch 10 training complete
Accuracy on evaluation data: 9708 / 10000

Epoch 11 training complete
Accuracy on evaluation data: 9737 / 10000

Epoch 12 training complete
Accuracy on evaluation data: 9726 / 10000

Epoch 13 training complete
Accuracy on evaluation data: 9725 / 10000

Epoch 14 training complete
Acc

([],
 [9406,
  9507,
  9638,
  9683,
  9695,
  9699,
  9738,
  9737,
  9696,
  9698,
  9708,
  9737,
  9726,
  9725,
  9725,
  9727,
  9752,
  9726,
  9755,
  9756,
  9720,
  9781,
  9729,
  9782,
  9775,
  9760,
  9729,
  9754,
  9774,
  9772],
 [],
 [])

In [11]:
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
net.SGD(training_data, 30, 10, 0.1, lmbda = 5.0,
            evaluation_data=validation_data,
            monitor_evaluation_accuracy=True)

Epoch 0 training complete
Accuracy on evaluation data: 9293 / 10000

Epoch 1 training complete
Accuracy on evaluation data: 9428 / 10000

Epoch 2 training complete
Accuracy on evaluation data: 9477 / 10000

Epoch 3 training complete
Accuracy on evaluation data: 9516 / 10000

Epoch 4 training complete
Accuracy on evaluation data: 9539 / 10000

Epoch 5 training complete
Accuracy on evaluation data: 9572 / 10000

Epoch 6 training complete
Accuracy on evaluation data: 9581 / 10000

Epoch 7 training complete
Accuracy on evaluation data: 9595 / 10000

Epoch 8 training complete
Accuracy on evaluation data: 9590 / 10000

Epoch 9 training complete
Accuracy on evaluation data: 9614 / 10000

Epoch 10 training complete
Accuracy on evaluation data: 9617 / 10000

Epoch 11 training complete
Accuracy on evaluation data: 9618 / 10000

Epoch 12 training complete
Accuracy on evaluation data: 9607 / 10000

Epoch 13 training complete
Accuracy on evaluation data: 9617 / 10000

Epoch 14 training complete
Acc

([],
 [9293,
  9428,
  9477,
  9516,
  9539,
  9572,
  9581,
  9595,
  9590,
  9614,
  9617,
  9618,
  9607,
  9617,
  9643,
  9617,
  9625,
  9649,
  9644,
  9658,
  9641,
  9645,
  9638,
  9662,
  9664,
  9639,
  9669,
  9670,
  9671,
  9671],
 [],
 [])
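
The cell above does not call `large_weight_initializer()`, so it relies on the network's default initialization. A common default, assumed here, draws each weight from a Gaussian with standard deviation 1/sqrt(n_in) instead of 1. The sketch below estimates the spread of a hidden neuron's weighted input z under both schemes, with all 784 inputs set to 1 as a simplification; the function name, trial count, and seed are illustrative choices.

```python
import math
import random


def weighted_input_std(n_in, weight_std, trials=500, seed=0):
    """Estimate the std of z = sum_j w_j * 1 + b over random initializations."""
    rng = random.Random(seed)
    zs = []
    for _ in range(trials):
        z = sum(rng.gauss(0.0, weight_std) for _ in range(n_in))
        z += rng.gauss(0.0, 1.0)  # bias drawn with standard deviation 1
        zs.append(z)
    mean = sum(zs) / trials
    return math.sqrt(sum((z - mean) ** 2 for z in zs) / trials)


n_in = 784
large = weighted_input_std(n_in, 1.0)
scaled = weighted_input_std(n_in, 1.0 / math.sqrt(n_in))
print("weights ~ N(0, 1):      std of z is about %.1f" % large)
print("weights ~ N(0, 1/n_in): std of z is about %.1f" % scaled)
```

With unit-variance weights, z is typically so large in magnitude that a sigmoid neuron starts out saturated; the scaled weights keep z near the region where the sigmoid still has a usable slope.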

In [12]:
net = network2.Network([784, 30, 10], cost=network2.CrossEntropyCost)
net.SGD(training_data, 30, 10, 0.5, lmbda = 5.0,
        evaluation_data=validation_data,
        monitor_evaluation_accuracy=True,
        monitor_evaluation_cost=True,
        monitor_training_accuracy=True,
        monitor_training_cost=True)

Epoch 0 training complete
Cost on training data: 0.475760897816
Accuracy on training data: 47069 / 50000
Cost on evaluation data: 0.788086647832
Accuracy on evaluation data: 9436 / 10000

Epoch 1 training complete
Cost on training data: 0.448170197545
Accuracy on training data: 47485 / 50000
Cost on evaluation data: 0.862582854958
Accuracy on evaluation data: 9492 / 10000

Epoch 2 training complete
Cost on training data: 0.434793024323
Accuracy on training data: 47711 / 50000
Cost on evaluation data: 0.903366013817
Accuracy on evaluation data: 9498 / 10000

Epoch 3 training complete
Cost on training data: 0.455624164815
Accuracy on training data: 47582 / 50000
Cost on evaluation data: 0.95958768833
Accuracy on evaluation data: 9497 / 10000

Epoch 4 training complete
Cost on training data: 0.407773068196
Accuracy on training data: 48021 / 50000
Cost on evaluation data: 0.929492915337
Accuracy on evaluation data: 9560 / 10000

Epoch 5 training complete
Cost on training data: 0.4007019792

([0.7880866478319567,
  0.8625828549579978,
  0.90336601381674875,
  0.959587688330239,
  0.92949291533743406,
  0.92926529016464399,
  0.97846225980416224,
  0.95313047808080831,
  0.94729791154201881,
  0.95983975171154734,
  1.0359718750753271,
  0.96875088175919166,
  0.94377080352810616,
  0.93259036693719921,
  0.94462371113016863,
  0.94913878264627038,
  0.986557633508515,
  0.96505552864254063,
  0.98652514591620988,
  0.96684612172510942,
  0.97487232519711964,
  0.96209167712255117,
  0.95357593371202687,
  0.96522465951347913,
  0.95727589707521266,
  0.94820427353213965,
  0.94447426947694713,
  0.94886724991080262,
  0.96911412857539858,
  0.95316629167398881],
 [9436,
  9492,
  9498,
  9497,
  9560,
  9575,
  9515,
  9581,
  9585,
  9551,
  9455,
  9544,
  9589,
  9609,
  9594,
  9606,
  9550,
  9580,
  9559,
  9576,
  9567,
  9606,
  9613,
  9572,
  9587,
  9604,
  9601,
  9592,
  9577,
  9615],
 [0.47576089781557829,
  0.44817019754469789,
  0.4347930243229165,
  0.455

# TP3

Install Theano: http://stackoverflow.com/questions/33687103/how-to-install-theano-on-anaconda-python-2-7-x64-on-windows

In [1]:
import network3
from network3 import Network
from network3 import ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer
training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10
net = Network([
        FullyConnectedLayer(n_in=784, n_out=100),
        SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, 
            validation_data, test_data)

  "downsample module has been moved to the theano.tensor.signal.pool module.")


Running with a CPU.  If this is not desired, then the modify network3.py to set
the GPU flag to True.
Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 92.36%
This is the best validation accuracy to date.
The corresponding test accuracy is 91.72%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 94.43%
This is the best validation accuracy to date.
The corresponding test accuracy is 93.80%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 95.56%
This is the best validation accuracy to date.
The corresponding test accuracy is 94.93%
Training mini-batch number 15000
Training mi
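
The log above reports, for each epoch, whether the validation accuracy is the best so far and the test accuracy at that point. A minimal sketch of that bookkeeping, using the three epochs printed above; `track_best` is a hypothetical helper, and treating ties as a new best is an assumption.

```python
def track_best(validation_accuracies, test_accuracies):
    """Return (epoch, validation accuracy, test accuracy) at the best epoch.

    The epoch is chosen on validation accuracy only; the test accuracy is
    just read off at that epoch, so the test set never guides the choice.
    """
    best_val, best_epoch, test_at_best = 0.0, None, None
    pairs = zip(validation_accuracies, test_accuracies)
    for epoch, (val, test) in enumerate(pairs):
        if val >= best_val:
            best_val, best_epoch, test_at_best = val, epoch, test
    return best_epoch, best_val, test_at_best


validation = [92.36, 94.43, 95.56]  # percentages from the log above
test = [91.72, 93.80, 94.93]
epoch, val, tst = track_best(validation, test)
print("best epoch %d: validation %.2f%%, test %.2f%%" % (epoch, val, tst))
```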

In [5]:
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28), 
                      filter_shape=(20, 1, 5, 5), 
                      poolsize=(2, 2)),
        ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12), 
                      filter_shape=(40, 20, 5, 5), 
                      poolsize=(2, 2)),
        FullyConnectedLayer(n_in=40*4*4, n_out=100),
        SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.1, 
            validation_data, test_data)  

Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 87.34%
This is the best validation accuracy to date.
The corresponding test accuracy is 86.37%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 96.62%
This is the best validation accuracy to date.
The corresponding test accuracy is 96.24%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 97.62%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.15%
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-

KeyboardInterrupt: 
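
The shapes in the cell above (28 × 28 input, 12 × 12 input to the second layer, `40*4*4` inputs to the fully connected layer) follow from simple arithmetic. The sketch below reproduces them, assuming a stride-1 convolution without padding followed by non-overlapping pooling; `conv_pool_output` is a helper name introduced for illustration.

```python
def conv_pool_output(size, filter_size, pool_size):
    """Side length of a feature map after one conv + pool layer.

    Assumes a stride-1 convolution with no padding, then non-overlapping
    pooling over pool_size x pool_size regions.
    """
    conv = size - filter_size + 1
    return conv // pool_size


size = 28
for layer, n_maps in enumerate((20, 40), start=1):
    size = conv_pool_output(size, filter_size=5, pool_size=2)
    print("after layer %d: %d feature maps of %d x %d" % (layer, n_maps, size, size))

n_in = 40 * size * size
print("inputs to the fully connected layer: %d" % n_in)
```

The first layer maps 28 to 24 and then to 12 after pooling; the second maps 12 to 8 and then to 4, which gives the 40 × 4 × 4 = 640 inputs used by `FullyConnectedLayer`.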

In [7]:
from network3 import ReLU
net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28), 
                      filter_shape=(20, 1, 5, 5), 
                      poolsize=(2, 2), 
                      activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12), 
                      filter_shape=(40, 20, 5, 5), 
                      poolsize=(2, 2), 
                      activation_fn=ReLU),
        FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=ReLU),
        SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
net.SGD(training_data, 60, mini_batch_size, 0.03, 
            validation_data, test_data, lmbda=0.1)

Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 97.74%
This is the best validation accuracy to date.
The corresponding test accuracy is 97.67%
Training mini-batch number 5000
Training mini-batch number 6000
Training mini-batch number 7000
Training mini-batch number 8000
Training mini-batch number 9000
Epoch 1: validation accuracy 98.05%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.09%
Training mini-batch number 10000
Training mini-batch number 11000
Training mini-batch number 12000
Training mini-batch number 13000
Training mini-batch number 14000
Epoch 2: validation accuracy 98.18%
This is the best validation accuracy to date.
The corresponding test accuracy is 98.13%
Training mini-batch number 15000
Training mini-batch number 16000
Training mini-batch number 17000
Training mini-batch number 18000
Training mini-
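
The last run replaces the default activation with `ReLU`. As a minimal sketch of the difference, assuming the usual definitions σ(z) = 1/(1 + e^(−z)) and ReLU(z) = max(0, z), the code below compares the slopes of the two functions. These are plain-Python stand-ins for illustration, not the Theano functions used in `network3.py`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def relu(z):
    return max(0.0, z)


def relu_prime(z):
    # Slope is 1 for positive input and 0 otherwise.
    return 1.0 if z > 0 else 0.0


for z in (-5.0, 0.5, 5.0):
    print("z = %5.1f  sigmoid' = %.4f  relu' = %.1f"
          % (z, sigmoid_prime(z), relu_prime(z)))
```

For large positive z the sigmoid's slope is close to zero while the ReLU's stays at 1, so a strongly activated ReLU neuron does not suffer the same learning slowdown.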