In [1]:
import model
import data
import tensorflow as tf
import numpy as np
import random

Suppress TensorFlow's verbose warning messages by raising the logging threshold to `ERROR`:

In [2]:
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)

Load the MNIST dataset via a `Dataset` object:

In [3]:
dataset = data.Dataset(batch_size=128)

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
55000


## Main Model Training

- ### Teacher Model

Training parameters:

In [5]:
learning_rate = 0.001
num_steps = 500
batch_size = 128

Model parameters:

In [6]:
temperature = 1.0
dropout = 0.75
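
In distillation, a temperature conventionally divides the logits before the softmax, so larger values give softer probability distributions. The sketch below illustrates that convention in plain Python. Whether `model.BigModel` applies `temperature` in exactly this way is an assumption, since the `model` module is not shown here. The helper name `softmax_with_temperature` and the logit values are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits, for illustration only.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=5.0)
print(sharp)
print(soft)
```

With `temperature = 1.0`, as set above, this reduces to the ordinary softmax. A higher temperature lowers the largest probability and raises the smaller ones.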

Checkpoint directory:

In [7]:
checkpoint_dir="teachercpt"

Model definition:

In [8]:
teacher_model = model.BigModel(num_steps=num_steps, 
                               batch_size=batch_size,
                               learning_rate=learning_rate,
                               temperature=temperature,
                               dropoutprob=dropout,
                               checkpoint_dir=checkpoint_dir,
                               model_type="teacher")

Training:

At each step, the validation accuracy is computed; whenever it reaches a new maximum, a model checkpoint is saved. Keeping only the best-scoring checkpoint is, in a way, analogous to **early stopping**.

In [9]:
teacher_model.start_session()
teacher_model.train(dataset)

Starting Training
Model Checkpointed to teachercpt\bigmodel.ckpt 
Step 1, Validation Loss= 52087.5781, Validation Accuracy= 0.095
Model Checkpointed to teachercpt\bigmodel.ckpt 
Step 100, Validation Loss= 2258.8335, Validation Accuracy= 0.886
Model Checkpointed to teachercpt\bigmodel.ckpt 
Step 200, Validation Loss= 1271.8615, Validation Accuracy= 0.928
Model Checkpointed to teachercpt\bigmodel.ckpt 
Step 300, Validation Loss= 917.8629, Validation Accuracy= 0.939
Model Checkpointed to teachercpt\bigmodel.ckpt 
Step 400, Validation Loss= 763.1020, Validation Accuracy= 0.947
Model Checkpointed to teachercpt\bigmodel.ckpt 
Step 500, Validation Loss= 618.6000, Validation Accuracy= 0.950
Model Checkpointed to teachercpt\bigmodel.ckpt 
Optimization Finished!
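
The keep-the-best checkpointing rule described above can be sketched independently of TensorFlow. This is a minimal illustration, not the code inside `model.BigModel.train`. The function `track_best_checkpoints`, the `save_checkpoint` callback, and the accuracy values are all hypothetical.

```python
def track_best_checkpoints(validation_accuracies, save_checkpoint):
    """Call save_checkpoint(step) whenever validation accuracy hits a new maximum."""
    best_accuracy = float("-inf")
    for step, accuracy in enumerate(validation_accuracies, start=1):
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            save_checkpoint(step)
    return best_accuracy

# Made-up validation accuracies with two dips (steps 3 and 5).
saved_steps = []
history = [0.10, 0.40, 0.35, 0.55, 0.50]
best = track_best_checkpoints(history, saved_steps.append)
print(saved_steps, best)
```

Because a checkpoint is only overwritten on improvement, the file on disk always holds the parameters with the best validation accuracy seen so far, even if later steps degrade.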


Test the **teacher model** (compute its accuracy against the testing dataset) using the best model according to the validation set, that is, the *checkpointed model*.

In [10]:
# Load the best model from created checkpoint
teacher_model.load_model_from_file(checkpoint_dir)
# Test the model against the testing set
teacher_model.run_inference(dataset)

Reading model parameters from teachercpt\bigmodel.ckpt
Testing Accuracy: 0.948645
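
For reference, classification accuracy is the fraction of examples whose highest-scoring class matches the true label. The sketch below shows that computation in plain Python. It assumes integer class labels and is not the implementation behind `run_inference`. The function name and the scores are hypothetical.

```python
def accuracy(predicted_scores, labels):
    """Fraction of examples whose arg-max score index equals the integer label."""
    if not labels or len(predicted_scores) != len(labels):
        raise ValueError("need equally sized, non-empty inputs")
    correct = 0
    for scores, label in zip(predicted_scores, labels):
        # Index of the largest score is the predicted class.
        predicted_class = max(range(len(scores)), key=scores.__getitem__)
        if predicted_class == label:
            correct += 1
    return correct / len(labels)

# Made-up two-class scores: predictions are 1, 0, 1, 0.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(accuracy(scores, labels))
```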


In [11]:
# Close current tf session
teacher_model.close_session()

- ### Simple Student Model
Simple in the sense that it is trained directly on the same data and with the same parameters as the teacher model, without any distillation.

Checkpoint directory:

In [12]:
checkpoint_dir="studentcpt"

Model definition:

In [13]:
student_model = model.SmallModel(num_steps=num_steps, 
                                 batch_size=batch_size,
                                 learning_rate=learning_rate,
                                 temperature=temperature,
                                 dropoutprob=dropout,
                                 model_type="student")

Training:

At each step, the validation accuracy is computed; whenever it reaches a new maximum, a model checkpoint is saved. Keeping only the best-scoring checkpoint is, in a way, analogous to **early stopping**.

In [14]:
student_model.start_session()
student_model.train(dataset)

Starting Training
Model Checkpointed to checkpoint\smallmodel 
Step 1, Validation Loss= 13.8730, Validation Accuracy= 0.080
Model Checkpointed to checkpoint\smallmodel 
Step 100, Validation Loss= 8.1598, Validation Accuracy= 0.158
Model Checkpointed to checkpoint\smallmodel 
Step 200, Validation Loss= 5.4582, Validation Accuracy= 0.272
Model Checkpointed to checkpoint\smallmodel 
Step 300, Validation Loss= 3.9543, Validation Accuracy= 0.400
Model Checkpointed to checkpoint\smallmodel 
Step 400, Validation Loss= 3.1070, Validation Accuracy= 0.496
Model Checkpointed to checkpoint\smallmodel 
Step 500, Validation Loss= 2.5562, Validation Accuracy= 0.560
Model Checkpointed to checkpoint\smallmodel 
Step 500, Validation Loss= 2.5562, Validation Accuracy= 0.560
Optimization Finished!


Test the **student model** (compute its accuracy against the testing dataset) using the best model according to the validation set, that is, the *checkpointed model*.

In [15]:
# Load the best model from created checkpoint
student_model.load_model_from_file(checkpoint_dir)
# Test the model against the testing set
student_model.run_inference(dataset)

Reading model parameters from studentcpt\smallmodel
Testing Accuracy: 0.636691


In [16]:
# Close current tf session
student_model.close_session()

- ### Distilled Student Model
The training targets are the **logits** that the Teacher Model predicts for the standard training set.
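
One common way to train a student on teacher logits is to minimise the cross-entropy between the teacher's softened distribution and the student's softened distribution. The sketch below illustrates that loss in plain Python. It is an assumption about the general technique, not the loss actually used by `model.SmallModel`. The function names and the logit values are hypothetical.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log(softmax(logits / temperature))."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - peak - log_total for z in scaled]

def soft_target_cross_entropy(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's softened output against the teacher's."""
    teacher_probs = [math.exp(v) for v in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))

# Made-up logits: one student agrees with the teacher, the other does not.
teacher = [5.0, 1.0, -2.0]
agreeing_student = [5.0, 1.0, -2.0]
disagreeing_student = [-2.0, 1.0, 5.0]
loss_agree = soft_target_cross_entropy(teacher, agreeing_student)
loss_disagree = soft_target_cross_entropy(teacher, disagreeing_student)
print(loss_agree, loss_disagree)
```

The loss is smallest when the student reproduces the teacher's distribution, which is what pushes the student toward the teacher's behaviour rather than only toward the hard labels.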

Pretrained **teacher model** loading:

In [7]:
# Model definition
teacher_model = model.BigModel(num_steps=num_steps, 
                               batch_size=batch_size,
                               learning_rate=learning_rate,
                               temperature=temperature,
                               dropoutprob=dropout,
                               checkpoint_dir="teachercpt",
                               model_type="teacher")
# Start tf session
teacher_model.start_session()

In [7]:
# Load best model from teacher checkpoint
checkpoint_dir = "teachercpt"
teacher_model.load_model_from_file(checkpoint_dir)

Reading model parameters from teachercpt\bigmodel.ckpt


Verify **teacher** model state before training **student**:

In [8]:
# Test the model against the testing set
teacher_model.run_inference(dataset)

Testing Accuracy: 0.948791


Student model definition:

In [9]:
student_model = model.SmallModel(num_steps=num_steps, 
                                 batch_size=batch_size,
                                 learning_rate=learning_rate,
                                 temperature=temperature,
                                 dropoutprob=dropout,
                                 model_type="student")

Training:

At each step, the validation accuracy is computed; whenever it reaches a new maximum, a model checkpoint is saved. Keeping only the best-scoring checkpoint is, in a way, analogous to **early stopping**.

In [10]:
student_model.start_session()
student_model.train(dataset, teacher_model)

Starting Training
Model Checkpointed to checkpoint\smallmodel 
Step 1, Validation Loss= 13.8398, Validation Accuracy= 0.090
Model Checkpointed to checkpoint\smallmodel 
Step 100, Validation Loss= 7.8430, Validation Accuracy= 0.188
Model Checkpointed to checkpoint\smallmodel 
Step 200, Validation Loss= 5.3653, Validation Accuracy= 0.300
Model Checkpointed to checkpoint\smallmodel 
Step 300, Validation Loss= 3.8572, Validation Accuracy= 0.417
Model Checkpointed to checkpoint\smallmodel 
Step 400, Validation Loss= 2.9456, Validation Accuracy= 0.500
Model Checkpointed to checkpoint\smallmodel 
Step 500, Validation Loss= 2.3688, Validation Accuracy= 0.570
Model Checkpointed to checkpoint\smallmodel 
Step 500, Validation Loss= 2.3688, Validation Accuracy= 0.570
Optimization Finished!


Test the **distilled student model** (compute its accuracy against the testing dataset) using the best model according to the validation set, that is, the *checkpointed model*.

In [11]:
checkpoint_dir = "studentcpt"
# Load the best model from created checkpoint
student_model.load_model_from_file(checkpoint_dir)
# Test the model against the testing set
student_model.run_inference(dataset)

Reading model parameters from studentcpt\smallmodel
Testing Accuracy: 0.636673


In [12]:
# Close current tf sessions
teacher_model.close_session()
student_model.close_session()

## Experiments

- ### 1. Learn from Probabilities
Remove one class from the Distilled Model's training set, then test the accuracy on that class.
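
A minimal sketch of the data split this experiment needs: drop every example of one class from the training data, and select only that class from the test data. It assumes integer class labels, which may differ from the label encoding that `data.Dataset` uses. The helpers `remove_class` and `select_class` and the sample data are hypothetical.

```python
def remove_class(examples, labels, held_out_class):
    """Return (examples, labels) without any example of held_out_class."""
    kept = [(x, y) for x, y in zip(examples, labels) if y != held_out_class]
    return [x for x, _ in kept], [y for _, y in kept]

def select_class(examples, labels, target_class):
    """Return (examples, labels) containing only examples of target_class."""
    kept = [(x, y) for x, y in zip(examples, labels) if y == target_class]
    return [x for x, _ in kept], [y for _, y in kept]

# Made-up stand-ins for images and their digit labels.
examples = ["img_a", "img_b", "img_c", "img_d"]
labels = [3, 7, 3, 1]
train_x, train_y = remove_class(examples, labels, held_out_class=3)
test_x, test_y = select_class(examples, labels, target_class=3)
print(train_x, train_y)
print(test_x, test_y)
```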