Dark Knowledge

This repository contains my work on replicating Geoffrey Hinton's work on distilling the knowledge of a big ensemble network into a smaller neural network, so that the smaller network performs nearly as well as the ensemble while accelerating inference.

Key points

  1. To replicate the work, I created a custom optimizer that you can find here. This optimizer makes sure that the norm of the weights going into each individual neuron does not exceed a certain threshold; a minimal sketch of the idea appears after this list.
  2. The main results of this work are as follows:
    1. The misclassification error of the big ensemble network is 101.
    2. The misclassification error of the smaller network is 196.
    3. The misclassification error of the smaller network trained on the probabilities of the ensemble network is 134.
  3. Using the code, you can also learn how to use multiple TensorFlow graphs within one Python file. I created separate TensorFlow graphs for the ensemble model and the distilled model, and then fed the probabilities generated by the ensemble model to the distilled model during training; see the second sketch below.
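For point 1, here is a minimal sketch of the idea behind the constraint. It is not the repository's actual optimizer; the threshold value and function names below are illustrative. After each gradient step, the weight vector feeding into every neuron is projected back so that its norm stays under the threshold.

```python
import tensorflow as tf

MAX_NORM = 3.5  # illustrative threshold, not necessarily the value used in the repository

def constrained_train_op(loss, weight_matrices, learning_rate=0.01, max_norm=MAX_NORM):
    """Gradient-descent step followed by a per-neuron max-norm projection.

    Each entry of `weight_matrices` is a [fan_in, fan_out] variable; column j
    holds the weights going into neuron j, so the norm is clipped along axis 0.
    """
    step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    with tf.control_dependencies([step]):
        # After the update, rescale any column whose norm exceeds max_norm.
        clip_ops = [
            tf.assign(w, tf.clip_by_norm(w, max_norm, axes=[0]))
            for w in weight_matrices
        ]
    return tf.group(*clip_ops)
```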

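For point 3, here is a sketch of the two-graph pattern. Layer sizes and variable names are placeholders; the actual models live in model.py and distill_knowledge.py.

```python
import tensorflow as tf

# Each model gets its own graph and its own session.
ensemble_graph = tf.Graph()
with ensemble_graph.as_default():
    ens_x = tf.placeholder(tf.float32, [None, 784], name="x")
    ens_probs = tf.nn.softmax(tf.layers.dense(ens_x, 10))  # stand-in for the ensemble
    ens_init = tf.global_variables_initializer()

distill_graph = tf.Graph()
with distill_graph.as_default():
    dis_x = tf.placeholder(tf.float32, [None, 784], name="x")
    soft_targets = tf.placeholder(tf.float32, [None, 10], name="soft_targets")
    dis_logits = tf.layers.dense(dis_x, 10)  # stand-in for the small network
    dis_init = tf.global_variables_initializer()

ensemble_sess = tf.Session(graph=ensemble_graph)
distill_sess = tf.Session(graph=distill_graph)
ensemble_sess.run(ens_init)
distill_sess.run(dis_init)

# Training-loop pattern: run the ensemble graph to get probabilities,
# then feed them into the distill graph as soft targets, e.g.
# probs = ensemble_sess.run(ens_probs, feed_dict={ens_x: batch_x})
# distill_sess.run(train_op, feed_dict={dis_x: batch_x, soft_targets: probs})
```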
Use

We first need to train the ensemble model and save it. We can achieve this by running the following command:

python model.py -n ensemble

We can then train a distilled model on the probabilities of this ensemble model with the following command:

python distill_knowledge.py
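Under the hood, the distilled model is fit to the ensemble's output distribution. Below is a minimal sketch of such a soft-target loss; the temperature and any weighting against a hard-label loss used in distill_knowledge.py may differ.

```python
import tensorflow as tf

TEMPERATURE = 2.0  # illustrative temperature, not necessarily the repository's value

def soft_target_loss(student_logits, ensemble_probs, temperature=TEMPERATURE):
    """Cross-entropy between the student's softened predictions and the
    ensemble's output probabilities (the soft targets)."""
    soft_log_probs = tf.nn.log_softmax(student_logits / temperature)
    return -tf.reduce_mean(tf.reduce_sum(ensemble_probs * soft_log_probs, axis=1))
```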

Conclusion

The model trained on the ensemble's probabilities clearly gives better results than the model trained only on the hard labels (a misclassification error of 134 versus 196). However, I was not able to exactly replicate the results reported in the paper.
