# Knowledge-Distillation

Implementation of the paper [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) (Hinton, Vinyals, and Dean, 2015).

## Results on MNIST

Test set size = 10,000

| Model | Accuracy |
| --- | --- |
| Teacher model | 0.9847 |
| Distilled student model | 0.9810 |

Compression ratio = 2
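
## Distillation loss (sketch)

For reference, below is a minimal sketch of the objective the paper describes: the student is trained on a weighted sum of (a) cross-entropy against the hard labels and (b) KL divergence between the teacher's and student's temperature-softened softmax outputs, with the soft term scaled by T² as the paper recommends. This is an illustrative PyTorch version, not necessarily this repo's actual code; the function name and the default values of `T` and `alpha` are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Hypothetical helper: T and alpha are illustrative defaults, not this
    # repo's actual hyperparameters.
    # Soft targets: teacher softmax at temperature T (Hinton et al., 2015).
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # The KL term is scaled by T**2 so its gradients stay on the same scale
    # as the hard-label term when the temperature changes.
    kd_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

In a training loop the teacher's logits would typically be computed under `torch.no_grad()` so that only the student's parameters are updated.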