Exploring-Knowledge-Distillation-in-Computer-Vision-for-Classification-Problem

I have previously worked with this road classification dataset, and you can find it in my repository. I came across knowledge distillation in a challenge and decided to explore it on a simple classification problem like this one. Before going further, let me explain knowledge distillation.

Knowledge distillation is the process of transferring knowledge from a large model to a smaller model. The large model is usually referred to as the teacher model, while the smaller model is referred to as the student model. The purpose of knowledge distillation is to obtain a smaller model that can be deployed on constrained devices such as mobile phones. Large models are often complex and demand a lot of memory and compute, whereas the smaller model has a lower footprint and runs faster while achieving performance close to that of the large model.
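As a rough illustration of the soft-target part of this transfer (not the exact loss used in this repository), the student can be trained to match the teacher's temperature-softened output distribution with a KL-divergence term; the temperature value of 3 below is an assumed example:

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=3.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    soft_teacher = tf.nn.softmax(teacher_logits / temperature, axis=-1)
    soft_student = tf.nn.softmax(student_logits / temperature, axis=-1)
    kl = tf.keras.losses.KLDivergence()(soft_teacher, soft_student)
    # Scale by T^2 so this term keeps a comparable magnitude to the hard-label loss.
    return kl * temperature ** 2
```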

The teacher model is trained first to reach high accuracy (low error), and then the student model (which can be shallow or deep) is trained with the teacher's guidance. The Distiller class in this file implements the knowledge distillation process. I used image augmentation to improve the accuracy of all models, and I also used pretrained models in the last phase; a minimal sketch of the distillation loop is shown below.
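The sketch below follows the standard TF 2.x Keras pattern for a Distiller (a frozen teacher guiding a trainable student inside a custom `train_step`). It is an illustration of the technique, not a copy of the class in this repository; the `alpha` and `temperature` defaults are assumptions.

```python
import tensorflow as tf

class Distiller(tf.keras.Model):
    """Wraps a frozen teacher and a trainable student for distillation training."""

    def __init__(self, student, teacher):
        super().__init__()
        self.teacher = teacher
        self.student = student

    def compile(self, optimizer, metrics, student_loss_fn,
                distillation_loss_fn, alpha=0.1, temperature=3.0):
        super().compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature

    def train_step(self, data):
        x, y = data
        # The teacher runs in inference mode; only the student is updated.
        teacher_pred = self.teacher(x, training=False)
        with tf.GradientTape() as tape:
            student_pred = self.student(x, training=True)
            student_loss = self.student_loss_fn(y, student_pred)
            distill_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_pred / self.temperature, axis=1),
                tf.nn.softmax(student_pred / self.temperature, axis=1),
            )
            # Blend the hard-label loss and the soft-target loss.
            loss = self.alpha * student_loss + (1 - self.alpha) * distill_loss
        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        self.compiled_metrics.update_state(y, student_pred)
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss,
                        "distillation_loss": distill_loss})
        return results

    def test_step(self, data):
        x, y = data
        student_pred = self.student(x, training=False)
        student_loss = self.student_loss_fn(y, student_pred)
        self.compiled_metrics.update_state(y, student_pred)
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss})
        return results
```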

(image: results table showing the accuracy of each model)

The table above shows the result of each model. I realized that it is always a good idea to train the teacher well before distilling its knowledge to a student. As shown, the teacher model performed poorly, which in turn hurt the distiller's result. Good practice is to train for more epochs, restructure the architecture, or do some hyperparameter tuning to ensure the teacher is well trained.

The teacher was then redesigned with a pretrained model (MobileNetV2) and its knowledge was distilled to the student. Although the accuracy of the distiller built on the pretrained teacher is a bit lower than the accuracy of the teacher itself, the ultimate goal is to reduce the size of the teacher for deployment.
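A minimal sketch of how such a pretrained teacher could be assembled is shown below. The number of classes, input size, and frozen backbone are assumptions for illustration and are not taken from this repository.

```python
import tensorflow as tf

NUM_CLASSES = 3            # assumed number of road classes
IMG_SHAPE = (224, 224, 3)  # assumed input size

# Pretrained MobileNetV2 backbone as the (stronger) teacher.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights="imagenet")
base.trainable = False  # freeze the backbone; only the new head is trained

teacher = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES),  # logits; softmax is applied in the loss
])

teacher.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

Once trained, this teacher would be passed to the Distiller together with a small student CNN, which is the model that ultimately gets deployed.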
