
About Knowledge Distillation #102

Closed
RgZhangLihao opened this issue Dec 11, 2018 · 3 comments

@RgZhangLihao

RgZhangLihao commented Dec 11, 2018

I've read the Q&A in #90, and I want to train a student model (preact_resnet20_cifar) from a preact_resnet44_cifar teacher. Here is the command line I used to train the teacher model:
python compress_classifier.py -a preact_resnet44_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0
The KD command line:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 --kd-teacher preact_resnet44_cifar --kd-resume logs/2018.12.11-130318/checkpoint.pth.tar --kd-temp 5.0 --kd-dw 0.7 --kd-sw 0.3
I got this error message:
```
==> using cifar10 dataset
=> creating preact_resnet44_cifar model for CIFAR10
=> loading checkpoint logs/2018.12.11-130318/checkpoint.pth.tar
Checkpoint keys:
epoch
arch
state_dict
best_top1
optimizer
compression_sched
quantizer_metadata
best top@1: 48.000
Loaded compression schedule from checkpoint (epoch 2)
Loaded quantizer metadata from the checkpoint

Traceback (most recent call last):
  File "compress_classifier.py", line 784, in <module>
    main()
  File "compress_classifier.py", line 359, in main
    teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)
  File "/home/share/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'
```
I don't know how this could happen. My other question is: must the teacher model be deeper than the student model?
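For reference, my understanding of how --kd-temp, --kd-dw and --kd-sw combine is the standard Hinton-style distillation loss: a temperature-softened KL term between teacher and student outputs weighted by the distillation weight, plus the ordinary cross-entropy on the hard labels weighted by the student weight. A rough sketch below (my own reading, not necessarily Distiller's exact implementation):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=5.0, dw=0.7, sw=0.3):
    # T corresponds to --kd-temp, dw to --kd-dw, sw to --kd-sw.
    # Soften both distributions with the temperature; the T*T factor keeps
    # gradient magnitudes comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(log_soft_student, soft_teacher, reduction='batchmean') * (T * T)

    # Ordinary cross-entropy of the student against the hard labels.
    student_ce = F.cross_entropy(student_logits, targets)

    return dw * distill + sw * student_ce

# Example usage with random tensors standing in for a CIFAR-10 batch.
s = torch.randn(128, 10)   # student logits
t = torch.randn(128, 10)   # teacher logits
y = torch.randint(0, 10, (128,))
loss = kd_loss(s, t, y)
```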

@guyjacob
Contributor

It looks like you trained the teacher model with DoReFa quantization. Resuming from quantization-aware training is still an open issue, but there is a workaround you can apply - see here (note that in the linked example WRPNQuantizer is modified; you'll need to modify DorefaQuantizer instead).
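Roughly, the workaround boils down to making the quantizer constructible without a live optimizer when it is rebuilt from the checkpoint's quantizer metadata. Here is a standalone sketch of the pattern only (ToyQuantizer and rebuild_from_metadata are made-up names that merely mirror what apputils.load_checkpoint does; the actual edit would go into DorefaQuantizer, whose real signature may differ):

```python
import torch.nn as nn


class ToyQuantizer:
    # The essence of the workaround: 'optimizer' gets a None default, so the
    # quantizer can be reconstructed from checkpoint metadata (which stores
    # no optimizer), e.g. when the model is only loaded as a frozen teacher
    # for knowledge distillation.
    def __init__(self, model, optimizer=None, bits_weights=8):
        self.model = model
        self.optimizer = optimizer
        self.bits_weights = bits_weights


def rebuild_from_metadata(model, qmd):
    # Mirrors the failing line in apputils/checkpoint.py: the params dict
    # comes from the checkpoint and contains no 'optimizer' entry.
    return qmd['type'](model, **qmd['params'])


model = nn.Linear(4, 2)
qmd = {'type': ToyQuantizer, 'params': {'bits_weights': 8}}
quantizer = rebuild_from_metadata(model, qmd)  # works: optimizer defaults to None
```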

But - is there a specific reason you'd want a quantized model as the teacher? An FP32 baseline would be at least as good.

Regarding your question on teacher/student depth - there are no requirements on the teacher model. Having said that, the purpose of knowledge distillation is to use a model with larger representational capacity to help train a model with smaller capacity, so using a shallower teacher goes against that purpose.
If you do use a shallower teacher, I guess it might help in the early stages of training to "point" the student in the right direction. But at some point you'd expect the deeper student model to surpass the performance of the shallower teacher model, at which point it doesn't make sense to continue with the distillation. In any case, I haven't tried it myself, and assuming you don't have any specific restrictions, I don't see a reason to do it.

@RgZhangLihao
Author

@guyjacob
Hi, your answer really helps. I used FP32 to train resnet56 as the teacher and resnet20 as the student, and it works.
Thank you ^.^

@guyjacob
Contributor

You're welcome!
