
About Knowledge Distillation #102

Closed
RgZhangLihao opened this issue Dec 11, 2018 · 3 comments

@RgZhangLihao

RgZhangLihao commented Dec 11, 2018

I've read the Q&A in #90, and I want to train a student model (preact_resnet20_cifar) from a preact_resnet44_cifar teacher. Here is the command line I used to train the teacher model:
python compress_classifier.py -a preact_resnet44_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0
The KD command line:
python compress_classifier.py -a preact_resnet20_cifar --lr 0.1 -p 50 -b 128 ../../../data.cifar10 -j 1 --epochs 200 --compress=../quantization/preact_resnet_cifar_dorefa.yaml --wd=0.0002 --vs=0 --gpus 0 --kd-teacher preact_resnet44_cifar --kd-resume logs/2018.12.11-130318/checkpoint.pth.tar --kd-temp 5.0 --kd-dw 0.7 --kd-sw 0.3
I got this error message:
```
==> using cifar10 dataset
=> creating preact_resnet44_cifar model for CIFAR10
=> loading checkpoint logs/2018.12.11-130318/checkpoint.pth.tar
Checkpoint keys:
epoch
arch
state_dict
best_top1
optimizer
compression_sched
quantizer_metadata
best top@1: 48.000
Loaded compression schedule from checkpoint (epoch 2)
Loaded quantizer metadata from the checkpoint

Traceback (most recent call last):
  File "compress_classifier.py", line 784, in <module>
    main()
  File "compress_classifier.py", line 359, in main
    teacher, _, _ = apputils.load_checkpoint(teacher, chkpt_file=args.kd_resume)
  File "/home/share/distiller/apputils/checkpoint.py", line 116, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'
```
I don't know how this could happen. My other question is: must the teacher model be deeper than the student model?
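For reference, my understanding of how --kd-temp, --kd-dw and --kd-sw combine is the standard Hinton-style distillation loss: a temperature-softened KL term between teacher and student outputs weighted by the distillation weight, plus the ordinary cross-entropy on the hard labels weighted by the student weight. A rough sketch below (my own reading, not necessarily Distiller's exact implementation):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=5.0, dw=0.7, sw=0.3):
    # T corresponds to --kd-temp, dw to --kd-dw, sw to --kd-sw.
    # Soften both distributions with the temperature; the T*T factor keeps
    # gradient magnitudes comparable across temperatures.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(log_soft_student, soft_teacher, reduction='batchmean') * (T * T)

    # Ordinary cross-entropy of the student against the hard labels.
    student_ce = F.cross_entropy(student_logits, targets)

    return dw * distill + sw * student_ce

# Example usage with random tensors standing in for a CIFAR-10 batch.
s = torch.randn(128, 10)   # student logits
t = torch.randn(128, 10)   # teacher logits
y = torch.randint(0, 10, (128,))
loss = kd_loss(s, t, y)
```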

@guyjacob
Contributor

It looks like you trained the teacher model with DoReFa quantization. Resuming from quantization-aware training is still an open issue, but there is a workaround you can apply - see here (note that in the linked example WRPNQuantizer is modified; you'll need to modify DorefaQuantizer instead).
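Roughly, the workaround boils down to making the quantizer constructible without a live optimizer when it is rebuilt from the checkpoint's quantizer metadata. Here is a standalone sketch of the pattern only (ToyQuantizer and rebuild_from_metadata are made-up names that merely mirror what apputils.load_checkpoint does; the actual edit would go into DorefaQuantizer, whose real signature may differ):

```python
import torch.nn as nn


class ToyQuantizer:
    # The essence of the workaround: 'optimizer' gets a None default, so the
    # quantizer can be reconstructed from checkpoint metadata (which stores
    # no optimizer), e.g. when the model is only loaded as a frozen teacher
    # for knowledge distillation.
    def __init__(self, model, optimizer=None, bits_weights=8):
        self.model = model
        self.optimizer = optimizer
        self.bits_weights = bits_weights


def rebuild_from_metadata(model, qmd):
    # Mirrors the failing line in apputils/checkpoint.py: the params dict
    # comes from the checkpoint and contains no 'optimizer' entry.
    return qmd['type'](model, **qmd['params'])


model = nn.Linear(4, 2)
qmd = {'type': ToyQuantizer, 'params': {'bits_weights': 8}}
quantizer = rebuild_from_metadata(model, qmd)  # works: optimizer defaults to None
```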

But - is there a specific reason you'd want a quantized model as the teacher? An FP32 baseline would be at least as good.

Regarding your question on teacher/student depth - there are no requirements on the teacher model. Having said that, the purpose of knowledge distillation is to use a model with larger representational capacity to help train a model with smaller capacity, so using a shallower teacher goes against that purpose.
If you do use a shallower teacher, I guess it might help in the early stages of training to "point" the student in the right direction. But at some point you'd expect the deeper student model to surpass the performance of the shallower teacher model, at which point it doesn't make sense to continue with the distillation. In any case, I haven't tried it myself, and assuming you don't have any specific restrictions, I don't see a reason to do it.

@RgZhangLihao
Author

@guyjacob
Hi, your answer really helps. I used FP32 to train resnet56 as the teacher and resnet20 as the student, and it works.
Thank you ^.^

@guyjacob
Contributor

You're welcome!
