This repository has been archived by the owner on May 1, 2023. It is now read-only.

how to add middle layers' activation loss functions? #340

Closed

brisker opened this issue Jul 31, 2019 · 12 comments
Labels
quantization The issue is related to quantization

Comments

@brisker

brisker commented Jul 31, 2019

If I want to train DoReFa-Net quantization on AlexNet, is the command line like this?
python compress_classifier.py -a alexnet /ImageNet_Share/ --compress=/distiller/examples/quantization/quant_aware_train/alexnet_bn_dorefa.yaml
Do we need to modify the code in compress_classifier.py at all to run this training?

@levzlotnik
Contributor

Hi @brisker ,

Sorry for the late response.
You're correct, this yaml file is the configuration for the compress_classifier.py script, and the command line would be exactly as you've specified.

Let us know if you have more questions.

Cheers,
Lev

@brisker
Author

brisker commented Aug 5, 2019

@levzlotnik
If we apply post-training quantization using /distiller-master/examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml, the code reads all of the activation statistics such as avg-max, abs-min, etc. But I cannot find where they are actually used - is it in range_linear.py? Can you tell me where the data from examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml gets used?

@nzmora
Contributor

nzmora commented Aug 6, 2019

Hi @brisker,

You can see example invocations using resnet50_quant_stats.yaml here. You can use resnet18_quant_stats.yaml in a similar fashion.

Cheers
Neta
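
For orientation, an invocation of this kind would look roughly like the line below. This is an illustrative, hedged example rather than a quote from the linked page; verify the flags (in particular --quantize-eval and --qe-stats-file) against the linked command-line documentation and your Distiller version:

python compress_classifier.py -a resnet18 --pretrained <path_to_imagenet> --evaluate --quantize-eval --qe-stats-file examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml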

@nzmora added the quantization label Aug 6, 2019
@brisker
Author

brisker commented Aug 7, 2019

@levzlotnik
@nzmora
If I want to quantize the input and output of the MaxPooling layer in DoReFa-Net, do I have to write a new nn.Module named DorefaMaxPooling?

@guyjacob
Contributor

guyjacob commented Aug 7, 2019

Hi @brisker,

(Note that I edited your last comment to remove irrelevant people you tagged)

To go back to your original question, there could be some confusion I'd like to clear up. If you notice, the yaml file is named "alexnet_bn". That is to say, the intention is to run this not on the original AlexNet, but on a modified AlexNet with batch norm layers. We have it implemented here.
The reason for this is that the implementation from the DoReFa authors used this model - see here.
So in the command line, you should actually use -a alexnet_bn. (There's nothing preventing you from running it on the "vanilla" AlexNet, but the settings in the YAML file were meant to match the settings used in the reference DoReFa implementation.)
In addition, the reference implementation used Adam instead of SGD, and indeed in my experiments I saw that Adam gives better results. Our sample uses SGD and it isn't configurable, so one needs to edit the code in order to use Adam.
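
(A minimal sketch of that edit, assuming the optimizer is built from the sample's command-line arguments; the hyperparameters shown are placeholders, not the values from the reference implementation:)

```python
import torch.optim as optim

# Hypothetical one-line swap inside compress_classifier.py: construct Adam
# instead of SGD. lr / weight_decay here are whatever the sample's args
# provide; tune them to match the reference DoReFa implementation.
optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
```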

All of this wasn't detailed in the yaml files - that's my fault. I pushed updates to both the base FP32 yaml and the DoReFa yaml with details on how to run it and the results I got. Please check those out.

Regarding your question on MaxPool - in general the answer is yes: you should replace MaxPool with something that does quant --> maxpool --> quant. You then define a function that creates this new module and returns it, and add that function to the "replacement factory", similar to what we do with ReLU in DorefaQuantizer.__init__():

https://github.com/NervanaSystems/distiller/blob/e65ec8fce890049fb421aa7d9d32cad5b075cd87/distiller/quantization/clipped_linear.py#L169-L177
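
To make that concrete, here is a rough, hypothetical sketch of such a replacement. The module and function names are made up; the replace-function signature and the use of ClippedLinearQuantization mirror the linked ReLU example, but double-check them against your Distiller version:

```python
import torch.nn as nn
from distiller.quantization.clipped_linear import ClippedLinearQuantization


class DorefaMaxPool(nn.Module):
    """Hypothetical wrapper that does quant --> maxpool --> quant."""
    def __init__(self, maxpool, num_bits):
        super().__init__()
        # Same activation quantizer the linked ReLU replacement uses (clip at 1).
        self.pre_quant = ClippedLinearQuantization(num_bits, 1, dequantize=True, inplace=False)
        self.maxpool = maxpool
        self.post_quant = ClippedLinearQuantization(num_bits, 1, dequantize=True, inplace=False)

    def forward(self, x):
        return self.post_quant(self.maxpool(self.pre_quant(x)))


# Inside DorefaQuantizer.__init__(), mirroring the ReLU replacement above:
def maxpool_replace_fn(module, name, qbits_map):
    bits_acts = qbits_map[name].acts
    if bits_acts is None:
        return module
    return DorefaMaxPool(module, bits_acts)

# self.replacement_factory[nn.MaxPool2d] = maxpool_replace_fn
```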

@nzmora
Contributor

nzmora commented Nov 28, 2019

Closing due to inactivity. Please reopen if needed.

@nzmora nzmora closed this as completed Nov 28, 2019
@brisker changed the title from "quanti_aware_training issues" to "how to add middle layers' activation loss functions?" Mar 2, 2020
@brisker
Author

brisker commented Mar 2, 2020

@nzmora
@levzlotnik
@guyjacob
If my model is like a -> b -> c -> d -> e, and I want to add loss functions on the outputs of three layers (b, c and d), something like MSELoss(output_b, label_b) and so on, and I'd like "b, c, d" to be configurable in the yaml file, how can I implement this?

@levzlotnik
Contributor

Hi @brisker ,

You could define "regularization" policies for these layers that calculate the MSELoss for each of them.
Of course these aren't actually regularizers, but since the implementation requires applying the additional loss on every minibatch, the API for regularization policies is just the thing you need.
To add one, create a new class that inherits from the distiller.regularization.regularizer._Regularizer base class and implement your details. Also add it to the imports in distiller.regularization.__init__ so it's visible to compress_classifier.py. After that you'll be able to use it from your yaml files.

Cheers,
Lev
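
To sketch the idea, here is a hypothetical helper, not Distiller API: in practice this logic would live inside your subclass of distiller.regularization.regularizer._Regularizer, whose exact constructor and loss-hook signatures should be copied from regularizer.py.

```python
import torch.nn.functional as F


class ActivationMSEHelper:
    """Hypothetical helper: holds the captured outputs of the layers named in
    the yaml (e.g. 'b', 'c', 'd') and turns them into an MSE penalty."""

    def __init__(self, layer_names, strength=1.0):
        self.layer_names = layer_names
        self.strength = strength
        self.captured = {}  # layer name -> activation tensor, filled by forward hooks

    def penalty(self, targets):
        """targets: dict mapping layer name -> target tensor (e.g. label_b)."""
        loss = 0.0
        for name in self.layer_names:
            if name in self.captured and name in targets:
                loss = loss + F.mse_loss(self.captured[name], targets[name])
        return self.strength * loss
```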

@brisker
Author

brisker commented Mar 3, 2020

@levzlotnik
Still, I do not know how to get the middle layers' outputs before adding them to the loss function. I know that in general I can use hooks, but how do I add hooks to particular layers according to the yaml file?

@levzlotnik
Contributor

Hi @brisker ,
You can add a hook by getting the module itself from its name:

```python
# Map every layer name (e.g. the one configured in the yaml) to its module.
modules_dict = dict(model.named_modules())
your_layer = modules_dict[your_layer_name]
your_layer.register_forward_hook(your_hook)
```

Where your_hook could, for example, send the output to your custom "regularizer".
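
For example (a sketch reusing the hypothetical ActivationMSEHelper from the earlier comment; the hook simply stashes each configured layer's output so it can later be used when computing the extra loss):

```python
# Hypothetical wiring: register a forward hook for each layer name taken from
# the yaml, capturing its output into the helper object.
act_loss = ActivationMSEHelper(layer_names=['b', 'c', 'd'])
modules_dict = dict(model.named_modules())


def make_hook(name):
    def hook(module, inputs, output):
        act_loss.captured[name] = output
    return hook


for layer_name in act_loss.layer_names:
    modules_dict[layer_name].register_forward_hook(make_hook(layer_name))
```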

@brisker
Author

brisker commented Mar 3, 2020

@levzlotnik
I know the code should look like this, but I do not know how to feed the return values of the hook functions into the model's total loss function.

