This repository has been archived by the owner on May 1, 2023. It is now read-only.

how to add middle layers' activation loss functions? #340

Closed

brisker opened this issue Jul 31, 2019 · 12 comments
Labels
quantization The issue is related to quantization

Comments

@brisker

brisker commented Jul 31, 2019

If I want to train DoReFa-Net quantization on AlexNet, is the command line like this?
python compress_classifier.py -a alexnet /ImageNet_Share/ --compress=/distiller/examples/quantization/quant_aware_train/alexnet_bn_dorefa.yaml
Do we need to modify the code in compress_classifier.py at all to run this training?

@levzlotnik
Contributor

Hi @brisker ,

Sorry for the late response.
You're correct, this yaml file is the configuration for the compress_classifier.py script, and the command line would be exactly as you've specified.

Let us know if you have more questions.

Cheers,
Lev

@brisker
Author

brisker commented Aug 5, 2019

@levzlotnik
If we apply post-training quantization using /distiller-master/examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml, the code reads all of the activation statistics such as avg-max, abs-min, etc. But I cannot find where they are actually used - is it in range_linear.py? Can you tell me where the data from examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml gets used?

@nzmora
Contributor

nzmora commented Aug 6, 2019

Hi @brisker,

You can see example invocations using resnet50_quant_stats.yaml here. You can use resnet18_quant_stats.yaml in a similar fashion.

Cheers
Neta
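
For orientation, an invocation of this kind would look roughly like the line below. This is an illustrative, hedged example rather than a quote from the linked page; verify the flags (in particular --quantize-eval and --qe-stats-file) against the linked command-line documentation and your Distiller version:

python compress_classifier.py -a resnet18 --pretrained <path_to_imagenet> --evaluate --quantize-eval --qe-stats-file examples/quantization/post_train_quant/stats/resnet18_quant_stats.yaml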

@nzmora added the quantization label Aug 6, 2019
@brisker
Author

brisker commented Aug 7, 2019

@levzlotnik
@nzmora
If I want to quantize the input and output of the MaxPooling layer in DoReFa-Net, do I have to write a new nn.Module named DorefaMaxPooling?

@guyjacob
Contributor

guyjacob commented Aug 7, 2019

Hi @brisker,

(Note that I edited your last comment to remove irrelevant people you tagged)

To go back to your original question, there could be some confusion I'd like to clear up. If you notice, the yaml file is named "alexnet_bn". That is to say, the intention is to run this not on the original AlexNet, but on a modified AlexNet with batch norm layers. We have it implemented here.
The reason for this is that the implementation from the DoReFa authors used this model - see here.
So in the command line, you should actually use -a alexnet_bn. (There's nothing preventing you from running it on the "vanilla" AlexNet, but the settings in the YAML file were meant to match the settings used in the reference DoReFa implementation.)
In addition, the reference implementation used Adam instead of SGD, and indeed in my experiments I saw that Adam gives better results. Our sample uses SGD and it isn't configurable, so one needs to edit the code in order to use Adam.
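
(A minimal sketch of that edit, assuming the optimizer is built from the sample's command-line arguments; the hyperparameters shown are placeholders, not the values from the reference implementation:)

```python
import torch.optim as optim

# Hypothetical one-line swap inside compress_classifier.py: construct Adam
# instead of SGD. lr / weight_decay here are whatever the sample's args
# provide; tune them to match the reference DoReFa implementation.
optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
```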

All of this wasn't detailed in the yaml files - that's my fault. I pushed updates to both the base FP32 yaml and the DoReFa yaml with details on how to run it and the results I got. Please check those out.

Regarding your question on MaxPool - in general the answer is yes: you should replace MaxPool with something that does quant --> maxpool --> quant. You then define a function that creates this new module and returns it, and add that function to the "replacement factory", similar to what we do with ReLU in DorefaQuantizer.__init__():

https://github.com/NervanaSystems/distiller/blob/e65ec8fce890049fb421aa7d9d32cad5b075cd87/distiller/quantization/clipped_linear.py#L169-L177
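
To make that concrete, here is a rough, hypothetical sketch of such a replacement. The module and function names are made up; the replace-function signature and the use of ClippedLinearQuantization mirror the linked ReLU example, but double-check them against your Distiller version:

```python
import torch.nn as nn
from distiller.quantization.clipped_linear import ClippedLinearQuantization


class DorefaMaxPool(nn.Module):
    """Hypothetical wrapper that does quant --> maxpool --> quant."""
    def __init__(self, maxpool, num_bits):
        super().__init__()
        # Same activation quantizer the linked ReLU replacement uses (clip at 1).
        self.pre_quant = ClippedLinearQuantization(num_bits, 1, dequantize=True, inplace=False)
        self.maxpool = maxpool
        self.post_quant = ClippedLinearQuantization(num_bits, 1, dequantize=True, inplace=False)

    def forward(self, x):
        return self.post_quant(self.maxpool(self.pre_quant(x)))


# Inside DorefaQuantizer.__init__(), mirroring the ReLU replacement above:
def maxpool_replace_fn(module, name, qbits_map):
    bits_acts = qbits_map[name].acts
    if bits_acts is None:
        return module
    return DorefaMaxPool(module, bits_acts)

# self.replacement_factory[nn.MaxPool2d] = maxpool_replace_fn
```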

@nzmora
Contributor

nzmora commented Nov 28, 2019

Closing due to inactivity. Please reopen if needed.

@nzmora nzmora closed this as completed Nov 28, 2019
@brisker changed the title from "quanti_aware_training issues" to "how to add middle layers' activation loss functions?" Mar 2, 2020
@brisker
Author

brisker commented Mar 2, 2020

@nzmora
@levzlotnik
@guyjacob
If my model is like a -> b -> c -> d -> e, and I want to add loss functions on the outputs of three layers (b, c and d), something like MSELoss(output_b, label_b) and so on, and I'd like "b, c, d" to be configurable in the yaml file, how can I implement this?

@levzlotnik
Contributor

Hi @brisker ,

You could define "regularization" policies for these layers that calculate the MSELoss for each of them.
Of course these aren't actually regularizers, but since the implementation requires applying the additional loss on every minibatch, the API for regularization policies is just the thing you need.
To add one, create a new class that inherits from the distiller.regularization.regularizer._Regularizer base class and implement your details. Also add it to the imports in distiller.regularization.__init__ so it's visible to compress_classifier.py. After that you'll be able to use it from your yaml files.

Cheers,
Lev
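
To sketch the idea, here is a hypothetical helper, not Distiller API: in practice this logic would live inside your subclass of distiller.regularization.regularizer._Regularizer, whose exact constructor and loss-hook signatures should be copied from regularizer.py.

```python
import torch.nn.functional as F


class ActivationMSEHelper:
    """Hypothetical helper: holds the captured outputs of the layers named in
    the yaml (e.g. 'b', 'c', 'd') and turns them into an MSE penalty."""

    def __init__(self, layer_names, strength=1.0):
        self.layer_names = layer_names
        self.strength = strength
        self.captured = {}  # layer name -> activation tensor, filled by forward hooks

    def penalty(self, targets):
        """targets: dict mapping layer name -> target tensor (e.g. label_b)."""
        loss = 0.0
        for name in self.layer_names:
            if name in self.captured and name in targets:
                loss = loss + F.mse_loss(self.captured[name], targets[name])
        return self.strength * loss
```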

@brisker
Author

brisker commented Mar 3, 2020

@levzlotnik
Still, I do not know how to get the middle layers' outputs before adding them to the loss function. I know that in general I can use hooks, but how do I add hooks to particular layers according to the yaml file?

@levzlotnik
Contributor

Hi @brisker ,
You can add a hook by getting the module itself from its name:

```python
# Map every layer name (e.g. the one configured in the yaml) to its module.
modules_dict = dict(model.named_modules())
your_layer = modules_dict[your_layer_name]
your_layer.register_forward_hook(your_hook)
```

Where your_hook could, for example, send the output to your custom "regularizer".
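
For example (a sketch reusing the hypothetical ActivationMSEHelper from the earlier comment; the hook simply stashes each configured layer's output so it can later be used when computing the extra loss):

```python
# Hypothetical wiring: register a forward hook for each layer name taken from
# the yaml, capturing its output into the helper object.
act_loss = ActivationMSEHelper(layer_names=['b', 'c', 'd'])
modules_dict = dict(model.named_modules())


def make_hook(name):
    def hook(module, inputs, output):
        act_loss.captured[name] = output
    return hook


for layer_name in act_loss.layer_names:
    modules_dict[layer_name].register_forward_hook(make_hook(layer_name))
```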

@brisker
Author

brisker commented Mar 3, 2020

@levzlotnik
I know the code should look like this, but I do not know how to feed the return values of the hook functions into the model's total loss function.

