
How to use quant_runner #5

Closed · feixiang7701 opened this issue Nov 4, 2021 · 8 comments
feixiang7701 commented Nov 4, 2021

Thank you for the excellent work on MQBench and EOD. I am interested in quantization and tried the config retinanet-r50_1x_quant.yaml, but I ran into some errors. I also found that this project has no quantization documentation. Could you give some guidance on using quant_runner?

Here are the errors I encountered when using retinanet-r50_1x_quant.yaml:

error_1

File "/home/user/miniconda3/envs/eod/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/user/project/EOD/eod/utils/env/launch.py", line 117, in _distributed_worker
main_func(args)
File "/home/user/project/EOD/eod/commands/train.py", line 121, in main
runner = RUNNER_REGISTRY.get(runner_cfg['type'])(cfg, **runner_cfg['kwargs'])
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 14, in __init__
super(QuantRunner, self).__init__(config, work_dir, training)
File "/home/user/project/EOD/eod/runner/base_runner.py", line 52, in __init__
self.build()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 32, in build
self.quantize_model()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 68, in quantize_model
from mqbench.prepare_by_platform import prepare_by_platform
ImportError: cannot import name 'prepare_by_platform' from 'mqbench.prepare_by_platform' (/home/user/project/MQBench/mqbench/prepare_by_platform.py)

This was solved by modifying EOD/eod/runner/quant_runner.py, lines 68-72:

from mqbench.prepare_by_platform import prepare_qat_fx_by_platform  # the function was renamed in this MQBench version
logger.info("prepare quantize model")
deploy_backend = self.config['quant']['deploy_backend']
prepare_args = self.config['quant'].get('prepare_args', {})
self.model = prepare_qat_fx_by_platform(self.model, self.backend_type[deploy_backend], prepare_args)
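Since the ImportError came from an API rename between MQBench releases, a defensive resolver could try each known name in order instead of hard-coding one. This is only a sketch; whether any given MQBench release exposes both names is an assumption.

```python
# Hedged shim for the renamed MQBench entry point: try the candidate
# names in order and return the first one that exists in the module.
import importlib

def resolve_prepare_fn(module_name="mqbench.prepare_by_platform",
                       candidates=("prepare_by_platform",
                                   "prepare_qat_fx_by_platform")):
    """Return the first attribute found in `module_name`, trying names in order."""
    mod = importlib.import_module(module_name)
    for name in candidates:
        fn = getattr(mod, name, None)
        if fn is not None:
            return fn
    raise ImportError(f"none of {candidates} found in {module_name}")
```

This keeps quant_runner working across both MQBench versions at the cost of one indirection.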

error_2

I can train the quant model on a single GPU, but when using multiple GPUs I get the error below, which is still unsolved.

Traceback (most recent call last):
File "/home/user/miniconda3/envs/eod/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/user/project/EOD/eod/utils/env/launch.py", line 117, in _distributed_worker
main_func(args)
File "/home/user/project/EOD/eod/commands/train.py", line 121, in main
runner = RUNNER_REGISTRY.get(runner_cfg['type'])(cfg, **runner_cfg['kwargs'])
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 15, in __init__
super(QuantRunner, self).__init__(config, work_dir, training)
File "/home/user/project/EOD/eod/runner/base_runner.py", line 52, in __init__
self.build()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 34, in build
self.calibrate()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 84, in calibrate
self.model(batch)
File "/home/user/miniconda3/envs/eod/lib/python3.8/site-packages/torch/fx/graph_module.py", line 513, in wrapped_call
raise e.with_traceback(None)
NameError: name 'dist' is not defined
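For context, a NameError raised from torch/fx/graph_module.py usually means the generated forward code references a global (here `dist`, presumably `torch.distributed`) that is missing from the namespace the traced graph executes in. A torch-free sketch of that failure mode and its fix, with purely illustrative names:

```python
# Sketch of the failure mode behind the NameError: code that is generated
# and executed in one namespace refers to a name ('dist_sum' here, standing
# in for 'dist') that was never made available in that namespace.
ns = {}
exec("def f(x):\n    return dist_sum(x)", ns)

failed_with = None
try:
    ns["f"]([1, 2, 3])
except NameError as e:
    failed_with = str(e)   # "name 'dist_sum' is not defined"

# The fix is to make the missing name available in the executing namespace,
# e.g. by importing it in the module that runs the generated code.
ns["dist_sum"] = sum
result = ns["f"]([1, 2, 3])   # now resolves
```

The real fix would be to ensure `torch.distributed` is imported (as `dist`) wherever the FX graph's generated code runs, which is what the MQBench update presumably addressed.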


yqyao commented Nov 4, 2021

Can you provide the version of MQBench you are using, @feixiang7701? You could also try the latest MQBench.


feixiang7701 commented Nov 4, 2021

The version of MQBench is v0.0.2. Since you say MQBench supports distributed training, maybe the problem is in the config file? @yqyao


yqyao commented Nov 4, 2021

MQBench just updated the code; you can try it. @feixiang7701


feixiang7701 commented Nov 4, 2021

> MQBench just updated the code, you can try it. @feixiang7701

Yes, updating MQBench to v0.0.3 solved my problem. Thank you.
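Since the whole thread came down to a version mismatch, it may help to report the installed MQBench version up front. A minimal sketch using the standard library; the distribution name "mqbench" is an assumption.

```python
# Hedged sketch: look up an installed distribution's version so mismatches
# like v0.0.2 vs v0.0.3 are caught before training starts.
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Logging `installed_version("mqbench")` at quant_runner startup would make reports like this issue much easier to triage.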


feixiang7701 commented Nov 10, 2021

@Joker-co @yqyao I have tried quant_runner, but my quantization result for the yolox_nano model is far from the provided benchmark. In addition, even though the checkpoint is loaded from the float model, the initial loss of the quant model is far greater than the final loss of the float model. Can you give some suggestions, or do you have any plans to publish documentation for quantization-aware training?


Tracin commented Nov 10, 2021

Which backend type did you choose?
Try evaluating the model right after the quant model is initialized; its loss should be similar to the FP model's.
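That sanity check can be sketched as a simple tolerance comparison: right after prepare, before any QAT steps, the quant model's loss on a fixed batch should sit close to the float model's. The helper name and tolerance below are illustrative, not part of EOD's API.

```python
def quant_init_sanity_check(float_loss: float, quant_loss: float,
                            rel_tol: float = 0.1) -> bool:
    """True if the freshly-prepared quant model's loss is within rel_tol
    (relative) of the float model's loss on the same batch."""
    return abs(quant_loss - float_loss) <= rel_tol * abs(float_loss)
```

If this check fails right after initialization, the problem is in quantization setup (backend config, calibration, checkpoint loading) rather than in the QAT training itself.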

feixiang7701 commented:

@Tracin I used configs/yolox/yolox_nano.yaml and added quant parameters following configs/retinanet/retinanet_r50_1x_quant.yaml.


feixiang7701 commented Nov 11, 2021

@Joker-co @yqyao @Tracin While testing quant_runner, I fixed some bugs and submitted a merge request.
