
How to use quant_runner #5

Closed · feixiang7701 opened this issue Nov 4, 2021 · 8 comments
feixiang7701 commented Nov 4, 2021

Thank you for the excellent work on MQBench and EOD. I am interested in quantization and tried the config retinanet-r50_1x_quant.yaml, but I ran into some errors. I also found that this project has no quantization documentation. Could you give some guidance on using quant_runner?

Here are the errors I encountered when using retinanet-r50_1x_quant.yaml:

error_1

File "/home/user/miniconda3/envs/eod/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/user/project/EOD/eod/utils/env/launch.py", line 117, in _distributed_worker
main_func(args)
File "/home/user/project/EOD/eod/commands/train.py", line 121, in main
runner = RUNNER_REGISTRY.get(runner_cfg['type'])(cfg, **runner_cfg['kwargs'])
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 14, in __init__
super(QuantRunner, self).__init__(config, work_dir, training)
File "/home/user/project/EOD/eod/runner/base_runner.py", line 52, in __init__
self.build()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 32, in build
self.quantize_model()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 68, in quantize_model
from mqbench.prepare_by_platform import prepare_by_platform
ImportError: cannot import name 'prepare_by_platform' from 'mqbench.prepare_by_platform' (/home/user/project/MQBench/mqbench/prepare_by_platform.py)

This was solved by modifying EOD/eod/runner/quant_runner.py, lines 68-72:

from mqbench.prepare_by_platform import prepare_qat_fx_by_platform  # the function was renamed in this MQBench version
logger.info("prepare quantize model")
deploy_backend = self.config['quant']['deploy_backend']
prepare_args = self.config['quant'].get('prepare_args', {})
self.model = prepare_qat_fx_by_platform(self.model, self.backend_type[deploy_backend], prepare_args)
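Since the ImportError came from an API rename between MQBench releases, a defensive resolver could try each known name in order instead of hard-coding one. This is only a sketch; whether any given MQBench release exposes both names is an assumption.

```python
# Hedged shim for the renamed MQBench entry point: try the candidate
# names in order and return the first one that exists in the module.
import importlib

def resolve_prepare_fn(module_name="mqbench.prepare_by_platform",
                       candidates=("prepare_by_platform",
                                   "prepare_qat_fx_by_platform")):
    """Return the first attribute found in `module_name`, trying names in order."""
    mod = importlib.import_module(module_name)
    for name in candidates:
        fn = getattr(mod, name, None)
        if fn is not None:
            return fn
    raise ImportError(f"none of {candidates} found in {module_name}")
```

This keeps quant_runner working across both MQBench versions at the cost of one indirection.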

error_2

I can train the quant model on a single GPU, but when using multiple GPUs I get the error below, which is still unsolved.

Traceback (most recent call last):
File "/home/user/miniconda3/envs/eod/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/user/project/EOD/eod/utils/env/launch.py", line 117, in _distributed_worker
main_func(args)
File "/home/user/project/EOD/eod/commands/train.py", line 121, in main
runner = RUNNER_REGISTRY.get(runner_cfg['type'])(cfg, **runner_cfg['kwargs'])
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 15, in __init__
super(QuantRunner, self).__init__(config, work_dir, training)
File "/home/user/project/EOD/eod/runner/base_runner.py", line 52, in __init__
self.build()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 34, in build
self.calibrate()
File "/home/user/project/EOD/eod/runner/quant_runner.py", line 84, in calibrate
self.model(batch)
File "/home/user/miniconda3/envs/eod/lib/python3.8/site-packages/torch/fx/graph_module.py", line 513, in wrapped_call
raise e.with_traceback(None)
NameError: name 'dist' is not defined
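For context, a NameError raised from torch/fx/graph_module.py usually means the generated forward code references a global (here `dist`, presumably `torch.distributed`) that is missing from the namespace the traced graph executes in. A torch-free sketch of that failure mode and its fix, with purely illustrative names:

```python
# Sketch of the failure mode behind the NameError: code that is generated
# and executed in one namespace refers to a name ('dist_sum' here, standing
# in for 'dist') that was never made available in that namespace.
ns = {}
exec("def f(x):\n    return dist_sum(x)", ns)

failed_with = None
try:
    ns["f"]([1, 2, 3])
except NameError as e:
    failed_with = str(e)   # "name 'dist_sum' is not defined"

# The fix is to make the missing name available in the executing namespace,
# e.g. by importing it in the module that runs the generated code.
ns["dist_sum"] = sum
result = ns["f"]([1, 2, 3])   # now resolves
```

The real fix would be to ensure `torch.distributed` is imported (as `dist`) wherever the FX graph's generated code runs, which is what the MQBench update presumably addressed.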


yqyao commented Nov 4, 2021

Can you provide the version of MQBench you are using, @feixiang7701? You could also try the latest MQBench.


feixiang7701 commented Nov 4, 2021

The version of MQBench is v0.0.2. Since you say MQBench supports distributed training, maybe the problem is in the config file? @yqyao


yqyao commented Nov 4, 2021

MQBench just updated the code; you can try it. @feixiang7701


feixiang7701 commented Nov 4, 2021

> MQBench just updated the code, you can try it. @feixiang7701

Yes, updating MQBench to v0.0.3 solved my problem. Thank you.
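Since the whole thread came down to a version mismatch, it may help to report the installed MQBench version up front. A minimal sketch using the standard library; the distribution name "mqbench" is an assumption.

```python
# Hedged sketch: look up an installed distribution's version so mismatches
# like v0.0.2 vs v0.0.3 are caught before training starts.
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Logging `installed_version("mqbench")` at quant_runner startup would make reports like this issue much easier to triage.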


feixiang7701 commented Nov 10, 2021

@Joker-co @yqyao I have tried quant_runner, but my quantization result for the yolox_nano model is far from the provided benchmark. In addition, even though the checkpoint is loaded from the float model, the initial loss of the quant model is far greater than the final loss of the float model. Can you give some suggestions, or do you have any plans to publish documentation for quantization-aware training?


Tracin commented Nov 10, 2021

Which backend type did you choose?
Try evaluating the model right after the quant model is initialized; its loss should be similar to the FP model's.
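That sanity check can be sketched as a simple tolerance comparison: right after prepare, before any QAT steps, the quant model's loss on a fixed batch should sit close to the float model's. The helper name and tolerance below are illustrative, not part of EOD's API.

```python
def quant_init_sanity_check(float_loss: float, quant_loss: float,
                            rel_tol: float = 0.1) -> bool:
    """True if the freshly-prepared quant model's loss is within rel_tol
    (relative) of the float model's loss on the same batch."""
    return abs(quant_loss - float_loss) <= rel_tol * abs(float_loss)
```

If this check fails right after initialization, the problem is in quantization setup (backend config, calibration, checkpoint loading) rather than in the QAT training itself.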

feixiang7701 commented:

@Tracin I used configs/yolox/yolox_nano.yaml and added quant parameters following configs/retinanet/retinanet_r50_1x_quant.yaml.


feixiang7701 commented Nov 11, 2021

@Joker-co @yqyao @Tracin While testing quant_runner, I fixed some bugs and submitted a merge request.
