RetinaNet Quantization #1
Thanks for the report. Could you try adding "merge_duplicate_nodes" to the list of transforms?
Here's the information about all of the graph transformations you can do. Best.
Yes, add the "merge_duplicate_nodes" option and give it a try again. But here are some questions:
Hi @tonyreina, thank you for your support! I was able to generate the optimized_graph.pb by following your instructions:
But when I run:
I got the following error:
Hi @nammbash, thank you for your support! When I use:
When I use
@felipheggaliza would you mind sharing your fp32 graph? We can take a look. We are in the development stage of the tools. Thanks for your feedback.
@felipheggaliza Use the following command, with and without merge_duplicate_nodes, and you should have your optimized FP32 graph. Hoping that helps.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=/workspace/quantization/frozen_inference_graph.pb --out_graph=/workspace/quantization/optimized_graph.pb --inputs="input_1" --outputs="bboxes,scores,classes" --transforms='remove_nodes(op=Identity, op=CheckNumerics, op=StopGradient) fold_old_batch_norms strip_unused_nodes merge_duplicate_nodes'
Hi @nammbash, I was able to generate the optimized_graph.pb using your instructions:
But I got the following error again (ValueError: Duplicate node names detected.) when trying to run the quantize_graph.py script:
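As a diagnostic, here is a minimal sketch (TF 1.x API; the graph path is the one from the transform step above) that lists which node names are duplicated — the same condition quantize_graph.py's create_nodes_map checks before raising:

from collections import Counter
import tensorflow as tf

# Load the frozen GraphDef produced by transform_graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile("/workspace/quantization/optimized_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Count occurrences of each node name; anything > 1 triggers the ValueError.
counts = Counter(node.name for node in graph_def.node)
dupes = [name for name, c in counts.items() if c > 1]
print("%d duplicate node names" % len(dupes))
for name in dupes[:20]:
    print(name, counts[name])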
Just as an update. I was able to generate the quantized_dynamic_range_graph.pb using an updated version of quantize_graph.py provided by @mdfaijul. The command I used was:
I was able to generate the logged_quantized_graph.pb using the command:
And according to the steps listed here: https://github.com/IntelAI/tools/blob/master/tensorflow_quantization/README.md#quantization-tools. Unfortunately, I am not able to run inferences using the logged_quantized_graph.pb graph, because when I load the graph I get this error:
It seems QuantizedConv2DPerChannel is not present in my current TensorFlow installation. I have tested it using TensorFlow 1.13.1-mkl_py36h27d456a_0 from the Anaconda channel and TensorFlow 2.0.0-alpha0 (pip install tensorflow==2.0.0-alpha0), and both resulted in the same error mentioned above. I am looking forward to getting help and working with you until we get the quantization working. Best Regards, Feliphe Galiza
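One quick way to confirm whether a given TensorFlow build registers that op — a sketch using a TF 1.x private module (the registry API moved in later versions):

from tensorflow.python.framework import op_def_registry

# Dict mapping op name -> OpDef for every op registered in this build.
registered = op_def_registry.get_registered_ops()
print("QuantizedConv2DPerChannel" in registered)
print("QuantizedConv2D" in registered)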
@felipheggaliza how are you generating the calibration data? @mdfaijul should we use your latest PR for this fix? @WafaaT / @dmsuehir have you seen this error before? Any docker image you suggest to use?
Hi @karthikvadla, I've shared more information about this case, including how I am generating the calibration data, in a separate email. Best Regards, Feliphe Galiza
Hi all, First of all I would like to thank you all for the help you have been giving me in this journey of enabling INT8 and VNNI for inference using RetinaNet on a Cascade Lake server. Special thanks to @nammbash and @mdfaijul, who gave me most of the instructions and even a script which saved me from getting stuck in one of the steps. That being said, I have good and bad news.
• The good news is that I was able to convert the FP32 graph to INT8 using the Intel AI Quantization tool (https://github.com/IntelAI/tools), and it also seems VNNI is enabled (I checked it using Intel XED).
• The bad news is that the INT8 graph is not running faster than the FP32 graph.
One of the reasons for these results may be that something went wrong with the graph during the INT8 conversion.
See more details below:
CPU Info
Installing Bazel and TensorFlow from source using MKL-DNN
Some outputs related to the TensorFlow installation:
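A common way to confirm the build is MKL-enabled — a sketch using a TF 1.x private helper (assumed available in this build; it moved in TF 2.x):

import tensorflow as tf
from tensorflow.python import pywrap_tensorflow

print(tf.__version__)
# True when TensorFlow was compiled with MKL-DNN support.
print(pywrap_tensorflow.IsMklEnabled())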
Running the Intel AI Quantization tool (attempting INT8 quantization)
Steps for FP32 Optimized Frozen Graph
Output:
Therefore:
Output:
The above output shows the FP32 inference is working properly.
Steps for Int8 Quantization
Since I was having issues with the quantize_graph.py from master, Faijul (@mdfaijul) helped me with another version of quantize_graph.py where he fixed some things.
The int8_quantized_dynamic_range_graph.pb is generated successfully.
The int8_logged_quantized_graph.pb is generated successfully.
Output:
Output:
I did not perform this step since I am still not sure what further optimizations could be applied. Finally, verifying the quantized model performance and accuracy:
Typically, the accuracy target is the optimized FP32 model's accuracy.
Performance Experiments
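As a reference point, here is a crude latency-comparison sketch (TF 1.x APIs; the input shape and the INT8 file name are placeholders to adjust for the actual model — the input/output tensor names are the ones used throughout this thread):

import time
import numpy as np
import tensorflow as tf

def bench(pb_path, runs=50):
    # Load a frozen GraphDef and time sess.run over input_1 -> bboxes, scores, classes.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as g:
        tf.import_graph_def(graph_def, name="")
        with tf.Session(graph=g) as sess:
            x = np.random.rand(1, 800, 800, 3).astype(np.float32)  # placeholder shape
            fetches = ["bboxes:0", "scores:0", "classes:0"]
            sess.run(fetches, {"input_1:0": x})  # warm-up run
            start = time.time()
            for _ in range(runs):
                sess.run(fetches, {"input_1:0": x})
            return (time.time() - start) / runs

print("FP32 s/iter:", bench("optimized_graph.pb"))
print("INT8 s/iter:", bench("int8_quantized_dynamic_range_graph.pb"))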
Output:
Output:
Reference: https://www.tensorflow.org/guide/performance/overview#tuning_mkl_for_the_best_performance
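The knobs from that guide can be set before importing TensorFlow; a sketch (the thread counts are placeholders to match the machine's physical cores):

import os
# MKL/OpenMP tuning knobs from the TF performance guide; set
# OMP_NUM_THREADS to the number of physical cores on the machine.
os.environ["OMP_NUM_THREADS"] = "28"
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
os.environ["KMP_SETTINGS"] = "1"

import tensorflow as tf
config = tf.ConfigProto(intra_op_parallelism_threads=28,
                        inter_op_parallelism_threads=2)
# Pass `config` to tf.Session(config=config) when benchmarking.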
Output:
Output:
Is VNNI being used? Checking it using MKLDNN_JIT_DUMP.
Running this code produces a bunch of files, e.g. mkldnn_dump_jit_uni_reorder_kernel_f32.1250.bin, mkldnn_dump_jit_uni_reorder_kernel_f32.987.bin, etc.
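A sketch to scan those dumps for the VNNI instruction vpdpbusd, assuming Intel XED's xed binary is on PATH (xed -64 -ir is the invocation from the MKL-DNN profiling docs):

import glob
import subprocess

# Disassemble each MKL-DNN JIT dump and look for vpdpbusd (AVX512-VNNI).
for path in sorted(glob.glob("mkldnn_dump_*.bin")):
    asm = subprocess.check_output(["xed", "-64", "-ir", path]).decode()
    if "vpdpbusd" in asm:
        print("VNNI kernel:", path)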
Reference: https://intel.github.io/mkl-dnn/perf_profile.html
Output:
Output:
It seems VNNI is enabled for at least some operations. Do you have suggestions on how I could investigate further in order to make INT8 inference faster than FP32 for this RetinaNet neural network? Thanks in advance! Regards, Feliphe Galiza
@felipheggaliza Pleasure to be of help. @mdfaijul is in the process of updating and cleaning quantize_graph.py, and I am in the process of integrating the regex feature into it; after this integration you can exclude certain specific nodes, or certain nodes matching a regex. Nevertheless, here is the reason: not all ops present in all models are quantize- and quantize-fuse-ready. So if there is an op which is quantized and the following op cannot be, we need to insert a dequantize operation, which at runtime creates a lot of performance loss. For example: for Faster R-CNN with FPN, there are three networks, ResNet50 + RPN + FPN. For quantization I exclude the RPN and FPN portions so that it is faster, and I get 1.27x. Hope this helps. Now here is a follow-up question: as a user of the quantization tool, how would you want this kind of information presented so that it is intuitive? Suggestions are welcome.
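To see how much of a graph actually quantized and how many Quantize/Dequantize pairs the rewrite inserted, a tally sketch over the INT8 GraphDef (TF 1.x; the file name is the one from this thread):

import collections
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("int8_quantized_dynamic_range_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Count nodes per op type; many Dequantize nodes between quantized
# sections usually mean runtime overhead from repeated requantization.
counts = collections.Counter(node.op for node in graph_def.node)
for op, n in counts.most_common():
    print(op, n)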
Hi @nammbash, I am excited about the release of the new features you and @mdfaijul are working on! Thank you for the explanation; now the possible reason is clear to me. I will try to study the graph and see if I can use the same approach you used on Faster R-CNN with FPN. The challenge for me is that I don't know exactly which operations are optimized, and I don't have deep knowledge of all the operations present in RetinaNet; the graph is very big, and it will probably take a lot of time until I figure out which operations I have to exclude from quantization. Do you know where I can get a list of all operations which are already quantize- and quantize-fuse-ready? Regarding your follow-up question, I would say that users are interested in answering questions like:
• If I apply quantization using the Intel AI tool, what is the theoretical VNNI performance gain I will have, based on what operations are available for quantization right now? What about the accuracy drop?
We could try answering these questions by providing a tool which receives an FP32 model as input and outputs the answers to the above questions. The format could be stdout, a txt file, or an HTML file showing it in a graph. Hope my suggestion is useful somehow. Regards, Feliphe Galiza
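A minimal version of that idea might look like the sketch below — note the QUANTIZABLE set is purely illustrative, not the tool's actual quantize-ready op list:

import collections
import tensorflow as tf

# Illustrative only: the real quantize-ready op list lives in the tool.
QUANTIZABLE = {"Conv2D", "MatMul", "MaxPool", "AvgPool", "ConcatV2"}

graph_def = tf.GraphDef()
with tf.gfile.GFile("optimized_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

counts = collections.Counter(node.op for node in graph_def.node)
covered = sum(n for op, n in counts.items() if op in QUANTIZABLE)
total = sum(counts.values())
print("quantize-ready nodes: %d / %d (%.1f%%)"
      % (covered, total, 100.0 * covered / total))
for op, n in counts.most_common(10):
    print(op, n, "quantizable" if op in QUANTIZABLE else "fp32")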
Updated quantize_graph.py:
https://github.com/NervanaSystems/tools/tree/niroop/perchannel_padfusion_fasterrcnnfpn
Hi,
I am trying to quantize the RetinaNet topology trained with TensorFlow, but I am getting an error. These are the steps I followed, based on the instructions at https://github.com/IntelAI/tools/tree/master/tensorflow_quantization:
I was able to generate an optimized_graph.pb using the command:
bazel-bin/tensorflow/tools/graph_transforms/transform_graph --in_graph=/workspace/quantization/frozen_inference_graph.pb --out_graph=/workspace/quantization/optimized_graph.pb --inputs="input_1" --outputs="bboxes,scores,classes" --transforms="fold_batch_norms"
But when I tried to run the quantization, using this command:
python tensorflow/tools/quantization/quantize_graph.py --input=/workspace/quantization/optimized_graph.pb --output=/workspace/quantization/quantized_dynamic_range_graph.pb --output_node_names="bboxes,scores,classes" --mode=eightbit --intel_cpu_eightbitize=True
I got this error:
W0422 17:10:22.236689 140385778120448 deprecation.py:323] From tensorflow/tools/quantization/quantize_graph.py:540: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.remove_training_nodes
2019-04-22 17:10:22.323616: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: AVX512F
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-04-22 17:10:22.345101: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2095090000 Hz
2019-04-22 17:10:22.360829: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x1d7bfdd0 executing computations on platform Host. Devices:
2019-04-22 17:10:22.360862: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): <undefined>, <undefined>
2019-04-22 17:10:22.367186: I tensorflow/core/common_runtime/process_util.cc:92] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
W0422 17:10:22.368036 140385778120448 deprecation.py:323] From tensorflow/tools/quantization/quantize_graph.py:406: quantize_v2 (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2017-10-25.
Instructions for updating:
`tf.quantize_v2` is deprecated, please use `tf.quantization.quantize` instead.
Traceback (most recent call last):
File "tensorflow/tools/quantization/quantize_graph.py", line 1951, in
app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "tensorflow/tools/quantization/quantize_graph.py", line 1937, in main
output_graph = rewriter.rewrite(FLAGS.output_node_names.split(","))
File "tensorflow/tools/quantization/quantize_graph.py", line 583, in rewrite
self.output_graph)
File "tensorflow/tools/quantization/quantize_graph.py", line 1733, in remove_redundant_quantization
old_nodes_map = self.create_nodes_map(old_graph)
File "tensorflow/tools/quantization/quantize_graph.py", line 506, in create_nodes_map
raise ValueError("Duplicate node names detected.")
ValueError: Duplicate node names detected.