Hello!
The aim: run inference (no training needed) of a custom Mask-RCNN on a CPU VM as an AksWebservice, as fast as possible. (CPU is chosen mainly because it's cheaper.)
The default TF build from pip emits warnings like this during inference:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
So I tried to install the provided CPU-optimized build using:
CondaDependencies.add_tensorflow_conda_package(core_type='cpu', version="1.15")
It successfully installs TF 1.15 with all the needed instructions but only for MKL-DNN operations:
This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
And it made inference about 2x slower. Why could that be? I've seen quite a lot of similar issues ([1], [2], [3]) about performance degradation when using MKL. Maybe it is somehow thread-related.
I also tried pip intel-tensorflow==1.15.2. Same performance.
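To test the thread hypothesis, the settings I would try first look like the sketch below. It is a minimal sketch, not a verified fix: the helper names and the `physical_cores` argument are hypothetical, and the values are commonly suggested starting points for MKL-DNN builds, not tuned numbers. The environment variables should be set before TensorFlow is imported.

```python
import os


def mkl_thread_env(physical_cores):
    """Return OpenMP/MKL variables often tuned for MKL-DNN builds of TF.

    `physical_cores` is an assumption supplied by the caller; the values
    below are starting points to experiment with, not tuned results.
    """
    return {
        "OMP_NUM_THREADS": str(physical_cores),
        "KMP_BLOCKTIME": "0",
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        "KMP_SETTINGS": "1",  # ask the OpenMP runtime to print its settings
    }


def apply_mkl_thread_env(physical_cores):
    """Export the variables into the current process and return them."""
    env = mkl_thread_env(physical_cores)
    os.environ.update(env)
    return env


def make_session_config(physical_cores, inter_op_threads=2):
    """Build a TF 1.x session config with explicit thread pool sizes."""
    # Imported lazily so the environment variables are already in place.
    import tensorflow as tf  # assumes TF 1.15

    return tf.ConfigProto(
        intra_op_parallelism_threads=physical_cores,
        inter_op_parallelism_threads=inter_op_threads,
    )
```

In the scoring script this would be called as `apply_mkl_thread_env(4)` at the very top, followed by `tf.Session(config=make_session_config(4))` when the model is loaded, with `4` replaced by the core count of the node.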
Installing with CondaDependencies.add_tensorflow_pip_package(core_type='cpu', version="1.15")
leads to no optimizations at all (pip installs the usual TF binary).
So I decided to build my own TF1.15.3 with AVX2 AVX512F FMA but without MKL (correct me if I'm wrong and that won't change the performance):
$ bazel build -c opt --copt=-march=native --copt=-mfpmath=both //tensorflow/tools/pip_package:build_pip_package
It compiled and installed without errors, and the warnings about unsupported instructions or MKL-DNN-only operations no longer appear. But I see no performance boost.
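For comparing the builds, a small timing harness along these lines keeps the measurement consistent. This is a sketch under assumptions: `predict` stands for whatever callable runs one forward pass of the model and `sample` for one preprocessed input; both names are hypothetical.

```python
import statistics
import time


def benchmark(predict, sample, warmup=5, runs=30):
    """Time repeated calls to `predict(sample)` and summarize the latency."""
    # Warm-up calls are excluded so one-time graph and memory setup
    # does not skew the comparison between builds.
    for _ in range(warmup):
        predict(sample)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)

    return {
        "runs": runs,
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }
```

Running the same harness in each environment (stock pip build, MKL build, custom build) on the same VM size gives medians that can be compared directly.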
So why do the optimized TF builds perform the same (if not worse, in the MKL-DNN case)? Am I using the wrong type of VM for this kind of task? (Right now I'm using the Fs-v2 series.)
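Since `-march=native` targets the machine the build ran on, one thing worth ruling out is a mismatch between the build machine and the AKS nodes. A minimal check, assuming a Linux node where `/proc/cpuinfo` is readable (the function names are hypothetical):

```python
def missing_cpu_flags(cpuinfo_text, required=("avx2", "avx512f", "fma")):
    """Return the required flags that the first CPU entry does not list."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return [flag for flag in required if flag not in present]
    # No "flags" line found: treat every required flag as missing.
    return list(required)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

Calling `missing_cpu_flags(read_cpuinfo())` inside the deployed container should return an empty list if the node exposes all three instruction sets.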
And also a few side questions, if you please:
- How do I delete an Environment from a Workspace, so that it no longer appears in Environment.list(ws)?
- Why could it be that the AksWebservice creates only a few of the replicas rather than all of them? Say, 2/10. The others are "unavailable" on the Kubernetes Services page on portal.azure.com. Autoscaling is set to False, so that should not be the reason.