Sync main branch #1

Merged
EmmaNingMS merged 62 commits into EmmaNingMS:master from microsoft:master
Sep 7, 2019

Conversation

@EmmaNingMS
Owner

Description: Describe your changes.

Motivation and Context

  • Why is this change required? What problem does it solve?
  • If it fixes an open issue, please link to the issue here.

jywu-msft and others added 30 commits August 16, 2019 17:44
Add AutoML to 3 main builds.
Fix unit tests. Enable copy elision, do not move movable object on return by value.
Implement the first round of changes for quantization inside MLAS. This adds a MatMul operation for U8xU8=S32 for x86/x64 processors.
* Update MKLML, which has a bugfix for a thread hang. Move PATCH_COMMAND outside the BUILD_FOR_NATIVE_MACHINE check.

* MKLML_VERSION 2020.0.20190813 is for windows only.
* Mention OrtCreateSessionFromArray in C API doc

* Update perf tool documentation to reflect the new graph optimization enums. Relax constraint for enable_all.

* Update one more doc

* Update onnx test runner documentation

* Add default in the docs
Also cleanup a couple of unused variables.
The PyTorch exporter in PyTorch 1.2 can natively support multiple opsets now
Fix issue where cudnnRNNForwardInferenceEx doesn't support zero-length sequences in the batch

Solution:
Reset zero-length sequences to 1 in the batch before calling cudnnRNNForwardInferenceEx, and keep an array tracking the batch ids that had zero-length sequences. Once the result is back, call a CUDA kernel that masks the output for the batch ids tracked in the array.
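The masking step can be sketched in NumPy as a host-side illustration of what the CUDA kernel does on device (function and variable names here are hypothetical, not ORT's):

```python
import numpy as np

def mask_zero_length_batches(output, seq_lens):
    """Zero out RNN outputs for batch entries whose original length was 0.

    output:   (seq_len, batch, hidden) result of the RNN call, after
              zero-length sequences were bumped to length 1 pre-call.
    seq_lens: original per-batch sequence lengths, recorded before the call.
    """
    zero_ids = np.nonzero(np.asarray(seq_lens) == 0)[0]  # tracked batch ids
    output[:, zero_ids, :] = 0.0  # in ORT this masking happens in a CUDA kernel
    return output

out = np.ones((3, 4, 2), dtype=np.float32)
masked = mask_zero_length_batches(out, [2, 0, 3, 0])
```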
* Added check for unnecessary function initializations, and removed lock from unneeded areas of code.

* Added LRU cache to EP.

* Bugfixes for nGraph EP Optimization PR

* Changed default cache size to 500 and refactored mutex readability.

* Fixed unsafe environmental variable fetch for Windows.

* Cleaned up Windows environment functions and cleaned up mutexes.
…g> as input. Fix a bug in the InferenceSession Run() with RunOptions (#1671)

- Support bool-Tensor and int8-Tensor in input-output of C# api
- Support string-tensor as input in C# api
- Fix a bug in InferenceSession.Run() -- RunOptions was not passed into the native call
* update clip for opset 11

* exclude ngraph provider for clip unit tests

* exclude ngraph for all clip opset 11 tests

* fix op version
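For context on the opset 11 change: Clip's min/max moved from node attributes to optional tensor inputs. A minimal NumPy reference of the opset 11 semantics (my sketch, not the ORT kernel):

```python
import numpy as np

def clip_opset11(x, min_tensor=None, max_tensor=None):
    """Clip per opset 11: min/max arrive as optional inputs, not attributes.

    A missing input means unbounded on that side.
    """
    lo = -np.inf if min_tensor is None else min_tensor
    hi = np.inf if max_tensor is None else max_tensor
    return np.clip(x, lo, hi)

y = clip_opset11(np.array([-2.0, 0.5, 3.0]), np.float32(-1.0), np.float32(1.0))
```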
* Add support for int64 in ReduceSum

* add unit test for int64
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
…rdingly. Expose Session/Run log severity levels. (#1615)

* Mention OrtCreateSessionFromArray in C API doc

* Don't create the default allocator every single time. Rename API accordingly.

* Don't create the default allocator every single time. Rename API accordingly.

* updates...

* updates...

* PR comments

* fix typo in license header

* fix build
Description: make the default CPU allocator use the MLAS-preferred alignment

Motivation and Context

This is needed for the C API to have an aligned default CPU allocator, the same as the one in the CPU provider.
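The idea of an over-allocate-and-offset aligned allocator can be sketched in NumPy (assuming a 64-byte preferred alignment for illustration; the real value comes from MLAS, and none of these names are ORT's):

```python
import numpy as np

ALIGNMENT = 64  # assumed MLAS-preferred alignment, for illustration only

def aligned_empty(nbytes, align=ALIGNMENT):
    """Return a byte buffer whose data pointer is aligned to `align` bytes.

    Over-allocates by `align` bytes, then slices forward to the first
    aligned address -- the classic aligned-allocator trick.
    """
    buf = np.empty(nbytes + align, dtype=np.uint8)
    offset = (-buf.ctypes.data) % align
    return buf[offset:offset + nbytes]

a = aligned_empty(1024)
```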
- Fix the Windows end-to-end test in NuGet CI
- Skip the TestModelSerialization, because it is failing on Linux. Must be fixed before the API is released for use. The owner has been notified.
* use mlas qgemm for u8u8_s32 gemms

* update test
faxu and others added 29 commits August 27, 2019 21:31
* Updates

* Remove preview texts

* Update README.md

* Updates

* Update README.md

* Update README.md

* Minor wording update

* Update README.md

* Update doc on CUDA version

* revert update

* Update readme for issue #1558

* Clean up example section

* Cosmetic updates

- Add an index of build instructions for browsability
- Update build CUDA version from 9.1 to 10

* Fix broken link

* Update README to reflect upgrade to pip requirement

* Update CuDNN version for Linux Python packages

* Clean up content

Updated ordering and add table of contents

* Minor format fixes

* Move Android NNAPI under EP section

* Add link to operator support documentation

* Fix typo

* typo fix

* remove todo section
Avoid the need for @PCGOTREL relocations by annotating MLAS global data shared with assembly modules with `__attribute__((visibility("hidden")))`.
Fix the aarch64 kernel to build properly with the Android NDK (specifically clang).
…ame allocator device (#1715)

as long as these providers use the same allocator device

Description: Currently ORT throws an error when one input is used in different EPs. The change removes that restriction.

Motivation and Context

It is now possible to share inputs across EPs now that allocations are device-based instead of EP-based.
…tom op (#1391)

Description: The change adds the necessary quantization support on CPU for mixed int8/uint8, as well as int16, for matrix multiply operations that output int32.

Motivation and Context

Integer operations are critical for a quantized model's performance.
The current MatMulInteger implementation on CPU only supports uint8 x uint8, while the spec supports int8 x uint8. Having a default CPU implementation that fully supports the spec helps accuracy verification.
Besides, some models may need to quantize to int16, but the MatMulInteger op does not support that yet. A custom op, MatMulInteger16, is added to support such models.
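The MatMulInteger semantics described above (mixed int8/uint8 inputs, zero-point adjustment, int32 accumulation) can be sketched as a NumPy reference; this illustrates the op's math, not the MLAS kernel:

```python
import numpy as np

def matmul_integer(a, b, a_zero_point=0, b_zero_point=0):
    """Reference MatMulInteger: (a - a_zp) @ (b - b_zp) in int32.

    a and b may each be int8 or uint8; widening to int32 *before*
    subtracting the zero points avoids overflow in the adjustment.
    """
    a32 = a.astype(np.int32) - np.int32(a_zero_point)
    b32 = b.astype(np.int32) - np.int32(b_zero_point)
    return a32 @ b32

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
b = np.array([[5, 6], [7, 8]], dtype=np.int8)
c = matmul_integer(a, b, a_zero_point=1)
```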
* Use exec form of ENTRYPOINT for docker server

# Issue
The entrypoint currently uses the shell form - this prevents users from passing in any cmdline arguments. Also, passing a model_path in means the server only works if the envvar is set; however, this is not what the error message says!
```
$ docker run -v /home/rakelkar/try/onnxzoo/style:/mnt/models -it   mcr.microsoft.com/onnxruntime/server --model_path /mnt/models/model.onnx
Version: local_build
Commit ID: default

model_path must be the location of a valid file
Allowed options:
  -h [ --help ]               Shows a help message and exits
  --log_level arg (=info)     Logging level. Allowed options (case sensitive): 
                              verbose, info, warning, error, fatal
  --model_path arg            Path to ONNX model
  --address arg (=0.0.0.0)    The base HTTP address
  --http_port arg (=8001)     HTTP port to listen to requests
  --num_http_threads arg (=4) Number of http threads
  --grpc_port arg (=50051)    GRPC port to listen to requests
```
# Fix
1. remove the env var
2. use the exec form
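The shell-form vs exec-form difference can be sketched as follows (the binary path here is illustrative, not the actual server Dockerfile):

```dockerfile
# Shell form runs under `/bin/sh -c`, which swallows any arguments
# appended to `docker run <image> ...`:
#   ENTRYPOINT ./onnxruntime_server --model_path $MODEL_PATH

# Exec form passes `docker run` arguments straight to the binary,
# so `--model_path /mnt/models/model.onnx` on the command line works:
ENTRYPOINT ["./onnxruntime_server"]
```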

* Update readme to use model_path arg
…1679)

* Support bilinear mode with actual 2D inputs in Resize and upsample

* Fix build break

* Fix build break

* Add test

* CUDA changes

* Resolve PR comments

* Resolve comments
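Bilinear resize on an actual 2-D input can be sketched in NumPy (asymmetric coordinate mapping assumed; this is an illustration of the interpolation, not the ORT kernel):

```python
import numpy as np

def resize_bilinear_2d(x, out_h, out_w):
    """Bilinear resize of a 2-D array using asymmetric coordinate mapping."""
    in_h, in_w = x.shape
    rows = np.arange(out_h) * (in_h / out_h)   # fractional source rows
    cols = np.arange(out_w) * (in_w / out_w)   # fractional source cols
    r0 = np.clip(np.floor(rows).astype(int), 0, in_h - 1)
    r1 = np.clip(r0 + 1, 0, in_h - 1)
    c0 = np.clip(np.floor(cols).astype(int), 0, in_w - 1)
    c1 = np.clip(c0 + 1, 0, in_w - 1)
    fr = (rows - r0)[:, None]                  # row interpolation weights
    fc = (cols - c0)[None, :]                  # col interpolation weights
    top = x[r0][:, c0] * (1 - fc) + x[r0][:, c1] * fc
    bot = x[r1][:, c0] * (1 - fc) + x[r1][:, c1] * fc
    return top * (1 - fr) + bot * fr

x = np.arange(16, dtype=np.float64).reshape(4, 4)
same = resize_bilinear_2d(x, 4, 4)  # scale 1.0 should reproduce the input
```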
…in 0.5 release. (#1694)

* Mention OrtCreateSessionFromArray in C API doc

* Fix registration of Equal op causing one of the automl models to break in 0.5 release.

* updates...
…which cause huge data copy. If the node's inputs are all initializer, we shouldn't fallback the node to CPU. (#1727)

Fix an issue where the CUDA EP falls back too many nodes to CPU in some cases, causing huge data copies.
#1675

Currently, if a node's inputs are all initializers, the CUDA EP will fall back that node to CPU, and it will also fall back some nodes under it. This can cause huge data copies. In the case reported by a user, the model has several Slice ops with inputs from initializers, and a Concat op that concatenates the Slice outputs. The data is huge (16MB) after the concat, which makes the data copy from CPU to GPU quite costly because it's a sync copy.

Fix
If a node's inputs are all initializers, we shouldn't fall back the node to CPU.
Update the docker file for OpenVINO which is used for AML
Fix typo in NMS code
* moved subgraph_index to MklDnn Execution Provider

* code cleanup
* Implement Nuphar execution provider

The Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.

* Fix submodules

* Fix TVM submodule

* Update Nuphar to latest and resolve conflicts

* Remove stale files caused by merge -X theirs

* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test

* Fix bad merge

* Merge from Nuphar

* Fix warning treated as error, revert some unnecessary changes

* Revert some more test changes

* Some more test revert or comments to make review easier
New tests could be added later

* One more revert of unnecessary changes

* More change revert. Test could be added back later.
* Mention OrtCreateSessionFromArray in C API doc

* Enforce shape validation.

* Update broken models
Signed-off-by: suryasidd <surya.siddharth.pemmaraju@intel.com>
* Update README.md

* Update onnx-inference-byoc-gpu-cpu-aks.ipynb

* Update README.md
…ions (unless where not possible). (#1730)

* Mention OrtCreateSessionFromArray in C API doc

* Fix perf test executable due to removal of certain C APIs

* fix linux build

* Avoid duplication

* Update coding guidelines to prefer using make_unique for heap allocations (unless not possible).
When NUPHAR_USE_MKL or NUPHAR_USE_AVX2 is not defined, we got "unreachable code" warnings on Windows, which were turned into errors and broke the build.
Enable Nuphar EP docker build
Revert back to LLVM 6.0.1
Reinstate disabled Softmax tests caused by LLVM 8.0.1
Reinstate Nuphar Python test due to stale sympy version
Increase build timeout of Linux CI
)

* Mention OrtCreateSessionFromArray in C API doc

* Rename OrtAllocatorInfo to OrtMemoryInfo to avoid confusion
@EmmaNingMS merged commit 7201d9d into EmmaNingMS:master Sep 7, 2019
