[Hexagon] Enable Hexagon User DMA bypass mode#13147
Closed
adstraw wants to merge 1 commit into apache:main
nverke reviewed on Nov 7, 2022
Enable Hexagon User DMA bypass mode.
Note: All numbers are from 888.
TL;DR: Shows peak DMA bandwidth of up to 40.5848 GBps and peak compute of 38.0909 Gops.
PASSING FUNCTIONAL TESTS:
tests/python/contrib/test_hexagon/test_run_unit_tests.py
tests/python/contrib/test_hexagon/test_software_pipeline_async.py
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py
PASSING PERFORMANCE TESTS:
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[10240-4-2-128] Test bandwidth with buffer size 0.01MB...
-Base: 3.4133 GBps
-Vectorized: 19.7858 GBps
-Vectorized and Parallelized: 5.5934 GBps
-Single DMA Copy: 6.3962 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[20480-4-2-128] Test bandwidth with buffer size 0.02MB...
-Base: 3.6213 GBps
-Vectorized: 32.8853 GBps
-Vectorized and Parallelized: 11.2528 GBps
-Single DMA Copy: 10.62 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[40960-4-2-128] Test bandwidth with buffer size 0.04MB...
-Base: 3.7874 GBps
-Vectorized: 46.8636 GBps
-Vectorized and Parallelized: 20.9945 GBps
-Single DMA Copy: 17.2299 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[81920-4-2-128] Test bandwidth with buffer size 0.08MB...
-Base: 3.9149 GBps
-Vectorized: 60.4548 GBps
-Vectorized and Parallelized: 38.8462 GBps
-Single DMA Copy: 24.1973 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[163840-4-2-128] Test bandwidth with buffer size 0.16MB...
-Base: 3.9843 GBps
-Vectorized: 69.3581 GBps
-Vectorized and Parallelized: 64.005 GBps
-Single DMA Copy: 29.7732 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[327680-4-2-128] Test bandwidth with buffer size 0.31MB...
-Base: 4.0159 GBps
-Vectorized: 76.1037 GBps
-Vectorized and Parallelized: 101.4885 GBps
-Single DMA Copy: 34.4364 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[655360-4-2-128] Test bandwidth with buffer size 0.62MB...
-Base: 4.0322 GBps
-Vectorized: 80.2672 GBps
-Vectorized and Parallelized: 138.9689 GBps
-Single DMA Copy: 37.7413 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[1048576-4-2-128] Test bandwidth with buffer size 1.0MB...
-Base: 3.0942 GBps
-Vectorized: 63.3063 GBps
-Vectorized and Parallelized: 112.2099 GBps
-Single DMA Copy: 38.9783 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[2097152-4-2-128] Test bandwidth with buffer size 2.0MB...
-Base: 1.1455 GBps
-Vectorized: 8.7992 GBps
-Vectorized and Parallelized: 32.7387 GBps
-Single DMA Copy: 40.1052 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[3145728-4-2-128] Test bandwidth with buffer size 3.0MB...
-Base: 0.8867 GBps
-Vectorized: 5.932 GBps
-Vectorized and Parallelized: 22.9154 GBps
-Single DMA Copy: 40.5201 GBps
PASSED
tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py::TestMatMulVec::test_bandwidth[4194304-4-2-128] Test bandwidth with buffer size 4.0MB...
-Base: 0.8243 GBps
-Vectorized: 5.7229 GBps
-Vectorized and Parallelized: 22.945 GBps
-Single DMA Copy: 40.5848 GBps
PASSED
tests/python/contrib/test_hexagon/test_parallel_hvx_load_vtcm.py
tests/python/contrib/test_hexagon/test_parallel_hvx_load_vtcm.py::TestMatMulVec::test_loading_vtcm_for_vrmpy[1024-4-8-64-16-8] Test with 0.39 MB of data to load...
-No VTCM: 113.2208 Gops
-Basic VTCM: 0.5998 Gops
-Vectorized: 1.35 Gops
-Vectorized and Parallelized: 1.7299 Gops
-Preallocated and Vectorized: 47.6337 Gops
-Preallocated, Vectorized, and Parallelized: 46.6338 Gops
-Single DMA: 25.9121 Gops
-Preloaded: 148.4394 Gops
PASSED
tests/python/contrib/test_hexagon/test_parallel_hvx_load_vtcm.py::TestMatMulVec::test_loading_vtcm_for_vrmpy[2048-4-8-64-16-8] Test with 0.79 MB of data to load...
-No VTCM: 149.3698 Gops
-Basic VTCM: 0.6187 Gops
-Vectorized: 1.4778 Gops
-Vectorized and Parallelized: 2.1075 Gops
-Preallocated and Vectorized: 52.7311 Gops
-Preallocated, Vectorized, and Parallelized: 64.594 Gops
-Single DMA: 31.4309 Gops
-Preloaded: 221.7175 Gops
PASSED
tests/python/contrib/test_hexagon/test_parallel_hvx_load_vtcm.py::TestMatMulVec::test_loading_vtcm_for_vrmpy[4096-4-8-64-16-8] Test with 1.57 MB of data to load...
-No VTCM: 45.17 Gops
-Basic VTCM: 0.5283 Gops
-Vectorized: 1.2955 Gops
-Vectorized and Parallelized: 2.1974 Gops
-Preallocated and Vectorized: 12.3972 Gops
-Preallocated, Vectorized, and Parallelized: 32.5443 Gops
-Single DMA: 34.9533 Gops
-Preloaded: 295.8179 Gops
PASSED
tests/python/contrib/test_hexagon/test_parallel_hvx_load_vtcm.py::TestMatMulVec::test_loading_vtcm_for_vrmpy[10240-4-8-64-16-8] Test with 3.93 MB of data to load...
-No VTCM: 17.88 Gops
-Basic VTCM: 0.4725 Gops
-Vectorized: 1.2207 Gops
-Vectorized and Parallelized: 2.313 Gops
-Preallocated and Vectorized: 7.3002 Gops
-Preallocated, Vectorized, and Parallelized: 18.9695 Gops
-Single DMA: 38.0909 Gops
-Preloaded: 365.0014 Gops
PASSED
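The peak figures quoted in the summary can be cross-checked against the logs above with a small script. The following is a minimal sketch, not part of this PR: the helper name `peak_by_label` and the regular expression are hypothetical, and it assumes result lines keep the `-<label>: <value> <unit>` shape shown in the output. The sample text is a short excerpt of the logged lines.

```python
import re
from collections import defaultdict

# Hypothetical pattern for result lines such as "-Single DMA Copy: 40.5848 GBps".
RESULT_LINE = re.compile(
    r"^-(?P<label>[^:]+):\s*(?P<value>[\d.]+)\s*(?P<unit>GBps|Gops)\s*$"
)


def peak_by_label(log_text):
    """Return {(label, unit): max value} over all result lines in log_text."""
    peaks = defaultdict(float)
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue  # skip test names, PASSED markers, and blank lines
        key = (match["label"], match["unit"])
        peaks[key] = max(peaks[key], float(match["value"]))
    return dict(peaks)


# Excerpt of the logged results (largest buffer sizes only).
sample = """\
-Single DMA Copy: 40.5201 GBps
PASSED
-Single DMA Copy: 40.5848 GBps
PASSED
-Single DMA: 34.9533 Gops
PASSED
-Single DMA: 38.0909 Gops
PASSED
"""

peaks = peak_by_label(sample)
print(peaks[("Single DMA Copy", "GBps")])
print(peaks[("Single DMA", "Gops")])
```

Keying on both label and unit keeps the bandwidth results (`Single DMA Copy`, GBps) separate from the compute results (`Single DMA`, Gops), so the two test files can be fed through the same function.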