
[Hexagon] Enable Hexagon User DMA bypass mode #13147

Closed
adstraw wants to merge 1 commit into apache:main from adstraw:straw-hex-dma-bypass


adstraw (Contributor) commented on Oct 19, 2022

Enable Hexagon User DMA bypass mode.

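Note: all numbers were measured on a Snapdragon 888 device.
TL;DR: peak DMA bandwidth of up to 40.5848 GBps (Single DMA Copy, 4.0 MB buffer) and peak compute with a single DMA load of 38.0909 Gops (Single DMA, 3.93 MB of data).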

PASSING FUNCTIONAL TESTS:

- `tests/python/contrib/test_hexagon/test_run_unit_tests.py`
- `tests/python/contrib/test_hexagon/test_software_pipeline_async.py`
- `tests/python/contrib/test_hexagon/test_async_dma_pipeline.py`

PASSING PERFORMANCE TESTS:

`tests/python/contrib/test_hexagon/test_vtcm_bandwidth.py`, `TestMatMulVec::test_bandwidth` (all cases PASSED; bandwidth in GBps):

| Test case | Buffer size | Base | Vectorized | Vectorized and Parallelized | Single DMA Copy |
|---|---|---|---|---|---|
| `[10240-4-2-128]` | 0.01 MB | 3.4133 | 19.7858 | 5.5934 | 6.3962 |
| `[20480-4-2-128]` | 0.02 MB | 3.6213 | 32.8853 | 11.2528 | 10.62 |
| `[40960-4-2-128]` | 0.04 MB | 3.7874 | 46.8636 | 20.9945 | 17.2299 |
| `[81920-4-2-128]` | 0.08 MB | 3.9149 | 60.4548 | 38.8462 | 24.1973 |
| `[163840-4-2-128]` | 0.16 MB | 3.9843 | 69.3581 | 64.005 | 29.7732 |
| `[327680-4-2-128]` | 0.31 MB | 4.0159 | 76.1037 | 101.4885 | 34.4364 |
| `[655360-4-2-128]` | 0.62 MB | 4.0322 | 80.2672 | 138.9689 | 37.7413 |
| `[1048576-4-2-128]` | 1.0 MB | 3.0942 | 63.3063 | 112.2099 | 38.9783 |
| `[2097152-4-2-128]` | 2.0 MB | 1.1455 | 8.7992 | 32.7387 | 40.1052 |
| `[3145728-4-2-128]` | 3.0 MB | 0.8867 | 5.932 | 22.9154 | 40.5201 |
| `[4194304-4-2-128]` | 4.0 MB | 0.8243 | 5.7229 | 22.945 | 40.5848 |
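The bandwidth results above can be summarized programmatically. The Python sketch below copies the `test_vtcm_bandwidth.py` numbers into a dict and reports the fastest copy method per buffer size. It assumes the first test parameter is the buffer size in bytes; the printed sizes are consistent with `bytes / 2**20` rounded to two decimals (e.g. 327680 bytes gives 0.31 MB), so the `size_mib` helper is a reconstruction from the data, not code taken from the test itself.

```python
# Bandwidth results (GBps) copied from the test_vtcm_bandwidth.py output.
# Assumption: the key is the first test parameter, i.e. the buffer size in bytes.
METHODS = ("Base", "Vectorized", "Vectorized and Parallelized", "Single DMA Copy")

RESULTS = {
    10240: (3.4133, 19.7858, 5.5934, 6.3962),
    20480: (3.6213, 32.8853, 11.2528, 10.62),
    40960: (3.7874, 46.8636, 20.9945, 17.2299),
    81920: (3.9149, 60.4548, 38.8462, 24.1973),
    163840: (3.9843, 69.3581, 64.005, 29.7732),
    327680: (4.0159, 76.1037, 101.4885, 34.4364),
    655360: (4.0322, 80.2672, 138.9689, 37.7413),
    1048576: (3.0942, 63.3063, 112.2099, 38.9783),
    2097152: (1.1455, 8.7992, 32.7387, 40.1052),
    3145728: (0.8867, 5.932, 22.9154, 40.5201),
    4194304: (0.8243, 5.7229, 22.945, 40.5848),
}


def size_mib(num_bytes: int) -> float:
    """Buffer size in MiB, rounded to two decimals like the printed sizes."""
    return round(num_bytes / 2**20, 2)


def fastest_method(num_bytes: int) -> str:
    """Return the copy method with the highest measured bandwidth."""
    best_name, _ = max(zip(METHODS, RESULTS[num_bytes]), key=lambda pair: pair[1])
    return best_name


if __name__ == "__main__":
    for num_bytes in RESULTS:
        print(f"{size_mib(num_bytes)} MB: {fastest_method(num_bytes)}")
    peak_dma = max(row[3] for row in RESULTS.values())
    print(f"peak Single DMA Copy bandwidth: {peak_dma} GBps")
```

With these numbers, the HVX-based copies are fastest up to the 1.0 MB buffer, and Single DMA Copy is fastest from 2.0 MB upward, which is also where its bandwidth levels off at roughly 40 GBps.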

`tests/python/contrib/test_hexagon/test_parallel_hvx_load_vtcm.py`, `TestMatMulVec::test_loading_vtcm_for_vrmpy` (all cases PASSED; throughput in Gops):

| Method | `[1024-4-8-64-16-8]` 0.39 MB to load | `[2048-4-8-64-16-8]` 0.79 MB to load | `[4096-4-8-64-16-8]` 1.57 MB to load | `[10240-4-8-64-16-8]` 3.93 MB to load |
|---|---|---|---|---|
| No VTCM | 113.2208 | 149.3698 | 45.17 | 17.88 |
| Basic VTCM | 0.5998 | 0.6187 | 0.5283 | 0.4725 |
| Vectorized | 1.35 | 1.4778 | 1.2955 | 1.2207 |
| Vectorized and Parallelized | 1.7299 | 2.1075 | 2.1974 | 2.313 |
| Preallocated and Vectorized | 47.6337 | 52.7311 | 12.3972 | 7.3002 |
| Preallocated, Vectorized, and Parallelized | 46.6338 | 64.594 | 32.5443 | 18.9695 |
| Single DMA | 25.9121 | 31.4309 | 34.9533 | 38.0909 |
| Preloaded | 148.4394 | 221.7175 | 295.8179 | 365.0014 |
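A similar sketch for the vrmpy results finds the fastest method for each data size, optionally excluding some configurations. One assumption here is not stated in the test output: "Preloaded" is read as compute on data already resident in VTCM, and "No VTCM" as compute directly from DDR. Excluding both therefore leaves only the methods that actually load data into VTCM.

```python
# Throughput results (Gops) copied from the test_parallel_hvx_load_vtcm.py output.
# Each tuple holds the 0.39, 0.79, 1.57 and 3.93 MB cases, in that order.
SIZES_MB = (0.39, 0.79, 1.57, 3.93)

RESULTS = {
    "No VTCM": (113.2208, 149.3698, 45.17, 17.88),
    "Basic VTCM": (0.5998, 0.6187, 0.5283, 0.4725),
    "Vectorized": (1.35, 1.4778, 1.2955, 1.2207),
    "Vectorized and Parallelized": (1.7299, 2.1075, 2.1974, 2.313),
    "Preallocated and Vectorized": (47.6337, 52.7311, 12.3972, 7.3002),
    "Preallocated, Vectorized, and Parallelized": (46.6338, 64.594, 32.5443, 18.9695),
    "Single DMA": (25.9121, 31.4309, 34.9533, 38.0909),
    "Preloaded": (148.4394, 221.7175, 295.8179, 365.0014),
}


def fastest(col: int, exclude: tuple = ()) -> str:
    """Return the method with the highest Gops for data size SIZES_MB[col]."""
    candidates = {
        name: values[col] for name, values in RESULTS.items() if name not in exclude
    }
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Assumption: "No VTCM" and "Preloaded" do not include a load into VTCM.
    no_load = ("No VTCM", "Preloaded")
    for col, size in enumerate(SIZES_MB):
        print(f"{size} MB: {fastest(col, exclude=no_load)}")
```

Under that reading, Single DMA becomes the fastest way to load VTCM from the 1.57 MB case on, and at 3.93 MB it also outperforms computing without VTCM (38.0909 vs. 17.88 Gops).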


tvm-bot (Collaborator) commented on Oct 19, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

- adstraw force-pushed the straw-hex-dma-bypass branch from c47b451 to 7bc715f on October 21, 2022 00:01
- adstraw changed the title from "[Hexagon] Enable DMA bypass with cache invalidate" to "[Hexagon] Enable Hexagon User DMA bypass mode" on Oct 28, 2022
- adstraw force-pushed the straw-hex-dma-bypass branch from 9912a49 to a464a0f on November 4, 2022 20:57
- adstraw force-pushed the straw-hex-dma-bypass branch 7 times, most recently from d1b387f to 64e9071 on November 14, 2022 20:25
- adstraw force-pushed the straw-hex-dma-bypass branch from 64e9071 to d08330a on November 14, 2022 20:28
- adstraw closed this on Nov 14, 2022