
[DLIGHT][ADRENO] Fix for opencl adreno matmul schedule#17258

Closed
krishnaraj36 wants to merge 55 commits into apache:main from krishnaraj36:dequant_matmul_fix

Conversation

@krishnaraj36
Contributor

Fixed the schedule to support the epilogue block.
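For context, a conceptual sketch of what "supporting the epilogue block" means for a matmul schedule (illustrative plain C, not the actual dlight TIR schedule; the dequant-style scale/bias epilogue here is a hypothetical example): the epilogue computation must be handled at the accumulator write-back rather than treated as a separate, unscheduled pass.

```c
/* Matmul with a fused epilogue block: the per-element epilogue (here a
 * hypothetical dequant-style scale + bias) runs at the write-back of
 * each accumulator. This is only a conceptual model of the pattern the
 * fixed schedule must accommodate, not the TVM implementation. */
static void matmul_with_epilogue(const float *A, const float *B, float *C,
                                 int M, int N, int K,
                                 float scale, float bias) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            /* epilogue fused at write-back instead of a second pass */
            C[i * N + j] = acc * scale + bias;
        }
}
```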

srkreddy1238 and others added 30 commits August 8, 2024 11:30
Integrate implicit call of the BYOC preprocessing module into the collage tuning module
and enable the benchmark script for Adreno targets.

Benchmark results:

| Network | OpenCL texture | OpenCLML | Collage |
| --- | --- | --- | --- |
| resnet-18-float32 | 0.010584622 | 0.00720695 | 0.007289728 |
| resnet-18-float16 | 0.007052029 | 0.0045642 | 0.004857585 |
| resnet-34-float32 | 0.016259185 | 0.01242092 | 0.013071063 |
| resnet-34-float16 | 0.011350326 | 0.0073473 | 0.00796802 |
| resnet-50-float32 | 0.019188419 | 0.02085548 | 0.018910226 |
| resnet-50-float16 | 0.01338978 | 0.01199576 | 0.011089206 |
| densenet-121-float32 | 0.025430062 | 0.01798478 | 0.013212844 |
| densenet-121-float16 | 0.012384599 | 0.01101491 | 0.008722716 |
| inception_v3-float32 | 0.040408253 | 0.02229727 | 0.022636675 |
| inception_v3-float16 | 0.029910533 | 0.01368941 | 0.014519823 |
| mobilenet-float32 | 0.004093148 | 0.00367917 | 0.003189258 |
| mobilenet-float16 | 0.00280268 | 0.00244494 | 0.002101514 |

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
get_output_index support added.

Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Basic support for building Adreno on Windows
Co-authored-by: Siva <quic_sivb@quicinc.com>
srkreddy1238 and others added 25 commits August 8, 2024 11:35
Partition pass should choose offloaded ops based on target support.
This config enables choosing the target version from the Python API as well as
tvmc.
We can now build one binary and use it across targets

Co-authored-by: Siva <quic_sivb@quicinc.com>
rpc, device ports and device temp folders made unique across session

Co-authored-by: Siva <quic_sivb@quicinc.com>
Fixed the OpenCL codegen for a few operators:
1. Atomic add for float - OpenCL does not support atomic add for floats;
enabled a workaround for this operation with atomic_cmpxchg()
2. fmodf - OpenCL only supports fmod for all floating-point types
3. nearbyint - OpenCL does not have this function, hence it is replaced
with the round function.

---------

Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: B, Siva Rama Krishna Reddy <sivb@qti.qualcomm.com>
Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Test case for clip layer added
Co-authored-by: Sanjay Shankar Krishnaa <sanjs@qti.qualcomm.com>
Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
1. Enhanced the GPU matmul schedule for the OpenCL Android and Windows
backends.
2. It brings a ~2X performance gain to the Llama-2-7B prefill process:

| Model | Device | Earlier prefill perf | Optimized prefill perf |
| --- | --- | --- | --- |
| Llama-2-7B-chat-hf | Snapdragon® 8 Gen 3 | 27 tok/sec | 50 tok/sec |

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
…che#58)

This commit adds limited support for texture-based group convolution
where the `in_channels`, after accounting for the `texture dim`, is
divisible by the `group size`. Otherwise, or if there is no extra
texture dim, it uses the default compute.
This is also the fix for matmul cases with an epilogue block.
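The dispatch condition described above can be sketched as a small predicate. A minimal illustrative sketch; the name and exact check are assumptions, not the actual TVM Adreno strategy code:

```c
#include <stdbool.h>

/* Texture-based group conv is only usable when the channel count that
 * remains after texture packing divides evenly by the group size;
 * otherwise fall back to the default (buffer-based) compute.
 * Illustrative only, not the actual TVM check. */
static bool use_texture_group_conv(int in_channels, int texture_dim, int groups) {
    if (texture_dim <= 0 || in_channels % texture_dim != 0)
        return false;                    /* no clean texture packing */
    int packed_channels = in_channels / texture_dim;
    return packed_channels % groups == 0;
}
```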

---------

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Co-authored-by: Krishna Raju Vegiraju <kvegiraj@blr-ubuntu-tvm03.qualcomm.com>
@krishnaraj36 deleted the dequant_matmul_fix branch August 9, 2024 04:53