
[DLIGHT][ADRENO] Fix for opencl adreno matmul schedule#17258

Closed
krishnaraj36 wants to merge 55 commits into apache:main from krishnaraj36:dequant_matmul_fix

Conversation

@krishnaraj36
Contributor

Fixed the schedule to support the epilogue block.
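For context, a conceptual sketch of what "supporting the epilogue block" means for a matmul schedule (illustrative plain C, not the actual dlight TIR schedule; the dequant-style scale/bias epilogue here is a hypothetical example): the epilogue computation must be handled at the accumulator write-back rather than treated as a separate, unscheduled pass.

```c
/* Matmul with a fused epilogue block: the per-element epilogue (here a
 * hypothetical dequant-style scale + bias) runs at the write-back of
 * each accumulator. This is only a conceptual model of the pattern the
 * fixed schedule must accommodate, not the TVM implementation. */
static void matmul_with_epilogue(const float *A, const float *B, float *C,
                                 int M, int N, int K,
                                 float scale, float bias) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            /* epilogue fused at write-back instead of a second pass */
            C[i * N + j] = acc * scale + bias;
        }
}
```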

srkreddy1238 and others added 30 commits August 8, 2024 11:30
Integrate implicit call of the BYOC preprocessing module into the collage tuning module
and enable the benchmark script for Adreno targets.

Benchmark results:

| Network | OpenCL texture | OpenCLML | Collage |
| --- | --- | --- | --- |
| resnet-18-float32 | 0.010584622 | 0.00720695 | 0.007289728 |
| resnet-18-float16 | 0.007052029 | 0.0045642 | 0.004857585 |
| resnet-34-float32 | 0.016259185 | 0.01242092 | 0.013071063 |
| resnet-34-float16 | 0.011350326 | 0.0073473 | 0.00796802 |
| resnet-50-float32 | 0.019188419 | 0.02085548 | 0.018910226 |
| resnet-50-float16 | 0.01338978 | 0.01199576 | 0.011089206 |
| densenet-121-float32 | 0.025430062 | 0.01798478 | 0.013212844 |
| densenet-121-float16 | 0.012384599 | 0.01101491 | 0.008722716 |
| inception_v3-float32 | 0.040408253 | 0.02229727 | 0.022636675 |
| inception_v3-float16 | 0.029910533 | 0.01368941 | 0.014519823 |
| mobilenet-float32 | 0.004093148 | 0.00367917 | 0.003189258 |
| mobilenet-float16 | 0.00280268 | 0.00244494 | 0.002101514 |

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
get_output_index support added.

Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Basic support for building Adreno on Windows
Co-authored-by: Siva <quic_sivb@quicinc.com>
srkreddy1238 and others added 25 commits August 8, 2024 11:35
Partition pass should choose offloaded ops based on target support.
This config enables choosing the target version from the Python API as well as
tvmc.
We can now build one binary and use it across targets

Co-authored-by: Siva <quic_sivb@quicinc.com>
rpc, device ports and device temp folders made unique across session

Co-authored-by: Siva <quic_sivb@quicinc.com>
Fixed the OpenCL codegen for a few operators:
1. Atomic add for float - OpenCL does not support atomic add for floats;
enabled a workaround for this operation with atomic_cmpxchg()
2. fmodf - OpenCL only supports fmod for all floating-point types
3. nearbyint - OpenCL does not have this function, hence it is replaced
with the round function.

---------

Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: B, Siva Rama Krishna Reddy <sivb@qti.qualcomm.com>
Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Test case for clip layer added
Co-authored-by: Sanjay Shankar Krishnaa <sanjs@qti.qualcomm.com>
Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
1. Enhanced the GPU matmul schedule for the OpenCL Android and Windows
backends.
2. It brings a ~2X performance gain to the Llama-2-7B prefill process:

| Model | Device | Earlier prefill perf | Optimized prefill perf |
| --- | --- | --- | --- |
| Llama-2-7B-chat-hf | Snapdragon® 8 Gen 3 | 27 tok/sec | 50 tok/sec |

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
…che#58)

This commit adds limited support for texture-based group convolution
where the `in_channels`, after accounting for the `texture dim`, is
divisible by the `group size`. Otherwise, or if there is no extra
texture dim, it uses the default compute.
This is also the fix for matmul cases with an epilogue block.
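The dispatch condition described above can be sketched as a small predicate. A minimal illustrative sketch; the name and exact check are assumptions, not the actual TVM Adreno strategy code:

```c
#include <stdbool.h>

/* Texture-based group conv is only usable when the channel count that
 * remains after texture packing divides evenly by the group size;
 * otherwise fall back to the default (buffer-based) compute.
 * Illustrative only, not the actual TVM check. */
static bool use_texture_group_conv(int in_channels, int texture_dim, int groups) {
    if (texture_dim <= 0 || in_channels % texture_dim != 0)
        return false;                    /* no clean texture packing */
    int packed_channels = in_channels / texture_dim;
    return packed_channels % groups == 0;
}
```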

---------

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Co-authored-by: Krishna Raju Vegiraju <kvegiraj@blr-ubuntu-tvm03.qualcomm.com>
@krishnaraj36 deleted the dequant_matmul_fix branch August 9, 2024 04:53