[DLIGHT][ADRENO] Fix for opencl adreno matmul schedule#17258
Closed
krishnaraj36 wants to merge 55 commits into apache:main
Integrate the implicit call of the BYOC preprocessing module into the Collage tuning module and enable the benchmark script for Adreno targets.

Benchmark results:

| Networks | OpenCL texture | OpenCLML | Collage |
|---|---|---|---|
| resnet-18-float32 | 0.010584622 | 0.00720695 | 0.007289728 |
| resnet-18-float16 | 0.007052029 | 0.0045642 | 0.004857585 |
| resnet-34-float32 | 0.016259185 | 0.01242092 | 0.013071063 |
| resnet-34-float16 | 0.011350326 | 0.0073473 | 0.00796802 |
| resnet-50-float32 | 0.019188419 | 0.02085548 | 0.018910226 |
| resnet-50-float16 | 0.01338978 | 0.01199576 | 0.011089206 |
| densenet-121-float32 | 0.025430062 | 0.01798478 | 0.013212844 |
| densenet-121-float16 | 0.012384599 | 0.01101491 | 0.008722716 |
| inception_v3-float32 | 0.040408253 | 0.02229727 | 0.022636675 |
| inception_v3-float16 | 0.029910533 | 0.01368941 | 0.014519823 |
| mobilenet-float32 | 0.004093148 | 0.00367917 | 0.003189258 |
| mobilenet-float16 | 0.00280268 | 0.00244494 | 0.002101514 |

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
get_output_index support added. Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Basic support while building Adreno on Windows
Co-authored-by: Siva <quic_sivb@quicinc.com>
The partition pass should choose which ops to offload based on target support. This config enables choosing the target version from the Python API as well as from tvmc.
We can now build one binary and use it across targets. Co-authored-by: Siva <quic_sivb@quicinc.com>
RPC, device ports, and device temp folders are made unique across sessions. Co-authored-by: Siva <quic_sivb@quicinc.com>
Fixed the OpenCL codegen for a few operators:

1. Atomic add for float: OpenCL has no float atomic add, so a workaround using atomic_cmpexch() is enabled for this operation.
2. fmodf: OpenCL only supports fmod for all floating-point types.
3. nearbyint: OpenCL does not have this function, so it is replaced with the round function.

---------

Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: B, Siva Rama Krishna Reddy <sivb@qti.qualcomm.com>
Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
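The float atomic-add workaround in item 1 follows a common compare-and-exchange retry pattern. The sketch below imitates that pattern in plain Python so the control flow can be run and inspected. It is not the code emitted by the codegen: `AtomicU32`, `cmpxchg`, and `atomic_add_f32` are hypothetical names, and a lock stands in for the hardware atomic.

```python
import struct
import threading


def f32_to_bits(x: float) -> int:
    """Reinterpret a float32 value as its 32-bit integer pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit integer pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


class AtomicU32:
    """Hypothetical stand-in for a 32-bit word with integer compare-exchange only."""

    def __init__(self, bits: int) -> None:
        self._bits = bits
        self._lock = threading.Lock()  # emulates atomicity; real hardware needs no lock

    def load(self) -> int:
        with self._lock:
            return self._bits

    def cmpxchg(self, expected: int, desired: int) -> int:
        """Store `desired` only if the word equals `expected`; return the old value."""
        with self._lock:
            old = self._bits
            if old == expected:
                self._bits = desired
            return old


def atomic_add_f32(cell: AtomicU32, value: float) -> float:
    """Emulated float atomic add: retry until no other writer intervened."""
    while True:
        old_bits = cell.load()
        new_bits = f32_to_bits(bits_to_f32(old_bits) + value)
        # The exchange succeeds only if the word still holds the bits we read.
        if cell.cmpxchg(old_bits, new_bits) == old_bits:
            return bits_to_f32(old_bits)
```

The add is performed on the float interpretation, while the exchange compares raw bit patterns, which is why the integer compare-exchange is sufficient even without a native float atomic.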
Test case for clip layer added
Co-authored-by: Sanjay Shankar Krishnaa <sanjs@qti.qualcomm.com> Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
1. Enhanced the GPU matmul schedule for the OpenCL Android and Windows backends.
2. It gives roughly a 2X performance gain for the Llama-2-7B prefill process.

| Model | Device | Earlier prefill perf | Optimized prefill perf |
|---|---|---|---|
| Llama-2-7B-chat-hf | Snapdragon® 8 Gen 3 | 27 tok/sec | 50 tok/sec |

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
Co-authored-by: Siva <quic_sivb@quicinc.com>
…che#58) This commit adds limited support for texture-based group convolution, covering cases where the `in_channels`, after accounting for the `texture dim`, is divisible by the `group size`. Otherwise, and if there is no extra texture dim, it uses the default compute.
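One possible reading of that selection rule is sketched below. Everything in it is an assumption made for illustration: the function names are hypothetical, the channel block width of 4 is assumed rather than taken from the commit, and `groups` is assumed to be what the message calls `group size`. The actual check in the commit may differ.

```python
def texture_channel_chunks(in_channels: int, block: int = 4) -> int:
    """Number of `block`-wide channel chunks, rounding up for a partial tail.

    The block width of 4 is an assumption, not a value stated in the commit.
    """
    return (in_channels + block - 1) // block


def pick_group_conv_compute(
    in_channels: int, groups: int, has_texture_dim: bool, block: int = 4
) -> str:
    """Return "texture" or "default" under the assumed reading of the rule."""
    if not has_texture_dim:
        # No extra texture dim: use the default compute.
        return "default"
    chunks = texture_channel_chunks(in_channels, block)
    # Texture path only when the chunked channel count splits evenly by group.
    return "texture" if chunks % groups == 0 else "default"
```

For instance, under these assumptions 32 input channels with 4 groups give 8 chunks and take the texture path, while 12 input channels with 2 groups give 3 chunks and fall back to the default compute.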
This fixes the matmul schedule for cases where the matmul is followed by an epilog block.

---------

Co-authored-by: krishnaraj36 <quic_kvegiraj@quicinc.com>
Co-authored-by: Krishna Raju Vegiraju <kvegiraj@blr-ubuntu-tvm03.qualcomm.com>
Fixed the schedule to support the epilog block.
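For readers unfamiliar with the term, an epilog block is the elementwise work (for example a bias add or activation) applied to the matmul result. The pure-Python sketch below shows the general idea of a tiled matmul that applies such an epilogue to each output tile before writing it back. It does not reproduce the dlight schedule: `matmul_with_epilogue`, the tile size, and the bias-plus-ReLU epilogue are hypothetical choices for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def matmul_with_epilogue(
    a: Matrix,
    b: Matrix,
    epilogue: Callable[[float, int], float],
    tile: int = 2,
) -> Matrix:
    """Tiled matmul that applies `epilogue(value, column)` before the final store."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            # Local accumulator for this output tile (plays the role of registers).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for p in range(k):
                for i in range(i0, i1):
                    a_ip = a[i][p]
                    for j in range(j0, j1):
                        acc[i - i0][j - j0] += a_ip * b[p][j]
            # Epilogue runs on the accumulated tile, then the result is written out.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    out[i][j] = epilogue(acc[i - i0][j - j0], j)
    return out


# Usage: bias add followed by ReLU as the epilogue (an illustrative choice).
bias = [-5.0, 0.5]
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = matmul_with_epilogue(a, b, lambda v, j: max(v + bias[j], 0.0))
```

The point of the sketch is that the epilogue has to be placed after the reduction over `p` completes for a tile, which is the kind of ordering a schedule must preserve when an epilog block is present.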