Add SYCL Kernels for XPU backend #1679
fix transpose
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
revert cpu changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
remove check for better performance
Can we use a more accurate title for the commit? Otherwise reviewers may be confused about whether all SYCL kernels are included in this PR.
This is the first PR for SYCL kernels targeting QLoRA. I have added a detailed description.
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
fix xpu log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Remove ipex entirely
fix lint
When I tried to compile it, I had issues with https://github.khronos.org/SYCL_Reference/iface/nd_range.html https://github.khronos.org/SYCL_Reference/iface/nd_item.html
I replaced the types as described above and tested the implementation. In my experiment, the SYCL implementation was about 2x faster than Triton for token generation, I guess due to the fused dequant + matmul. The Triton compiler currently has an issue with that: intel/intel-xpu-backend-for-triton#4327. However, some tests failed.
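To illustrate why fusing dequantization with the matmul can help, here is a minimal pure-Python sketch. It is not the PR's SYCL kernel: all function names are hypothetical, and it assumes a simple linear absmax quantization to integers in [-7, 7] rather than the NF4/FP4 code books used for QLoRA. The unfused path materializes the whole dequantized weight before multiplying, while the fused path dequantizes each element as it is consumed.

```python
def quantize_absmax(w, blocksize):
    """Hypothetical helper: quantize a flat list to ints in [-7, 7], one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(w), blocksize):
        block = w[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero for all-zero blocks
        scales.append(absmax)
        codes.extend(round(v / absmax * 7) for v in block)
    return codes, scales


def gemv_unfused(codes, scales, blocksize, rows, cols, x):
    # Pass 1: write out the full dequantized weight (extra memory traffic).
    w = [codes[i] * scales[i // blocksize] / 7 for i in range(rows * cols)]
    # Pass 2: read it back for the matrix-vector product.
    return [sum(w[r * cols + c] * x[c] for c in range(cols)) for r in range(rows)]


def gemv_fused(codes, scales, blocksize, rows, cols, x):
    # Single pass: dequantize each weight on the fly, never storing the full matrix.
    out = []
    for r in range(rows):
        acc = 0.0
        for c in range(cols):
            i = r * cols + c
            acc += codes[i] * scales[i // blocksize] / 7 * x[c]
        out.append(acc)
    return out


weights = [0.5, -1.0, 0.25, 2.0, -2.0, 1.0, 0.0, 4.0]  # 2 x 4 matrix, row-major
codes, scales = quantize_absmax(weights, blocksize=4)
x = [1.0, 2.0, 3.0, 4.0]
y_unfused = gemv_unfused(codes, scales, 4, 2, 4, x)
y_fused = gemv_fused(codes, scales, 4, 2, 4, x)
```

Both paths compute the same product up to floating-point rounding; the difference on real hardware would be the intermediate buffer that the fused path avoids.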
* fix logs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Hi @Egor-Krivov , |
Hi @Egor-Krivov . Could you share your device name? I can pass all tests on |
* fix sycl nd
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
# SYCL should be faster on XPU, so first check whether it is available.
if not isinstance(lib, ErrorHandlerMockBNBNativeLibrary):
Currently you pick either all methods from SYCL or all methods from Triton. However, the SYCL implementation is currently missing these methods, which are available in Triton:
quantize_blockwise
quantize_4bit
I suggest we keep using these Triton methods even with SYCL, since that's the only option on XPU for now.
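The suggestion above amounts to choosing a backend per op instead of per library. A minimal sketch of that idea, assuming hypothetical op tables (`sycl_ops`, `triton_ops`) and a hypothetical `select_ops` helper; none of these are bitsandbytes APIs, and the ops listed under SYCL are illustrative only:

```python
from typing import Callable, Dict


def select_ops(preferred: Dict[str, Callable], fallback: Dict[str, Callable]) -> Dict[str, Callable]:
    """Merge two op tables, taking each op from `preferred` when it exists there."""
    merged = dict(fallback)   # start with every op the fallback backend provides
    merged.update(preferred)  # override with the preferred backend where available
    return merged


# Hypothetical stand-ins for real kernels; each returns the name of its backend.
sycl_ops = {
    "dequantize_4bit": lambda: "sycl",
    "gemv_4bit": lambda: "sycl",
}
triton_ops = {
    "quantize_blockwise": lambda: "triton",
    "quantize_4bit": lambda: "triton",
    "dequantize_4bit": lambda: "triton",
    "gemv_4bit": lambda: "triton",
}

ops = select_ops(preferred=sycl_ops, fallback=triton_ops)
backend_of = {name: fn() for name, fn in ops.items()}
```

With this layout, ops that only exist in the fallback table keep working, and an op moves to the preferred backend as soon as it is added to that table.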
These two kernels don't affect the performance of QLoRA. They currently run with PyTorch ops by default, and we will implement them as SYCL kernels later.
The implementation is missing the following methods:
Hi @Egor-Krivov. Could you share the script that produces this error?
@Egor-Krivov, these kernels have been implemented already.
@Egor-Krivov, these kernels are already implemented as SYCL kernels.
Hi @matthewdouglas. Could you please trigger the CI for this PR? Thanks!
This PR is ready for review now. Please reach out if you have any other questions, thanks!
I'm working on performance testing of unsloth right now. These methods are used for the CUDA implementation here: I am working with a POC branch (not merged upstream) from https://github.com/leizhenyuan/unsloth/blob/7bed913255f611e220c2d219ee988c179ed98033/unsloth/kernels/utils.py#L154 For me, the call happens in the last 2 lines of my script, which is essentially a copy of the unsloth tutorial:
Hi @matthewdouglas. The lint test failed with error fix. See this comment. Do you know how to skip the XPU kernels in the typo test?
* skip test for xpu ops
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix lint
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip typo for xpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* skip
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
This is the pull request for the SYCL Kernels targeting the XPU backend.