(WIP) Multi backend refactor -> main (full diff of all already merged PRs) #1220
Open: Titus-von-Koeller wants to merge 275 commits into main from multi-backend-refactor
+10,091 −981
* Add build job for rocm
* Add rocm build script
* Copy shared obj file into output_dir
* upload build artifacts and enable wheels build
* Remove cuda build temporarily
* Add ROCm version to .so filename
* Add rocm_version to whls build
* Revert "Remove cuda build temporarily"
  This reverts commit 1413c5f.
* Add rocm_version env var
* Remove thrush header files
* Print node info
* print cuda node info
* Revert "print cuda node info"
  This reverts commit cdb209a.
* Revert "Print node info"
  This reverts commit 7e9a65c.
* Add rocm arch to compile command
* Rename .so files to rocm
* Update default gpu arch
* Skip cpu based igemmlt int tests on ROCm
* Update Documentation
* Update upstream repo name
* Update docs
* Update string format
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Remove pre-release option for torch install
* Update pytorch install path
  Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
* Add messages for Heuristics error
* Remove toolcache for disk space
* print disk usage
* Clean disk space for linux
* Fix for ubuntu
* Add sudo for apt clean
* Update clean up disk list
* remove disk usage print
* Add BNB_BACKEND variable
* Update diagnostic functions for ROCm
* Fix tuple error
* Fix library detection bug for recursive and symlink cases
* fix pre-commit errors
* Remove recursive path lib search
* Create function for runtime lib patterns
* Update logger format
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update error reporting
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Remove commented code
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update error reporting
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update error reporting
* Create hip diagnostics functions
* Fix Typo
* Fix pre-commit checks
* Enable 6.2 build
* Skip gemv 4 bit cpu test
* Update documentation for 6.2.0 pip install
* Update README for default branch change
* Fix typo
* Sync README with upstream
* Remove depth
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Aswin John Mathews <81309834+amathews-amd@users.noreply.github.com>
Co-authored-by: root <root@banff-cyxtera-s78-4.ctr.dcgpu>
…tsandbytes from source in personal repo (#1419)
* enable new ipex API ipex weight is 4D so we cannot transpose fix dequant check require grad
* use ipex op in backward
* enable backward
* Multi backend refactor (#8)
* AMD: Clarify diagnostic messages; free up disk space for CI build
* Add build job for rocm
* Add rocm build script
* Copy shared obj file into output_dir
* upload build artifacts and enable wheels build
* Remove cuda build temporarily
* Add ROCm version to .so filename
* Add rocm_version to whls build
* Revert "Remove cuda build temporarily"
  This reverts commit 1413c5f.
* Add rocm_version env var
* Remove thrush header files
* Print node info
* print cuda node info
* Revert "print cuda node info"
  This reverts commit cdb209a.
* Revert "Print node info"
  This reverts commit 7e9a65c.
* Add rocm arch to compile command
* Rename .so files to rocm
* Update default gpu arch
* Skip cpu based igemmlt int tests on ROCm
* Update Documentation
* Update upstream repo name
* Update docs
* Update string format
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Remove pre-release option for torch install
* Update pytorch install path
  Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
* Add messages for Heuristics error
* Remove toolcache for disk space
* print disk usage
* Clean disk space for linux
* Fix for ubuntu
* Add sudo for apt clean
* Update clean up disk list
* remove disk usage print
* Add BNB_BACKEND variable
* Update diagnostic functions for ROCm
* Fix tuple error
* Fix library detection bug for recursive and symlink cases
* fix pre-commit errors
* Remove recursive path lib search
* Create function for runtime lib patterns
* Update logger format
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update error reporting
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Remove commented code
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update error reporting
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* Update error reporting
* Create hip diagnostics functions
* Fix Typo
* Fix pre-commit checks
---------
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
* check grad before using ipex (#1358)
* Enable packaging for ROCm 6.2 (#1367)
* Enable 6.2 build
* Update documentation for 6.2.0 pip install
* Update for VS2022 17.11 compatibility with CUDA < 12.4 (#1341)
* Update for VS2022 17.11 compatibility with CUDA < 12.4
* Try again
* Enable continuous releases for multi-backend-refactor branch
* Update release workflow
* Publish continuous release for multi-backend
* continuous release: revert wheel renaming due to install err
* Revert "continuous release: revert wheel renaming due to install err"
  This reverts commit 0a2b539.
* add dynamic tag-based versioning + git hash for dev vers
* docs: update w/ changes from `main`
* get tags for dynamic versioning
* fine-tune continuous release params
* reduce the pkg size + build times for the preview release
* refine docs for multi-backend alpha release (#1380)
* refine docs for multi-backend alpha release
* docs: further tweaks to multi-backend alpha docs
* docs: further tweaks to multi-backend alpha docs
* docs: further tweaks to multi-backend alpha docs
* docs: add multi-backend feedback links
* docs: add request for contributions
* docs: small fixes
* docs: small fixes
* docs: add info about `main` continuous build
* docs: further tweaks to multi-backend alpha docs
* docs: further tweaks to multi-backend alpha docs
* docs: remove 2 obsolete lines
---------
Co-authored-by: pnunna93 <104791500+pnunna93@users.noreply.github.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Revert "enable backward"
  This reverts commit cd7bf21.
* Revert "use ipex op in backward"
  This reverts commit b8df1aa.
* fix finetune
* check training
* fix gemv check
* reformat
* avoid double quant in backward if not needed
* Zh/xpu support (#9)
* Add xpu support
* Add xpu support for int8
* Add xpu dequant kernel support
* update code
* remove debug comments
* remove redundant comments
* Add xpu integration for woqlinear
* correct the comments
* Update cpu_xpu_common.py
---------
Co-authored-by: zhuhong61 <hong.zhu@intel.com>
Co-authored-by: zhuhong61 <95205772+zhuhong61@users.noreply.github.com>
* avoid import triton if CPU and XPU backend
* fix setup in docker without git config
* xpu do not support compile for now
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update xpu
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update 4bit compute dtype
* fix xpu int8 path
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* optimize 4bit dequant
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix xpu dequant
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add empty cache in each xpu op
* add nf4 dequant ipex kernel
* fix dequant 4bit op
* empty cache has negative effect on 4bit gemv
* fix xpu save
* fix save
* xpu use float16 default
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm empty cache as it cause slower perf
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix xpu save
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 8bit int8 param device
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 8bit int8 param device
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 8bit int8 param device
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 8bit int8 param device
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
* update readme for Intel CPU and XPU do not need make csrc codes
* fix format
* fix import
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: pnunna93 <104791500+pnunna93@users.noreply.github.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: zhuhong61 <hong.zhu@intel.com>
Co-authored-by: zhuhong61 <95205772+zhuhong61@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Add npu support for nf4 quant
  Co-authored-by: Slightwind <slightwindsec@gmail.com>
  Co-authored-by: Ginray <ginray0215@gmail.com>
* code format
* update
* pass lint check and fix typos
* add npu to supported devices
---------
Co-authored-by: Slightwind <slightwindsec@gmail.com>
Co-authored-by: Ginray <ginray0215@gmail.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix dequant 8bit
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* support double quant on intel cpu and xpu
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix shape
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 4bit format
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix device error for xpu
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 4bit tensor shape
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix nf4 xpu finetune
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* new matmul8bit
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix cxb
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* fix xpu dtypoe
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix nf4 dtype
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix version
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix setup version
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable benchmark script
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Small fixes to non_cuda_backends.mdx
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable quant storage
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix to numpy
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix 4bit XPU dequant 4bit
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix default value
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ipex linear set
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ipex linear set to false when calling state dict
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix Int8Param device patch
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix xpu to cpu
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix xpu cpu data device
  Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
This PR to `main` keeps an overview of all the extensive changes that have been introduced to `multi-backend-refactor` through the iterative PRs around this topic.
We will eventually merge this into `main`. Before that, we will do a thorough final review and get Tim's final sign-off on this extensive refactor.
For now, it mainly serves to provide a public diff of the entirety of the changes. That said, feel free to leave constructive feedback and review comments already.