Bump actions/github-script from 7.0.1 to 8.0.0 #85
Closed
dependabot[bot] wants to merge 34 commits into
This PR adds support for IBM's Spyre accelerator. This work was carried out in a private fork of vLLM. We are now moving the code into the open, and all future work will be done using this public fork. --------- Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Nikolaos Papandreou <npo@zurich.ibm.com> Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com> Co-authored-by: Sophie du Couédic <Sophie.Du.Couedic.de.Kergoualer@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Nikolaos Papandreou <npo@zurich.ibm.com> Co-authored-by: TRAVIS JOHNSON <tsjohnso@us.ibm.com> Co-authored-by: Burkhard Ringlein <NGL@zurich.ibm.com> Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com> Co-authored-by: Maximilien Philippe Marie de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
This PR enables the Spyre tests to run as a GitHub Action.

I realized that the model we were using for the tests, `llama-194m`, is not available on the HF Hub, but if we want to run the tests externally we need to use a model that is available. I've replaced it with this one: https://huggingface.co/JackFram/llama-160m

Note that I haven't actually changed the model name in the tests; I just "hacked" it for now using a soft link in the Docker container. This is because there is some ongoing work to introduce environment variables to control the tests and I don't want to complicate things.

For this model I see some quite weird behaviour where the tokens produced by vLLM and HF Transformers are identical but the decoded text is slightly different (they are the same up to a leading space). I don't think this difference is related to Spyre, so I've just changed the test to compare token ids instead.

---------

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
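The switch from comparing decoded text to comparing token ids can be illustrated with a minimal sketch. The dictionaries, the `same_tokens` helper, and the token values below are hypothetical and are not taken from the test suite; they only show why an id-level check passes when the text differs by a leading space.

```python
from typing import Sequence


def same_tokens(vllm_ids: Sequence[int], hf_ids: Sequence[int]) -> bool:
    """Return True when both backends produced exactly the same token ids."""
    return list(vllm_ids) == list(hf_ids)


# Hypothetical outputs: identical ids, text differing only by a leading space.
vllm_result = {"token_ids": [1, 15043, 3186], "text": "Hello world"}
hf_result = {"token_ids": [1, 15043, 3186], "text": " Hello world"}

ids_match = same_tokens(vllm_result["token_ids"], hf_result["token_ids"])
text_match = vllm_result["text"] == hf_result["text"]
```

With these made-up values, `ids_match` is true while `text_match` is false, which is the situation described above.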
Merge from upstream to include new embedding model fixes. There were new changes to the platform code and the task names were refactored, so I had to fix some of our code.
Some models such as `sentence-transformers/all-MiniLM-L12-v2` don't have special tokens such as "bos_token" in their tokenizer configuration. This causes a key error when the warmup logic tries to get the id for these tokens. However, since the IDs are only used to exclude them from the set of tokens that can be generated during the warmup, it doesn't make a difference if they don't exist.
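One way to tolerate missing special tokens is to look them up defensively and skip any that are absent. The following is only a sketch under assumptions: `special_token_ids`, `warmup_candidates`, and the toy vocabulary are hypothetical names and data, not the warmup code from this repository.

```python
from typing import Dict, Iterable, List, Set


def special_token_ids(
    special_tokens_map: Dict[str, str],
    vocab: Dict[str, int],
    names: Iterable[str] = ("bos_token", "eos_token", "pad_token"),
) -> Set[int]:
    """Collect ids of the special tokens that exist; skip missing ones."""
    ids: Set[int] = set()
    for name in names:
        token = special_tokens_map.get(name)  # no KeyError if absent
        if token is None:
            continue
        token_id = vocab.get(token)
        if token_id is not None:
            ids.add(token_id)
    return ids


def warmup_candidates(
    vocab: Dict[str, int], special_tokens_map: Dict[str, str]
) -> List[int]:
    """Token ids that may be generated during warmup (special ids excluded)."""
    excluded = special_token_ids(special_tokens_map, vocab)
    return sorted(i for i in vocab.values() if i not in excluded)


# Hypothetical tokenizer data with no "bos_token" or "eos_token" entry.
toy_vocab = {"[PAD]": 0, "[CLS]": 101, "hello": 7592, "world": 2088}
toy_special = {"pad_token": "[PAD]", "cls_token": "[CLS]"}

excluded_ids = special_token_ids(toy_special, toy_vocab)
candidates = warmup_candidates(toy_vocab, toy_special)
```

Because the ids are only used for exclusion, a missing entry simply means there is nothing to exclude for that name.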
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Fix issue with batch padding changing during decoding (e.g., if one sequence finished before the others).
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
This function no longer exists upstream and we don't seem to use `VLLM_INSTANCE_ID` anywhere else in the code.
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
These now work due to upstream changes that were pulled in.
…ncies. These comments cause the packages to be stripped out when running use_existing_torch, but the packages are required dependencies. Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
…ncies (#68) I already tried to fix this using #66 but upstream didn't like that change (the behaviour to filter out comments containing torch was intentional). After [some discussion](vllm-project/vllm#12255), we agreed on a different solution implemented in this PR. Note that I reverted the changes from #66 by force pushing main. Note this has already been merged upstream by vllm-project/vllm#12260 but I'm cherry-picking the fix here since it is blocking the CI builds.
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Small typo fix referring to wrong test script...
This PR reworks our code according to some important upstream changes. In particular, there is no longer any need to have a separate `SpyreExecutor` and `MultiprocessingSpyreExecutor`. Upstream has added generic classes for this that work across different platforms. Actually, it simplifies our code quite a lot. The model runner classes now inherit from `ModelRunnerBase` and we need to define a `ModelInputForSpyre` class accordingly.

This is currently passing all CPU tests, but needs to be tested on Spyre and needs careful review since it is quite a big change.

**Note:** the target for this PR is a branch `upstream-2025-01-17` containing upstream changes merged into our current branch. I've done it like this so it is easier to review the changes. If this PR is approved, we can then merge the changes into `upstream-2025-01-17` and then merge that one into main.

---------

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
This PR fixes a previously unidentified bug and adds pytests for validation.

**Changes**:
- Address the logic error described below by introducing `SpyreCausalLM.indices`, which contains a mask indicating the unfinished sequences in the current batch. -> [commit](3f087a7)
- Adapt the generation functions in [tests/spyre/spyre_util.py](main...ysc-fix-variable-max-tokens#diff-d232e0cf89b92b0ec7da17e322bb2ca675af8a704099e5ae0c54995ddb4a3f9a) for `hf` and `vllm` to accept a different number of max decoding tokens for sequences within the same batch. -> [commit](f632e8e)
- Add [tests/spyre/test_spyre_max_new_tokens.py](main...ysc-fix-variable-max-tokens#diff-82d9214a22b1db2e524795c8a649a40c115fd95a40b279e4d3245c7820e6ddf8) to validate functionality when sequences in a batch finish decoding before others. -> [commit](f632e8e)

**Bug description**: Having a different number of requested output tokens within the same batch leads to some sequences being removed from the batch while others are still decoding. Previously the code did not take into account the offset that a removed sequence introduces in the `positions` (ids) and (attention) `masks`. This error remains undetected if all prompts are of the same length (they will have the same position ids and attention masks) or if the sequence that finishes early is always the last one in the batch (the offset at the end does not affect sequences with smaller indices within the same batch).

_bug example_: <img width="1392" alt="Screenshot 2025-01-31 at 12 39 26" src="https://github.com/user-attachments/assets/b19deee5-af32-48cd-9b1a-051e9f074737" />

---------

Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
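The misalignment can be sketched with plain lists. This is an illustration under assumptions, not the code from the commits above: `keep_unfinished`, the `naive` truncation, and the position values are all hypothetical, and the real fix operates on tensors via `SpyreCausalLM.indices`.

```python
from typing import List


def keep_unfinished(
    rows: List[List[int]], unfinished: List[bool]
) -> List[List[int]]:
    """Select per-sequence rows using a boolean mask of unfinished sequences."""
    if len(rows) != len(unfinished):
        raise ValueError("mask length must match batch size")
    return [row for row, keep in zip(rows, unfinished) if keep]


# Hypothetical position ids for three prompts of different lengths.
positions = [[0, 0, 1, 2], [0, 1, 2, 3], [0, 0, 0, 1]]

# Case 1: the middle sequence finishes first.
unfinished = [True, False, True]
naive = positions[: sum(unfinished)]  # ignores which sequence was removed
correct = keep_unfinished(positions, unfinished)

# Case 2: the last sequence finishes first, so truncation happens to be right.
unfinished_last = [True, True, False]
naive_last = positions[: sum(unfinished_last)]
correct_last = keep_unfinished(positions, unfinished_last)
```

In the first case the naive truncation hands the third sequence the second sequence's position ids, while in the second case both approaches agree, which is consistent with the bug staying hidden when only the last sequence finishes early.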
Bumps [actions/github-script](https://github.com/actions/github-script) from 7.0.1 to 8.0.0.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](actions/github-script@60a0d83...ed59741)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: 8.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Looks like actions/github-script is up-to-date now, so this is no longer needed.
starpit pushed a commit to starpit/vllm-ibm that referenced this pull request on Sep 19, 2025
initial working implementation of spans
Bumps actions/github-script from 7.0.1 to 8.0.0.
Release notes
Sourced from actions/github-script's releases.
Commits
- `ed59741` Merge pull request #653 from actions/sneha-krip/readme-for-v8
- `2dc352e` Bold minimum Actions Runner version in README
- `01e118c` Update README for Node 24 runtime requirements
- `8b222ac` Apply suggestion from @salmanmkc
- `adc0eea` README for updating actions/github-script from v7 to v8
- `20fe497` Merge pull request #637 from actions/node24
- `e7b7f22` update licenses
- `2c81ba0` Update Node.js version support to 24.x
- `f28e40c` Merge pull request #610 from actions/nebuk89-patch-1
- `1ae9958` Update README.md

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)