Support for IBM Spyre #56

Merged: sducouedic merged 2 commits into main from internal_to_public on Dec 19, 2024

Conversation

@sducouedic (Member) commented Dec 19, 2024

This PR allows support for IBM's Spyre accelerator.

This work was carried out in a private fork of vLLM. We are now moving the code into the open, and all future work will be done using this public fork.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

sducouedic and others added 2 commits December 19, 2024 11:08
Initial code drop with Spyre support

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Nikolaos Papandreou <npo@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nikolaos Papandreou <npo@zurich.ibm.com>
Co-authored-by: TRAVIS JOHNSON <tsjohnso@us.ibm.com>
Co-authored-by: Burkhard Ringlein <NGL@zurich.ibm.com>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
Co-authored-by: Maximilien Philippe Marie de Bayser <mbayser@br.ibm.com>
Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>
Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>

@tdoublep (Member) left a comment

LGTM

@sducouedic sducouedic merged commit 99523dd into main Dec 19, 2024
@sducouedic sducouedic deleted the internal_to_public branch December 19, 2024 11:16
tdoublep added a commit that referenced this pull request Jan 20, 2025
@mbayser reported that when trying to deploy the inference server using
the latest image, he was seeing this error:
```
$ python3 -m vllm.entrypoints.openai.api_server --model /models/granite-8b-code-base/ --max-model-len=2048 --block-size=2048
...
[SENDNNWorker] warmup 2/2...
compile_graph: /project_src/deeptools/dsm/graphOptimizer2.cpp:1060: void Dsm::runDsmAct2ForProgSharing(const std::map<std::__cxx11::basic_string<char>, std::vector<std::pair<std::__cxx11::basic_string<char>, sengraph::Attribute> > >&, sengraph::Graph*): Assertion `outShape.at(outDimIdx) == 1' failed.
```
I tracked this down to the recent changes around the dtype. We recently
reverted the change to take the user-provided dtype, since this caused
many issues for users, and actually doesn't provide any flexibility at
this stage. However, we forgot to apply the same rule to the dtype that
is used for the mask. This PR fixes that, and also includes a couple of
changes to clean things up:

1. The dtype that we hard-code is now an attribute of the model object,
and therefore can be re-used in different parts of the code.
2. I added a check to see if the user-provided dtype matches what we
hard-code, and to log a warning if this is not the case. This is good
practice, to let the user know what is happening at least. Note that I
have used the logging functionality of vLLM rather than print
statements. We need to replace the other print statements in the rest of
our code in a similar way.
3. I added `DYN_BACKEND` to the list of env variables and renamed it to
`VLLM_SPYRE_DYNAMO_BACKEND` to be consistent with the other
environment variables.

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
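
For illustration, items 1 and 2 of the commit message above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the commit: the class name `SpyreModelSketch`, the method `mask_dtype`, and the placeholder value `"float16"` are all hypothetical, dtypes are represented as plain strings, and the standard-library `logging` module stands in for vLLM's logging functionality.

```python
import logging

# Assumption: stdlib logging stands in for vLLM's own logger here.
logger = logging.getLogger("spyre_dtype_sketch")


class SpyreModelSketch:
    """Hypothetical model wrapper that owns the hard-coded dtype."""

    # Placeholder value; the dtype actually hard-coded by the backend
    # is not stated in the commit message.
    HARD_CODED_DTYPE = "float16"

    def __init__(self, user_dtype: str) -> None:
        # The hard-coded dtype is stored as an attribute of the model
        # object so other parts of the code can re-use it.
        self.dtype = self.HARD_CODED_DTYPE

        # Warn (rather than print) when the user-provided dtype is ignored.
        if user_dtype != self.dtype:
            logger.warning(
                "Ignoring user-provided dtype=%s; using dtype=%s instead.",
                user_dtype,
                self.dtype,
            )

    def mask_dtype(self) -> str:
        # The mask follows the same rule as the rest of the model,
        # instead of taking the user-provided dtype.
        return self.dtype


model = SpyreModelSketch(user_dtype="bfloat16")
```

Keeping the dtype on the model object means the mask and any other consumer read a single source of truth, so the two cannot drift apart again.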