Support for IBM Spyre by sducouedic · Pull Request #56 · IBM/vllm

sducouedic · 2024-12-19T11:05:21Z

This PR allows support for IBM's Spyre accelerator.

This work was carried out in a private fork of vLLM. We are now moving the code into the open, and all future work will be done using this public fork.

github-actions · 2024-12-19T11:05:35Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Initial code drop with Spyre support

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Nikolaos Papandreou <npo@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nikolaos Papandreou <npo@zurich.ibm.com>
Co-authored-by: TRAVIS JOHNSON <tsjohnso@us.ibm.com>
Co-authored-by: Burkhard Ringlein <NGL@zurich.ibm.com>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
Co-authored-by: Maximilien Philippe Marie de Bayser <mbayser@br.ibm.com>
Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>

Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>

tdoublep

LGTM

@mbayser reported that when trying to deploy the inference server using the latest image, he was seeing this error:

```
$ python3 -m vllm.entrypoints.openai.api_server --model /models/granite-8b-code-base/ --max-model-len=2048 --block-size=2048
...
[SENDNNWorker] warmup 2/2...
compile_graph: /project_src/deeptools/dsm/graphOptimizer2.cpp:1060: void Dsm::runDsmAct2ForProgSharing(const std::map<std::__cxx11::basic_string<char>, std::vector<std::pair<std::__cxx11::basic_string<char>, sengraph::Attribute> > >&, sengraph::Graph*): Assertion `outShape.at(outDimIdx) == 1' failed.
```

I tracked this down to the recent changes around the dtype. We recently reverted the change to take the user-provided dtype, since this caused many issues for users and doesn't actually provide any flexibility at this stage. However, we forgot to apply the same rule to the dtype that is used for the mask.

This PR fixes that, and also includes a couple of changes to clean things up:

1. The dtype that we hard-code is now an attribute of the model object, and can therefore be re-used in different parts of the code.
2. I added a check to see whether the user-provided dtype matches what we hard-code, and to log a warning if it does not. This is good practice, so that the user at least knows what is happening. Note that I have used the logging functionality of vLLM rather than print statements. We need to replace the other print statements in the rest of our code in a similar way.
3. I added `DYN_BACKEND` to the list of env variables and renamed it to `VLLM_SPYRE_DYNAMO_BACKEND` to be consistent with the other environment variables.

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
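The three clean-ups described in this commit message can be illustrated with a small, self-contained sketch. It is not the code from this PR: the class `SpyreModelSketch`, the helper `dynamo_backend`, the placeholder dtype `"float16"`, and the fallback backend name are all hypothetical, and the standard-library `logging` module stands in for vLLM's own logger. Only the environment variable name `VLLM_SPYRE_DYNAMO_BACKEND` comes from the text above.

```python
import logging
import os

# Stand-in for vLLM's logger; the PR uses vLLM's logging functionality instead.
logger = logging.getLogger("spyre_dtype_sketch")

# Assumption: the actual hard-coded dtype is not stated in this PR,
# so "float16" is only a placeholder.
HARDCODED_DTYPE = "float16"


class SpyreModelSketch:
    """Hypothetical stand-in for the model object (not the real class)."""

    def __init__(self, user_dtype: str) -> None:
        # The hard-coded dtype lives on the model object, so every part of
        # the code (weights, mask, ...) reads the same value.
        self.dtype = HARDCODED_DTYPE
        if user_dtype != self.dtype:
            # Warn instead of silently ignoring the user's choice.
            logger.warning(
                "Ignoring user-provided dtype %s; using %s instead.",
                user_dtype,
                self.dtype,
            )

    def mask_dtype(self) -> str:
        # The mask reuses the model attribute rather than the user's dtype.
        return self.dtype


def dynamo_backend(default: str = "placeholder_backend") -> str:
    """Read the renamed env variable; the default value is hypothetical."""
    return os.environ.get("VLLM_SPYRE_DYNAMO_BACKEND", default)
```

With this shape, constructing `SpyreModelSketch("bfloat16")` logs one warning and both the model and its mask end up with the same dtype, which is the mismatch the fix is meant to rule out.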

sducouedic and others added 2 commits December 19, 2024 11:08

README cleanup

00b2db7

Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>

sducouedic force-pushed the internal_to_public branch from 2866c16 to 00b2db7 Compare December 19, 2024 11:09

tdoublep approved these changes Dec 19, 2024

View reviewed changes

sducouedic merged commit 99523dd into main Dec 19, 2024

sducouedic deleted the internal_to_public branch December 19, 2024 11:16

tlrmchlsmth mentioned this pull request Dec 19, 2024

[RFC]: Add support for IBM Spyre accelerator vllm-project/vllm#9652

Closed

6 tasks


