Support for IBM Spyre #56
Merged
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Initial code drop with Spyre support

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Nikolaos Papandreou <npo@zurich.ibm.com>
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nikolaos Papandreou <npo@zurich.ibm.com>
Co-authored-by: TRAVIS JOHNSON <tsjohnso@us.ibm.com>
Co-authored-by: Burkhard Ringlein <NGL@zurich.ibm.com>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
Co-authored-by: Maximilien Philippe Marie de Bayser <mbayser@br.ibm.com>
Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>
Signed-off-by: Sophie du Couedic <sophie.du.couedic.de.kergoualer@ibm.com>
2866c16 to 00b2db7
tdoublep added a commit that referenced this pull request on Jan 20, 2025
@mbayser reported that, when trying to deploy the inference server using the latest image, he was seeing this error:

```
$ python3 -m vllm.entrypoints.openai.api_server --model /models/granite-8b-code-base/ --max-model-len=2048 --block-size=2048
...
[SENDNNWorker] warmup 2/2...
compile_graph: /project_src/deeptools/dsm/graphOptimizer2.cpp:1060: void Dsm::runDsmAct2ForProgSharing(const std::map<std::__cxx11::basic_string<char>, std::vector<std::pair<std::__cxx11::basic_string<char>, sengraph::Attribute> > >&, sengraph::Graph*): Assertion `outShape.at(outDimIdx) == 1' failed.
```

I tracked this down to the recent changes around the dtype. We recently reverted the change that took the user-provided dtype, since it caused many issues for users and doesn't actually provide any flexibility at this stage. However, we forgot to apply the same rule to the dtype that is used for the mask. This PR fixes that, and also includes a few clean-up changes:

1. The dtype that we hard-code is now an attribute of the model object, so it can be reused in different parts of the code.
2. I added a check that compares the user-provided dtype with the one we hard-code, and logs a warning if they differ. This is good practice: at least it lets the user know what is happening. Note that I used vLLM's logging functionality rather than print statements. We need to replace the other print statements in the rest of our code in the same way.
3. I added `DYN_BACKEND` to the list of environment variables and renamed it to `VLLM_SPYRE_DYNAMO_BACKEND`, to be consistent with the other environment variables.

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
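The pattern described in this commit (one hard-coded dtype stored on the model, reused for the mask, a warning when the user asks for something else, and the backend name read from `VLLM_SPYRE_DYNAMO_BACKEND`) can be sketched roughly as below. This is a hypothetical illustration, not the code from this PR: the class `SpyreModelSketch`, the `"float16"` dtype value, the `make_causal_mask` method and the fallback backend name are all assumptions. The standard-library `logging` module stands in for vLLM's logger so the example runs without vLLM or PyTorch installed.

```python
import logging
import os

# Stand-in for vLLM's logger; the commit itself uses vLLM's logging
# helpers rather than print statements.
logger = logging.getLogger("spyre_sketch")


class SpyreModelSketch:
    """Hypothetical stand-in for the Spyre model wrapper (not the PR's class)."""

    # Hypothetical: the single dtype the backend currently supports.
    # "float16" is used here for illustration only.
    dtype = "float16"

    def __init__(self, user_dtype: str) -> None:
        # The user-provided dtype is ignored; warn instead of silently overriding.
        if user_dtype != self.dtype:
            logger.warning(
                "Ignoring user-provided dtype %s; using %s instead.",
                user_dtype,
                self.dtype,
            )
        # Backend name comes from the renamed environment variable.
        # The fallback value is a placeholder, not the real default.
        self.dynamo_backend = os.environ.get(
            "VLLM_SPYRE_DYNAMO_BACKEND", "placeholder-backend"
        )

    def make_causal_mask(self, seq_len: int) -> tuple[list[list[bool]], str]:
        # Lower-triangular causal mask. Its dtype is taken from the same
        # model attribute, so it cannot drift from the model's dtype.
        mask = [[j <= i for j in range(seq_len)] for i in range(seq_len)]
        return mask, self.dtype
```

Keeping the dtype in a single class attribute is what prevents the mismatch described above: any code that builds tensors for the model (here, the mask) reads the same value instead of re-deriving it from user input.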
This PR adds support for IBM's Spyre accelerator.
This work was carried out in a private fork of vLLM. We are now moving the code into the open, and all future work will be done in this public fork.