Add Inferentia2 and Optimum Neuron Support #120

Merged
merged 9 commits into main from inf2 on May 8, 2024

Conversation

philschmid (Collaborator)

What does this PR do?

This PR adds support for Inferentia2. To deploy a model on Inferentia2, you have three options:

  • Provide an already compiled model with a model.neuron file as HF_MODEL_ID, e.g. optimum/tiny_random_bert_neuron
  • Provide the HF_OPTIMUM_BATCH_SIZE and HF_OPTIMUM_SEQUENCE_LENGTH environment variables to compile the model on the fly, e.g. HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128 (see the sketch below)
  • Include a neuron dictionary in the config.json file of the model archive, e.g. "neuron": {"static_batch_size": 1, "static_sequence_length": 128}

The currently supported tasks can be found here. If you plan to deploy an LLM, we recommend taking a look at Neuronx TGI, which is purpose-built for LLMs.
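As a rough illustration of the second option, here is a minimal sketch of deploying with the HF_OPTIMUM_* environment variables on a SageMaker Inferentia2 instance. The image URI, IAM role, model ID, and task are placeholders/assumptions and not part of this PR:

```python
# Minimal sketch (assumptions noted inline): deploy with on-the-fly compilation
# via the HF_OPTIMUM_* environment variables described above.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="<your-sagemaker-execution-role>",        # assumption: your IAM role
    image_uri="<neuron-inference-toolkit-image>",  # assumption: a container built from this toolkit for Neuron
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
        "HF_TASK": "text-classification",       # example task
        "HF_OPTIMUM_BATCH_SIZE": "1",            # static batch size used for compilation
        "HF_OPTIMUM_SEQUENCE_LENGTH": "128",     # static sequence length used for compilation
    },
)

# Inferentia2 instance type; the model is compiled at container startup with the shapes above.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.inf2.xlarge")

print(predictor.predict({"inputs": "Inferentia2 makes inference fast and cost-effective."}))
```

The same environment variables can also be passed with docker run -e ... if you run the container yourself on an inf2 instance.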

@JingyaHuang (Contributor) left a comment


LGTM, thanks @philschmid. Just left some small nits.

README.md (outdated review comment, resolved)
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
philschmid merged commit 4164089 into main on May 8, 2024
2 checks passed
philschmid deleted the inf2 branch on May 8, 2024 at 12:35