huggingface / text-generation-inference Public

Issues: huggingface/text-generation-inference

132 Open 938 Closed

Issues list

Logging has no formatting when using docker environment instead of command

#1880 opened May 11, 2024 by onel

1 of 4 tasks

Multi-Model Endpoint support in Sagemaker

#1878 opened May 10, 2024 by Najib-Haq

concurrent requests permit limit is broken

#1877 opened May 10, 2024 by oOraph

1 of 4 tasks

text generation details not working when stream=False

#1876 opened May 10, 2024 by uyeongkim

2 of 4 tasks

How to share memory among 2 GPUS for distributed inference?

#1875 opened May 10, 2024 by martinigoyanes

Automatic NUMA binding

#1874 opened May 10, 2024 by fxmarty

[Question] Onnx support in TGI

#1873 opened May 9, 2024 by Ben-Epstein

how do I adjust the logging level when launching via the docker container?

#1872 opened May 8, 2024 by bitsofinfo

2 of 4 tasks

llama3-70B-Instruct-AWQ causing CUDA error: an illegal memory access was encountered

#1871 opened May 8, 2024 by anindya-saha

4 tasks

Cannot use Inference Endpoint: UnprocessableEntityError: Error code: 422 - {'error': 'Template error: template not found', 'error_type': 'template_error'}

#1870 opened May 8, 2024 by rvoak

1 of 4 tasks

"docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -e HUGGING_FACE_HUB_TOKEN={your_token} ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard" showing error with my token id that "Unable to find image 'ghcr.io/huggingface/text-generation-inference:latest' locally latest: Pulling from huggingface/text-generation-inference docker: no matching manifest for linux/arm64/v8 in the manifest list entries. See 'docker run --help'."

#1868 opened May 7, 2024 by anushka192001

4 tasks

Use pre-built FA2, vllm, quantization kernels in the dockerfiles

#1867 opened May 7, 2024 by fxmarty

Regarding llama3-70b-instruct

#1864 opened May 6, 2024 by chintanshrinath

Mistral7b takes 4 times its size in VRAM on A100

#1863 opened May 6, 2024 by martinigoyanes

Install error encountered when installing the vllm package

#1862 opened May 6, 2024 by for-just-we

2 of 4 tasks

TGI-2.0.2 encounter "CUDA is not available"

#1861 opened May 6, 2024 by Cucunnber

2 of 4 tasks

Add Intel Arc iGPU support (Meteor Lake)

#1859 opened May 5, 2024 by sulliwane

Add grammar to chat/completions endpoint / Messages API

#1858 opened May 5, 2024 by ggbetz

Add stop_regex parameter to /generate

#1857 opened May 5, 2024 by rojas-diego

The quantized llama-3-8b-instruct-awq with TGI 1.4 can handle fewer batch requests than the standard llama-3-8b-instruct with TGI 1.4 on the same RTX 3090 with 24GB VRAM.

#1856 opened May 4, 2024 by rxsalad

2 of 4 tasks

404 for Multi-modal docs

#1853 opened May 3, 2024 by RonanKMcGovern

1 of 4 tasks

Serverless inference API endpoint fails to return logprobs via chat completions

#1852 opened May 2, 2024 by ggbetz

2 of 4 tasks

UserWarning: You are using a Backend <class 'text_generation_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0

#1847 opened May 2, 2024 by fxmarty

2 of 4 tasks

Do I need to additionally apply an inference template?

#1846 opened May 2, 2024 by Semihal

2 tasks

Unable to stop TGI after serving models

#1842 opened May 1, 2024 by ponshane

2 of 4 tasks
