Issues: huggingface/text-generation-inference
#2043 ROCm: Server error: transport error when running batch size >=2 (Falcon-11B) (opened Jun 8, 2024 by almersawi)
#2040 GPU memory not saturated using microsoft/Phi-3-small-128k-instruct (opened Jun 7, 2024 by calwoo)
#2037 RuntimeError: FlashAttention only supports Ampere GPUs or newer. (opened Jun 7, 2024 by Ansh-Sarkar)
#2029 HuggingFaceM4/idefics2: TGI would crash when I set do_image_splitting to False (opened Jun 6, 2024 by newsbreakDuadua9)
#2025 4bit quantized model using bnb not able to inference (opened Jun 5, 2024 by arihant-neohuman)
#2012 TGI 1.4.4+ failed to serve KoboldAI/OPT-13B-Erebus which worked on 1.4.2 (opened Jun 4, 2024 by KCFindstr)
#2009 Problem of inference with Mixtral-8x7B: RuntimeError: ptxas failed with error code 2 (opened Jun 4, 2024 by EvanDufraisse)
#1999 stop param doesn't work at all for /v1/completions endpoint (opened Jun 3, 2024 by josephrocca)
#1991 Unable to load quantized commandrplus-medusa on H100 (opened Jun 1, 2024 by sdadas)
#1984 Llama3 Tokenizer Troubles: All added_tokens unrecognized, given id of None (opened May 30, 2024 by Dtphelan1)
#1977 [Feature]: Additional metrics to enable better autoscaling / load balancing of TGI servers in Kubernetes (opened May 29, 2024 by EandrewJones)