Issues: huggingface/text-generation-inference
#2043 ROCm: Server error: transport error when running batch size >=2 (Falcon-11B) (opened Jun 8, 2024 by almersawi)
#2040 GPU memory not saturated using microsoft/Phi-3-small-128k-instruct (opened Jun 7, 2024 by calwoo)
#2037 RuntimeError: FlashAttention only supports Ampere GPUs or newer. (opened Jun 7, 2024 by Ansh-Sarkar)
#2029 HuggingFaceM4/idefics2: TGI would crash when I set do_image_splitting to False (opened Jun 6, 2024 by newsbreakDuadua9)
#2025 4bit quantized model using bnb not able to inference (opened Jun 5, 2024 by arihant-neohuman)
#2012 TGI 1.4.4+ failed to serve KoboldAI/OPT-13B-Erebus which worked on 1.4.2 (opened Jun 4, 2024 by KCFindstr)
#2009 Problem of inference with Mixtral-8x7B: RuntimeError: ptxas failed with error code 2 (opened Jun 4, 2024 by EvanDufraisse)
#1999 stop param doesn't work at all for /v1/completions endpoint (opened Jun 3, 2024 by josephrocca)
#1991 Unable to load quantized commandrplus-medusa on H100 (opened Jun 1, 2024 by sdadas)
#1984 Llama3 Tokenizer Troubles: All added_tokens unrecognized, given id of None (opened May 30, 2024 by Dtphelan1)
#1977 [Feature]: Additional metrics to enable better autoscaling / load balancing of TGI servers in Kubernetes (opened May 29, 2024 by EandrewJones)