Issues: sgl-project/sglang
#5471 [RFC][Feature][Model] Add templated fallback HF transformers model backend in SRT (opened Apr 16, 2025 by XuehaiPan)
#5465 [Bug] SGLang server freezes during high-traffic periods with a 16-GPU DeepSeek v3 setup (opened Apr 16, 2025 by moqimoqidea)
#5458 [Bug] Model stays stuck in the prefill batch stage and never proceeds to decode, eventually triggering a Watchdog Timeout on the compute node (opened Apr 16, 2025 by WALLE-AI)
#5457 [Bug] Rank 0 finishes model loading, but other ranks do not; likely caused by unexpected failures (e.g., OOM) or a slow node (opened Apr 16, 2025 by Alex-9827)
#5455 [Bug] Possible prefix-matching issue affecting numerous VLMs (opened Apr 16, 2025 by KivenChen)
#5453 What is the relationship between ModelRunner and the model implementations (deepseek.py, llama.py, etc.)? (opened Apr 16, 2025 by wangzhen2271)
#5451 [Bug] sglang.bench_serving raises IndexError when the extra body includes stream_options.include_usage (opened Apr 16, 2025 by morty-zxb)
#5450 [Bug] PD disaggregation: KV transfer slows down under high concurrency (opened Apr 16, 2025 by MtFitzRoy)
#5443 [Bug] v0.4.5: NameError: name 'VLLM_AVAILABLE' is not defined in compressed_tensors.py (opened Apr 16, 2025 by kratorado)
#5441 [Bug] ValueError: Model architectures ['Glm4ForCausalLM'] are not supported for now (opened Apr 16, 2025 by chunxingque)
#5437 [Feature] Optimize SegmentPackBits (labels: high priority, speculative-decoding; opened Apr 15, 2025 by zhyncs)
#5429 Help needed: slow inference on a 2x H100x8 setup with DeepSeek-R1 (opened Apr 15, 2025 by seungduk-yanolja)
#5410 [Bug] hiradix_cache raises an exception while executing self.dec_lock_ref (opened Apr 15, 2025 by a4zhangfei)
#5409 [Bug] Auto-truncation still uses the full context length instead of (context_length - max_tokens) (opened Apr 15, 2025 by BaiMoHan)
#5407 Downloading flashinfer-python from https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python is too slow (opened Apr 15, 2025 by Huixxi)
#5402 [Bug] Llama 4 Scout outputs garbage on 2x MI300X (labels: high priority; opened Apr 15, 2025 by Bihan)
#5391 [Bug] Loading meta-llama/Llama-4-Scout-17B-16E-Instruct with flashinfer-0.2.0(+) raises an error when dtype="float16" (opened Apr 15, 2025 by liye0626)
#5389 [Bug] SGLang server fails during CUDA graph capture: flashinfer JIT requires cuda_fp8.h even with a pre-compiled wheel and --disable-cuda-graph (opened Apr 14, 2025 by brando90)
#5386 [Bug] HF_HUB_OFFLINE no longer supported in version 0.4.5 (opened Apr 14, 2025 by dcfidalgo)
#5377 [Bug] Non-deterministic outputs with a fixed seed parameter in the API server (opened Apr 14, 2025 by BaiMoHan)