Not much of an issue anymore. The minute differences are due to precision differences in floating-point operations. Downcasting from FP32 to BF16/FP16 introduces a small amount of numerical instability, which causes the engine to produce slightly different responses in otherwise deterministic settings.
This has not been an issue so far for downstream users, but I'll investigate a fix as soon as I'm able.
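For anyone curious about the precision point, here is a minimal sketch of how reduced precision can make a result depend on the order of operations. It is not taken from the engine's code: the helper names `to_fp16` and `fp16_sum` are hypothetical, and it uses only Python's standard `struct` module to round values to IEEE 754 half precision after each addition. The assumption is that different kernels or batch shapes may accumulate in a different order, which is one plausible way tiny differences can arise.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values):
    """Accumulate left to right, rounding to FP16 after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


values = [2048.0, 0.75, 0.75]

# Near 2048 the spacing between adjacent FP16 values is 2, so adding
# 0.75 to 2048 rounds straight back to 2048.
forward = fp16_sum(values)

# Adding the small values first gives 1.5, and 1.5 + 2048 = 2049.5,
# which rounds up to 2050.
backward = fp16_sum(reversed(values))

print(forward, backward)  # 2048.0 2050.0
print(sum(values))        # 2049.5 in double precision, regardless of order
```

The same three numbers sum to two different FP16 results depending on order. A gap like this between two nearly tied logits could be enough to change which token is selected, after which the generated text diverges.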
Someone has raised this issue in the vLLM project:
vllm-project/vllm#450 (comment)