Inference results from vLLM are inconsistent with HF #63

Closed
silverriver opened this issue Oct 11, 2023 · 3 comments
silverriver commented Oct 11, 2023

Someone has raised this issue in the vLLM project:

With greedy decoding, LLaMA 7B, 13B, and 30B all produce meaningful output, but the results differ from HF Transformers.

vllm-project/vllm#450 (comment)
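
For anyone who wants to reproduce the comparison, here is a minimal sketch that greedy-decodes the same prompt with HF Transformers and with vLLM and diffs the text. The checkpoint name, prompt, and fp16 dtype are placeholder choices, not anything specified in the linked report.

```python
# Hedged sketch: compare greedy decoding between HF Transformers and vLLM.
# The model name below is a placeholder; substitute the checkpoint you are testing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "huggyllama/llama-7b"  # assumption: any causal LM checkpoint works here
PROMPT = "The capital of France is"

# --- HF Transformers, greedy decoding ---
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
inputs = tokenizer(PROMPT, return_tensors="pt").to(hf_model.device)
hf_ids = hf_model.generate(**inputs, max_new_tokens=64, do_sample=False)
hf_text = tokenizer.decode(
    hf_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# --- vLLM, greedy decoding (temperature=0) ---
# Note: loading both models in one process may not fit on a single GPU;
# run the two halves separately if needed.
llm = LLM(model=MODEL, dtype="float16")
outputs = llm.generate([PROMPT], SamplingParams(temperature=0.0, max_tokens=64))
vllm_text = outputs[0].outputs[0].text

print("HF   :", hf_text)
print("vLLM :", vllm_text)
print("match:", hf_text.strip() == vllm_text.strip())
```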

@AlpinDale AlpinDale added the bug Something isn't working label Oct 12, 2023

AlpinDale commented Nov 3, 2023

Not much of an issue anymore. The minute differences are due to precision differences in floating-point operations. Downcasting operations from FP32 to BF16/FP16 introduce a slight amount of numerical instability, which causes the engine to produce slightly different responses in otherwise deterministic settings.

This has not been an issue so far for downstream users, but I'll investigate a fix as soon as I'm able.
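
To make the mechanism concrete, here is a tiny illustration (the values are made up for the example; this is not vLLM's actual kernel code): two logits that are clearly ordered in FP32 can collapse to the same value after downcasting to FP16, at which point greedy argmax is free to pick a different token.

```python
import torch

# Two "logits" that differ by far more than one FP32 ulp near 5.0 (~4.8e-7),
# but by less than one FP16 ulp near 5.0 (~0.0039). Values are illustrative.
logits_fp32 = torch.tensor([5.0000, 5.0001], dtype=torch.float32)
logits_fp16 = logits_fp32.to(torch.float16)

print(logits_fp32[0] < logits_fp32[1])  # True  -> greedy decoding picks token 1
print(logits_fp16[0] < logits_fp16[1])  # False -> both round to 5.0, so the tie
                                        #          can resolve to token 0 instead
```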

silverriver (Author) commented

Yep, this is not a big issue in most use cases.

@AlpinDale AlpinDale self-assigned this Feb 16, 2024

boluoyu commented Apr 27, 2024

> Yep, this is not a big issue in most use cases.

This is a big problem. When I use Qwen1.5, a lot of the inference results from vLLM are wrong.
