Inference results from vLLM are inconsistent with HF #63

Closed
silverriver opened this issue Oct 11, 2023 · 3 comments
silverriver commented Oct 11, 2023

Someone has raised this issue in the vLLM project:

With greedy decoding, LLaMA 7B, 13B, and 30B all produce meaningful output, but the results differ from HF Transformers.

vllm-project/vllm#450 (comment)
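
For anyone who wants to reproduce the comparison, here is a minimal sketch that greedy-decodes the same prompt with HF Transformers and with vLLM and diffs the text. The checkpoint name, prompt, and fp16 dtype are placeholder choices, not anything specified in the linked report.

```python
# Hedged sketch: compare greedy decoding between HF Transformers and vLLM.
# The model name below is a placeholder; substitute the checkpoint you are testing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "huggyllama/llama-7b"  # assumption: any causal LM checkpoint works here
PROMPT = "The capital of France is"

# --- HF Transformers, greedy decoding ---
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
inputs = tokenizer(PROMPT, return_tensors="pt").to(hf_model.device)
hf_ids = hf_model.generate(**inputs, max_new_tokens=64, do_sample=False)
hf_text = tokenizer.decode(
    hf_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# --- vLLM, greedy decoding (temperature=0) ---
# Note: loading both models in one process may not fit on a single GPU;
# run the two halves separately if needed.
llm = LLM(model=MODEL, dtype="float16")
outputs = llm.generate([PROMPT], SamplingParams(temperature=0.0, max_tokens=64))
vllm_text = outputs[0].outputs[0].text

print("HF   :", hf_text)
print("vLLM :", vllm_text)
print("match:", hf_text.strip() == vllm_text.strip())
```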

@AlpinDale AlpinDale added the bug Something isn't working label Oct 12, 2023

AlpinDale commented Nov 3, 2023

Not much of an issue anymore. The minute differences are due to precision differences in floating-point operations. Downcasting operations from FP32 to BF16/FP16 introduce a slight amount of numerical instability, which causes the engine to produce slightly different responses in otherwise deterministic settings.

This has not been an issue so far for downstream users, but I'll investigate a fix as soon as I'm able.
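
To make the mechanism concrete, here is a tiny illustration (the values are made up for the example; this is not vLLM's actual kernel code): two logits that are clearly ordered in FP32 can collapse to the same value after downcasting to FP16, at which point greedy argmax is free to pick a different token.

```python
import torch

# Two "logits" that differ by far more than one FP32 ulp near 5.0 (~4.8e-7),
# but by less than one FP16 ulp near 5.0 (~0.0039). Values are illustrative.
logits_fp32 = torch.tensor([5.0000, 5.0001], dtype=torch.float32)
logits_fp16 = logits_fp32.to(torch.float16)

print(logits_fp32[0] < logits_fp32[1])  # True  -> greedy decoding picks token 1
print(logits_fp16[0] < logits_fp16[1])  # False -> both round to 5.0, so the tie
                                        #          can resolve to token 0 instead
```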

silverriver (Author) commented

Yep, this is not a big issue in most use cases.

@AlpinDale AlpinDale self-assigned this Feb 16, 2024

boluoyu commented Apr 27, 2024

> Yep, this is not a big issue in most use cases.

This is a big problem. When I use Qwen1.5, a lot of the inference results from vLLM are wrong.
