Hi,
I think there's a bug in delphi offline mode when using the newest vLLM:
We installed delphi a few days ago and ran e2e.py in offline mode. A fresh installation of delphi pulls in vLLM==0.10.2, and we hit two problems:
- The default gpu_memory_utilization=0.9 causes an OOM crash, and the interface offers no easy way to override it.
- LLM.generate() rejects the prompt_token_ids keyword (TypeError), so the offline pipeline breaks.
We patched this by making gpu_memory_utilization a parameter in RunConfig and by changing offline.py to wrap the token IDs in prompt objects instead of passing them as a keyword, e.g. prompts = TokensPrompt.from_token_ids(prompts).
We fixed this in a forked repo https://github.com/schraderSimon/delphi/tree/main .
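For reference, a minimal sketch of the second fix, assuming vLLM 0.10.x where TokensPrompt (from vllm.inputs) is a TypedDict; the helper name is illustrative, not delphi's actual code:

```python
try:
    from vllm.inputs import TokensPrompt
except ImportError:
    # TokensPrompt is a TypedDict in vLLM, so a plain dict with the
    # same key is shape-compatible; this fallback keeps the sketch
    # importable without vLLM installed.
    def TokensPrompt(**kwargs):
        return dict(**kwargs)

def wrap_token_ids(batch_token_ids):
    """Wrap each token-ID list so it can be passed positionally to
    LLM.generate(), which no longer accepts prompt_token_ids= as a
    keyword in vLLM 0.10.2."""
    return [TokensPrompt(prompt_token_ids=ids) for ids in batch_token_ids]

# Then, instead of llm.generate(prompt_token_ids=batch, ...):
#   outputs = llm.generate(wrap_token_ids(batch), sampling_params)
```

The gpu_memory_utilization part of the patch just threads a RunConfig value through to the LLM(...) constructor, which already accepts that keyword.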
Is this the correct way to fix it, or do you recommend pinning an older vLLM version (and if so, which one)?