Is there a current way to run lm-eval against a self-hosted inference server? #1072
Labels
feature request
A feature that isn't implemented yet.
help wanted
Contributors and extra help welcome.
We are interested in trying to run lm-eval on a low-resource machine and have it talk to models on a self-hosted inference server. We are not bound to any specific inference server, but some that we are interested in are vLLM, TGI, and ray-llm.
Is there a current way to do this out of the box? It looks like the `big-refactor` branch has support for loading models with vLLM, but only in the same process as the evaluation runner (which would require GPU resources on the machine).

One option I was thinking about was to extend the `LLM` base class and implement bindings to one of our self-hosted inference servers (rough sketch of what I have in mind below), but I'm not sure that is necessary if the library already supports this capability.

Thanks for the help!
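For context, here is roughly what I was picturing: a subclass that forwards each request to an HTTP endpoint on the inference server. This is a minimal, untested sketch; the base-class import path, the method names (`generate_until` / `loglikelihood`), and the TGI-style `/generate` route are all assumptions on my part and may not match the `big-refactor` API exactly.

```python
# Untested sketch -- assumes the big-refactor base class lives at
# lm_eval.api.model.LM and expects generate_until / loglikelihood /
# loglikelihood_rolling (names may differ by branch/version).
import requests

from lm_eval.api.model import LM


class HTTPServerLM(LM):
    """Hypothetical adapter that forwards eval requests to a self-hosted server."""

    def __init__(self, base_url="http://localhost:8080", **kwargs):
        super().__init__()
        self.base_url = base_url

    def generate_until(self, instances):
        # One HTTP call per request; payload shape assumes a TGI-style /generate route.
        outputs = []
        for inst in instances:
            context, gen_kwargs = inst.args
            resp = requests.post(
                f"{self.base_url}/generate",
                json={"inputs": context, "parameters": gen_kwargs},
                timeout=300,
            )
            resp.raise_for_status()
            outputs.append(resp.json().get("generated_text", ""))
        return outputs

    def loglikelihood(self, instances):
        # Would need the server to return token logprobs for (context, continuation) pairs.
        raise NotImplementedError

    def loglikelihood_rolling(self, instances):
        raise NotImplementedError
```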