llm-judge

A faster, compatible implementation of LM-SYS FastChat's llm-judge and its derivatives

We switch to the fastest inference engine depending on the model format (see the selection sketch below):

* HF: vLLM
* AWQ: vLLM
* GPTQ: ExLlamaV2
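
In practice this amounts to checking how a local model directory is quantized and dispatching to an engine. A minimal, hypothetical sketch of that logic (the function name and detection heuristics here are illustrative, not the repo's actual code):

```python
from pathlib import Path

def pick_engine(model_path: str) -> str:
    """Pick an inference engine based on the quantization format of a local model dir."""
    p = Path(model_path)
    names = {f.name.lower() for f in p.iterdir()}
    config = (p / "config.json").read_text().lower() if (p / "config.json").exists() else ""

    if "gptq" in config or any("gptq" in n for n in names):
        return "exllamav2"   # GPTQ -> ExLlamaV2
    if "awq" in config or any("awq" in n for n in names):
        return "vllm"        # AWQ -> vLLM (loads AWQ checkpoints natively)
    return "vllm"            # plain HF weights -> vLLM
```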

Install

We use mamba (see its install instructions) to set up the environment:

mamba create -n llm-judge python=3.11
mamba activate llm-judge
git clone https://github.com/AUGMXNT/llm-judge
cd llm-judge

For Qwen models, install the Qwen-specific requirements and Qwen's fused layer_norm kernel:

pip install -r requirements.qwen.txt
pip install csrc/layer_norm

TODO

# 4090+3090:
* Original FastChat implementation: 8h for a 13B model... way too slow

# V0: Just faster inference
[x] Rip out the original generation loop and use vLLM first (see the sketch below)
* real    16m35.112s
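
The vLLM path boils down to one batched generate call instead of a per-prompt loop. A minimal sketch, assuming a local HF-format model (the model path, prompts, and sampling settings are placeholders; prompt templating and answer parsing are omitted):

```python
from vllm import LLM, SamplingParams

# Placeholder model path; real runs build prompts from the benchmark questions.
llm = LLM(model="models/my-13b-model")
params = SamplingParams(temperature=0.7, max_tokens=1024)

prompts = [
    "Compose an engaging travel blog post about a recent trip to Hawaii.",
    "Explain what a judge model does in pairwise evaluation.",
]
outputs = llm.generate(prompts, params)   # single batched call
for out in outputs:
    print(out.outputs[0].text)
```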


# AutoAWQ vs vLLM
* vLLM uses more memory than it should
* How does the speed compare? https://github.com/casper-hansen/AutoAWQ
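
One likely factor in the memory gap: vLLM reserves a large fraction of VRAM up front for its KV cache, and that reservation can be capped. A sketch of loading an AWQ model with a lower cap (the 0.70 value and model path are illustrative):

```python
from vllm import LLM

llm = LLM(
    model="models/my-13b-awq",        # placeholder AWQ checkpoint
    quantization="awq",
    gpu_memory_utilization=0.70,      # lower this to shrink the up-front KV-cache reservation
)
```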

# Add GGUF support
[ ] llama-cpp-python (sketch below)
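
If GGUF support lands via llama-cpp-python, the call would look roughly like this (a sketch; the model file and settings are placeholders):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/my-13b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=-1,                          # offload all layers to GPU
    n_ctx=4096,
)
out = llm("Q: Summarize what an LLM judge does.\nA:", max_tokens=256, temperature=0.7)
print(out["choices"][0]["text"])
```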

# Batching
First pass:
[x] Organize queries by temperature (see the sketch below)

We should actually thread our queries, since we have multi-turn conversations to deal with.
We can easily batch the choices together (but maybe shouldn't, for seeding purposes).

That level of batching is a real PITA which we don't need.
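
A minimal sketch of the temperature grouping, so that each batch shares one set of sampling parameters (the question fields and the category-to-temperature map are assumptions, not the repo's actual data structures):

```python
from collections import defaultdict

# Illustrative category -> temperature map; real values depend on the benchmark config.
TEMP_BY_CATEGORY = {"writing": 0.7, "math": 0.0, "reasoning": 0.0}

def group_by_temperature(questions):
    """Group question dicts so each batch can be generated with one temperature."""
    batches = defaultdict(list)
    for q in questions:
        batches[TEMP_BY_CATEGORY.get(q["category"], 0.7)].append(q)
    return batches  # {temperature: [questions, ...]}
```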

# Better UI
[ ] InquirerPy - prompt for anything missing, let you pick options, then generate the command line for batching or run directly (sketch below)
[ ] Add config files
[ ] Look at https://github.com/AUGMXNT/gpt4-autoeval
[ ] Run logging
[ ] Run autoresume
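
An InquirerPy picker for the missing options might look something like this (purely a sketch; the model choices and the gen_answers.py entry point are placeholders, not actual files in this repo):

```python
from InquirerPy import inquirer

model = inquirer.select(
    message="Which model do you want to evaluate?",
    choices=["models/my-13b-model", "models/my-13b-awq", "models/my-13b-gptq"],
).execute()
run_now = inquirer.confirm(message="Run now instead of just printing the command?").execute()

cmd = f"python gen_answers.py --model {model}"  # hypothetical entry point
if run_now:
    print(f"running: {cmd}")   # would exec the command here
else:
    print(cmd)                 # emit for a batch script
```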
