Implementing local OpenAI API-style chat completions on any given inference server #1174
Conversation
@haileyschoelkopf - any thoughts or opinions on whether adding a new class is the way to go here would be appreciated. I had two thoughts:
I decided to go with the first approach for now.
Happy to dispute / discuss this, but although the current approach is alright, I think we can minimize duplicated code by simply implementing arbitrary `base_url` support in the actual OpenAI LM classes. Let me know what you think!

We could have an assertion that `base_url` is provided if `OPENAI_API_KEY` is not provided (in the case of a non-self-hosted server, we may still want an API key to be provided alongside a unique `base_url` value).
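A minimal sketch of what that check might look like in the constructor (the class shape and argument names here are illustrative assumptions, not the harness's actual code):

```python
import os


class OpenAIChatCompletionsLM:
    """Illustrative sketch: accept an arbitrary base_url and only require
    OPENAI_API_KEY when no custom endpoint is supplied."""

    def __init__(self, model: str, base_url=None, **kwargs):
        api_key = os.environ.get("OPENAI_API_KEY")
        # Hitting the real OpenAI endpoint requires a key; a self-hosted
        # server can accept a dummy key such as "EMPTY".
        if api_key is None and base_url is None:
            raise ValueError(
                "Set OPENAI_API_KEY or provide base_url for a local, "
                "OpenAI-compatible inference server."
            )
        self.model = model
        self.base_url = base_url
        self.api_key = api_key or "EMPTY"
```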
As far as the HF tokenizer usage, I think having a kwarg for `tokenizer_backend: Literal["tiktoken", "huggingface"]` might be alright for user experience? I think we mostly only need the tokenizer's EOS/EOT/EOD token, as well as the tokenized length of each input for the purpose of sorting requests long-to-short (we could probably remove `_encode_pair()`).
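For illustration, here is one way such a `tokenizer_backend` kwarg could be wired up (a sketch only; the function name and return shape are assumptions, and it presumes `tiktoken` and `transformers` are installed):

```python
from typing import Literal


def build_tokenizer(
    model: str,
    tokenizer_backend: Literal["tiktoken", "huggingface"] = "tiktoken",
):
    """Return (encode_fn, eot_token_id) for the chosen backend.

    Only the EOS/EOT token and encoded input lengths (for long-to-short
    sorting) are really needed by the chat-completions path.
    """
    if tokenizer_backend == "tiktoken":
        import tiktoken

        enc = tiktoken.encoding_for_model(model)
        return enc.encode, enc.eot_token
    elif tokenizer_backend == "huggingface":
        from transformers import AutoTokenizer

        tok = AutoTokenizer.from_pretrained(model)
        return tok.encode, tok.eos_token_id
    else:
        raise ValueError(f"Unknown tokenizer_backend: {tokenizer_backend}")
```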
We should factor out the code duplication where possible since we're subclassing the `OpenAIChatCompletionsLM` class, but I think this makes sense.
Sorry, I realize some of these comments are regarding the existing OAI implementations, so may not be immediately necessary to address in this PR!
Yep, this makes sense! I'm always for less code to maintain. When I was initially thinking about this, my assumption was that eventually we'd want to abstract out OpenAI and have the generic endpoint class not be tied to any vendor-specific implementation, so this was my attempt at starting to keep that logic separate. In the changes you suggested, would you as the user still call it the same way?

Maybe the first step could be implementing your suggestion and just getting that working, and the next scope of work could be to create a class independent of the OpenAI API - does that seem reasonable, or outside of what you were envisioning? Or maybe it's more effort than necessary to reimplement it in a generic way for local models? Let me know what you think.
I think we can add a secondary model name
I think that for the time being, if companies or open-source libraries support an OpenAI-mirroring interface for their API, it's reasonable to assume they'll continue to mirror OpenAI's API in the future (at least for a good while, or for as long as OpenAI remains the dominant provider such that it's beneficial to use their interface to minimize user friction in switching to other providers).
Sounds like a plan! If you are willing to do this (allowing the OpenAI implementations to take `base_url` and an HF tokenizer) for OpenAI's Completions API as well in this PR, that'd be awesome, but no worries if you just want to commit to the ChatCompletions model. I'm not necessarily opposed to reimplementing some more generic API class (in particular, I think Llama-CPP/GGUF, which we currently support separately, can be merged into the same class as this extended OpenAI one), but would probably prefer to only do this if there is an existing standard that providers have already converged on. I think innovating our own abstraction or model API interface would be out of scope for this project.
Yep, sounds good, will add this to the PR. Thanks for the discussion and clarification!
Ok, I think this is good to go 🤞 - one last question: when we pass in an incorrect arg, for example, do we want to do any error handling before that, or is this standard behavior across the library?
Thanks very much, @veekaybee ! I left a couple very minor nits, once those are resolved this can be good to go.
There's also the Completions API but either that can be handled in another PR or I can port the ChatCompletions changes up to that one myself.
@haileyschoelkopf Ok for me to merge this, or do you generally merge?
Okay to merge! EDIT: oops, looks like it was blocked because a conversation wasn't resolved yet; resolved it!
Thanks very much @veekaybee for all your work on this!
Thanks so much for your patience on this @haileyschoelkopf!! 👏
How do I do this?

Then I get an error.
Hey! Yeah, this is the intended usage (for chat models at the moment; Completions support coming soon!). You can eval generative tasks. This is currently the result because OpenAI's ChatCompletions didn't support logits until very recently. Will try to get support for those in ASAP!
Ok, thank you for the update!
…erence server (EleutherAI#1174)
* LocalChatCompletionsLM add
* clean up completions class
* clean up completions class
* update tokens
* README
* fix constructor
* eos token
* folding local-chat-completions into OpenAIChatCompletions
* refactoring to include gen_kwargs as passable option
* add todo on chat completion kwarg validation
* Ruff and README fix
* generalize to **kwargs
* remove unnecessary kwargs
* README and remove kwargs
* README
Hi @veekaybee, thanks for your contribution! I'm now testing your command, but the output has an unexpected value; in the description of this PR, you've reported the same table, also with that value. I'll try with a larger Llama model later, but wanted to check with you in case you've spotted the same. Thanks!
Hi all, it appears that this local chat completions PR doesn't actually function. Each benchmark I try (non-logit-based) fails and never hits the actual server. Instead I get a 200 OK in the log and the GPUs never spool up. Can someone confirm that this merge works?
This PR addresses this issue:
#1072 (comment)
by passing a `base_url` to a new class, `LocalChatCompletionsLM`, which inherits from `OpenaiChatCompletionsLM`, accepts a local HuggingFace-style model name, and uses TikToken with modifications to pass HuggingFace model encodings. To use this, you'll need to pass `EMPTY` as your OpenAI token and hit a local inference server.

Confirmed that it works by running

lm_eval --model local-chat-completions --tasks gsm8k --model_args model=facebook/opt-125m,base_url=http://{yourip}:8000/v1

against a local vLLM inference server.
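For anyone reproducing this, one quick way to sanity-check that the local OpenAI-compatible endpoint is reachable before running the harness might look like the following (a sketch only, assuming a vLLM server exposing /v1 on port 8000 with facebook/opt-125m; not part of this PR):

```python
import requests

# Hypothetical sanity check: POST a single chat message to the local
# OpenAI-compatible endpoint that lm_eval will be pointed at.
base_url = "http://localhost:8000/v1"  # replace with your server's address

resp = requests.post(
    f"{base_url}/chat/completions",
    headers={"Authorization": "Bearer EMPTY"},  # dummy key for a local server
    json={
        "model": "facebook/opt-125m",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 16,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```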
Compare to the OpenAI task:
lm_eval --model openai-chat-completions --tasks gsm8k
this should also work with an OpenAI key.

Related PRs and context:
Sample response: