[Feature]: Add support for OpenAI's echo parameter. #699
Comments
is echo supported for chat completion? I only see it for completions: https://platform.openai.com/docs/api-reference/completions/object |
It appears it's only supported in the legacy completions API (which I just learned is legacy). |
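For reference, this is roughly how echo is used on the legacy completions endpoint. This is an illustrative sketch with the current openai Python client, not code from this thread; the model name is a placeholder, and some newer models restrict echo combined with logprobs.

from openai import OpenAI

client = OpenAI()

# Legacy /v1/completions call: echo=True asks the API to return the prompt
# tokens as part of the output, so with logprobs set, the prompt-token
# log-probs appear at the start of choices[0].logprobs.token_logprobs.
# Treat this as an illustration of the parameter, not a guaranteed-working call.
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # placeholder completions-capable model
    prompt="The doctor is a man",
    max_tokens=1,
    echo=True,
    logprobs=1,
)
print(response.choices[0].logprobs.token_logprobs)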
@Vinno97 @ishaan-jaff what are the next steps on this? |
I'd be willing to help, if you agree on the value this brings to LiteLLM. If this gets added, EleutherAI/lm-evaluation-harness#804 and EleutherAI/lm-evaluation-harness#869 are perhaps both solved already. |
@Vinno97 recapping my understanding:
Open Questions
|
I'm sorry for the confusion. My main point was about the echo parameter. If you run a prompt through an LLM, it inherently outputs next-token logits for every token, not only the last one. OpenAI decided to expose this information via the echo parameter (in combination with logprobs). As an example: I can send two prompts, "the doctor is a man" and "the doctor is a woman", and use the returned prompt-token log-probs to compare how likely the model considers each sentence. I'd provide an example OpenAI response if I had access at the moment, but here's a TGI response (look at the prefill field):

Prompt: "The doctor is a man"
{
"generated_text": " of",
"details": {
"finish_reason": "length",
"generated_tokens": 1,
"seed": null,
"prefill": [
{
"id": 1410,
"text": "the",
"logprob": null
},
{
"id": 5032,
"text": " doctor",
"logprob": -25.640625
},
{
"id": 304,
"text": " is",
"logprob": -2.6445312
},
{
"id": 241,
"text": " a",
"logprob": -2.8496094
},
{
"id": 546,
"text": " man",
"logprob": -4.2695312
}
],
"tokens": [
{
"id": 275,
"text": " of",
"logprob": -1.8183594,
"special": false
}
],
"top_tokens": null,
"best_of_sequences": null
}
}

Prompt: "The doctor is a woman"
{
"generated_text": ",",
"details": {
"finish_reason": "length",
"generated_tokens": 1,
"seed": null,
"prefill": [
{
"id": 1410,
"text": "the",
"logprob": null
},
{
"id": 5032,
"text": " doctor",
"logprob": -25.640625
},
{
"id": 304,
"text": " is",
"logprob": -2.6445312
},
{
"id": 241,
"text": " a",
"logprob": -2.8496094
},
{
"id": 2961,
"text": " woman",
"logprob": -3.1992188
}
],
"tokens": [
{
"id": 23,
"text": ",",
"logprob": -1.8242188,
"special": false
}
],
"top_tokens": null,
"best_of_sequences": null
}
}

Here you can see that the model I'm using actually thinks that "the doctor is a woman" is a more likely sentence than "the doctor is a man". |
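To make that concrete, here is a minimal sketch (mine, not from the thread) of how the prompt-token log-probs from an echo-style completions response could be summed to compare the two sentences. The response is assumed to be a plain dict in the OpenAI completions format; prompt_loglikelihood and score are hypothetical helpers.

def prompt_loglikelihood(response):
    # With echo=True, the first usage["prompt_tokens"] entries of
    # token_logprobs correspond to the prompt tokens. The very first one is
    # None (no conditioning context), matching the "logprob": null on the
    # first prefill token in the TGI responses above.
    n_prompt = response["usage"]["prompt_tokens"]
    prompt_lps = response["choices"][0]["logprobs"]["token_logprobs"][:n_prompt]
    return sum(lp for lp in prompt_lps if lp is not None)

# Hypothetical usage, where score(prompt) wraps a completions call made with
# echo=True and logprobs enabled:
# ll_man = prompt_loglikelihood(score("the doctor is a man"))
# ll_woman = prompt_loglikelihood(score("the doctor is a woman"))
# The higher (less negative) total is the sentence the model finds more likely.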
This PR vllm-project/vllm#959 adds support for prompt log-probs. Using this branch, I can obtain log-probs for prompt tokens. Please give it a try. |
Hey @Vinno97, we'd welcome the PR for echo - excited to see the approach! |
working on this PR |
|
it looks like lm-eval harness is not adding support for gpt-3.5-turbo since it does not return logprobs: |
I was trying to use our text_completion with the eval harness and it failed: the lm harness passes prompt as a list, so we need to add support for this.
|
fixed here: b4e14ae |
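As an illustration only (not the actual fix in b4e14ae), handling a prompt that may be a string or a list of strings could look like this; normalize_prompts is a hypothetical helper.

def normalize_prompts(prompt):
    # lm-eval harness may pass `prompt` as a list of strings rather than a
    # single string; accept both and always return a list.
    if isinstance(prompt, str):
        return [prompt]
    return list(prompt)

Each prompt can then be completed individually and the resulting choices re-assembled into one response, with "index" preserving the original order.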
It looks like the llm eval harness expects an OpenAI-style completion response with logprobs, like the one below. Current issues:
"choices": [
{
"text": "on Guardian you get:\n\n1. Secure",
"index": 0,
"logprobs": {
"tokens": [
"on",
" Guardian",
" you",
" get",
":",
"\n",
"\n",
"1",
".",
" Secure"
],
"token_logprobs": [
-3.7846956,
-12.922583,
-2.2359743,
-3.0041907,
-2.0863824,
-0.029573089,
-0.013009035,
-1.3277724,
-0.06319551,
-1.4571579
],
"top_logprobs": [
{
"ac": -2.6180239,
"acey": -3.0217085,
"usted": -3.2943392,
"im": -3.4510107,
"ish": -3.5101204
},
{
",": -1.683592,
"\n": -3.2098136,
"bytes:\\xe2\\x80": -3.2249804,
"Wallet": -3.2496285,
" Legacy": -3.2982492
},
{
" you": -2.2359743,
"\n": -1.2495747,
",": -1.2551193,
" and": -3.8073368,
"bytes:\\xe2\\x80": -4.5486817
},
{
" get": -3.0041907,
" can": -0.5294326,
" will": -2.023661,
" are": -2.8523924,
" have": -3.0540316
},
{
":": -2.0863824,
" a": -1.7326131,
"\n": -1.8805203,
" the": -1.9610744,
" access": -2.7664504
},
{
"\n": -0.029573089,
" ": -4.1331725,
"\n\n": -4.616098,
"</": -7.5522456,
" ": -7.807361
},
{
"\n": -0.013009035,
"-": -5.246068,
"*": -6.2495985,
" ": -6.570405,
" \u00a7\u00a7": -6.9500294
},
{
"1": -1.3277724,
"-": -0.9693186,
"\u2022": -1.1041319,
"*": -4.3544083,
"T": -5.119253
},
{
".": -0.06319551,
")": -2.8080018,
" -": -7.942205,
"-": -8.040881,
" ": -9.436379
},
{
" Secure": -1.4571579,
" Security": -1.6510513,
" A": -1.8732746,
" Enhanced": -2.6642444,
" Increased": -3.5049694
}
],
"text_offset": [
7,
9,
18,
22,
26,
27,
28,
29,
30,
31
]
},
"finish_reason": "length"
}
],
|
Since we already read and translate the chat completions output in the text completions endpoint, can't we just do the same for logprobs? @ishaan-jaff |
Added support for transformed logprobs for TGI LLMs:

{
"id":"chatcmpl-8e87a54f-5cf7-401f-8ff4-e5d32c20c41a",
"object":"text_completion",
"created":1698797307.028908,
"model":"bigcode/starcoder",
"choices":[
{
"text":", I'm going to make you a sand",
"index":0,
"logprobs":{
"tokens":[
",",
" I",
"'m",
" going",
" to",
" make",
" you",
" a",
" s",
"and"
],
"token_logprobs":[
-2.2285156,
-2.734375,
-2.0957031,
-2.0917969,
-0.09429932,
-3.1132812,
-1.3203125,
-1.2304688,
-1.6201172,
-0.010292053
]
},
"finish_reason":"length"
}
],
"usage":"<Usage at 0x1231fd210> JSON":{
"completion_tokens":9,
"prompt_tokens":2,
"total_tokens":11
}
} |
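For context, here is a rough sketch of how a TGI response's details["prefill"] and details["tokens"] could be translated into the OpenAI-style tokens / token_logprobs lists shown above. The assumptions are mine; this is not the actual LiteLLM implementation.

def tgi_to_openai_logprobs(details, echo=False):
    # TGI returns prompt tokens under details["prefill"] and generated tokens
    # under details["tokens"]; OpenAI-style completions expose flat
    # "tokens" / "token_logprobs" lists instead.
    entries = list(details["tokens"])
    if echo:
        entries = list(details["prefill"]) + entries
    return {
        "tokens": [e["text"] for e in entries],
        "token_logprobs": [e["logprob"] for e in entries],
    }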
This is done: we added support for echo for HF TGI LLMs. Here's how you can use it, @Vinno97:

from litellm import text_completion

response = text_completion(
    model="huggingface/bigcode/starcoder",
    prompt="good morning",
    max_tokens=10,
    logprobs=10,
    echo=True,
)

Here's the response; you can see the input prompt tokens included in the log probs:

{
"id":"chatcmpl-3fc71792-c442-4ba1-a611-19dd0ac371ad",
"object":"text_completion",
"created":1698801125.936519,
"model":"bigcode/starcoder",
"choices":[
{
"text":", I'm going to make you a sand",
"index":0,
"logprobs":{
"tokens":[
"good",
" morning",
",",
" I",
"'m",
" going",
" to",
" make",
" you",
" a",
" s",
"and"
],
"token_logprobs":[
"None",
-14.96875,
-2.2285156,
-2.734375,
-2.0957031,
-2.0917969,
-0.09429932,
-3.1132812,
-1.3203125,
-1.2304688,
-1.6201172,
-0.010292053
]
},
"finish_reason":"length"
}
],
"usage":{
"completion_tokens":9,
"prompt_tokens":2,
"total_tokens":11
}
} |
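As a small illustrative follow-up (my own sketch, not part of the thread), a caller could split that output into prompt vs. generated log-probs like this. It assumes the response can be indexed like the printed JSON above; note the first prompt token's log-prob is rendered as "None" and has to be skipped.

# With echo=True, the first usage["prompt_tokens"] entries of token_logprobs
# belong to the prompt; the rest belong to the generated completion.
logprobs = response["choices"][0]["logprobs"]
n_prompt = response["usage"]["prompt_tokens"]

prompt_lps = logprobs["token_logprobs"][:n_prompt]      # prompt part
generated_lps = logprobs["token_logprobs"][n_prompt:]   # generated part

# The first prompt token's log-prob is shown as "None", so keep only numeric
# entries when summing.
prompt_ll = sum(lp for lp in prompt_lps if isinstance(lp, (int, float)))
print("prompt log-likelihood:", prompt_ll)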
docs on how to do this too: |
The Feature
Motivation, pitch
from @Vinno97