Making /v1beta/chat/completions streaming output compatible with openai #1076

Closed
wsxiaoys opened this issue Dec 19, 2023 · 3 comments · Fixed by #1094
Labels
enhancement New feature or request

Comments

@wsxiaoys
Member

Please describe the feature you want

Currently, /v1beta/chat/completions generates streaming output as JSON lines, like the following:

{"content":" In"}
{"content":" Python"}
{"content":","}
{"content":" you"}
{"content":" can"}
{"content":" convert"}
{"content":" a"}
{"content":" list"}
{"content":" of"}
{"content":" strings"}
{"content":" to"}
{"content":" numbers"}
{"content":" using"}
{"content":" the"}
{"content":" `"}
{"content":"map"}
{"content":"()"}
{"content":"`"}
{"content":" function"}
{"content":" and"}
{"content":" the"}
{"content":" `"}
{"content":"int"}
{"content":"()"}
{"content":"`"}
{"content":" function"}
{"content":"."}
{"content":" Here"}
{"content":"'"}
{"content":"s"}
{"content":" an"}
{"content":" example"}
{"content":":"}

We'd like to make the response format compatible with OpenAI's text/event-stream streaming responses.
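
For reference, OpenAI streams chat completions as text/event-stream events carrying chat.completion.chunk objects, roughly like this (fields abbreviated):

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" In"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Python"},"finish_reason":null}]}

data: [DONE]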

Additional context

Discuss in slack: https://tabbyml.slack.com/archives/C05CWLZ0Y85/p1701451409878009

Code Location: https://github.com/TabbyML/tabby/blob/main/crates/tabby/src/routes/chat.rs#L39

llama.cpp's server example on text/event-stream: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/server.cpp#L2775
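
A minimal sketch of how the handler in chat.rs could emit an OpenAI-style text/event-stream using axum's SSE support (handler name and token source are hypothetical, not the actual Tabby code):

use axum::response::sse::{Event, KeepAlive, Sse};
use futures::stream::{self, Stream};
use serde_json::json;
use std::convert::Infallible;

// Hypothetical handler: wraps model tokens into OpenAI-style
// chat.completion.chunk payloads and streams them as text/event-stream.
async fn chat_completions_stream() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    // Stand-in for tokens coming out of the model.
    let tokens = vec![" In", " Python", ","];

    let events = stream::iter(tokens.into_iter().map(|tok| {
        let chunk = json!({
            "object": "chat.completion.chunk",
            "choices": [{ "index": 0, "delta": { "content": tok }, "finish_reason": null }]
        });
        Ok(Event::default().data(chunk.to_string()))
    }));

    Sse::new(events).keep_alive(KeepAlive::default())
}

A real implementation would also append a final data: [DONE] event after the last chunk, as OpenAI's stream does.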

@wsxiaoys wsxiaoys added the enhancement New feature or request label Dec 19, 2023
@heurainbow

heurainbow commented Dec 21, 2023

The implementation should consider the following common cases (within a max output token limit):
1. stopping after a maximum number of generated lines, say 3
2. completing a function, using a clear return signal appropriate to the language
3. special tokens
It would be better to directly support a vLLM backend, as discussed in #795.
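
For illustration only, a stop check covering these cases might look roughly like this (all names and heuristics are made up, not Tabby's actual logic):

// Hypothetical stop conditions for a streamed completion.
struct StopPolicy {
    max_lines: usize,          // e.g. stop after 3 generated lines
    stop_tokens: Vec<String>,  // special tokens such as "</s>"
}

impl StopPolicy {
    fn should_stop(&self, generated: &str, last_token: &str) -> bool {
        // 1. A maximum number of generated lines.
        if generated.lines().count() >= self.max_lines {
            return true;
        }
        // 2. A language-specific "function finished" signal,
        //    e.g. a return statement on a new line.
        if generated.contains("\nreturn ") {
            return true;
        }
        // 3. Special tokens emitted by the model.
        self.stop_tokens.iter().any(|t| t == last_token)
    }
}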

@brian316

Can we set up OpenAI chat with Tabby? The docs don't show how: https://tabby.tabbyml.com/docs/administration/model/#chat-model

@wsxiaoys
Member Author

Hi - the HTTP backend can be configured as follows to use OpenAI chat:

[model.chat.http]
kind = "openai/chat"
model_name = "<model name>"
api_endpoint = "https://api.openai.com/v1"
api_key = "secret-api-key"
