Streaming support? #345

Closed
ProjCRys opened this issue Nov 7, 2023 · 4 comments · Fixed by #1262
Labels
roadmap Planned features

Comments

@ProjCRys

ProjCRys commented Nov 7, 2023

This could be a roadmap item: text output should stream as the LLM generates the message or thought. One use case I can think of is TTS with a shorter response time (the TTS would speak each sentence as it is generated).

This would require refactoring a lot of MemGPT's code, since the LLM generally has to output JSON, but I think it could be solved by splitting the work across agents: one handles the thought, one handles the message (both could use streaming output), and the other handles function calling (which doesn't necessarily need streaming text as an output).

This would also make it easier for developers to build a GUI that shows users the LLM's output live.
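
For illustration, here is a minimal sketch of the sentence-by-sentence TTS idea on top of an OpenAI-compatible streaming endpoint. The base_url, model name, and the speak() helper are placeholders, not part of MemGPT; swap in whatever TTS engine you use.

# Sketch: speak each sentence as soon as it has fully streamed in.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def speak(sentence: str) -> None:
    # Placeholder: hand the sentence to your TTS engine here.
    print(f"[TTS] {sentence}")

buffer = ""
stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if not delta:
        continue
    buffer += delta
    # Flush every complete sentence so the TTS can start speaking early.
    while True:
        match = re.search(r"[.!?]\s", buffer)
        if not match:
            break
        sentence, buffer = buffer[:match.end()], buffer[match.end():]
        speak(sentence.strip())
if buffer.strip():
    speak(buffer.strip())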

@cpacker
Owner

cpacker commented Dec 2, 2023

This is definitely on the roadmap - it's a little tricky due to how we use structured outputs, but it's possible.
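
To make the "tricky because of structured outputs" point concrete, here is one possible approach (a sketch only, not MemGPT's actual implementation): accumulate the streamed raw text and surface the partial value of a message field as soon as it starts appearing in the not-yet-complete JSON. The endpoint, model name, and the "message" field name are illustrative assumptions.

# Sketch: surfacing text from a streamed *structured* (JSON) response.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": 'Reply as JSON: {"thought": "...", "message": "..."}'}],
    stream=True,
)

raw = ""      # everything received so far (partial JSON)
printed = 0   # how many characters of the "message" value we've already shown
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if not delta:
        continue
    raw += delta
    # Pull out whatever part of the "message" string value has arrived so far.
    # (Escape sequences are left as-is for simplicity.)
    match = re.search(r'"message"\s*:\s*"((?:[^"\\]|\\.)*)', raw)
    if match:
        partial = match.group(1)
        print(partial[printed:], end="", flush=True)
        printed = len(partial)
print()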

@cpacker cpacker added the roadmap Planned features label Dec 2, 2023
@renatokuipers

renatokuipers commented Dec 15, 2023

If you take a look at (for example) LM Studio, there is a little snippet in there that does real-time text streaming:

# Chat with an intelligent assistant in your terminal
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

history = [
    {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]

while True:
    completion = client.chat.completions.create(
        model="local-model", # this field is currently unused
        messages=history,
        temperature=0.7,
        stream=True,
    )

    new_message = {"role": "assistant", "content": ""}
    
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)
    
    # Uncomment to see chat history
    # import json
    # gray_color = "\033[90m"
    # reset_color = "\033[0m"
    # print(f"{gray_color}\n{'-'*20} History dump {'-'*20}\n")
    # print(json.dumps(history, indent=2))
    # print(f"\n{'-'*55}\n{reset_color}")

    print()
    history.append({"role": "user", "content": input("> ")})

In particular, note this part:

    new_message = {"role": "assistant", "content": ""}
    
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content

    history.append(new_message)

Maybe this is a good starting point for getting this implemented in MemGPT.

I was already looking into it myself, but I can't seem to figure it out on my own, I'm afraid...

@gavsgav

gavsgav commented Dec 18, 2023

I have also played around with streaming text. Each LLM server has a slightly different approach to this, but the for loop over chunks is key to all of them. I think the best way to figure it out for each server is to experiment with a standalone script first: follow the relevant server's docs, and once it's confirmed working, test it with MemGPT (see the sketch below).
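
As a standalone smoke test that doesn't depend on any particular client library's streaming wrapper, here is a sketch that reads the server-sent-event stream directly from an OpenAI-compatible /v1/chat/completions endpoint. The URL and model name are placeholders; adjust them for your backend (LM Studio, vLLM, llama.cpp server, etc.).

# Standalone streaming smoke test for an OpenAI-compatible server (sketch).
import json
import requests

URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": True,
}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        if not text.startswith("data: "):
            continue
        data = text[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)
print()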

@cpacker cpacker added roadmap Planned features and removed roadmap Planned features labels Feb 6, 2024
@spjcontextual

I have a similar issue here with vLLM. For now my workaround might just be to wait for MemGPT to finish a full generation and then use a fake delay that iterates over the assistant_message output and streams it back to my client.
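
A sketch of that "fake streaming" workaround: once the full assistant_message is available, replay it to the client in small chunks with a short delay. The chunk size and delay values are arbitrary and only illustrative.

# Sketch: fake streaming by replaying a completed message in small chunks.
import time
from typing import Iterator

def fake_stream(assistant_message: str, chunk_size: int = 8, delay: float = 0.02) -> Iterator[str]:
    """Yield the finished message a few characters at a time."""
    for i in range(0, len(assistant_message), chunk_size):
        yield assistant_message[i:i + chunk_size]
        time.sleep(delay)

# Example: print the chunks as a client would receive them.
for piece in fake_stream("Hello! This message was generated all at once."):
    print(piece, end="", flush=True)
print()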

Projects
Status: Done

5 participants