Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple response streaming #14

Closed
neon-sunset opened this issue May 22, 2023 · 6 comments · Fixed by #18
Closed

Simple response streaming #14

neon-sunset opened this issue May 22, 2023 · 6 comments · Fixed by #18

Comments

@neon-sunset
Copy link
Collaborator

neon-sunset commented May 22, 2023

Problem Statement

As of now, it can take more than 60 seconds to get final response if the user query requires BingChat to search for information or conduct any other intermediate actions

Proposed Change

Adjust API surface to allow the user to either stream intermediate response state or opt in to await the final response.

Discussion

Because Bing Chat implementation is world-class, it does not bother with trivialities such as efficiency, saving bandwidth or having an event type to send response parts over websocket (and cancel previous output where BC refuses to honor user prompt) and instead just re-sends the whole thing instead (see below).

This creates an issue where we can't just expose IAsyncEnumerable<string> because it can confuse the users of the library by sending the same message with extra words appended, making for cumbersome user experience. Thus, I'm creating this issue for feedback while I'm trying out a couple of prototypes to encapsulate the response into a public type ConversationResponse (placeholder) so that the users can either explicitly .GetCompletedResponse() or .ReadIntermediateMessages() (also placeholders).

Ideally, I want to make it more user-friendly than what WSS API exposes by detecting whether we're still receiving the continuations of the previous response message or if BingChat decided to completely rewrite it so that the user can just Console.Write(msg) until it is done.

Let me know what you think.

Example WSS output (cut out for brevity):

t":"Generating answers for you...","hiddenText":"Generating answers for you...","
18:32:33.849
{"type":1,"target":"update","arguments":[{"cursor":{"j":"$['cb8f7fa8-4055-4452-9a
18:32:33.858
{"type":1,"target":"update","arguments":[{"messages":[{"text":"The","author":"bot
18:32:33.864
{"type":1,"target":"update","arguments":[{"messages":[{"text":"The weather","auth
18:32:33.864
{"type":1,"target":"update","arguments":[{"messages":[{"text":"The weather in","a
18:32:33.864
{"type":1,"target":"update","arguments":[{"messages":[{"text":"The weather in Ky"
@Caulm
Copy link
Contributor

Caulm commented May 23, 2023

Thanks for creating the discussion. I am also concerned about the problem of waiting too long for a complete answer.

I agree with the idea @neon-sunset provided, which allowing users to explicitly choose to receive a complete response or intermediate message. Here is my understanding. GetCompletedResponse is an asynchronous method, which users should wait for a complete answer to be returned. ReadIntermediateMessages is a synchronization method that users may call as many times as they wish to update the latest status.

I wonder if we can combine the two together. Based on the existing IBingChattable interface, add an optional parameter as a callback function to the AskAsync method. The definition is as follows

public interface IBingChattable
{
    /// <summary>
    /// Ask for a answer.
    /// </summary>
    /// <param name="message">Question content</param>
    /// <param name="callback">Callback function for streaming response</param>
    Task<string> AskAsync(string message, Action<string>? callback = null);
}

In this way, user provides a callback function, and the client passes the intermediate message (containing the previous part) to the user. Then user decides how to make updates, either by overwriting previous contents, or picking out the new parts and writing it.

Example codes for different users are as follows

//For single response
string answer = await conversation.AskAsync(prompt);
Console.WriteLine(answer);

//For streaming response - example 1
int lastCursorTop = Console.CursorTop;
string answer = await conversation.AskAsync(prompt, (string message) =>
{
    //update
    Console.SetCursorPosition(0, lastCursorTop);
    Console.Write(message);
});
//final answer
Console.SetCursorPosition(0, lastCursorTop);
Console.WriteLine(answer);

//For streaming response - example 2
int lastCursorTop = Console.CursorTop;
int lastMessageLength = 0;
string answer = await conversation.AskAsync(prompt, (string message) =>
{
    //update
    Console.Write(message[lastMessageLength..]);
    lastMessageLength = message.Length;
});
//final answer
Console.SetCursorPosition(0, lastCursorTop);
Console.WriteLine(answer);

What do you think of this?

@neon-sunset
Copy link
Collaborator Author

neon-sunset commented May 23, 2023

Ah, I think I did not explain it in full detail. The intention is to expose an object that represents a response state but it won't have any synchronization methods - it will be internally filled by websocket listener. The user will have the ability to either stream the intermediate messages with IAsyncEnumerable<string>-returning method or simply wait for the complete response instead .GetCompletedResponse().

As for JS-style callbacks, I'm strictly opposed to their usage in C# unless absolutely necessary and would prefer to keep AskAsync as simple as possible. The design does not assume whether it is used in CLI or as a component of mobile application, or anywhere else.

Ideally, it could be used as follows:

var response = client.Ask("Latest stock price of NVDA?");
await foreach (var msg in response)
{
   // Append msg to however the response is displayed to the user
}

var completed = await response.GetFullResponse();
// Do whatever is appropriate to the context, like replacing previous output, doing nothing or something else entirely

with API probably (would like to know your point of view) looking like this:

class BingChatClientResponse : IAsyncEnumerable<string> // Shorter name?
{
   public Task<string> GetFullResponse(CancellationToken ct = default);
   
   public IAsyncEnumerator<string> GetAsyncEnumerator(CancellationToken ct = default);
}

This would also allow extending this class in the future when adding support for generating images, enumerating references used or providing other kinds of rich metadata.

@bsdayo
Copy link
Owner

bsdayo commented May 23, 2023

Maybe we can introduce another API?

For example:

public Task<BingChatResponse> SendAsync(string message);


class BingChatResponse : IAsyncEnumerable<string>
{
    public IAsyncEnumerator<string> GetAsyncEnumerator();

    public Task<string> GetCompletedResponseAsync();
}

Then the AskAsync can simply be a wrapper for SendAsync.

@neon-sunset
Copy link
Collaborator Author

neon-sunset commented May 23, 2023

Thank you. Yes, I agree with this idea.

One question: why append Async suffix even though there is no synchronous alternative (i.e. SendAsync is the same the HttpClient has, but then GetCompletedResponseAsync is uhh, a long method name? :D

Anyways, I'll get back to working on this, will submit a PR soon.

@bsdayo
Copy link
Owner

bsdayo commented May 23, 2023

Thank you. Yes, I agree with this idea.

One question: why append Async suffix even though there is no synchronous alternative (i.e. SendAsync is the same the HttpClient has, but then GetCompletedResponseAsync is uhh, a long method name? :D

Anyways, I'll get back to working on this, will submit a PR soon.

Then maybe GetFullResponseAsync, GetCompletedAnswerAsync or GetFullAnswerAsync? :D

I think we should add the Async suffix to fit the .NET naming convention. I think the reasons are as follows:

  • Microsoft's documentation clearly states that publicly exposed asynchronous methods in .NET need to have the Async suffix.
  • Having the Async suffix clearly reflects that this is an asynchronous method without causing confusion, as this is the default naming convention for most official, or third-party libraries.
  • Even though we don't provide a synchronous method directly, users can write their own extension method to convert it to synchronous. This extension method can then be simply named without Async. As an example, Microsoft.Extensions.Hosting does this (IHost.StartAsync() and HostExtensions.Start()).

Anyways look good to me, thanks!

@neon-sunset
Copy link
Collaborator Author

neon-sunset commented May 23, 2023

(this is a tangent and a personal opinion)

Microsoft themselves do not follow the convention often. It is also outdated (pay attention the examples that start with events) and does not reflect modern state of the language. It was initially introduced in the early days of C# when a lot of code was synchronous and/or the distinction was necessary for methods that exposed both async and sync paths.

However, a lot of new code that deals with IO or just long-running operations often is async-only, and it is at this point strange to expect the developer to be unfamiliar with Task return type, especially given the popularity of the pattern in other languages such as JS/TS (Future/Promise), Rust (Future) and even Python (which too supposedly has await now?).

This is mostly a rant against five kilometers long method and type names in C# which will unfortunately continue plaguing it now that GPT-4 has also been trained on the swathes of poorly written C# code :(

It is probably best to just use your own judgement (like when to use a struct) than to strictly follow what the old docs say.

@neon-sunset neon-sunset changed the title Response streaming Simple response streaming May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants