How to implement a multi-turn chat with memory (chat context) using onnx.genai and the DeepSeek reasoning model #1312

John0King opened this issue Mar 8, 2025 · 3 comments


I use the following code with the deepseek-r1-1b model, but it does not work well.
I first ask it `1+1=?`.
When I follow up with "add 1 more", it only gives me the result 2, and it loses the opening `<think>` token.

```csharp
using System.Text;
using Microsoft.ML.OnnxRuntimeGenAI;

// model and tokenizer are created earlier, e.g.:
// using Model model = new(modelPath);
// using Tokenizer tokenizer = new(model);

using GeneratorParams generatorParams = new(model);
generatorParams.SetSearchOption("max_length", 4096);
using var tokenizerStream = tokenizer.CreateStream();

List<string> chatHistory = new List<string>();

ulong s = 0;
var sb = new StringBuilder();
do
{
    // A new generator is created every turn, so the full history is re-encoded below.
    using var generator = new Generator(model, generatorParams);
    Console.Write("Enter a prompt: ");
    string prompt = Console.ReadLine()!;
    //var sequences = tokenizer.Encode($"<|begin▁of▁sentence|><|User|>{prompt}<|end▁of▁sentence|>\n<|Assistant|>");
    chatHistory.Add($"<|begin▁of▁sentence|><|User|>{prompt}<|end▁of▁sentence|>\n<|Assistant|>");
    //var sequences = tokenizer.EncodeBatch(chatHistory.ToArray());
    var sequences = tokenizer.Encode(string.Join('\n', chatHistory));
    generator.AppendTokenSequences(sequences);
    sb.Clear();
    sb.Append("<|begin▁of▁sentence|>");
    while (!generator.IsDone())
    {
        //generator.ComputeLogits(); // no longer needed in current onnxruntime-genai versions
        generator.GenerateNextToken();
        // Decode and print only the newest token.
        var str = tokenizerStream.Decode(generator.GetSequence(s)[^1]);
        Console.Write(str);
        sb.Append(str);
    }
    sb.Append("<|end▁of▁sentence|>\n");
    chatHistory.Add(sb.ToString());
    Console.WriteLine();
    //s++;
}
while (true);
```
@natke (Contributor) commented Mar 9, 2025

Hi @John0King,

Have a look at the snippet and see if that helps: https://onnxruntime.ai/docs/genai/howto/migrate.html#add-chat-mode-to-your-c-application-1

You don't need to re-create the generator on each iteration of the loop, and you only need to append the new prompt; the model takes care of the previous context.

Let us know how that goes!
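
For reference, a minimal sketch of that pattern (adapted from the linked guide, not verbatim, and reusing the DeepSeek template tokens from the original post; whether those exact tokens match the model's chat template should be verified): one generator lives across all turns, and each turn appends only the newly encoded prompt.

```csharp
// One generator for the whole conversation; its KV cache accumulates the context.
using var generator = new Generator(model, generatorParams);
using var tokenizerStream = tokenizer.CreateStream();

while (true)
{
    Console.Write("Prompt: ");
    string prompt = Console.ReadLine()!;

    // Append only the new turn; earlier turns are already in the generator's state.
    var sequences = tokenizer.Encode($"<|User|>{prompt}<|Assistant|>");
    generator.AppendTokenSequences(sequences);

    while (!generator.IsDone())
    {
        generator.GenerateNextToken();
        Console.Write(tokenizerStream.Decode(generator.GetSequence(0)[^1]));
    }
    Console.WriteLine();
}
```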

@John0King (Author)

@natke it doesn't help. The reason I use `using var generator = new Generator(model, generatorParams);` inside the loop is that otherwise it loses the reasoning start token `<think>` (and the suggested snippet doesn't fix that, which is why I asked here on GitHub).

I'm also looking for an example of how to create an OpenAI-compatible web API.
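
For illustration, a hedged sketch of such an endpoint using ASP.NET Core minimal APIs over Microsoft.Extensions.AI's `IChatClient`. The `ChatCompletionRequest`/`ChatCompletionMessage` DTOs and the model path below are hypothetical and cover only a fraction of the OpenAI schema, the client construction mirrors the snippet in the next comment, and `IChatClient` method names have shifted across Microsoft.Extensions.AI versions:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.ML.OnnxRuntimeGenAI;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Placeholder model path; construction mirrors the snippet in the next comment.
using var model = new Model("path/to/deepseek-r1-model");
using IChatClient client = new OnnxRuntimeGenAIChatClient(new OnnxRuntimeGenAIChatClientOptions
{
    PromptFormatter = (messages, options) => string.Concat(
        messages.Select(m => $"<|begin▁of▁sentence|><|{m.Role}|>{m.Text}<|end▁of▁sentence|>\n")) + "<|Assistant|>",
}, model, false);

// Note: a single generator-backed client is not safe to share across concurrent
// requests without synchronization; a real server needs per-conversation handling.
app.MapPost("/v1/chat/completions", async (ChatCompletionRequest request) =>
{
    var messages = request.Messages
        .Select(m => new ChatMessage(new ChatRole(m.Role), m.Content))
        .ToList();

    var response = await client.GetResponseAsync(messages);

    // Return the minimal subset of the OpenAI chat-completions response shape.
    return Results.Json(new
    {
        id = Guid.NewGuid().ToString("N"),
        @object = "chat.completion",
        choices = new[]
        {
            new { index = 0, message = new { role = "assistant", content = response.Text } }
        }
    });
});

app.Run();

// Hypothetical request DTOs covering only the fields used above.
record ChatCompletionRequest(string Model, List<ChatCompletionMessage> Messages);
record ChatCompletionMessage(string Role, string Content);
```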

@John0King (Author) commented Mar 12, 2025

## The problem

(screenshot showing the problem)

## Code

```csharp
using System.Linq;
using System.Text;
using Microsoft.Extensions.AI;
using Microsoft.ML.OnnxRuntimeGenAI;

using OnnxRuntimeGenAIChatClient client = new OnnxRuntimeGenAIChatClient(new OnnxRuntimeGenAIChatClientOptions
{
    // Formats the whole message list into a single prompt string.
    // Note: this appends <|Assistant|> after every message, not just the last one.
    PromptFormatter = (prompt, context) =>
    {
        var sb = new StringBuilder();
        sb.Append(string.Join("", prompt.Select(x => $"<|begin▁of▁sentence|><|{x.Role}|>{x.Text}<|end▁of▁sentence|>\n<|Assistant|>")));
        return sb.ToString();
    },
}, model, false);

List<ChatMessage> chatMessages = new List<ChatMessage>();
do
{
    Console.WriteLine();
    Console.WriteLine("Prompt:");
    var prompt = Console.ReadLine();
    if (prompt == "exit")
    {
        break;
    }
    chatMessages.Add(new ChatMessage(ChatRole.User, prompt));
    List<ChatResponseUpdate> chatMessageUpdates = [];
    await foreach (var x in client.GetStreamingResponseAsync(chatMessages, new ChatOptions
    {
        MaxOutputTokens = 4096,
        AdditionalProperties = new() { { "max_length", 4096 } },
    }))
    {
        chatMessageUpdates.Add(x);
        Console.Write(x.ToString());
    }
    var response = chatMessageUpdates.ToChatResponse();
    chatMessages.Add(response.Message);
    Console.WriteLine();
}
while (true);
```
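
One thing worth noting in the formatter above: `<|Assistant|>` is appended after every message rather than once at the end, and `ChatRole` renders in lowercase ("user", "assistant") while the DeepSeek tokens in this thread are capitalized ("<|User|>"). A hedged sketch of an alternative formatter, assuming the template tokens used throughout this thread (the model's actual chat template in `tokenizer_config.json` should be the reference):

```csharp
PromptFormatter = (messages, options) =>
{
    var sb = new StringBuilder("<|begin▁of▁sentence|>");
    foreach (var m in messages)
    {
        // Capitalize the role to match the <|User|> / <|Assistant|> token casing.
        var role = m.Role == ChatRole.User ? "User" : "Assistant";
        sb.Append($"<|{role}|>{m.Text}<|end▁of▁sentence|>\n");
    }
    // A single trailing assistant tag so generation continues as the assistant.
    sb.Append("<|Assistant|>");
    return sb.ToString();
};
```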

John0King changed the title from "How to implement a multi-turn chat with memory (chat context) using onnx.genai" to "How to implement a multi-turn chat with memory (chat context) using onnx.genai and the DeepSeek reasoning model" on Mar 12, 2025