This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

Interactive #47

Open
luca-saggese opened this issue May 6, 2023 · 11 comments
Labels
question Further information is requested

Comments

@luca-saggese

luca-saggese commented May 6, 2023

I'm new to LLMs and llama but learning fast. I've written a small piece of code to chat via the CLI, but it doesn't seem to follow the context (i.e. work in interactive mode).

import { LLM } from "llama-node";
import readline from "readline";
import path from "path";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";

const saveSession = path.resolve(process.cwd(), "./tmp/session.bin");
const loadSession = path.resolve(process.cwd(), "./tmp/session.bin");

const model = path.resolve(process.cwd(), "./ggml-vic7b-q4_1.bin");

const llama = new LLM(LLamaRS);
llama.load({ path: model });


var rl = readline.createInterface(process.stdin, process.stdout);
console.log("Chatbot started!");
rl.setPrompt("> ");
rl.prompt();
rl.on("line", async function (line) {
    const prompt = `A chat between a user and an assistant.
    USER: ${line}
    ASSISTANT:`;
    llama.createCompletion({
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
        saveSession,
        loadSession,
    }, (response) => {
        if(response.completed) {
            process.stdout.write('\n'); 
            rl.prompt(); 
        } else {
            process.stdout.write(response.token);
        }  
    });
});

Am I missing something?

@hlhr202
Member

hlhr202 commented May 6, 2023

@luca-saggese you need to maintain the context on the Node.js side, i.e. you should maintain a list of chat history entries whose total length never exceeds the context length of your model. That's why llama-node also exposes the tokenizer to Node.js.
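A minimal sketch of that idea. Note that `composePrompt` and `estimateTokens` below are hypothetical helpers, not llama-node APIs, and the character-based token estimate is a rough stand-in for the real tokenizer that llama-node exposes:

```javascript
// Keep a rolling chat history and drop the oldest turns so the
// composed prompt stays under the model's context window.
// NOTE: estimateTokens is a crude heuristic (~4 chars per token);
// in practice, use the tokenizer exposed by llama-node for exact counts.
function estimateTokens(text) {
    return Math.ceil(text.length / 4);
}

function composePrompt(history, contextLimit) {
    // Copy so the caller's history array is left untouched.
    const turns = [...history];
    while (turns.length > 1 && estimateTokens(turns.join("\n")) > contextLimit) {
        turns.shift(); // drop the oldest turn first
    }
    return turns.join("\n");
}

const history = [
    "A chat between a user and an assistant.",
    "USER: hello\nASSISTANT: Hi! How can I help?",
    "USER: what's 2+2?\nASSISTANT: 4.",
];
const prompt = composePrompt(history, 2048);
```

Each completed turn gets pushed onto `history`, and `composePrompt` is called before every `createCompletion`.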

@luca-saggese
Author

@hlhr202 thanks for the comment. Where should I pass the context to the new query? Within the prompt?

@hlhr202
Copy link
Member

hlhr202 commented May 7, 2023

> @hlhr202 thanks for the comment, where should I pass the context to the new query? within the prompt?

Yes, your prompt should be a string composed from the chat history list. At the same time, you have to make sure it doesn't exceed the model's context length limit.

@luca-saggese
Author

Understood. And what is the point of saveSession and loadSession?

@hlhr202
Member

hlhr202 commented May 7, 2023

> understood, and what is the point of saveSession and loadSession?

#24

They are used to accelerate loading.

@hlhr202 hlhr202 added the question Further information is requested label May 7, 2023
@end-me-please

@luca-saggese
I had great success using saveSession/loadSession for chatbots. (Thanks for implementing it, hlhr202 <3, it made everything so much easier.)

Keeping a list of previous messages in every prompt (as he suggested) works, but it is slow.

Instead, during startup, I call createCompletion (with the initial prompt) with feedPromptOnly and saveSession once. (You can also copy the initial cache file to make future startups faster.)

Every new message is then fed individually with feedPromptOnly plus saveSession/loadSession.

To get a bot response, just call without feedPromptOnly as usual.

This is still limited by the context length, with the added disadvantage that you can't clear old messages (though it takes a while to hit the 2048-token context limit).

It also seems to improve "conversation memory" without the extra cost of including more messages in the chat history.
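As a sketch of that flow, assuming the `{ token, completed }` callback shape used elsewhere in this thread (`complete` is a hypothetical helper, not part of llama-node):

```javascript
// Wrap the callback-style createCompletion in a Promise and collect
// the streamed tokens into a single string.
function complete(llm, options) {
    return new Promise((resolve) => {
        let text = "";
        llm.createCompletion(options, (response) => {
            if (response.completed) {
                resolve(text);
            } else {
                text += response.token;
            }
        });
    });
}

// The session-based flow in outline (option objects abbreviated):
// 1. Once at startup: feed the system prompt into the session, skip inference.
//    await complete(llama, { prompt: instructions, feedPrompt: true,
//                            feedPromptOnly: true, saveSession, loadSession });
// 2. Per user message: feed only the new text into the session.
//    await complete(llama, { prompt: userTurn, feedPrompt: true,
//                            feedPromptOnly: true, saveSession, loadSession });
// 3. To generate a reply: call without feedPromptOnly as usual.
//    const reply = await complete(llama, { prompt: "ASSISTANT:",
//                                          saveSession, loadSession });
```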

@end-me-please

Regarding the context length limit: rustformers/llm#77 might be related.

@luca-saggese
Author

luca-saggese commented May 10, 2023

@end-me-please thanks for the help, here is a working version for anyone interested:

import { LLM } from "llama-node";
import readline from "readline";
import fs from "fs";
import { LLamaRS } from "llama-node/dist/llm/llama-rs.js";
import path from "path";

const sessionFile = path.resolve(process.cwd(), "./tmp/session.bin");
const saveSession = sessionFile;
const loadSession = sessionFile;
// remove old session
if(fs.existsSync(sessionFile)) fs.unlinkSync(sessionFile);


const model = path.resolve(process.cwd(), "./ggml-vic7b-q4_1.bin"); // ggml-vicuna-7b-1.1-q4_1.bin");

const llama = new LLM(LLamaRS);
llama.load({ path: model });

var rl = readline.createInterface(process.stdin, process.stdout);
console.log("Chatbot started!");
rl.setPrompt("> ");
rl.prompt();
let cnt = 0;
rl.on("line", async function (line) {
    // Prepend the system prompt only on the first turn; the session file
    // carries the context for subsequent turns.
    const prompt = `USER: ${line}
                    ASSISTANT:`;
    llama.createCompletion({
        prompt: cnt === 0 ? 'A chat between a user and an assistant.\n\n' + prompt : prompt,
        numPredict: 1024,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
        saveSession,
        loadSession,
    }, (response) => {
        if (response.completed) {
            process.stdout.write('\n');
            rl.prompt();
            cnt++;
        } else {
            process.stdout.write(response.token);
        }
    });
});

@ralyodio

Can we make it so previous prompts are part of an array? Otherwise it continuously resends the entire history with every response.

@CodeJjang

@end-me-please @luca-saggese I can't make it work.
I am calling:

llama.load(config).then(() => {
    return llama.createCompletion({
      nThreads: 4,
      nTokPredict: 2048,
      topK: 40,
      topP: 0.1,
      temp: 0.8,
      repeatPenalty: 1,
      prompt: instructions,
      feedPrompt: true,
      feedPromptOnly: true,
      saveSession,
      loadSession
    }, (resp) => {console.log(resp)})
  }).then(() => console.log('Finished init llm'))

Two weird things:

  1. No session file created
  2. Why is the "console.log(resp)" callback being called if feedPromptOnly is true (i.e. it shouldn't run inference)?

And then:

    const resp = await llama.createCompletion({
      nThreads: 4,
      nTokPredict: 2048,
      topK: 40,
      topP: 0.1,
      temp: 0.8,
      repeatPenalty: 1,
      prompt,
      loadSession
    }, (cbResp) => {process.stdout.write(cbResp.token);})

The first prompt that I fed is completely ignored...
