Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode (ggerganov#333)

* Improve interactive mode's coherence after EOS

Aims to improve coherence and ability to resume the interactive session when the user is given input back after an end of text token is reached.
Not sure what token 13 is or why it seems to help. See conversation for examples.

* Make newline token a constant

* dynamically determine newline token

* relocate previous newline token const

* cleanup whitespace

* print a new line on end of text in interactive

this may need to be looked into further when not using a reverse prompt

* only print manual newline with reverse prompt

fix formatting of reverse prompts so they don't end up at the end of the current line while not introducing unnecessary new lines otherwise

* alternate approach to replace end of text tokens

* Inject the reverse prompt again after eos in interactive mode

* tokenize reverse prompt when needed

makes this PR compatible with ggerganov/llama.cpp#330

* tokenize and inject only first reverse prompt

thanks to tjohnman

* tokenize first reverse prompt once

* add newline token

* add newline token

* tokenize/inject reverse prompt for refactor

this doesn't seem right though

* tokenize nothing for antiprompt if no reverse

* Update main.cpp

* Update main.cpp

* tokenize and inject reverse prompt as needed

this doesn't seem to work if the reverse prompt is tokenized outside earlier on

* not needed

* remove newline token

* remove newline token

* tokenize newline token

* add space to comment

* Update main.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Slaren <2141330+slaren@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
3 people committed Mar 23, 2023
1 parent 20a1a4e commit 2e17dfd
Showing 1 changed file with 15 additions and 6 deletions.
21 changes: 15 additions & 6 deletions main.cpp
@@ -258,6 +258,9 @@ int main(int argc, char ** argv) {
         params.interactive = true;
     }

+    // determine newline token
+    auto llama_token_newline = ::llama_tokenize(ctx, "\n", false);
+
     fprintf(stderr, "\n");
     fprintf(stderr, "%s: prompt: '%s'\n", __func__, params.prompt.c_str());
     fprintf(stderr, "%s: number of tokens in prompt = %zu\n", __func__, embd_inp.size());
@@ -359,6 +362,16 @@ int main(int argc, char ** argv) {
                 last_n_tokens.push_back(id);
             }

+            // replace end of text token with newline token when in interactive mode
+            if (id == llama_token_eos() && params.interactive) {
+                id = llama_token_newline.front();
+                if (params.antiprompt.size() != 0) {
+                    // tokenize and inject first reverse prompt
+                    const auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
+                    embd_inp.insert(embd_inp.end(), first_antiprompt.begin(), first_antiprompt.end());
+                }
+            }
+
             // add it to the context
             embd.push_back(id);

@@ -451,12 +464,8 @@ int main(int argc, char ** argv) {

         // end of text token
         if (embd.back() == llama_token_eos()) {
-            if (params.interactive) {
-                is_interacting = true;
-            } else {
-                fprintf(stderr, " [end of text]\n");
-                break;
-            }
+            fprintf(stderr, " [end of text]\n");
+            break;
         }

         // In interactive mode, respect the maximum number of tokens and drop back to user input when reached.
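
The control flow introduced by this commit can be sketched outside of llama.cpp as follows. This is a minimal illustration under stated assumptions, not the project's code: `Token`, `kEos`, `kNewline`, `toy_tokenize`, `SessionState`, and `accept_sampled_token` are all hypothetical names invented here, the token ids are arbitrary values for a toy vocabulary, and a `std::deque` stands in for the `embd_inp` vector of the real program. It shows the two behaviours the diff describes: in interactive mode a sampled end-of-text token is swapped for the newline token and the first reverse prompt is queued as input, while outside interactive mode end of text still stops generation.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are llama.cpp APIs.
using Token = int;

constexpr Token kEos     = 2;   // assumed end-of-text id in this toy vocabulary
constexpr Token kNewline = 13;  // assumed newline id in this toy vocabulary

// Toy tokenizer: one token per character, with '\n' mapped to kNewline.
std::vector<Token> toy_tokenize(const std::string & text) {
    std::vector<Token> out;
    for (unsigned char c : text) {
        out.push_back(c == '\n' ? kNewline : 1000 + c);
    }
    return out;
}

struct SessionState {
    bool interactive = false;
    std::vector<std::string> antiprompt;  // reverse prompts, possibly empty
    std::deque<Token> pending_input;      // plays the role of embd_inp
    std::vector<Token> context;           // plays the role of embd
};

// Handles one sampled token. Returns false when generation should stop.
bool accept_sampled_token(SessionState & s, Token id) {
    if (id == kEos && s.interactive) {
        // Look the newline token up through the tokenizer instead of hard-coding it.
        id = toy_tokenize("\n").front();
        if (!s.antiprompt.empty()) {
            // Queue only the first reverse prompt so the session resumes with it.
            const auto first = toy_tokenize(s.antiprompt.front());
            s.pending_input.insert(s.pending_input.end(), first.begin(), first.end());
        }
    }

    // Add the (possibly replaced) token to the context.
    s.context.push_back(id);

    // End of text can only reach this point when not in interactive mode.
    return s.context.back() != kEos;
}
```

Because the replacement happens before the token is added to the context, the later end-of-text check no longer needs an interactive-mode branch, which is why the third hunk collapses to an unconditional `break`.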