
Token context window reaches max too fast #126

Closed
TheWhiteWord opened this issue Sep 7, 2023 · 13 comments
Labels
Bug Something isn't working

Comments


TheWhiteWord commented Sep 7, 2023

I keep running out of tokens before managing to accomplish any multi-step task. The program hits errors along the way or, like in this case, does not manage to pip install the libraries that are needed for the task.
I have looked into the documentation but I cannot find a way to give the bot admin rights so it can install whatever pip packages it needs at any given time. I also cannot find a way to extend the token limit.
Can anyone help?

(Model=Llama2 7B Large)

Edit: (it turns out that it actually does manage to pip install the libraries, but still crashes from going over the token limit. My intuition is that it keeps making mistakes and retrying until it succeeds, but by doing so it runs out of tokens.)

Edit 2: (I have managed to extend the context window by manually modifying the number of tokens in the llama_2.py file. However (FEATURE REQUEST), it would be good to be able to do this from inside the program itself. But maybe I am just too much of a noob to know how.)
Also:
(By testing further, it seems that the token-usage problem may also be connected to the fact that, when there is an error, the model often repeats the error text multiple times over in the same prompt, eventually creating a long list of the same repetition that eats up the allowance. I can paste examples if you think this intuition is correct and they would be helpful.)

EXAMPLE OF THE ISSUE:

find the latest 10 BBC news from their feed and print them.

curl -s https://www.bbc.com/news/rss.xml | xmllint --format - | grep "<title>" | head -n 10 | sed 's/<[^>]*>//g'

Would you like to run this code? (y/n)
y

curl -s https://www.bbc.com/news/rss.xml | xmllint --format - | grep "<title>" | head -n 10 | sed 's/<[^>]*>//g'

Microsoft Windows [Version 10.0.22621.2134]
(c) Microsoft Corporation. All rights reserved.
(open-interpreter) C:\Users\DCinz>curl -s https://www.bbc.com/news/rss.xml | xmllint --format - | grep "<title>" |
head -n 10 | sed 's/<[^>]*>//g'
'xmllint' is not recognized as an internal or external command,
operable program or batch file.
(open-interpreter) C:\Users\DCinz>

it appears that you are using Windows 10 and have xmllint installed on your system. However, the command is not
recognized as a valid internal or external command.

To fix this issue, you can try installing xmllint by following these steps:

1 Open the Command Prompt as an administrator. To do this, right-click on the Start button and select "Command
Prompt (Admin)".
2 Type the following command and press Enter:

choco install xmllint

Would you like to run this code? (y/n)
y

choco install xmllint

Traceback (most recent call last):
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\code_interpreter.py", line 158,
in run
code = self.add_active_line_prints(code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\code_interpreter.py", line 249,
in add_active_line_prints
return add_active_line_prints_to_python(code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\code_interpreter.py", line 411,
in add_active_line_prints_to_python
tree = ast.parse(code)
^^^^^^^^^^^^^^^
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\ast.py", line 50, in parse
return compile(source, filename, mode, flags,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 1
choco install xmllint
^^^^^^^
SyntaxError: invalid syntax

it appears that you are using Windows 10 and have xmllint installed on your system. However, the command is not
recognized as a valid internal or external command.

To fix this issue, you can try

how do i give you administrator power?

llama_tokenize_with_model: too many tokens
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in run_code
File "C:\Users\DCinz\anaconda3\envs\open-interpreter\Scripts\interpreter.exe_main
.py", line 7, in
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\interpreter.py", line 104, in cli
cli(self)
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\cli.py", line 46, in cli
interpreter.chat()
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\interpreter.py", line 256, in chat
self.respond()
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\interpreter\interpreter.py", line 469, in respond
for chunk in response:
File "C:\Users\DCinz\anaconda3\envs\Open-interpreter\Lib\site-packages\llama_cpp\llama.py", line 901, in _create_completion
raise ValueError(
ValueError: Requested tokens (1114) exceed context window of 1048

(open-interpreter) C:\Users\DCinz>
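For what it's worth, the xmllint step can be avoided entirely on Windows by parsing the feed with Python's standard library. Here is a minimal sketch (my own illustration, not part of Open Interpreter; it parses an inline RSS sample standing in for the BBC feed, which `urllib.request.urlopen` could fetch for real):

```python
import xml.etree.ElementTree as ET

# A trimmed RSS 2.0 sample standing in for https://www.bbc.com/news/rss.xml.
rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>BBC News</title>
  <item><title>First headline</title></item>
  <item><title>Second headline</title></item>
</channel></rss>"""

root = ET.fromstring(rss)
# RSS 2.0: headlines live at channel/item/title; the first <title>
# under <channel> is the feed's own name, so we skip it by design.
titles = [item.findtext("title") for item in root.iterfind("./channel/item")]
for title in titles[:10]:
    print(title)
```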

@TheWhiteWord TheWhiteWord changed the title Token context window Token context window reaches max too fast Sep 7, 2023
@KillianLucas
Collaborator

Hey @TheWhiteWord, thanks for opening.

we may need to do some tricky stuff to solve this, but it's possible.

is there a way you envision these parameters being set? perhaps we should add a flag like interpreter --local --context-window 4096?

the problem with raising the context window:

  • small context window? super fast. doesn't hurt my computer. doesn't remember much
  • large context window? my computer is red hot. i cook eggs on it. it remembers the singularity
  1. so we likely need to provide custom flags for both, or a wizard, same as we do for model params.
  2. and perhaps we should, to solve the problem you're having, when it reaches max like this, run it again-- chopping off the top of the prompt and letting it keep going so it writes forever. (it's free so why not?)
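The chop-off-the-top idea in point 2 is roughly what a token trimmer does. A minimal sketch, using hypothetical helper names and a crude 4-characters-per-token estimate (not Open Interpreter's or tokentrim's actual code):

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_messages(messages, max_tokens, system_message=None):
    """Drop the oldest messages until the estimated total fits the window.

    The system message (if any) is always kept at the front.
    """
    budget = max_tokens - (estimate_tokens(system_message) if system_message else 0)
    kept = []
    # Walk from newest to oldest, keeping whatever still fits.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    kept.reverse()
    if system_message:
        kept.insert(0, {"role": "system", "content": system_message})
    return kept
```

Something like `trim_messages(history, max_tokens=4096, system_message=...)` would then run before every completion, so the conversation "writes forever" at the cost of forgetting its oldest turns.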

@TheWhiteWord
Author

Hey @KillianLucas , the pleasure is mine. You and your team have been doing a great job. This is so exciting.

I think that setting the parameters on activation is definitely something worth investigating. However, and this comes from a "user" point of view (I do not have much technical knowledge of what is possible), if the parameter could be changed during the session through some command, that would let one adjust to the situation, since it is often unpredictable whether the issue is going to happen, or when.

1. ANSWER: I totally see the problem with raising the token count too high... my PC did not like that too much (xd). A wizard is a good solution to avoid people pushing the limits too far.

2. ANSWER: Still, I think the proper solution is indeed to find a compromise and probably chop off the top of the prompt, even though my instinct tells me that it may cause "memory" problems; but again, I have little technical knowledge. I wonder if there is some clever way for it to recognize when the repetitions are identical, and over a certain amount, force it to sum them up as one. In fact, if the repetitions are identical, it means that no progress was made, and therefore the information can be reduced.

Anyhow, I am sure you know far better than me what can and can't be done. I mean, you built the whole thing, and that in itself is a great accomplishment. Well done! And thanks for replying so fast... I guess you are getting no sleep for the next few weeks XD GOOD LUCK!
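The collapse-identical-repetitions idea above could be sketched like this (a hypothetical illustration, not part of Open Interpreter): consecutive identical messages are folded into one entry with a repeat count, since identical repeats carry no new information.

```python
def collapse_repeats(messages):
    """Collapse consecutive identical messages into one, tagging the count.

    If the model emits the same error/output N times in a row, the history
    keeps a single copy plus a "repeats" marker instead of N full copies.
    """
    collapsed = []
    for msg in messages:
        if collapsed and collapsed[-1]["role"] == msg["role"] \
                and collapsed[-1]["content"] == msg["content"]:
            collapsed[-1]["repeats"] = collapsed[-1].get("repeats", 1) + 1
        else:
            collapsed.append(dict(msg))  # copy so the input stays untouched
    return collapsed
```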

@merlinfrombelgium
Contributor

merlinfrombelgium commented Sep 8, 2023

Adding to this conversation: I changed the tokentrim code to assign a max token size of 4096 to code-llama.
The tokentrim.py file is in your /Lib/site-packages/tokentrim folder, or you can follow the breadcrumb trail if you're using VS Code, like me, by right-clicking the tt in interpreter.py and selecting Go to References 🤓

As the coder(s) of tokentrim mention in tokentrim.py
(edit: I've just realized @KillianLucas is the author of tokentrim as well 😊)

'code-llama': 1048, # I'm not sure this is correct.

they're not sure of the token size for this model. Referring to the CodeLlama model card in their white paper, its models

were fine-tuned with up to 16K tokens, and support up to 100K tokens at inference time.

(Read the white paper about issues when surpassing 16K though.)

So I went with 4096 and put that in tokentrim AND in interpreter.py

interpreter.py

    if self.local:
      messages = tt.trim(self.messages, max_tokens=8192, system_message=system_message)

tokentrim.py

MODEL_MAX_TOKENS = {
  'gpt-4': 8192,
  'gpt-4-0613': 8192,
  'gpt-4-32k': 32768,
  'gpt-4-32k-0613': 32768,
  'gpt-3.5-turbo': 4096,
  'gpt-3.5-turbo-16k': 16384,
  'gpt-3.5-turbo-0613': 4096,
  'gpt-3.5-turbo-16k-0613': 16384,
  'code-llama': 4096, # I'm not sure this is correct.
}

I now got past that earlier-than-expected boundary and am running into some other issues, of course 😁

@slumberling

@merlinfrombelgium Thank you! Your solution works for me. Max context is also hard-coded at line 176 in /Lib/site-packages/interpreter/llama_2.py. This file also needs to be updated with the correct value to make it work.

llama_2.py

llama_2 = Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, verbose=False, n_ctx=4096) # n_ctx = context window. smaller is faster

@brunoboto96

brunoboto96 commented Sep 8, 2023

Tried changing the tokens in these 3 files, but for me it always stops at around a certain char count.
Running 32B Medium on macOS M1; tried 13B Medium as well.

These mods make no difference in my case; it keeps repeating the same code over and over:

▌ Model set to Code-Llama

Open Interpreter will require approval before running code. Use interpreter -y to bypass this.

Press CTRL-C to exit.

> write a pygame and store in game.py




  import pygame

  pygame.init()

  # Set up the window
  screen = pygame.display.set_mode((800, 600))
  pygame.display.set_caption("My Game")

  # Load images
  player_image = pygame.image.load("player.png").convert_alpha()
  enemy_image = pygame.image.load("enemy.png").convert_alpha()
  background_image = pygame.image.load("background.png").convert_alpha()

  # Set up the player and enemy
  player


>


▌ Entered debug mode

[
    {'role': 'user', 'content': 'write a pygame and store in game.py'},
    {
        'content': '\n```python\nimport pygame\n\npygame.init()\n\n# Set up the window\nscreen = pygame.display.set_mode((800, 600))\npygame.display.set_caption("My Game")\n\n# Load images\nplayer_image =
pygame.image.load("player.png").convert_alpha()\nenemy_image = pygame.image.load("enemy.png").convert_alpha()\nbackground_image = pygame.image.load("background.png").convert_alpha()\n\n# Set up the player and
enemy\nplayer',
        'function_call': {
            'parsed_arguments': {
                'code': 'import pygame\n\npygame.init()\n\n# Set up the window\nscreen = pygame.display.set_mode((800, 600))\npygame.display.set_caption("My Game")\n\n# Load images\nplayer_image =
pygame.image.load("player.png").convert_alpha()\nenemy_image = pygame.image.load("enemy.png").convert_alpha()\nbackground_image = pygame.image.load("background.png").convert_alpha()\n\n# Set up the player and
enemy\nplayer',
                'language': 'python'
            }
        }
    },
    {'role': 'user', 'content': '%debug'}
]

Another example

> Can you make me a simple pomodor app?

  I'll start by creating a simple pomodoro app that will help you stay focused and productive. Here is the plan:

   1 Create a new file called "pomodoro.py" in your current working directory.
   2 Write a function called "start_timer" that takes two arguments: the number of minutes for the work session, and the number of minutes for the break session.
   3 In the "start_timer" function, use the "time" module to create a timer that will run for the specified number of minutes.
   4 After the work session is complete

>

@zono50

zono50 commented Sep 10, 2023

Are there instructions on how to do this on Linux? I don't even see any of these files, or a libs directory. It seems like I get these limits every 3rd or 4th prompt.
(screenshot attached)

@twobob

twobob commented Sep 10, 2023

It's in interpreter there, @zono50:
sed -i 's/n_ctx=[0-9]\{4\}/n_ctx=4048/g' /usr/local/lib/python3.10/dist-packages/interpreter/llama_2.py

change it to the actual path

sed -i 's/n_ctx=[0-9]\{4\}/n_ctx=4048/g' ./interpreter/llama_2.py

or run it from where you were

@jordanbtucker jordanbtucker added the Bug Something isn't working label Sep 11, 2023
@zono50

zono50 commented Sep 11, 2023

There's no llama_2.py file in my interpreter folder.
(screenshot attached)

@jordanbtucker
Collaborator

The code was refactored recently, so it's in get_hf_llm.py now.

@twobob

twobob commented Sep 12, 2023

Yeah, my bad. Should still apply.

@krrishdholakia
Contributor

cc: @ishaan-jaff feels relevant to your token trimmer work

@adamxyang

I've also seen CodeLlama-34B repeating its own code when asked to fix a bug; sometimes it even repeats simple things. There must be something wrong with open-interpreter's prompt chaining/processing.

> hi
                                                                                                                                                        
  I'll start by asking you some questions about what you want me to do. Please answer them as best you can.                                             
                                                                                                                                                        
  What is the goal of this project? What problem are you trying to solve?                                                                               
                                                                                                                                                        
> I'm writing a tutorial for botorch, I'd like to optimise a toy 4D quadratic function using EI
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.                                                                                           
                                                                                                                                                        
> just artificial 4D quadratic toy function
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.                                                                                           
                                                                                                                                                        
> EI means expected improvement in botorch
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.                                                                                           
                                                                                                                                                        
> pick your own random coefficient
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.                                                                                           
                                                                                                                                                        
> pick your own range
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.                                                                                           
                                                                                                                                                        
> expected improvement (EI), do you know botorch?
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.                                                                                           
                                                                                                                                                        
> why are you repeating yourself
                                                                                                                                                        
  I understand that you want me to optimize a toy 4D quadratic function using EI. Can you provide more details about the function? What are its         
  coefficients, and what is the range of values for each variable? Additionally, can you tell me what you mean by "EI"? Is it a specific algorithm or   
  a set of algorithms that you want me to use?                                                                                                          
                                                                                                                                                        
  Please let me know if there's anything else I need to know.    

@ericrallen
Collaborator

I’m going to close this one as stale for now.

Feel free to reopen if there’s more to discuss with the latest version.

@ericrallen ericrallen closed this as not planned Won't fix, can't repro, duplicate, stale Oct 25, 2023