Implement basic auto-context generation #115
Conversation
I don't think auto-tokens is useful yet because it's not prioritizing the files, and it fills in a ton of irrelevant items. This will change with embeddings, but for now I think the default should be auto-tokens=0. We could also (now) repurpose the feature-knapsack to downsize a diff or paths context that's too big. Or scale it down, add it to the prompt, and send it to gpt-3.5 for feedback on what to include.
Something I realized that might also be hurting usefulness is that the system doesn't know which files the user added vs. which were auto-added. Maybe we should list the files the user explicitly added when sending the user query, so it knows that's what the user is referring to or wants to focus on?
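A rough sketch of what that might look like when building the user message (the function and argument names here are made up for illustration, not existing Mentat code):

def build_user_message(query: str, user_added_paths: list[str]) -> str:
    # Hypothetical sketch: prepend the files the user explicitly added so the
    # model can tell them apart from auto-included context.
    if not user_added_paths:
        return query
    listed = "\n".join(f"- {p}" for p in user_added_paths)
    return f"Files explicitly included by the user:\n{listed}\n\n{query}"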
I kind of wish you'd split up your 1, 2, 3 instead of doing them all in one PR. I have a lot of comments on 3, but I think 1 and 2 are basically fine.
mentat/code_context.py
Outdated
    _code_message: str | None = None
    _code_message_checksum: str | None = None

    async def _get_code_message_checksum(self, max_tokens: Optional[int] = None) -> str:
Is there a reason for this function to be async?
Not yet, no - I'm planning to make this thread-safe soon, but it's not needed yet.
mentat/llm_api.py
Outdated
@@ -199,3 +199,57 @@ async def display_total_cost(self) -> None:
        await stream.send(
            f"Total session cost: ${self.total_cost:.2f}", color="light_blue"
        )


# Copied from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
I'd at least remove the print statements
mentat/code_context.py
Outdated
    max_tokens: int,
) -> list[CodeFile]:
    git_root = GIT_ROOT.get()
    # Generate all possible permutations for all files in the project.
I'm sorry, I don't understand what is happening here very well. To me a permutation means an ordering of something. Is this code generating every combination of CodeMessageLevel and diff/no-diff for every file? Instead of generating this list of tuples and then iterating over it, can we have a nested for loop? Something like:
levels = [CodeMessageLevel.CODE, CodeMessageLevel.FILE_NAME]
if not self.settings.no_code_map:
    levels += [CodeMessageLevel.CMAP_FULL, CodeMessageLevel.CMAP]
diffs = [None]
if self.diff_context.target and path in self.diff_context.files:
    diffs += [self.diff_context.target]
for level in levels:
    for diff in diffs:
        candidate_features.append(CodeFile(path, level=level, diff=diff))
mentat/code_context.py
Outdated
if feature.diff is not None:
    score += 1
if feature.level == CodeMessageLevel.FILE_NAME:
    score += 0.1
if feature.level == CodeMessageLevel.CMAP:
    score += 0.25
elif feature.level == CodeMessageLevel.CMAP_FULL:
    score += 0.5
elif feature.level == CodeMessageLevel.CODE:
    score += 0.75
So if I'm understanding correctly, this algorithm really wants to include the cmap and really doesn't want to include the full file? I suppose that makes sense, since the cmap will be far fewer tokens and is only weighted half.
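To make that tradeoff concrete, a greedy knapsack ranked by score per token behaves roughly like this (illustrative sketch only; it assumes each candidate carries the score computed above and the async count_tokens from the diffs, and is not the PR's actual code):

async def greedy_select(candidates: list, max_tokens: int, model: str) -> list:
    # Sketch: rank features by score-per-token, then fill the budget greedily.
    # A cmap with half the score of full code but a tenth of the tokens ranks
    # well above it, so cmaps get picked first and full files rarely fit.
    ranked = []
    for feature in candidates:
        tokens = await feature.count_tokens(model)
        ranked.append((feature.score / max(tokens, 1), tokens, feature))
    ranked.sort(key=lambda item: item[0], reverse=True)
    selected, used = [], 0
    for _, tokens, feature in ranked:
        if used + tokens <= max_tokens:
            selected.append(feature)
            used += tokens
    return selected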
mentat/code_context.py
Outdated
        continue
    if _longer_feature_already_included(feature, all_features):
        continue
    to_replace = _shorter_features_already_included(feature, all_features)
This is the only place this is used, and this algorithm could include at most one other shorter feature, right? Maybe this function could be _remove_shorter_feature and do the below modification itself? It would save a for loop.
For now that's right, though in the future I expect a single feature (full-file code) could replace multiple (intervals or individual functions).
Also, we need to adjust the running token balance (line 327). I think that'd involve either passing tokens to the helper func (not ideal) or re-counting the entire spliced feature set afterwards, which also seems not ideal.
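Something like this would fold the replacement and the token bookkeeping into one helper (just a sketch of the suggested refactor; _feature_contains is a hypothetical containment check, and count_tokens is the async method from the diffs):

async def _remove_shorter_features(feature, all_features: list, model: str) -> int:
    # Sketch: drop any already-included features that `feature` supersedes and
    # return the tokens reclaimed, so the caller can adjust its running balance.
    reclaimed = 0
    for other in list(all_features):
        if _feature_contains(feature, other):  # hypothetical helper
            reclaimed += await other.count_tokens(model)
            all_features.remove(other)
    return reclaimed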
mentat/code_context.py
Outdated
    if to_replace:
        for f in to_replace:
            f_index = all_features.index(f)
            reclaimed_tokens = await all_features[f_index].count_tokens(model)
count_tokens may be called for a single feature 3 times in this function, right? Should we cache it?
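A small memo would avoid recounting (sketch only; it keys on object identity just to keep the example generic, which is an assumption about how features would be distinguished):

_token_counts: dict = {}

async def cached_count_tokens(feature, model: str) -> int:
    # Sketch: count each feature's tokens once per selection pass and reuse
    # the result on subsequent checks.
    key = (id(feature), model)
    if key not in _token_counts:
        _token_counts[key] = await feature.count_tokens(model)
    return _token_counts[key]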
mentat/code_file.py
Outdated
    code_file_manager = CODE_FILE_MANAGER.get()
    parser = PARSER.get()

    file_message: list[str] = []
Maybe change this variable name to code_message
        if _max_auto == 0 or _max_user == 0:
            self.features = features
        else:
            auto_tokens = _max_auto if _max_user is None else min(_max_auto, _max_user)
I hate this pattern and did something similar (but worse) here. I wonder if we should define our own safe_min and safe_max functions in util that return the min/max non-None parameter and have sensible default returns like 0/infinity.
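Something along these lines, as a sketch of the proposed utils (one possible mapping of the 0/infinity defaults):

from typing import Optional

def safe_min(*values: Optional[float]) -> float:
    # Return the smallest non-None value; with nothing to compare, fall back
    # to infinity (i.e. "no limit").
    present = [v for v in values if v is not None]
    return min(present) if present else float("inf")

def safe_max(*values: Optional[float]) -> float:
    # Return the largest non-None value; with nothing to compare, fall back to 0.
    present = [v for v in values if v is not None]
    return max(present) if present else 0

With that, the line in question could read auto_tokens = safe_min(_max_auto, _max_user).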
Haha why do you hate it? Seems concise, and does something pretty specific.
Yeah, it's fine. I guess I just want min/max to accept None values and be sane about it. But it's probably better that they don't, as that could mask other bugs.
        )
        _max_auto = max(0, max_tokens - include_feature_tokens)
        _max_user = self.settings.auto_tokens
        if _max_auto == 0 or _max_user == 0:
I don't understand what this check is supposed to accomplish very well. If _max_auto is 0, doesn't that mean features has too many tokens and shouldn't be used?
We return include_features regardless of whether it's over max_tokens, and if it's too big we handle the error in Conversation.
I ran […]. The first time I asked it to explain the changes in the PR and it listed some of them, but it didn't mention the auto context at all. When I asked for more it got confused and kept saying it couldn't see files as a language model 🤦 In the second conversation I directly asked about the changes in […]. I haven't used the 32k model much, so maybe it's just less reliable, or maybe our message ordering is confusing it.

First Conversation

Second Conversation
That's good feedback! In hindsight that would've been a lot simpler. I'm realizing that the 'algorithm' I put together here doesn't make sense without embeddings, and is doing some awfully specific calculations which are unproven. For this PR it'd be much simpler to just follow the old pattern - if there's extra space, try to include (1) code_maps with signatures, otherwise (2) code_maps, otherwise (3) just filenames. We can revisit ranking/scoring when there's more signal available.
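i.e. roughly this fallback order (a sketch using the CodeFile/CodeMessageLevel names from the diffs above, not actual code from this PR):

async def fill_remaining_context(paths, remaining_tokens: int, model: str):
    # Sketch: try progressively cheaper representations for the non-included
    # files until one fits in the leftover token budget.
    for level in (
        CodeMessageLevel.CMAP_FULL,  # code maps with signatures
        CodeMessageLevel.CMAP,       # code maps
        CodeMessageLevel.FILE_NAME,  # just filenames
    ):
        features = [CodeFile(path, level=level) for path in paths]
        tokens = sum([await f.count_tokens(model) for f in features])
        if tokens <= remaining_tokens:
            return features
    return []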
I've scaled back […]
Makes sense, thanks! Sorry about the trouble, I believe I suggested implementing the algo before embeddings!
mentat/conversation.py
Outdated
        parser = PARSER.get()

        messages = self.messages.copy()

        code_file_manager.read_all_file_lines()
        # Rebuild code context with active code and available tokens
        tokens = await num_tokens_from_messages(messages, self.model)
This new function num_tokens_from_messages only supports gpt-3.5 and gpt-4 models - can we use the count_tokens function we use in other places? This one is probably more exact, I'm not sure.
Sure - so I'll get rid of num_tokens_from_messages, add a comment with the link for future reference, and set Conversation back to the way it was before.
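For reference, a plain tiktoken-based counter (a generic sketch, not necessarily Mentat's exact count_tokens) looks roughly like:

import tiktoken

def count_tokens(text: str, model: str) -> int:
    # Encode the text with the model's tokenizer and count the tokens.
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model names fall back to the cl100k_base encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))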
It looks good to me now. Thanks for reducing the scope a little bit.
This PR is tasks 1-3 of #109 (quoting): […]
This is functional and ready to review, though there'll probably be some changes after the #106 merge. Some things I'd like feedback on:
Interface
Right now, your entire context window (8192 tokens) minus 1000 tokens (response buffer) is automatically filled with code, code maps, etc. This probably hurts performance in most cases, not to mention price. There should be some variable, maybe a CLI arg, to control this. Some ideas are:
A level for auto-context features. The current (before this PR) behavior is equivalent to mentat <path> --auto-context-level=diff: include anything at or below diff (ctags_full, ctags, file_name).
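As a CLI flag, that idea might surface like this (argparse sketch; the flag name and level names just mirror the example above and are not an existing Mentat option):

import argparse

parser = argparse.ArgumentParser(prog="mentat")
parser.add_argument(
    "--auto-context-level",
    choices=["diff", "ctags_full", "ctags", "file_name"],
    default="diff",
    help="Highest level of detail to auto-include for files the user did not request.",
)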
Display Context
The context ends up being code_context.features: list[CodeFile], each with a path, interval, level and diff. How should we summarize this to the user?