Add high-level code map to system prompt #66
Conversation
Super pumped for this!
Code maps are minified based on a token limit. The minification happens as the maps are generated, so that we don't try to generate a huge map if a user forgot to add some large directory to their .gitignore. The code maps will be created and minified in the following order:
1. Full map
2. No signatures
3. File-structure only
4. No map
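As a rough sketch of that fallback, here is one way the selection could be written, assuming a tiktoken-based token counter like Mentat's count_tokens (the pick_code_map helper and its exact signature are hypothetical, not this PR's implementation):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Token count as seen by the model, using tiktoken's encoding.
    return len(tiktoken.encoding_for_model(model).encode(text))

def pick_code_map(candidate_maps: list[str], token_limit: int) -> str | None:
    # candidate_maps is ordered most- to least-detailed:
    # [full map, map without signatures, file-structure-only map].
    for code_map in candidate_maps:
        if count_tokens(code_map) <= token_limit:
            return code_map
    return None  # nothing fits under the limit, so no map is sent at all
```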
For sessions where there aren't a bunch of files/code, the updated system prompt is pretty good at not suggesting edits to files it doesn't have in its context (but does have in its code map). It does start to break down as the LOC and number of files grow, however, and at a certain point it just keeps suggesting edits no matter what I try with the system prompt. I think a solution for this should be handled in a separate PR which does something like:
Here are some partial snippets of different-sized code maps:
- Full code map
- Code map without signatures
- Code map with file structure only
Fantastic! I'll review this tomorrow. Thank you for the detailed code map examples!
Just looked over a lot of the PR; so far it looks great! Super excited to get this merged! Left a couple of comments on a few things, but nothing too massive. I also didn't have a lot of time, so I didn't end up finishing reviewing the code_map.py file; I'll leave that up to @biobootloader.
Great new feature! I've left a few comments, here are some more general points:
Since there's a big change to the system prompt, we should run benchmarks with and without to make sure we don't lose performance. I can do that once we are closer to merging. I'll add this task to the PR description as well.
If it suggests edits to files it can't fully see, it'd be good to suggest the user adds them to the context. We'll soon have "commands" (#59) and a /add command to add a specific file. In this PR we can just warn the user that it tried to edit a file it couldn't see the code for and have them rerun with it added.
Let's make this feature optional as well, since big maps would add to API costs. We can have it default to on and add a command-line flag to turn it off (for example, if a user is editing an isolated file or two in a big codebase, they may want to turn it off).
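A minimal illustration of that kind of flag, with a hypothetical name and wiring (not Mentat's actual CLI):

```python
import argparse

parser = argparse.ArgumentParser(prog="mentat")
# Code map defaults to on; --no-code-map opts out, e.g. when editing an
# isolated file or two in a large repo where the map only adds token cost.
parser.add_argument(
    "--no-code-map",
    action="store_true",
    help="Don't include the ctags code map in the system prompt",
)
args = parser.parse_args()
use_code_map = not args.no_code_map
```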
mentat/conversation.py
for message in messages:
    prompt_token_count += count_tokens(message["content"])
Perhaps we should make self.messages more than just a list of messages, and instead also store token counts for each message and/or a total, so we don't recalculate again and again? We don't have to make that change as part of this PR; we can make a separate issue for it.
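A minimal sketch of that idea, reusing the count_tokens helper from the snippet above (the class and attribute names here are hypothetical, not code from this PR):

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    messages: list[dict] = field(default_factory=list)
    _token_counts: list[int] = field(default_factory=list)

    def add_message(self, message: dict) -> None:
        # Count tokens once, at the moment the message is added.
        self.messages.append(message)
        self._token_counts.append(count_tokens(message["content"]))

    @property
    def prompt_token_count(self) -> int:
        # Cheap sum over cached counts instead of re-tokenizing every message.
        return sum(self._token_counts)
```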
I'm wondering how much of a performance increase this would be? I don't think it's much work to set this up though so it's worth it to see if it helps a good amount.
Yeah it may not really matter, but if done cleanly would be nice. Doesn't have to be this PR though
Agreed, let's do this in a follow-up PR.
For testing, let's add a benchmark (https://github.com/biobootloader/mentat/blob/main/tests/benchmark_test.py) that asks it to make a change that it could only make successfully with the map. Set up some simple scenario where you only show it one file and ask it to make a change to that file that requires a call to a function in another file, one that it can see in the map. Since benchmarks don't run on GitHub Actions, it'd also be good to add some sanity-check tests as well, especially for things we wouldn't notice if we accidentally broke them, like which map is used at different token limits.
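For instance, a sanity check on map selection could look something like the pytest sketch below, which exercises the hypothetical pick_code_map helper sketched earlier (the token limits are chosen loosely for illustration):

```python
def test_map_level_respects_token_limit():
    # Ordered most- to least-detailed, matching the minification order above.
    candidates = ["full map " * 100, "no signatures " * 50, "files only " * 10]

    # A generous limit should keep the full map.
    assert pick_code_map(candidates, token_limit=10_000) == candidates[0]
    # A tight limit should fall back to the file-structure-only map.
    assert pick_code_map(candidates, token_limit=50) == candidates[2]
    # If nothing fits, no map should be included at all.
    assert pick_code_map(candidates, token_limit=1) is None
```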
Thanks for the reviews! I agree with all of the suggestions/comments; I'll get to work on addressing everything today.
@biobootloader I added a benchmark test that forces the model to import from a module that's in the code map but outside the code files in its context. Also, I decided to revert the system prompt closer to what it was originally, for a couple of reasons:
Thanks for running the benchmarks, and yeah, I agree changing the system prompt is hard. More and better benchmarking would help us change it with more confidence. I like the call to keep it more similar to the original.
🎉 awesome!
Summary
This PR adds a high-level code map, generated from ctags, to the system prompt, which gives the model more context on what other code (classes, methods, etc.) it can use that hasn't been directly added to the model context by the user.
This implementation takes inspiration from how Aider works with large codebases. Here's a writeup from the author of Aider on how it works.
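For readers unfamiliar with ctags, here is a rough sketch of how a tags-based map can be produced; it assumes universal-ctags with JSON output is installed, and is only an illustration, not this PR's code_map.py:

```python
import json
import subprocess

def get_tags(path: str) -> list[dict]:
    # Ask universal-ctags for one JSON object per symbol (name, kind,
    # file, and signature when available), written to stdout.
    result = subprocess.run(
        ["ctags", "--output-format=json", "--fields=+S", "-R", "-f", "-", path],
        capture_output=True,
        text=True,
        check=True,
    )
    tags = (json.loads(line) for line in result.stdout.splitlines() if line)
    # Keep only real tag entries, skipping ctags pseudo-tag metadata.
    return [tag for tag in tags if tag.get("_type") == "tag"]

# Example: print the first few symbols ctags found in the repo.
for tag in get_tags(".")[:10]:
    print(tag["path"], tag.get("kind"), tag["name"], tag.get("signature", ""))
```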
Why this is needed
This work is a prerequisite for having Mentat automatically add new source code from files outside the context.
To-Do
Need to finish these before this PR is ready for review:
- prevent mentat from suggesting edits for files which exist in the code map but outside the model context

Prior to Merge