Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add high-level code map to system prompt #66

Merged
merged 19 commits into from
Aug 27, 2023

Conversation

waydegg
Copy link
Contributor

@waydegg waydegg commented Aug 19, 2023

Summary

This PR adds a high-level code map generated from ctags into the system prompt which gives the model more context on what other code/classes/methods/etc. it can use that haven't been directly added into the model context by the user.

This implementation takes inspiration from how Aider works with large codebases. Here's a writeup from the author of Aider on how it works.

Why this is needed

This work is a prerequisite for having Mentat automatically add new source code from files outside the context.

To-Do

Need to finish these before this PR is ready for review:

  • create a 'minified' code map that fits into the model context if the full code map is too large]
  • prevent mentat from suggesting edits for files which exist in the code map outside the model context]
  • add a token limit to the code map, filter out less relevant ctags/files as needed
  • update test cases
  • handle user not having ctags installed

Prior to Merge

  • @biobootloader: run benchmarks with and without, to determine if system prompt changes degrade performance on other tasks.

@biobootloader
Copy link
Member

Super pumped for this!

Code maps are minified based on some token limit. The
minification process happens as the maps are generated in the event that
we don't try to generate mapes if a user forgot to add some large
directory to their .gitignore.

The code map's will be created and minified in the following order:

1. Full map
2. No signatures
3. File-structure only
4. No map
@waydegg
Copy link
Contributor Author

waydegg commented Aug 20, 2023

For sessions where there aren't a bunch of files/code, the updated system prompt is pretty good at not suggesting edits to files it doesn't have in it's context (but does have in it's Code Map). It does start to break down as LOC and number of files grows however, and at a certain point it just keeps suggesting edits no matter what I try with the system prompt.

I think a solution for this should be handled in a separate PR which does something like:

  • Get a list of files to add/edit/delete at the beginning of the response
  • As the response/file list is streamed in, check if the files have been added into the context
  • If they haven't been added, stop the response

@waydegg waydegg marked this pull request as ready for review August 20, 2023 21:53
@waydegg
Copy link
Contributor Author

waydegg commented Aug 20, 2023

Here are some partial snippets of different sized code maps:

Full Code Map

mentat/__main__.py

mentat/app.py
        function
                expand_paths (paths: Iterable[str])
                get_user_feedback_on_changes ( config: ConfigManager, conv: Conversation, user_input_manager: UserInputManager, code_file_manager: CodeFileManager, code_changes: Iterable[CodeChange], )
                loop ( paths: Iterable[str], exclude_paths: Optional[Iterable[str]], cost_tracker: CostTracker, )
                run (paths: Iterable[str], exclude_paths: Optional[Iterable[str]] = None)
                run_cli ()
                user_filter_changes ( user_input_manager: UserInputManager, code_changes: Iterable[CodeChange] )

mentat/change_conflict_resolution.py
        function
                resolve_insertion_conflicts ( changes: list[CodeChange], user_input_manager: UserInputManager, code_file_manager )
                resolve_non_insertion_conflicts ( changes: list[CodeChange], user_input_manager: UserInputManager )

mentat/code_change.py
        CodeChange
                member
                        __init__ ( self, json_data: dict, code_lines: list[str], code_file_manager, )
                        __lt__ (self, other)
                        apply (self, cur_file_lines: list[str])
        CodeChangeAction
                member
                        has_additions (self)
                        has_removals (self)
                        has_surrounding_lines (self)
                variable
                        CreateFile
                        Delete
                        DeleteFile
                        Insert
                        Replace
        class
                CodeChange
                CodeChangeAction

Code Map without signatures

mentat/__main__.py

mentat/app.py
        function
                expand_paths
                get_user_feedback_on_changes
                loop
                run
                run_cli
                user_filter_changes

mentat/change_conflict_resolution.py
        function
                resolve_insertion_conflicts
                resolve_non_insertion_conflicts

mentat/code_change.py
        class
                CodeChange
                CodeChangeAction
        member
                __init__
                __lt__
                apply
                has_additions
                has_removals
                has_surrounding_lines
        variable
                CreateFile
                Delete
                DeleteFile
                Insert
                Replace

Code map with file structure only

.gitignore
LICENSE
README.md
docs
        configuration.md
mentat
        __init__.py
        __main__.py
        app.py
        change_conflict_resolution.py
        code_change.py
        code_change_display.py
        code_file_manager.py
        code_map.py
        config_manager.py
        conversation.py
        default_config.json
        errors.py
        git_handler.py
        llm_api.py
        logging_config.py
        parsing.py
        prompts.py
        streaming_printer.py
        user_input_manager.py
pyproject.toml
requirements.txt
setup.py

@biobootloader
Copy link
Member

Fantastic! I'll review this tomorrow. Thank you for the detailed code map examples!

mentat/conversation.py Outdated Show resolved Hide resolved
mentat/conversation.py Outdated Show resolved Hide resolved
mentat/prompts.py Outdated Show resolved Hide resolved
mentat/prompts.py Outdated Show resolved Hide resolved
mentat/code_map.py Outdated Show resolved Hide resolved
mentat/code_map.py Outdated Show resolved Hide resolved
mentat/code_map.py Outdated Show resolved Hide resolved
@PCSwingle
Copy link
Member

Just looked over a lot of the PR; so far it looks great! Super excited to get this merged! Left a couple of comments on a few things, but nothing too massive; I also didn't have a lot of time so I didn't end up finishing reviewing the code_map.py file; I'll leave that up to @biobootloader

Copy link
Member

@biobootloader biobootloader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great new feature! I've left a few comments, here are some more general points:

Since there's a big change to the system prompt, we should run benchmarks with and without to make sure we don't lose performance. I can do that once we are closer to merging. I'll add this task to the PR description as well.

If it suggests edits to files it can't fully see, it'd be good to suggest the user adds them to the context. We'll soon have "commands" (#59) and a /add command to add a specific file. In this PR we can just warn the user that it tried to edit a file it couldn't see the code for and have them rerun with it added.

Let's make this feature optional as well, since big maps would add to api costs. We can have it be default on and have a command line flag to turn off? (For example if a user is editing an isolated file or two in a big codebase, they may want to turn it off).

mentat/code_map.py Outdated Show resolved Hide resolved
mentat/conversation.py Outdated Show resolved Hide resolved
Comment on lines 66 to 67
for message in messages:
prompt_token_count += count_tokens(message["content"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should make self.messages more than just a list of messages, but instead also store token counts for each message and/or a total, so we don't recalculate again and again?

don't have to make that change as part of this PR, we can make a separate issue for that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering how much of a performance increase this would be? I don't think it's much work to set this up though so it's worth it to see if it helps a good amount.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it may not really matter, but if done cleanly would be nice. Doesn't have to be this PR though

mentat/conversation.py Outdated Show resolved Hide resolved
mentat/conversation.py Outdated Show resolved Hide resolved
@biobootloader
Copy link
Member

For sessions where there aren't a bunch of files/code, the updated system prompt is pretty good at not suggesting edits to files it doesn't have in it's context (but does have in it's Code Map). It does start to break down as LOC and number of files grows however, and at a certain point it just keeps suggesting edits no matter what I try with the system prompt.

I think a solution for this should be handled in a separate PR which does something like:

  • Get a list of files to add/edit/delete at the beginning of the response
  • As the response/file list is streamed in, check if the files have been added into the context
  • If they haven't been added, stop the response

Agreed, let's do this in a follow-up PR

@biobootloader
Copy link
Member

For testing, lets add a benchmark (https://github.com/biobootloader/mentat/blob/main/tests/benchmark_test.py) that asks it to make change that it could only make successfully with the map. Set up some simple scenario where you only show it one file, and ask it to make a change to that file that requires a call to a function in another file, that it could see in the map.

Since benchmarks don't run on Github actions it'd also be good to add some sanity check tests as well. Especially with things that we wouldn't notice if we accidentally broke, like which map is used with different token limits, etc.

@waydegg
Copy link
Contributor Author

waydegg commented Aug 23, 2023

Thanks for the reviews! I agree w/ all of the suggestions/comments, will get to work on addressing everything today.

@waydegg
Copy link
Contributor Author

waydegg commented Aug 25, 2023

@biobootloader I added a benchmark test that forces the model to import from a module in the code map that's outside the code files context.

Also, I decided to revert the system prompt closer to what it was originally for a couple reasons:

  • Since we aren't trying to prevent edits to files not added in CodeFiles for this PR, we shouldn't introduce that logic into the system prompt just yet
  • There was some variability in the benchmark tests passing/failing with the new system prompt after I ran the benchmarks several times back-to-back. This is to be expected since we have temperature set to a non-zero value and while the benchmarks on the main branch pass/fail with some variance, there's less of a chance of introducing any regressions by keeping the closer-to-original system prompt

@biobootloader
Copy link
Member

Thanks for running the benchmarks, and I yeah I agree changing the system prompt is hard. More and better benchmarking would help us change it with more confidence. I like the call to keep it more similar to original.

@biobootloader
Copy link
Member

Looks great! This will be ready to merge after addressing my one comment about letting the user know which kind of map (level of detail) is being used.

I've added two follow up issues to add an additional benchmark and test after this is merged: #76 , #77

Copy link
Member

@biobootloader biobootloader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 awesome!

@biobootloader biobootloader merged commit 5b8e25d into AbanteAI:main Aug 27, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants