Add high-level code map to system prompt #66
Conversation
Super pumped for this!
Code maps are minified based on a token limit. The minification happens as the maps are generated, so that we don't try to generate a huge map if a user forgot to add some large directory to their .gitignore. The code maps will be created and minified in the following order:
1. Full map
2. No signatures
3. File-structure only
4. No map
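As a rough sketch of that fallback, here is one way the selection could be written, assuming a tiktoken-based token counter like Mentat's count_tokens (the pick_code_map helper and its exact signature are hypothetical, not this PR's implementation):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Token count as seen by the model, using tiktoken's encoding.
    return len(tiktoken.encoding_for_model(model).encode(text))

def pick_code_map(candidate_maps: list[str], token_limit: int) -> str | None:
    # candidate_maps is ordered most- to least-detailed:
    # [full map, map without signatures, file-structure-only map].
    for code_map in candidate_maps:
        if count_tokens(code_map) <= token_limit:
            return code_map
    return None  # nothing fits under the limit, so no map is sent at all
```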
For sessions where there aren't a bunch of files/code, the updated system prompt is pretty good at not suggesting edits to files it doesn't have in its context (but does have in its code map). It does start to break down as the LOC and number of files grow, however, and at a certain point it just keeps suggesting edits no matter what I try with the system prompt. I think a solution for this should be handled in a separate PR which does something like:
Here are some partial snippets of different-sized code maps:
- Full code map
- Code map without signatures
- Code map with file structure only
Fantastic! I'll review this tomorrow. Thank you for the detailed code map examples!
Just looked over a lot of the PR; so far it looks great! Super excited to get this merged! Left a couple of comments on a few things, but nothing too massive. I also didn't have a lot of time, so I didn't end up finishing reviewing the code_map.py file; I'll leave that up to @biobootloader.
Great new feature! I've left a few comments, here are some more general points:
Since there's a big change to the system prompt, we should run benchmarks with and without to make sure we don't lose performance. I can do that once we are closer to merging. I'll add this task to the PR description as well.
If it suggests edits to files it can't fully see, it'd be good to suggest the user adds them to the context. We'll soon have "commands" (#59) and a /add command to add a specific file. In this PR we can just warn the user that it tried to edit a file it couldn't see the code for and have them rerun with it added.
Let's make this feature optional as well, since big maps would add to API costs. We can have it default to on and add a command-line flag to turn it off (for example, if a user is editing an isolated file or two in a big codebase, they may want to turn it off).
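A minimal illustration of that kind of flag, with a hypothetical name and wiring (not Mentat's actual CLI):

```python
import argparse

parser = argparse.ArgumentParser(prog="mentat")
# Code map defaults to on; --no-code-map opts out, e.g. when editing an
# isolated file or two in a large repo where the map only adds token cost.
parser.add_argument(
    "--no-code-map",
    action="store_true",
    help="Don't include the ctags code map in the system prompt",
)
args = parser.parse_args()
use_code_map = not args.no_code_map
```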
mentat/conversation.py
for message in messages:
    prompt_token_count += count_tokens(message["content"])
Perhaps we should make self.messages more than just a list of messages, and instead also store token counts for each message and/or a total, so we don't recalculate again and again? We don't have to make that change as part of this PR; we can make a separate issue for it.
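A minimal sketch of that idea, reusing the count_tokens helper from the snippet above (the class and attribute names here are hypothetical, not code from this PR):

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    messages: list[dict] = field(default_factory=list)
    _token_counts: list[int] = field(default_factory=list)

    def add_message(self, message: dict) -> None:
        # Count tokens once, at the moment the message is added.
        self.messages.append(message)
        self._token_counts.append(count_tokens(message["content"]))

    @property
    def prompt_token_count(self) -> int:
        # Cheap sum over cached counts instead of re-tokenizing every message.
        return sum(self._token_counts)
```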
I'm wondering how much of a performance increase this would be? I don't think it's much work to set this up though so it's worth it to see if it helps a good amount.
Yeah it may not really matter, but if done cleanly would be nice. Doesn't have to be this PR though
Agreed, let's do this in a follow-up PR.
For testing, let's add a benchmark (https://github.com/biobootloader/mentat/blob/main/tests/benchmark_test.py) that asks it to make a change that it could only make successfully with the map. Set up some simple scenario where you only show it one file and ask it to make a change to that file that requires a call to a function in another file, one that it can see in the map. Since benchmarks don't run on GitHub Actions, it'd also be good to add some sanity-check tests as well, especially for things we wouldn't notice if we accidentally broke them, like which map is used at different token limits.
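For instance, a sanity check on map selection could look something like the pytest sketch below, which exercises the hypothetical pick_code_map helper sketched earlier (the token limits are chosen loosely for illustration):

```python
def test_map_level_respects_token_limit():
    # Ordered most- to least-detailed, matching the minification order above.
    candidates = ["full map " * 100, "no signatures " * 50, "files only " * 10]

    # A generous limit should keep the full map.
    assert pick_code_map(candidates, token_limit=10_000) == candidates[0]
    # A tight limit should fall back to the file-structure-only map.
    assert pick_code_map(candidates, token_limit=50) == candidates[2]
    # If nothing fits, no map should be included at all.
    assert pick_code_map(candidates, token_limit=1) is None
```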
Thanks for the reviews! I agree with all of the suggestions/comments; I'll get to work on addressing everything today.
@biobootloader I added a benchmark test that forces the model to import from a module that's in the code map but outside the code files in its context. Also, I decided to revert the system prompt closer to what it was originally, for a couple of reasons:
Thanks for running the benchmarks, and yeah, I agree changing the system prompt is hard. More and better benchmarking would help us change it with more confidence. I like the call to keep it more similar to the original.
🎉 awesome!
Summary
This PR adds a high-level code map, generated from ctags, to the system prompt, which gives the model more context on what other code (classes, methods, etc.) it can use that hasn't been directly added to the model context by the user.
This implementation takes inspiration from how Aider works with large codebases. Here's a writeup from the author of Aider on how it works.
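For readers unfamiliar with ctags, here is a rough sketch of how a tags-based map can be produced; it assumes universal-ctags with JSON output is installed, and is only an illustration, not this PR's code_map.py:

```python
import json
import subprocess

def get_tags(path: str) -> list[dict]:
    # Ask universal-ctags for one JSON object per symbol (name, kind,
    # file, and signature when available), written to stdout.
    result = subprocess.run(
        ["ctags", "--output-format=json", "--fields=+S", "-R", "-f", "-", path],
        capture_output=True,
        text=True,
        check=True,
    )
    tags = (json.loads(line) for line in result.stdout.splitlines() if line)
    # Keep only real tag entries, skipping ctags pseudo-tag metadata.
    return [tag for tag in tags if tag.get("_type") == "tag"]

# Example: print the first few symbols ctags found in the repo.
for tag in get_tags(".")[:10]:
    print(tag["path"], tag.get("kind"), tag["name"], tag.get("signature", ""))
```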
Why this is needed
This work is a prerequisite for having Mentat automatically add new source code from files outside the context.
To-Do
Need to finish these before this PR is ready for review:
- prevent mentat from suggesting edits for files which exist in the code map but outside the model context

Prior to Merge