Implement basic auto-context generation #115

Merged · 116 commits into main · Oct 10, 2023

Conversation

granawkins (Member)

This PR is tasks 1-3 of #109 (quoting):

  1. Move the get_code_message to CodeFile, update diff and codemaps to work on individual files. Eventually CodeFile will become CodeFeature and can be anything in a).
  2. Setup a refresh workflow and caching of code message
  3. Build a basic version of the algo using just diff and codemaps

This is functional and ready to review, though there'll probably be some changes after the #106 merge. Some things I'd like feedback on:

Interface

Right now, your entire context window (8192 tokens) minus 1000 tokens (response buffer) is automatically filled with code, code maps, etc. This probably hurts performance in most cases, not to mention price. There should be some variable, maybe a CLI arg, to control this. Some ideas (with a rough sketch of the first one after this list) are:

  • Specify a token budget, e.g. `mentat --auto-context-tokens=2000` will give you the complete code you've included plus an additional 2000 tokens of context (or the max minus the response buffer).
  • Specify a max level for auto-context features. The current (before this PR) behavior is equivalent to `mentat <path> --auto-context-level=diff`: include anything at or below diff (ctags_full, ctags, file_name).
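
As a rough, illustrative sketch of the first idea only (argparse-based; the flag name just mirrors the proposal above, nothing here is merged code):

```python
import argparse

# Hypothetical sketch of the proposed --auto-context-tokens flag.
# The flag name and default come from this discussion, not from the PR.
parser = argparse.ArgumentParser(prog="mentat")
parser.add_argument("paths", nargs="*", help="files/directories to include in full")
parser.add_argument(
    "--auto-context-tokens",
    type=int,
    default=0,  # 0 = no auto-context (see the follow-up comment below)
    help="extra tokens of auto-selected context on top of the included code",
)
args = parser.parse_args()

# The context builder would then cap auto-selected features at roughly:
#   min(args.auto_context_tokens, model_max - prompt_tokens - response_buffer)
```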

Display Context

The context ends up being code_context.features: list[CodeFile], each with a path, interval, level and diff. How should we summarize this to the user?

  • Right now we show the user-specified include_paths, plus a description of how the Diff and CodeMaps are set up.
  • Maybe we say the number of features at each level? Like "Including 10 code files, 6 diffs, 36 full codemaps, 10 basic codemaps and 61 filenames" (a rough sketch of this follows).
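
A minimal sketch of that second option, assuming each feature's level exposes a human-readable description (the attribute name is a guess, not the current CodeFile API):

```python
from collections import Counter

def summarize_context(features) -> str:
    # Count features per level and render a one-line summary, e.g.
    # "Including 10 code files, 36 full codemaps, 61 filenames"
    counts = Counter(f.level.description for f in features)  # `description` is assumed
    parts = [f"{count} {name}" for name, count in counts.most_common()]
    return "Including " + ", ".join(parts)
```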

granawkins (Member Author) commented Oct 6, 2023

I don't think auto-tokens is useful yet because it's not prioritizing the files, and it fills in a ton of irrelevant items. This will change with embeddings, but for now I think the default should be auto-tokens=0.

We could also (now) repurpose the feature-knapsack to downsize a diff or paths context that's too big. Or scale it down, add it to the prompt, and send it to gpt-3.5 for feedback on what to include.

biobootloader (Member)

Something I realized that might also be hurting usefulness: the system doesn't know which files the user added vs. which were auto-added. Maybe we should list the files the user explicitly added when sending the user query, so it knows that's what the user is referring to or wants to focus on?

jakethekoenig (Member) left a comment

I kind of wish you'd split up your 1, 2, 3 instead of doing them all in one PR. I have a lot of comments on 3, but I think 1 and 2 are basically fine.

_code_message: str | None = None
_code_message_checksum: str | None = None

async def _get_code_message_checksum(self, max_tokens: Optional[int] = None) -> str:
Member:

Is there a reason for this function to be async?

granawkins (Member Author):

Not yet, no. I'm planning to make this thread-safe soon, but it's not needed yet.

@@ -199,3 +199,57 @@ async def display_total_cost(self) -> None:
await stream.send(
f"Total session cost: ${self.total_cost:.2f}", color="light_blue"
)


# Copied from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
Member:

I'd at least remove the print statements

max_tokens: int,
) -> list[CodeFile]:
git_root = GIT_ROOT.get()
# Generate all possible permutations for all files in the project.
Member:

I'm sorry, I don't understand what is happening here very well. To me a permutation means an ordering of something. Is this code generating every combination of CodeMessageLevel and diff/no-diff for every file? Instead of generating this list of tuples and then iterating over it, can we have a nested for loop? Something like:

levels = [CodeMessageLevel.CODE, CodeMessageLevel.FILE_NAME]
if not self.settings.no_code_map:
    levels += [CodeMessageLevel.CMAP_FULL, CodeMessageLevel.CMAP]
diffs = [None]
if self.diff_context.target and path in self.diff_context.files:
    diffs += [self.diff_context.target]
for level in levels:
    for diff in diffs:
        candidate_features.append(CodeFile(path, level=level, diff=diff))

Comment on lines 338 to 347
if feature.diff is not None:
score += 1
if feature.level == CodeMessageLevel.FILE_NAME:
score += 0.1
if feature.level == CodeMessageLevel.CMAP:
score += 0.25
elif feature.level == CodeMessageLevel.CMAP_FULL:
score += 0.5
elif feature.level == CodeMessageLevel.CODE:
score += 0.75
Member:

So if I'm understanding correctly, this algorithm really wants to include the cmap and really doesn't want to include the full file? I suppose that makes sense, since the cmap will be far fewer tokens and is only weighted half.

continue
if _longer_feature_already_included(feature, all_features):
continue
to_replace = _shorter_features_already_included(feature, all_features)
Member:

This is the only place this is used, and this algorithm could include at most one other shorter feature, right? Maybe this function could be _remove_shorter_feature and do the below modification itself? It would save a for loop.

granawkins (Member Author):

For now that's right, though in the future I expect a single feature (full-file code) could replace multiple (intervals or individual functions).

Also we need to adjust the running token balance (line 327). I think that'd involve either passing tokens to the helper func (not ideal) or re-counting the entire spliced feature set afterwards, which seems also not ideal.

if to_replace:
for f in to_replace:
f_index = all_features.index(f)
reclaimed_tokens = await all_features[f_index].count_tokens(model)
Member:

count_tokens may be called for a single feature 3 times in this function, right? Should we cache it?
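
A sketch of how that caching could look, keyed on the feature's identity (the helper name and key shape are illustrative, not Mentat's API):

```python
# Hypothetical memoization so a feature's token count is computed at most once
# per (path, level, diff) combination. `feature` is assumed to be a CodeFile.
_token_count_cache: dict[tuple, int] = {}

async def cached_count_tokens(feature, model: str) -> int:
    key = (feature.path, feature.level, feature.diff)
    if key not in _token_count_cache:
        _token_count_cache[key] = await feature.count_tokens(model)
    return _token_count_cache[key]
```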

code_file_manager = CODE_FILE_MANAGER.get()
parser = PARSER.get()

file_message: list[str] = []
Member:

Maybe change this variable name to `code_message`.

if _max_auto == 0 or _max_user == 0:
self.features = features
else:
auto_tokens = _max_auto if _max_user is None else min(_max_auto, _max_user)
Member:

I hate this pattern and did something similar (but worse) here. I wonder if we should define our own safe_min and safe_max functions in util that return the min/max non-None parameter and have sensible default returns like 0/infinity.
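
Something like this sketch (safe_min/safe_max don't exist in util yet; the defaults are just one option):

```python
from typing import Optional

def safe_min(*values: Optional[float], default: float = float("inf")) -> float:
    """Min of the non-None values, or `default` if every value is None."""
    present = [v for v in values if v is not None]
    return min(present) if present else default

def safe_max(*values: Optional[float], default: float = 0) -> float:
    """Max of the non-None values, or `default` if every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else default

# The line above could then read:
#   auto_tokens = safe_min(_max_auto, _max_user)
```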

granawkins (Member Author):

Haha why do you hate it? Seems concise, and does something pretty specific.

Member:

Yeah it's fine. I guess I just want min/max to accept None values and be sane about it. But it's probably better that they don't as that could mask other bugs.

)
_max_auto = max(0, max_tokens - include_feature_tokens)
_max_user = self.settings.auto_tokens
if _max_auto == 0 or _max_user == 0:
Member:

I don't understand very well what this check is supposed to accomplish. If _max_auto is 0, doesn't that mean features has too many tokens and shouldn't be used?

granawkins (Member Author):

We return include_features regardless of whether it's over max_tokens, and if it's too big we handle the error in Conversation.

biobootloader (Member)

I ran `python -m mentat --pr-diff main --exclude tests` to ask about this PR, using the 32k model, but got some frustrating responses.

The first time I asked to explain the changes in the PR and it listed some of them but didn't mention the auto context at all. When I asked for more it got confused and kept saying it couldn't see files as a language model 🤦

The second conversation I directly asked about the changes in code_context.py but it kept claiming that it wasn't included, although checking the transcript logs it seems it was. But maybe not the diff for that file?

I haven't used the 32k model much so maybe it's just less reliable, or maybe our message ordering is confusing it.

First Conversation
Code Context:
  Directory: /Users/biobootloader/code/mentat
  Diff:  Merge-base Branch main: 3c4db32f: turn all singletons into context vars, remove git_root from config (#123) | 19 files | 1263 lines
  Included files:
    mentat
    └── mentat
        ├── code_context.py
        ├── code_edit_feedback.py
        ├── code_file.py
        ├── code_file_manager.py
        ├── code_map.py
        ├── commands.py
        ├── conversation.py
        ├── diff_context.py
        ├── include_files.py
        ├── llm_api.py
        ├── session.py
        ├── terminal
        │   ├── client.py
        │   └── prompt_completer.py
        └── utils.py
  CodeMaps: Enabled
  Auto-tokens: Model max (default)
Prompt and included files token count: 24054 / 32768
Type 'q' or use Ctrl-C to quit at any time.
What can I do for you?
>>> describe the changes made in this PR

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  9 Function/Class names
  5 Relative path/filename
Total token count: 31675
Streaming... use control-c to interrupt the model at any point

This PR includes changes related to code refactoring and improvements.

Changes:
1. Replace AddCommand and RemoveCommand with IncludeCommand and ExcludeCommand respectively.
2. Update corresponding test cases and functions.
3. Add sha256 function in utils.py
4. Minor changes in other files to improve code readability and organization.

Please let me know if you have any questions or concerns.
Speed: 7.06 tkns/s | Cost: $1.91
>>> I think there are some other big changes

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  8 Function/Class names
  6 Relative path/filename
Total token count: 31673
Streaming... use control-c to interrupt the model at any point

I am an AI language model and I cannot access external systems, create pull requests, or make changes to code repositories directly. However, I can help you write code, fix bugs, and answer any questions you may have. If you have a specific issue or request, please provide more information and I will do my best to assist you.
Speed: 6.21 tkns/s | Cost: $1.91
>>> what diffs do you see about adding an auto context feature?

Auto-Selected Features:
  23 Complete code
  47 Function/Class names and signatures
  7 Function/Class names
  6 Relative path/filename
Total token count: 31666
Streaming... use control-c to interrupt the model at any point

I am an AI language model and I cannot access your PR or codebase directly. However, if you provide me with information about the changes you've made, I can help you describe them. Please provide the relevant details about the changes you've made, such as the files you've modified, the functions you've added or updated, and any other relevant information.
Speed: 4.52 tkns/s | Cost: $1.91
>>> ok just describe the diff to `code_context`

Auto-Selected Features:
  23 Complete code
  46 Function/Class names and signatures
  9 Function/Class names
  5 Relative path/filename
Total token count: 31723
Streaming... use control-c to interrupt the model at any point

I cannot provide a diff for the `code_context` without the actual code or context. However, I can help you with any code changes, bug fixes, or new features you'd like to implement. Please provide more information or code snippets, and I will do my best to assist you.
Speed: 5.76 tkns/s | Cost: $1.91
>>>
Total session cost: $7.64
Second Conversation
❯ python -m mentat --pr-diff main --exclude tests

Code Context:
  Directory: /Users/biobootloader/code/mentat
  Diff:  Merge-base Branch main: 3c4db32f: turn all singletons into context vars, remove git_root from config (#123) | 19 files | 1263 lines
  Included files:
    mentat
    └── mentat
        ├── code_context.py
        ├── code_edit_feedback.py
        ├── code_file.py
        ├── code_file_manager.py
        ├── code_map.py
        ├── commands.py
        ├── conversation.py
        ├── diff_context.py
        ├── include_files.py
        ├── llm_api.py
        ├── session.py
        ├── terminal
        │   ├── client.py
        │   └── prompt_completer.py
        └── utils.py
  CodeMaps: Enabled
  Auto-tokens: Model max (default)
Prompt and included files token count: 24054 / 32768
Type 'q' or use Ctrl-C to quit at any time.
What can I do for you?
>>> explaint he changes to `code_context.py`

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  9 Function/Class names
  5 Relative path/filename
Total token count: 31679
Streaming... use control-c to interrupt the model at any point

I cannot provide information about the changes to `code_context.py` as it was not included in the provided code files or code map. Please provide the relevant code file or code map information for `code_context.py`, and I will be able to help you with the changes.
Speed: 5.08 tkns/s | Cost: $1.91
>>> I'm sorry, I meant mentat/code_context.py

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  8 Function/Class names
  6 Relative path/filename
Total token count: 31660
Streaming... use control-c to interrupt the model at any point

I am unable to provide information about the changes to `mentat/code_context.py` as it was not included in the provided code files or code map. Please provide the relevant code file or code map information for `mentat/code_context.py`, and I will be able to help you with the changes.

  

Speed: 5.82 tkns/s | Cost: $1.91
>>>

granawkins closed this Oct 7, 2023
granawkins reopened this Oct 7, 2023
granawkins (Member Author)

> I kind of wish you'd split up your 1, 2, 3 instead of doing them all in one PR. I have a lot of comments on 3, but I think 1 and 2 are basically fine.

That's good feedback! In hindsight that would've been a lot simpler.

I'm realizing that the 'algorithm' I put together here doesn't make sense without Embeddings, and is doing some awfully specific calculations which are unproven.

For this PR, it'd be much simpler to just follow the old pattern: if there's extra space, try to include (1) code_maps with signatures, otherwise (2) code_maps, otherwise (3) just filenames. We can revisit ranking/scoring when there's more signal available.
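
A minimal sketch of that fallback, assuming the CodeMessageLevel values from this PR and a CodeFile(path, level=...) constructor (the function and constructor signatures here are assumptions):

```python
# Hypothetical sketch: fill leftover space with the richest level that fits.
# CodeFile / CodeMessageLevel are assumed to come from mentat's code_file module.
FALLBACK_LEVELS = [
    CodeMessageLevel.CMAP_FULL,  # (1) code maps with signatures
    CodeMessageLevel.CMAP,       # (2) code maps, names only
    CodeMessageLevel.FILE_NAME,  # (3) just filenames
]

async def get_auto_features(paths, remaining_tokens: int, model: str):
    for level in FALLBACK_LEVELS:
        features = [CodeFile(p, level=level) for p in paths]
        total = sum([await f.count_tokens(model) for f in features])
        if total <= remaining_tokens:
            return features
    return []
```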

granawkins (Member Author)

I've scaled back `_get_auto_features` so it behaves like it did before this PR.

biobootloader (Member)

> For this PR, it'd be much simpler to just follow the old pattern: if there's extra space, try to include (1) code_maps with signatures, otherwise (2) code_maps, otherwise (3) just filenames. We can revisit ranking/scoring when there's more signal available.

Makes sense, thanks! Sorry about the trouble, I believe I suggested implementing the algo before embeddings!

mentat/llm_api.py (outdated review thread, resolved)
parser = PARSER.get()

messages = self.messages.copy()

code_file_manager.read_all_file_lines()
# Rebuild code context with active code and available tokens
tokens = await num_tokens_from_messages(messages, self.model)
Member:

This new function, num_tokens_from_messages, only supports GPT-3.5 and GPT-4 models. Can we use the count_tokens function we use in other places? This one is probably more exact, I'm not sure.

granawkins (Member Author):

Sure - so I'll get rid of num_tokens_from_messages, add a comment with the link for future reference, and set Conversation back to the way it was before.
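
For reference, a model-agnostic message token count with tiktoken could look roughly like this (a sketch, not the exact count_tokens in llm_api.py):

```python
import tiktoken

def count_message_tokens(messages: list[dict[str, str]], model: str) -> int:
    # Rough count: encode the concatenated message contents. Less exact than
    # the cookbook's per-message accounting, but not tied to specific models.
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    text = "\n".join(m.get("content", "") for m in messages)
    return len(encoding.encode(text))
```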

jakethekoenig (Member) left a comment

It looks good to me now. Thanks for reducing the scope a little bit.

granawkins merged commit a6c873a into main Oct 10, 2023
8 checks passed