Implement basic auto-context generation #115

Merged · 116 commits into main · Oct 10, 2023

Conversation

granawkins (Member)

This PR is tasks 1-3 of #109 (quoting):

  1. Move the get_code_message to CodeFile, update diff and codemaps to work on individual files. Eventually CodeFile will become CodeFeature and can be anything in a).
  2. Setup a refresh workflow and caching of code message
  3. Build a basic version of the algo using just diff and codemaps

This is functional and ready to review, though there'll probably be some changes after the #106 merge. Some things I'd like feedback on:

Interface

Right now, your entire context window (8192 tokens) minus 1000 tokens (response buffer) is automatically filled with code, code maps, etc. This probably hurts performance in most cases, not to mention price. There should be some variable, maybe a CLI arg, to control this. Some ideas (with a rough sketch of the first one after this list) are:

  • Specify a token budget, e.g. `mentat --auto-context-tokens=2000` will give you the complete code you've included plus an additional 2000 tokens of context (or the max minus the response buffer).
  • Specify a max level for auto-context features. The current (before this PR) behavior is equivalent to `mentat <path> --auto-context-level=diff`: include anything at or below diff (ctags_full, ctags, file_name).
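
As a rough, illustrative sketch of the first idea only (argparse-based; the flag name just mirrors the proposal above, nothing here is merged code):

```python
import argparse

# Hypothetical sketch of the proposed --auto-context-tokens flag.
# The flag name and default come from this discussion, not from the PR.
parser = argparse.ArgumentParser(prog="mentat")
parser.add_argument("paths", nargs="*", help="files/directories to include in full")
parser.add_argument(
    "--auto-context-tokens",
    type=int,
    default=0,  # 0 = no auto-context (see the follow-up comment below)
    help="extra tokens of auto-selected context on top of the included code",
)
args = parser.parse_args()

# The context builder would then cap auto-selected features at roughly:
#   min(args.auto_context_tokens, model_max - prompt_tokens - response_buffer)
```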

Display Context

The context ends up being code_context.features: list[CodeFile], each with a path, interval, level and diff. How should we summarize this to the user?

  • Right now we show the user-specified include_paths, plus a description of how the Diff and CodeMaps are set up.
  • Maybe we say the number of features at each level? Like "Including 10 code files, 6 diffs, 36 full codemaps, 10 basic codemaps and 61 filenames" (a rough sketch of this follows).
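
A minimal sketch of that second option, assuming each feature's level exposes a human-readable description (the attribute name is a guess, not the current CodeFile API):

```python
from collections import Counter

def summarize_context(features) -> str:
    # Count features per level and render a one-line summary, e.g.
    # "Including 10 code files, 36 full codemaps, 61 filenames"
    counts = Counter(f.level.description for f in features)  # `description` is assumed
    parts = [f"{count} {name}" for name, count in counts.most_common()]
    return "Including " + ", ".join(parts)
```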

granawkins (Member Author) commented Oct 6, 2023

I don't think auto-tokens is useful yet because it's not prioritizing the files, and it fills in a ton of irrelevant items. This will change with embeddings, but for now I think the default should be auto-tokens=0.

We could also (now) repurpose the feature-knapsack to downsize a diff or paths context that's too big. Or scale it down, add it to the prompt, and send it to gpt-3.5 for feedback on what to include.

biobootloader (Member)

Something I realized that might also be hurting usefulness: the system doesn't know which files the user added vs. which were auto-added. Maybe we should list the files the user explicitly added when sending the user query, so it knows that's what the user is referring to or wants to focus on?

jakethekoenig (Member) left a comment

I kind of wish you'd split up your 1, 2, 3 instead of doing them all in one PR. I have a lot of comments on 3, but I think 1 and 2 are basically fine.

_code_message: str | None = None
_code_message_checksum: str | None = None

async def _get_code_message_checksum(self, max_tokens: Optional[int] = None) -> str:
Member:

Is there a reason for this function to be async?

granawkins (Member Author):

Not yet, no. I'm planning to make this thread-safe soon, but it's not needed yet.

@@ -199,3 +199,57 @@ async def display_total_cost(self) -> None:
await stream.send(
f"Total session cost: ${self.total_cost:.2f}", color="light_blue"
)


# Copied from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
Member:

I'd at least remove the print statements

max_tokens: int,
) -> list[CodeFile]:
git_root = GIT_ROOT.get()
# Generate all possible permutations for all files in the project.
Member:

I'm sorry, I don't understand what is happening here very well. To me a permutation means an ordering of something. Is this code generating every combination of CodeMessageLevel and diff/no-diff for every file? Instead of generating this list of tuples and then iterating over it, can we have a nested for loop? Something like:

levels = [CodeMessageLevel.CODE, CodeMessageLevel.FILE_NAME]
if not self.settings.no_code_map:
    levels += [CodeMessageLevel.CMAP_FULL, CodeMessageLevel.CMAP]
diffs = [None]
if self.diff_context.target and path in self.diff_context.files:
    diffs += [self.diff_context.target]
for level in levels:
    for diff in diffs:
        candidate_features.append(CodeFile(path, level=level, diff=diff))

Comment on lines 338 to 347
if feature.diff is not None:
score += 1
if feature.level == CodeMessageLevel.FILE_NAME:
score += 0.1
if feature.level == CodeMessageLevel.CMAP:
score += 0.25
elif feature.level == CodeMessageLevel.CMAP_FULL:
score += 0.5
elif feature.level == CodeMessageLevel.CODE:
score += 0.75
Member:

So if I'm understanding correctly, this algorithm really wants to include the cmap and really doesn't want to include the full file? I suppose that makes sense, since the cmap will be far fewer tokens and is only weighted half.

continue
if _longer_feature_already_included(feature, all_features):
continue
to_replace = _shorter_features_already_included(feature, all_features)
Member:

This is the only place this is used, and this algorithm could include at most one other shorter feature, right? Maybe this function could be _remove_shorter_feature and do the below modification itself? It would save a for loop.

granawkins (Member Author):

For now that's right, though in the future I expect a single feature (full-file code) could replace multiple (intervals or individual functions).

Also we need to adjust the running token balance (line 327). I think that'd involve either passing tokens to the helper func (not ideal) or re-counting the entire spliced feature set afterwards, which seems also not ideal.

if to_replace:
for f in to_replace:
f_index = all_features.index(f)
reclaimed_tokens = await all_features[f_index].count_tokens(model)
Member:

count_tokens may be called for a single feature 3 times in this function, right? Should we cache it?
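
A sketch of how that caching could look, keyed on the feature's identity (the helper name and key shape are illustrative, not Mentat's API):

```python
# Hypothetical memoization so a feature's token count is computed at most once
# per (path, level, diff) combination. `feature` is assumed to be a CodeFile.
_token_count_cache: dict[tuple, int] = {}

async def cached_count_tokens(feature, model: str) -> int:
    key = (feature.path, feature.level, feature.diff)
    if key not in _token_count_cache:
        _token_count_cache[key] = await feature.count_tokens(model)
    return _token_count_cache[key]
```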

code_file_manager = CODE_FILE_MANAGER.get()
parser = PARSER.get()

file_message: list[str] = []
Member:

Maybe change this variable name to `code_message`.

if _max_auto == 0 or _max_user == 0:
self.features = features
else:
auto_tokens = _max_auto if _max_user is None else min(_max_auto, _max_user)
Member:

I hate this pattern and did something similar (but worse) here. I wonder if we should define our own safe_min and safe_max functions in util that return the min/max non-None parameter and have sensible default returns like 0/infinity.
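
Something like this sketch (safe_min/safe_max don't exist in util yet; the defaults are just one option):

```python
from typing import Optional

def safe_min(*values: Optional[float], default: float = float("inf")) -> float:
    """Min of the non-None values, or `default` if every value is None."""
    present = [v for v in values if v is not None]
    return min(present) if present else default

def safe_max(*values: Optional[float], default: float = 0) -> float:
    """Max of the non-None values, or `default` if every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else default

# The line above could then read:
#   auto_tokens = safe_min(_max_auto, _max_user)
```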

granawkins (Member Author):

Haha why do you hate it? Seems concise, and does something pretty specific.

Member:

Yeah it's fine. I guess I just want min/max to accept None values and be sane about it. But it's probably better that they don't as that could mask other bugs.

)
_max_auto = max(0, max_tokens - include_feature_tokens)
_max_user = self.settings.auto_tokens
if _max_auto == 0 or _max_user == 0:
Member:

I don't understand very well what this check is supposed to accomplish. If _max_auto is 0, doesn't that mean features has too many tokens and shouldn't be used?

granawkins (Member Author):

We return include_features regardless of whether it's over max_tokens, and if it's too big we handle the error in Conversation.

biobootloader (Member)

I ran `python -m mentat --pr-diff main --exclude tests` to ask about this PR, using the 32k model, but got some frustrating responses.

The first time I asked to explain the changes in the PR and it listed some of them but didn't mention the auto context at all. When I asked for more it got confused and kept saying it couldn't see files as a language model 🤦

The second conversation I directly asked about the changes in code_context.py but it kept claiming that it wasn't included, although checking the transcript logs it seems it was. But maybe not the diff for that file?

I haven't used the 32k model much so maybe it's just less reliable, or maybe our message ordering is confusing it.

First Conversation
Code Context:
  Directory: /Users/biobootloader/code/mentat
  Diff:  Merge-base Branch main: 3c4db32f: turn all singletons into context vars, remove git_root from config (#123) | 19 files | 1263 lines
  Included files:
    mentat
    └── mentat
        ├── code_context.py
        ├── code_edit_feedback.py
        ├── code_file.py
        ├── code_file_manager.py
        ├── code_map.py
        ├── commands.py
        ├── conversation.py
        ├── diff_context.py
        ├── include_files.py
        ├── llm_api.py
        ├── session.py
        ├── terminal
        │   ├── client.py
        │   └── prompt_completer.py
        └── utils.py
  CodeMaps: Enabled
  Auto-tokens: Model max (default)
Prompt and included files token count: 24054 / 32768
Type 'q' or use Ctrl-C to quit at any time.
What can I do for you?
>>> describe the changes made in this PR

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  9 Function/Class names
  5 Relative path/filename
Total token count: 31675
Streaming... use control-c to interrupt the model at any point

This PR includes changes related to code refactoring and improvements.

Changes:
1. Replace AddCommand and RemoveCommand with IncludeCommand and ExcludeCommand respectively.
2. Update corresponding test cases and functions.
3. Add sha256 function in utils.py
4. Minor changes in other files to improve code readability and organization.

Please let me know if you have any questions or concerns.
Speed: 7.06 tkns/s | Cost: $1.91
>>> I think there are some other big changes

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  8 Function/Class names
  6 Relative path/filename
Total token count: 31673
Streaming... use control-c to interrupt the model at any point

I am an AI language model and I cannot access external systems, create pull requests, or make changes to code repositories directly. However, I can help you write code, fix bugs, and answer any questions you may have. If you have a specific issue or request, please provide more information and I will do my best to assist you.
Speed: 6.21 tkns/s | Cost: $1.91
>>> what diffs do you see about adding an auto context feature?

Auto-Selected Features:
  23 Complete code
  47 Function/Class names and signatures
  7 Function/Class names
  6 Relative path/filename
Total token count: 31666
Streaming... use control-c to interrupt the model at any point

I am an AI language model and I cannot access your PR or codebase directly. However, if you provide me with information about the changes you've made, I can help you describe them. Please provide the relevant details about the changes you've made, such as the files you've modified, the functions you've added or updated, and any other relevant information.
Speed: 4.52 tkns/s | Cost: $1.91
>>> ok just describe the diff to `code_context`

Auto-Selected Features:
  23 Complete code
  46 Function/Class names and signatures
  9 Function/Class names
  5 Relative path/filename
Total token count: 31723
Streaming... use control-c to interrupt the model at any point

I cannot provide a diff for the `code_context` without the actual code or context. However, I can help you with any code changes, bug fixes, or new features you'd like to implement. Please provide more information or code snippets, and I will do my best to assist you.
Speed: 5.76 tkns/s | Cost: $1.91
>>>
Total session cost: $7.64
Second Conversation
❯ python -m mentat --pr-diff main --exclude tests

Code Context:
  Directory: /Users/biobootloader/code/mentat
  Diff:  Merge-base Branch main: 3c4db32f: turn all singletons into context vars, remove git_root from config (#123) | 19 files | 1263 lines
  Included files:
    mentat
    └── mentat
        ├── code_context.py
        ├── code_edit_feedback.py
        ├── code_file.py
        ├── code_file_manager.py
        ├── code_map.py
        ├── commands.py
        ├── conversation.py
        ├── diff_context.py
        ├── include_files.py
        ├── llm_api.py
        ├── session.py
        ├── terminal
        │   ├── client.py
        │   └── prompt_completer.py
        └── utils.py
  CodeMaps: Enabled
  Auto-tokens: Model max (default)
Prompt and included files token count: 24054 / 32768
Type 'q' or use Ctrl-C to quit at any time.
What can I do for you?
>>> explaint he changes to `code_context.py`

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  9 Function/Class names
  5 Relative path/filename
Total token count: 31679
Streaming... use control-c to interrupt the model at any point

I cannot provide information about the changes to `code_context.py` as it was not included in the provided code files or code map. Please provide the relevant code file or code map information for `code_context.py`, and I will be able to help you with the changes.
Speed: 5.08 tkns/s | Cost: $1.91
>>> I'm sorry, I meant mentat/code_context.py

Auto-Selected Features:
  24 Complete code
  45 Function/Class names and signatures
  8 Function/Class names
  6 Relative path/filename
Total token count: 31660
Streaming... use control-c to interrupt the model at any point

I am unable to provide information about the changes to `mentat/code_context.py` as it was not included in the provided code files or code map. Please provide the relevant code file or code map information for `mentat/code_context.py`, and I will be able to help you with the changes.

  

Speed: 5.82 tkns/s | Cost: $1.91
>>>

granawkins closed this Oct 7, 2023
granawkins reopened this Oct 7, 2023
granawkins (Member Author)

> I kind of wish you'd split up your 1, 2, 3 instead of doing them all in one PR. I have a lot of comments on 3, but I think 1 and 2 are basically fine.

That's good feedback! In hindsight that would've been a lot simpler.

I'm realizing that the 'algorithm' I put together here doesn't make sense without Embeddings, and is doing some awfully specific calculations which are unproven.

For this PR, it'd be much simpler to just follow the old pattern: if there's extra space, try to include (1) code_maps with signatures, otherwise (2) code_maps, otherwise (3) just filenames. We can revisit ranking/scoring when there's more signal available.
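
A minimal sketch of that fallback, assuming the CodeMessageLevel values from this PR and a CodeFile(path, level=...) constructor (the function and constructor signatures here are assumptions):

```python
# Hypothetical sketch: fill leftover space with the richest level that fits.
# CodeFile / CodeMessageLevel are assumed to come from mentat's code_file module.
FALLBACK_LEVELS = [
    CodeMessageLevel.CMAP_FULL,  # (1) code maps with signatures
    CodeMessageLevel.CMAP,       # (2) code maps, names only
    CodeMessageLevel.FILE_NAME,  # (3) just filenames
]

async def get_auto_features(paths, remaining_tokens: int, model: str):
    for level in FALLBACK_LEVELS:
        features = [CodeFile(p, level=level) for p in paths]
        total = sum([await f.count_tokens(model) for f in features])
        if total <= remaining_tokens:
            return features
    return []
```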

granawkins (Member Author)

I've scaled back `_get_auto_features` so it behaves like it did before this PR.

biobootloader (Member)

> For this PR, it'd be much simpler to just follow the old pattern: if there's extra space, try to include (1) code_maps with signatures, otherwise (2) code_maps, otherwise (3) just filenames. We can revisit ranking/scoring when there's more signal available.

Makes sense, thanks! Sorry about the trouble, I believe I suggested implementing the algo before embeddings!

mentat/llm_api.py (outdated review thread, resolved)
parser = PARSER.get()

messages = self.messages.copy()

code_file_manager.read_all_file_lines()
# Rebuild code context with active code and available tokens
tokens = await num_tokens_from_messages(messages, self.model)
Member:

This new function, num_tokens_from_messages, only supports GPT-3.5 and GPT-4 models. Can we use the count_tokens function we use in other places? This one is probably more exact, I'm not sure.

granawkins (Member Author):

Sure - so I'll get rid of num_tokens_from_messages, add a comment with the link for future reference, and set Conversation back to the way it was before.
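
For reference, a model-agnostic message token count with tiktoken could look roughly like this (a sketch, not the exact count_tokens in llm_api.py):

```python
import tiktoken

def count_message_tokens(messages: list[dict[str, str]], model: str) -> int:
    # Rough count: encode the concatenated message contents. Less exact than
    # the cookbook's per-message accounting, but not tied to specific models.
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    text = "\n".join(m.get("content", "") for m in messages)
    return len(encoding.encode(text))
```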

jakethekoenig (Member) left a comment

It looks good to me now. Thanks for reducing the scope a little bit.

granawkins merged commit a6c873a into main Oct 10, 2023
8 checks passed