Generate-Context Algorithm #109

Closed
5 tasks done
granawkins opened this issue Sep 27, 2023 · 7 comments

Comments

@granawkins
Member

granawkins commented Sep 27, 2023

The way we currently generate context is:

  1. Add files that the user has selected
  2. Add diff annotations to those files for the diff or pr-diff they select
  3. Calculate how many tokens we've used. If it's over the model's max, throw an error.
  4. Else, if no_code_map is false (see the sketch after this list):
    a) Try to include filename/functions/signatures
    b) If it's too big, try to include filename/functions
    c) If it's too big, try to include just filenames
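
As a rough illustration of the current step-4 fallback, here is a minimal sketch; the helpers count_tokens and get_code_map below are hypothetical stand-ins, not the actual Mentat functions:

```python
# Hypothetical sketch of the current step-4 fallback; not the real Mentat code.

def count_tokens(text: str) -> int:
    return len(text) // 4  # crude stand-in for a real tokenizer

def get_code_map(files: list[str], level: str) -> str:
    # Placeholder: the real version renders filenames, functions, and signatures.
    return "\n".join(f"{f} [{level}]" for f in files)

def fill_with_code_map(files: list[str], remaining_tokens: int) -> str:
    # 4a-4c: try decreasing levels of detail until one fits the leftover budget.
    for level in ("signatures", "functions", "filenames"):
        code_map = get_code_map(files, level=level)
        if count_tokens(code_map) <= remaining_tokens:
            return code_map
    return ""  # not even bare filenames fit
```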

At 4), we want to use the remaining context in the most valuable way - not just fill it in with code_map. To do this we will (a rough sketch follows the list):

  • a) Make a list of all the features we could potentially include. Would include (mentat/app.py, 'code'), (mentat/app.py, 'diff'), (mentat/app.py, 'cmap_signatures'), etc. for different features of a file, as well as smaller chunks, e.g. (mentat/app.py:run, 'code'), (mentat/app.py:loop, 'code'). Chunks within files should cover the entire file without overlap.
  • b) Assign a relevance score to each feature based on (i) its embedding, relative to the current prompt (ii) if/how it relates to user-specified paths/diff, (iii) which functions it calls and is called by, etc.
  • c) Divide the score by some length factor - maybe the literal number of tokens, maybe a parameter like cmap_signature_weight. Just want to prioritize higher-density information.
  • d) Sort all the features by score, and add them one-by-one until context is full. If there are overlap conflicts, e.g. <file>:<func> is already included and you add <file>, keep the higher-level item.
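
A minimal, hypothetical sketch of steps a)-d); the Feature class and field names here are illustrative, not the actual Mentat API:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Feature:
    path: Path        # e.g. Path("mentat/app.py"), or a chunk like mentat/app.py:run
    kind: str         # 'code', 'diff', 'cmap_signatures', ...
    tokens: int       # length of this feature's code message
    relevance: float  # b) from embeddings, diff overlap, call graph, ...

def select_features(candidates: list[Feature], max_tokens: int) -> list[Feature]:
    # c) divide relevance by length so short, information-dense features rank higher
    ranked = sorted(candidates, key=lambda f: f.relevance / max(f.tokens, 1), reverse=True)
    selected: list[Feature] = []
    used = 0
    for feature in ranked:  # d) add one-by-one until the budget is spent
        if used + feature.tokens > max_tokens:
            continue
        # overlap handling (file vs. file:function conflicts) would go here
        selected.append(feature)
        used += feature.tokens
    return selected
```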

Happy to hear questions or suggestions on the approach! My plan moving forward is:

  • Move get_code_message to CodeFile, and update diff and codemaps to work on individual files. Eventually CodeFile will become CodeFeature and can be anything in a).
  • Set up a refresh workflow and caching of the code message
  • Build a basic version of the algo using just diff and codemaps
  • Add embeddings (with some type of persistent storage) and use to prioritize items in b)
  • Add Tree-sitter to parse files into smaller chunks.
@biobootloader
Member

Thanks for the clear write-up!

@biobootloader
Member

A couple of questions / comments:

Would include (mentat/app.py, 'code'), (mentat/app.py, 'diff'), (mentat/app.py, 'cmap_signatures') etc. for different features of a file

Am I correct in understanding that (mentat/app.py, 'code') would contain the full code and diff for a file / chunk, while (mentat/app.py, 'diff') would just contain the diff?

The diffs certainly complicate things. Without them it's clear to me that the "levels of detail" for files / chunks would go from "full code" to cmaps with less and less detail. But if a file or chunk has a diff, the diff alone might show less detail than a code map (which would show function signatures that might not have been touched by the diff). Maybe we should always include all diffs and just vary showing the surrounding code as full code or cmap levels?

Add Tree-sitter to parse files into smaller chunks.

So until adding tree-sitter all of this would be operating at the level of entire files, not chunks? Or are you going to use the CodeFile Intervals somehow?

@granawkins
Member Author

Am I correct in understanding that (mentat/app.py, 'code') would contain the full code and diff for a file / chunk, while (mentat/app.py, 'diff') would just contain the diff?

The diffs certainly complicate things. Without them it's clear to me that the "levels of detail" for files / chunks would go from "full code" to cmaps with less and less detail. But if a file or chunk has a diff, the diff alone might show less detail than a code map (which would show function signatures that might not have been touched by the diff). Maybe we should always include all diffs and just vary showing the surrounding code as full code or cmap levels?

I agree, maybe this is the move initially.

I do suspect in many cases including the full diff will overshoot the context. Especially when a PR includes new files, because the whole file is effectively a diff. What if diff is another argument, (mentat/app.py, 'code', <diff_target>)? It would be a treeish if the file is part of the active diff, otherwise None. Then you'd expect (mentat/app.py, 'cmap', 'HEAD') to have a higher score than (mentat/app.py, 'cmap', None), and you'd get that unless it was too big.
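
For illustration, the candidate permutations and the intended preference for one file might look like this; the relative weights and the 'HEAD' target are made up:

```python
# Hypothetical permutations for a single file; values are made up for illustration.
candidates = [
    ("mentat/app.py", "code", "HEAD"),  # full code annotated with the diff vs. HEAD
    ("mentat/app.py", "code", None),    # full code, no diff annotations
    ("mentat/app.py", "cmap", "HEAD"),  # cmap plus the changed lines
    ("mentat/app.py", "cmap", None),    # cmap only
]
# A permutation with a diff target would score higher, so ('cmap', 'HEAD') is
# preferred over ('cmap', None) unless it no longer fits the remaining context.
```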

So until adding tree-sitter all of this would be operating at the level of entire files, not chunks? Or are you going to use the CodeFile Intervals somehow?

I'd like to preserve the functionality we have all the way through but not sure how much cajoling that will take. I'll aim for that and keep you posted.

@biobootloader
Member

I do suspect in many cases including the full diff will overshoot the context. Especially when a PR includes new files, because the whole file is effectively a diff.

True. There are two use cases where we'll have diffs: 1) the user runs Mentat with --diff or --pr-diff, or 2) the diff is just the uncommitted changes, many of which Mentat may have just made earlier in the conversation or in a previous conversation. Hopefully the diffs will almost always fit in the second case. And the first case is special anyway, so it's probably ok to tell the user that the diff they chose is really big and not going to work well.

Then you'd expect (mentat/app.py, 'cmap', 'HEAD') to have a higher score than (mentat/app.py, 'cmap', None), and you'd get that unless it was too big.

To do this, though, we'd have to decide how to score/value diffs (i.e. it'd be another parameter). But that could work!

@granawkins
Member Author

Per the discussion above:

CodeFiles now have 3 main properties:

  • path (a Path)
  • level ('code', 'interval', 'cmap_full', 'cmap', 'file_name' - in that order)
  • diff (either a git Treeish or None)

In CodeContext, we generate all valid permutations of the above for every file in the workspace. These are then scored and sorted, and added one-by-one until context is full. It's set up so that:

  • Only one permutation is included per path - the longest permutation that the algorithm sees before it runs out of space.
  • Diffs are included as annotations to code or by just appending the changed lines to a cmap. A permutation with a diff is weighted much higher and should usually be preferred.
  • User-specified paths can have an interval like they used to, and the algorithm could also choose to include the entire file. We don't generate interval permutations yet.
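
Per the three points above, a rough sketch of that shape; the class, enum, and function names below are guesses for illustration, and the real ones in the codebase may differ:

```python
from __future__ import annotations
from enum import IntEnum
from pathlib import Path

class CodeMessageLevel(IntEnum):
    CODE = 1        # full file contents
    INTERVAL = 2    # a line range within the file (not generated yet)
    CMAP_FULL = 3
    CMAP = 4
    FILE_NAME = 5

class CodeFeature:
    def __init__(self, path: Path, level: CodeMessageLevel, diff: str | None = None):
        self.path = path    # a Path
        self.level = level  # detail level, in the order above
        self.diff = diff    # a git treeish, or None if no diff applies

def permutations(path: Path, diff_target: str | None) -> list[CodeFeature]:
    # One candidate per (level, diff) combination for the file; these are then
    # scored, sorted, and added one-by-one until context is full.
    diffs = [diff_target, None] if diff_target else [None]
    levels = [CodeMessageLevel.CODE, CodeMessageLevel.CMAP_FULL,
              CodeMessageLevel.CMAP, CodeMessageLevel.FILE_NAME]
    return [CodeFeature(path, level, diff) for level in levels for diff in diffs]
```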

Hope this helps grok the code a bit.

@granawkins
Member Author

granawkins commented Nov 9, 2023

This is largely complete now, and we want to move toward an official launch.

Changes to code before launch:

  • Add tqdm for context-generation
  • Make display_features clearer, so you can see exactly what auto-context added
  • Update documentation for auto-context workflow
  • Handle invalid JSON responses in LLMFeatureFilter gracefully
  • LLM call made during feature selection shouldn't print "Speed: ... Cost: ..." #265
  • Enable edits to files that are not included (ask permission)
  • Auto-truncate included context with a warning
  • Add CostTracker to LLMFeatureFilter

Tasks for launch:

  • Flow diagram of the whole process
  • Demo video(s)
  • Blog post
  • Tweet thread

@jakethekoenig
Member

Thanks for writing this up. Some other things I think we need to do:

  • When auto-context is enabled, Mentat should be allowed to edit files that are not included. User confirmation is asked for anyway, so I think this is fine.
  • When auto-context is enabled, rather than failing when the included files exceed the context, we should give a warning that our auto-context system is selecting a subset of the included files and that they may not all be in the LLM's context. In the limit, I think mentat and mentat . should have the same behavior.
  • LLMFeatureFilter needs to report its cost to the CostTracker.
