Generate-Context Algorithm #109
Comments
Thanks for the clear write-up!
A couple questions / comments:
Am I correct in understanding that
The diffs certainly complicate things. Without them, it's clear to me that the "levels of detail" for files / chunks would go from full code to cmaps with less and less detail. But if a file or chunk has a diff, the diff alone might show less detail than a code map, which would show function signatures that the diff might not have touched. Maybe we should always include all diffs and only vary whether the surrounding code is shown as full code or at one of the cmap levels?
So until tree-sitter is added, all of this would operate at the level of entire files, not chunks? Or are you going to use the CodeFile Intervals somehow?
I agree, maybe this is the move initially. I do suspect that in many cases including the full diff will overshoot the context, especially when a PR includes new files, because the whole file is effectively a diff. What if the diff were another argument, e.g. (mentat/app.py, 'code', <diff_target>)? It would be a tree-ish if the file is part of the active diff, otherwise None. Then you'd expect (mentat/app.py, 'cmap', 'HEAD') to have a higher score than (mentat/app.py, 'cmap', None), and you'd get the former unless it was too big.
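To make the three-element tuple concrete, here is a rough sketch. The `Feature` class, the per-level base scores, and the diff bonus are all made up for illustration; none of them are Mentat's actual API or values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical base values per level of detail; not Mentat's real weights.
LEVEL_SCORE = {"code": 1.0, "cmap_signatures": 0.5, "cmap": 0.3}
# Assumed extra value for a feature that carries diff annotations.
DIFF_BONUS = 0.25


@dataclass(frozen=True)
class Feature:
    path: str
    level: str                  # e.g. 'code' or 'cmap'
    diff_target: Optional[str]  # a tree-ish such as 'HEAD', or None

    def score(self) -> float:
        # A feature that is part of the active diff scores higher than
        # the same feature without a diff target.
        bonus = DIFF_BONUS if self.diff_target is not None else 0.0
        return LEVEL_SCORE[self.level] + bonus


with_diff = Feature("mentat/app.py", "cmap", "HEAD")
without_diff = Feature("mentat/app.py", "cmap", None)
```

With these assumed numbers the diff-annotated cmap outranks the plain one, so it would be picked first unless it didn't fit.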
I'd like to preserve the functionality we have all the way through, but I'm not sure how much cajoling that will take. I'll aim for that and keep you posted.
True. There are two cases in which we'll have diffs: 1) the user runs Mentat with --diff or --pr-diff, or 2) the diff is just the uncommitted changes, many of which Mentat may have made earlier in the conversation or in a previous conversation. Hopefully the diffs will almost always all fit in the second case. And the first case is special anyway, so it's probably OK to tell the user that the diff they chose is really big and isn't going to work well.
To do this, though, we'd have to decide how to score/value diffs (i.e. it'd be another parameter). But that could work!
Per the discussion above:
In
Hope this helps grok the code a bit.
This is largely complete now, and we want to move toward an official launch. Changes to code before launch:
Tasks for launch:
Thanks for writing this up. Some other things I think we need to do:
The way we currently generate context is:
diff or pr-diff they select
no_code_map is false:
a) Try to include filename/functions/signatures
b) If it's too big, try to include filename/functions
c) If it's too big, try to include just filenames
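The a) to c) fallback above could be sketched roughly as follows. This is only an illustration: `pick_code_map`, the `render` and `count_tokens` callables, and the toy data are hypothetical stand-ins, not the functions Mentat actually uses.

```python
from typing import Callable, Optional

# Levels from most to least detailed, matching steps a) to c).
LEVELS = ["signatures", "functions", "filenames"]


def pick_code_map(
    render: Callable[[str], str],
    count_tokens: Callable[[str], int],
    budget: int,
) -> Optional[str]:
    """Return the most detailed code map that fits in `budget` tokens."""
    for level in LEVELS:
        text = render(level)
        if count_tokens(text) <= budget:
            return text
    return None  # even the filename-only map is too big


# Toy data standing in for real code-map output.
FAKE_MAPS = {
    "signatures": "app.py\n  run(self, paths: list[str]) -> None\n  loop(self) -> None",
    "functions": "app.py\n  run\n  loop",
    "filenames": "app.py",
}


def word_count(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())


chosen_map = pick_code_map(FAKE_MAPS.__getitem__, word_count, budget=5)
```

Here the signatures map is too large for the assumed budget, so the sketch falls back to the filename/functions map.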
At 4), we want to use the remaining context in the most valuable way - not just fill it in with code_map. To do this we will:
(mentat/app.py, 'code'), (mentat/app.py, 'diff'), (mentat/app.py, 'cmap_signatures') etc. for different features of a file, as well as smaller chunks, e.g. (mentat/app.py:run 'code'), (mentat/app.py:loop 'code'). Chunks within files should cover the entire file without overlap.
cmap_signature_weight. Just want to prioritize higher-density information.
<file>:<func> is already included and you add <file>, keep the higher-level item ().
Happy for questions or suggestions on the approach! My plan moving forward is:
get_code_message to CodeFile, update diff and codemaps to work on individual files. Eventually CodeFile will become CodeFeature and can be anything in a).
diff and codemaps
Tree-sitter to parse files into smaller chunks.
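For anyone trying to picture the feature-selection idea described earlier (prioritize higher-density information, and keep the higher-level item when a file supersedes one of its chunks), here is a rough sketch. The greedy score-per-token ordering, the `Feature` fields, and all the numbers are assumptions for illustration; the issue doesn't pin down the exact scoring or selection method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    ref: str      # 'mentat/app.py' or a chunk like 'mentat/app.py:run'
    level: str    # 'code', 'diff', 'cmap_signatures', ...
    score: float  # assumed relevance value
    tokens: int   # assumed token cost, must be > 0

    @property
    def path(self) -> str:
        return self.ref.split(":", 1)[0]

    @property
    def is_whole_file(self) -> bool:
        return ":" not in self.ref


def select_features(candidates: List[Feature], budget: int) -> List[Feature]:
    chosen: List[Feature] = []
    used = 0
    # Greedy pass: highest score per token (density) first.
    for feat in sorted(candidates, key=lambda f: f.score / f.tokens, reverse=True):
        same = [c for c in chosen if c.path == feat.path and c.level == feat.level]
        # Skip a chunk whose whole file is already included.
        if not feat.is_whole_file and any(c.is_whole_file for c in same):
            continue
        # A whole file supersedes its already-chosen chunks.
        covered = [c for c in same if feat.is_whole_file and not c.is_whole_file]
        new_used = used - sum(c.tokens for c in covered) + feat.tokens
        if new_used > budget:
            continue
        chosen = [c for c in chosen if c not in covered] + [feat]
        used = new_used
    return chosen


candidates = [
    Feature("mentat/app.py:run", "code", 8.0, 40),
    Feature("mentat/app.py", "code", 10.0, 100),
    Feature("mentat/config.py", "cmap_signatures", 1.0, 20),
    Feature("mentat/big.py", "code", 3.0, 300),
]
selected = select_features(candidates, budget=130)
```

In this toy run the `run` chunk is picked first, then replaced when the whole of `mentat/app.py` fits; the oversized file is skipped. A plain greedy pass like this isn't guaranteed to be optimal, it just keeps the sketch short.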