feat: use tree sitter to enhance context for code operations #51
Comments
@Robitx Even better might be to embed the codebase with a local model (or even OpenAI) and pull the embeddings into the context, so it can be done cross-file.
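The embedding idea above boils down to: chunk the codebase, embed each chunk, and at query time retrieve the nearest chunks for the prompt. A minimal sketch, using a toy bag-of-words similarity as a stand-in for a real embedding model (a real setup would call a local model or the OpenAI embeddings API instead):

```python
import math
import re


def embed(text):
    """Toy bag-of-words 'embedding': token counts. Stands in for a
    real embedding model purely to illustrate the retrieval step."""
    counts = {}
    for tok in re.findall(r"[a-z]+", text.lower()):
        counts[tok] = counts.get(tok, 0) + 1
    return counts


def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_chunks(query, chunks, k=2):
    """Rank code chunks by similarity to the query; the top-k winners
    get injected into the model's context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "def parse_config(path): ...",
    "def render_template(name, ctx): ...",
    "def load_config_defaults(): ...",
]
print(top_chunks("where is the config file parsed?", chunks))
```

With real embeddings the same retrieval loop works cross-file, which is exactly what Treesitter alone can't give you.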
I recommend keeping very large projects, such as Chromium (38 million LoC), in mind when brainstorming.
Per my understanding, Treesitter doesn't understand semantics or context beyond the structure of the code. Other tools (such as EasyCodeAI) have figured this out and implemented their own self-hosted codebase indexer, which comes with its own limitations. I think focusing on the LSP symbol-referencing mechanism could be key; however, large codebases bring additional challenges, such as context overloading with dirty files that might not be of interest. What sounds plausible to me at first glance is a multi-stage prompt operation: in the first stage, GPT is fed context built from referenced symbols and discards those that are not of interest (based on a score); only then is it fed the cleaner context. The latest GitHub Copilot Chat VSCode extension works that way. Objectively, we're limited by the context window current GPT models support, the largest of which tops out at 128k tokens.
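The multi-stage operation described above could be sketched roughly like this; `score_relevance` is a hypothetical stand-in for a cheap first-pass scoring call, and the symbol layout is invented for illustration:

```python
def score_relevance(symbol, question):
    """Stub scorer: fraction of question words found in the symbol's
    snippet. A real pipeline would ask a cheap model for this score."""
    words = question.lower().split()
    hits = sum(w in symbol["snippet"].lower() for w in words)
    return hits / len(words)


def build_context(symbols, question, threshold=0.2):
    # Stage 1: score every referenced symbol and drop the noise
    # (e.g. dirty files that are not of interest).
    kept = [s for s in symbols if score_relevance(s, question) >= threshold]
    # Stage 2: only the survivors are handed to the expensive model call.
    return "\n\n".join(f"# {s['name']}\n{s['snippet']}" for s in kept)


symbols = [
    {"name": "Cache.get", "snippet": "def get(self, key): return self.store[key]"},
    {"name": "Logger.warn", "snippet": "def warn(self, msg): print(msg)"},
]
ctx = build_context(symbols, "how does the cache return a key")
```

The point of the two stages is budget: the 128k-token ceiling applies to stage 2, so stage 1's job is to spend as little of it as possible on irrelevant symbols.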
To add more thoughts to the topic, I'd also like to highlight the indexing challenge.

Static indexing pre-indexes the entire codebase up front. Pros and cons, respectively: instantaneous project-wide symbol resolution, at the expense of having to pre-index the whole codebase (around ~3 hours on projects like Chromium on the most powerful Mac) before it can be consumed.

Dynamic indexing indexes symbols on the fly based on the current buffer (by analyzing imports/includes). It is fast and handy, especially for a module/sub-module work scope, but does not reach outer contexts: a significant drawback if we think objectively about the importance of out-of-scope context when GPT serves as a "codebase-wide assistant".
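The dynamic-indexing half of that trade-off could be sketched as follows, assuming a Python codebase for illustration: scan the current buffer for import statements and treat the referenced modules as the index scope. The buffer content here is made up for the example.

```python
import re


def referenced_modules(buffer_text):
    """Collect module names from 'import x' and 'from x import y'
    lines in the current buffer -- the seed set a dynamic indexer
    would resolve to files and index, and nothing more."""
    pattern = re.compile(r"^\s*(?:from\s+([\w.]+)|import\s+([\w.]+))", re.M)
    return {a or b for a, b in pattern.findall(buffer_text)}


buffer_text = """
import config
from utils.paths import normalize
"""
modules = referenced_modules(buffer_text)
```

This is exactly why dynamic indexing stays fast: the scope is bounded by what the buffer imports, which is also why anything outside that transitive reach stays invisible.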
https://neovim.io/doc/user/treesitter.html#lua-treesitter