feat: use tree sitter to enhance context for code operations #51

Open
Robitx opened this issue Nov 9, 2023 · 3 comments
Assignees: Robitx
Labels: enhancement (New feature or request)

Robitx (Owner) commented Nov 9, 2023

https://neovim.io/doc/user/treesitter.html#lua-treesitter
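
A minimal sketch of the idea, using the built-in vim.treesitter Lua API (Neovim 0.9+): walk up from the node under the cursor to the nearest enclosing function/method/class and return its text, so it can be prepended to the prompt as extra context. The function name, node-type patterns, and fallback behaviour are illustrative assumptions, not existing gp.nvim code.

```lua
-- Sketch only: grab the enclosing function/method/class around the cursor
-- with the built-in treesitter API, to use as additional prompt context.
local function enclosing_scope_text(bufnr)
  bufnr = bufnr or vim.api.nvim_get_current_buf()
  local node = vim.treesitter.get_node({ bufnr = bufnr })
  while node do
    local t = node:type()
    -- node type names differ per grammar; these patterns cover common languages
    if t:match("function") or t:match("method") or t:match("class") then
      return vim.treesitter.get_node_text(node, bufnr)
    end
    node = node:parent()
  end
  return nil -- no enclosing scope found; caller falls back to selection/buffer
end
```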

Robitx changed the title from "use tree sitter to enhance context for code operations" to "feat: use tree sitter to enhance context for code operations" on Nov 9, 2023
Robitx self-assigned this on Nov 9, 2023
sirupsen commented

@Robitx even better might be to embed the codebase with a local model (or even OpenAI) and pull the relevant embeddings into the context, so it can be done cross-file.
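
Purely for illustration, a rough sketch of the retrieval step, assuming chunk embeddings have already been computed elsewhere (by a local model or OpenAI's embeddings endpoint); the chunk table shape and function names are made up:

```lua
-- Sketch only: rank pre-embedded code chunks by cosine similarity to the
-- query embedding and keep the top-k as cross-file context for the prompt.
local function cosine(a, b)
  local dot, na, nb = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    na = na + a[i] * a[i]
    nb = nb + b[i] * b[i]
  end
  return dot / (math.sqrt(na) * math.sqrt(nb))
end

-- chunks: { { file = "...", text = "...", embedding = {...} }, ... } (hypothetical shape)
local function top_k_chunks(query_embedding, chunks, k)
  table.sort(chunks, function(x, y)
    return cosine(query_embedding, x.embedding) > cosine(query_embedding, y.embedding)
  end)
  return vim.list_slice(chunks, 1, math.min(k, #chunks))
end
```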

teocns commented Dec 4, 2023

I recommend grounding the brainstorming in very large projects, such as Chromium (~38 million LoC).

Disclaimer: I am not a seasoned Vim user, so my knowledge might be limited.

Per my understanding, Treesitter doesn't understand semantics or context beyond the structure of the code.
In my typical development routine, most of the time the practical challenge is studying the workflow and lifecycle of components (symbols) across a [large] codebase.

Other tools (such as EasyCodeAI) have figured this out and implemented their own self-hosted codebase indexers, which come with their own limitations.

I think focusing on the LSP's symbol-referencing mechanism could be key; however, with large codebases there are additional challenges, such as overloading the context with dirty files that are not of interest. What sounds plausible to me at first glance is a multi-stage prompt operation (a rough sketch of the first stage follows below): in the first stage GPT is fed context for the referenced symbols and discards the ones that are not of interest (based on a score), and only then is it fed the cleaner context. The latest GitHub Copilot Chat VSCode extension works that way.
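
A rough sketch of that first stage, using only standard Neovim LSP APIs; the function name, snippet window, and the scoring/second stage are hypothetical:

```lua
-- Sketch only: stage 1 of the hypothetical multi-stage flow -- collect a few
-- lines of context around every reference of the symbol under the cursor.
-- A later stage would ask the model to score these snippets and keep only
-- the relevant ones for the final prompt.
local function gather_reference_snippets(on_done)
  local params = vim.lsp.util.make_position_params()
  params.context = { includeDeclaration = true }
  vim.lsp.buf_request(0, "textDocument/references", params, function(err, result)
    if err or not result then
      return on_done({})
    end
    local snippets = {}
    for _, ref in ipairs(result) do
      local path = vim.uri_to_fname(ref.uri)
      local lines = vim.fn.readfile(path) -- naive: re-reads the file per reference
      local row = ref.range.start.line + 1 -- LSP lines are 0-based
      table.insert(snippets, table.concat(
        vim.list_slice(lines, math.max(1, row - 2), math.min(#lines, row + 2)), "\n"))
    end
    on_done(snippets)
  end)
end
```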

To set expectations: we're limited by the context window current GPT models support; the largest model tops out at 128k tokens.

teocns commented Dec 5, 2023

To add more thoughts on the topic, I'd also like to highlight the indexing challenge.
So far I have been working with two kinds of indexing mechanisms: static and dynamic.

Looking at clangd as an example:

Static indexing

clangd can pre-index an entire codebase based on compile_commands.json; the index is stored on disk and gives instantaneous symbol resolution across the whole codebase.

The pro is instantaneous project-wide symbol resolution; the con is having to pre-index the entire codebase (around ~3 hours for a project like Chromium on the most powerful Mac) before it can be consumed.

Dynamic indexing

Symbols are indexed dynamically based on the current buffer (by analyzing imports/includes). This is fast and handy, especially for module/sub-module scoped work, but it does not reach outer contexts: a significant drawback given how important out-of-scope context is when GPT serves as a "codebase-wide assistant".
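
For reference, a minimal sketch of enabling clangd's static (background) index described above from Neovim; this assumes nvim-lspconfig is installed and uses standard clangd flags, with the compile-commands directory only as an example:

```lua
-- Sketch only: enable clangd's on-disk background index so project-wide
-- symbol resolution is available once indexing finishes.
require("lspconfig").clangd.setup({
  cmd = {
    "clangd",
    "--background-index",           -- build and persist the static index
    "--compile-commands-dir=build", -- where compile_commands.json lives (example)
  },
})
```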
