Explore whether stack graphs may be useful in this tool #69

Open
0xdevalias opened this issue Apr 5, 2024 · 4 comments

@0xdevalias

Not really a feature request per se, but came across this project after it was mentioned in another issue (Ref), and figured I would create an issue to share the same info here in case it's useful.

I notice that you're already using tree-sitter, so it may not be a lot of extra effort to make use of tree-sitter-graph and/or the stack-graphs project (eg. tree-sitter-stack-graphs-javascript, etc)

From watching the demo video on linkedin + briefly skimming this repo, it looks like there might be some useful crossovers, and it may allow you to add support for a whole bunch more languages without needing to re-invent the wheel to do so.

A few notes/links/references I recently collated RE: stack graphs + related libs:

Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful with regards to code navigation, symbol mapping, etc. Perhaps we could use them for module identification, or variable/function identifier naming stabilisation or similar?

  • https://github.blog/changelog/2024-03-14-precise-code-navigation-for-typescript-projects/
    • Precise code navigation is now available for all TypeScript repositories.
      Precise code navigation gives more accurate results by only considering the set of classes, functions, and imported definitions that are visible at a given point in your code.

      Precise code navigation is powered by the stack graphs framework.
      You can read about how we use stack graphs for code navigation and visit the stack graphs definition for TypeScript to learn more.

      • https://github.blog/2021-12-09-introducing-stack-graphs/
        • Introducing stack graphs

        • Precise code navigation is powered by stack graphs, a new open source framework we’ve created that lets you define the name binding rules for a programming language using a declarative, domain-specific language (DSL). With stack graphs, we can generate code navigation data for a repository without requiring any configuration from the repository owner, and without tapping into a build process or other CI job.

        • LOTS of interesting stuff in this post..
        • As part of developing stack graphs, we’ve added a new graph construction language to Tree-sitter, which lets you construct arbitrary graph structures (including but not limited to stack graphs) from parsed CSTs. You use stanzas to define the gadget of graph nodes and edges that should be created for each occurrence of a Tree-sitter query, and how the newly created nodes and edges should connect to graph content that you’ve already created elsewhere.

        • Why aren’t we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?

          To dig even deeper and learn more, I encourage you to check out my Strange Loop talk and the stack-graphs crate: our open source Rust implementation of these ideas.

  • https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github
    • GitHub has developed two code navigation approaches based on the open source tree-sitter and stack-graphs library:

      • Search-based - searches all definitions and references across a repository to find entities with a given name
      • Precise - resolves definitions and references based on the set of classes, functions, and imported definitions at a given point in your code

      To learn more about these approaches, see "Precise and search-based navigation."

      • https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation
        • Precise and search-based navigation
          Certain languages supported by GitHub have access to precise code navigation, which uses an algorithm (based on the open source stack-graphs library) that resolves definitions and references based on the set of classes, functions, and imported definitions that are visible at any given point in your code. Other languages use search-based code navigation, which searches all definitions and references across a repository to find entities with a given name. Both strategies are effective at finding results and both make sure to avoid inappropriate results such as comments, but precise code navigation can give more accurate results, especially when a repository contains multiple methods or functions with the same name.

  • https://pl.ewi.tudelft.nl/research/projects/scope-graphs/
    • Scope Graphs | A Theory of Name Resolution

    • Scope graphs provide a new approach to defining the name binding rules of programming languages. A scope graph represents the name binding facts of a program using the basic concepts of declarations and reference associated with scopes that are connected by edges. Name resolution is defined by searching for paths from references to declarations in a scope graph. Scope graph diagrams provide an illuminating visual notation for explaining the bindings in programs.
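To make the difference between search-based and precise navigation concrete, here is a deliberately simplified Python sketch. It assumes plain lexical scoping with a single parent edge per scope; real scope graphs and stack graphs also handle imports, member access, and incremental per-file analysis, none of which is modelled here. The `Scope`, `resolve`, and `search_by_name` names are made up for illustration and do not come from any of the linked libraries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scope:
    """A scope holding declarations, with an edge to its enclosing scope."""
    name: str
    parent: Optional["Scope"] = None
    declarations: Dict[str, str] = field(default_factory=dict)

    def declare(self, identifier: str, location: str) -> None:
        self.declarations[identifier] = location


def resolve(scope: Optional[Scope], identifier: str) -> Optional[str]:
    """'Precise' lookup: follow parent edges from the reference's scope
    until a declaration with a matching name is reachable."""
    current = scope
    while current is not None:
        if identifier in current.declarations:
            return current.declarations[identifier]
        current = current.parent
    return None


def search_by_name(scopes: List[Scope], identifier: str) -> List[str]:
    """'Search-based' lookup: every declaration with that name, anywhere."""
    return [s.declarations[identifier] for s in scopes if identifier in s.declarations]


# Two functions named `helper`: one at module level, one nested in `outer`.
module = Scope("module")
module.declare("helper", "module.py:1")
outer = Scope("outer", parent=module)
outer.declare("helper", "module.py:10")
other = Scope("other", parent=module)

print(search_by_name([module, outer, other], "helper"))  # both declarations
print(resolve(outer, "helper"))  # the nested declaration shadows the module one
print(resolve(other, "helper"))  # falls through to the module-level declaration
```

The name search returns both candidates, while the path-based lookup returns exactly one declaration per reference, which is the property the "precise" description above is getting at.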

Originally posted by @0xdevalias in 0xdevalias/chatgpt-source-watch#11

@berrazuriz1
Contributor

Hi, thanks for the tips and the reading list. I had already read all of them except LSP and LSIF, which are still pending.

We are definitely looking into the option of using tree-sitter-graph. I'm currently weighing two options:

  1. Integrating 'tree-sitter-graph'. This has several pros like performance enhancement, not reinventing the wheel, and meeting industry standards. The problem is that I'm not sure about the complexity it will introduce to the code, especially handling a CLI integration with the tool.

  2. Creating my own implementation of a stack-graph in Python. Keeping everything in-house will simplify the architecture, make maintenance easier, and is a fun challenge 😃 . The downside is obvious: it will be complex and time-consuming to replicate stack-graph logic, and Python doesn't perform as well as Rust.

I think I'm going to explore the integration option, try to make it run, and if it's not too difficult, that's the route I'll go.
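A minimal sketch of what the "shell out to a CLI" integration could look like from Python. The actual command line for tree-sitter-graph or stack-graphs is not shown in this thread, so the command is passed in by the caller rather than hard-coded; `run_cli` is a hypothetical helper name, and the demo uses the Python interpreter itself as a stand-in executable.

```python
import subprocess
import sys
from typing import List


def run_cli(command: List[str], timeout: float = 60.0) -> str:
    """Run an external tool and return its stdout as text.

    `command` is the full argument list, e.g. the path to a Rust binary
    followed by its arguments (which arguments depends on the tool).
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,      # avoid hanging forever on a stuck process
        check=False,          # inspect the return code ourselves
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command: a real integration would invoke the graph tool here.
    print(run_cli([sys.executable, "-c", "print('graph output placeholder')"]))
```

Keeping the process boundary in one small function like this means the rest of the Python code only sees a string of output, so swapping the CLI for an embedded Rust binding later would touch a single call site.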

@berrazuriz1
Contributor

I was checking out LSP and it seems to be a really powerful tool. But it does a lot more than what I really need at this stage, because we are focused on giving LLMs a tool to navigate and understand codebase repositories in an easy way. For that, I think the best approach is to implement 'tree-sitter-graph' now and later see where LSP could help.

@0xdevalias
Author

We are definitely looking into the option of using tree-sitter-graph

I think I'm going to explore the integration option, try to make it run, and if it's not too difficult, that's the route I'll go.

@berrazuriz1 I would make sure to look at both tree-sitter-graph (a more generic underlying lib) and the more specific stack-graphs project (which is built on top of tree-sitter-graph):

eg.


The problem is that I'm not sure about the complexity it will introduce to the code, especially handling a CLI integration with the tool.

@berrazuriz1 Yeah, that's fair enough. There are theoretically ways to call/embed Rust within Python as well, I believe, but that may end up being even more complex than shelling out to the CLI.

e.g. Some random related links:


I was checking out LSP and it seems to be a really powerful tool. But it does a lot more than what I really need at this stage

@berrazuriz1 Yeah, I think LSP is likely not relevant at this stage. I really only included the links there as part of the "why we aren't using" context.

@berrazuriz1
Contributor

I think calling Rust from Python is a good solution. I'm going to give it a try.
