Explore stack graphs / scope graphs #11

0xdevalias · 2024-04-04T00:34:07Z

Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful with regard to code navigation, symbol mapping, etc. Perhaps we could use them for module identification, variable/function identifier naming stabilisation, or similar?

https://github.blog/changelog/2024-03-14-precise-code-navigation-for-typescript-projects/
- Precise code navigation is now available for all TypeScript repositories.
  Precise code navigation gives more accurate results by only considering the set of classes, functions, and imported definitions that are visible at a given point in your code.
  
  Precise code navigation is powered by the stack graphs framework.
  You can read about how we use stack graphs for code navigation and visit the stack graphs definition for TypeScript to learn more.
  - https://github.blog/2021-12-09-introducing-stack-graphs/
    - Introducing stack graphs
    - Precise code navigation is powered by stack graphs, a new open source framework we’ve created that lets you define the name binding rules for a programming language using a declarative, domain-specific language (DSL). With stack graphs, we can generate code navigation data for a repository without requiring any configuration from the repository owner, and without tapping into a build process or other CI job.
    - LOTS of interesting stuff in this post.
    - As part of developing stack graphs, we’ve added a new graph construction language to Tree-sitter, which lets you construct arbitrary graph structures (including but not limited to stack graphs) from parsed CSTs. You use stanzas to define the gadget of graph nodes and edges that should be created for each occurrence of a Tree-sitter query, and how the newly created nodes and edges should connect to graph content that you’ve already created elsewhere.
      - https://github.com/tree-sitter/tree-sitter-graph
        
        tree-sitter-graph
        The tree-sitter-graph library defines a DSL for constructing arbitrary graph structures from source code that has been parsed using tree-sitter.
        
        https://marketplace.visualstudio.com/items?itemName=tree-sitter.tree-sitter-graph
        
        tree-sitter-graph support for VS Code
        This language extension for VS Code provides syntax support for tree-sitter-graph files.
    - Why aren’t we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?
      
      To dig even deeper and learn more, I encourage you to check out my Strange Loop talk and the stack-graphs crate: our open source Rust implementation of these ideas.
      - https://github.com/github/stack-graphs
        
        Stack graphs
        The crates in this repository provide a Rust implementation of stack graphs, which allow you to define the name resolution rules for an arbitrary programming language in a way that is efficient, incremental, and does not need to tap into existing build or program analysis tools.
        
        https://docs.rs/stack-graphs/latest/stack_graphs/
        
        https://github.com/github/stack-graphs/tree/main/languages
        
        This directory contains stack graphs definitions for specific languages.
        
        https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-javascript
        
        tree-sitter-stack-graphs definition for JavaScript
        This project defines tree-sitter-stack-graphs rules for JavaScript using the tree-sitter-javascript grammar.
        
        The command-line program for tree-sitter-stack-graphs-javascript lets you do stack graph based analysis and lookup from the command line.
        
        cargo install --features cli tree-sitter-stack-graphs-javascript
        
        tree-sitter-stack-graphs-javascript index SOURCE_DIR
        
        tree-sitter-stack-graphs-javascript status SOURCE_DIR
        
        tree-sitter-stack-graphs-javascript query definition SOURCE_PATH:LINE:COLUMN
        
        https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-typescript
        
        tree-sitter-stack-graphs definition for TypeScript
        This project defines tree-sitter-stack-graphs rules for TypeScript using the tree-sitter-typescript grammar.
        
        The command-line program for tree-sitter-stack-graphs-typescript lets you do stack graph based analysis and lookup from the command line.
      - https://dcreager.net/talks/2021-strange-loop/
        
        Redirects to https://dcreager.net/talks/stack-graphs/
        
        Incremental, zero-config Code Navigation using stack graphs.
        
        In this talk I’ll describe stack graphs, which use a graphical notation to define the name binding rules for a programming language. They work equally well for dynamic languages like Python and JavaScript, and for static languages like Go and Java. Our solution is fast — processing most commits within seconds of us receiving your push. It does not require setting up a CI job, or tapping into a project-specific build process. And it is open-source, building on the tree-sitter project’s existing ecosystem of language tools.
        
        https://www.youtube.com/watch?v=l2R1PTGcwrE
        
        "Incremental, zero-config Code Nav using stack graphs" by Douglas Creager
        
        https://media.dcreager.net/dcreager-strange-loop-2021-slides.pdf
        
        https://media.dcreager.net/dcreager-2022-ucsc-lsd-slides.pdf
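
To make the idea from the blog post and talk above a bit more concrete, here is a toy sketch of how a stack graph resolves a qualified reference like `a.foo`. It is a simplified model written for these notes, not the API of the `stack-graphs` crate: the `Node` class, the node ids, and the two-module layout are all hypothetical. Reference nodes push symbols onto a stack, definition nodes pop them, and a path only counts as a binding if it ends at a definition with an empty stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str                      # "push", "pop", or "scope"
    symbol: Optional[str] = None   # symbol pushed or popped (None for scopes)
    is_definition: bool = False    # True for pop nodes that are definitions


# Hypothetical graph: main refers to `a.foo`; modules a and b both define foo.
NODES = {
    "ref_foo": Node("push", "foo"),
    "ref_dot": Node("push", "."),
    "ref_a": Node("push", "a"),
    "main_scope": Node("scope"),
    "root": Node("scope"),
    "mod_a": Node("pop", "a"),
    "a_dot": Node("pop", "."),
    "a_scope": Node("scope"),
    "def_a_foo": Node("pop", "foo", is_definition=True),
    "mod_b": Node("pop", "b"),
    "b_dot": Node("pop", "."),
    "b_scope": Node("scope"),
    "def_b_foo": Node("pop", "foo", is_definition=True),
}

EDGES = {
    "ref_foo": ["ref_dot"],
    "ref_dot": ["ref_a"],
    "ref_a": ["main_scope"],
    "main_scope": ["root"],
    "root": ["mod_a", "mod_b"],
    "mod_a": ["a_dot"],
    "a_dot": ["a_scope"],
    "a_scope": ["def_a_foo"],
    "mod_b": ["b_dot"],
    "b_dot": ["b_scope"],
    "b_scope": ["def_b_foo"],
}


def resolve(nodes, edges, start):
    """Return ids of definitions reachable from `start` with an empty symbol stack."""
    results = []
    seen = set()
    work = [(start, ())]  # (node id, symbol stack before visiting the node)
    while work:
        node_id, stack = work.pop()
        node = nodes[node_id]
        if node.kind == "push":
            stack = stack + (node.symbol,)
        elif node.kind == "pop":
            # A pop node is only passable if its symbol is on top of the stack.
            if not stack or stack[-1] != node.symbol:
                continue
            stack = stack[:-1]
            if node.is_definition and not stack:
                results.append(node_id)
                continue
        state = (node_id, stack)
        if state in seen:
            continue
        seen.add(state)
        for nxt in edges.get(node_id, ()):
            work.append((nxt, stack))
    return results


print(resolve(NODES, EDGES, "ref_foo"))  # → ['def_a_foo']
```

The path through module `b` is rejected at `mod_b` because the top of the stack is `a`, which is why only the `foo` in module `a` is returned even though both modules define a `foo`.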
https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github
- GitHub has developed two code navigation approaches based on the open source tree-sitter and stack-graphs libraries:
  - Search-based - searches all definitions and references across a repository to find entities with a given name
  - Precise - resolves definitions and references based on the set of classes, functions, and imported definitions at a given point in your code
  To learn more about these approaches, see "Precise and search-based navigation."
  - https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation
    - Precise and search-based navigation
      Certain languages supported by GitHub have access to precise code navigation, which uses an algorithm (based on the open source stack-graphs library) that resolves definitions and references based on the set of classes, functions, and imported definitions that are visible at any given point in your code. Other languages use search-based code navigation, which searches all definitions and references across a repository to find entities with a given name. Both strategies are effective at finding results and both make sure to avoid inappropriate results such as comments, but precise code navigation can give more accurate results, especially when a repository contains multiple methods or functions with the same name.
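
The difference between the two strategies can be sketched with a tiny in-memory index. This is only an illustration under made-up assumptions: the file names, the `Definition` record, and the `IMPORTS` table are hypothetical, and GitHub's real implementation works on stack graphs rather than a lookup table like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    name: str
    file: str
    line: int


# Hypothetical repository: two files define a function called `parse`.
DEFINITIONS = [
    Definition("parse", "json.ts", 3),
    Definition("parse", "dates.ts", 10),
    Definition("format", "dates.ts", 20),
]

# Hypothetical import table: which file each imported name comes from.
IMPORTS = {"app.ts": {"parse": "json.ts"}}


def search_based(name):
    """Match on name alone, anywhere in the repository."""
    return [d for d in DEFINITIONS if d.name == name]


def precise(name, from_file):
    """Only consider definitions visible from `from_file`: local ones, then imports."""
    local = [d for d in DEFINITIONS if d.name == name and d.file == from_file]
    if local:
        return local
    target = IMPORTS.get(from_file, {}).get(name)
    return [d for d in DEFINITIONS if d.name == name and d.file == target]


print(len(search_based("parse")))   # → 2
print(precise("parse", "app.ts"))   # → [Definition(name='parse', file='json.ts', line=3)]
```

Search-based lookup returns both `parse` functions, while the precise lookup narrows the result to the one that `app.ts` actually imports.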
https://pl.ewi.tudelft.nl/research/projects/scope-graphs/
- Scope Graphs | A Theory of Name Resolution
- Scope graphs provide a new approach to defining the name binding rules of programming languages. A scope graph represents the name binding facts of a program using the basic concepts of declarations and references associated with scopes that are connected by edges. Name resolution is defined by searching for paths from references to declarations in a scope graph. Scope graph diagrams provide an illuminating visual notation for explaining the bindings in programs.
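
As a rough sketch of "searching for paths from references to declarations", the snippet below models scopes connected by parent edges and resolves a reference with a breadth-first search, so the nearest declaration wins (lexical shadowing). The `Scope` class and the three example scopes are hypothetical, and the real theory has richer edge labels and path-ordering rules than this shortest-path shortcut.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    declarations: dict = field(default_factory=dict)  # identifier -> declaration label
    parents: list = field(default_factory=list)       # edges to enclosing scopes


def resolve(reference, scope):
    """Search parent edges breadth-first; return (declaration, path) or None."""
    queue = deque([(scope, [scope.name])])
    visited = set()
    while queue:
        current, path = queue.popleft()
        if id(current) in visited:
            continue
        visited.add(id(current))
        if reference in current.declarations:
            return current.declarations[reference], path
        for parent in current.parents:
            queue.append((parent, path + [parent.name]))
    return None


# Hypothetical program: a global scope, a function `f` that shadows `x`,
# and a block inside `f` that contains the references.
global_scope = Scope("global", {"x": "x declared in global", "y": "y declared in global"})
f_scope = Scope("f", {"x": "x declared in f"}, [global_scope])
block_scope = Scope("block", {}, [f_scope])

print(resolve("x", block_scope))  # → ('x declared in f', ['block', 'f'])
print(resolve("y", block_scope))  # → ('y declared in global', ['block', 'f', 'global'])
print(resolve("z", block_scope))  # → None
```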

Potentially Related

https://en.wikipedia.org/wiki/Code_property_graph
- A code property graph of a program is a graph representation of the program obtained by merging its abstract syntax trees (AST), control-flow graphs (CFG) and program dependence graphs (PDG) at statement and predicate nodes. The resulting graph is a property graph, which is the underlying graph model of graph databases such as Neo4j, JanusGraph and OrientDB where data is stored in the nodes and edges as key-value pairs. In effect, code property graphs can be stored in graph databases and queried using graph query languages.
- Joern CPG. The original code property graph was implemented for C/C++ in 2013 at University of Göttingen as part of the open-source code analysis tool Joern. This original version has been discontinued and superseded by the open-source Joern Project, which provides a formal code property graph specification applicable to multiple programming languages. The project provides code property graph generators for C/C++, Java, Java bytecode, Kotlin, Python, JavaScript, TypeScript, LLVM bitcode, and x86 binaries (via the Ghidra disassembler).
  - https://github.com/joernio/joern
    - Open-source code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs.
    - Joern is a platform for analyzing source code, bytecode, and binary executables. It generates code property graphs (CPGs), a graph representation of code for cross-language code analysis. Code property graphs are stored in a custom graph database. This allows code to be mined using search queries formulated in a Scala-based domain-specific query language. Joern is developed with the goal of providing a useful tool for vulnerability discovery and research in static program analysis.
    - https://joern.io/
    - https://cpg.joern.io/
      - Code Property Graph Specification 1.1
      - This is the specification of the Code Property Graph, a language-agnostic intermediate graph representation of code designed for code querying.
        
        The code property graph is a directed, edge-labeled, attributed multigraph. This specification provides the graph schema, that is, the types of nodes and edges and their properties, as well as constraints that specify which source and destination nodes are permitted for each edge type.
        
        The graph schema is structured into multiple layers, each of which provides node, property, and edge type definitions. A layer may depend on multiple other layers and make use of the types they provide.
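
To get a feel for what "directed, edge-labeled, attributed multigraph" plus schema constraints means in practice, here is a minimal sketch. The `PropertyGraph` class and the tiny `SCHEMA` are hypothetical and only loosely inspired by the spec's vocabulary; they are not Joern's API or the actual CPG schema.

```python
class PropertyGraph:
    """Directed, edge-labeled, attributed multigraph with a simple edge schema."""

    def __init__(self, schema):
        # schema: edge label -> set of permitted (source label, destination label) pairs
        self.schema = schema
        self.nodes = {}
        self.edges = []  # a list, so parallel edges between the same nodes are allowed

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, "props": props}

    def add_edge(self, src, dst, label, **props):
        pair = (self.nodes[src]["label"], self.nodes[dst]["label"])
        if pair not in self.schema.get(label, set()):
            raise ValueError(f"{label} edge not permitted for {pair[0]} -> {pair[1]}")
        self.edges.append({"src": src, "dst": dst, "label": label, "props": props})

    def out(self, src, label):
        """Ids of nodes reachable from `src` over edges with the given label."""
        return [e["dst"] for e in self.edges if e["src"] == src and e["label"] == label]


# Hypothetical schema: which node types each edge type may connect.
SCHEMA = {
    "AST": {("METHOD", "CALL"), ("CALL", "IDENTIFIER")},
    "CFG": {("METHOD", "CALL"), ("CALL", "CALL")},
}

g = PropertyGraph(SCHEMA)
g.add_node("m", "METHOD", name="main")
g.add_node("c", "CALL", name="print")
g.add_node("i", "IDENTIFIER", name="x")

g.add_edge("m", "c", "AST")   # syntax-tree edge
g.add_edge("m", "c", "CFG")   # control-flow edge between the same two nodes
g.add_edge("c", "i", "AST")

print(g.out("m", "AST"))  # → ['c']
print(g.out("c", "AST"))  # → ['i']
```

Merging the AST and CFG views into one graph is what lets a single query mix syntactic and control-flow questions; the schema check rejects edges such as a `CFG` edge from an `IDENTIFIER` to a `METHOD`.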
https://docs.openrewrite.org/concepts-explanations/lossless-semantic-trees
- A Lossless Semantic Tree (LST) is a tree representation of code. Unlike the traditional Abstract Syntax Tree (AST), OpenRewrite's LST offers a unique set of characteristics that make it possible to perform accurate transformations and searches across a repository.
