This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Add Tree-sitter grammar injections #17551

Merged
merged 53 commits into from Jul 16, 2018

Contributor

@maxbrunsfeld maxbrunsfeld commented Jun 22, 2018

Motivation

The TextMate parsing system allows grammars to be composed in order to syntax-highlight things like:

  • JavaScript inside of script tags in an HTML file
  • HTML inside of template strings in JavaScript
  • SQL inside of a string literal in Python

Currently, none of these embedded-language cases work when Tree-sitter is enabled.

Solution

This PR adds the concept of 'grammar injection' to Tree-sitter grammars. Specifically, it adds two new APIs associated with Tree-sitter grammars. These APIs might be revised before this PR is merged.

1. Adding Injection Points

atom.grammars.addInjectionPoint(
  languageId,
  {
    type: syntaxNodeType,
    language: languageCallback,
    content: contentCallback
  }
)

This method allows you to express ideas like: "In JavaScript, tagged template strings are injection points. For each tagged template string, try to identify its language by looking at the name of its tag function. Parse the content between the backticks, omitting any template substitutions."

atom.grammars.addInjectionPoint(
  'javascript',
  {
    // Tagged template literals are simply parsed as call expressions
    // with a template string instead of an argument list
    type: 'call_expression',

    // The language name can be found in the template string's "tag"
    language (callNode) {
      if (callNode.lastChild.type === 'template_string') {
        return callNode.firstChild.text
      }
    },

    // Parse the content inside of the template string
    content (callNode) {
      return callNode.lastChild
    }
  }
)
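
To see what the two callbacks return, here is a minimal sketch that runs them against hand-built stand-ins for syntax nodes. The `fakeNode` helper and the node shapes are assumptions for illustration only: real nodes come from the Tree-sitter parser, and only the `type`, `text`, `firstChild`, and `lastChild` properties used in the example above are modeled.

```javascript
// Hypothetical stand-in for a Tree-sitter syntax node (illustration only).
function fakeNode (type, text, children = []) {
  return {
    type,
    text,
    children,
    get firstChild () { return children[0] || null },
    get lastChild () { return children[children.length - 1] || null }
  }
}

// The same callbacks as in the injection point above, as a plain object.
const injectionPoint = {
  type: 'call_expression',
  language (callNode) {
    if (callNode.lastChild.type === 'template_string') {
      return callNode.firstChild.text
    }
  },
  content (callNode) {
    return callNode.lastChild
  }
}

// html`<p>hi</p>` : a tag identifier followed by a template string.
const tagged = fakeNode('call_expression', 'html`<p>hi</p>`', [
  fakeNode('identifier', 'html'),
  fakeNode('template_string', '`<p>hi</p>`')
])

// foo(1) : an ordinary call with an argument list instead of a template string.
const plainCall = fakeNode('call_expression', 'foo(1)', [
  fakeNode('identifier', 'foo'),
  fakeNode('arguments', '(1)')
])

const taggedLanguage = injectionPoint.language(tagged)      // 'html'
const taggedContent = injectionPoint.content(tagged).text   // '`<p>hi</p>`'
const plainLanguage = injectionPoint.language(plainCall)    // undefined
```

For the ordinary call, the `language` callback returns nothing, which appears to be how the example signals that a given `call_expression` should not be treated as an injection.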

Note that this API does not indicate which grammar to use when parsing the content. That information can be provided by other packages, using the second API:

2. Specifying Injection Patterns

Grammars that use Tree-sitter have a new field called injectionRegExp. This field allows you to express ideas like: "The HTML language can be injected. Whenever there is an injection point whose language name includes the string 'html', parse that injection point's content using the HTML parser."

id: 'html'

injectionRegExp: 'html|HTML|Html$'

This two-part API allows languages to be embedded within one another without every grammar having to know about every other grammar.
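
A minimal sketch of how the two halves could meet: given the language name returned by an injection point, pick the first grammar whose `injectionRegExp` matches it. The `grammars` list and the `grammarForLanguageName` function are hypothetical names used for illustration, not Atom's actual lookup, and the SQL entry is invented for the example.

```javascript
// Hypothetical grammar descriptors; only `id` and `injectionRegExp` are modeled.
const grammars = [
  { id: 'html', injectionRegExp: 'html|HTML|Html$' },
  { id: 'sql', injectionRegExp: 'sql|SQL' } // assumed entry, for illustration
]

// Return the id of the first grammar whose pattern matches the language name,
// or null when no grammar claims it.
function grammarForLanguageName (languageName) {
  if (typeof languageName !== 'string') return null
  const match = grammars.find(g => new RegExp(g.injectionRegExp).test(languageName))
  return match ? match.id : null
}

const a = grammarForLanguageName('html')      // 'html'
const b = grammarForLanguageName('safeHtml')  // 'html', via the 'Html$' branch
const c = grammarForLanguageName('SQL')       // 'sql'
const d = grammarForLanguageName('css')       // null
```

In the pattern as written, `$` binds only to the last alternative, so `html` and `HTML` match anywhere in the language name while `Html` must appear at the end.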

Example

EJS is a popular JavaScript templating system where JavaScript code is interspersed with HTML markup using the delimiters <% and %>. The HTML can of course contain more JavaScript code inside of script tags. And that JavaScript code can contain HTML inside of string literals.
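
To make the EJS case concrete, the sketch below splits a template string into the ranges that belong to HTML and the ranges that belong to JavaScript, using only the `<%` and `%>` delimiters. It is a plain-string illustration of why one injected language can end up spread across many disjoint ranges, not a description of how Tree-sitter or Atom parses EJS. `splitEjs` is a hypothetical helper, and it ignores tag variants such as `<%=`.

```javascript
// Hypothetical helper: split an EJS-like string into HTML and JavaScript ranges.
// Ranges are [start, end) character offsets; the delimiters belong to neither list.
function splitEjs (source) {
  const html = []
  const js = []
  let pos = 0
  while (pos < source.length) {
    const open = source.indexOf('<%', pos)
    if (open === -1) {
      // No more tags: the rest of the string is HTML.
      html.push([pos, source.length])
      break
    }
    if (open > pos) html.push([pos, open])
    const close = source.indexOf('%>', open + 2)
    if (close === -1) {
      // Unterminated tag: treat the rest of the string as JavaScript.
      js.push([open + 2, source.length])
      break
    }
    js.push([open + 2, close])
    pos = close + 2
  }
  return { html, js }
}

const source = '<ul><% for (const x of xs) { %><li></li><% } %></ul>'
const ranges = splitEjs(source)

// Concatenating each language's ranges gives the text its parser would see.
const htmlText = ranges.html.map(([s, e]) => source.slice(s, e)).join('')
const jsText = ranges.js.map(([s, e]) => source.slice(s, e)).join('')
```

Here the HTML ends up in three separate ranges and the JavaScript in two, yet each language only forms a well-formed document when its ranges are considered together.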

[Screenshot: html-in-js-in-html-in-ejs]

Tasks

  • Add tests and documentation for the new GrammarRegistry APIs
  • When the buffer changes, only update the affected injections, not all of them
  • Update highlighting when grammars with injectionRegExps are added
  • Allow injections that are spread across many child nodes (needed for PHP, ERB, EJS, mustache, etc)
  • Use injections for the expand selection command
  • Use injections for folding specific lines
  • Fix syntax highlighting bugs
  • Use injections for getting scope descriptors
  • Use injections for folding at a given nesting level

Related Issues / PRs

Closes #17392
Depends on tree-sitter/tree-sitter#181
Depends on tree-sitter/node-tree-sitter#14
Depends on tree-sitter/node-tree-sitter#18

@maxbrunsfeld maxbrunsfeld changed the title Start work on Tree-sitter grammar injections Add Tree-sitter grammar injections Jun 22, 2018
@maxbrunsfeld maxbrunsfeld force-pushed the tree-sitter-injections branch 2 times, most recently from 3542f3c to b944e24 Compare June 22, 2018 23:02
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
maxbrunsfeld and others added 2 commits June 26, 2018 13:46
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
Contributor

@thomasjo thomasjo left a comment

This wasn't meant to be a review, just a line comment...

}
}

async _performUpdate (containingNode) {

Stupid style comment; any specific reason for prefixing some of these functions with an underscore? I'm assuming they are "private", but I don't think we do this anywhere else in the core code?

maxbrunsfeld and others added 3 commits June 27, 2018 11:14
Also, replace `addInjectionPattern` API with a single `injectionRegExp` 
field on the grammar.

Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
@maxbrunsfeld maxbrunsfeld force-pushed the tree-sitter-injections branch 2 times, most recently from 1f212f8 to d6bb7b2 Compare June 28, 2018 23:47
Co-Authored-By: Ashi Krishnan <queerviolet@github.com>
Development

Successfully merging this pull request may close these issues.

Tree transforms for tree sitter