Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tree-sitter json grammar #68

Merged
merged 4 commits into from May 10, 2019

Conversation

Projects
None yet
8 participants
@Ben3eeE
Copy link
Member

commented Nov 21, 2018

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • All new code requires tests to ensure against regressions

Description of the Change

Depends on tree-sitter/tree-sitter-json#6

  • Test bundled themes
  • Test community syntax themes

Benefits

Enhanced code folding and syntax highlighting.

Possible Drawbacks

  • Less links are styled as links because the regex is less advanced than textmates language-hyperlink regex

Applicable Issues

No

@simurai simurai referenced this pull request Nov 30, 2018

Closed

Use Tree-sitter for JSON #18515

4 of 4 tasks complete
@simurai

This comment has been minimized.

Copy link
Member

commented Nov 30, 2018

@Ben3eeE Made a PR for "Fix key/value coloring in one-syntax" here: atom/atom#18515

@simurai

This comment has been minimized.

Copy link
Member

commented Nov 30, 2018

Ok, all other bundled themes are fixed too.

Also tested a bunch of community themes, buuuuut most of them have regressions. 😩 How did we solve this with other languages?

  1. Make PRs to the most popular themes.
  2. Add the styling for most popular themes to this (language-json) package. This is more a fallback for abandoned themes.
  3. Try to add more scopes to match the styling of most themes. This feels pretty ugly and it turns into a mix of both grammars. Might not even be possible because some themes use nested selectors. E.g. .syntax--meta.syntax--structure.syntax--dictionary.syntax--json > .syntax--string.syntax--quoted.syntax--double.syntax--json which the language package would have to match.
  4. Do nothing. It's the theme author's job to keep maintaining the theme. Although there are probably a lot of themes that are abandoned and not maintained anymore.

Hmm.. should we try #1 and see how quickly themes get updated and then use #2 as fallback option.

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2018

I think #1 and #2 sounds best. Maybe we can open issues on all the other themes that we don't submit a PR to. But this sounds like a lot of work.

For other languages we didn't have this problem afaik and things more or less worked because we could use similar enough scopes.

Scope object as meta.structure.dictionary.json
And add json scope to key

@Ben3eeE Ben3eeE force-pushed the b3-tree-sitter-parser branch from 34cfb4e to ebaf11e Nov 30, 2018

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2018

@simurai Alternative solution: Scope object as meta.structure.dictionary.json and add json to the key. It gives the same structure as textmate and works 😅

It does apply the meta.structure.dictionary.json scope multiple times for nested objects but I think that should be fine.

'pair > string': [
{
match: '^"https?://',
scopes: 'markup.underline.link'

This comment has been minimized.

Copy link
@50Wliu

50Wliu Dec 2, 2018

Member

Is this something that can be handled by an injection language instead? Does tree-sitter support that?

This comment has been minimized.

Copy link
@Aerijo

Aerijo Dec 2, 2018

Member

TS can inject TS. Lang JS does it for reflex and JSDoc.

This comment has been minimized.

Copy link
@Ben3eeE
@simurai

This comment has been minimized.

Copy link
Member

commented Dec 3, 2018

Alternative solution

Ohhh.. awesome. Tested and yes, works without making changes to themes. 🙌 If this still makes sense, we should definitely take this route. 👍

I'll close atom/atom#18515 again to not keep "stale" PRs around.

@rsese rsese referenced this pull request Feb 21, 2019

Closed

No highlighting on JSON file if it contains long line #70

1 of 1 task complete
@caleb531

This comment has been minimized.

Copy link

commented Mar 31, 2019

Hi,

What is the status of this PR? I would love to see JSON files using tree-sitter in core Atom, since the JSON files I work with are often very large. Thanks in advance!

@rsese

This comment has been minimized.

Copy link
Member

commented Apr 4, 2019

Hey @caleb531 👋 The PR is on the team's list to review but there's no specific timeframe for it with the work on their plate right now. I can mention that you checked in on it though.

If you'd like, you can see what's being worked on in the biweekly iteration plans.

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented Apr 13, 2019

I'll merge this in a few days based on atom/language-python#288 (comment)

Unless someone objects.

@caleb531

This comment has been minimized.

Copy link

commented Apr 13, 2019

@Ben3eeE It looks like array strings are completely un-tokenized. Could you please update the grammar to give them color? Numbers and booleans look fine.

Strings in an object:
Screen Shot 2019-04-13 at 10 16 23 AM

Strings in an array:
Screen Shot 2019-04-13 at 10 16 35 AM

Syntax tree for array strings (for quick reference):
Screen Shot 2019-04-13 at 10 17 06 AM

Update

Looks like you can just modify line 53 from:

'pair > string': [

To:

'string': [

I don't see any adverse affects to other JSON structures (like the object form) after making this change.

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented Apr 15, 2019

It looks like array strings are completely un-tokenized. Could you please update the grammar to give them color? Numbers and booleans look fine.

Thanks for pointing this out, you are correct 🙂

@caleb531

This comment has been minimized.

Copy link

commented Apr 16, 2019

@Ben3eeE Is it possible to highlight URL strings as string.quoted.double in addition to markup.underline.link? I ask this because if the theme does not have styling for markup.underline.link, then those URL strings currently show up as white (i.e. untokenized) rather than as strings.

For example, in my recent PR to the Monokai theme (kevinsawicki/monokai#105), I had to explicitly add styling for markup.underline.link because those matched strings would otherwise not be tokenized as anything.

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented Apr 17, 2019

@caleb531 Thank you so much for testing this out 🙇 I intend to take a look at the PR you opened on the syntax theme to see if we can fix it here before merging.

If you want to check out more syntax themes that would be great 🙂 I'll take another look to see if there are any regressions.

@caleb531

This comment has been minimized.

Copy link

commented May 1, 2019

@Ben3eeE I've tested several of the featured, most popular, and default syntax themes. Please see the screenshots below, followed by my thoughts on where to proceed from here (TL;DR—I propose we remove markup.underline.link tokenization entirely and just highlight as strings):

one-dark:
one-dark

one-light:
one-light

atom-dark:
atom-dark

base16-tomorrow-dark:
base16-tomorrow-dark

atom-material-syntax:
atom-material-syntax

seti-syntax:
seti-syntax

Considering the above, I ultimately recommend we remove tokenization for links entirely in this tree-sitter JSON grammar, for several reasons:

  1. As shown above, the most popular themes are missing styles for markup.underline.link. I know this could be fixed with PRs, but by removing the tokenization and simply tokenizing as strings, we eliminate the need to do this, and we can get this grammar merged sooner (provided you make the changes to the key structure, as discussed above)
  2. The quotes are highlighted as markup.underline.link, which is semantically incorrect—only the URL between the quotes should be tokenized as a link, though I recognize that the tree-sitter-json syntax tree doesn't assign anything to the text between the quotes
  3. The JSON format doesn't give any special meaning to URLs, so they shouldn't be colored differently than other string values
  4. The first-mate JSON grammar doesn't tokenize links (the core language-hyperlink package handles any link highlighting you might see for first-mate JSON files); I would prefer to wait for the tree-sitter injection system, which will need to happen anyway for tree-sitter to eventually replace first-mate

Please let me know your thoughts on this—questions, comments or concerns are welcome.

@Arcanemagus

This comment has been minimized.

Copy link

commented May 1, 2019

As shown above, the most popular themes are missing styles for markup.underline.link.

Many themes are missing many tokens, what about this is causing a problem that would cause that tokenization to be removed?

The quotes are highlighted as markup.underline.link, which is semantically incorrect—only the URL between the quotes should be tokenized as a link

👍

The JSON format doesn't give any special meaning to URLs, so they shouldn't be colored differently than other string values

Now this is a valid argument for removing URL highlighting as you propose in 1.. However we should probably still tokenize them as...

The first-mate JSON grammar doesn't tokenize links (the core language-hyperlink package handles any link highlighting you might see for first-mate JSON files); I would prefer to wait for the tree-sitter injection system, which will need to happen anyway for tree-sitter to eventually replace first-mate

This was added in atom/atom#17551, supported since Atom v1.30.0 hit stable. As far as I understand it if no other grammar is in place to handle the injection it acts as if the injection wasn't there (meaning the links would show as regular strings), but once a grammar is installed that understands that injection point it will start working automatically.

@caleb531

This comment has been minimized.

Copy link

commented May 3, 2019

@Arcanemagus

Many themes are missing many tokens, what about this is causing a problem that would cause that tokenization to be removed?

That's fair—we may still end up submitting PRs to these themes anyway to resolve those other tokenization issues—so removing the URL tokenization wouldn't necessarily reduce the effort.

This was added in atom/atom#17551, supported since Atom v1.30.0 hit stable. As far as I understand it if no other grammar is in place to handle the injection it acts as if the injection wasn't there (meaning the links would show as regular strings), but once a grammar is installed that understands that injection point it will start working automatically.

Do you think if we implement a proper tree-sitter injection (via atom.grammars.addInjectionPoint), we will still be able to tokenize the URL in JSON minus the quotes? Or will that ultimately require modifying tree-sitter-json to add a syntax node for the text between the quotes of a string node?

I suppose to implement the injection, we would also need to add a tree-sitter grammar to language-hyperlink? (because, as I recall, a first-mate grammar can't be injected into a tree-sitter grammar)

@Arcanemagus

This comment has been minimized.

Copy link

commented May 3, 2019

Or will that ultimately require modifying tree-sitter-json to add a syntax node for the text between the quotes of a string node?

I think that the injection happens for an entire node, so we would need to recognize the URL area here first from how I understand it, but I don't have personal experience with how it works exactly.

I suppose to implement the injection, we would also need to add a tree-sitter grammar to language-hyperlink? (because, as I recall, a first-mate grammar can't be injected into a tree-sitter grammar)

Correct, that's a known limitation (feature? 😆) of this system that it has no way of embedding a TextMate grammar.

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented May 8, 2019

The quotes are highlighted as markup.underline.link, which is semantically incorrect—only the URL between the quotes should be tokenized as a link, though I recognize that the tree-sitter-json syntax tree doesn't assign anything to the text between the quotes

Yeah they should be tokenized differently. tree-sitter does recognize them as an anonymous " node but it seems like we can't scope them.

I tried scoping:
'"\""': 'punctuation.blabla'

Which seems like it should work but it doesn't. The grammar fails to load with an unclosed quote error.

This is probably because of a bug somewhere in https://github.com/atom/atom/blob/bc7ec2d3a83b2616713ba7e929e63552b0809c84/src/syntax-scope-map.js

If we can fix this bug and scope anonymous " nodes they would be tokenized differently.

We have a similar issue in language-c:
https://github.com/atom/language-c/blob/a355a4d4a019d5684c40072b81f24c89e58dc56f/grammars/tree-sitter-c.cson#L142

@nathansobo nathansobo self-assigned this May 10, 2019

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

After reviewing the code and the discussion, it seems to me that this is ready.

@nathansobo nathansobo merged commit a5a95ae into master May 10, 2019

1 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nathansobo nathansobo deleted the b3-tree-sitter-parser branch May 10, 2019

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

Thanks @Ben3eeE and sorry for the wait. You are awesome.

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

Ugh. Assumed the AppVeyor failure was spurious but was not. Something changed. @50Wliu does this failure look familiar at all based on apm stuff you've dealt within the past?

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

@50Wliu nevermind the previous comment. I solved the issue in #72. 😓

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented May 10, 2019

@nathansobo Thank you 🙇 I have been using this linked locally since I opened it because I have been working with large json files lately. The performance benefit with this is huge 👍.

@50Wliu

This comment has been minimized.

Copy link
Member

commented May 14, 2019

It doesn't seem like the issue with URL strings was ever resolved? Here's what I'm seeing, which is pretty jarring:
json-url-strings

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019

I looked at our treatment of URLs in the other grammar and it looked similar. What am I missing?

@50Wliu

This comment has been minimized.

Copy link
Member

commented May 14, 2019

Here's how Textmate grammars such as language-coffee-script handle links (thanks to language-hyperlink):
textmate-links
And the corresponding scopes:
textmate-link-scopes

Whereas this grammar has the following scopes:
tree-sitter-json-link-scopes

Notably, string.quoted.double is missing from the scope structure, so I only get the underline, not the foreground color that I've assigned to strings.

Additionally, as mentioned above, the quotes themselves should not be part of the link.

@Ben3eeE

This comment has been minimized.

Copy link
Member Author

commented May 14, 2019

So we could add string.quoted.double to https://github.com/atom/language-json/pull/68/files#diff-fce1e3e82e57cd20261fef4754b35797R56 to add the scope and match textmate?

Quotes being included is an issue with every grammar because we can't scope them and tree-sitter recognizes quotes as part of the string.

@50Wliu

This comment has been minimized.

Copy link
Member

commented May 14, 2019

@Ben3eeE pointed out that my previous comment could be really confusing, so here's a better attempt.
Regular JSON strings (tree-sitter):
regular-json-strings-tree-sitter

URL JSON strings (tree-sitter):
url-json-strings-tree-sitter

Regular JSON strings (textmate):
regular-json-strings-textmate

URL JSON strings (textmate):
url-json-strings-textmate

So the key difference is that there's no string.quoted.double scope for the URL JSON tree-sitter case.

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019

Talked to @Ben3eeE and he plans to address this issue tomorrow. atom/atom#19336 should give him the tools he needs to style the quote literals explicitly, and it seems like adding string.quoted.double to the scope in the URL-specific match rule should also help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.