syntaxes: add fileTypes for TextMate highlighters such as Syntect and Sourcegraph #12

Merged (2 commits) on Aug 3, 2021

slimsag
Contributor

@slimsag slimsag commented Jul 30, 2021

I'm working on adding syntax highlighting for Cue to Sourcegraph. I almost had it working today, but found out that we're missing the fileTypes declaration here (which is what we use in syntect_server to detect file types).

Once this is merged, I should be able to get syntax highlighting of Cue files on Sourcegraph.com working sometime next week (and in our next release). :)

Note that this just follows the TextMate grammar / JSON schema format; nothing here is unique to Sourcegraph:

            "fileTypes": {
              "description": "this is an array of file type extensions that the grammar should (by default) be used with. This is referenced when TextMate does not know what grammar to use for a file the user opens. If however the user selects a grammar from the language pop-up in the status bar, TextMate will remember that choice.",
              "type": "array",
              "items": { "type": "string" }
            },

If there are common extensions other than .cue that should be treated as Cue language files, we should include them here as well.
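
To illustrate why a missing fileTypes entry matters, here is a minimal Python sketch of extension-based grammar lookup. It is only an illustration: the trimmed-down grammar dictionaries, the scopeName values, and the grammar_for_path helper are all hypothetical, and it assumes fileTypes entries are written without the leading dot. It is not how syntect_server or TextMate actually implement detection.

```python
import os

# Hypothetical, trimmed-down grammars; only the fields needed for lookup.
# The scopeName values are assumptions made for this sketch.
GRAMMARS = [
    {"name": "CUE", "scopeName": "source.cue", "fileTypes": ["cue"]},
    {"name": "JSON", "scopeName": "source.json", "fileTypes": ["json"]},
]


def grammar_for_path(path, grammars=GRAMMARS):
    """Return the first grammar whose fileTypes lists the path's extension."""
    # splitext yields ".cue"; this sketch assumes fileTypes has no leading dot.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    for grammar in grammars:
        # A grammar without fileTypes can never be selected by extension.
        if ext in grammar.get("fileTypes", []):
            return grammar
    return None  # no match: the caller would fall back to plain text
```

With a lookup like this, a grammar that omits fileTypes is never chosen for any file, which is roughly the situation described above.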

@myitcv
Member

myitcv commented Jul 30, 2021

@slimsag thanks for raising this, and thanks for helping to get CUE syntax highlighting added to Sourcegraph!

I've just triggered the workflows for your PR; as you can see, the change needs to be made in some Go code, which then generates the TextMate grammar.

Sourcegraph presumably consumes lots of these sorts of grammars. How do you validate them? Is there a definitive TextMate grammar schema? I was only able to find https://github.com/martinring/tmlanguage/blob/master/tmlanguage.json (and just raised martinring/tmlanguage#11 to see whether that project would consider raising the profile of the schema). Do you have a definitive schema for a TextMate grammar that you use at Sourcegraph?

On that point, I'm thinking that in a follow-up PR we should define the CUE TextMate grammar as CUE, and vet it against a known schema for such grammars (point above).

@myitcv
Member

myitcv commented Jul 30, 2021

> On that point, I'm thinking that in a follow-up PR we should define the CUE TextMate grammar as CUE, and vet it against a known schema for such grammars (point above).

I've raised #14 as a follow-up. Might pull together a PR as a proof of concept.

@slimsag
Contributor Author

slimsag commented Jul 30, 2021

Thanks! I'll push the Go change soon.

We do consume a lot of them - around 150 right now; see https://github.com/slimsag/Packages

Syntax highlighting grammars come in a dizzying number of formats:

  1. tmLanguage format - the older, less featureful TextMate format, commonly supported among editors:
    • ... in JSON format
    • ... in plist format
    • ... in YAML format
  2. sublime-syntax YAML format, version 1 - a superset of the TextMate tmLanguage format, supported by Sublime Text
  3. sublime-syntax YAML format, version 2 - used in Sublime Text 4; many grammars haven't been updated to it yet.
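
As a small illustration of what dealing with the first format family involves, here is a Python sketch that loads a tmLanguage grammar from either its JSON or its plist flavour using only the standard library. The load_tm_grammar helper is hypothetical and the format sniffing is deliberately naive; the YAML flavour and both sublime-syntax versions are left out because they need a third-party YAML parser.

```python
import json
import plistlib


def load_tm_grammar(data: bytes) -> dict:
    """Parse a tmLanguage grammar given as JSON or plist bytes."""
    text = data.lstrip()
    if text.startswith(b"{"):
        # JSON flavour: the whole document is a single JSON object.
        return json.loads(text.decode("utf-8"))
    # Otherwise assume an XML or binary property list.
    return plistlib.loads(text)
```

Both flavours come back as an ordinary dictionary, so later steps (such as reading fileTypes) do not need to know which serialization the grammar was shipped in.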

At Sourcegraph we currently only care about the first two, as they cover the vast majority of grammars. We use the Rust syntect library to consume them - it requires they be in a .sublime-syntax YAML format - but luckily for us Sublime Text's PackageDev package can convert between all of these formats and (surprise!) it also does a little bit of validation for us. However, it's a bit annoying because it cannot be invoked from a command line :)

There are also a dizzying number of other formats, be it for Pygments, JS highlighter libraries, VS Code itself, and others. We don't consume those currently.

Worse, though - many older / less modern languages only have a tmLanguage definition that is maintained through a series of hard-to-track forks on GitHub or elsewhere, in which random people have forked/improved them over the last 15-20 years, and they are now used in everyone's editors :) Many of them are not necessarily completely valid, and would fail linting if such a tool existed - so a lot of the syntax highlighting libraries have to do a bit of best-effort work.

Honestly, syntax highlighting in 2021 is a big mess - it's a "try it and see if it works" kind of thing :) But all attempts to fix it lead to just more formats, and less language support than the combined sublime-syntax+tmLanguage definitions - so it really is the best by the metric of languages covered.

I could ramble about this for ages, but I'll stop here - you get the picture haha.

@myitcv
Member

myitcv commented Jul 30, 2021

> but luckily for us Sublime Text's PackageDev package can convert between all of these formats and (surprise!) it also does a little bit of validation for us. However, it's a bit annoying because it cannot be invoked from a command line :)

Ok, great. The reason for my asking is that CUE could very well help with the validation here, whether validating against a JSON schema specification for a grammar or a CUE specification (it converts between the two representations, and more). And you get the command line verification for free :)
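
For a sense of what such a check amounts to, here is a hand-rolled Python sketch that tests one field against the schema fragment quoted earlier in this thread (fileTypes as an array of strings). It is not CUE and not a general JSON Schema validator; the check_file_types helper is hypothetical, and treating a missing fileTypes as a problem is an extra assumption that the quoted fragment does not state.

```python
def check_file_types(grammar: dict) -> list:
    """Return a list of problems found in the grammar's fileTypes field."""
    if "fileTypes" not in grammar:
        # Assumption: this sketch treats the field as required.
        return ["fileTypes is missing"]
    file_types = grammar["fileTypes"]
    if not isinstance(file_types, list):
        return ["fileTypes must be an array"]
    # Every item must be a string, per the quoted schema fragment.
    return [
        f"fileTypes[{i}] must be a string"
        for i, item in enumerate(file_types)
        if not isinstance(item, str)
    ]
```

An empty result means the field passed; a schema-driven tool would apply the same kind of constraint to every field of the grammar rather than to a single one.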

> so a lot of the syntax highlighting libraries have to do a bit of best-effort work.

This, however, might be the reality you're working with!

> But all attempts to fix it lead to just more formats, and less language support than the combined sublime-syntax+tmLanguage definitions - so it really is the best by the metric of languages covered.

I wonder whether AST-based highlighters end up being the way forward here, in the same vein that tree-sitter and LSP's semantic tokens are AST-based?

> I could ramble about this for ages, but I'll stop here - you get the picture haha.

A great exchange, thank you!

> Thanks! I'll push the Go change soon.

Much appreciated, thank you!

Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
@slimsag
Contributor Author

slimsag commented Aug 2, 2021

@myitcv just updated :) let me know how that looks

@myitcv
Member

myitcv commented Aug 3, 2021

Thanks, @slimsag!

@slimsag slimsag deleted the patch-1 branch August 6, 2021 20:37