Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor grammar in order to improve readability and address a bug in syntax highlighting of Atom and GitHub #240

Merged
merged 11 commits into from Jun 12, 2020

Conversation

nickborromeo
Copy link
Contributor

@nickborromeo nickborromeo commented Jun 3, 2020

Context

It was brought to the attention of GitHub in issue#144037 that a customer was seeing that syntax highlighting did not work for a specific language.

image

Here we see that the __ is overriding the syntax of the rest of the file by making everything after this bold. At first we thought this was a problem in the objective c language however, after more testing in both Atom and GitHub we see that is actually a problem with the markdown syntax highlighting.

image

The screenshot shows us that if we have an unclosed delimiter for bold or italic it will greedily find the closing delimiter and if it does not, it will scope the tokens that it see as bold or italic thus overriding the syntax highlighting.

Description of the Change

Prior to this change, the grammar file was a CSON file that had all the patterns to match in one list; making it very difficult to parse through and identify where the problem was.

During my investigation, I stumbled on two things

  • VSCode grammar, which also uses TextMate syntax does not have this issue
  • Another atom package for markdown, language-markdown, also did not have this issue.

They both used the grouped collection of patterns that I am introducing in the change. So my first change was to do just a cosmetic refactor. I separated the patterns into groups influenced by the groups that were used in the two packages/extensions that I mentioned above.

After doing that and making sure the tests still pass (unchanged) I then made changes to fix the behavior that was reported. In order to fix that behavior I introduced/replaced the patterns for bold and italic. These were mostly patterned off the language-markdown package since it was already supported in Atom πŸŽ‰

Testing

I was able to test both in Atom and GitHub

Atom

With the new grammar, I overrode the definition of the package and reloaded the editor.

2020-06-02_17-28-52

GitHub

GitHub uses linguist to highlight the files that you see online. Linguist uses the language-gfm package for markdown parsing and syntax highlighting. There is an an app (Lighshow) that one can use to paste in the grammar for testing and it will highlight the block of text that you pass in.

We see the comparison of using the old grammar and new grammar

2020-06-02_17-34-17

2020-06-02_17-33-47

Markdown and GitHub Flavored Markdown

I wanted to make sure that we still correctly highlight basic markdown. So I made a quick test using Lightshow and used the examples found in Mastering Markdown

Body of the text I used

This is an

tag

This is an

tag

This is an
tag

This text will be italic
This will also be italic

This text will be bold
This will also be bold

This will be bold italic
This is also bold italic

β„’ β„’ Β³

β„’ β„’ Β³

isitalic
not_italic_

bold
and bold

You can combine them

  • Item 1
  • Item 2
    • Item 2a
    • Item 2b
  1. Item 1
  2. Item 2
  3. Item 3
    1. Item 3a
    2. Item 3b

GitHub Logo
Format: Alt Text

http://github.com - automatic!
GitHub

As Kanye West said:

We're living the future so
the present is our past.

I think you should use an
<addr> element here instead.

function fancyAlert(arg) {
  if(arg) {
    $.facebox({div:'#foo'})
  }
}
function fancyAlert(arg) {
  if(arg) {
    $.facebox({div:'#foo'})
  }
}

def foo():
if not bar:
return True

  • @mentions, #refs, links, formatting, and tags supported
  • list syntax required (any unordered or ordered list supported)
  • this is a complete item
  • this is an incomplete item
First Header Second Header
Content from cell 1 Content from cell 2
Content in the first column Content in the second column

16c999e8c71134401a78d4d46435517b2271d6ac
mojombo@16c999e8c71134401a78d4d46435517b2271d6ac
mojombo/github-flavored-markdown@16c999e

#1
mojombo#1
mojombo/github-flavored-markdown#1

Mention @nickborromeo

http://www.github.com/

thisthat

This a screen recording of the comparison, on the left is the new grammar and on the right is GitHub in production right now.

2020-06-02_17-47-36 (1)

Alternate Designs

An alternative is to keep the CSON file and just stick in the new rules/patterns to fix the behavior. This would be the quicker option however, will make the file even less readable in my opinion.

Benefits

The grammar is now easier to follow. In editors that support folding, it is quiet easy now to navigate through the blocks since they are grouped and have an indicator of what the patterns in the group are for.

Possible Drawbacks

The tests that are not very exhaustive. I am not 100 percent sure that we have feature parity, at least for the ones that count. However, I think this out weighs the current behavior we have and we can always iterate on as issue come up (if any).

Applicable Issues

Acknowledgments

I would like to thank @maxbrunsfeld who paired with me while investigating this and help me with questions I had to better understand the processes and services used. πŸ™‡

nickborromeo added 11 commits Jun 2, 2020
One of the problems at GitHub that we saw when rending a markdown file
with some code blocks is that when there was an unclosed block, it will
highlight the rest of the remaining text as bold, overriding any
highlights that should take effect.

This test change is based on a refactoring that was done to group
together the patterns and also change the regex for bold to match the
one used in `burodepeper/language-markdown` which is also an atom
package for syntax highlighting
…reedy

One of the problems at GitHub that we saw when rending a markdown file
with some code blocks is that when there was an unclosed block, it will
highlight the rest of the remaining text as bold, overriding any
highlights that should take effect.

This test change is based on a refactoring that was done to group
together the patterns and also change the regex for bold to match the
one used in `burodepeper/language-markdown` which is also an atom
package for syntax highlighting
…greedy

One of the problems at GitHub that we saw when rending a markdown file
with some code blocks is that when there was an unclosed block, it will
highlight the rest of the remaining text as italic, overriding any
highlights that should take effect.

This test change is based on a refactoring that was done to group
together the patterns and also change the regex for bold to match the
one used in `burodepeper/language-markdown` which is also an atom
package for syntax highlighting
…greedy

One of the problems at GitHub that we saw when rending a markdown file
with some code blocks is that when there was an unclosed block, it will
highlight the rest of the remaining text as italic, overriding any
highlights that should take effect.

This test change is based on a refactoring that was done to group
together the patterns and also change the regex for bold to match the
one used in `burodepeper/language-markdown` which is also an atom
package for syntax highlighting

This test change also has a little conflicting test because `is*italic*`
stylizes the italic word while `not_italic_` does not. I think this is
expected behavior given the other uses of `_`. This is also ready the
behavior in GitHub production right now.

![prod-format](https://user-images.githubusercontent.com/1518902/83576959-d4a1de00-a4e7-11ea-802f-0ff7064a9309.png)
The refactor to group together patterns has shown that it will glob the
text after the `>` indicator as one token. The test now reflects that
change
The refactor of the grammar now expects a punctuation scope for an
italic marker match. This just adds that to the expectation of the test
The refactor of the grammar now expects a punctuation scope for a
bold marker match. This just adds that to the expectation of the test
This adds additional context to the text that is parsed that matches
bold italics in HTML. This comes after the refactor of the grammar into
categories
We are removing this file because we will be replacing it with a JSON
file that is refactored into groups. This will allow for better reading
of the the file and additions if needed.

This refactor is not purely cosmetic, the refactor addresses bug that
was raised by a GitHub customer about syntax highlighting not working
for a specific language
This change groups patterns into categories. It then composes these into
patterns that can be mixed and matched. This is not purely as asthetic
refactor, this addresses a limitation that was found by a GitHub
customer. They reported that if they had a `__` (double underscore),
which is a valid character in objective c,inside a code block, it will
override the rest of the file and highlight it as bold.

This change carries over some regular expressions for bold characters
from the `burodepeper/language-markdown` grammar which is also an atom
package. This new regular expression is not as _greedy_ as the previous
one so it does not override the rest of the file.

Please see the description of this PR for a more detailed explantion of
the process.
@maxbrunsfeld
Copy link
Contributor

@maxbrunsfeld maxbrunsfeld commented Jun 3, 2020

It looks like you've tested this very thoroughly, and this fixes a really bad bug. I think at this point we should 🚒 .

@lkashef lkashef self-assigned this Jun 10, 2020
@lkashef lkashef merged commit 26a92a1 into atom:master Jun 12, 2020
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants