Improve Markdown code block tokens #17591

Merged
merged 6 commits into from Jan 3, 2017

Projects

None yet

4 participants

@joshpeng
Contributor

Code blocks without a language weren't tokenized. Code blocks didn't have their ending ``` punctuation tokenized. Both fixed.

Code blocks used to only have one token. Now each block has the following tokens available for syntax highlighters:

  • Starting and ending ``` punctuations
  • Code block's language setting
  • Code snippet
@joshpeng joshpeng Improve Markdown code block tokens
Code blocks without a language weren't tokenized.
Code blocks didn't have their ending ``` punctuation tokenized.
Code blocks used to only have one token. Now each block has the following tokens available for syntax highlighters:
- Starting and ending ``` punctuations
- Code block's language setting
- Code snippet
c5bca42
@msftclas

Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla.microsoft.com.

TTYL, MSBOT;

@mjbvz mjbvz was assigned by bpasero Dec 20, 2016
joshpeng added some commits Dec 20, 2016
@joshpeng joshpeng Variable whitespace for MD code block ``` token
Allow for variable amount of whitespacing before ``` code blocks
4ceebae
@joshpeng joshpeng Reorder raw blocks
Raw blocks were preventing tokenizing as languaged blocks. Putting them on bottom resolves this.
344c3c8
@msftgits

Hi, I am closing and re-opening this PR to bump the CLA bot. Sorry for the inconvenience!

@msftgits msftgits closed this Dec 20, 2016
@msftgits msftgits reopened this Dec 20, 2016
@msftclas

Hi @joshpeng, I'm your friendly neighborhood Microsoft Pull Request Bot (You can call me MSBOT). Thanks for your contribution!
You've already signed the contribution license agreement. Thanks!
We will now validate the agreement and then real humans will evaluate your PR.

TTYL, MSBOT;

@msftclas msftclas added the cla-signed label Dec 20, 2016
@msftgits msftgits removed the cla-required label Dec 20, 2016
@joshpeng joshpeng Fix MD block detection when following paragraph
Used to require a new line inbetween ``` code blocks and preceding paragraph text.
2043f9d
@mjbvz

I like the overall idea, but this introduces a number of important regressions that need to be fixed before we can merge this in. Please take a look at the comments and let me know if you have any questions.

- <key>while</key>
- <string>(^|\G)(?!\s*\2\3*\s*$)</string>
+ <key>end</key>
+ <string>(^|\G)\s*([`~]{3,})\n</string>
@mjbvz
mjbvz Dec 20, 2016 Contributor

We have to use a while clause here. This prevents broken language grammars from leaking outside of the fenced block. Switching to while from end fixed a large number of syntax highlighting issues.

@joshpeng
joshpeng Dec 21, 2016 edited Contributor

Hmm. That is why the closing ``` wasn't captured though. I'll continue thinking about how to achieve both our goals.

Edit: I think I might have a solution by nesting patterns. Testing on my end.

@@ -566,11 +582,32 @@
<key>fenced_code_block_basic</key>
<dict>
<key>begin</key>
- <string>(^|\G)\s*(([`~]){3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)(\s+.*)?$</string>
+ <string>(^|\G)\s*([`~]{3,})\s*(html|htm|shtml|xhtml|inc|tmpl|tpl)\n</string>
@mjbvz
mjbvz Dec 20, 2016 edited Contributor

Keep the (\s+.*)?$bit. We allow arbitrary text on the rest of the line after the language identifier to support passing other attributes (like line numbers specifiers) that some markdown engines support.

- <key>while</key>
- <string>(^|\G)(?!\s*\2\3*\s*$)</string>
+ <key>end</key>
+ <string>(^|\G)\s*([`~]{3,})\n</string>
@mjbvz
mjbvz Dec 20, 2016 Contributor

This should also reverted to how it was before so that we consume any number of spaces after the fence end and the end of line.

@mjbvz
mjbvz Dec 20, 2016 Contributor

Also, the closing fenced code block should match the fence type originally used. That's why we used the back references instead of [`~]

@joshpeng joshpeng Prevent broken language grammar leaks in MD fences
Prevents leaks in MD code fences while also capturing the closing fence punctuations.
5bab6c8
@joshpeng
Contributor
joshpeng commented Dec 21, 2016 edited

Thank you @mjbvz for shedding light on many scenarios I was unaware of. Latest commit addresses previous code review's issues in the following manner:

  1. Originally, using while instead of end prevented broken language grammar from leaking, but it also prevented capturing the closing fence punctuation. Now I am nesting the while-captures inside a primary pattern to achieve capturing of the closing fence while still preventing grammar leaks.

  2. (\s+.*)?$ at the end of begin-captures added back and tokenized as fenced_code.block.language.attributes. This is for any arbitrary text after the language identifier

  3. End-captures reverted to once again consume all whitespace after the closing fence as well as matching the opening fence's punctuation.

  4. End-captures closing fence's prefix whitespace handling improved to cover some scenarios where it should be detected as a raw block starter instead of fenced block closer.

@joshpeng
Contributor

@mjbvz Hope you had a great Christmas. Was wondering if you had a chance to see these changes? Thanks.

@mjbvz
Contributor
mjbvz commented Dec 31, 2016

The change looks good.

Before we merge this in, can you please take a look at the failing tests in travis. You likely have to run the colorization tests again locally and check in the updated markdown test file. See the "VS Code Tokenizer Tests" in the launch.json

@joshpeng joshpeng Update Markdown tokenizer test file
431f543
@joshpeng
Contributor
joshpeng commented Dec 31, 2016 edited

@mjbvz I've uploaded the updated tokenizer Markdown test. The test passes locally, but Travis is still failing. Is that expected?

@mjbvz mjbvz merged commit b9a362a into Microsoft:master Jan 3, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@mjbvz
Contributor
mjbvz commented Jan 3, 2017

@joshpeng Thank you for this change. I've gone ahead and merged it in. It should be available in the next insiders build

@morags morags added a commit to morags/vscode that referenced this pull request Jan 20, 2017
@morags morags Adjust Go, Rust and Scala's logic as per #17591 6999eeb
@mjbvz mjbvz added a commit that referenced this pull request Jan 20, 2017
@morags @mjbvz morags + mjbvz Markdown fixes (#18704)
* Fix typos

* Add Go, Rust and Scala

* Adjust Go, Rust and Scala's logic as per #17591
22cc4a1
@joshpeng
Contributor
joshpeng commented Feb 4, 2017

@mjbvz How do I get mentioned for my contributions in the release notes of 1.9.0? ;(

@mjbvz mjbvz added this to the January 2017 milestone Feb 8, 2017
@mjbvz
Contributor
mjbvz commented Feb 8, 2017

Sorry about that @joshpeng. Let me see if I can fix things

@gregvanl I thought the contributor lists for the release notes was automatically generated. Did the PR have to be marked with "January 2017" for that to happen? What's the best way to update the list post-release?

@joshpeng
Contributor
joshpeng commented Feb 9, 2017

@mjbvz Didn't make it into 1.9.1 notes either. oh well :[

@mjbvz
Contributor
mjbvz commented Feb 9, 2017

I've added you to the 1.9 release notes: Microsoft/vscode-docs@d0e9826

Sorry for the omission when this was first published and thanks again for the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment