Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for more than one key/value pair per line #21

Merged
merged 1 commit into from Mar 25, 2019

Conversation

Projects
None yet
2 participants
@caleb531
Copy link
Contributor

caleb531 commented Feb 26, 2019

Description of the Change

This PR fixes an error in the grammar which can cause strings to be tokenized incorrectly, leading to equal signs (within strings) to be tokenized and to the rest of file being mis-highlighted. I've tweaked the grammar so this can't happen.

Consider the following Pipfile, which is taken directly from the Pipfile README. Before this fix was applied, the highlighting was screwy when "(.*?)"\s*(=) is matched. Note the red equal sign in the middle of what should be a gold string, and note that the rest of the file is mis-highlighted:

1-incorrect

With my fix applied, highlighting is now looking more consistent. I have added an additional test to confirm this, and all existing tests are still passing as well. :)

2-correct

Alternate Designs

N/A

Benefits

Better syntax highlighting for more sophisticated TOML files.

Possible Drawbacks

It's possible this change could introduce a hidden regression in the highlighting of TOML files, though I have not seen such in my testing, and the rest of the Pipfile I referenced above appears just fine.

Applicable Issues

N/A

@@ -29,7 +29,7 @@
'2': 'name': 'keyword.operator.assignment.toml'
}
{
'match': '((")(.*)("))\\s*(=)' # This one is .* because " can be escaped
'match': '^\s*((")(.*?)("))\\s*(=)' # This one is .* because " can be escaped

This comment has been minimized.

Copy link
@50Wliu

50Wliu Feb 26, 2019

Member

Is the leading whitespace match necessary?

This comment has been minimized.

Copy link
@caleb531

caleb531 Feb 26, 2019

Author Contributor

@50Wliu I believe so, because the property name could be indented. Consider the example from the TOML README. There aren't any key names with spaces or other non-word characters, but when I quote one of the indented key names, the TOML validator still recognizes it as valid (http://toml-online-parser.ovonick.com/).

# This is a TOML document.

title = "TOML Example"

[owner]
name = "Tom Preston-Werner"
dob = 1979-05-27T07:32:00-08:00 # First class dates

[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

  # Indentation (tabs and/or spaces) is allowed but not required
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  "ip" = "10.0.0.2"
  dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
  "alpha",
  "omega"
]

screen shot 2019-02-25 at 9 14 48 pm

This comment has been minimized.

Copy link
@50Wliu

50Wliu Feb 26, 2019

Member

Understood, but I still don't think we need it (matches start looking at the last match, not the beginning of a line). Especially as it's currently checking for optional ss, not whitespace characters.

This comment has been minimized.

Copy link
@caleb531

caleb531 Feb 26, 2019

Author Contributor

@50Wliu Fair enough. I have just removed the leading whitespace check as you requested. Let me know if this is good to go or if there is anything else I can do.

This comment has been minimized.

Copy link
@50Wliu

50Wliu Feb 27, 2019

Member

Sorry, I also meant the ^.

This comment has been minimized.

Copy link
@caleb531

caleb531 Feb 28, 2019

Author Contributor

@50Wliu The ^ is what fixes the syntax highlighting, so removing it would render this PR ineffective. I tried several different ways of revising the grammar—moving things around and such—but those attempts all caused the tests to fail. Adding the ^ was the only modification that actually solved the problem and kept all tests passing.

The added question mark for the star-matching ((.*?)) was also added because when you're using .* between two quotes in a regex, it should really be a lazy match in case the string contains an escaped quote.

This comment has been minimized.

Copy link
@50Wliu

50Wliu Feb 28, 2019

Member

Ah, thanks. I'll see if I can figure out why it only works with the anchor!

@caleb531 caleb531 force-pushed the caleb531:string-fix branch from 29e7fa4 to ecae330 Feb 26, 2019

@50Wliu

This comment has been minimized.

Copy link
Member

50Wliu commented Mar 16, 2019

Sorry for the delay...looking at this now 😅

@50Wliu

This comment has been minimized.

Copy link
Member

50Wliu commented Mar 17, 2019

Here's a very large diff, but I think it solves the root cause of the problem. Description below.

diff --git a/grammars/toml.cson b/grammars/toml.cson
index 61aa0fd..e4a06a4 100644
--- a/grammars/toml.cson
+++ b/grammars/toml.cson
@@ -3,10 +3,7 @@
 'fileTypes': ['toml']
 'patterns': [
   {
-    'match': '(#).*$'
-    'name': 'comment.line.number-sign.toml'
-    'captures':
-      '1': 'name': 'punctuation.definition.comment.toml'
+    'include': '#comment'
   }
   {
     'match': '(?:^\\s*)((\\[\\[)[^\\]]+(\\]\\]))'
@@ -23,16 +20,26 @@
       '3': 'name': 'punctuation.definition.table.end.toml'
   }
   {
-    'match': '([A-Za-z0-9_-]+)\\s*(=)' # IMPORTANT: Do not replace with [\\w-].  \\w includes more than just a-z.
-    'captures':
-      '1': 'name': 'variable.other.key.toml'
-      '2': 'name': 'keyword.operator.assignment.toml'
+    'begin': '([A-Za-z0-9_-]+)\\s*(=)\\s*' # IMPORTANT: Do not replace with [\\w-]. \\w includes more than just a-z.
+    'beginCaptures':
+      '1':
+        'name': 'variable.other.key.toml'
+      '2':
+        'name': 'keyword.operator.assignment.toml'
+    'end': '(?!\\G)'
+    'patterns': [
+      {
+        'include': '#values'
+      }
+    ]
   }
   {
-    'match': '^((")(.*?)("))\\s*(=)' # This one is .* because " can be escaped
-    'captures':
-      '1': 'name': 'string.quoted.double.toml'
-      '2': 'name': 'punctuation.definition.string.begin.toml'
+    'begin': '((")(.*?)("))\\s*(=)\\s*' # This one is .* because " can be escaped
+    'beginCaptures':
+      '1':
+        'name': 'string.quoted.double.toml'
+      '2':
+        'name': 'punctuation.definition.string.begin.toml'
       '3':
         'name': 'variable.other.key.toml'
         'patterns': [
@@ -40,90 +47,162 @@
             'include': '#string-escapes'
           }
         ]
-      '4': 'name': 'punctuation.definition.string.end.toml'
-      '5': 'name': 'keyword.operator.assignment.toml'
-  }
-  {
-    'match': "((')([^']*)('))\\s*(=)"
-    'captures':
-      '1': 'name': 'string.quoted.single.toml'
-      '2': 'name': 'punctuation.definition.string.begin.toml'
-      '3': 'name': 'variable.other.key.toml'
-      '4': 'name': 'punctuation.definition.string.end.toml'
-      '5': 'name': 'keyword.operator.assignment.toml'
-  }
-  {
-    'begin': '"""'
-    'beginCaptures':
-      '0': 'name': 'punctuation.definition.string.begin.toml'
-    'end': '"""'
-    'endCaptures':
-      '0': 'name': 'punctuation.definition.string.end.toml'
-    'name': 'string.quoted.double.block.toml'
+      '4':
+        'name': 'punctuation.definition.string.end.toml'
+      '5':
+        'name': 'keyword.operator.assignment.toml'
+    'end': '(?!\\G)'
     'patterns': [
       {
-        'include': '#string-escapes'
-      }
-      {
-        'match': '\\\\$'
-        'name': 'constant.character.escape.toml'
+        'include': '#values'
       }
     ]
   }
   {
-    'begin': "'''"
+    'begin': "((')([^']*)('))\\s*(=)\\s*"
     'beginCaptures':
-      '0': 'name': 'punctuation.definition.string.begin.toml'
-    'end': "'''"
-    'endCaptures':
-      '0': 'name': 'punctuation.definition.string.end.toml'
-    'name': 'string.quoted.single.block.toml'
-  }
-  {
-    'begin': '"'
-    'beginCaptures':
-      '0': 'name': 'punctuation.definition.string.begin.toml'
-    'end': '"'
-    'endCaptures':
-      '0': 'name': 'punctuation.definition.string.end.toml'
-    'name': 'string.quoted.double.toml'
+      '1':
+        'name': 'string.quoted.single.toml'
+      '2':
+        'name': 'punctuation.definition.string.begin.toml'
+      '3':
+        'name': 'variable.other.key.toml'
+      '4':
+        'name': 'punctuation.definition.string.end.toml'
+      '5':
+        'name': 'keyword.operator.assignment.toml'
+    'end': '(?!\\G)'
     'patterns': [
       {
-        'include': '#string-escapes'
+        'include': '#values'
       }
     ]
   }
-  {
-    'begin': "'"
-    'beginCaptures':
-      '0': 'name': 'punctuation.definition.string.begin.toml'
-    'end': "'"
-    'endCaptures':
-      '0': 'name': 'punctuation.definition.string.end.toml'
-    'name': 'string.quoted.single.toml'
-  }
-  {
-    'match': 'true'
-    'name': 'constant.language.boolean.true.toml'
-  }
-  {
-    'match': 'false'
-    'name': 'constant.language.boolean.false.toml'
-  }
-  {
-    'match': '\\d{4}-\\d{2}-\\d{2}(?:(T)\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:(Z)|([+-])\\d{2}:\\d{2})?)?'
-    'name': 'constant.numeric.date.toml'
-    'captures':
-      '1': 'name': 'keyword.other.time.toml'
-      '2': 'name': 'keyword.other.offset.toml'
-      '3': 'name': 'keyword.other.offset.toml'
-  }
-  {
-    'match': '[+-]?(0|[1-9]\\d*)(_\\d+)*((\\.\\d+)(_\\d+)*)?([eE][+-]?\\d+(_\\d+)*)?'
-    'name': 'constant.numeric.toml'
-  }
 ]
 'repository':
+  'comment':
+    'patterns': [
+      {
+        'match': '(#).*$'
+        'name': 'comment.line.number-sign.toml'
+        'captures':
+          '1': 'name': 'punctuation.definition.comment.toml'
+      }
+    ]
   'string-escapes':
     'match': '\\\\[btnfr"\\\\]|\\\\u[A-Fa-f0-9]{4}|\\\\U[A-Fa-f0-9]{8}'
     'name': 'constant.character.escape.toml'
+  'values':
+    'patterns': [
+      {
+        'begin': '\\['
+        'beginCaptures':
+          '0':
+            'name': 'punctuation.definition.array.begin.toml'
+        'end': '\\]'
+        'endCaptures':
+          '0':
+            'name': 'punctuation.definition.array.end.toml'
+        'patterns': [
+          {
+            'include': '#comment'
+          }
+          {
+            'match': ','
+            'name': 'punctuation.definition.separator.comma.toml'
+          }
+          {
+            'include': '#values'
+          }
+        ]
+      }
+      {
+        'begin': '{'
+        'beginCaptures':
+          '0':
+            'name': 'punctuation.definition.table.inline.begin.toml'
+        'end': '}'
+        'endCaptures':
+          '0':
+            'name': 'punctuation.definition.table.inline.end.toml'
+        'patterns': [
+          {
+            'match': ','
+            'name': 'punctuation.definition.separator.comma.toml'
+          }
+          {
+            'include': '$self'
+          }
+        ]
+      }
+      {
+        'begin': '"""'
+        'beginCaptures':
+          '0': 'name': 'punctuation.definition.string.begin.toml'
+        'end': '"""'
+        'endCaptures':
+          '0': 'name': 'punctuation.definition.string.end.toml'
+        'name': 'string.quoted.double.block.toml'
+        'patterns': [
+          {
+            'include': '#string-escapes'
+          }
+          {
+            'match': '\\\\$'
+            'name': 'constant.character.escape.toml'
+          }
+        ]
+      }
+      {
+        'begin': "'''"
+        'beginCaptures':
+          '0': 'name': 'punctuation.definition.string.begin.toml'
+        'end': "'''"
+        'endCaptures':
+          '0': 'name': 'punctuation.definition.string.end.toml'
+        'name': 'string.quoted.single.block.toml'
+      }
+      {
+        'begin': '"'
+        'beginCaptures':
+          '0': 'name': 'punctuation.definition.string.begin.toml'
+        'end': '"'
+        'endCaptures':
+          '0': 'name': 'punctuation.definition.string.end.toml'
+        'name': 'string.quoted.double.toml'
+        'patterns': [
+          {
+            'include': '#string-escapes'
+          }
+        ]
+      }
+      {
+        'begin': "'"
+        'beginCaptures':
+          '0': 'name': 'punctuation.definition.string.begin.toml'
+        'end': "'"
+        'endCaptures':
+          '0': 'name': 'punctuation.definition.string.end.toml'
+        'name': 'string.quoted.single.toml'
+      }
+      {
+        'match': 'true'
+        'name': 'constant.language.boolean.true.toml'
+      }
+      {
+        'match': 'false'
+        'name': 'constant.language.boolean.false.toml'
+      }
+      {
+        'match': '\\d{4}-\\d{2}-\\d{2}(?:(T)\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:(Z)|([+-])\\d{2}:\\d{2})?)?'
+        'name': 'constant.numeric.date.toml'
+        'captures':
+          '1': 'name': 'keyword.other.time.toml'
+          '2': 'name': 'keyword.other.offset.toml'
+          '3': 'name': 'keyword.other.offset.toml'
+      }
+      {
+        'match': '[+-]?(0|[1-9]\\d*)(_\\d+)*((\\.\\d+)(_\\d+)*)?([eE][+-]?\\d+(_\\d+)*)?'
+        'name': 'constant.numeric.toml'
+      }
+    ]

The problem was that the current language structure couldn't handle more than one key/value pair per line, which inline tables allow. To fix that, this diff encapsulates each key pattern so that it'll also match the corresponding value before finishing the match, and ensuring that values won't match unless paired with a key. I've added explicit support for arrays and inline tables so that highlighting for those two don't regress.

Unfortunately, this has the side-effect of breaking the vast majority of the specs, which have only been testing the values. They'll need to be updated to use key = value syntax instead.

@caleb531

This comment has been minimized.

Copy link
Contributor Author

caleb531 commented Mar 24, 2019

@50Wliu Fantastic, thank you for the diff! I will work on rewriting the tests to accept these changes, and will let you know when this PR is ready for final review.

@caleb531 caleb531 force-pushed the caleb531:string-fix branch 2 times, most recently from 226baf9 to 25bc2e2 Mar 25, 2019

Allow for more than one key/value pair per line
The rewritten grammar is courtesy of @50Wilu, who had this to say:

"The problem was that the current language structure couldn't handle
more than one key/value pair per line, which inline tables allow. To fix
that, this diff encapsulates each key pattern so that it'll also match
the corresponding value before finishing the match, and ensuring that
values won't match unless paired with a key. I've added explicit support
for arrays and inline tables so that highlighting for those two don't
regress."

These changes originally broke many of the tests, which I have since
refactored to use `key = value` syntax instead.

@caleb531 caleb531 force-pushed the caleb531:string-fix branch from 25bc2e2 to 52b54b9 Mar 25, 2019

@caleb531

This comment has been minimized.

Copy link
Contributor Author

caleb531 commented Mar 25, 2019

@50Wliu Half an hour later... that wasn't so bad! I've incorporated your grammar changes (by directly apply'ing your diff) and have updated the tests to use key = value syntax, as you suggested. All tests are now passing again.

I have rebased this latest commit on my branch (per the contributing guidelines, as I recall), so that there's only one commit to merge.

@50Wliu

50Wliu approved these changes Mar 25, 2019

@50Wliu 50Wliu changed the title Do not tokenize equal signs within strings Allow for more than one key/value pair per line Mar 25, 2019

@50Wliu 50Wliu merged commit 000efcd into atom:master Mar 25, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@50Wliu

This comment has been minimized.

Copy link
Member

50Wliu commented Mar 25, 2019

@caleb531 note that GitHub (more specifically, Linguist) doesn't appear to use Atom's TOML language currently. Rather, it uses https://github.com/textmate/toml.tmbundle, which has better support for TOML 0.5 at the moment.

I'm working on integrating the 0.5 changes into this package (#23, #24), at which point we could consider asking the Linguist maintainers to switch to this package.

@50Wliu 50Wliu referenced this pull request Mar 25, 2019

Merged

Update language-toml #19047

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.