Tokenize variables starting with a non-ASCII letter #112
Alhadis left a comment
> In the event that language-go is used to highlight Go code on GitHub in the future, I want to make sure that this syntax will be recognized by the engine that GitHub uses.
Glad to hear this is being considered. ;)
While \p{L} will work in PCRE as well as Oniguruma, the former won't work unless it's running in UTF-8 mode. GitHub actually runs PCRE in ASCII mode for performance reasons, so we can't rely on \p{L} here.
grammars/go.cson
Outdated
  {
    'comment': 'Function declarations'
-   'match': '^(\\bfunc\\b)(?:\\s+(\\([^\\)]+\\)\\s+)?([a-zA-Z_]\\w*)(?=\\())?'
+   'match': '^(\\bfunc\\b)(?:\\s+(\\([^\\)]+\\)\\s+)?([\\w&&[^0-9]]\\w*)(?=\\())?'
I'm afraid intersections aren't supported by PCRE; only Oniguruma and Java/Python's regex modules.
To remain portable, I suggest using a negative assertion instead:
-[\\w&&[^0-9]]
+(?![0-9])\\w
Third alternative: just use |
Also good. I don't know anything about Go other than the fact that it uses tabs (and therefore earns my respect by default :p), but yeah.
New approach introduces error highlighting.
@Alhadis anything wrong with the latest revision?
Oh sorry, I wasn't pinged/notified. Aye, looks good to me, but my knowledge of Go is zilch.
👍 will merge after #111 then.
…code # Conflicts: # grammars/go.cson
Requirements
Description of the Change
Instead of checking to make sure that the first letter of a variable is part of the US alphabet, check instead that it is a word character and NOT a number. This is primarily for non-ASCII variables.
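For illustration only, here is a minimal sketch of the "word character and NOT a number" check, written with Python's `re` module rather than the Oniguruma or PCRE engines discussed in this thread. It uses the negative-lookahead form `(?![0-9])\w`, since Python's `re` has no character-class intersection. The names `IDENT`, `IDENT_ASCII`, and `is_identifier` are hypothetical and do not appear in the grammar.

```python
import re

# Hypothetical stand-in for the identifier part of the grammar rule:
# one word character that is not an ASCII digit, then more word characters.
IDENT = re.compile(r"(?![0-9])\w\w*")

# Same pattern with \w restricted to [a-zA-Z0-9_], for comparison.
IDENT_ASCII = re.compile(r"(?![0-9])\w\w*", re.ASCII)


def is_identifier(name: str, pattern: re.Pattern = IDENT) -> bool:
    """Return True if the whole string matches the identifier pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    # Print each candidate with its Unicode-mode and ASCII-mode result.
    for candidate in ["main", "_tmp1", "größe", "变量", "9lives"]:
        print(candidate,
              is_identifier(candidate),
              is_identifier(candidate, IDENT_ASCII))
```

In this sketch the non-ASCII names only match when `\w` is Unicode-aware. How `\w` behaves in an editor's or GitHub's engine depends on that engine's mode, so this shows the idiom, not the behavior of either highlighter.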
Alternate Designs
/cc @pchaigno, @Alhadis: In the event that language-go is used to highlight Go code on GitHub in the future, I want to make sure that this syntax will be recognized by the engine that GitHub uses.
/cc @esdoppio: I saw that an alternative was \p{L}. In the event that both are recognized, which would you prefer? I know in the past that you've been using [\w&&[^0-9]], but \p{L} seems much cleaner to me.
Benefits
See description.
Possible Drawbacks
See alternatives :).
Applicable Issues
Fixes #93