Skip to content
This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Comments

Tokenize variables starting with a non-ASCII letter#112

Merged
winstliu merged 3 commits intomasterfrom
wl-unicode
Mar 27, 2017
Merged

Tokenize variables starting with a non-ASCII letter#112
winstliu merged 3 commits intomasterfrom
wl-unicode

Conversation

@winstliu
Copy link
Contributor

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • All new code requires tests to ensure against regressions

Description of the Change

Instead of checking to make sure that the first letter of a variable is part of the US alphabet, check instead that it is a word character and NOT a number. This is primarily for non-ASCII variables.

Alternate Designs

/cc @pchaigno, @Alhadis: In the event that language-go is used to highlight Go code on GitHub in the future, I want to make sure that this syntax will be recognized by the engine that GitHub uses.
/cc @esdoppio: I saw that an alternative was \p{L}. In the event that both are recognized, which would you prefer? I know in the past that you've been using [\w&&[^0-9]], but \p{L} seems much cleaner to me.

Benefits

See description.

Possible Drawbacks

See alternatives :).

Applicable Issues

Fixes #93

Copy link

@Alhadis Alhadis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the event that language-go is used to highlight Go code on GitHub in the future, I want to make sure that this syntax will be recognized by the engine that GitHub uses.

Glad to hear this is being considered. ;)

While \p{L} will work in PCRE as well as Oniguruma, the former won't work unless it's running in UTF-8 mode. GitHub actually runs PCRE in ASCII mode for performance reasons, so we can't rely on \p{L} here.

grammars/go.cson Outdated
{
'comment': 'Function declarations'
'match': '^(\\bfunc\\b)(?:\\s+(\\([^\\)]+\\)\\s+)?([a-zA-Z_]\\w*)(?=\\())?'
'match': '^(\\bfunc\\b)(?:\\s+(\\([^\\)]+\\)\\s+)?([\\w&&[^0-9]]\\w*)(?=\\())?'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid intersections aren't supported by PCRE; only Oniguruma and Java/Python's regex modules.

To remain portable, I suggest using a negative assertion instead:

-[\\w&&[^0-9]]
+(?![0-9])\\w

@winstliu
Copy link
Contributor Author

Third alternative: just use \w+ and then introduce error highlighting. I think I like this one the best.

@Alhadis
Copy link

Alhadis commented Jan 14, 2017

Also good. I don't know anything about Go other than the fact it uses tabs (and therefore earns my respect my default :p), but yeah.

@winstliu
Copy link
Contributor Author

New approach introduces error highlighting.

@winstliu
Copy link
Contributor Author

@Alhadis anything wrong with the latest revision?

@Alhadis
Copy link

Alhadis commented Mar 20, 2017

Oh sorry, I wasn't pinged/notified.

Aye, looks good to me, but my knowledge of Go is zilch.

@winstliu
Copy link
Contributor Author

👍 will merge after #111 then.

@winstliu winstliu merged commit 7f5dd7b into master Mar 27, 2017
@winstliu winstliu deleted the wl-unicode branch March 27, 2017 17:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

non-ascii variable names

2 participants