Added a test for identifier support across all languages #2371
Motivated by this comment, I added a test that goes through all languages and checks that identifiers aren't broken.
What does "identifiers aren't broken" mean?
It means that any identifier (`/[_a-zA-Z][_a-zA-Z0-9]*/`) will be tokenized as either one token or not at all. I.e. the identifier `foo123` would be broken if the language tokenized the `123` part as a number. The test will see how the languages handle identifiers like this and others. It will also check for numbers.

Why do we need this?
As pointed out in the comment, Markup templating (MT) assumes that its placeholders (which are identifiers) aren't broken up. If they are, MT will stop working. In the past, it caused this issue.
How is this implemented?
The test is quite simple. It has a list of identifiers (actually 3 lists) and just tests that those identifiers aren't broken for any given language. Because some languages don't have identifiers, you can selectively disable the test for a certain class of identifiers, or for all of them.
The error message of this test includes an explanation of what broken identifiers are and how to fix them. Instructions on how to disable the test are also included.
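A minimal sketch of the core check (this is not the PR's actual test code; `isBroken` is a hypothetical helper). It assumes the Prism-style token-stream shape, where tokenizing returns an array of plain strings and `{ type, content }` objects: an identifier is unbroken if it comes back as a single string (nothing matched) or as exactly one token.

```javascript
// Hypothetical helper, assuming Prism-style token streams
// (arrays of strings and { type, content } objects).
function isBroken(tokenStream) {
  // The identifier survives if the whole input comes back as
  // a single string or exactly one token.
  return tokenStream.length > 1;
}

// Simulated results of tokenizing the identifier "foo123":
const untouched = ['foo123'];                                  // fine
const oneToken  = [{ type: 'constant', content: 'foo123' }];   // fine
const broken    = ['foo', { type: 'number', content: '123' }]; // broken!

console.log(isBroken(untouched)); // false
console.log(isBroken(oneToken));  // false
console.log(isBroken(broken));    // true
```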
(The problem with the current implementation of this test is that I only do a `Prism.tokenize` on every identifier. I don't test `inside` grammars because these are usually very specific to the parent pattern, so there are almost only false positives.)

The actual changes to the languages are just boundary assertions. (I didn't just blindly throw some `\b` in there, though. I went and looked up the spec/doc of every language I didn't know.)

In some cases, I even had to change some test cases because they were wrong. Markdown changed the most because I didn't know at the time that `foo_italic_` won't make anything italic. That's fixed now. For languages that had a faulty number pattern, I didn't create any new test files because we now have this test.
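To illustrate why a boundary assertion is the fix (these patterns are illustrative only, not any specific language's actual grammar): a number pattern without `\b` matches the `123` inside `foo123` and splits the identifier, while the bounded version can't, because there is no word boundary between `o` and `1`.

```javascript
const greedyNumber  = /\d+/;     // faulty: matches inside identifiers
const boundedNumber = /\b\d+\b/; // fixed: \b fails between two word chars

console.log('foo123'.match(greedyNumber)[0]); // '123' → would break foo123
console.log('foo123'.match(boundedNumber));   // null  → identifier intact
console.log('42'.match(boundedNumber)[0]);    // '42'  → real numbers still match
```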