Support lexing with custom (non-RegExp) Token patterns.
fixes #331
Showing 6 changed files with 230 additions and 42 deletions.
@@ -0,0 +1,58 @@
## Custom Token Patterns

### Background
Normally a Token's pattern is defined using a JavaScript regular expression:

```JavaScript
let IntegerToken = createToken({name: "IntegerToken", pattern: /\d+/})
```

However, in some circumstances a custom pattern matching implementation may be required,
either because a Token cannot easily be defined using regular expressions, or
to work around performance problems in a specific regular expression engine, for example:

* [WebKit/Safari multiple orders of magnitude performance degradation for specific regExp patterns](https://bugs.webkit.org/show_bug.cgi?id=152578) 😞

### Usage
A custom pattern must conform to the API of the [RegExp.prototype.exec](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec)
function. Additionally, it must only match from the **start** of the input. In RegExp terms, this means
that a custom pattern implementation must behave as if the [start of input anchor](http://www.rexegg.com/regex-anchors.html#caret)
has been used.
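
To make the anchoring requirement concrete, the following sketch uses only the standard RegExp API to contrast an anchored pattern with an unanchored one. The variable names are illustrative and are not part of the library.

```javascript
const anchored = /^\d+/
const unanchored = /\d+/

// The match begins at offset 0: this is the behavior a custom pattern must reproduce.
// The result is an array whose first item is the matched string "123".
const atStart = anchored.exec("123abc")

// Digits exist later in the input, but not at offset 0, so the result is null.
const notAtStart = anchored.exec("abc123")

// An unanchored pattern skips ahead and matches at offset 3.
// A custom pattern must never skip ahead like this.
const skipped = unanchored.exec("abc123")
```

A custom `exec` implementation should therefore return `null` in the second situation rather than searching further into the text.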

The basic syntax for supplying a custom pattern is defined by the [ICustomPattern](TODO:LINK) interface.
Example:

```JavaScript
function matchInteger(text) {
    let i = 0
    let charCode = text.charCodeAt(i)
    // 48 to 57 are the char codes of the digits "0" to "9"
    while (charCode >= 48 && charCode <= 57) {
        i++
        // Read the next character; past the end of the text this is NaN, which ends the loop.
        charCode = text.charCodeAt(i)
    }

    // No match, must return null to conform with the RegExp.prototype.exec signature
    if (i === 0) {
        return null
    }
    else {
        let matchedString = text.substring(0, i)
        // According to the RegExp.prototype.exec API the first item in the returned array must be the whole matched string.
        return [matchedString]
    }
}

let IntegerToken = createToken({
    name: "IntegerToken",
    pattern: {
        exec: matchInteger,
        containsLineTerminator: false
    }})
```
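
As an illustration of a Token that cannot easily be defined using regular expressions, here is a minimal sketch of a custom pattern for nested block comments. The names `matchNestedComment` and `nestedCommentPattern` are hypothetical, and `containsLineTerminator` is set to `true` on the assumption that the flag signals that a match may contain line breaks.

```javascript
// Hypothetical custom pattern: matches a nested block comment such as
// "/* outer /* inner */ still outer */", but only at the start of the text.
function matchNestedComment(text) {
    if (!text.startsWith("/*")) {
        return null
    }

    let depth = 1 // number of comment openers that are still unclosed
    let i = 2     // scanning resumes right after the first opener
    while (i < text.length && depth > 0) {
        if (text.startsWith("/*", i)) {
            depth++
            i += 2
        } else if (text.startsWith("*/", i)) {
            depth--
            i += 2
        } else {
            i++
        }
    }

    // Unterminated comment: report no match, as RegExp.prototype.exec would.
    if (depth !== 0) {
        return null
    }
    // The first item in the returned array is the whole matched string.
    return [text.substring(0, i)]
}

const nestedCommentPattern = {
    exec: matchNestedComment,
    // Assumption: block comments may span several lines.
    containsLineTerminator: true
}
```

The resulting object has the same shape as the `pattern` value in the example above, so it could be supplied to `createToken` in the same way.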