v0.4.0
\ ˈwīnd \ - to raise to a high level [as of excitement or tension]
```shell
npm install --save @candlelib/wind
```
Note: This script uses ES2015 module syntax and has the extension `.mjs`. To include it in a project, you may need to run Node with the `--experimental-modules` flag, or use a bundler that supports ES modules, such as Rollup.
```js
import wind from "@candlelib/wind"

const sample_string = "The 2345 a 0x3456 + 'a string'";

let lexer = wind(sample_string);

// Example
lexer.text //=> "The"
lexer.n.tx //=> "2345"
lexer.n.text //=> "a"
lexer.assert("a") // the current token is "a", so the lexer advances
lexer.text //=> "0x3456"
lexer.ty == lexer.types.number //=> true
```
```js
import { Lexer } from "@candlelib/wind"
```

new Lexer ( string [ , INCLUDE_WHITE_SPACE_TOKENS ] )
- `string` - The input string to parse.
- `INCLUDE_WHITE_SPACE_TOKENS` - Flag to include white space tokens, such as `TABS` and `NEW_LINE`.
Note: the default export `wind` has the same form as the Lexer constructor function, but is called without the `new` keyword:

let lexer = wind ( string [ , INCLUDE_WHITE_SPACE_TOKENS ] )
- `char` (Read-Only) - Number: The character offset of the current token relative to the start of its line.
- `CHARACTERS_ONLY` - Boolean: If true, the Lexer will only produce tokens that are one character in length.
- `END` (Read-Only) - Boolean: If true, the Lexer has reached the end of the input string.
- `IGNORE_WHITE_SPACE` - Boolean: If true, `white_space` and `new_line` tokens will not be generated.
- `line` (Read-Only) - Number: The index of the line on which the current token is located.
- `off` - Number: The absolute index position of the current token, measured from the beginning of the input string.
- `p` - Wind Lexer: A cached reference to a peeking Lexer.
- `PARSE_STRING` - Boolean: If true, string tokens will not be generated; instead, the contents of each string will be tokenized individually.
- `sl` - Number: The length of the input string. Changing `sl` will cause the Lexer to stop parsing once `off + token_length >= sl`.
- `str` - String: The string that is being tokenized.
- `string` (Read-Only) - String: Returns the result of `slice()`.
- `string_length` (Read-Only) - Number: The length of the remaining string to be parsed. Same as `lex.sl - lex.off`.
- `text` - String: The string value of the current token.
- `tl` - Number: The length of the current token.
- `type` - Number: The current token type. See types.
- `types` - Object: Proxy to the types object.
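The way `str`, `off`, `tl`, `sl`, `text`, and `string_length` relate to each other can be pictured with a small stand-in object. This is a hypothetical sketch, not Wind's source: `CursorState` is an invented name, and the `END` condition (`off >= sl`) is an assumption.

```javascript
// Hypothetical model of the documented position fields; not Wind's implementation.
class CursorState {
  constructor(str) {
    this.str = str;       // the string being tokenized
    this.sl = str.length; // parse limit; lowering it ends parsing early
    this.off = 0;         // absolute offset of the current token
    this.tl = 0;          // length of the current token
  }
  // The characters of the current token
  get text() {
    return this.str.slice(this.off, this.off + this.tl);
  }
  // Characters left to parse, counted from the current token (sl - off)
  get string_length() {
    return this.sl - this.off;
  }
  // Assumption: the end is reached once the offset meets the parse limit
  get END() {
    return this.off >= this.sl;
  }
}

const state = new CursorState("The 2345 a");
state.off = 4; // pretend the current token is "2345"
state.tl = 4;
console.log(state.text);          // "2345"
console.log(state.string_length); // 6
```

Lowering `sl` shrinks `string_length` without touching `str`, which is how a parse limit can be imposed on an unchanged input string.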
- `ch` - The first character of the current token.
- `n` - Property proxy for `next()`.
- `string` - Returns the result of `slice()`.
- `token` - Property proxy for `copy()`.
- `tx` - Proxy for `text`.
- `ty` - Proxy for `type`.
- `pos` - Proxy for `off`.
- `pk` - Property proxy for `peek()`.
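Shortcuts such as `n` and `tx` read like fields but behave like calls. In JavaScript, one way to build such a property proxy is a getter. The `ToyLexer` class below is hypothetical: it splits on white space only and is not the library's implementation.

```javascript
// Hypothetical toy lexer showing getter-based property proxies; not Wind's source.
class ToyLexer {
  constructor(str) {
    this.tokens = str.split(/\s+/).filter(Boolean); // crude white-space split
    this.index = 0;
  }
  get text() {
    return this.tokens[this.index] ?? "";
  }
  next() {
    this.index++;
    return this; // returning `this` enables chaining
  }
  // Reading `n` advances the lexer, just as calling next() does
  get n() {
    return this.next();
  }
  // Shorthand for an existing property
  get tx() {
    return this.text;
  }
}

const toy = new ToyLexer("The 2345 a");
console.log(toy.text);   // "The"
console.log(toy.n.tx);   // "2345"
console.log(toy.n.text); // "a"
```

A getter runs on every read, so reading a proxy like `n` has a side effect. Each access moves the lexer forward, including accesses made while logging or inspecting the object.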
Lexer - assert ( text )
Compares the text of the current token to the argument `text`. If they are equal, the lexer advances to the next token; otherwise, an error is thrown. Returns the Lexer to allow method chaining.
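The assert-then-advance pattern lets a parser consume expected tokens and fail quickly on anything else. This is a minimal sketch over a pre-split token array. `makeCursor` is a hypothetical helper, and its error message format is illustrative rather than Wind's.

```javascript
// Hypothetical sketch of assert(): check the current token, advance only on a match.
function makeCursor(tokens) {
  let i = 0;
  return {
    get text() {
      return tokens[i];
    },
    assert(expected) {
      if (tokens[i] !== expected) {
        throw new Error(`Expected "${expected}" but found "${tokens[i]}" at token ${i}`);
      }
      i++;         // advance only when the token matches
      return this; // allow chaining
    },
  };
}

const cur = makeCursor(["let", "x", "=", "1"]);
cur.assert("let").assert("x");
console.log(cur.text); // "="
```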
Lexer - assertCharacter ( char )
Same as `assert()`, except that it compares a single character only. Returns the Lexer to allow method chaining.
Lexer - comment ( [ ASSERT [ , marker ] ] )
Skips to the end of the comment section if the current token is `/` and the peek token is `/` or `*`. If true is passed for the `ASSERT` argument, an error is thrown when the current token plus the peek token is not `/*` or `//`. Returns the Lexer to allow method chaining.
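The comment-skipping rule above (`/` followed by `/` or `*`) can be sketched on a raw string. `skipComment` is a hypothetical helper, not part of Wind. It assumes a `//` comment ends at the next newline and a `/* */` comment ends after `*/`, and it treats an unterminated comment as running to the end of the input.

```javascript
// Hypothetical helper: given an index pointing at "/", return the index just
// past a // line comment or a /* */ block comment; return `start` otherwise.
function skipComment(str, start) {
  if (str[start] !== "/") return start;
  const next = str[start + 1];
  if (next === "/") {
    const nl = str.indexOf("\n", start + 2);
    return nl === -1 ? str.length : nl; // stop at (not past) the newline
  }
  if (next === "*") {
    const close = str.indexOf("*/", start + 2);
    return close === -1 ? str.length : close + 2; // include the closing "*/"
  }
  return start; // a lone "/" is not a comment
}

const src = "a /* note */ b // tail\nc";
console.log(JSON.stringify(src.slice(skipComment(src, 2))));  // " b // tail\nc"
console.log(JSON.stringify(src.slice(skipComment(src, 15)))); // "\nc"
```

Stopping a line comment at the newline, rather than after it, leaves the newline available to be emitted as a `new_line` token.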
Lexer - copy ( [ destination ])
Copies the value of the lexer to `destination`. `destination` defaults to a new Wind Lexer.
Lexer - fence ( [ marker ] )
Reduces the input string's parse length by the value of `marker.off`. `marker` must be a Wind Lexer that has the same input string as the calling Wind Lexer. Returns the Lexer to allow method chaining.
Lexer - next ( [ marker ] )
Advances `marker` to the next token in its input string. Returns `marker`, or null if the end of the input string has been reached. `marker` defaults to the calling Wind Lexer, so `this` is returned if no value is passed as `marker`. Returns the Lexer to allow method chaining.
Lexer - peek ( [ marker [ , peek_marker ] ] )
Returns another Wind Lexer that is advanced one token ahead of `marker`. `marker` defaults to `this`, and `peek_marker` defaults to `p`. A new Wind Lexer is created if no value is passed as `peek_marker` and `marker.p` is null.
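The difference between `peek()` and `next()` comes down to whether the original cursor moves. This minimal sketch uses an invented `Cursor` class that looks ahead with a second cursor and caches it in `p`, reusing it instead of allocating a new one on every call. It is an illustration of the idea, not Wind's implementation.

```javascript
// Hypothetical sketch of peek() versus next() over a pre-split token array.
class Cursor {
  constructor(tokens, index = 0) {
    this.tokens = tokens;
    this.index = index;
    this.p = null; // cached peek cursor
  }
  get text() {
    return this.tokens[this.index];
  }
  next() {
    this.index++; // moves this cursor
    return this;
  }
  peek() {
    // Create the look-ahead cursor only once, then reposition it
    if (!this.p) this.p = new Cursor(this.tokens);
    this.p.index = this.index + 1;
    return this.p;
  }
}

const c = new Cursor(["if", "(", "x", ")"]);
console.log(c.peek().text); // "("
console.log(c.text);        // "if" (peeking did not move c)
c.next();
console.log(c.peek().text); // "x"
```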
Lexer - reset ( )
Resets the lexer completely. After this is called, the lexer must be given a new input string before it can begin parsing again. Returns the Lexer to allow method chaining.
Lexer - resetHead ( )
Resets the lexer to the beginning of the input string. Returns the Lexer to allow method chaining.
Lexer - setString ( string [ , RESET ] )
Changes the input string to `string`. If the optional `RESET` argument is true, `resetHead()` is also called. Returns the Lexer to allow method chaining.
String - slice ( [ start ] )
Returns a substring of the input string that starts at `start` and ends at the value of `off`. If `start` is undefined, the substring starts at `off` and ends at `sl`.
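The two cases of `slice()` can be written against plain values. `sliceLike` is a hypothetical stand-alone function that takes `off` and `sl` as arguments instead of reading them from a lexer instance.

```javascript
// Hypothetical stand-alone version of the slice() rule described above.
function sliceLike(str, off, sl, start) {
  if (start === undefined) {
    return str.slice(off, sl); // remainder: from the current token to the parse limit
  }
  return str.slice(start, off); // everything from `start` up to the current token
}

const input = "The 2345 a";
// Suppose the current token is "a" (off = 9) and parsing runs to the end (sl = 10).
console.log(sliceLike(input, 9, 10));    // "a"
console.log(sliceLike(input, 9, 10, 4)); // "2345 "
```

The `start` form is useful for capturing source text. For example, a parser can record `off` before reading a construct, then slice from that saved offset once the construct has been consumed.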
Lexer - sync ( [ marker ] )
Copies the current values of the `marker` object to the Wind Lexer. `marker` defaults to the value of `p`. Returns the Lexer to allow method chaining.
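`copy()` and `sync()` pair naturally for backtracking: save the lexer's position, attempt a parse, and restore the saved state if the attempt fails. The `Pos` class below is a hypothetical stand-in that tracks only an `off` field, whereas the documented methods copy the lexer's values as a whole.

```javascript
// Hypothetical stand-in for copy()/sync(); only the `off` field is tracked.
class Pos {
  constructor(str) {
    this.str = str;
    this.off = 0;
  }
  // Copy this position into `destination` (a fresh Pos by default).
  copy(destination = new Pos(this.str)) {
    destination.off = this.off;
    return destination;
  }
  // Overwrite this position with the values held by `marker`.
  sync(marker) {
    this.off = marker.off;
    return this;
  }
}

const pos = new Pos("abc");
const saved = pos.copy(); // remember where we started
pos.off = 2;              // a speculative parse moved forward, then failed
pos.sync(saved);          // backtrack to the saved position
console.log(pos.off);     // 0
```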
throw ( message )
Throws a new Error with a custom `message` and information indicating where in the input string the current token is positioned.
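Locating a token for an error message means converting an absolute offset (like `off`) into a line and column. The `locate` and `throwAt` helpers below are hypothetical. They use 1-based line and column numbers, and the message format is illustrative, so Wind's own `throw()` output may differ.

```javascript
// Hypothetical helper: convert an absolute offset into a 1-based line and column.
function locate(str, off) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < off; i++) {
    if (str[i] === "\n") {
      line++;
      lineStart = i + 1; // the next line begins after the newline
    }
  }
  return { line, column: off - lineStart + 1 };
}

// Throw an Error whose message says where the offending token sits.
function throwAt(str, off, message) {
  const { line, column } = locate(str, off);
  throw new Error(`${message} at line ${line}, column ${column}`);
}

try {
  throwAt("let x\n= ;", 8, "Unexpected token");
} catch (e) {
  console.log(e.message); // "Unexpected token at line 2, column 3"
}
```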
String - toString ( )
Returns the result of `slice()`.
trim ( )
Creates and returns a new Lexer with leading and trailing white space and line terminator characters removed from the input string.
a ( text )
Proxy for `assert(text)`.
aC ( char )
Proxy for `assertCharacter(char)`.
r ( )
Proxy for `reset()`.
s( [ start ] )
Proxy for `slice(start)`.
There are 10 types of tokens that the Wind Lexer will create. Type identifiers can be accessed through `wind.types`, `Lexer.types`, and the `types` property of Lexer instances. Each type is identified by a power-of-2 value, so several types can be tested with a single bitwise comparison:
```js
(lexer.type & (lexer.types.identifier | lexer.types.symbol)) ? true : false;
```
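Power-of-2 type values work because each type occupies its own bit. OR-ing several types builds a mask, and one AND tests a token against the whole group. The numeric values below are hypothetical placeholders; real code should read them from the library's `types` object.

```javascript
// Hypothetical flag values: each type gets its own bit (a distinct power of 2).
const types = {
  identifier: 1,
  number: 2,
  string: 4,
  open_bracket: 8,
  close_bracket: 16,
  operator: 32,
  new_line: 64,
  white_space: 128,
  symbol: 256,
  data_link: 512,
};

// OR the flags together once to build a mask of acceptable types...
const nameOrSymbol = types.identifier | types.symbol;

// ...then a single AND tests membership in the whole group.
function isOneOf(type, mask) {
  return (type & mask) !== 0;
}

console.log(isOneOf(types.identifier, nameOrSymbol)); // true
console.log(isOneOf(types.number, nameOrSymbol));     // false
```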
- `types.identifier` or `types.id`: Any set of characters beginning with `_`, `a-z`, or `A-Z`, and followed by `0-9`, `a-z`, `A-Z`, `-`, `_`, `#`, or `$`.
- `types.number` or `types.num`: Any set of characters beginning with `0-9` or `.`, and followed by `0-9` or `.`.
- `types.string` or `types.str`: A set of characters beginning with either `'` or `"` and ending with a matching `'` or `"`.
- `types.open_bracket` or `types.ob`: A single character from the set `<`, `(`, `{`, `[`.
- `types.close_bracket` or `types.cb`: A single character from the set `>`, `)`, `}`, `]`.
- `types.operator` or `types.op`: A single character from the set `*`, `+`, `<`, `=`, `>`, `\`, `&`, `%`, `!`, `|`, `^`, `:`.
- `types.new_line` or `types.nl`: A single newline character (U+000A, known as LF or NL). It may also be CRLF if the input string has Windows-style line endings.
- `types.white_space` or `types.ws`: An uninterrupted run of tab or space characters.
- `types.symbol` or `types.sym`: Any other character not covered by the types above; each symbol token is one character long.
- `types.data_link` or `types.dl`: A data link escape (DLE) ASCII character, followed by two more characters and another data link escape character.
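The classification rules above can be approximated by a small hand-written scanner. This is a hypothetical sketch, not Wind's source. `readToken` and `tokenize` are invented names. Several cases are not modeled: hexadecimal literals such as `0x3456` (which the earlier example reports as a single number), string escapes, CRLF, and `data_link` tokens. Because `<` and `>` appear in both the bracket and operator sets, this sketch simply checks brackets first.

```javascript
// Hypothetical scanner following the listed token rules; not Wind's implementation.
const OPEN = "<({[";
const CLOSE = ">)}]";
const OPERATORS = "*+<=>\\&%!|^:"; // "<" and ">" are shadowed by the bracket checks

function readToken(str, i) {
  const c = str[i];
  // Text from i through the run of following characters that match `re`
  const run = (re) => {
    let j = i + 1;
    while (j < str.length && re.test(str[j])) j++;
    return str.slice(i, j);
  };
  if (/[_a-zA-Z]/.test(c)) return { type: "identifier", text: run(/[0-9a-zA-Z\-_#$]/) };
  if (/[0-9.]/.test(c)) return { type: "number", text: run(/[0-9.]/) };
  if (c === "'" || c === '"') {
    const end = str.indexOf(c, i + 1); // matching quote; escapes not handled
    return { type: "string", text: str.slice(i, end === -1 ? str.length : end + 1) };
  }
  if (OPEN.includes(c)) return { type: "open_bracket", text: c };
  if (CLOSE.includes(c)) return { type: "close_bracket", text: c };
  if (OPERATORS.includes(c)) return { type: "operator", text: c };
  if (c === "\n") return { type: "new_line", text: c };
  if (c === " " || c === "\t") return { type: "white_space", text: run(/[ \t]/) };
  return { type: "symbol", text: c }; // everything else, one character per token
}

function tokenize(str) {
  const tokens = [];
  for (let i = 0; i < str.length; ) {
    const tok = readToken(str, i);
    tokens.push(tok);
    i += tok.text.length; // every token is at least one character long
  }
  return tokens;
}

console.log(tokenize("x1 = (42);").map((t) => t.type).join(" "));
// identifier white_space operator white_space open_bracket number close_bracket symbol
```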