`regex` is a JavaScript library for parsing regular expressions into an AST and for generating regular expression strings back from that AST, following the JavaScript regular expression syntax.

NOTE: the library depends on the `parsers.js` package for building its parsers.
npm install @hgargg-0710/regex
The package has the following exports:

- `parse` (function)
- `generate` (function)
- `parser` (submodule)
- `generator` (submodule)
- `tree` (submodule)
- `tokens` (submodule)
function parse(regex: string): Flags
A function taking in a string containing a regular expression, and returning an AST of it.
function generate(AST: Flags): string
Takes an AST node (not necessarily `Flags`; the full type is too long to write out here) and returns the string it represents.
NOTE: partial nodes give only partial results. For example, passing a `PatternEnd` gives `"$"`.
Various parsing layers APIs
| export | description |
| --- | --- |
| `ExpressionParser` | Function. Parses an `Expression`, tokenizing it first |
| `boundry` | Submodule. Handles parsing of boundaries |
| `chars` | Submodule. Handles tokenization |
| `classes` | Submodule. Handles parsing of character classes |
| `deflag` | Submodule. Handles removal of flags |
| `disjunction` | Submodule. Handles parsing of disjunction expressions |
| `escaped` | Submodule. Handles parsing of escape sequences |
| `group` | Submodule. Handles recursion within a regular expression |
| `nogreedy` | Submodule. Handles the "no-greedy" quantifiers |
| `quantifier` | Submodule. Handles the quantifiers |
The submodule exports are part of the `parse` function's final definition.
The layers are applied within the `parse` function in the following order:

1. `deflag`
2. `chars`
3. `classes`
4. `escaped`
5. `boundry`
6. `group` (recursive, looped)
7. `quantifier`
8. `nogreedy`
9. `disjunction`
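
To illustrate what "layers applied in order" means, here is a minimal sketch that chains two toy layers, each consuming the previous layer's output. `composeLayers`, `toyChars` and `toyEscaped` are hypothetical names made up for this example; they are not the library's implementations, and only the `"escape"`, `"symbol"` and `"escaped"` type strings are taken from the token tables in this document.

```javascript
// Hypothetical helper: feeds the output of each layer into the next one.
const composeLayers = (...layers) => (input) =>
  layers.reduce((current, layer) => layer(current), input);

// Toy "chars" layer: turns a string into single-character tokens.
const toyChars = (source) =>
  [...source].map((value) => ({
    type: value === "\\" ? "escape" : "symbol",
    value,
  }));

// Toy "escaped" layer: merges every "escape" token with the token after it.
const toyEscaped = (tokens) => {
  const result = [];
  for (let i = 0; i < tokens.length; ++i) {
    if (tokens[i].type === "escape" && i + 1 < tokens.length) {
      result.push({ type: "escaped", value: tokens[++i].value });
    } else {
      result.push(tokens[i]);
    }
  }
  return result;
};

// The order matters: "escaped" needs the tokens that "chars" produces.
const toyParse = composeLayers(toyChars, toyEscaped);
console.log(toyParse("a\\.b").map((token) => token.type));
```

Swapping the two layers would fail, because the second one expects tokens rather than a raw string; the same kind of dependency is why the real layers have a fixed order.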
| export | description |
| --- | --- |
| `DeFlag` | Function for de-flagging a string that contains a regular expression. Returns a `Flags` object whose `.expression` field contains the expression's string |
| `flagTable` | Table mapping flags to the appropriate `TokenInstance`s |
| `flagInstance` | Function based on `flagTable`. Returns the `TokenType` of a given flag string |
| `identifyFlags` | Maps `flagInstance` over an array of `string`s |
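
The idea of de-flagging can be sketched as follows. This is an illustration only: `toyFlagTable` and `toyDeFlag` are hypothetical, the `/pattern/flags` input form is an assumption (the exact string form accepted by the library is not stated here), and the shape of the returned object is invented. The type strings are the ones listed for the flag tokens in this document.

```javascript
// Hypothetical flag table: flag character -> token type string.
const toyFlagTable = new Map([
  ["d", "indicies"],
  ["g", "global"],
  ["i", "case-insensitive"],
  ["m", "multiline"],
  ["s", "dot-all"],
  ["u", "unicode"],
  ["v", "unicode-sets"],
  ["y", "sticky"],
]);

// Assumption: the input looks like a regex literal, "/pattern/flags".
function toyDeFlag(source) {
  const lastSlash = source.lastIndexOf("/");
  if (!source.startsWith("/") || lastSlash === 0) {
    throw new SyntaxError("expected a string of the form /pattern/flags");
  }
  const flags = [...source.slice(lastSlash + 1)].map((flag) => {
    if (!toyFlagTable.has(flag)) {
      throw new SyntaxError(`unknown flag "${flag}"`);
    }
    return { type: toyFlagTable.get(flag) };
  });
  // The pattern itself is left as a plain string for later layers.
  return { type: "flags", expression: source.slice(1, lastSlash), flags };
}

console.log(toyDeFlag("/a+b/gi"));
```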
| export | description |
| --- | --- |
| `ExpressionTokenizer` | A `PatternTokenizer` for tokenizing the given `Pattern` with a regular expression in it |
| `tokenizerMap` | The `RegExpMap` on which `ExpressionTokenizer` is based |
| export | description |
| --- | --- |
| `CharacterClassParser` | Main parser for character classes |
| `classLimit` | Limits the given stream up to the next `RectOp` from the current element |
| `classMap` | The `TypeMap` on which `CharacterClassParser` is based |
| `HandleClass` | The handler for the `RectOp` token inside the `classMap` |
| `ClassHandler` | A multistep function, serving as the main component of `HandleClass` |
| `EscapeInner` | A parser function, first component of the `ClassHandler`. Escapes the characters inside the class |
| `HandleEscaped` | Handler for the escaped characters, main part of the `EscapeInner` |
| `IdentifyRanges` | Second parsing function of `ClassHandler`. Identifies and parses ranges |
| `HandleRange` | The main component of `IdentifyRanges`, parses encountered ranges |
| `InClassEscapedHandler` | A slightly modified version of the `escapedMap` from the `escaped` module, used for escaping inside classes |
| export | description |
| --- | --- |
| `EscapedParser` | Main parser of the escaped characters |
| `escapePreface` | The `TypeMap` on which `EscapedParser` is based |
| `escapeMap` | The `ValueMap` that defines the global-scope escaping |
| `escapedHandler` | Creates a function for handling escaped characters based on the given map |
| `parseBackreference` | Returns a `Backreference` based on the given arguments `curr`, `input` |
| `parseMultControl` | Returns a `ControlCharacter` of length 4-5 based on `curr`, `input` |
| `parseDoubleControl` | Returns a `ControlCharacter` of length 2 based on `curr`, `input` |
| `parseSingleControl` | Returns a `ControlCharacter` of length 1 based on `curr`, `input` |
| `readUnicodeClassProperty` | Parses a `UnicodeClassProperty` based on `curr`, `input` |
| `readBraced` | Reads the given `Stream` until a `ClBrace` is encountered |
| `readNamedBackreference` | Reads a `NamedBackreference` based on `readIdentifier` |
| `readUBrace` | Reads a sequence of `{hhhh}` or `{hhhhh}`, where `isHex(h) === true` |
| `readu` | Reads a sequence of `hhhh`, where `isHex(h) === true` |
| `readx` | Reads a sequence of `hh`, where `isHex(h) === true` |
| `isHex` | Returns whether the given character is a hexadecimal digit |
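
The hexadecimal readers above share one pattern: read a fixed number of characters and accept them only if each one is a hex digit. The sketch below shows that pattern on plain strings with an index. The library's own functions work on `Stream`s, so the names and signatures here (`isHexSketch`, `readHexDigits`, `readxSketch`, `readuSketch`) are assumptions made for illustration.

```javascript
// True only for a single hexadecimal digit.
const isHexSketch = (char) => /^[0-9a-fA-F]$/.test(char);

// Reads exactly `count` hex digits starting at `start`, or returns null.
function readHexDigits(input, start, count) {
  const digits = input.slice(start, start + count);
  if (digits.length !== count || ![...digits].every(isHexSketch)) return null;
  return digits;
}

// `hh` as in \xhh, and `hhhh` as in \uhhhh.
const readxSketch = (input, start) => readHexDigits(input, start, 2);
const readuSketch = (input, start) => readHexDigits(input, start, 4);

console.log(readxSketch("41rest", 0)); // "41"
console.log(readuSketch("00e9", 0)); // "00e9"
console.log(readxSketch("4g", 0)); // null
```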
| export | description |
| --- | --- |
| `BoundryParser` | Main parser of the submodule. Separates boundaries into `TokenInstance`s |
| `boundryMap` | The `TypeMap` on which the `BoundryParser` is based |
| `HandleEscaped` | Handles the `NonWordBoundry` `TokenInstance`s |
| export | description |
| --- | --- |
| `EndParser` | The main parser of the submodule. The `ExpressionParser` ends with it |
| `GroupParser` | The first parsing layer of the `EndParser`. Recursive. Handles recursion, groups/captures, look-aheads/-behinds |
| `groupMap` | The `TypeMap` on which the `GroupParser` is based |
| `GroupHandler` | The main component of the `groupMap` |
| `nestedBrack` | Function for limiting the current-level nested bracket-expression |
| `CollectionHandler` | Function for handling the current collection |
| `HandleQMark` | Function for handling "collections" starting with `?` (`(?<!...)`, `(?<...>...)`, ...) |
| `HandleCollectionBase` | Function for recursively handling a capture group |
| `QMarkHandler` | Underlying `TableParser` of `HandleQMark` |
| `HandleQMarkExclMark` | Handles a negative look-ahead |
| `HandleQMarkEq` | Handles a look-ahead |
| `HandleLeftAngular` | Handles all "collections" starting with `<` (`(?<...>...)`, `(?<=...)`, ...) |
| `HandleColon` | Handles a no-capture group |
| `LeftAngularHandler` | Underlying `TableParser` for `HandleLeftAngular` |
| `HandleLeftAngularBase` | Handles a named capture |
| `HandleLeftAngularExclMark` | Handles a negative look-behind |
| `HandleLeftAngularEq` | Handles a look-behind |
| `readIdentifier` | Reads an identifier (for the named capture/backreference) |
| export | description |
| --- | --- |
| `QuantifierParser` | Main parser of the submodule. Parses quantifiers |
| `QuantifierHandler` | A `TableParser`, main component of the `QuantifierParser` |
| `HandlePlus` | Handles an encountered `Plus` token |
| `HandleStar` | Handles an encountered `Star` token |
| `HandleQMark` | Handles an encountered `QMark` token |
| `BraceHandler` | Handles an encountered `OpBrace` token |
| `HandleBraced` | Returns a handling function for one of `NtoM`, `NPlus`, or `NOnly` |
| `readNumber` | Reads a number from the given `Stream` (note: up to the first `isNaN` token) |
| `limitBraced` | Limits the given `Stream` up to the first encountered `ClBrace` |
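
The three braced quantifier forms differ only in what follows the first number. The following sketch classifies a braced string into `"n-only"`, `"n-plus"` or `"n-to-m"` (the type strings listed for `NOnly`, `NPlus` and `NtoM` in this document). It works on plain strings rather than `Stream`s, and `readNumberSketch`, `classifyBraced` and the `min`/`max` fields are hypothetical names chosen for the example.

```javascript
// Reads consecutive decimal digits starting at `start`.
function readNumberSketch(input, start) {
  let end = start;
  while (end < input.length && input[end] >= "0" && input[end] <= "9") ++end;
  return { value: input.slice(start, end), end };
}

// Classifies "{n}", "{n,}" and "{n,m}"; returns null for anything else.
function classifyBraced(input) {
  const last = input.length - 1;
  if (input[0] !== "{" || input[last] !== "}") return null;

  const first = readNumberSketch(input, 1);
  if (first.value === "") return null;
  const min = Number(first.value);

  // "{n}": the closing brace follows the first number directly.
  if (first.end === last) return { type: "n-only", min, max: min };
  if (input[first.end] !== ",") return null;

  const second = readNumberSketch(input, first.end + 1);
  if (second.end !== last) return null;

  // "{n,}" has no upper bound, "{n,m}" has one.
  return second.value === ""
    ? { type: "n-plus", min, max: Infinity }
    : { type: "n-to-m", min, max: Number(second.value) };
}

console.log(classifyBraced("{3}").type); // "n-only"
console.log(classifyBraced("{2,}").type); // "n-plus"
console.log(classifyBraced("{2,5}").type); // "n-to-m"
```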
| export | description |
| --- | --- |
| `ParseNoGreedy` | Main parser of the submodule. Parses `NoGreedy` tokens |
| `noGreedyMap` | The `TypeMap` on which `ParseNoGreedy` is based |
| `HandleQuantifier` | Handler for quantifiers |
| `QuantifierHandler` | The underlying `TableParser` function of `HandleQuantifier` |
| `HandleQMark` | Handles a `QMark` following a quantifier (no-greedy quantifiers) |
| export | description |
| --- | --- |
| `DisjunctionParser` | The main export of the submodule. Parses disjunctions |
| `EmptyFixer` | First parsing layer of `DisjunctionParser`. Fixes empty expressions (`\|\|`) |
| `DisjunctionTokenizer` | Second parsing layer of `DisjunctionParser`. Puts the non-`Pipe` bits of the current `Stream` into `DisjunctionArgument`s |
| `DisjunctionDelimiter` | Third and final parsing layer of `DisjunctionParser`. Delimits the `Stream` based on `Pipe` tokens |
| `hasDisjunctions` | Checks whether a given `Stream` has disjunctions to parse from the given point on |
| `limitPipe` | Limits the given `Stream` up to the next encountered `Pipe` |
| `skipTilPipes` | Skips the `Stream` until a `Pipe` is discovered |
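
As a rough picture of what disjunction parsing produces, the sketch below splits a string on unescaped pipes and marks empty alternatives. It is a simplification under stated assumptions: it ignores groups and character classes (which the real layers have already handled by the time `disjunction` runs), it works on a raw string, and `splitDisjunctionSketch` together with the result shape is hypothetical. The type strings `"disjunction"`, `"disjunction-arg"` and `"empty"` come from the token tables in this document.

```javascript
// Splits on "|" that is not preceded by a backslash escape.
function splitDisjunctionSketch(source) {
  const parts = [];
  let current = "";
  for (let i = 0; i < source.length; ++i) {
    const char = source[i];
    if (char === "\\" && i + 1 < source.length) {
      // Keep the escape sequence intact, including an escaped pipe.
      current += char + source[++i];
    } else if (char === "|") {
      parts.push(current);
      current = "";
    } else {
      current += char;
    }
  }
  parts.push(current);

  // No pipe found: there is nothing to turn into a disjunction.
  if (parts.length === 1) return null;

  return {
    type: "disjunction",
    value: parts.map((part) =>
      part === "" ? { type: "empty" } : { type: "disjunction-arg", value: part }
    ),
  };
}

console.log(splitDisjunctionSketch("a||b").value.map((arg) => arg.type));
```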
Provides regex-generation related exports based off the package's AST
| export | description |
| --- | --- |
| `RegexGenerator` | The `SourceGenerator` for the package's AST (`generate` is based on it) |
| `generatorMap` | The `TypeMap` on which `RegexGenerator` is based |
| `GenerateBackspaceClass` | Generates a regex for `BackspaceClass` |
| `GenerateWordBoundry` | Generates a regex for `WordBoundry` |
| `GenerateNonWordBoundry` | Generates a regex for `NonWordBoundry` |
| `GenerateNewline` | Generates a regex for `Newline` |
| `GenerateCarriageReturn` | Generates a regex for `CarriageReturn` |
| `GenerateWordClass` | Generates a regex for `WordClass` |
| `GenerateNonWordClass` | Generates a regex for `NonWordClass` |
| `GenerateFormFeed` | Generates a regex for `FormFeed` |
| `GenerateDigitClass` | Generates a regex for `DigitClass` |
| `GenerateNonDigitClass` | Generates a regex for `NonDigitClass` |
| `GenerateNULClass` | Generates a regex for `NULClass` |
| `GenerateVerticalTab` | Generates a regex for `VerticalTab` |
| `GenerateHorizontalTab` | Generates a regex for `HorizontalTab` |
| `GenerateNonWhitespaceClass` | Generates a regex for `NonWhitespaceClass` |
| `GenerateWhitespaceClass` | Generates a regex for `WhitespaceClass` |
| `GenerateEmptyExpression` | Generates a regex for `EmptyExpression` |
| `GenerateMatchIndicies` | Generates a regex for the `MatchIndicies` flag |
| `GenerateGlobalSearch` | Generates a regex for the `GlobalSearch` flag |
| `GenerateCaseInsensitive` | Generates a regex for the `CaseInsensitive` flag |
| `GenerateMultline` | Generates a regex for the `Multiline` flag |
| `GenerateDotAll` | Generates a regex for the `DotAll` flag |
| `GenerateUnicode` | Generates a regex for the `Unicode` flag |
| `GenerateUnicodeSets` | Generates a regex for the `UnicodeSets` flag |
| `GenerateSticky` | Generates a regex for the `Sticky` flag |
| `GeneratePatterStart` | Generates a regex for `PatternStart` |
| `GeneratePatternEnd` | Generates a regex for `PatternEnd` |
| `GenerateFlags` | Generates a regex for `Flags` |
| `GenerateExpression` | Generates a regex for `Expression` |
| `GenerateNOnly` | Generates a regex for `NOnly` |
| `GenerateNtoM` | Generates a regex for `NtoM` |
| `GenerateNPlus` | Generates a regex for `NPlus` |
| `GenerateEscaped` | Generates a regex for `Escaped` |
| `GenerateBackreference` | Generates a regex for `Backreference` |
| `GenerateUnicodeClassProperty` | Generates a regex for `UnicodeClassProperty` |
| `GenerateControlCharacter` | Generates a regex for `ControlCharacter` |
| `GenerateNamedBackreference` | Generates a regex for `NamedBackreference` |
| `GenerateClassRange` | Generates a regex for `ClassRange` |
| `GenerateNoGreedy` | Generates a regex for `NoGreedy` |
| `GenerateOptional` | Generates a regex for `Optional` |
| `GenerateZeroPlus` | Generates a regex for `ZeroPlus` |
| `GenerateOnePlus` | Generates a regex for `OnePlus` |
| `GenerateClass` | Generates a regex for `CharacterClass` |
| `GenerateNegClass` | Generates a regex for `NegCharacterClass` |
| `GenerateDisjunction` | Generates a regex for `Disjunction` |
| `GenerateDisjunctionArgument` | Generates a regex for `DisjunctionArgument` |
| `GenerateNonCaptureGroup` | Generates a regex for `NonCaptureGroup` |
| `GenerateCaptureGroup` | Generates a regex for `CaptureGroup` |
| `GenerateLookAhead` | Generates a regex for `LookAhead` |
| `GenerateLookBehind` | Generates a regex for `LookBehind` |
| `GenerateNegLookAhead` | Generates a regex for `NegLookAhead` |
| `GenerateNegLookBehind` | Generates a regex for `NegLookBehind` |
| `GenerateNamedCapture` | Generates a regex for `NamedCapture` |
| `GenerateWildcard` | Generates a regex for `Wildcard` |
| `GeneratePipe` | Generates a regex for `Pipe` |
| `GenerateComma` | Generates a regex for `Comma` |
| `GenerateTrivial` | Generates a regex for anything else not already in the table (with `typeof .value === 'string'`) |
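
The table above describes a map from node types to generator functions, with a fallback for nodes whose `.value` is a string. A minimal sketch of that dispatch pattern follows. The node shapes (`{ type, value }`, with `value` being a child node, an array of nodes, or a string) are assumptions, and `toyGeneratorMap` and `toyGenerate` are hypothetical stand-ins rather than the library's `generatorMap` and `generate`.

```javascript
// Hypothetical type -> generator table; each entry returns regex source.
const toyGeneratorMap = new Map([
  ["start", () => "^"],
  ["end", () => "$"],
  ["wildcard", () => "."],
  ["zero-plus", (node, generate) => generate(node.value) + "*"],
  ["one-plus", (node, generate) => generate(node.value) + "+"],
  ["optional", (node, generate) => generate(node.value) + "?"],
  ["capture", (node, generate) => "(" + node.value.map(generate).join("") + ")"],
  ["expression", (node, generate) => node.value.map(generate).join("")],
]);

function toyGenerate(node) {
  const handler = toyGeneratorMap.get(node.type);
  if (handler) return handler(node, toyGenerate);
  // Fallback for any other node with a string value (the "trivial" case).
  if (typeof node.value === "string") return node.value;
  throw new TypeError(`no generator for node type "${node.type}"`);
}

// A partial node gives a partial result.
console.log(toyGenerate({ type: "end" })); // "$"

const toyAst = {
  type: "expression",
  value: [
    { type: "start" },
    { type: "symbol", value: "a" },
    { type: "one-plus", value: { type: "capture", value: [{ type: "symbol", value: "b" }] } },
    { type: "end" },
  ],
};
console.log(toyGenerate(toyAst)); // "^a(b)+$"
```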
| export | description |
| --- | --- |
| `RegexStream` | A `TreeStream` for the library's AST (note: accepts THE AST ITSELF) |
| `RegexTree` | A `Tree` interface implementation for the library's AST |
| `treeMap` | The `TypeMap` on which `RegexTree` is based |
| `NamedCaptureTree` | The function for conversion of a `NamedCapture` to a `Tree` |
| `ExpressionTree` | The function for conversion of an `Expression` to a `Tree` |
| `FlagTree` | The function for conversion of a `Flags` to a `Tree` |
| `SeveralTree` | The function for conversion of `NOnly`, `NtoM` and `NPlus` to a `Tree` |
| `SingleTree` | The function for conversion of `ZeroPlus`, `OnePlus`, `Optional`, `LookAhead`, `LookBehind`, `NegLookAhead`, `NegLookBehind`, `NamedBackreference` to a `Tree` |
| `ValueTree` | The function for conversion of `ClassRange`, `DisjunctionArgument`, `CharacterClass`, `NegCharacterClass` and `Disjunction` to a `Tree` |
| `ChildlessTree` | The function for conversion of the rest of the tokens to a `Tree` |
The `tokens` module has the same submodule structure as the `parser` module.

| submodule | description |
| --- | --- |
| `boundry` | Various boundary tokens |
| `chars` | Various basic (first-order) tokens |
| `classes` | Tokens for representation of character classes |
| `deflag` | Flags and expressions representation tokens |
| `disjunction` | Disjunction-related tokens |
| `escaped` | Escape-sequence-related tokens |
| `group` | Tokens for groups and other recursive structures |
| `nogreedy` | Tokens for non-greedy quantifiers |
| `quantifier` | Tokens for quantifiers |
| `TokenType`/`TokenInstance` | represents | type |
| --- | --- | --- |
| `MatchIndicies` | The `d` flag | `"indicies"` |
| `GlobalSearch` | The `g` flag | `"global"` |
| `CaseInsensitive` | The `i` flag | `"case-insensitive"` |
| `Multiline` | The `m` flag | `"multiline"` |
| `DotAll` | The `s` flag | `"dot-all"` |
| `Unicode` | The `u` flag | `"unicode"` |
| `UnicodeSets` | The `v` flag | `"unicode-sets"` |
| `Sticky` | The `y` flag | `"sticky"` |
| `Flags` | The complete regular expression with flags | `"flags"` |
| `Expression` | A partial expression, without flags (can have other `Expression`s inside) | `"expression"` |
| `TokenType` | represents | type |
| --- | --- | --- |
| `Escape` | `\\` | `"escape"` |
| `RectOp` | `[` | `"rop"` |
| `RectCl` | `]` | `"rcl"` |
| `Hyphen` | `-` | `"hyphen"` |
| `Pipe` | `\|` | `"pipe"` |
| `OpBrack` | `(` | `"opbrack"` |
| `ClBrack` | `)` | `"clbrack"` |
| `QMark` | `?` | `"qmark"` |
| `ExclMark` | `!` | `"emark"` |
| `Eq` | `=` | `"eq"` |
| `Wildcard` | `.` | `"wildcard"` |
| `Star` | `*` | `"star"` |
| `Plus` | `+` | `"plus"` |
| `OpBrace` | `{` | `"opbrc"` |
| `ClBrace` | `}` | `"clbrc"` |
| `Colon` | `:` | `"colon"` |
| `Comma` | `,` | `"comma"` |
| `LeftAngular` | `<` | `"lang"` |
| `RightAngular` | `>` | `"rang"` |
| `Dollar` | `$` | `"dollar"` |
| `Xor` | `^` | `"xor"` |
| `RegexSymbol` | everything else | `"symbol"` |
| `TokenType` | represents | type |
| --- | --- | --- |
| `CharacterClass` | A character class `[...]` | `"charclass"` |
| `NegCharacterClass` | A negative character class `[^...]` | `"neg-charclass"` |
| `ClassRange` | A character class range `X-Y` | `"class-range"` |
| `TokenType`/`TokenInstance` | represents | type |
| --- | --- | --- |
| `ControlCharacter` | `\cX`, `\xhh`, `\uhhhh`, `\u{hhhh}` or `\u{hhhhh}` | `"control-char"` |
| `Backreference` | `\N` - numeric backreference | `"backref"` |
| `NamedBackreference` | `\k<name>` - named backreference | `"named-backref"` |
| `UnicodeClassProperty` | `\p{...}` - Unicode class property | `"uniprop"` |
| `RegexIdentifier` | `name` - identifier in named captures/backreferences | `"identifier"` |
| `CarriageReturn` | `\r` - carriage return | `"cr"` |
| `NonWordBoundry` | `\B` - non-word boundary (outside classes) | `"non-word-boundry"` |
| `WordBoundry` | `\b` - word boundary | `"word-boundry"` |
| `NULClass` | `\0` - NUL class | `"nul-class"` |
| `FormFeed` | `\f` - form feed | `"form-feed"` |
| `DigitClass` | `\d` - digit class | `"digit-class"` |
| `NonDigitClass` | `\D` - non-digit class | `"non-digit-class"` |
| `WordClass` | `\w` - word class | `"word-class"` |
| `NonWordClass` | `\W` - non-word class | `"non-word-class"` |
| `WhitespaceClass` | `\s` - whitespace class | `"whitespace-class"` |
| `NonWhitespaceClass` | `\S` - non-whitespace class | `"non-whitespace-class"` |
| `HorizontalTab` | `\t` - horizontal tab | `"tab"` |
| `VerticalTab` | `\v` - vertical tab | `"vtab"` |
| `BackspaceClass` | `\b` - backspace | `"backspace"` |
| `Newline` | `\n` - newline | `"newline"` |
| `Escaped` | Any other escaped character | `"escaped"` |
| `TokenInstance` | represents | type |
| --- | --- | --- |
| `PatternStart` | `^` | `"start"` |
| `PatternEnd` | `$` | `"end"` |
| `TokenType` | represents | type |
| --- | --- | --- |
| `CaptureGroup` | `(...)` | `"capture"` |
| `NoCaptureGroup` | `(?:...)` | `"non-capture"` |
| `NamedCapture` | `(?<name>...)` | `"named-capture"` |
| `LookAhead` | `(?=...)` | `"lookahead"` |
| `LookBehind` | `(?<=...)` | `"lookbehind"` |
| `NegLookAhead` | `(?!...)` | `"neg-lookahead"` |
| `NegLookBehind` | `(?<!...)` | `"neg-lookbehind"` |
| `TokenType` | represents | type |
| --- | --- | --- |
| `ZeroPlus` | `...*` | `"zero-plus"` |
| `OnePlus` | `...+` | `"one-plus"` |
| `Optional` | `...?` | `"optional"` |
| `NOnly` | `...{...}` | `"n-only"` |
| `NPlus` | `...{...,}` | `"n-plus"` |
| `NtoM` | `...{...,...}` | `"n-to-m"` |
| export | description | type |
| --- | --- | --- |
| `NoGreedy` | A `TokenType` representing no-greedy operators | `"nogreedy"` |
| `isQuantifier` | A predicate returning `true` only for tokens with types from the `quantifier` module | |
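
A predicate of this kind can be pictured as a set-membership check on the token's type, which the no-greedy layer can then use to decide whether a following `?` modifies a quantifier. The sketch below assumes tokens of the shape `{ type, value }`; `toyQuantifierTypes`, `isQuantifierSketch` and `wrapNoGreedy` are hypothetical names, and only the type strings are taken from the tables in this document.

```javascript
// Type strings listed for the quantifier tokens.
const toyQuantifierTypes = new Set([
  "zero-plus",
  "one-plus",
  "optional",
  "n-only",
  "n-plus",
  "n-to-m",
]);

const isQuantifierSketch = (token) =>
  token != null && toyQuantifierTypes.has(token.type);

// Wraps a quantifier into a "nogreedy" token when a "qmark" follows it.
function wrapNoGreedy(tokens) {
  const result = [];
  for (const token of tokens) {
    const previous = result[result.length - 1];
    if (token.type === "qmark" && isQuantifierSketch(previous)) {
      result[result.length - 1] = { type: "nogreedy", value: previous };
    } else {
      result.push(token);
    }
  }
  return result;
}

// Tokens for something like `a+?`, after the quantifier layer has run.
const lazyTokens = wrapNoGreedy([
  { type: "one-plus", value: { type: "symbol", value: "a" } },
  { type: "qmark", value: "?" },
]);
console.log(lazyTokens[0].type); // "nogreedy"
```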
| `TokenType`/`TokenInstance` | represents | type |
| --- | --- | --- |
| `Disjunction` | `...\|...\|...` | `"disjunction"` |
| `DisjunctionArgument` | An element of a `Disjunction` | `"disjunction-arg"` |
| `EmptyExpression` | An empty element of a `Disjunction` (`\|\|`) | `"empty"` |