A simple, yet flexible library for extracting specialized tokens out of text. After maintaining a library of Regular Expressions for years, I was inspired to build a number of parsers after taking Dmitry Soshnikov's excellent Udemy course Building a Parser from Scratch.
npm i @sudo-nymd/text-parser
The Parser is the root of the API. It implements a parse() method that returns the parsed tokens, along with metadata about the text found.
The following code...
const { Parser } = require('@sudo-nymd/text-parser');
const text = `Sudo-Nymd's "text-parser"!`
const parsed = new Parser().parse(text);
console.log(parsed);
... produces the following output:
[
{ type: 'word', flags: 48, value: "Sudo-Nymd's" },
{ type: 'whitespace', flags: 0, value: ' ' },
{
type: 'phrase',
flags: 3,
startChar: { type: 'character', value: '"' },
items: [
{
"type": "word",
"flags": 32,
"value": "text-parser"
}
],
value: 'text-parser',
stopChar: { type: 'character', flags: 0, value: '"' }
},
{ type: 'punctuation', flags: 0, value: '!' }
]
If you'd prefer to handle each token as soon as it is parsed, you can pass a callback that is notified whenever a token is parsed:
const { Parser } = require('@sudo-nymd/text-parser');
const text = `Sudo-Nymd's "text-parser"!`
new Parser().parse(text, (token) => {
    // We got a token, use it!
    console.log(token);
});
COMING SOON
The grammar of the parser is simple and is outlined below.
A single line of text composed of one or more Literals.
: Literals
A collection of one or more Word, Phrase, Character, Punctuation, Whitespace, or Plugin tokens.
: (Word | Phrase | Character | Punctuation | Whitespace | Plugin) +
Examples:
The quick, [brown fox] jumped over the "lazy dog", and the cow jumped over the {moon}!
The preceding Literal consists of
3 Phrases ([brown fox], "lazy dog", and {moon}),
3 Punctuation (2 commas and 1 exclamation point),
11 Words, and
13 Whitespaces.
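Counts like these can be checked mechanically by tallying tokens on their `type` field. The sketch below is not part of the library: `countByType` is a hypothetical helper, and the `sample` array is written by hand to mirror the token shape shown in the example output earlier. With the real parser, the array would instead come from `new Parser().parse(text)`.

```javascript
// Hypothetical helper (not part of @sudo-nymd/text-parser):
// tallies top-level tokens by their `type` field.
function countByType(tokens) {
  const counts = {};
  for (const token of tokens) {
    counts[token.type] = (counts[token.type] || 0) + 1;
  }
  return counts;
}

// Hand-written sample that mirrors the token shape from the example output.
const sample = [
  { type: 'word', value: "Sudo-Nymd's" },
  { type: 'whitespace', value: ' ' },
  { type: 'phrase', value: 'text-parser' },
  { type: 'punctuation', value: '!' },
];

console.log(countByType(sample));
```

Only top-level tokens are counted here, so the words nested inside a Phrase's `items` are not added to the Word total. This matches how the counts above are stated.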
A collection of Words, Characters, or Whitespace enclosed by a StartChar and a StopChar.
: StartChar
: (Word | Character | Whitespace) +
: StopChar
Examples:
"The snow is falling"
[Build Completed]
{Start}
'Mission Success'
One or more repeating characters that signify the start of a Phrase. Includes double quote, single quote, open brace, and open bracket.
: ( { | [ | " | ' )+
One or more repeating characters that signify the end of a Phrase. Includes double quote, single quote, close brace, and close bracket.
: ( } | ] | " | ' )+
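To make the StartChar and StopChar rules concrete, here is a minimal sketch that translates them into JavaScript regular expressions and uses them to split a phrase into its parts. The regular expressions and the `splitPhrase` helper are assumptions made for illustration; they are not taken from the library's source, and the sketch does not check that the start and stop characters actually pair up.

```javascript
// Assumed translations of the StartChar and StopChar rules.
const START_CHAR = /^[{\["']+/; // one or more of { [ " ' at the start
const STOP_CHAR = /[}\]"']+$/;  // one or more of } ] " ' at the end

// Hypothetical helper: splits a phrase into start characters, body,
// and stop characters. Returns null if either delimiter is missing
// or if the two delimiter runs would overlap.
function splitPhrase(text) {
  const start = text.match(START_CHAR);
  const stop = text.match(STOP_CHAR);
  if (!start || !stop) return null;
  if (start[0].length + stop[0].length > text.length) return null;
  return {
    startChar: start[0],
    body: text.slice(start[0].length, text.length - stop[0].length),
    stopChar: stop[0],
  };
}

console.log(splitPhrase('[Build Completed]'));
console.log(splitPhrase('{{Start}}'));
```

Because both rules allow repeated characters, `{{Start}}` yields a two-character start and stop run.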
Any single word.
: ([\w]+(?:.['-]?[\w]+)*)
Any run of one or more whitespace characters.
: [\s]+
Any character that is not alpha-numeric, whitespace, or a phrase start or stop character.
: [^a-zA-Z0-9{}\[\]"']
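The Character rule can be tried out with a short sketch. Note one assumption: the description excludes whitespace, but the character class as written does not, so the sketch adds `\s` to the class to match the prose. The `isCharacter` helper is hypothetical and is not part of the library.

```javascript
// Assumed single-character test for the Character rule. `\s` is added so
// that whitespace is excluded, as the prose description says it should be.
const CHARACTER = /^[^a-zA-Z0-9\s{}\[\]"']$/;

// Hypothetical helper: true if `ch` is a single Character token candidate.
function isCharacter(ch) {
  return CHARACTER.test(ch);
}

console.log(isCharacter('!')); // true: not alphanumeric, whitespace, or a phrase delimiter
console.log(isCharacter('a')); // false: alphanumeric
console.log(isCharacter(' ')); // false: whitespace
console.log(isCharacter('[')); // false: phrase start character
```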