
RoadMap / Future Plans #1270

Closed
bd82 opened this issue Oct 30, 2020 · 3 comments

Comments

bd82 (Member) commented Oct 30, 2020

The main focus at this point is simplifying Chevrotain and reducing its API surface area. The goal is to reduce the total cost of ownership (TCO) of maintaining Chevrotain. This means deprecating and removing features that:

  • are less commonly used,
  • should potentially live outside the main library, or
  • have a sub-optimal implementation.

A secondary topic is modernization of tooling and the code base.

Potential features/components for deprecation evaluation.

Modernization Topics

@bd82 bd82 pinned this issue Oct 30, 2020
@bd82 bd82 changed the title Roadmap to 8.0 RoadMap Feb 28, 2021
@bd82 bd82 changed the title RoadMap RoadMap / Future Plans Feb 28, 2021
elidoran (Contributor) commented May 25, 2021

I thought I'd check back and see how chevrotain was going and I saw this "future plan".

I'm still of the opinion you could simplify the main use cases and add standard streaming.

I think it could be something like this:

const chevrotain = require('chevrotain')
// or, import chevrotain from 'chevrotain'

// then, to use only the lexer, build it with the tokens like:
const lexer = chevrotain.lexer({
  // this is an options object for whatever settings you allow.

  // provide the tokens in an array imported from another file:
  tokens: require('./my-tokens.js'),
})

// then, use the lexer...
const tokens = lexer.lex(string) // or tokenize(string), of course.


// for both lexing and parsing use the main exported function
// to build the parser.
// Note, this handles building the lexer internally, and,
// also does the performSelfAnalysis() before returning the parser.
// That way, all that is handled internally instead of the dev writing it out.
const parser = chevrotain({
  // once again, the options object for settings.

  // again, provide the tokens:
  tokens: require('./my-tokens.js'),

  // provide the grammar's rules in an array so they're ordered:
  rules: require('./my-grammar.js'),
})

// now, parsing strings is straightforward:
// Note, this avoids making the dev set the input string on
// the parser before calling a rule. It's done internally by the
// library. It could start with the first rule in the provided array
// and go through the rules until it finds one which matches the
// string and starts the parsing, or, allow them to set which
// rule to run first as an option to the `chevrotain()` function,
// or, make the parse() function accept an options object
// which contains the `string` property, and optionally,
// the name of the rule to call to start.
const result = parser.parse(string)

// or, with an options object (a different variable name so both
// alternatives could coexist in one scope):
const resultFromOptions = parser.parse({
  string, // the input string
  start: 'ruleName', // the rule to start with
})

// for streaming, get a writer stream for them to
// write chunks of strings to:
const writer = parser.writer({ /* any options */ })

// then, stream the input to the parser's writer:
someInputStream.pipe(writer)

In the files 'my-tokens.js' and 'my-grammar.js', don't define the tokens and rules in a shared scope. Just put them in arrays to export. Then, provide an object containing all the tokens and an object containing all the rules to the grammar rule functions. This avoids keeping all those things in an open shared scope that can be accessed outside the functions.
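To make the "arrays in, lookup objects out" idea concrete, here is a minimal sketch of the builder step that would convert the exported arrays into name-keyed objects. All names here (`buildLookups`, the token shapes) are illustrative, not real Chevrotain API:

```javascript
// Hypothetical builder step: turn the exported arrays into
// name-keyed lookup objects that get handed to each rule function,
// instead of relying on an enclosing shared scope.
function buildLookups(tokenArray, ruleArray) {
  const tokens = {};
  for (const t of tokenArray) tokens[t.name] = t;

  const rules = {};
  for (const r of ruleArray) rules[r.name] = r; // named functions give us r.name

  return { tokens, rules };
}

// usage with arrays shaped like the exports of my-tokens.js / my-grammar.js:
const { tokens, rules } = buildLookups(
  [{ name: 'select' }, { name: 'comma' }, { name: 'identifier' }],
  [function selectClause(/* rules, tokens, result */) {}]
);
// tokens.select and rules.selectClause are now addressable by name
// without anything living in an open module-level scope.
```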

To get the streaming to work, the parsing needs to use some kind of "runner" or "executor" which knows how to work the parser for a string input. It holds the input string, which is yet another reason not to have the dev set the string on the parser object itself.

When it runs out of string content, or has some at the end which it isn't able to match yet, it can hold onto the last bit of unused string, call the "next/done" callback, and return, waiting for another chunk of data to be provided. When a new chunk comes in, it starts with the unused bit of string from before, possibly combining it with the new chunk first, and continues on. Either the parsing eventually reaches a rule which is a terminal for all parsing, or the parsing could go on indefinitely as more chunks come in.
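A minimal sketch of such a runner, assuming (purely for illustration) that a trailing semicolon stands in for "a terminal rule matched"; none of these names are real Chevrotain API:

```javascript
// Illustrative runner/executor: holds leftover input between chunks
// and emits a "result" whenever a complete unit is available.
// A real implementation would drive the parser here; a trailing ';'
// stands in for "a rule which is a terminal for all parsing".
class ChunkRunner {
  constructor(onResult) {
    this.leftover = '';   // the unused tail from the previous chunk
    this.onResult = onResult;
  }

  write(chunk) {
    // combine the unused bit of string from before with the new chunk:
    let input = this.leftover + chunk;
    let idx;
    while ((idx = input.indexOf(';')) !== -1) {
      this.onResult(input.slice(0, idx + 1).trim());
      input = input.slice(idx + 1);
    }
    this.leftover = input; // hold the unmatched tail for next time
  }
}
```

Because each instance holds its own `leftover` state, several runners can work through separate inputs at the same time without interfering with each other.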

Also, whether the parsing produces an AST, or a result object, is still up to the dev. They could do what they want in the rules to produce output. Might allow them to provide a function in the options object which returns an object, or an array, or a class instance, or whatever they want as the "result". Then, provide that to the rules functions when they're called so they can load stuff into the result. The streaming version would then need to provide a callback function to receive the "result" when one is finished.

// so, add a callback to streaming:
const writer = parser.writer(myResultCallbackFn)
inputStream.pipe(writer)

// or, for the options object:
const writer = parser.writer({
  done: myResultCallbackFn,
})
inputStream.pipe(writer)

I seem to remember there was a visitor exported, too. To do the visitor thing, provide the visitor as a function to the parse call or writer options, like this:

// synchronous:
const result = parser.parse({
  string: inputString,
  visitor: myVisitorFn,
})

// streaming:
const writer = parser.writer({
  visitor: myVisitorFn,
  done: myResultCallbackFn, // if still needed...
})
inputStream.pipe(writer)

This style seems more JavaScript-land to me than the current chevrotain API. I'm not sure which features, if any, this doesn't allow for, so, if there's something glaringly missing, I'm not doing it intentionally, I'm just not aware of it.

Oh, and, for the tokens definition, the chevrotain package's export can still contain the Token class for them to extend or instantiate when creating tokens. And the rules can still be defined like they currently are by calling the RULE function, but the builder function does that for them; the dev only provides the name and the function. So, like this:

// the way it is now with a reference to the parser as a dollar sign,
// and all tokens available in the outer scope:
$.RULE("selectClause", () => {
  $.CONSUME(Select)
  $.AT_LEAST_ONE_SEP({
    SEP: Comma,
    DEF: () => {
      $.CONSUME(Identifier)
    }
  })
})

// file 'my-grammar.js':
// the way to define the rule as an element in an array which the 
// parser will call to make the rule:
module.exports = [ // the rules, in order:
  // Note:
  //  1. named function means parser can get 'selectClause' by fn.name.
  //  2. the other rules are provided via the `rules` arg.
  //  3. the tokens are all available in the `tokens` arg.
  //  4. the result, if provided as an option to chevrotain(), is the third arg.
  function selectClause(rules, tokens, result) {
    // `this` is the parser/runner/recorder thing.
    this.consume(tokens.select)
    this.atLeastOneSep({
      sep: tokens.comma,
      def: () => {
        this.consume(tokens.identifier)
      }
    })
  },
]

// then, the builder/parser calls the RULE function with the name of
// each function provided in the rules array as the first arg, and the
// function itself as the second arg. It handles this work itself, not the dev.
// so, for each rule in the array (a rule being a function):
theParser.RULE(fn.name, fn)

// so, something like (note: an arrow function ignores forEach's
// thisArg parameter, so reference the parser directly instead):
rulesArray.forEach(fn => { parser.RULE(fn.name, fn) })

Anyway, just my opinion, my thoughts. I think this looks more JavaScript-y and allows concurrent parsing of different strings at the same time. You could ask for a writer and pipe input to it, then ask for another writer and pipe input to it, and let the streaming go, with each runner/executor holding its own internal state so they're separate from each other even though they're each working through their own input at the same time, asynchronously, as they get more input.

bd82 (Member, Author) commented May 28, 2021

Hello @elidoran and thanks for providing this in-depth feedback 👍

More JavaScript-y style APIs

When I created the original project which eventually became Chevrotain,
I did not really know JavaScript very well, which would explain why the APIs are less "JavaScript-y" in style...

Unfortunately, implementing stylistic / subjective API changes is outside the scope at this time.
While I don't mind making breaking changes (see: https://chevrotain.io/docs/changes/BREAKING_CHANGES.html),
such breaking changes would be done to reduce the (internal) complexity of the library or to add important missing functionality,
less so for stylistic reasons...

Also note that many (most?) of the stylistic changes you recommend should be possible
for a consumer to implement, e.g.
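One possible shape of such a consumer wrapper, sketched with a stand-in base class; `BaseParser` and `chevrotainStyleFactory` are NOT Chevrotain's real classes or API, just an illustration of layering the proposed factory style on top of a class-based parser:

```javascript
// Stand-in for a class-based parser (NOT Chevrotain's real class):
// it registers rules by name and runs a self-analysis step.
class BaseParser {
  constructor() { this._rules = {}; }
  RULE(name, impl) { this._rules[name] = impl.bind(this); }
  performSelfAnalysis() { /* grammar analysis would happen here */ }
}

// Consumer wrapper providing the proposed "one factory call" style:
// rule registration and self-analysis are handled internally.
function chevrotainStyleFactory({ rules }) {
  const parser = new BaseParser();
  for (const fn of rules) parser.RULE(fn.name, fn); // fn.name supplies the rule name
  parser.performSelfAnalysis(); // done here, not by the dev
  return {
    parse({ string, start }) {
      const entry = start || rules[0].name; // default to the first rule
      return parser._rules[entry](string);
    },
  };
}
```

The point is only that the ergonomics elidoran describes could live in a thin layer on top of the existing API rather than requiring changes to the library itself.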

Streaming APIs

This may arrive at some point in the future, particularly if I implement some lexer adapter API (#528).
However, it is a low priority...

My own use cases for Chevrotain parsers are often around editors / IDEs.
In that scenario the size of the input is normally bounded, so a streaming API is less important for my main use cases.

Cheers.
Shahar.

bd82 (Member, Author) commented Jan 12, 2022

closing this in favor of #1739

@bd82 bd82 closed this as completed Jan 12, 2022
@bd82 bd82 unpinned this issue Jan 13, 2022