Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consume token single character pattern case #1141

Closed
janisdd opened this issue Mar 5, 2020 · 3 comments
Closed

Consume token single character pattern case #1141

janisdd opened this issue Mar 5, 2020 · 3 comments

Comments

@janisdd
Copy link
Contributor

janisdd commented Mar 5, 2020

It looks like a token with a single character e.g. /a/i ignores the case insensitive flag.

Here is an example to paste in the playground at https://sap.github.io/chevrotain/playground/

(function jsonGrammarOnlyExample() {
  // ----------------- Lexer -----------------
  const createToken = chevrotain.createToken;
  const Lexer = chevrotain.Lexer;

  const TestToken = createToken({name: "TestToken", pattern: /a/i});
  const Newline = createToken({ name: "NewlineToken", pattern: /[\n]/ })
  // ----------------- parser -----------------
  const Parser = chevrotain.Parser;
  
const jsonTokens = [TestToken, Newline];
const JsonLexer = new Lexer(jsonTokens);
  
  class JsonParser extends Parser {
    constructor() {
      super(jsonTokens)

      const $ = this;

      $.RULE("Test", () => {
        $.CONSUME(TestToken)
        return $.ACTION(() => {
          return true;
        })
      });

      
      this.performSelfAnalysis();
    }

  }
  return {
    lexer: JsonLexer,
    parser: JsonParser,
    defaultRule: "Test"
  };
}())

When you input a it works fine but fails for A.
However, if you change the regex pattern to /(a)/i it works for both cases.

I know it's just a small issue and can also easily worked around with /(a|A)/ but just wanted to let you know.

It might be connected to issue #817 and/or #708

Or am I missing something?

@bd82
Copy link
Member

bd82 commented Mar 6, 2020

Thanks for reporting this @janisdd I am guessing this may be related to some single character optimization not taking into account the existence of the ignore case flag

@janisdd
Copy link
Contributor Author

janisdd commented Mar 7, 2020

Here is a possible fix

https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/scan/lexer.ts#L127

-regExpSource !== "."
+regExpSource !== "." &&
+ !currPattern.ignoreCase

And maybe the following tests below
https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/test/scan/lexer_spec.ts#L111

it("can create a token from a single character regex with ignore flag", () => {
  let input = "a"
  let result = testLexer.tokenize(input)
  expect(tokenMatcher(result.tokens[0], SingleCharacterWithIgnoreCaseFlagTok)).to.be.true
  expect(result.tokens[0].image).to.equal("a")
  expect(result.tokens[0].startOffset).to.equal(0)

  input = "A"
  result = testLexer.tokenize(input)
  expect(tokenMatcher(result.tokens[0], SingleCharacterWithIgnoreCaseFlagTok)).to.be.true
  expect(result.tokens[0].image).to.equal("A")
  expect(result.tokens[0].startOffset).to.equal(0)
})

I tried adding a check below https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/test/scan/regexp_spec.ts#L81
like

it("Can compute for single ignore case", () => {
  expect(getOptimizedStartCodesIndices(/a/i)).to.deep.equal([
    65,
    97
  ])
})

but it seems to work fine there without the fix. So this test is probably not needed.


I also noticed some smaller unattractive typings

In https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/scan/lexer.ts there are many any typed vars e.g. onlyRelevantTypes, allTransformedPatterns, ...

In https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/utils/utils.ts#L44 the type is missing. It probably should be

-export function map<I, O>(arr: I[], callback: (I, idx?: number) => O): O[] {
+export function map<I, O>(arr: I[], callback: (i:I, idx?: number) => O): O[] {

After this changed there are some errors in the https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/scan/lexer.ts
which are a bit harder to fix (requires probably much more work to figure out)

This led me to some other things in https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/utils/utils.ts

https://github.com/SAP/chevrotain/blob/master/packages/chevrotain/src/utils/utils.ts#L93 can be (using user defined type guards)

export function isString(item: any): item is string {
  return typeof item === "string"
}

same as the others below isUndefined, isFunction (and maybe some others).

But this is somewhat out of scope here... just wanted to write it down ;)

@bd82
Copy link
Member

bd82 commented Mar 7, 2020

Here is a possible fix

Thanks @janisdd do you want to create a PR?

I also noticed some smaller unattractive typings

I suppose this is a combination of:

  • Implementing own utilities (e.g map/isFunction) without the best Type Signatures.
  • My personal opinion that "any" types are not a huge issue in "internal function code. I am much more focused on having good clear external APIs signatures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants