Simple tokenizer question #58

Closed
dhowe opened this issue Oct 4, 2018 · 5 comments

@dhowe

dhowe commented Oct 4, 2018

I'm trying to match c-style identifiers that start with a '#' character, but to discard the hash character and keep the rest. In my attempts below the '#' is always included in the token:

static TextParser<Unit> HashIdent { get; } = Character.EqualTo('#').IgnoreThen(Identifier.CStyle).Value(Unit.Value);

I know I'm missing something simple...

@nblumhardt
Member

Hi!

The tokenizer will consider the whole span matched by the parser to be the token, regardless of the value that's returned. You can, however, use .Apply(HashIdent) later on in the parsing stage to get the value, i.e. Token.EqualTo(Tokens.Identifier).Apply(HashIdent).

For this to work you need to drop the .Value(Unit.Value) piece:

static TextParser<Unit> HashIdent { get; } = Character.EqualTo('#').IgnoreThen(Identifier.CStyle)

HTH!

@dhowe
Author

dhowe commented Oct 4, 2018

Thanks, that's very helpful (I was just wondering what the Apply was doing in your example code)!

When I do that, I get a compile error... do I need to change the generic type to TextSpan?

@dhowe
Author

dhowe commented Oct 4, 2018

Also, how do I handle the opposite case, where I want to ignore the last character?

        static TextParser<Unit> Actor { get; } =
            from name in Character.LetterOrDigit.Many()
            from last in Character.EqualTo(':')
            select Unit.Value;

@AndrewSav
Contributor

Yes, dropping Unit means that you are now returning a different value. If the value is not needed, it's better to use Unit for performance reasons, but in your case change the type to TextSpan as you suggested.
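
To make the TextSpan change concrete, here is a minimal sketch, assuming the Superpower NuGet package is referenced. Only HashIdent and the Token.EqualTo(Tokens.Identifier).Apply(HashIdent) expression come from this thread; the Tokens enum, the HashIdentDemo class, and the HashTokenizer and Name members are hypothetical names made up for illustration.

```csharp
using System;
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Hypothetical token kinds, for illustration only.
enum Tokens { Identifier }

static class HashIdentDemo
{
    // Without .Value(Unit.Value), the parser yields the span matched by
    // Identifier.CStyle, so the result type is TextSpan rather than Unit.
    static TextParser<TextSpan> HashIdent { get; } =
        Character.EqualTo('#').IgnoreThen(Identifier.CStyle);

    // The token produced here still covers the whole match, '#' included.
    static Tokenizer<Tokens> HashTokenizer { get; } =
        new TokenizerBuilder<Tokens>()
            .Ignore(Span.WhiteSpace)
            .Match(HashIdent, Tokens.Identifier)
            .Build();

    // Re-applying HashIdent to the token's span recovers the part after '#'.
    static TokenListParser<Tokens, TextSpan> Name { get; } =
        Token.EqualTo(Tokens.Identifier).Apply(HashIdent);

    static void Main()
    {
        var tokens = HashTokenizer.Tokenize("#foo");
        TextSpan name = Name.Parse(tokens);
        Console.WriteLine(name.ToStringValue());
    }
}
```

The design point is that the tokenizer only decides where a token starts and ends; extracting the identifier without the hash happens later, when the text parser is applied to the token's span.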

As for your second question, something like this would probably work:

static TextParser<string> Actor { get; } =
    from name in Character.LetterOrDigit.Many()
    from last in Character.EqualTo(':')
    select new string(name);
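
A short usage sketch for the parser above, again assuming the Superpower package is referenced. The ActorDemo class and the sample input are hypothetical. Character.LetterOrDigit.Many() yields a char[], which is why new string(name) is enough to drop the trailing ':' from the result.

```csharp
using System;
using Superpower;
using Superpower.Parsers;

static class ActorDemo
{
    // Matches letters/digits followed by ':', but returns only the name part.
    static TextParser<string> Actor { get; } =
        from name in Character.LetterOrDigit.Many()
        from last in Character.EqualTo(':')
        select new string(name);

    static void Main()
    {
        // Parse expects the whole input to be consumed by the parser.
        string actor = Actor.Parse("bob:");
        Console.WriteLine(actor);
    }
}
```

As with HashIdent, if Actor is used inside a tokenizer the token will still span the ':' as well; the trimmed value only appears when the parser is applied to that span.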

@nblumhardt
Member

I guess this water is long under the bridge - hope you found a good solution, @dhowe - closing as stale, but keep us posted :-)
