This repository has been archived by the owner on May 28, 2019. It is now read-only.

Prove that we can use UTF-8 #38

Open
aslakhellesoy opened this issue Mar 6, 2013 · 2 comments

Comments

@aslakhellesoy
Contributor

If Gherkin3 is going to use this project as a template, we have to make sure we can scan UTF-8 encoded input, since many Gherkin translations use keywords containing non-ASCII Unicode characters.

A simple way to do this is to create a utf8 branch where we change && (AND) to øø everywhere, both in the lexer definitions and in the tests. If everything passes, we're fine; if not, we have a problem.
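
The reason this substitution is a meaningful test: flex and Ragel scanners match raw bytes, so to them øø is just a 4-byte literal (`c3 b8 c3 b8`) instead of the 2-byte `&&`. The sketch below is a hypothetical, hand-written byte-oriented tokenizer (not this project's lexer; `tokenize`, `AND_ASCII` and `AND_UTF8` are illustrative names) showing that swapping the operator bytes should leave the token stream unchanged.

```python
# Hypothetical byte-oriented tokenizer, in the spirit of flex/Ragel scanners,
# which match on raw bytes rather than on decoded characters.
AND_ASCII = "&&".encode("utf-8")  # b"&&" (2 bytes)
AND_UTF8 = "øø".encode("utf-8")   # b"\xc3\xb8\xc3\xb8" (4 bytes)


def tokenize(data: bytes, and_op: bytes) -> list[tuple[str, bytes]]:
    """Split `data` into WORD and AND tokens, matching `and_op` byte for byte."""
    tokens = []
    i = 0
    while i < len(data):
        if data.startswith(and_op, i):
            tokens.append(("AND", and_op))
            i += len(and_op)
        elif data[i:i + 1] == b" ":
            i += 1  # whitespace separates tokens and is dropped
        else:
            start = i
            # A word runs until a space or the start of an AND operator.
            while i < len(data) and data[i:i + 1] != b" " and not data.startswith(and_op, i):
                i += 1
            tokens.append(("WORD", data[start:i]))
    return tokens


ascii_kinds = [kind for kind, _ in tokenize(b"a && b", AND_ASCII)]
utf8_kinds = [kind for kind, _ in tokenize("a øø b".encode("utf-8"), AND_UTF8)]
print(ascii_kinds == utf8_kinds)  # True: same token stream, only the AND bytes differ
```

Recognition only depends on byte sequences, which is why it survives the change; anything that turns byte positions into columns is a different story.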

@aslakhellesoy
Contributor Author

aslakhellesoy added a commit that referenced this issue Mar 7, 2013
…rors are off since flex counts bytes, not characters. We can live with that since line number error reporting will be good enough for gherkin. Ref #38
@aslakhellesoy
Contributor Author

I have verified that with Ragel, multi-byte characters (such as å) work fine for recognition, but the firstColumn and lastColumn values are off, since they are computed from ts and te, which seem to count bytes, not characters. This is not a huge problem, since we're only likely to use line numbers in error reporting anyway.
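
If character columns were ever needed, one option is to convert the byte offset after scanning by decoding the line prefix. This is a minimal sketch under assumptions: `char_column` is a hypothetical helper, and Ragel's `ts`/`te` (which are pointers into the input buffer) are modeled here as integer byte offsets into a single line.

```python
def char_column(line: bytes, byte_offset: int) -> int:
    """Return the 1-based column of `byte_offset` in `line`, counted in code points.

    A byte-counting scanner effectively reports `byte_offset + 1`; decoding the
    prefix of the line first turns that into a column in Unicode code points.
    Assumes `line` is valid UTF-8 and `byte_offset` falls on a character
    boundary (otherwise decode() raises UnicodeDecodeError).
    """
    return len(line[:byte_offset].decode("utf-8")) + 1


line = "å x".encode("utf-8")  # b"\xc3\xa5 x": 'å' takes two bytes
offset = line.index(b"x")     # byte offset 3, standing in for ts of the token "x"
print(offset + 1, char_column(line, offset))  # 4 3: byte column vs. character column
```

Note that this counts code points, not user-perceived characters, so combining sequences would still be counted per code point.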
