Prove that we can use UTF-8 #38

aslakhellesoy · 2013-03-06T18:01:45Z

If Gherkin3 is going to use this project as a template, we have to make sure we can scan UTF-8 encoded input, since many Gherkin translations rely on the Unicode character set.

A simple way to do this is to create a utf8 branch where we change && (AND) to øø everywhere, both in the lexer definitions and in the tests. If everything passes, we're fine; if not, we have a problem.
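
To illustrate why this swap is a meaningful test, here is a minimal Python sketch (not the project's lexer). It assumes a byte-oriented scanner, which is how flex and Ragel typically consume input. `find_token` and the sample input are hypothetical names made up for this example. The point is that øø is four bytes in UTF-8 rather than two, so a scanner that matches on bytes has to match the full byte sequence.

```python
# Hypothetical byte-oriented matcher, for illustration only.
AND_ASCII = "&&".encode("utf-8")  # 2 bytes
AND_UTF8 = "øø".encode("utf-8")   # 4 bytes: each ø encodes as 0xC3 0xB8


def find_token(source: bytes, token: bytes) -> int:
    """Return the byte offset of the first occurrence of token, or -1."""
    return source.find(token)


# Sample input (an assumption, not taken from the project's tests).
source = "a øø b".encode("utf-8")
offset = find_token(source, AND_UTF8)
token_width_in_bytes = len(AND_UTF8)
```

If a byte-based lexer definition contains the same UTF-8 byte sequence as the input, recognition should work without the lexer knowing anything about Unicode.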

aslakhellesoy · 2013-03-06T18:03:55Z

Useful SO thread: http://stackoverflow.com/questions/9611682/flexlexer-support-for-unicode

…rors are off since flex counts bytes, not characters. We can live with that since line number error reporting will be good enough for gherkin. Ref #38

aslakhellesoy · 2013-04-01T21:27:36Z

I have verified that with Ragel, multi-byte characters (such as å) are recognized correctly, but the firstColumn and lastColumn values end up wrong, since they are based on ts and te, which seem to count bytes, not characters. This is not a huge problem, since we're only likely to use line numbers in error reporting anyway.
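
If accurate columns were ever needed, one possible fix is to convert the byte offset into a character column after scanning. The Python sketch below shows the idea. `byte_offset_to_column` is a hypothetical helper, not something in this project, and it assumes the offset falls on a character boundary (otherwise the decode raises `UnicodeDecodeError`).

```python
def byte_offset_to_column(line: bytes, byte_offset: int) -> int:
    """Convert a 0-based byte offset in a UTF-8 line to a 1-based character column.

    Hypothetical helper: assumes byte_offset lies on a character boundary.
    """
    # Decode only the prefix before the offset and count its characters.
    return len(line[:byte_offset].decode("utf-8")) + 1


# Sample line (an assumption): å occupies two bytes in UTF-8.
line = "når x".encode("utf-8")
byte_offset = line.index(b"x")      # offset as a byte-counting scanner sees it
byte_column = byte_offset + 1       # naive 1-based column from bytes
char_column = byte_offset_to_column(line, byte_offset)
```

Here the byte-based column is one higher than the character column, because å takes two bytes. The gap grows with every multi-byte character earlier on the line.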

aslakhellesoy added a commit that referenced this issue Mar 7, 2013

UTF-8 works with java/jruby/javascript, but not with c/mri. Ref #38

97b7549
