support English keywords, i18N generation for Jython (excluding some languages) #197

Merged

aslakhellesoy merged 8 commits into cucumber:master on Feb 8, 2012

sabrams commented Feb 7, 2012 (Contributor)

Hi Aslak,

This update adds support for English keywords and includes auto-generation code for other languages. Languages whose keywords cannot be normalized to ASCII are excluded from the auto-generation, because the current Jython support implements keywords as Python class names, and Jython does not allow Unicode in class names.
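To make "keywords as Python class names" concrete, here is a minimal sketch in plain Python, assuming a decorator-class design. Only the `I18NKeywordTemplate` name comes from this thread; `STEP_DEFINITIONS`, `run_step`, `World` and the registration logic are hypothetical illustrations, not the actual JythonBackend code.

```python
import re

# Hypothetical registry filled in by the keyword decorators.
STEP_DEFINITIONS = []


class I18NKeywordTemplate(object):
    """Used as @Given(pattern): stores the pattern, then registers the function."""

    def __init__(self, pattern):
        # Anchor at the end so a match must cover the whole step text.
        self.pattern = re.compile("(?:" + pattern + r")\Z")

    def __call__(self, func):
        STEP_DEFINITIONS.append((self.pattern, func))
        return func  # leave the decorated function itself unchanged


# Each keyword is just another name bound to the same class, which is why
# every keyword has to be a legal (ASCII, in Jython 2.x) Python identifier.
And = But = Given = Then = When = I18NKeywordTemplate


def run_step(text, world):
    """Call the first step definition whose pattern matches the whole text."""
    for pattern, func in STEP_DEFINITIONS:
        match = pattern.match(text)
        if match:
            return func(world, *match.groups())
    raise LookupError("undefined step: %r" % text)


class World(object):
    """Stand-in for the object passed to step definitions as `self`."""


@Given(r'I have (\d+) "(.+)" in my belly')
def something_in_the_belly(self, n, what):
    self.n = int(n)
    self.what = what


world = World()
run_step('I have 3 "cukes" in my belly', world)
# world.n is now 3 and world.what is 'cukes'
```

Binding all keywords to one class keeps the generated per-language file tiny: a language only needs a single alias line, provided its keywords are valid identifiers.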

This does not yet make those languages usable: I haven't found a way to correctly load these files from the PythonInterpreter used in JythonBackend when they are imported from a step def file.

sabrams commented on e359d04 Jan 26, 2012

Owner

This isn't ready to pull yet. The generated Jython is correct, and the non-CLI tests pass when I manually import the EN.py file, but it's not yet running seamlessly with Maven (the build fails).

aslakhellesoy Jan 27, 2012

Looks nice. It would be great if each stepdef script could import the DSL like this:

from cucumber.runtime.jython.EN import *

@Given(r'I have (\d+) "(.+)" in my belly')
def something_in_the_belly(self, n, what):
    self.n = int(n)
    self.what = what

sabrams Jan 27, 2012

Owner

aslakhellesoy Jan 27, 2012

Aha. The codeKeywords strings are definitely Unicode, so for Jython I think the best approach is to convert them to ASCII. (Watch out for duplicates after ASCIIfication.)

I think java.text.Normalizer might do the trick, though I haven't tried it. It might not work for Chinese, Arabic, Hebrew, etc., so maybe fall back to EN if a language's keywords can't be ASCIIfied?
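A rough Python 3 analogue of the Normalizer suggestion, using the standard `unicodedata` module in place of `java.text.Normalizer`: decompose each keyword with NFKD, drop the combining marks, and accept the result only if it is an ASCII identifier. The function names and the EN keyword list are assumptions, and treating post-ASCIIfication duplicates as a reason to fall back to EN is one possible policy, not something the thread settled.

```python
import re
import unicodedata

# An ASCII-only identifier, as Python 2 / Jython requires.
ASCII_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Assumed fallback list; the real EN code keywords come from gherkin's i18n data.
EN_KEYWORDS = ["Given", "When", "Then", "And", "But"]


def asciify(keyword):
    """Strip accents via NFKD; return None if the result is not an ASCII identifier."""
    decomposed = unicodedata.normalize("NFKD", keyword)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped if ASCII_IDENTIFIER.match(stripped) else None


def code_keywords_for(keywords):
    """ASCIIfy a language's code keywords, falling back to EN when that fails."""
    converted = [asciify(k) for k in keywords]
    if None in converted:
        return list(EN_KEYWORDS)  # e.g. Chinese, Arabic or Hebrew keywords
    if len(set(converted)) < len(set(keywords)):
        return list(EN_KEYWORDS)  # two distinct keywords collapsed into one name
    return converted


print(code_keywords_for(["Gitt", "Når", "Så", "Og", "Men"]))
# prints: ['Gitt', 'Nar', 'Sa', 'Og', 'Men']
```

NFKD only helps with letters that decompose into a base letter plus accents; scripts with no Latin decomposition fail the identifier check and trigger the EN fallback.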

aslakhellesoy Jan 28, 2012

I have made some changes in my sabrahams-jython-i18n branch which improve this a little...

aslakhellesoy and others added some commits Jan 27, 2012

Slightly improved python i18n. This gets rid of latin accents (fixes NO, FR etc), but symbolic characters are still there....
steps toward Jython i18n support, still one hard-coded piece to remove, and some work around languages having unicode issues

aslakhellesoy Jan 30, 2012

This seems like a decent workaround: locales whose keywords can be converted from Latin script to ASCII get translated, and the others don't, because of a limitation of Python/Jython.

Am I right?

sabrams Jan 31, 2012

Owner

That is correct. To support the annotation (decorator) format the Jython interpreter gives us for these Unicode, non-ASCII-able languages, we would need at least a pre-processing phase on the step def file. One possibility for a future update: apply the same character replacement mechanism in both the I18N file generator and this pre-processing phase, producing classes whose names are ASCII. For example, if English suddenly spelled Given with the Yen sign (Gi¥en), and someone wrote this step def:

@Gi¥en(r'I have (\d+) "(.+)" in my belly')
def something_in_the_belly(self, n, what):
    self.n = int(n)
    self.what = what

a pre-processing phase could parse this file and feed the following to the Jython parser:

@GiU_00A5en(r'I have (\d+) "(.+)" in my belly')
def something_in_the_belly(self, n, what):
    self.n = int(n)
    self.what = what

This would run, resolving the decorator name to a class already defined in the EN.py I18N file:

And = But = GiU_00A5en = Then = When = I18NKeywordTemplate

(U+00A5 is the Unicode code point for the Yen sign.)
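The renaming in that pre-processing phase could look roughly like this. It is a hypothetical Python 3 sketch: `mangle`, `preprocess` and the decorator-matching regex are illustrative, and the generator would have to apply the same `mangle` when writing the class aliases into EN.py.

```python
import re

# Lines that start (after optional spaces/tabs) with "@name"; group 2 is the name.
DECORATOR = re.compile(r"^([ \t]*@)([^\s(]+)", re.MULTILINE)


def mangle(name):
    """Replace each non-ASCII character with U_ plus its code point in hex."""
    # Naive: not collision-proof (a name could already contain "U_00A5").
    return "".join(c if ord(c) < 128 else "U_%04X" % ord(c) for c in name)


def preprocess(source):
    """Rewrite decorator names only; step patterns and bodies are untouched."""
    return DECORATOR.sub(lambda m: m.group(1) + mangle(m.group(2)), source)


step_def = "\n".join([
    "@Gi\u00a5en(r'I have (\\d+) \"(.+)\" in my belly')",
    "def something_in_the_belly(self, n, what):",
    "    self.n = int(n)",
    "    self.what = what",
])
print(preprocess(step_def).splitlines()[0])
# prints: @GiU_00A5en(r'I have (\d+) "(.+)" in my belly')
```

Only decorator names are rewritten, so Unicode inside the step patterns themselves survives unchanged.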

Anyway, support for most of the languages is almost there; I just need a way to load them and make them usable from any context (Maven, CLI, IDE).

aslakhellesoy Feb 6, 2012

This sounds like it would allow people to write invalid Python code (non-ASCII annotations seem to be invalid Python).

Allowing people to write invalid Python doesn't sound like a good idea to me. I think people working in non-ASCII languages should be forced to use ASCII.
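The validity point can be checked directly. Jython 2.x identifiers are ASCII-only; Python 3 allows Unicode letters in identifiers but still rejects symbols such as ¥, so `compile()` raises `SyntaxError` on the Gi¥en example. The `is_valid_keyword_name` and `compiles` helpers below are hypothetical, showing one way a generator could enforce ASCII-only keyword names.

```python
import keyword
import re

ASCII_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_valid_keyword_name(name):
    """True if name is an ASCII identifier that is not a reserved word."""
    return bool(ASCII_IDENTIFIER.match(name)) and not keyword.iskeyword(name)


def compiles(source):
    """True if the interpreter running this sketch accepts source as Python."""
    try:
        compile(source, "<stepdef>", "exec")
        return True
    except SyntaxError:
        return False


bad = "@Gi\u00a5en(r'I have (\\d+) cukes')\ndef f(self, n):\n    pass\n"
good = bad.replace("Gi\u00a5en", "Given")
print(compiles(bad), compiles(good))  # prints: False True
```

Checking names at generation time catches the problem before any step def file is written, instead of relying on the interpreter's error.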

sabrams Feb 7, 2012

Owner

I agree about keeping the Python code valid.

aslakhellesoy merged commit c0f263c into cucumber:master on Feb 8, 2012
