gherkin: Duplicate keywords in translations #17

Open
ciaranmcnulty opened this issue Feb 1, 2021 · 3 comments

@ciaranmcnulty
Contributor

Summary

In Behat/Gherkin we use the cucumber i18n data for translations. We have some tests that use this data to:

  • Generate a gherkin string using the different variations of the keywords
  • Generate a parsed AST that corresponds to that feature
  • Check the parser creates a similar AST when consuming the gherkin

This tends to fail when there are duplicate keywords inside a language's translation, so we skip some languages when doing this test.

Currently we are excluding the following from the tests:

  • ne: अनी means both And and Given
  • uz: Агар means both Given and When
  • en-old: Tha means both When and Then
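To illustrate why such a duplicate is ambiguous, here is a minimal sketch, not Behat's or Cucumber's actual parser. The `EN_OLD` table and the `step_type` helper are hypothetical; only the `Tha` duplicate comes from the list above. The type a parser assigns depends purely on which step type it happens to check first:

```python
# Hypothetical, abbreviated keyword table for "en-old".
# Only the duplicated keyword reported in this issue is included.
EN_OLD = {
    "when": ["Tha"],
    "then": ["Tha"],
}


def step_type(line, keywords, order):
    """Return the first step type in `order` with a keyword that starts `line`."""
    for kind in order:
        for keyword in keywords.get(kind, []):
            if line.startswith(keyword + " "):
                return kind
    return None


line = "Tha the button is pressed"
first = step_type(line, EN_OLD, ["when", "then"])   # checks "when" first
second = step_type(line, EN_OLD, ["then", "when"])  # checks "then" first
print(first, second)
```

The same line resolves to two different types, so a generated AST and a parsed AST only agree if both sides happen to use the same lookup order.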

Impact

Aside from the failing tests, which are easily skipped, our AST in Behat retains the 'type' of each node, which in some cases will be wrong compared with what the author intended.

Maybe more importantly, in those languages there is a word that means two different things. This may damage education efforts around the Given/When/Then structure.

Possible Solution

  1. fix the languages listed above to be unambiguous - this would have backwards compatibility issues so might not be acceptable
  2. ensure via automated tests that duplicated words cannot be added to new or existing translations
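Option 2 could be sketched roughly as follows. This is an assumption-laden example, not an existing test in either project: it assumes the translation data is a mapping from language code to lists of keywords keyed by step type, that keywords may carry trailing whitespace, and that `*` is intentionally shared by every step type and so must be ignored. `SAMPLE`, `STEP_TYPES` and `find_duplicates` are hypothetical names, and the sample data is abbreviated.

```python
from collections import defaultdict

# Assumed step-type keys in the translation data.
STEP_TYPES = ("given", "when", "then", "and", "but")


def find_duplicates(keywords):
    """Map each keyword used by more than one step type to those types."""
    seen = defaultdict(set)
    for kind in STEP_TYPES:
        for keyword in keywords.get(kind, []):
            word = keyword.strip()
            if word == "*":  # assumed to be shared by design, so skipped
                continue
            seen[word].add(kind)
    return {word: sorted(kinds) for word, kinds in seen.items() if len(kinds) > 1}


# Abbreviated, illustrative sample; "uz" contains the duplicate from this issue.
SAMPLE = {
    "en": {"given": ["* ", "Given "], "when": ["* ", "When "], "then": ["* ", "Then "]},
    "uz": {"given": ["* ", "Агар "], "when": ["* ", "Агар "]},
}

for language, keywords in SAMPLE.items():
    print(language, find_duplicates(keywords))
```

A test suite could fail whenever `find_duplicates` returns a non-empty result for any language, which would stop new duplicates from being added.
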
@aslakhellesoy
Contributor

I agree we should remove duplicates. We fixed a similar issue in cucumber/common@a54f32d - see https://github.com/cucumber/cucumber/blob/master/gherkin/CHANGELOG.md#1501---2020-08-12 where we removed keywords that only differed by case.

We need to bump the major version when/if we fix this, but that's no big deal.

@stof

stof commented Feb 17, 2021

As the keyword would keep working for a step, I'm not even sure this creates a BC issue. The keywords would still be usable for steps, as you would only remove one of the occurrences of the duplicate, not both. To preserve BC for the step type, it would be a matter of keeping the occurrence that corresponds to the step type currently detected by cucumber implementations (but I think this is not even consistent between implementations, and may not cause issues).

@ehuelsmann
Contributor

@aslakhellesoy isn't this solved with the cucumber/common#1741 merge?

@mpkorstanje mpkorstanje transferred this issue from cucumber/common Nov 8, 2022