Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change messages and Gherkin parser/pickle compiler to retain step keyword #1741

Merged
merged 53 commits into from
May 25, 2022

Conversation

ehuelsmann
Copy link
Contributor

@ehuelsmann ehuelsmann commented Sep 8, 2021

Summary

Re #768: As discussed, for further discussion, implement a proposal for retention of step keywords Given/When/Then/But/And into the pickle.

Please note that the tests on this PR can't succeed until the test-data has been patched to include the new fields defined in the JSON schema. I'll take that task when the general direction has been decided to be acceptable/refined.

Details

Considerations (summarized from the issue as well as implementation-wise):

  • Some implementations use keyword-specific dispatching of step functions; without the keyword in the pickle, these implementations can't migrate to Gherkin
  • The keywords But and And are resolved to the Given/When/Then keywords they inherit their context from, because:
    • Pickles contain execution plans, i.e. with variables resolved (readily available for execution)
    • Along the same lines, the But and And keywords have been resolved to eliminate variability in intepretation
  • The step dispatch environment doesn't have the parser available to determine the meaning of the keyword as mentioned in the parsed document; therefore the GherkinDocument mentions both the keyword from the parsed text (keyword) and the key as it is used in the translations (Given/Then/When/And/But).

Motivation and Context

  1. To allow implementations to move to the Gherkin parser
  2. To retain information currently, literally, "lost in translation"
  3. Strengthen separation of concerns (remove the need to mix parsing, execution and formatting stages)

Screenshots

Not screenshots, but rendered output nonetheless.

(the first scenario of the) AST output from gherkin --predictable-ids --no-source --no-pickles testdata/good/i18n_fr.feature:

{
  "gherkinDocument": {
    "comments": [],
    "feature": {
      "children": [
        {
          "scenario": {
            "description": "",
            "examples": [],
            "id": "3",
            "keyword": "Scénario",
            "location": {
              "column": 3,
              "line": 4
            },
            "name": "Support des caractères spéciaux",
            "steps": [
              {
                "id": "0",
                "keyword": "Soit ",
                "keywordType": "Given",                                        ******
                "location": {
                  "column": 5,
                  "line": 5
                },
                "text": "un exemple de scénario en français"
              },
              {
                "id": "1",
                "keyword": "Quand ",
                "keywordType": "When",                                        ******
                "location": {
                  "column": 5,
                  "line": 6
                },
                "text": "j'ai 1 gâteau"
              },
              {
                "id": "2",
                "keyword": "Alors ",
                "keywordType": "Then",                                       ******
                "location": {
                  "column": 5,
                  "line": 7
                },
                "text": "je suis heureux"
              }
            ],
            "tags": []
          }
        },

And the pickle output for said first scenario (gherkin --predictable-ids --no-source --no-ast testdata/good/i18n_fr.feature):

{
  "pickle": {
    "astNodeIds": [
      "3"
    ],
    "id": "53",
    "language": "fr",
    "name": "Support des caractères spéciaux",
    "steps": [
      {
        "astNodeIds": [
          "0"
        ],
        "id": "50",
        "keyword": "Given",                                                     ******
        "text": "un exemple de scénario en français"
      },
      {
        "astNodeIds": [
          "1"
        ],
        "id": "51",
        "keyword": "When",                                                     ******
        "text": "j'ai 1 gâteau"
      },
      {
        "astNodeIds": [
          "2"
        ],
        "id": "52",
        "keyword": "Then",                                                     ******
        "text": "je suis heureux"
      }
    ],
    "tags": [],
    "uri": "testdata/good/i18n_fr.feature"
  }
}

See the lines with the ****** markers for the new output.

How Has This Been Tested?

It's only lightly tested, because this is mostly for discussion purposes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist:

  • The change has been ported to Java.
  • The change has been ported to Ruby.
  • The change has been ported to JavaScript.
  • The change has been ported to Go.
  • The change has been ported to .NET.
  • I've added tests for my code.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have updated the CHANGELOG accordingly.

@aslakhellesoy indicated this PR should not be blocked on the .NET implementation, as it's not ported to JSONSchema yet (still running on the old protobuf schema).

@aurelien-reeves
Copy link
Contributor

aurelien-reeves commented Sep 9, 2021

It looks promising 👍

Is the "non-keyword" * missing?
How would you manage the following:

Feature: a feature
  Scenario: a scenario
    * first step
    * second step

@ehuelsmann
Copy link
Contributor Author

Yes. The "non-keyword" is missing. If you look at

https://github.com/cucumber/common/pull/1741/files#diff-c07661832329db6cbfc36ea70c7dde480191c281e6e134531ef7b7638d144cfbR182

you'll see that the * keyword isn't being considered in the current Perl implementation of the Gherkin parser, which is why I overlooked it. There's no failure in the parser's test suite, which makes me wonder: is * still a supported keyword? Or is it there in some implementations for backward compatibility?

@aurelien-reeves
Copy link
Contributor

Yes. The "non-keyword" is missing. If you look at

https://github.com/cucumber/common/pull/1741/files#diff-c07661832329db6cbfc36ea70c7dde480191c281e6e134531ef7b7638d144cfbR182

you'll see that the * keyword isn't being considered in the current Perl implementation of the Gherkin parser, which is why I overlooked it. There's no failure in the parser's test suite, which makes me wonder: is * still a supported keyword? Or is it there in some implementations for backward compatibility?

Good question for our community meeting 😁

@ehuelsmann
Copy link
Contributor Author

ehuelsmann commented Sep 9, 2021

is * still a supported keyword? Or is it there in some implementations for backward compatibility?

Good question for our community meeting grin

So, in preparation of that meeting, I took a stab at the Ruby parser, trying to find whether it supports the * non-keyword. I found:

https://github.com/cucumber/common/blob/main/gherkin/ruby/lib/gherkin/token_matcher.rb#L120-L124

which suggests that it also doesn't. So, definitely something to discuss!

@brasmusson
Copy link
Contributor

My take on the "*" keyword is that when using it, the step should be allowed to match step definitions defined with all of "Given", "When", and "Then".

From the original discussions of "unambiguous keywords", I think I remember the view that defining both step definitions for "Given " and "When " should (still?) be an error, so the keywords in the pickles should rather be for verifying the keyword of the step definition, than for selecting one of several candidates of step definitions.

@brasmusson
Copy link
Contributor

As long as the "* " keyword is defined in the gherkin-languages.json is is supported, isn't it?

@mpkorstanje
Copy link
Contributor

The keywords But and And are resolved to the Given/When/Then keywords

I would suggest deferring this to a the cucumber implementation. It's non trivial, especially with * keyword is used. The pickle may also be used in representations in which case it would be confusing to see it replaced.

@gasparnagy gasparnagy self-requested a review September 9, 2021 14:14
@ehuelsmann
Copy link
Contributor Author

This is how * keyword is supported: https://github.com/cucumber/common/blob/main/gherkin/gherkin-languages.json#L4 shows that * is a translation/dialect for the And keyword.

@sebrose
Copy link
Member

sebrose commented Sep 9, 2021

  • It looks like '*' is supported in some Cucumber Rubys (I tried it in 4.0.0)
  • By converting the actual keyword used in the scenario into G/W/T in the pickle, we lose the ability to reconstitute the scenario as written
  • There's nothing to prevent a scenario starting with a conjunction (And / But)

@aurelien-reeves
Copy link
Contributor

This is how * keyword is supported: https://github.com/cucumber/common/blob/main/gherkin/gherkin-languages.json#L4 shows that * is a translation/dialect for the And keyword.

Yeah
Actually it is a translation for all given/when/then/and/but keyword.
I guess it fallback as "Given" if needed?

@gasparnagy
Copy link
Member

On the other hand, we realized with @sebrose that treating * just as a simple synonym of And is not perfect... e.g. a scenario with only * steps might be valid on the other hand having a scenario with only Ands does not look good...

So if we do this change it would be good if we could semantically detach the * from the Ands...

@davidjgoss
Copy link
Contributor

By converting the actual keyword used in the scenario into G/W/T in the pickle, we lose the ability to reconstitute the scenario as written

I think the intent was to have this as well as the verbatim keyword, not instead?

@brasmusson
Copy link
Contributor

The "* " is a valid keyword for Given, When, Then, And, But (see gherkin-languages.json). So any parser that passes the test supports "* " (see this ast.ndjson). Currently its not a problem that "* " is a valid keyword for all types, but when converting to G/W/T then the question of how to convert "* " occurs. My suggestion in that case, it should be converted to the "*" type (which would be valid for all step definitions).

@mattwynne
Copy link
Member

mattwynne commented Sep 9, 2021

When we refer to the "phase" of the scenario, the thing that's called keywordType in the gherkinDocument message, I would like us to be consistent, so we should use the same name for that field in the pickle message too.

I suggest we make this an enumerable with the values:

  • Ambiguous
  • Context (i.e. Given)
  • Action (i.e. When)
  • Outcome (i.e. Then)

We can map *, And and But to Ambiguous, and let something downstream try to figure out what the implied keyword type might be, as Rien suggested in the meeting today. I like the idea that the parser doesn't try to do too much here.

I am still curious about where we would put that logic so that all implementations can do it consistently.

I think it's clearer to avoid the English Gherkin keywords (Given, When, Then) here, so this stands out as being a distinct thing from the keyword used in the document.

Edit: I think Ambiguous is better than Unknown

@ehuelsmann
Copy link
Contributor Author

@gasparnagy writes:

On the other hand, we realized with @sebrose that treating * just as a simple synonym of And is not perfect... e.g. a scenario with only * steps might be valid on the other hand having a scenario with only Ands does not look good...

I'd agree that a scenario of pure conjunctions (And / But) doesn't look good. However, if I'm understanding correctly the way Cucumber Ruby/Java work now, it's still allowed. To that extent, I'd say that the argument @mpkorstanje and @mattwynne (?) made to leave the intepretation of the keywords to the actual executor and not hard-code it in the Pickle compiler, does allow the executor to object to scenarios like these (or simply assume "* ").

So if we do this change it would be good if we could semantically detach the * from the Ands...

I think that if I change the PR to have the same fields keyword and keywordType in PickleStep the way it's now done in the Step, with the addition of the new keywordType * (in addition to the keywordTypes Given/When/Then//And/But), the code will be able to handle all comments made so far.

@brasmusson
Copy link
Contributor

We can map *, And and But to Ambiguous

@mattwynne I disagree with that * is mapped the same way And and But. The current meaning of And and But is "the same as the previous step". But that is currently not the meaning of *, as it is a valid keyword for Given, When, Then, And and But.

@gasparnagy
Copy link
Member

gasparnagy commented Sep 9, 2021

I like the abstraction of phase suggested my @mattwynne. What about these?

  • Context -- like Given
  • Action -- like When
  • Outcome -- like Then
  • Conjunction or Inherited -- like And and But
  • Step or General -- for the *

@mattwynne
Copy link
Member

But that is currently not the meaning of *, as it is a valid keyword for Given, When, Then, And and But.

Good point. In fact we might find (now or in the future) that there are other strings mapped in the gherkin language spec that could be mapped back to more than one keyword.

@ehuelsmann
Copy link
Contributor Author

But that is currently not the meaning of *, as it is a valid keyword for Given, When, Then, And and But.

Good point. In fact we might find (now or in the future) that there are other strings mapped in the gherkin language spec that could be mapped back to more than one keyword.

Assuming that more of these exist today, should all of these be mapped to "General" in that case? Doing so will at least give us the desired behaviour for * without special-casing.

@ehuelsmann
Copy link
Contributor Author

Well, do we have some documentation to update beside the changelog?

All that's going on here really is the implementation of a change in messages, which was documented in a prior PR. I'd expect all we need to do is to do the changelog item -- all other documentation is in the messages definitions (and there, we have some debt to repay, but that was true far before this PR).

@aurelien-reeves aurelien-reeves marked this pull request as ready for review May 25, 2022 09:03
@aurelien-reeves
Copy link
Contributor

Last call for discussion and/or objection

Are we good? Can we merge? Are we still waiting for something else?

Copy link
Member

@gasparnagy gasparnagy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor

@aslakhellesoy aslakhellesoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@aslakhellesoy aslakhellesoy enabled auto-merge (squash) May 25, 2022 13:31
@aslakhellesoy aslakhellesoy merged commit b155220 into main May 25, 2022
@aslakhellesoy aslakhellesoy deleted the retain-step-keyword branch May 25, 2022 13:42
elchupanebrej pushed a commit to elchupanebrej/pytest-bdd-ng that referenced this pull request Jul 14, 2022
elchupanebrej pushed a commit to elchupanebrej/pytest-bdd-ng that referenced this pull request Jul 14, 2022
lamkr pushed a commit to lamkr/cucumber-common that referenced this pull request Aug 17, 2022
mpkorstanje added a commit to cucumber/gherkin that referenced this pull request Nov 11, 2022
We did not specify a version range for the Cucumber::Messages dependency. This
wasn't a problem before, every version of message was usable. However once
cucumber/common#1741 was merged this was no longer
true.

Fixes #48
mpkorstanje added a commit to cucumber/gherkin that referenced this pull request Nov 11, 2022
We did not specify a version range for the Cucumber::Messages dependency. This
wasn't a problem before, every version of message was usable. However once
cucumber/common#1741 was merged this was no longer
true.

Fixes #48
mpkorstanje added a commit to cucumber/gherkin that referenced this pull request Dec 2, 2022
We did not specify a version range for the Cucumber::Messages dependency. This
wasn't a problem before, every version of message was usable. However once
cucumber/common#1741 was merged this was no longer
true.

Fixes #48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet