From 45a35114925a640904ae361310a8dfcdc234c9a0 Mon Sep 17 00:00:00 2001
From: acoulton
Date: Thu, 26 Mar 2026 12:07:13 +0000
Subject: [PATCH] docs: document Gherkin parser compatibility mode

---
 user_guide/gherkin.rst             |  12 +++
 user_guide/gherkin/parser_mode.rst | 164 +++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+)
 create mode 100644 user_guide/gherkin/parser_mode.rst

diff --git a/user_guide/gherkin.rst b/user_guide/gherkin.rst
index 11589fb..57c4bcb 100644
--- a/user_guide/gherkin.rst
+++ b/user_guide/gherkin.rst
@@ -16,6 +16,12 @@ real, human language telling you what code you should write.
 If you're still new to Behat, jump into the :doc:`/quick_start` first, then return
 here to learn more about Gherkin.
 
+.. note::
+
+    You can configure whether Behat's Gherkin parsing is compatible with
+    previous Behat versions, or with the official ``cucumber/gherkin``
+    parsers. See :doc:`gherkin/parser_mode` for more details.
+
 Gherkin Syntax
 --------------
 
@@ -103,3 +109,9 @@ run:
     Behat the ability to have multilanguage features in one suite.
 
 .. _`Business Readable, Domain Specific Language`: http://martinfowler.com/bliki/BusinessReadableDSL.html
+
+.. toctree::
+   :maxdepth: 2
+   :hidden:
+
+   gherkin/parser_mode
diff --git a/user_guide/gherkin/parser_mode.rst b/user_guide/gherkin/parser_mode.rst
new file mode 100644
index 0000000..23885bb
--- /dev/null
+++ b/user_guide/gherkin/parser_mode.rst
@@ -0,0 +1,164 @@
+Gherkin Compatibility Mode
+==========================
+
+Behat uses the `behat/gherkin`_ library to parse your feature files into the data structures that
+Behat will use to execute them.
+
+In most cases, this parses identically to `the official parsers provided by the Cucumber project`_.
+However, there are some small differences in how our parser has traditionally treated some specific
+syntax compared to the official parsers.
+
+To resolve this, we have added a ``GherkinCompatibilityMode`` setting to the parser. This setting
+has two possible options:
+
+* ``GherkinCompatibilityMode::LEGACY`` - match our previous behaviour. This is the default in Behat 3.x.
+* ``GherkinCompatibilityMode::GHERKIN_32`` - match the official parsers. This will become the default in Behat 4.0.
+
+.. caution::
+    ``GherkinCompatibilityMode::GHERKIN_32`` is currently considered experimental. We expect that
+    there will be more changes to how the parser behaves in this mode before we mark it as stable.
+
+Configuring the parser mode
+---------------------------
+
+In Behat >= 3.30, you can specify the parser compatibility mode for your project in
+your :doc:`/user_guide/configuration`:
+
+.. code-block:: php
+
+    withProfile(new Profile('default')
+        ->withGherkinOptions(new GherkinOptions()
+            ->withCompatibilityMode(GherkinCompatibilityMode::GHERKIN_32)
+        )
+    )
+    ;
+
+Differences between parser modes
+--------------------------------
+
+Tables containing whitespace or escaped newlines
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In ``GHERKIN_32`` mode, table cells can include newlines, which will be unescaped during parsing. Note that
+newlines are unescaped **after** we remove the cell padding.
+
+For example, with the following table:
+
+.. code-block:: gherkin
+
+    Given 3 lines of poetry on 5 lines:
+      | \nraindrops--\nher last kiss\ngoodbye.\n |
+
+In ``GHERKIN_32`` mode, the table will parse as:
+
+.. code-block:: php
+
+    [
+        [
+            <<getTags()`` will **include** the ``@`` prefix. In legacy mode,
+  this was removed. This may affect custom hooks / event listeners that inspect the tag values at
+  runtime.
+
+
+File language
+~~~~~~~~~~~~~
+
+In ``GHERKIN_32`` mode, if a file includes a ``#language`` annotation:
+
+* Any whitespace in / around the tag will be ignored - so ``# language : fr`` will be
+  recognised as a valid language tag. In legacy mode, this would have been treated as a comment.
+* Parsing fails if the language is not recognised - so ``#language: no-such`` will cause an error.
+  In legacy mode, this would have been ignored and parsing would continue in the default language.
+
+Whitespace following step keywords
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In ``GHERKIN_32`` mode, a space between a step keyword and the rest of the text is treated as part of the keyword. This
+is because in a small number of languages there is no space after the keyword.
+
+With a step in English like ``Then something should happen``, if you call ``StepNode::getKeyword()`` then:
+
+* In ``GHERKIN_32`` mode the return value will be ``'Then '``
+* In ``LEGACY`` mode the return value will be ``'Then'``
+
+In a language that does not place spaces after the keyword (e.g. Japanese), the return value will be the same in both
+modes.
+
+Elements with descriptions
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Gherkin syntax allows multi-line descriptions on ``Feature:``, ``Background:``, ``Scenario:``, ``Scenario Outline:``,
+and ``Examples:`` elements.
+
+Historically, we only parsed the description separately for a ``Feature`` node. For other nodes, we parsed the full
+text as a multi-line title.
+
+In ``GHERKIN_32`` mode, if one of the elements listed above has multi-line text, then:
+
+* The first line (containing the keyword) will be parsed as the title.
+* Following lines will be parsed as the description.
+* Any blank lines between the title and description will be ignored (in legacy mode, these were included at the start of
+  the description).
+* Any left padding will be removed from the first line of the description, but subsequent lines will have the same
+  left padding / indentation as the feature file. In legacy mode, we attempted to left-trim all lines to match the
+  indentation of the keyword.
+
+
+.. _`behat/gherkin`: https://github.com/Behat/Gherkin
+.. _`the official parsers provided by the Cucumber project`: https://github.com/cucumber/gherkin