change: Do not transliterate when suggesting method names for step definitions #1633

acoulton · 2025-04-25T14:35:31Z

As discussed in #1632, we should just remove our behat/transliterator dependency. We use very little of it, it is out of date, and it avoids the licensing issues discussed in Behat/Transliterator#3 and #1343.

This PR starts by refactoring the duplicated code that generates method names into a single service that implements the new StepMethodNameGenerator interface. This would allow people to define their own implementations if they want something more complex (or with more dependencies) than we provide out of the box.

Our default SimpleStepMethodNameGenerator then does the bare minimum to produce a unicode name that is syntactically valid for PHP.

This is all backwards compatible at the code level (including if people have registered a custom PatternPolicy class).

At a functional level, it obviously changes the method names we'll suggest - including that there are some cases where we'll now just suggest a generic name where previously we would have attempted to transliterate.

Previously, most of this was done within duplicated methods in the `PatternPolicy` implementations and passed into the `Pattern` as `canonicalText`. There are lots of potential interpretations of the `canonical text` of a step but in practice we were only using this as the basis of the suggested method name when generating snippets. This suggestion was processed almost entirely in the `PatternPolicy` class with one final step in the `ContextSnippetGenerator` to lowercase the first character. It makes more sense to consolidate all that logic together, and to make explicit that this property on the `Pattern` is the suggested method name.

Expose it in the service container so that users (particularly those working in other languages) can provide a more complex implementation if they wish to do so.

Drop the behat/transliterator dependency and simply suggest step method names with the full utf8 characters used in the feature file.

Just remove any characters that are not within the allowed set for method names (including digits at the start of the name), and fall back to the generic `stepDefinition1` if the text is completely invalid. Since we may now have utf8 characters in the suggestion, we also need to make the case conversions (to lowerCamelCase) multibyte-safe. Fortunately, Behat already depends on ext-mbstring.

acoulton · 2025-04-25T15:33:00Z

features/i18n_annotations.feature


          #[Then('/^Добавить "([^"]*)" число$/')]
-          public function dobavitChislo($arg1): void
+          public function stepDefinition1($arg1): void


We will now just suggest generic names for some non-latin languages where the entire character set is outside the a-zA-Z0-9_\x80-\xff unicode range.

We could attempt to detect whether ext-intl is present and use the native Transliterator class as suggested by @stof in #1632.

However I'd be interested to hear what Russian etc speakers would actually want to have here - is a straight machine transliteration useful, or would they likely end up changing it anyway?

My gut feel is it would make more sense to leave this as-is "out of the box". It would now be easy for us (or a.n. other) to implement a Behat extension with a hard dependency on ext-intl for more complex cases like this.

acoulton · 2025-04-25T15:34:45Z

features/legacy_snippets.feature

+          Then 5 should have value of £10 # FeatureContext::shouldHaveValueOf£()
            TODO: write pending definition
-          And 7 should have value of £7.2 # FeatureContext::shouldHaveValueOfPs()
+          And 7 should have value of £7.2 # FeatureContext::shouldHaveValueOf£()


I actually think the new ValueOf£ makes more sense than ValueOfPs anyway!

acoulton · 2025-04-25T15:35:43Z

features/legacy_snippets.feature


          #[Given('I have a package v2.5')]
-          public function iHaveAPackageV(): void
+          public function iHaveAPackageV25(): void


It would be trivial to strip trailing numbers as before. However I'm not sure it's warranted (bearing in mind this number is in the hardcoded step text, it is not a placeholder).

this case is fine to me

src/Behat/Behat/Context/Snippet/Generator/ContextSnippetGenerator.php

acoulton · 2025-04-25T15:37:31Z

src/Behat/Behat/Definition/Pattern/Pattern.php

-    public function getCanonicalText()
+    public function getSuggestedMethodName(): string


The unified diff looks at a glance like I've just renamed the method - in fact getCanonicalText is still present but deprecated, and getSuggestedMethodName is a new method.

acoulton · 2025-04-25T15:38:28Z

src/Behat/Behat/Definition/Pattern/Pattern.php

+    public function __construct(
+        private string $suggestedMethodName,
+        private string $pattern,
+        private int $placeholderCount = 0,
+    ) {
+    }


These previously had PHPDoc typehints, I would also be very surprised if anyone externally is creating a Pattern instance and certainly not with arguments that won't be able to cast to these types. I think therefore it's safe to make them strict.

src/Behat/Behat/Context/Snippet/Generator/ContextSnippetGenerator.php

stof · 2025-04-25T15:40:32Z

src/Behat/Behat/Context/Snippet/Generator/ContextSnippetGenerator.php

    private function getMethodName(string $contextClass, string $canonicalText, string $pattern): string
    {
-        $methodName = $this->deduceMethodName($canonicalText);
+        $methodName = SimpleStepMethodNameGenerator::cleanupMethodName($canonicalText);


should we trigger a deprecation here when the cleaned value is different from the provided argument, to ask using the StepMethodNameGenerator in the pattern policy ?

The problem is we have no neat way to report that deprecation to the user at the moment, and even if we did if they are consuming a PatternPolicy from an extension that could be difficult for them to resolve in the short term.

Considering all the method actually does is lowercase the first letter and backfill empty strings, I almost considered removing it now - worst case someone will get a suggested method with an uppercase first character (or perhaps no name) and have to fix it in their Context class... It's not going to actually break anything. WDYT?

@carlos-granados I have reflected more on @stof's comment I think the benefit of lowercasing the first character (in the unlikely situation someone is using a custom PatternPolicy) is limited for the cost of exposing this method and changing it again at some future date. I've reworked to just handle the case where the method name is empty, which avoids having to have this method at all.

src/Behat/Behat/Definition/ServiceContainer/DefinitionExtension.php

carlos-granados · 2025-04-29T17:45:27Z

src/Behat/Behat/Definition/Pattern/SimpleStepMethodNameGenerator.php

+     * @deprecated provided only for BC, to allow the ContextSnippetGenerator to continue doing this cleanup for any
+     *             steps that were generated by a third-party PatternPolicy that does not use our StepMethodNameGenerator
+     */
+    public static function cleanupMethodName(string $name): string


If several steps end up with an empty name and we want to use the generic name we will get several functions named stepDefinition1 which is not ideal. Perhaps you could have a static int property which is initialised to 1 and appended to stepDefinition and then increased?

Although they will all be suggested as stepDefinition1 initially ContextSnippetGenerator::getUniqueMethodName() will then convert these to stepDefinition2, stepDefinition3 etc before suggesting the actual snippets. This then also takes account of methods that are already defined in the context (rather than just from this execution).

It doesn't actually look like we have test / feature coverage of that - I could add that and a quick comment here to make the behaviour clearer / more reliable in future?

Ah, now I realise what you mean. Yes, some test coverage and a comment would certainly help

@carlos-granados turns out the "numbering duplicate step definitions" behaviour is actually covered already in snippets.feature:

Behat/features/snippets.feature

Lines 104 to 114 in 2ea762d

#[Then('/^I should have "([^"]*)"$/')]

public function iShouldHave2($arg1): void

{

throw new PendingException();

}

#[Then('/^I should get a "([^"]*)":$/')]

public function iShouldGetA2($arg1, PyStringNode $string): void

{

throw new PendingException();

}

I have added comments to the new code, and also renamed from StepMethodNameGenerator::generate() to StepMethodNameSuggester::suggest() to make more obvious that these are just suggestions and may not be the final result (also makes it more consistent with $pattern->getSuggestedMethodName()`

carlos-granados

Looking mostly good, just added one comment

Renamed the interface and method to make it clearer that this code is only responsible for *suggesting* a method name, which may later be changed by the ContextSnippetGenerator. And added some comments to explain why suggested names do not need to be unique.

…licy The previous implementation introduced a new-but-deprecated method to attempt to handle the (unlikely) case that a user has a custom PatternPolicy suggesting step method names. In practice, the only thing that is actually *required* to produce valid code is that the name is not empty. We can easily handle that without extracting a complete internal method. Worst case, if a user's custom code is not lowercasing the first character of their method name, we may generate a method that does not follow their coding style. This feels like a reasonable trade-off for the improvement to code clarity / structure here.

carlos-granados

Looking good, great job @acoulton

acoulton added 7 commits April 25, 2025 15:05

refactor: Extract shared logic for suggesting a step method name

3fb4aa9

refactor: Convert Pattern to constructor promoted properties

74f350b

refactor: Add constructor promotion and strict types to Pattern

2a4749c

feature: Allow users to use or customise StepMethodNameGenerator service

c67c4e3

Expose it in the service container so that users (particularly those working in other languages) can provide a more complex implementation if they wish to do so.

change: Do not transliterate suggested step method names to ASCII

3aab293

Drop the behat/transliterator dependency and simply suggest step method names with the full utf8 characters used in the feature file.

acoulton commented Apr 25, 2025

View reviewed changes

src/Behat/Behat/Context/Snippet/Generator/ContextSnippetGenerator.php Outdated Show resolved Hide resolved

acoulton commented Apr 25, 2025

View reviewed changes

acoulton changed the title ~~[WIP] Refactor step method name generation and remove transliteration~~ change: Do not transliterate step text when suggesting step method names Apr 25, 2025

acoulton changed the title ~~change: Do not transliterate step text when suggesting step method names~~ change: Do not transliterate when suggesting method names for step definitions Apr 25, 2025

acoulton marked this pull request as ready for review April 25, 2025 15:40

acoulton requested a review from carlos-granados April 25, 2025 15:40

stof reviewed Apr 25, 2025

View reviewed changes

acoulton mentioned this pull request Apr 25, 2025

change: Mark that this package is now abandoned Behat/Transliterator#41

Merged

carlos-granados reviewed Apr 29, 2025

View reviewed changes

acoulton added 3 commits April 30, 2025 06:33

change: Make new method name suggestion classes final

b40c126

acoulton force-pushed the feat-remove-transliterator branch from 95bf39a to b40c126 Compare April 30, 2025 06:03

acoulton requested a review from carlos-granados April 30, 2025 06:11

carlos-granados approved these changes May 1, 2025

View reviewed changes

acoulton merged commit e2f7cc1 into Behat:master May 3, 2025
18 checks passed

acoulton deleted the feat-remove-transliterator branch May 3, 2025 08:49

acoulton mentioned this pull request May 3, 2025

Remove the dependency on behat/transliterator #1632

Closed

acoulton mentioned this pull request May 3, 2025

Conflict of Licenses for the behat/transliterator dependency #1343

Closed

		public function getCanonicalText()
		public function getSuggestedMethodName(): string

	#[Then('/^I should have "([^"]*)"$/')]
	public function iShouldHave2($arg1): void
	{
	throw new PendingException();
	}

	#[Then('/^I should get a "([^"]*)":$/')]
	public function iShouldGetA2($arg1, PyStringNode $string): void
	{
	throw new PendingException();
	}

Uh oh!

change: Do not transliterate when suggesting method names for step definitions #1633

change: Do not transliterate when suggesting method names for step definitions #1633

Uh oh!

Conversation

acoulton commented Apr 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

acoulton Apr 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carlos-granados left a comment

Choose a reason for hiding this comment

Uh oh!

carlos-granados left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

acoulton commented Apr 25, 2025 •

edited

Loading

acoulton Apr 25, 2025 •

edited

Loading