Skip to content

Conversation

@acoulton
Copy link
Contributor

@acoulton acoulton commented Apr 25, 2025

As discussed in #1632, we should just remove our behat/transliterator dependency. We use very little of it, it is out of date, and it avoids the licensing issues discussed in Behat/Transliterator#3 and #1343.

This PR starts by refactoring the duplicated code that generates method names into a single service that implements the new StepMethodNameGenerator interface. This would allow people to define their own implementations if they want something more complex (or with more dependencies) than we provide out of the box.

Our default SimpleStepMethodNameGenerator then does the bare minimum to produce a unicode name that is syntactically valid for PHP.

This is all backwards compatible at the code level (including if people have registered a custom PatternPolicy class).

At a functional level, it obviously changes the method names we'll suggest - including that there are some cases where we'll now just suggest a generic name where previously we would have attempted to transliterate.

Previously, most of this was done within duplicated methods in the
`PatternPolicy` implementations and passed into the `Pattern` as
`canonicalText`.

There are lots of potential interpretations of the `canonical text` of a
step but in practice we were only using this as the basis of the
suggested method name when generating snippets. This suggestion was
processed almost entirely in the `PatternPolicy` class with one final
step in the `ContextSnippetGenerator` to lowercase the first character.

It makes more sense to consolidate all that logic together, and to make
explicit that this property on the `Pattern` is the suggested method
name.
Expose it in the service container so that users (particularly those
working in other languages) can provide a more complex implementation
if they wish to do so.
Drop the behat/transliterator dependency and simply suggest step method
names with the full utf8 characters used in the feature file.
Just remove any characters that are not within the allowed set
for method names (including digits at the start of the name),
and fall back to the generic `stepDefinition1` if the text
is completely invalid.

Since we may now have utf8 characters in the suggestion, we
also need to make the case conversions (to lowerCamelCase)
multibyte-safe. Fortunately, Behat already depends on
ext-mbstring.
#[Then('/^Добавить "([^"]*)" число$/')]
public function dobavitChislo($arg1): void
public function stepDefinition1($arg1): void
Copy link
Contributor Author

@acoulton acoulton Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will now just suggest generic names for some non-latin languages where the entire character set is outside the a-zA-Z0-9_\x80-\xff unicode range.

We could attempt to detect whether ext-intl is present and use the native Transliterator class as suggested by @stof in #1632.

However I'd be interested to hear what Russian etc speakers would actually want to have here - is a straight machine transliteration useful, or would they likely end up changing it anyway?

My gut feel is it would make more sense to leave this as-is "out of the box". It would now be easy for us (or a.n. other) to implement a Behat extension with a hard dependency on ext-intl for more complex cases like this.

Then 5 should have value of £10 # FeatureContext::shouldHaveValueOf£()
TODO: write pending definition
And 7 should have value of £7.2 # FeatureContext::shouldHaveValueOfPs()
And 7 should have value of £7.2 # FeatureContext::shouldHaveValueOf£()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think the new ValueOf£ makes more sense than ValueOfPs anyway!

#[Given('I have a package v2.5')]
public function iHaveAPackageV(): void
public function iHaveAPackageV25(): void
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be trivial to strip trailing numbers as before. However I'm not sure it's warranted (bearing in mind this number is in the hardcoded step text, it is not a placeholder).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case is fine to me

Comment on lines -52 to +37
public function getCanonicalText()
public function getSuggestedMethodName(): string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unified diff looks at a glance like I've just renamed the method - in fact getCanonicalText is still present but deprecated, and getSuggestedMethodName is a new method.

Comment on lines +20 to +25
public function __construct(
private string $suggestedMethodName,
private string $pattern,
private int $placeholderCount = 0,
) {
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These previously had PHPDoc typehints, I would also be very surprised if anyone externally is creating a Pattern instance and certainly not with arguments that won't be able to cast to these types. I think therefore it's safe to make them strict.

@acoulton acoulton changed the title [WIP] Refactor step method name generation and remove transliteration change: Do not transliterate step text when suggesting step method names Apr 25, 2025
@acoulton acoulton changed the title change: Do not transliterate step text when suggesting step method names change: Do not transliterate when suggesting method names for step definitions Apr 25, 2025
@acoulton acoulton marked this pull request as ready for review April 25, 2025 15:40
private function getMethodName(string $contextClass, string $canonicalText, string $pattern): string
{
$methodName = $this->deduceMethodName($canonicalText);
$methodName = SimpleStepMethodNameGenerator::cleanupMethodName($canonicalText);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we trigger a deprecation here when the cleaned value is different from the provided argument, to ask using the StepMethodNameGenerator in the pattern policy ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is we have no neat way to report that deprecation to the user at the moment, and even if we did if they are consuming a PatternPolicy from an extension that could be difficult for them to resolve in the short term.

Considering all the method actually does is lowercase the first letter and backfill empty strings, I almost considered removing it now - worst case someone will get a suggested method with an uppercase first character (or perhaps no name) and have to fix it in their Context class... It's not going to actually break anything. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlos-granados I have reflected more on @stof's comment I think the benefit of lowercasing the first character (in the unlikely situation someone is using a custom PatternPolicy) is limited for the cost of exposing this method and changing it again at some future date. I've reworked to just handle the case where the method name is empty, which avoids having to have this method at all.

* @deprecated provided only for BC, to allow the ContextSnippetGenerator to continue doing this cleanup for any
* steps that were generated by a third-party PatternPolicy that does not use our StepMethodNameGenerator
*/
public static function cleanupMethodName(string $name): string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If several steps end up with an empty name and we want to use the generic name we will get several functions named stepDefinition1 which is not ideal. Perhaps you could have a static int property which is initialised to 1 and appended to stepDefinition and then increased?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although they will all be suggested as stepDefinition1 initially ContextSnippetGenerator::getUniqueMethodName() will then convert these to stepDefinition2, stepDefinition3 etc before suggesting the actual snippets. This then also takes account of methods that are already defined in the context (rather than just from this execution).

It doesn't actually look like we have test / feature coverage of that - I could add that and a quick comment here to make the behaviour clearer / more reliable in future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now I realise what you mean. Yes, some test coverage and a comment would certainly help

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlos-granados turns out the "numbering duplicate step definitions" behaviour is actually covered already in snippets.feature:

#[Then('/^I should have "([^"]*)"$/')]
public function iShouldHave2($arg1): void
{
throw new PendingException();
}
#[Then('/^I should get a "([^"]*)":$/')]
public function iShouldGetA2($arg1, PyStringNode $string): void
{
throw new PendingException();
}

I have added comments to the new code, and also renamed from StepMethodNameGenerator::generate() to StepMethodNameSuggester::suggest() to make more obvious that these are just suggestions and may not be the final result (also makes it more consistent with $pattern->getSuggestedMethodName()`

Copy link
Contributor

@carlos-granados carlos-granados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking mostly good, just added one comment

Renamed the interface and method to make it clearer that this code
is only responsible for *suggesting* a method name, which may
later be changed by the ContextSnippetGenerator. And added some
comments to explain why suggested names do not need to be unique.
…licy

The previous implementation introduced a new-but-deprecated method to
attempt to handle the (unlikely) case that a user has a custom
PatternPolicy suggesting step method names.

In practice, the only thing that is actually *required* to produce
valid code is that the name is not empty. We can easily handle that
without extracting a complete internal method.

Worst case, if a user's custom code is not lowercasing the first
character of their method name, we may generate a method that does
not follow their coding style. This feels like a reasonable trade-off
for the improvement to code clarity / structure here.
@acoulton acoulton force-pushed the feat-remove-transliterator branch from 95bf39a to b40c126 Compare April 30, 2025 06:03
Copy link
Contributor

@carlos-granados carlos-granados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, great job @acoulton

@acoulton acoulton merged commit e2f7cc1 into Behat:master May 3, 2025
18 checks passed
@acoulton acoulton deleted the feat-remove-transliterator branch May 3, 2025 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants