library: Support for unique test names in library #620
I've tried to cover as much background as I can given the limitations of markdown. Really this is about trying to arrive at a standard naming convention for Pickles in order for tests from many different test frameworks across the Gherkin ecosystem to be reconciled with their corresponding results.
It's similar, but not quite the same (I think) as some of the previous conversations on linking Gherkin with a test management system.
I've tried to include as much context as I can - and stick to the issue template. Though I'm sure it will read a bit funny. It might be best to look at the "context and motivation" section to understand why I think it would be useful.
It would be lovely if all Gherkin test results had a standardised identifier that was easily linkable to the Gherkin that the test result came from without any a priori knowledge of which result belongs to which feature file.
The issue as I see it is that each test framework uses a different naming convention for each test (pickle) - and the conventions in use are "lossy". For example, SpecFlow's MSTest output doesn't include line number information, and SpecFlow's XUnit output truncates test names past a certain number of characters (which bites on long scenario outline names with wide example tables). Karma-Gherkin-Feature's names were so inconsistent that we had to fork it because we just couldn't make them work.
In discussion with @brasmussen in the Gherkin Slack space, he suggested using file and line number - which may work (in principle) for automated tests - except the frameworks out there don't include this information in their test output - or at least, don't all include this information.
I wondered if we could encourage consistency in the tool ecosystem by providing one (or more) standard identifiers for tests from the Gherkin parser.
It wouldn't fix all problems, and it would need uptake from the folks who write the test frameworks, but if it's easier to be consistent than to be inconsistent, we might see improved interoperability between tools in the ecosystem.
One thing to note is that I think there's a marked difference between our automation and manual use cases (see context and motivation below).
Because of the new cross-language support in Gherkin 6, this seems like a real opportunity to improve interoperability across the ecosystem, since in time most test framework implementations would come to rely on the new parser (with whatever test ID scheme it contained).
Context & Motivation
We like ATDD/BDD!
We really like BDD/ATDD.
We are a large organisation (ca. 400 developers/testers)
We have a relatively small number of products, but they are large.
We have a large number (~16000) of Gherkin files describing our products. Most are written in the style of tests, but increasingly they are written as proper executable specifications.
We have a mixture of tools and technologies - some more historic, some less so.
Many of our automation tests are written in Gherkin. We have a number of tools in use - SpecFlow2.0 (MSTest), SpecFlow3.0 (XUnit), karma-jasmine-feature (maybe to be replaced with cucumber.js).
Are we finished yet?
Sometimes tests don't get run. Sometimes tests get disabled. In some frameworks (like SpecFlow), we get a result that says the test was ignored. In other frameworks, we don't get a result at all.
It's therefore really important for us to be able to reconcile the test results with the specification (i.e. the Gherkin) to know we've actually implemented everything we should (and someone hasn't accidentally left a test turned off because it was flickering, or something like that).
So we trace to our Gherkin specifications by tagging the Feature, Scenario, Outline or Example table with the external requirement ID (so we can review which detailed specs address each requirement, and whether we think that's enough).
Because we use a mixture of technologies, we also have a mixture of test frameworks (some SpecFlow/MSTest, some karma-gherkin-feature, some SpecFlow/XUnit, some cucumberjs). Each has a different convention for how it names pickles in the results - and all are in some way "lossy".
Our results can be quite fragmented - because our Gherkin is a behavioural specification (and we use them to generate product documentation), when they are written, we pay no mind to how fast or slow a behaviour is to test - only whether we have refined and specified the right behaviour.
This means that for a given feature file, some pickles are run at one (fast) stage of the pipeline, and some at later (e.g. overnight) stages of the pipeline, and at the end, we have to aggregate many different results files together and tie them back to the feature file (see earlier) to make sure we've covered everything.
In many of our pipelines (cloud and deployment environment), the tools don't have reliable access to the original Gherkin at the point the tests are run.
We have similar arrangements with our nightly builds - where tests are run off-line in a deployment environment.
Our cloud pipelines build the test executables, pack them into docker images, spin them out to multiple deployment environments in the cloud, and then post the results back.
Because of the multiplicity of tools involved (and their not-quite-consistent naming conventions), the fact that lots of those tools don't exist in the deployment environments, and the fact that many points in our build pipeline don't have access to both the Gherkin and the test results at the same time, reconciling our tests and test results is really quite challenging.
Also - because we really like BDD/ATDD, we don't pay any mind when writing feature files as to whether we will automate a given scenario or run it manually. The feature file is the specification - which doesn't really care how we intend to test it.
In these instances, we tag scenarios (or other taggable constructs) "@manualtest", and have a tool which generates a pro-forma PDF form which our manual testers use to complete their results.
Those PDFs are also reconciled against the original feature, showing that between the automation and manual test results, we have results covering the whole specification.
There's a lot here to parse @MattWherry!
I think that what I'm hearing is you'd like each pickle to have an identifier (ID) that is:
Do I have that right?
For (3) I presume (though I'm not entirely clear) you wouldn't want to include git revision info or anything in the ID itself, and that you could capture that separately?
Yeah. Sorry about that. The template said "motivation and context".... So....
I'll try to be briefer:
3.2: Would it be practical for a test framework to use something like that in a test name? For example, there are all sorts of rules about what's a reserved character or a valid symbol name in JS, C#, language-du-jour, etc.
But other than that - yes!
3.2 was what got me thinking about digests. Hexified, they're most likely short and ASCII enough to be valid symbol names in any auto code generation that test frameworks might use, and probably still just about unique enough to be useful.
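To make the digest idea concrete, here's a minimal Python sketch. The inputs chosen (feature path, scenario name, example row index) and the `t_` prefix are illustrative assumptions - a real scheme would be defined by the Gherkin parser, not by framework authors:

```python
import hashlib

def pickle_digest(feature_path: str, scenario_name: str, example_row: int = 0) -> str:
    """Hash the parts that identify a pickle into a short hex digest.

    The identity fields here are assumptions for illustration; the point
    is only that the result is short, stable, and ASCII.
    """
    identity = f"{feature_path}\n{scenario_name}\n{example_row}"
    digest = hashlib.sha256(identity.encode("utf-8")).hexdigest()[:12]
    # Prefix with a letter so the result is a valid identifier in
    # C#, Java, JavaScript, Python, etc.
    return f"t_{digest}"

test_id = pickle_digest("features/login.feature", "Successful login")
```

Twelve hex characters keeps collisions unlikely at the scale discussed here (~16000 feature files), while staying short enough to embed in generated test method names.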
Requirement 1 is just a tooling issue and should be easy enough to guarantee for any reasonable solution to the rest of the problems.
Requirement 2 more or less necessitates that the identifier be a distinct and explicit part of the Gherkin text in the feature file instead of being implicit like line numbers are. This could be a new addition to the language (we added the
Requirement 3 would be implementation specific unless all implementations generate results in the same way. I know that in the Ruby version of Cucumber, the test objects provided to the hooks provide access to the various pieces of the test. Presumably, the shared formatters would be a way to ensure that the hypothetical identifier was always included in the results.
Thanks for the feedback @enkessler -
Any modification is a 'reasonable' modification. ;)
Don't like the name of a test? Change it. An extra pop-up window now exists and has to be handled? Stick a new step in the test. The team remembers that that kind of thing shouldn't be handled at the feature file level? Yank the step back out. Your company rolls out a new standard that all tests must be written in 2nd person future perfect tense and/or Klingon? Everyone will immediately quit but, sure, go ahead and rewrite all of the Gherkin.
In none of those cases did any of the tests actually stop being the same tests. Still proving the same things for the same reason. The How may have changed but the What did not. That is why my view is that the only way to properly identify a test is by adding some aspect whose only point is to be an immutable identifier. By doing so, all of the other parts of the test which already have a purpose that they shouldn't be restricted from changing in order to fulfill are free to do whatever. The additional identifier property, on the other hand, should only change under one circumstance: when a human decides "yeah, this test isn't really the same test anymore".
Custom formatter. Bam! Unless the implementation that you are using (again, I'm only familiar with the Ruby implementation) does not allow access to all of the fiddly bits of the test via whatever object is handed to the formatter methods, you should be able to ensure that any information that you like is in a test result. In the past, I've made custom formatters to stick the results in a DB, and saving off the ID was just one more column to populate.
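The shape of that DB-backed formatter idea, sketched in Python (the real formatter API differs per framework, and the table layout here is just an assumption):

```python
import sqlite3

def record_result(db: sqlite3.Connection, test_id: str, name: str, status: str) -> None:
    """Persist one test result, keyed by a stable pickle ID.

    In a real formatter, test_id would be pulled off whatever object
    the framework hands to its hooks; here it's passed in directly.
    """
    db.execute(
        "INSERT INTO results (test_id, name, status) VALUES (?, ?, ?)",
        (test_id, name, status),
    )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (test_id TEXT, name TEXT, status TEXT)")
record_result(db, "t_3fa9c2d17b04", "Successful login", "passed")
```

With the ID stored alongside each result, reconciling results from many pipeline stages back to the feature files becomes a join on `test_id` rather than fuzzy name matching.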
Alternatively, a two step process:
Very thoughtful. Makes me think:
1: I wish all the implementations were like the Ruby one. Maybe that changes the complexion of my earlier thoughts. Maybe if the information were readily accessible in other test frameworks, just some reasonably unique feature-file/content/line based thing would be perfectly adequate for automated tests.
2: maybe the manual test thing is something we could bite internally.
3: Though that still makes my head spin c.f. CI pipeline and staging...
FTR: If a butterfly flaps its wings in Brazil, we would be rerunning our manual tests anyway. And roundly refusing any Klingon language policy changes. Or at least making the person proposing it rewrite the feature files and rerun the tests themselves :-)
Manually tagging is such a horrid solution though. I get your points, but it soooo, soooo would interfere with the fluency of the whole process.
Which is one reason we love it so much in the first place...
Certainly food for thought...
Manually tag things? My dear Matt, if I thought that I could trust a person to have the consistent motivation and accuracy to do it, I wouldn't have created a computer program to do it instead. :P
Unless I am misunderstanding what you mean?
All the person does is run a script and the tags appear. It sounds like you have both manual and automated tests living in the same feature files, so there doesn't even have to be a distinction between the two, as far as unique identification goes. The automated tests will just, presumably, have results produced more often.
I would love to try and find a solution that didn't involve putting additional stuff into the Gherkin text if possible, manually or automatically.
It has always seemed to me that a reasonable proxy for this human decision is if the scenario's name changes. The obvious problem with using this as your actual identifier is that it's unlikely a scenario name on its own would be unique within the whole possible set of pickles.
There are many different strategies for assigning a unique id to a pickle. They each have their strengths and weaknesses. We should pick one based on the capabilities we want from various applications.
The most basic capability we want is to link the result of a test case (pickle) back to the source. This can be done with a path:line id. These are unique and simple to implement. Applications that track multiple versions of results/sources can create composite keys based on the path:line id and e.g. the git sha and cucumber execution id.
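A tiny Python sketch of what those ids might look like (the separator characters and the exact fields in the composite key are assumptions, not a spec):

```python
def pickle_id(path: str, line: int) -> str:
    """The simple path:line id described above."""
    return f"{path}:{line}"

def composite_key(path: str, line: int, git_sha: str, execution_id: str) -> str:
    """A composite key for tools that track results across versions and runs.

    git_sha and execution_id would come from the surrounding tooling;
    the slash-joined layout is purely illustrative.
    """
    return f"{git_sha}/{execution_id}/{pickle_id(path, line)}"
```

The path:line id alone is enough to link a result to a source line; the composite key adds the "which version, which run" dimensions on top without changing the base id.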
Another more advanced capability is to model scenarios/tests/pickles as entities with an id that allows us to track and follow their evolution over time. Consider the following scenario:
A few weeks later this scenario has been refactored:
Many properties have changed:
Conceptually, this is the same scenario at different times. It's like looking at a picture of myself when I was 7 and when I was 45. It's still Aslak; he just moved, changed his name when he married (I'm old-fashioned - I didn't), grew taller and gained a bit of weight.
How do we track this evolution with files and file contents? This is a non-trivial problem to solve. There has been some great research done in this space, and I have been following it with passion and fascination for over 5 years. Here are some of the algorithms:
If we were to adopt one of these algorithms, we would have a way to assign unique ids to scenarios that would satisfy both of these capabilities - link results back to source, and follow the evolution of tests, treating them as entities.
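As a toy stand-in for those algorithms, here is a Python sketch of matching a refactored scenario to its old self by overall step similarity. The 0.6 threshold and the use of `difflib` are arbitrary choices for illustration - the real research algorithms are considerably more sophisticated:

```python
from difflib import SequenceMatcher

def same_scenario(old_steps: list[str], new_steps: list[str], threshold: float = 0.6) -> bool:
    """Guess whether two scenarios are the same entity after refactoring.

    Compares the concatenated step text; anything above the (arbitrary)
    threshold is treated as the same scenario having evolved.
    """
    ratio = SequenceMatcher(None, "\n".join(old_steps), "\n".join(new_steps)).ratio()
    return ratio >= threshold

old = ["Given a registered user", "When she logs in", "Then she sees her dashboard"]
new = ["Given a registered user named Aslak", "When he logs in", "Then he sees his dashboard"]
```

A matcher like this could carry an existing id forward across a rename, only minting a new id when similarity drops below the threshold - i.e. when the scenario really has become a different test.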
I think we should give it a go. The tricky part is to do it in a way that is portable across all our platforms without having to maintain half a dozen implementations. That is a general challenge we have for other components such as gherkin, cucumber expressions, tag expressions etc. So far it looks like we'll try to implement this shared functionality in Go, and distribute it for many platforms. My preference would be to use wasm as the cross-platform format, but wasm execution isn't well supported across platforms yet (although most of them now compile to wasm).
Unfortunately, it is the only way that comes to mind when I think of an 'explicit' style solution. For an 'implicit' style solution, @aslakhellesoy has better ideas but, as he mentioned, they are non-trivial. I tend to lean towards an 'explicit' solution because there is no guesswork involved, however intelligent that guessing may be.
I once again must envy the environments in which many members of this community seem to work. At the kind of places where I have worked, if a month went by without some feature or other needing a touch up due to the hasty/uninformed/lazy manner in which it was written then I would be pleasantly surprised. ;)