TAP version 14 - First Draft #36

Closed

@isaacs
Contributor

isaacs commented Sep 7, 2015

This is a first draft at a specification that seeks to ratify existing
behavior of TAP harnesses and producers.

  • YAML blocks standardized to 2 space indentation
  • Subtests specified to behavior of Test::More and node-tap.
  • Normative advice regarding exit code for harness programs
  • Examples and usage comments made language-agnostic.
  • Clarification of whitespace and hyphens in test lines.
  • Clarification of handling of incorrect lines.
  • Specification of Pragma lines

It'd be very helpful for implementors to point to a body of
language-agnostic tests for compliant parsers. I've got a pretty nice
start at this over at
https://github.com/substack/tap-parser/tree/master/test/fixtures, but
we may want to bikeshed the event/property names a bit, and ideally I'd
like to get at least one other implementation passing those tests before
we sign off on them.

Note that this does not go as far as a lot of people would probably like
to see in a forward-looking TAP specification. No fancy new magic is
added. However, before we start talking about brand new features, it
seems wise to ratify the features we are already using.

With this change, node-tap and Test::More should be able to simply
change their version number from 13 to 14 in order to be fully
compliant. Feedback from other TAP producers and harnesses is
necessary before adopting this officially.

isaacs force-pushed the tap14 branch 2 times, most recently from 6c6d974 to 561d609, September 7, 2015 16:21
@isaacs
Contributor Author

isaacs commented Sep 7, 2015

Review welcome: @AndyA @gaurav @jonathanKingston @kinow @Leont @Ovid

`Bail out!` in the parent test.

If a Subtest TAP stream does not include a version number, it MUST be
interpreted as a TAP14 stream anyway. This is a backwards-compatible

The idea being that in some future TAP14+n with some different behaviour could have subtests that are interpreted using TAP14 semantics? And the context here is that TAP14 is a codification of how people have been using/extending TAP13 in practice?

Contributor Author

Yes, the context of this spec is to codify how we're extending TAP13 in practice.

The wording of this is tricky. The issue here is that you don't want subtests to be interpreted as a TAP12 stream (since they'll likely have subtests and yaml diags), but including a TAP version XX line in a subtest upsets some existing parsers that don't require that the version start on column 0, so it must be acceptable to omit the version designator.

Perhaps it's better to say that, in absence of a TAP version XX in the subtest, it's assumed to be the same version as the parent test? (Ie, 14+)

That's kinda what I was thinking.. as useful as it would be to be able to have some older tests output appear as subtests, it seems like the facilities to produce them in a way that matches this spec is somewhat mutually exclusive of requiring this contingency.

Member

Yeah, I don't think people are restreaming different versions very much; I personally think the format doesn't really lend itself to that.

Contributor Author

For stuff like t.test('child test', function (t) { ... }), yeah, you're going to definitely use the same version, because you're generating TAP with the same test framework. But for spawning child processes, or especially collating tests from users of a web service, it's possible to get more random stuff showing up.

Member

Certainly, I just don't think TAP handles those test cases very well at all. With the fact that the debug isn't there for the process etc.

Allowing more than one TAP version in a stream sounds like an implementors nightmare to me. I don't think it's worth the complication.

Contributor Author

> Allowing more than one TAP version in a stream sounds like an implementors nightmare to me.

Yeah, I agree, and in practice I'd probably just rely on TAP's relatively reliable backward compatibility, and parse it all as TAP14.

Since TAP14 is a superset of TAP12 (mostly; it's a bit stricter about yaml indentation, but in practice, everyone uses 2 spaces, not 1), and using other versions is unlikely anyway, it seems like it might just be reasonable to say that any embedded subtest SHOULD be TAP14?
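
The rule floated in this thread (a subtest with no version line is read with the parent's version, which in practice means 14) could be sketched roughly as follows. The helper name `effective_version` is hypothetical, and the sketch assumes the subtest lines have already been de-indented; it is an illustration of the idea, not wording from the draft.

```python
import re

# Matches a version line such as "TAP version 14".
VERSION_LINE = re.compile(r"^TAP version (\d+)\s*$")

def effective_version(subtest_lines, parent_version=14):
    """Return the version to parse a subtest with.

    Hypothetical helper: if the first non-blank line of the (already
    de-indented) subtest declares a version, use it; otherwise inherit
    the parent's version.
    """
    for line in subtest_lines:
        if not line.strip():
            continue  # ignore leading blank lines
        match = VERSION_LINE.match(line)
        return int(match.group(1)) if match else parent_version
    return parent_version  # empty subtest: inherit from the parent
```

A consumer following the "parse it all as TAP14" approach mentioned above would simply ignore the returned value for anything below 14.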

@kinow
Member

kinow commented Sep 7, 2015

Hi, sorry, I missed the discussion for this draft, so I'm not sure I'll be able to be of much help reviewing it.

@isaacs
Contributor Author

isaacs commented Sep 7, 2015

@kinow There hasn't been all that much discussion, tbh. There have been a few conversations about making TAP14 specify what people are already doing to extend TAP13, and add some clarifications about edge cases, so I just wrote down what node-tap and Test::More seem to be doing today. If that's divergent from what your TAP programs do, I'd love to surface those differences.

ok
```

has five tests. The sixth is missing. For example, in Perl,

Member

Can this be made language-independent, or can sections like this be clearly demarcated?

The W3C marks these as 'non-normative'.

Contributor Author

Good idea.

@Leont

Leont commented Dec 11, 2015

> Is there a valid use case for multiple plan lines?

No, I don't think so.

> Initially I thought that prove might be deficient as a TAP producer, but the behavior seemed very reasonable the more that I thought about it. I've encountered a number of TAP producer libraries that specify a plan per file.

prove is not a TAP producer, it's a TAP consumer. The -v makes it output whatever it got as input. Most likely you're running multiple test files, and the prove output will contain a concatenation of it (or worse, a mix if you're using -j). prove -v's output is not intended to be parsed as TAP (though the idea of it producing a summary TAP is interesting).

@mblayman
Member

Sorry. I didn't intend to misrepresent prove. It "smelled" like a producer to me with the -v flag on. 😄

What about the case when multiple producers share the same output stream (e.g., STDOUT)? For instance, a project might have a Python server backend and an Ember JS frontend. A reasonable CI build may do something like:

$ nosetests --with-tap --tap-stream server
TAP version 13
... a stream of Python tests here ...
1..42
$ cd client
$ ember test
TAP version 13
... a stream of Ember tests here ...
1..24

As TAP currently exists, any TAP CI plugin would be unable to aggregate that. The results would have to be sent to separate files or something to do it correctly.

@Leont

Leont commented Dec 11, 2015

> As TAP currently exists, any TAP CI plugin would be unable to aggregate that. The results would have to be sent to separate files or something to do it correctly.

That's exactly the type of thing that prove is doing: aggregating different TAP results.

@mblayman
Member

For sure, prove can aggregate things from separate streams like different test files in a t directory. I'm not able to get it to handle the case I'm talking about. Maybe I don't know the right flag. ¯_(ツ)_/¯

To simulate, I made a fake stream that I stored in a file, sample-stream.txt. It contained:

$ tap-producer-1
TAP version 13
ok 1 - something passed
not ok 2 - something failed
1..2

$ tap-producer-2
TAP version 13
ok 1 - a different producer
ok 2 - doing different things
1..2

Then I ran cat sample-stream.txt | prove and got an error message. I don't mean to pick on prove. It's a good tool and I have some familiarity with it. What I'm attempting to demonstrate is a fan-in problem. A lot of consumers have no trouble with separate TAP streams if they are broken out into multiple files (prove included). When TAP streams share the same output stream, I think the protocol runs into some trouble.

@Leont

Leont commented Dec 12, 2015

Sounds like what you want is a TAP multiplexing protocol. While I can imagine that would be useful in some scenarios, it also sounds like something that would complicate consumers (and probably producers too) that don't need such features.

@mblayman
Member

This is the place and time to discuss changes that could affect consumer/producer behavior, right?

Maybe you didn't intend it, but it feels very dismissive to state that some change would complicate consumers/producers when the reality is that any change can potentially complicate consumers/producers. For instance, subtests will definitely complicate consumer behavior. That doesn't mean that they were thrown out from consideration.

Back to the subject, I think it would be possible to support multiple TAP streams in a single output stream without being a separate protocol. Some kind of "end of TAP" marker could signal a consumer to treat the stream as multiple streams. I think it would be similar to how consumers handle and aggregate multiple files.

Does anyone see hidden gotchas in having an EOTAP line?
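
To make the fan-in idea concrete, here is a rough sketch of how a consumer might split one shared output stream into separate TAP documents at an end marker. The literal marker text `EOTAP` and the function name `split_streams` are placeholders for illustration; nothing in this thread settles on an actual syntax.

```python
def split_streams(lines, marker="EOTAP"):
    """Split an iterable of lines into a list of TAP documents.

    A line consisting solely of `marker` (a hypothetical end-of-TAP line)
    closes the current document. Trailing lines with no marker after them
    form the final document.
    """
    documents, current = [], []
    for line in lines:
        if line.strip() == marker:
            documents.append(current)
            current = []
        else:
            current.append(line)
    if current:
        documents.append(current)
    return documents

# Two producers writing to the same stream, separated by the marker.
combined = [
    "TAP version 13", "ok 1 - something passed",
    "not ok 2 - something failed", "1..2",
    "EOTAP",
    "TAP version 13", "ok 1 - a different producer",
    "ok 2 - doing different things", "1..2",
]
for number, document in enumerate(split_streams(combined), start=1):
    print(number, document[-1])
```

Each resulting document could then be handed to an ordinary single-stream parser, much as consumers already do with multiple files.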

@isaacs
Contributor Author

isaacs commented Jan 8, 2017

I would prefer not to add support for backslash-escaping to TAP. It's a slippery slope to tests with \n in the name, etc. And then we'd definitely have to be able to escape backslash itself, or else how could you have a TODO test that ends in \? It's also complicated by the fact that the slash has to be escaped in the output, and also again in the input, and so in most languages (that use backslashes for escapes in strings), we have the new RegExp('\\\\') type of anti-usability noise.

I think the right answer here is to say that Test::More should remove support for escaping # characters in a future major version bump, or retain the behavior but with a caveat that it's out of spec.

Node-tap does not support any kind of escaping in test names, I'm not sure about other harnesses or test frameworks.

@3cp

3cp commented Jun 16, 2019

I am new here. I see this draft is four years old. Is TAP version 14 still ongoing, or has it been abandoned?

@bjh83
Contributor

bjh83 commented Dec 16, 2019

From the above closed issues, it kind of looks like you are merging subfeatures of this PR instead of everything all at once; that makes sense. Nevertheless, I am curious about the status of some of the features it introduces like subtests; could you comment on when that will be added to the spec?

We are already using the subtests feature on the Linux kernel, and it would be nice to see the current version of the spec reflect this. At this point subtests are something we depend on and I really don't want this to result in us forking your spec.

@isaacs
Contributor Author

isaacs commented Dec 17, 2019

@bjh83 @3cp The TAP specification is somewhat in a weird state, I have to say.

No one with the moral authority to dictate the spec has the time or inclination to do so. So, what's happened is that a bunch of implementers have just gone about implementing in a way that makes sense to them.

For myself, as the maintainer of node-tap, yes, I've basically already "ratified" this specification years ago, and have implemented all of it. Test::More and a few other CPAN modules provided the original inspiration, and it was a specification of their observed behavior, so they're also in line. It might be incomplete, but as a description of current reality, it's fairly accurate.

What it isn't is normative, because there is no governing body with authority to say what TAP is or should be. It's a peaceful anarchy, with the "specification" of the protocol only held in line by a shared desire to interoperate.

It's a good protocol for its stated purpose, and anything that deviates too sharply would be something other than TAP, so it's probably safe to rely on, assuming you're somewhat strict in what you send and loose in what you accept.

Someday when I have time, I might push forward with this, but for now, other things are higher priority.

The Plan MUST appear exactly once, EITHER:

- the first line of TAP output after the Version, or
- the last line of TAP output prior to the end of the TAP stream.

What about comments after a final plan?

Contributor Author

Good question. In practice, we all put comments after the closing plan, so I think the assumption here is that comments don't count as a "line of TAP output". (Which is an understandable assumption, I think, but also weird and should be called out.)
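
One way to read the plan rule together with the assumption above (comments do not count as a "line of TAP output") is sketched below. The function name `plan_position_ok` and both regular expressions are assumptions made for illustration; the sketch only looks at top-level lines and does not try to be a full TAP parser.

```python
import re

# A top-level plan line such as "1..5" or "1..0 # SKIP nothing to do".
PLAN = re.compile(r"^1\.\.\d+(\s+#.*)?$")
VERSION = re.compile(r"^TAP version \d+$")

def plan_position_ok(lines):
    """Check that exactly one plan exists and that it is first or last.

    Blank lines, comment lines (starting with '#'), and the version line
    are ignored when deciding what counts as first or last.
    """
    significant = [
        line for line in lines
        if line.strip() and not line.startswith("#") and not VERSION.match(line)
    ]
    plan_indexes = [i for i, line in enumerate(significant) if PLAN.match(line)]
    if len(plan_indexes) != 1:
        return False
    return plan_indexes[0] in (0, len(significant) - 1)
```

Under this reading, a trailing `# ...` comment after the closing plan does not make the stream invalid, which matches what producers do in practice according to the comment above.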

@Leont

Leont commented Dec 19, 2019

> The TAP specification is somewhat in a weird state, I have to say.

Agreed.

> No one with the moral authority to dictate the spec has the time or inclination to do so.

Without a decision making process and clear requirements, the process is doomed.

> For myself, as the maintainer of node-tap, yes, I've basically already "ratified" this specification years ago, and have implemented all of it.

I've done essentially the same in prove6 (though I just spotted that I had missed pragmas).

Krinkle added a commit to qunitjs/js-reporters that referenced this pull request Jan 17, 2021
This is to accommodate node-tap and tape, which allow for child
tests to be associated directly with other assertion-holding tests
(as opposed to having tests only contain assertions, and suites
contain only tests and other suites).
Ref #126.

It also allows for future compatibility with TAP 14, which currently
has no concept of test groups or test suites, but is considering
the addition of "sub tests".
Ref TestAnything/testanything.github.io#36.

Also:

- Define "Adapter" and "Producer" terms.

- Refer mostly to producers and reporters, instead of frameworks,
  runners, or adapters.

- Remove mention that the spec is for reporting information about
  JavaScript test frameworks, it can report information about any
  kind of test that can be represented in its structure of JSON
  messages.
  Instead, do clarify that the spec defines a JavaScript-based
  API of producers and reporters.

Thought dump:

In aggregation, simplify status to failed/passed only,
if something has only todo or skipped children, don't
propagate this like we did with suites, but cast it down
to only failed/passed, as we did with "run" before.

This is because, with the "suite" concept gone, we can't
assume that test parents only contained other tests, they
may have their own assertions. As such, a parent with only
two skipped children doesn't mean the parent can therefore
be marked as skipped, rather it will be marked as passed,
assuming no errors/failures reported.

This affects the adapters for QUnit/Mocha/Jasmine, but when
frameworks implement this themselves, they can of course
know whether an entire suite was explicitly skipped,
in which case they can mark it accordingly.
Krinkle added a commit to qunitjs/js-reporters that referenced this pull request Jan 22, 2021
== Status quo ==

The TAP 13 specification does not standardise a way of describing parent-child
relationships between tests, nor does it standardise how to group tests.

Yet, all major test frameworks have a way to group tests (e.g. QUnit module,
and Mocha suite) and/or allow nesting tests inside of other tests (like tape,
and node-tap). While the CRI draft provided a way to group tests, it did not
accommodate Tap. They would either need to flatten the tests with a separator
symbol in the test name, or to create an implied "Suite" for every test that
has non-zero children and then come up with an ad-hoc naming scheme for it.

Note that the TAP 13 reporter we ship, even after this change, still ends up
flattening the tests by default using the greater than `>` symbol, but at least
the event model itself recognises the relationships so that other output formats
can make use of it, and in the future TAP 14 hopefully will recognise it as
well, which we can then make use of.

Ref TestAnything/testanything.github.io#36.

== Summary of changes ==

See the diff of `test/integration/reference-data.js` for the concrete changes
this makes to the consumable events.

- Remove `suiteStart` and `suiteEnd` events.

  Instead, the spec now says that tests are permitted to have children.

  The link from child to parent remains the same as before, using the `fullName`
  field which is now a stack of test names. Previously, it was a stack of suite
  names with a test name at the end.

- Remove all "downward" links from parent to child. Tests don't describe
  their children upfront in detail, and neither does `runStart`. This
  information was very repetitive and tedious to satisfy for implementors, and
  encouraged or required inefficient use of memory.

  I do recognise that a common use case might be to generate a single output
  file or stream where real-time updates are not needed, in which case you
  may want a convenient tree that is ready to traverse without needing to
  listen for async events and put it together. For this purpose, I have added a
  built-in reporter that simply listens to the new events and outputs a "summary"
  event with an object that is similar to the old "runEnd" event object where
  the entire run is described in a single large object.

- New "SummaryReporter" for simple use cases of non-realtime traversing of
  single structure after the test has completed.

== Caveats ==

- A test with the "failed" status is no longer expected to always have
  an error directly associated with it.

  Now that tests aggregate into other tests rather than into suites,
  this means tests that merely have other tests as children do still
  have to send a full testEnd event, and thus an `errors` and `assertions`
  array.

  I considered specifying that errors have to propagate but this seemed
  messy and could lead to duplicate diagnostic output in reporters, as well
  as ambiguity or uncertainty over where errors originated.

- A suite containing only "skipped" tests now aggregates as "passed"
  instead of "skipped". Given we can't know whether a suite is its own
  test with its own assertions, we also can't assume that if a test parent
  has only "skipped" children that the parent was also skipped.

  This applies to our built-in adapters, but individual frameworks, if they
  know that a suite was skipped in its entirety, can of course still set the
  status of parents however they see fit.

- Graphical reporters (such as QUnit and Mocha's HTML reporters) may no
  longer assume that a test parent has either assertions/errors or other
  tests. A test parent can now have both its own assertions/errors, as well
  as other tests beneath it.

  This restricts the freedom and possibilities for visualisation.
  My recommendation is that, if a visual reporter wants to keep using different
  visual shapes for "group of assertions" and "group of tests", that they
  buffer the information internally such that they can first render all the
  test's own assertions, and then render the children, even if they originally
  ran interleaved and/or the other way around.
  Ref #126.
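
The aggregation rule described in these caveats (a parent collapses to passed or failed, and skipped-only children do not make the parent skipped) might be sketched like this. The function name `aggregate_status` and its two parameters are hypothetical and are not taken from the js-reporters code; only the status names come from the text above.

```python
def aggregate_status(own_failed, child_statuses):
    """Collapse a parent test's status to "passed" or "failed".

    own_failed: whether the parent's own assertions reported a failure.
    child_statuses: statuses of the direct children, for example
    "passed", "failed", "skipped", or "todo".
    """
    if own_failed or "failed" in child_statuses:
        return "failed"
    # Children that are all skipped or todo do not make the parent skipped
    # or todo, because the parent may hold assertions of its own.
    return "passed"
```

A framework that knows a whole group was explicitly skipped could bypass a helper like this and set the parent's status directly, as the caveat notes.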

== Misc ==

- Add definitions for the "Adapter" and "Producer" terms.

- Use terms "producer" and "reporter" consistently, instead of
  "framework", "runner", or "adapter".

- Remove mention that the spec is for reporting information from
  "JavaScript test frameworks". CRI can be used to report information
  about any kind of test that can be represented in CRI's event model,
  including linting and end-to-end tests for JS programs, as well as
  non-JS programs. It describes a JS interface for reporters, but the
  information can come from anywhere.

  This further solidifies that CRI is not meant to be used for "hooking"
  into a framework, and sets no expectation about timing or run-time
  environment being shared with whatever is executing tests in some
  form or another. This was already the intent originally, since it could
  be used to report information from other processes or from a cloud-based
  test runner like BrowserStack, but this removes any remaining confusion
  or doubt there may have been.

Fixes #126.
Krinkle added a commit to qunitjs/js-reporters that referenced this pull request Jan 22, 2021
== Status quo ==

The TAP 13 specification does not standardise a way of describing parent-child
relationships between tests, nor does it standardise how to group tests.

Yet, all major test frameworks have a way to group tests (e.g. QUnit module,
and Mocha suite) and/or allow nesting tests inside of other tests (like tape,
and node-tap). While the CRI draft provided a way to group tests, it did not
accommodate Tap. They would either need to flatten the tests with a separator
symbol in the test name, or to create an implied "Suite" for every test that
has non-zero children and then come up with an ad-hoc naming scheme for it.

Note that the TAP 13 reporter we ship, even after this change, still ends up
flattening the tests by default using the greater than `>` symbol, but at least
the event model itself recognises the relationships so that other output formats
can make use of it, and in the future TAP 14 hopefully will recognise it as
well, which we can then make use of.

Ref TestAnything/testanything.github.io#36.

== Summary of changes ==

See the diff of `test/integration/reference-data.js` for the concrete changes
this makes to the consumable events.

- Remove `suiteStart` and `suiteEnd` events.

  Instead, the spec now says that tests are permitted to have children.

  The link from child to parent remains the same as before, using the `fullName`
  field which is now a stack of test names. Previously, it was a stack of suite
  names with a test name at the end.

- Remove all "downward" links from parent to child. Tests don't describe
  their children upfront in detail, and neither does `runStart`. This
  information was very repetitive and tedious to satisfy for implementors, and
  encouraged or required inefficient use of memory.

  I do recognise that a common use case might be to generate a single output
  file or stream where real-time updates are not needed, in which case you
  may want a convenient tree that is ready to traverse without needing to
  listen for async events and put it together. For this purpose, I have added a
  built-in reporter that simply listens to the new events and outputs a "summary"
  event with an object that is similar to the old "runEnd" event object where
  the entire run is described in a single large object.

- New "SummaryReporter" for simple use cases of non-realtime traversing of
  single structure after the test has completed.

== Caveats ==

- A test with the "failed" status is no longer expected to always have
  an error directly associated with it.

  Now that tests aggregate into other tests rather than into suites,
  this means tests that merely have other tests as children do still
  have to send a full testEnd event, and thus an `errors` and `assertions`
  array.

  I considered specifying that errors have to propagate but this seemed
  messy and could lead to duplicate diagnostic output in reporters, as well
  as ambiguity or uncertainty over where errors originated.

- A suite containing only "skipped" tests now aggregates as "passed"
  instead of "skipped". Given we can't know whether a suite is its own
  test with its own assertions, we also can't assume that if a test parent
  has only "skipped" children that the parent was also skipped.

  This applies to our built-in adapters, but individual frameworks, if they
  know that a suite was skipped in its entirety, can of course still set the
  status of parents however they see fit.

- Graphical reporters (such as QUnit and Mocha's HTML reporters) may no
  longer assume that a test parent has either assertions/errors or other
  tests. A test parent can now have both its own assertions/errors, as well
  as other tests beneath it.

  This restricts the freedom and possibilities for visualisation.
  My recommendation is that, if a visual reporter wants to keep using different
  visual shapes for "group of assertions" and "group of tests", that they
  buffer the information internally such that they can first render all the
  test's own assertions, and then render the children, even if they originally
  ran interleaved and/or the other way around.
  Ref #126.

- The "Console" reporter that comes with js-reporter now no longer
  uses `console.group()` for collapsing nested tests.

== Misc ==

- Add definitions for the "Adapter" and "Producer" terms.

- Use terms "producer" and "reporter" consistently, instead of
  "framework", "runner", or "adapter".

- Remove mention that the spec is for reporting information from
  "JavaScript test frameworks". CRI can be used to report information
  about any kind of test that can be represented in CRI's event model,
  including linting and end-to-end tests for JS programs, as well as
  non-JS programs. It describes a JS interface for reporters, but the
  information can come from anywhere.

  This further solidifies that CRI is not meant to be used for "hooking"
  into a framework, and sets no expectation about timing or run-time
  environment being shared with whatever is executing tests in some
  form or another. This was already the intent originally, since it could
  be used to report information from other processes or from a cloud-based
  test runner like BrowserStack, but this removes any remaining confusion
  or doubt there may have been.

Fixes #126.
Krinkle added a commit to qunitjs/js-reporters that referenced this pull request Jan 22, 2021
== Status quo ==

The TAP 13 specification does not standardise a way of describing parent-child
relationships between tests, nor does it standardise how to group tests.

Yet, all major test frameworks have a way to group tests (e.g. QUnit module,
and Mocha suite) and/or allow nesting tests inside of other tests (like tape,
and node-tap). While the CRI draft provided a way to group tests, it did not
accommodate Tap. They would either need to flatten the tests with a separator
symbol in the test name, or to create an implied "Suite" for every test that
has non-zero children and then come up with an ad-hoc naming scheme for it.

Note that the TAP 13 reporter we ship, even after this change, still ends up
flattening the tests by default using the greater than `>` symbol, but at least
the event model itself recognises the relationships so that other output formats
can make use of it, and in the future TAP 14 hopefully will recognise it as
well, which we can then make use of.
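
As a rough illustration of that flattening, the sketch below joins a stack of test names with `>` and prints TAP 13 test lines. It is a minimal sketch under assumptions, not the reporter that js-reporters ships: `flattenName` and `tapLine` are hypothetical helpers, only the "passed", "failed" and "skipped" statuses are handled, and the exact spacing of the real output may differ.

```javascript
// Hypothetical helper: flatten a stack of test names into one description.
function flattenName(fullName) {
  return fullName.join(' > ');
}

// Hypothetical helper: format one finished test as a TAP 13 test line.
function tapLine(index, test) {
  const ok = test.status === 'failed' ? 'not ok' : 'ok';
  const directive = test.status === 'skipped' ? ' # SKIP' : '';
  return `${ok} ${index} ${flattenName(test.fullName)}${directive}`;
}

// Illustrative data only.
const finished = [
  { fullName: ['math', 'add', 'handles negatives'], status: 'passed' },
  { fullName: ['math', 'divide'], status: 'failed' },
  { fullName: ['io', 'reads file'], status: 'skipped' }
];

const lines = [
  'TAP version 13',
  ...finished.map((test, i) => tapLine(i + 1, test)),
  `1..${finished.length}`
];
console.log(lines.join('\n'));
```

The parent-child relationship survives only inside the description text, which is why a consumer of this output cannot reliably rebuild the tree.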

Ref TestAnything/testanything.github.io#36.

== Summary of changes ==

See the diff of `test/integration/reference-data.js` for the concrete changes
this makes to the consumable events.

- Remove `suiteStart` and `suiteEnd` events.

  Instead, the spec now says that tests are permitted to have children.

  The link from child to parent remains the same as before, using the `fullName`
  field which is now a stack of test names. Previously, it was a stack of suite
  names with a test name at the end.

- Remove all "downward" links from parent to child. Tests don't describe
  their children upfront in detail, and neither does `runStart`. This
  information was very repetitive and tedious to satisfy for implementors, and
  encouraged or required inefficient use of memory.

  I do recognise that a common use case might be to generate a single output
  file or stream where real-time updates are not needed, in which case you
  may want a convenient tree that is ready to traverse without needing to
  listen for async events and put it together. For this purpose, I have added a
  built-in reporter that simply listens to the new events and outputs a "summary"
  event with an object that is similar to the old "runEnd" event object where
  the entire run is described in a single large object.

- New "SummaryReporter" for simple use cases of non-realtime traversing of
  a single structure after the test has completed.
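
To make the child-to-parent link concrete, here is a minimal sketch of how a consumer might rebuild a tree from `testEnd` events using only the `fullName` stack. It is not the shipped SummaryReporter: `collectSummary`, the callback (used here instead of a "summary" event), the node shape, and the `status` field on the `runEnd` payload are all assumptions made for illustration, and a plain Node.js `EventEmitter` stands in for a producer.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch: listen to events and assemble one tree for the run.
function collectSummary(runner, onSummary) {
  const root = { name: null, status: null, tests: [] };
  const nodes = new Map([[JSON.stringify([]), root]]);

  // Find or create the node for a stack of names, creating any missing
  // ancestors on the way, since children finish before their parents.
  function nodeFor(fullName) {
    const key = JSON.stringify(fullName);
    if (!nodes.has(key)) {
      const node = { name: fullName[fullName.length - 1], status: null, tests: [] };
      nodes.set(key, node);
      nodeFor(fullName.slice(0, -1)).tests.push(node);
    }
    return nodes.get(key);
  }

  runner.on('testEnd', (test) => {
    nodeFor(test.fullName).status = test.status;
  });
  runner.on('runEnd', (run) => {
    root.status = run.status;
    onSummary(root);
  });
}

// Usage with a stand-in producer and illustrative data.
const runner = new EventEmitter();
let summary;
collectSummary(runner, (tree) => { summary = tree; });
runner.emit('testEnd', { fullName: ['outer', 'inner'], status: 'passed' });
runner.emit('testEnd', { fullName: ['outer'], status: 'passed' });
runner.emit('runEnd', { status: 'passed' });
console.log(JSON.stringify(summary, null, 2));
```

Keeping this bookkeeping in one optional reporter means producers never have to describe children upfront, while consumers that want a ready-made tree still get one.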

== Caveats ==

- A test with the "failed" status is no longer expected to always have
  an error directly associated with it.

  Now that tests aggregate into other tests rather than into suites,
  this means tests that merely have other tests as children do still
  have to send a full testEnd event, and thus an `errors` and `assertions`
  array.

  I considered specifying that errors have to propagate but this seemed
  messy and could lead to duplicate diagnostic output in reporters, as well
  as ambiguity or uncertainty over where errors originated.

- A suite containing only "skipped" tests now aggregates as "passed"
  instead of "skipped". Given we can't know whether a suite is its own
  test with its own assertions, we also can't assume that if a test parent
  has only "skipped" children that the parent was also skipped.

  This applies to our built-in adapters, but individual frameworks, if they
  know that a suite was skipped in its entirety, can of course still set the
  status of parents however they see fit.

- Graphical reporters (such as QUnit and Mocha's HTML reporters) may no
  longer assume that a test parent has either assertions/errors or other
  tests. A test parent can now have both its own assertions/errors, as well
  as other tests beneath it.

  This restricts the freedom and possibilities for visualisation.
  My recommendation is that, if a visual reporter wants to keep using different
  visual shapes for "group of assertions" and "group of tests", that they
  buffer the information internally such that they can first render all the
  test's own assertions, and then render the children, even if they originally
  ran interleaved and/or the other way around.
  Ref #126.

- The "Console" reporter that comes with js-reporters no longer
  uses `console.group()` for collapsing nested tests.
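
The first two caveats can be read as an aggregation rule along these lines. This is a sketch under assumptions rather than the adapters' actual code: `aggregateStatus` is a hypothetical helper, and statuses other than "passed", "failed" and "skipped" are not considered.

```javascript
// Hypothetical helper: derive a parent's status from its own errors and
// the statuses of its children.
function aggregateStatus(ownErrors, childStatuses) {
  // A parent fails if it has its own errors or if any child failed, so a
  // "failed" parent may legitimately carry an empty `errors` array.
  if (ownErrors.length > 0 || childStatuses.includes('failed')) {
    return 'failed';
  }
  // Children that were all "skipped" do not make the parent "skipped",
  // because the parent may have run assertions of its own.
  return 'passed';
}

console.log(aggregateStatus([], ['skipped', 'skipped']));
console.log(aggregateStatus([], ['passed', 'failed']));
```

A framework that knows a whole group was skipped can still report "skipped" for the parent directly instead of relying on a rule like this.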

== Misc ==

- Add definitions for the "Adapter" and "Producer" terms.

- Use terms "producer" and "reporter" consistently, instead of
  "framework", "runner", or "adapter".

- Remove mention that the spec is for reporting information from
  "JavaScript test frameworks". CRI can be used to report information
  about any kind of test that can be represented in CRI's event model,
  including linting and end-to-end tests for JS programs, as well as
  non-JS programs. It describes a JS interface for reporters, but the
  information can come from anywhere.

  This further solidifies that CRI is not meant to be used for "hooking"
  into a framework, and sets no expectation about timing or run-time
  environment being shared with whatever is executing tests in some
  form or another. This was already the intent originally, since it could
  be used to report information from other processes or from a cloud-based
  test runner like BrowserStack, but this removes any remaining confusion
  or doubt there may have been.

Fixes #126.
Krinkle added a commit to qunitjs/js-reporters that referenced this pull request Feb 14, 2021
Krinkle added a commit to qunitjs/js-reporters that referenced this pull request Feb 21, 2021
In light of the shift in direction per #133,
I'm reverting (most of) cce0e4d so as
to allow the next release to be more similar to the previous, and to make
upgrading easy, allowing most reporters to keep working with very minimal
changes (if any).

Instead, I'll focus on migrating consumers of js-reporters to use
TAP tools directly where available, and to otherwise reduce use of
js-reporters to purely the adapting and piping to TapReporter.

* Revert `RunStart.testCounts` > `RunStart.counts` (idem RunEnd).
* Revert `TestStart.suiteName` > `TestStart.parentName` (idem TestEnd).
* Revert Test allowing Test as child, restore Suite.

This un-fixes #126,
which will be declined. Frameworks adapted to TAP by js-reporters will
not support nested tests.

Frameworks directly providing TAP 13 can use one of several strategies
to express relationships in a backwards-compatible manner, e.g. like
we do in js-reporters by flattening with '>' symbol, or through
indentation or through other manners proposed in
TestAnything/testanything.github.io#36.
Refer to #133 for
questions about how to support TAP.

@kinow kinow left a comment

@isaacs has this been superseded by the work on the Specification repository? If so I think this can now be closed?

isaacs commented Apr 19, 2022

Ha, yes, superseded by TestAnything/Specification#25

@isaacs isaacs closed this Apr 19, 2022