741 changes: 741 additions & 0 deletions tap-version-14-specification.md
@@ -0,0 +1,741 @@
---
layout: default
title: TAP 14 specification
---

# NAME

TAP14 - The Test Anything Protocol v14

## SYNOPSIS

TAP, the Test Anything Protocol, is a simple text-based interface
between testing modules a test harness. TAP started life as part of

s/modules a/modules and a/

the test harness for Perl but now has implementations in C/C++,
Python, PHP, Perl, JavaScript, MatLab, and probably others by the time
you read this.

This document describes version 14 of TAP.

## THE TAP14 FORMAT

A TAP stream is a UTF-8 encoded text stream representing a set of
tests. It should be parsed line by line. Lines are separated by
`\n`.

Should specify whether this \n is text mode or binary mode. I think the context implies that CRLF would be accepted, but I don't think that's the only reasonable interpretation.

Contributor Author

I'd be comfortable specifying any of \r\n or \n.

Member

I'm speculating here but it might be worth supporting \r too due to TAP being so old.

Contributor Author

Sure. So, end of line token is /\r\n|\r|\n/ then?

I'd argue against \r. I haven't touched a system that used that for line endings in more than a decade, and neither has anyone else I know. I doubt anyone has TAP streams lying around that use it.

The argument to convince me otherwise would be "I have such streams lying around". Until then even Unicode line endings would be more sensible (but still not very).


TAP14's general format is:

```
TAPDocument ->
Version Plan Body |
Version Body Plan
```

The Version is always `TAP version 14`.

The Body is a collection of lines representing a test set.

The Plan reports on the number of tests included in the Body.

Any line matching the pattern `/^\s*#.*$/` is treated as a diagnostic
comment, and MAY be reported to the user or ignored, but MUST be
ignored for the purposes of interpreting test output semantics.

/^\s*#.*$/ always being ignored as a comment sounds great, but only works with JSON diagnostics, not YAML diagnostics. The YAML block literal can break this:

data: >
    This is YAML data and it might just contain a hash character (
    #) and have this line ignored because TAP 14 thinks it's a
    comment.

Good point.

Contributor Author

Yeah, same with blank lines. Everything within --- ... needs to be preserved intact.

Contributor Author

Striking this paragraph entirely unless someone objects.


For example:

```
TAP version 14
1..4
ok 1 Description # Directive
# Diagnostic
  ---
  message: 'Failure message'
  severity: fail
  data:
    found:
      - 1
      - 3
      - 2
    wanted:
      - 1
      - 2
      - 3
  ...
ok 2 Description
ok 3 Description
    # Subtest: name of the child test
    ok 1 this is ok
    ok 2 this is also ok
    not ok 3 this is a failure
      ---
      message: 'Child Failure'
      severity: fail
      ...
    1..3
not ok 4 name of the child test
```

For example, a test file's output might look like:

```
TAP version 14
1..4
ok 1 - Input file opened
not ok 2 - First line of the input valid
  ---
  message: 'First line invalid'
  severity: fail
  data:
    got: 'Flirble'
    expect: 'Fnible'
  ...
ok 3 - Read the rest of the file
not ok 4 - Summarized correctly # TODO Not written yet
  ---
  message: "Can't make summary yet"
  severity: todo
  ...
```

## HARNESS BEHAVIOR

In this document, the "harness" is any program analyzing TAP output.

A harness interpreting a subprocess MUST only read TAP output from
standard output and not from standard error.

Lines written to standard output matching `/^(not)? ok\b/` must be

Should it be /^(not )?ok\b/? Moved the space inside the parenthesis so that it matches "not ok" and "ok"; instead of "not ok" and " ok".

Member

Not many specs have regex; it would be nice to have these in explanatory documents instead.

Contributor Author

@LinusU That is correct.

@jonathanKingston I'm not overly attached to regexes, necessarily, but as long as they avoid overly obscure or platform-specific features, it seems like an ok way to make things clear. I'm happy with any other approach that is equally clear.

The only alternative I see would be to use (A)BNF, but that would require us to rewrite everything to a full grammar for it to really make sense IME.

interpreted as test lines. After a test line a block of lines starting
with `/^ {2}---$/` and ending with `/^ {2}\.\.\.$/` will be interpreted as an
inline YAML document providing extended diagnostic information about
the preceding test.

All other lines must not be considered test output.

This seems to preclude subtests which are explicitly mentioned below as something that harnesses would consume. Is the idea here that they ignore the sub-tests and only interpret the ok/not ok summary line of the subtests?

Contributor Author

Good point, that's an oversight. Harnesses should definitely not ignore subtests, though if they do, and the subtests are formatted according to this specification, then they'll still get the proper pass/fail for the suite.

Sadly most harnesses still ignore subtests in practice, I think (not an argument, just an observation)
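
As a rough illustration of the classification rules in this section, the following Python sketch labels each line of a stream as a test line, part of a YAML block, or other output. It assumes the corrected test-line pattern `/^(not )?ok\b/` agreed in the review thread above. The name `classify_lines` is hypothetical, and subtests are deliberately left out.

```python
import re

# Patterns taken from the text; TEST_LINE uses the corrected form
# /^(not )?ok\b/ discussed in the review thread (an assumption here).
TEST_LINE = re.compile(r"^(not )?ok\b")
YAML_START = re.compile(r"^ {2}---$")
YAML_END = re.compile(r"^ {2}\.\.\.$")


def classify_lines(lines):
    """Label each line as 'test', 'yaml', or 'other' (hypothetical helper)."""
    labelled = []
    in_yaml = False      # currently inside a YAML block
    after_test = False   # previous line was a test line
    for line in lines:
        if in_yaml:
            labelled.append(("yaml", line))
            if YAML_END.match(line):
                in_yaml = False
            continue
        if TEST_LINE.match(line):
            labelled.append(("test", line))
            after_test = True
            continue
        if after_test and YAML_START.match(line):
            # A YAML block only counts directly after a test line.
            labelled.append(("yaml", line))
            in_yaml = True
        else:
            labelled.append(("other", line))
        after_test = False
    return labelled
```

A harness built on this would still need to handle the Version, the Plan, and subtests separately.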


## TEST LINES AND THE PLAN

### The Version

To indicate that this is TAP14 the first non-comment, non-whitespace
Member

Can this include "non-unknown" too? Considering producers that log other random stuff that is hard to control, it might help a producer to conform easier without being ambiguous. For instance, ember test output can look like:

version: 1.13.8
Could not find watchman, falling back to NodeWatcher for file system events.
Visit http://www.ember-cli.com/user-guide/#watchman for more info.
Built project successfully. Stored in "/home/matt/testery/client/tmp/class-tests_dist-OyaYxzuN.tmp".
ok 1 PhantomJS 1.9 - JSHint - .: app.js should pass jshint
ok 2 PhantomJS 1.9 - JSHint - helpers: helpers/resolver.js should pass jshint
ok 3 PhantomJS 1.9 - JSHint - helpers: helpers/start-app.js should pass jshint
ok 4 PhantomJS 1.9 - JSHint - models: models/build.js should pass jshint
ok 5 PhantomJS 1.9 - JSHint - .: router.js should pass jshint
ok 6 PhantomJS 1.9 - JSHint - routes: routes/builds.js should pass jshint
ok 7 PhantomJS 1.9 - JSHint - .: test-helper.js should pass jshint
ok 8 PhantomJS 1.9 - Unit | Model | build: it exists
ok 9 PhantomJS 1.9 - Unit | Model | build: it has passes
ok 10 PhantomJS 1.9 - JSHint - unit/models: unit/models/build-test.js should pass jshint
ok 11 PhantomJS 1.9 - Unit | Route | builds: it exists
ok 12 PhantomJS 1.9 - JSHint - unit/routes: unit/routes/builds-test.js should pass jshint

1..12
# tests 12
# pass  12
# fail  0

# ok

ember test doesn't include a TAP version line now, but if it did in the future, all the lines above ok 1 would be troublesome. If a consumer was permitted to ignore those lines before the TAP stream officially starts, I think the job of producers would be easier.

This might have ramifications to the strict pragma that is proposed.

Contributor

Ignoring unknowns, I think, is mainly for forward-compat and to reduce the impact of unintended output. In terms of intentional diagnostic output like this, I'd expect e.g. Ember to not let these through: either muting them, or forwarding them with a # in front. If they are uncontrollable and intentionally let through, they could also wreak havoc in more nominal ways if they happen to match TAP instructions.

line must be:

```
TAP version 14
```

A compliant TAP14 Harness SHOULD interpret TAP streams with `TAP
version 13` according to the TAP13 specification.

TAP streams lacking a version number SHOULD be interpreted according to
the TAP12 specification.

In order to facilitate forward compatibility, a compliant TAP14
Harness MAY interpret test data according to TAP14, regardless of
stated version number. For example, it may accept subtests in TAP12
or TAP13 streams, even though these were not supported in those
versions of TAP.

Any stated version number less than 13 is an error.
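
The version rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: `stated_version` is a hypothetical helper, the caller is assumed to pass the first non-comment, non-whitespace line, and version numbers above 14 are simply returned as stated because the text does not say how to treat them.

```python
import re

VERSION_LINE = re.compile(r"^TAP version (\d+)$")


def stated_version(first_line):
    """Return the TAP version a harness should assume for a stream.

    Hypothetical helper: 12 when no version line is present, otherwise
    the stated number; a stated version below 13 is an error.
    """
    match = VERSION_LINE.match(first_line)
    if match is None:
        return 12  # no version line: interpret as TAP12
    version = int(match.group(1))
    if version < 13:
        raise ValueError(f"invalid TAP version: {version}")
    return version
```

A harness MAY still choose to interpret the stream as TAP14 regardless of the number returned here.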

### The Plan

The Plan tells how many tests will be run, or how many tests have run.
It's a check that the test file hasn't stopped prematurely.

The Plan MUST appear exactly once, EITHER:
Member

Is there a valid use case for multiple plan lines? A few weeks ago, I was testing the streaming interface for the Python TAP consumer that I've written and was surprised when prove -v provided multiple plan lines. It caused my consumer to report some errors because of this constraint to 1 appearance of the plan.

Initially I thought that prove might be deficient as a TAP producer, but the behavior seemed very reasonable the more that I thought about it. I've encountered a number of TAP producer libraries that specify a plan per file.

Maybe the question really should be: should TAP expect producers to aggregate plan counts into a single plan or should TAP consumers be able to deal with multiple plans from independent streams?

One option for supporting multiple plans would be a way to demarcate the end of a stream. Perhaps something matching /^EOTAP$/.


- the first line of TAP output after the Version, or
- the last line of TAP output prior to the end of the TAP stream.

What about comments after a final plan?

Contributor Author

Good question. In practice, we all put comments after the closing plan, so I think the assumption here is that comments don't count as a "line of TAP output". (Which is an understandable assumption, I think, but also weird and should be called out.)


The Plan matches the regular expression: `/^1\.\.(0|[1-9][0-9]*)( #.*)?$/`.

The Plan specifies how many test points are expected in the test
stream. For example,

```
1..10
```

means you plan on running 10 tests. This is a safeguard in case your
test file dies silently in the middle of its run.

In certain instances a test file may not know how many test points it
will ultimately be running. In this case the Plan should be the last
non-diagnostic line in the output.

A test set missing a Plan MUST be interpreted as a failure by the
harness.
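
A minimal Python sketch of Plan parsing, using the regular expression given above. The helper name `parse_plan` and its return shape are assumptions made for illustration.

```python
import re

# The Plan pattern from the text, transcribed for Python's re module.
PLAN = re.compile(r"^1\.\.(0|[1-9][0-9]*)( #.*)?$")


def parse_plan(line):
    """Return (planned_count, comment) for a Plan line, else None."""
    match = PLAN.match(line)
    if match is None:
        return None
    trailer = match.group(2)
    # Drop the leading " #" from the optional trailing comment.
    comment = trailer[2:].strip() if trailer else None
    return int(match.group(1)), comment
```

A harness could compare the planned count against the number of test points it actually saw, and treat a stream with no Plan line as a failure.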

### The test line

The core of TAP is the test line. A test file prints one test line per
test point executed. There must be at least one test line in TAP

What if a test file is skipped completely?

TAP version 14
1..0 # SKIP

Contributor Author

You're right, this is an oversight copied over from the previous spec. It's fine to have no test lines at all if the plan is 1..0

Contributor Author

Oh, also, this is valid TAP:

1..999
Bail out! Can't do tests for some reason

output. Each test line comprises the following elements:

- ok or not ok

This tells whether the test point passed or failed. It must be at the
beginning of the line. /^not ok/ indicates a failed test point. /^ok/
is a successful test point. This is the only mandatory part of the
line. Note that unlike the Directives below, ok and not ok are
case-sensitive.

- Test number

TAP expects the ok or not ok to be followed by a test point number,
separated by 1 or more space characters from the `/^(not )?ok/`.

For example:

```
1..3
ok 1
not ok 2
ok 3
```

It should be noted that the optional test number must be a positive integer and they must be sequential. The following is a failure:

ok 1
ok 2
ok 3
ok 5

Contributor Author

I like that a lot.

Contributor

@Krinkle, Nov 13, 2020

👍 Worth thinking through as well how this would play with compatibility.

It'll be fine to older TAP consumers as this would be ignored and seen as part of the optional description, right?

For newer TAP consumers of older TAP output, though, it might be a problem for tests that happen to have a number in (some) of the descriptions. But, perhaps that's okay as those newer consumers would only enforce this restriction if the output is TAP 14. For older versions' output, they'd do what they do today. Does that make sense?


If there is no number the harness must maintain its own counter until
the script supplies test numbers again. So the following test output

```
1..6
not ok
ok
not ok
ok
ok
```

has five tests. The sixth is missing. For example, in Perl,
Member

Can this be made independent to language or sections like this clearly demarcated?

The W3C mark these as 'non normative'

Contributor Author

Good idea.

`Test::Harness` will generate

```
FAILED tests 1, 3, 6
Failed 3/6 tests, 50.00% okay
```

- Description

An optional description may be provided, separated from the ok/not-ok
and test number by one or more space characters, terminated by either
a `#` character, or the end of the line.

If the Description starts with a hyphen `-` and any amount of
whitespace, it SHOULD be discarded by the Harness. For example, these
test lines should all be identical:

```
ok 5 some description
ok 5 - some description
ok 5 - some description # and some comment
```

The harness may do whatever it wants with the description.

- Directive

The test point may include a directive, following a hash on the test
line.

There are currently two directives allowed: TODO and SKIP. These are
Member

I know you have no ability to change this - however this is going to be a failing point for backwards/forward compatibility in future.

Contributor Author

Maybe we should say that there are 2 directives supported, and any other type of directive MUST be ignored?

Member

The problem is the rest has always been considered a comment otherwise hasn't it. Which means there could be some comments lying like bombs waiting for a parser to implement some new directive. Unless you know otherwise?

Contributor Author

Sure, adding any new directives would entail some risk for that reason. That was the strongest argument against standardizing the # time=Xms directive. I haven't come up with a reasonable way around that, or a better way to report test timing without attaching yaml diags to passing tests, so it remains a non-standard extension in node-tap.

I never considered other directives to be comments. I do think there's a compatibility issue around adding new ones. I suspect extended diagnostics are a better solution to most problems one may want to solve using directives though.

discussed below. Any other directive is ignored.
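
The numbering rule for test lines without a number can be sketched as follows. This is a minimal Python illustration, not part of the specification: `failed_tests` is a hypothetical helper that keeps its own counter whenever a test line omits its number, and reports planned tests that never appeared as failures, in line with the `Test::Harness` output quoted earlier.

```python
import re

# Optional test number after ok / not ok; the lookahead keeps a
# description such as "2nd attempt" from being read as a number.
TEST_LINE = re.compile(r"^(not )?ok\b(?: +([1-9][0-9]*)(?= |$))?")


def failed_tests(planned, lines):
    """Return the numbers of failed or missing tests (hypothetical helper)."""
    counter = 0
    passed = {}
    for line in lines:
        match = TEST_LINE.match(line)
        if match is None:
            continue  # not a test line
        # Use the supplied number if present, else continue our own count.
        counter = int(match.group(2)) if match.group(2) else counter + 1
        passed[counter] = match.group(1) is None
    return [n for n in range(1, planned + 1) if not passed.get(n, False)]
```

For the six-test example above, `failed_tests(6, ["not ok", "ok", "not ok", "ok", "ok"])` yields tests 1, 3 and 6.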

#### Test Line Summary

To summarize:

- ok/not ok (required)
- Test number (recommended)
- Description (recommended)
- Directive (only when necessary)

The regular expression pattern for a test line is:

```
/^(not )?ok( +[1-9][0-9]*)?( +(- +)[\u0001-\u0009\u000B-\u000C\u000E-\u0022\u0024-\u00FF]+)?( #(.*))?$/
```

Member

I get that this regular expression is trying to exclude the new line and # character in the description. By setting the end range to \u00FF, it also excludes a huge portion of the unicode set. Is this regex meant to be an example of a possible regex definition or the definitive definition? I'm trying to understand why there is a limit to \u00FF.

Encoding is currently fairly underdefined, sadly. If we're seeing TAP as a byte-stream (which is fairly common in network protocols), then that is perfectly correct, if we see it as a character stream (which makes more sense in the future but not necessarily in the past) then it's rather limiting indeed.
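
For illustration, here is a hedged Python sketch that splits a test line into the four elements summarized above. It does not use the pattern from this section verbatim: it is a simplified pattern of my own that treats the leading hyphen as optional and accepts any character other than `#` in the description. The name `parse_test_line` and the dictionary layout are assumptions.

```python
import re

# Simplified, illustrative pattern (not the pattern quoted in the spec).
TEST_LINE = re.compile(
    r"^(?P<failed>not )?ok"
    r"(?: +(?P<number>[1-9][0-9]*))?"
    r"(?: +(?:- +)?(?P<description>[^#\n]+?))?"
    r"(?: *#(?P<directive>.*))?$"
)


def parse_test_line(line):
    """Split a test line into its parts, or return None if it is not one."""
    match = TEST_LINE.match(line)
    if match is None:
        return None
    number = match.group("number")
    description = match.group("description")
    directive = match.group("directive")
    return {
        "ok": match.group("failed") is None,
        "number": int(number) if number else None,
        # A leading "- " has already been discarded by the pattern.
        "description": description.strip() if description else "",
        "directive": directive.strip() if directive is not None else None,
    }
```

With this sketch, `ok 5 some description` and `ok 5 - some description` parse to the same result, as the Description rule requires.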

### YAML blocks

If a test line is immediately followed by a 2-space indented block

I still believe that this is highly problematic. Not having a marker signaling to the consumer that a YAML document is following means that any streaming parser can only show results delayed (once the next result is in), which is highly problematic in interactive harnesses.

Contributor Author

Yes, I completely agree it's a pita for a parser. I've had to explain in the past why the line event for the next test line happens before the assert event for the previous test line.

However, it is what it is today, and writing it down makes it no more problematic than it already is.

The goal of this exercise is to document current implemented extensions to TAP13. Let's discuss some strategies for mitigating this issue in a way that is backwards as compatible as possible and include it in TAP15. I'd be happy to call it out as problematic in this version and mark it as a TODO item.

> Yes, I completely agree it's a pita for a parser. I've had to explain in the past why the line event for the next test line happens before the assert event for the previous test line.

I resolved it by not linking the test with the diagnostic, which is less wrong but more useless.

> However, it is what it is today, and writing it down makes it no more problematic than it already is.
>
> The goal of this exercise is to document current implemented extensions to TAP13. Let's discuss some strategies for mitigating this issue in a way that is backwards as compatible as possible and include it in TAP15. I'd be happy to call it out as problematic in this version and mark it as a TODO item.

Fair enough.

beginning with the line `/^ {2}---$/` and ending with the line
`/^ {2}\.\.\.$/` that block will be interpreted as an inline YAML
document.

The YAML encodes a data structure that provides more detailed
information about the preceding failed test. The YAML document is
indented with 2 spaces to make it visually distinct from the
surrounding test results and to make it easier for the parser to
recover if the trailing `/^ {2}\.\.\.$/` terminator is missing.

For example:

```
not ok 3 Resolve address
  ---
  message: "Failed with error 'hostname peebles.example.com not found'"
  severity: fail
  data:
    got:
      hostname: 'peebles.example.com'
      address: ~
    expected:
      hostname: 'peebles.example.com'
      address: '85.193.201.85'
  ...
```

The corresponding data structure in JSON would look like this:

```json
{
  "message": "Failed with error 'hostname peebles.example.com not found'",
  "severity": "fail",
  "data": {
    "got": {
      "hostname": "peebles.example.com",
      "address": null
    },
    "expected": {
      "hostname": "peebles.example.com",
      "address": "85.193.201.85"
    }
  }
}
```

If a line occurs within the YAML document that is not indented, or if
the TAP stream ends before the `/^ {2}\.\.\.$/` terminator line, or if the
YAML document does not parse as YAML, then it MUST be treated as

I would make that MUST a SHOULD.

Contributor Author

Seems reasonable.

Member

Munging YAML to make it valid seems pretty bad, surely the "handled by the Harness the same as any other non-TAP lines" leaves it vague anyway?

non-compliant data, and handled by the Harness the same as any other
non-TAP lines.
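
The block-delimiting rules above can be sketched in Python as follows. This is an illustration only: `read_yaml_block` is a hypothetical helper, it assumes blank lines inside the block are preserved, and it returns the dedented YAML text without parsing it, since that would need a full YAML parser.

```python
import re

YAML_START = re.compile(r"^ {2}---$")
YAML_END = re.compile(r"^ {2}\.\.\.$")


def read_yaml_block(lines, start):
    """Return (yaml_text, next_index), or (None, start) if non-compliant.

    `start` is the index of the line right after a test line.
    """
    if start >= len(lines) or not YAML_START.match(lines[start]):
        return None, start
    body = []
    for index in range(start + 1, len(lines)):
        line = lines[index]
        if YAML_END.match(line):
            return "\n".join(body), index + 1
        if line.strip() and not line.startswith("  "):
            return None, start  # unindented line inside the block
        body.append(line[2:])  # strip the 2-space indent
    return None, start  # stream ended before the terminator
```

When `None` is returned, the caller would fall back to treating the lines from `start` onward like any other non-TAP output.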

Currently (2015-10-06) the format of the data structure represented by
a YAML block has not been standardized. It is likely that a future
version of TAP will standardize meanings for some of the fields in the
YAML diagnostics.

In order to adequately interpret and report the data within a test
output's YAML block, a full YAML parser is required. Specifying the
YAML language is outside the scope of this document. Harnesses MAY
ignore YAML blocks entirely if the inclusion of a full YAML parser
would be overly difficult.

## DIRECTIVES
Member

The casing here seems a little odd (I know that is how the current version is however I think I would change it for everything being sentence case or upper first)

Contributor Author

Agreed. Consistency would be nice. It's not a man page.


Directives are special notes that follow a # on the test line. Only
two are currently defined: TODO and SKIP. Note that these two keywords
are not case-sensitive.

### TODO tests

If the directive starts with `# TODO`, the test is counted as a todo
test, and the text after TODO is the explanation.

```
not ok 13 # TODO bend space and time
```

Note that if the TODO has an explanation it must be separated from
TODO by a space. These tests represent a feature to be implemented or
a bug to be fixed and act as something of an executable "things to do"
list. They are not expected to succeed. Should a todo test point begin
succeeding, the harness should report it as a bonus. This indicates
that whatever you were supposed to do has been done and you should
promote this to a normal test point.

### Skipping tests

If the directive starts with `# SKIP`, the test is counted as having
been skipped. If the whole test file succeeds, the count of skipped
tests is included in the generated output. The harness should report
the text after # SKIP\S*\s+ as a reason for skipping.

```
ok 23 # skip Insufficient flogiston pressure.
```

Similarly, one can include an explanation in a Plan line, emitted if
the test file is skipped completely:

```
1..0 # SKIP WWW::Mechanize not installed
```
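
A small Python sketch of directive handling, assuming the directive text is whatever follows the `#` on a test line. The helper names `parse_directive` and `outcome`, and the outcome labels, are hypothetical. Keywords are matched case-insensitively as described above.

```python
import re

TODO = re.compile(r"^\s*TODO(?:\s+(.*))?$", re.IGNORECASE)
# SKIP may carry a suffix (e.g. "SKIPPED"), following the SKIP\S*\s+ rule.
SKIP = re.compile(r"^\s*SKIP\S*(?:\s+(.*))?$", re.IGNORECASE)


def parse_directive(text):
    """Return (kind, explanation); kind is 'TODO', 'SKIP', or None."""
    for kind, pattern in (("TODO", TODO), ("SKIP", SKIP)):
        match = pattern.match(text)
        if match:
            return kind, (match.group(1) or "").strip()
    return None, ""  # any other directive is ignored


def outcome(ok, kind):
    """Map a test result and directive kind to a report label."""
    if kind == "SKIP":
        return "skip"
    if kind == "TODO":
        # A passing TODO test is reported as a bonus.
        return "bonus" if ok else "todo"
    return "pass" if ok else "fail"
```

For instance, the directive text of `not ok 13 # TODO bend space and time` is classified as `TODO`, so the failing test point is reported as `todo` rather than `fail`.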

## SUBTESTS

A Subtest contains a block of test points grouped together. This is
useful for the output of test runners that collect several child
processes' TAP streams, and output the result in TAP for another
harness to interpret, or for tests with a predictable number of
blocks, but where each block might not have a predictable number of
tests.

A Subtest is a TAP14 stream indented 4 spaces. It has the following
format:

- A comment line with the name of the subtest: `/^ {4}# Subtest: (.*)$/`
Member

/^ {4}# Subtest: (.*)$/ Don't tell me 2 implementations exist here. I know this isn't true of others.

Contributor Author

http://search.cpan.org/~exodist/Test-Simple-1.001014/lib/Test/More.pm#subtest documents the comment like that, but:

use Test::More tests => 3;

pass("First test");

subtest 'An example subtest' => sub {
    plan tests => 2;

    pass("This is a subtest");
    pass("So is this");
};

pass("Third test");

actually produces:

$ perl subtest.pl
1..3
ok 1 - First test
    1..2
    ok 1 - This is a subtest
    ok 2 - So is this
ok 2 - An example subtest
ok 3 - Third test

Perhaps there's a newer version of Test::More that adds the comment in? I'm a lot more comfortable with the explicit start of a subtest and a specific number of spaces of indentation, rather than just having to accept any indented stuff as possibly a nested TAP stream.

Contributor Author

Ah, yes, updating Test::More did the trick.

$ perl subtest.pl
1..3
ok 1 - First test
    # Subtest: An example subtest
    1..2
    ok 1 - This is a subtest
    ok 2 - So is this
ok 2 - An example subtest
ok 3 - Third test

and, of course:

$ node -e 'require("tap").test("child",function(t){t.pass("ok");t.end()})'
TAP version 13
    # Subtest: child
    ok 1 - ok
    1..1
ok 1 - child # time=12.098ms

1..1
# time=40.435ms

> I'm a lot more comfortable with the explicit start of a subtest and a specific number of spaces of indentation, rather than just having to accept any indented stuff as possibly a nested TAP stream.

I agree that parsing any indented stuff as nested TAP is unfortunate from a parsing point of view. I'd prefer an unambiguous, non-comment marker instead of a magic comment as the way forward though.

Regarding requiring # Subtest: $testname as part of the subtest, while newer versions of Test::More output that, I do not believe that later versions of the TAP::Parser recognize that. It was only put in (IIRC) to make the subtest more human-readable, but I'm unsure of the value of requiring that comment to introduce a subtest (though this might mitigate issues with YAML block directives accidentally being interpreted as test output).

Contributor Author

The reporters included in node-tap use this comment to introduce a suite of tests, which makes it much easier to mimic the output of mocha and rspec by getting the name of the suite up-front. Without this, it's still easy enough to detect a subtest and distinguish from YAML (2 vs 4 space indentation, no --- marker), but you have to wait until the end before reporting on the results in-progress next to the name.

I'm much more comfortable documenting what's already in practice. It seems like the word "comment" is causing some valid concern here. Would it help if we struck all the stuff about # lines being "comments", and instead just say that they're always "directives", and that unrecognized directives MUST be ignored?

- A TAP14 stream, indented 4 spaces.
- A (non-indented) test point indicating the success or failure of the
subtest, with a description matching the Subtest's name.

For example:

```
TAP version 14
1..2
ok 1 - About to do a subtest
    # Subtest: this is the subtest
    ok 1 - This is ok
    ok 2 - I am ok with how things are proceeding
    not ok 3 - infinite loops # TODO Solve halting problem
    not ok 4 - this is just a failure
    1..4
not ok 2 - this is the subtest
  ---
  plan: 4
  count: 4
  passing: 2
  todo: 1
  failed: 1
  exitCode: 1
  ...
```

If not bailed out, a Subtest MUST be introduced by the indented
`# Subtest: ...` comment, and MUST be followed with a test point whose
description matches the Subtest name. If the Subtest is not
successful, then the Subtest test point MUST be a `not ok` test point.

A `Bail out!` in a Subtest SHOULD be immediately followed by a
`Bail out!` in the parent test.
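
The subtest layout described above can be sketched in Python like this. The helper `read_subtest` is hypothetical. It assumes every line of the child stream is indented by at least 4 spaces (so a blank line would end the subtest in this sketch), and it leaves checking the summary test point to the caller.

```python
import re

SUBTEST_START = re.compile(r"^ {4}# Subtest: (.*)$")


def read_subtest(lines, start):
    """Return (name, child_lines, summary_line), or None if no subtest starts here."""
    match = SUBTEST_START.match(lines[start])
    if match is None:
        return None
    name = match.group(1)
    child = []
    index = start + 1
    # Collect the indented child stream, removing one level of indent.
    while index < len(lines) and lines[index].startswith("    "):
        child.append(lines[index][4:])
        index += 1
    # The next non-indented line should be the subtest's own test point.
    summary = lines[index] if index < len(lines) else None
    return name, child, summary
```

Because the dedented child lines form a TAP stream of their own, the same parsing routine can be applied to them recursively; a YAML block inside the child ends up with the usual 2-space indent after dedenting.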

If a Subtest TAP stream does not include a version number, it MUST be
interpreted as a TAP14 stream anyway. This is a backwards-compatible

The idea being that in some future TAP14+n with some different behaviour could have subtests that are interpreted using TAP14 semantics? And the context here is that TAP14 is a codification of how people have been using/extending TAP13 in practice?

Contributor Author

Yes, the context of this spec is to codify how we're extending TAP13 in practice.

The wording of this is tricky. The issue here is that you don't want subtests to be interpreted as a TAP12 stream (since they'll likely have subtests and yaml diags), but including a TAP version XX line in a subtest upsets some existing parsers that don't require that the version start on column 0, so it must be acceptable to omit the version designator.

Perhaps it's better to say that, in absence of a TAP version XX in the subtest, it's assumed to be the same version as the parent test? (Ie, 14+)


That's kinda what I was thinking.. as useful as it would be to be able to have some older tests output appear as subtests, it seems like the facilities to produce them in a way that matches this spec is somewhat mutually exclusive of requiring this contingency.

Member

Yeah I don't think people are restreaming of different versions very much as I personally think the format doesn't really lend itself to that.

Contributor Author

For stuff like t.test('child test', function (t) { ... }), yeah, you're going to definitely use the same version, because you're generating TAP with the same test framework. But for spawning child processes, or especially collating tests from users of a web service, it's possible to get more random stuff showing up.

Member

Certainly, I just don't think TAP handles those test cases very well at all. With the fact that the debug isn't there for the process etc.


Allowing more than one TAP version in a stream sounds like an implementors nightmare to me. I don't think it's worth the complication.

Contributor Author

> Allowing more than one TAP version in a stream sounds like an implementors nightmare to me.

Yeah, I agree, and in practice I'd probably just rely on TAP's relatively reliable backward compatibility, and parse it all as TAP14.

Since TAP14 is a superset of TAP12 (mostly; it's a bit stricter about yaml indentation, but in practice, everyone uses 2 spaces, not 1), and using other versions is unlikely anyway, it seems like it might just be reasonable to say that any embedded subtest SHOULD be TAP14?

affordance for pre-TAP14 parsers that complain about seeing a `TAP
version` line multiple times within a TAP stream.

## OTHER LINES

### Bail out!

As an emergency measure a test script can decide that further tests
are useless (e.g. missing dependencies) and testing should stop
immediately. In that case the test script prints the magic words

```
Bail out!
```

to standard output. Any message after these words must be displayed by
the interpreter as the reason why testing must be stopped, as in

```
Bail out! MySQL is not running.
```

A Harness SHOULD ignore all lines that come after a bail out.
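
A minimal sketch of how a consumer might detect a bail out and stop reading. The helper names are hypothetical, and the sketch assumes the words `Bail out!` are matched exactly as written, which this section does not state either way.

```python
import re

# "Bail out!" at the start of a line, followed by an optional reason.
BAIL_OUT = re.compile(r"^Bail out!\s*(.*)$")


def bail_out_reason(line):
    """Return the reason text if the line is a bail out, else None."""
    match = BAIL_OUT.match(line)
    return match.group(1).strip() if match else None


def lines_before_bail_out(lines):
    """Keep lines up to and including the first bail out; drop the rest."""
    kept = []
    for line in lines:
        kept.append(line)
        if bail_out_reason(line) is not None:
            break
    return kept
```

The reason is an empty string when nothing follows the magic words, which lets a harness distinguish "no bail out" (`None`) from "bail out without a message".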

### Diagnostics

Additional information may be put into the testing output on separate
lines. Diagnostic lines should begin with a #, which the harness must
ignore, at least as far as analyzing the test results. The harness is
free, however, to display the diagnostics.

YAML blocks are intended to replace diagnostics for most purposes, but
consumers should maintain backwards compatibility by continuing to
support diagnostic lines.

### Pragma
Member

I'm still not sure on this one. Isn't there only one implementer?
Forcing strict from the generators end seems a little odd really (I don't think the paradigm matches 'use strict' in ECMA).

Contributor Author

Seems ok to me. It's a way for a producer to say, within a given context, that deviations from the "approved" syntax should be treated as syntax errors.

It's different from JS's 'use strict' pragma since that actually opts into a few profound semantic changes, but I think there's a valid use case for "this test should not dump any junk to stdout, and if it does, then that's an error".

There's 2 implementations now, since I added support to the JS tap-parser so that it would handle Test::Harness's tests :)

Member

Yup but because of TAPs forward/backwards philosophy it means nothing unless you can guarantee speaking to TAP 14 parser.
I feel like parsers should have a strict mode and they can choose to enforce it yourself.
It kinda feels on par with asking for XHTML strict... which never really worked so well did it.

Contributor Author

So, if I understand what you're getting at, the issue is that pragma +strict in a TAP14 producer could be interpreted as strictly TAP15, or cause a failure because the consumer being a TAP13 consumer would fail on subtests?

I admit that it's a bit of an odd duck in the midst of the rest of the spec, and I'm ok with removing pragmas from node-tap if there's consensus to remove them from perl land.


If one views pragmas as optional requests instead of hard requirements, I suspect most of these issues would go away.

Contributor Author

> pragmas as optional requests instead of hard requirements

I'm totally fine with that. @jonathanKingston What's your take?

Member

It kind of seems as useful as priority markers on email. Having a marker for "I want to be treated nicely" or "I'm sloppy sometimes" seems a little like a road towards bad practice to me.


A Pragma line indicates that the parser should switch its behavior in
some way by setting a boolean field. Pragmas may occur anywhere
within the TAP stream, similar to Diagnostic comments.

A Pragma starts with the word `pragma`, followed by a space character,
and a space-separated list of field names, each prefixed with either a
`+` (true), or a `-` (false).

All Pragmas are false by default.

Pragmas set in Subtests do not affect the state of the pragma field in
the parent test set.

The only Pragma that is currently specified is `strict`. Additional
Pragmas may be added in the future.

To set the strict pragma, use:

```
pragma +strict
```

To disable the strict pragma, use:

```
pragma -strict
```

- `strict` When set, any non-parsing lines are reported to the user as
an error, and cause the test set to be considered a failure. When
disabled, non-parsing lines may be reported to the user in some way,
but do not cause the test set to be considered a failure.
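
The pragma syntax above could be parsed along these lines. This is a sketch under assumptions: `apply_pragma` is a hypothetical helper, the pragma state is held in a plain dict, and `other` in the usage lines is an invented field name used only to show several fields on one line.

```python
def apply_pragma(line, state):
    """Update a dict of boolean pragma fields from one pragma line.

    Returns True if the line was a pragma line, False otherwise.
    Fields missing from the dict are treated as false by callers.
    """
    prefix = "pragma "
    if not line.startswith(prefix):
        return False
    for field in line[len(prefix):].split():
        if len(field) > 1 and field[0] == "+":
            state[field[1:]] = True
        elif len(field) > 1 and field[0] == "-":
            state[field[1:]] = False
    return True


state = {}
apply_pragma("pragma +strict", state)         # state["strict"] is now True
apply_pragma("pragma -strict +other", state)  # "other" is a made-up field
```

Keeping a separate dict for each Subtest is one way to ensure that pragmas set in a Subtest do not leak into the parent test set.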

### Blank Lines

For the purposes of this specification, a "blank" line is any line
consisting exclusively of zero or more whitespace characters.

Blank lines within YAML blocks MUST be preserved as part of the YAML
document, because line breaks have semantic meaning in YAML documents.
For example, multiline folded scalar values use `\n\n` to denote line
breaks.

Blank lines outside of YAML blocks MUST be ignored by the Harness.

### Anything else

Any output that is not a version, a plan, a test line, a YAML block, a
subtest, a diagnostic, a pragma, a blank line, or a bail out is
incorrect.

When `pragma +strict` is in effect, incorrect lines SHOULD
result in the test set being considered a failure, even if the test
set is otherwise valid. When `pragma -strict` is in effect, incorrect
lines MUST NOT result in the test set being considered a failure if
the test set is otherwise valid.

How or if the incorrect line is displayed to the user by the Harness
is undefined.

- `Test::Harness` silently ignores incorrect lines, but will become more
stringent in the future.
- `TAP::Harness` reports TAP syntax errors at the end of a test run.
- `node-tap` prints incorrect lines to standard output, but otherwise
ignores them for the purposes of parsing or determining test
success.
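
To make the interaction between incorrect lines and the `strict` pragma concrete, here is a simplified sketch. The patterns are deliberately loose approximations rather than the grammar of this specification, the function names are hypothetical, and YAML blocks and Subtests are assumed to have been handled before lines reach `classify`.

```python
import re

# Loose, illustrative patterns; order matters (first match wins).
PATTERNS = [
    ("blank", re.compile(r"^\s*$")),
    ("version", re.compile(r"^TAP version \d+$")),
    ("plan", re.compile(r"^1\.\.\d+(\s+#.*)?$")),
    ("test", re.compile(r"^(not )?ok\b")),
    ("bailout", re.compile(r"^Bail out!")),
    ("pragma", re.compile(r"^pragma [+-]\w+")),
    ("diagnostic", re.compile(r"^#")),
]


def classify(line):
    """Return the kind of a top-level line, or "incorrect"."""
    for kind, pattern in PATTERNS:
        if pattern.match(line):
            return kind
    return "incorrect"


def incorrect_lines_fail(lines):
    """True if an incorrect line appears while strict is in effect."""
    strict = False  # all pragmas are false by default
    for line in lines:
        kind = classify(line)
        if kind == "pragma":
            for field in line.split()[1:]:
                if field == "+strict":
                    strict = True
                elif field == "-strict":
                    strict = False
        elif kind == "incorrect" and strict:
            return True
    return False
```

Because the pragma can be toggled anywhere in the stream, the same junk line may or may not fail the test set depending on where it appears.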

## EXAMPLES
Member

Would this be better intermingled with the rest of the content?


All names, places, and events depicted in any example are wholly
fictitious and bear no resemblance to, connection with, or relation to
any real entity. Any such similarity is purely coincidental,
unintentional, and unintended.

### Common with explanation

The following TAP listing declares that six tests follow and uses
diagnostic lines to describe what the test is about to do. All six
tests pass.

```
TAP version 14
1..6
#
# Create a new Board and Tile, then place
# the Tile onto the board.
#
ok 1 - The object isa Board
ok 2 - Board size is zero
ok 3 - The object isa Tile
ok 4 - Get possible places to put the Tile
ok 5 - Placing the tile produces no error
ok 6 - Board size is 1
```

### Unknown amount and failures

This hypothetical test program ensures that a handful of servers are
online and network-accessible. Because it retrieves the hypothetical
servers from a database, it doesn't know exactly how many servers it
will need to ping. Thus, the test count is declared at the bottom
after all the test points have run. Also, two of the tests fail. The
YAML block following each failure gives additional information about
the failure that may be displayed by the harness.

```
TAP version 14
1..2
    # Subtest: get list of servers
    1..3
    ok 1 - connect to database
    ok 2 - retrieving list of servers
    ok 3 - list of servers is an array of strings
ok 1 - get list of servers
    # Subtest: ping servers
    ok 1 - pinged diamond
    ok 2 - pinged ruby
    not ok 3 - pinged saphire
      ---
      message: 'hostname "saphire" unknown'
      severity: fail
      ...
    ok 4 - pinged onyx
    not ok 5 - pinged quartz
      ---
      message: 'timeout'
      severity: fail
      ...
    ok 6 - pinged gold
    1..6
not ok 2 - ping servers
  ---
  servers_offline:
    - saphire
    - quartz
  ...
```

### Giving up

This listing reports that a pile of tests are going to be run.
However, the first test fails, reportedly because a connection to the
database could not be established. The program decided that continuing
was pointless and exited.

```
TAP version 14
1..573
not ok 1 - database handle
Bail out! Couldn't connect to database.
```

This listing reports a pile of tests where a child test bails out.
The Harness MAY continue to run subsequent tests, but in most cases,
SHOULD bubble the bail out up to the parent test.

```
TAP version 14
1..54
    # Subtest: 00-setup.js
    not ok 1 - database handle
    Bail out! Couldn't connect to database.
Bail out! Couldn't connect to database.
```

### Skipping a few

The following listing plans on running 5 tests. However, our program
decided to not run tests 2 through 5 at all. To properly report this,
the tests are marked as being skipped.

```
TAP version 14
1..5
ok 1 - approved operating system
# $^O is solaris
ok 2 - # SKIP no /sys directory
ok 3 - # SKIP no /sys directory
ok 4 - # SKIP no /sys directory
ok 5 - # SKIP no /sys directory
```

### Skipping everything

This listing shows an entire test set being skipped. No tests were run.

```
TAP version 14
1..0 # skip English-to-French translator isn't installed
```
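
A sketch of how a plan line with an optional skip reason might be read. `parse_plan` is a hypothetical helper, and the directive is matched case-insensitively here only because the examples in this document use both `# SKIP` and `# skip`.

```python
import re

# "1..N", optionally followed by "# skip <reason>".
PLAN = re.compile(r"^1\.\.(\d+)(?:\s+#\s*skip\b\s*(.*))?$", re.IGNORECASE)


def parse_plan(line):
    """Return (planned_count, skip_reason) or None if not a plan line.

    skip_reason is None when the plan carries no skip directive.
    """
    match = PLAN.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```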

### Got spare tuits?

The following example reports that four tests are run and the last two
tests failed. However, because the failing tests are marked as things
to do later, they are considered successes. Thus, a harness should
report this entire listing as a success.

```
TAP version 14
1..1
    # Subtest: child program
    1..4
    ok 1 - Creating test program
    ok 2 - Test program runs, no error
    not ok 3 - infinite loop # TODO halting problem unsolved
    not ok 4 - infinite loop 2 # TODO halting problem unsolved
ok 1 - child program
```
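
The rule that a failing TODO test still counts as a success can be sketched as follows. `point_passes` is a hypothetical helper, and the sketch does not account for escaped `#` characters inside descriptions.

```python
import re

TEST_LINE = re.compile(r"^(not )?ok\b(.*)$")
# Matched case-insensitively; an assumption of this sketch.
DIRECTIVE = re.compile(r"#\s*(todo|skip)\b", re.IGNORECASE)


def point_passes(line):
    """True if a test point counts as a success for the test set."""
    match = TEST_LINE.match(line)
    if match is None:
        raise ValueError("not a test point: %r" % line)
    failed = match.group(1) is not None
    directive = DIRECTIVE.search(match.group(2))
    is_todo = directive is not None and directive.group(1).lower() == "todo"
    # A failing test is forgiven only when it is marked TODO.
    return (not failed) or is_todo
```

Applied to the four test points of the child program above, every call returns true, which is why the parent reports `ok 1 - child program`.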

### Creative liberties

This listing shows an alternate output where the test numbers aren't
provided. The test also reports the state of a fictitious board game
as a YAML block. Finally, the test count is reported at the end.

```
TAP version 14
ok - created Board
ok
ok
ok
ok
ok
ok
ok
  ---
  message: "Board layout"
  severity: comment
  dump:
    board:
      - ' 16G 05C '
      - ' G N C C C G '
      - ' G C + '
      - '10C 01G 03C '
      - 'R N G G A G C C C '
      - ' R G C + '
      - ' 01G 17C 00C '
      - ' G A G G N R R N R '
      - ' G R G '
  ...
ok - board has 7 tiles + starter tile
1..9
```
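
When test numbers are omitted and the plan comes last, a consumer can count test points in order and compare the total with the plan once it arrives. A minimal sketch with a hypothetical helper name, looking only at top-level lines:

```python
import re

TEST_LINE = re.compile(r"^(?:not )?ok\b")
PLAN = re.compile(r"^1\.\.(\d+)")


def count_against_plan(lines):
    """Return (tests_seen, planned) for the top-level lines of a stream.

    planned is None if no plan line was found.
    """
    seen = 0
    planned = None
    for line in lines:
        plan = PLAN.match(line)
        if plan:
            planned = int(plan.group(1))
        elif TEST_LINE.match(line):
            seen += 1  # unnumbered points are numbered implicitly, in order
    return seen, planned
```

For the listing above this yields nine test points against a plan of nine, so the counts agree.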

## CHANGES FROM TAP13

- YAML blocks standardized to 2 space indentation
- Subtests specified to behavior of `Test::More` and `node-tap`.

Depending on how formal this specification is and how important the Test::More and node-tap references are, it might be relevant to include version numbers.

- Normative advice regarding exit code for harness programs
- Examples and usage comments made language-agnostic.
- Clarification of whitespace and hyphens in test lines.
- Clarification of handling of incorrect lines.
- Specification of Pragma lines

## BUGS

Please report issues with this specification to the
[TestAnything/Specification
project](https://github.com/TestAnything/Specification).

## AUTHORS

The original TAP documentation (of which this is a hacked about
version) was written by Andy Lester, based on the original
`Test::Harness` documentation by Michael Schwern.

TAP13 documentation written by Andy Armstrong.

TAP14 documentation written by Isaac Z. Schlueter.

## ACKNOWLEDGEMENTS

Thanks to Pete Krawczyk, Paul Johnson, Ian Langworth and Nik Clayton
for help and contributions on this document. The basis for the TAP
format was created by Larry Wall in the original test script for
Perl 1. Tim Bunce and Andreas Koenig developed it further with their
modifications to `Test::Harness`.

As of this writing, the TAP specification is now managed by the
[TestAnything organization](https://github.com/TestAnything).

## COPYRIGHT

Copyright Michael G Schwern <schwern@pobox.com>, Andy Lester
<andy@petdance.com>, Andy Armstrong <andy@hexten.net>, Isaac Z.
Schlueter <i@izs.me>.
Member

I purposely left myself off this when neatening up TAP13 which I did change a fair bit just to get live like you have.
My biggest scepticism is knowing who actually owns TAP. We should likely get to the bottom of that.

Contributor Author

I added myself because writing TAP14 is a substantial shift from TAP13. There are a lot of new words I've put here, and I'm not comfortable assigning responsibility for those to others.

Member

Yeah I understand that, the list certainly isn't complete either though. I would like to try and move it to some kind of group or foundation where group responsibility is held. However I wasn't really confident to change it.

Contributor Author

Moving to a foundation would require, at least, sign-off from whoever the owners are, and as you say, it's hard to know who that is.

Member

Well technically as it has survived not contested so long; contacting just that list would likely be better.
There have been a lot more contributions than that list for certain so it seems pretty unfair for a list to be a set of people who put their name in.


This program is free software; you can redistribute it and/or modify
it under the same terms as Perl itself. See
Member

I think it probably however is safe to dual licence it with MIT.

[http://www.perl.com/perl/misc/artistic.html](http://www.perl.com/perl/misc/artistic.html).