Which validators are run and when? #26

thorehusfeldt · 2023-06-01T06:52:28Z

I’ve tried to understand which validators are run and when. This is particularly interesting with respect to input validation.

There are several sources of information about this, in particular “the specification” (which exists in many different forms), and two toolchains that are inspired by the specification.

To highlight the issue, there is currently no consensus about what the following entry in testdata.yaml means:

input_validator_flags: foo

But there are more complicated issues as well. Here follows an attempt at an overview.

Before the deep dive, here is an attempt at a crisp summary:

1. Is the name input_validator_flags or input_validators? Henceforth I’ll call it iv.
2. What are the semantics of the value of iv when it is a string? Is it a string of flags, such as --max_n 10 --connected, or the name of a validator, such as grammar.ctd?
3. If the value of iv is not a string, then what is it? Can it be a list of dicts?
4. If a validator is listed in the name key of iv (or, if it’s a list, in any of its entries), does that mean that only the listed validators are run on the testcases in the given testgroup?
5. If secret and secret/foo both have iv keys in their testdata.yaml, how is inheritance handled? If secret has a testdata.yaml/iv key but secret/foo has testdata.yaml but no testdata.yaml/iv, then does secret/foo inherit the values from its parent? Does a flag for validator grammar.py specified in secret apply to secret/foo? Does that depend on whether secret/foo has testdata.yaml, has testdata.yaml/iv, or mentions grammar.py in testdata.yaml/iv?

The most important issue is 2, because it breaks core functionality: it makes two toolchains (problemtools and BAPCtools) incompatible.

Specification

What does the specification say? At https://www.kattis.com/problem-package-format/spec/problem_package_format it says:

Key input_validator_flags
Type String or map with the keys "name" and "flags".
Default empty string.
Comments arguments passed to the input validator for this test data group. If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.

However, I have learned that this is not the intended mode of reading. The only source for the intended meaning is currently the source code, which is here:

input\_validator<s class="dep kattis">\_flags</s></s> |  
String or map with the keys "name" and "flags"</s> |  
empty string  | 
arguments passed to the input validator for this test data group.</s> If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.</s> |

The <s> tags seem to be unbalanced, but one way to make sense of this is through the following two readings:

Key input_validator_flags |
Type: String or map with the keys name and flags
Default empty string
Comments: arguments passed to the input validator for this test data group

and

class "dep kattis", presumably meaning “relevant to Kattis, deprecated”

Key input_validator |
Type: String or map with the keys name and flags
Default empty string
Comments: arguments passed to the input validator for this test data group. If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.

However, I’m a bit out of my depth as to how to interpret the tags.

Current behaviour of `input_validator_flags: foo`

The most important thing for me is to align on what input_validator_flags: foo in data/a/b/testdata.yaml should mean. (This feature is extremely useful and widely used.) It should have high priority that there exists at least one accessible and authoritative source that specifies the name and intended functionality of this key.

I believe the current implementation of problemtools

runs all validators in input_validators on all test-cases and
sends the string foo as an argument to all(?) those validators for the testcases in data/a/b.

BAPCtools does something else: it

runs validator foo on the testcases in data/a/b.
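
To make the difference concrete, here is a minimal sketch of the two readings. It is not taken from either toolchain: the function names and the validator names are hypothetical, and it only models the description above of which validators run and with which arguments.

```python
def plan_flags_reading(all_validators, value):
    """Reading attributed to problemtools: run every validator, pass `value` as flags."""
    return [(name, value.split()) for name in all_validators]


def plan_name_reading(all_validators, value):
    """Reading attributed to BAPCtools: `value` names the only validator that runs."""
    return [(name, []) for name in all_validators if name == value]


# Hypothetical contents of input_validators/ (assumption, for illustration only).
validators = ["grammar.ctd", "foo"]

# Each entry is (validator to run, argument list) for the testcases in data/a/b.
print(plan_flags_reading(validators, "foo"))
print(plan_name_reading(validators, "foo"))
```

Under the first reading both validators run and each receives foo as an argument; under the second only foo runs, without arguments.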

The three traditions (specification, problemtools, BAPCtools) further disagree on inheritance, having very different opinions on what to do when data/a/testdata.yaml exists and sets input_validator_flags: bar, in each of the following cases:

data/a/b/testdata.yaml exists but does not contain input_validator_flags. The specification says: bar is not set. Problemtools travels upwards in the settings tree and finds bar.
data/a/b/testdata.yaml does not exist. Should bar, when interpreted as a validator, run? As the only validator? Or is the absence of a specified validator at b an indication that all validators should run?
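
The two inheritance rules in the first case can be sketched as follows. This is an illustration under stated assumptions, not code from the specification or from problemtools: settings files are modelled as a plain dict from directory to parsed testdata.yaml, and `resolve_file_level`, `resolve_key_level` and `some_other_key` are hypothetical names.

```python
KEY = "input_validator_flags"


def resolve_file_level(settings_by_dir, path):
    """Use the nearest existing testdata.yaml as a whole; never look past it."""
    parts = path.split("/")
    while parts:
        current = "/".join(parts)
        if current in settings_by_dir:
            # Missing key falls back to the documented default, the empty string.
            return settings_by_dir[current].get(KEY, "")
        parts.pop()
    return ""


def resolve_key_level(settings_by_dir, path):
    """Walk upwards until some testdata.yaml actually sets the key."""
    parts = path.split("/")
    while parts:
        current = "/".join(parts)
        if KEY in settings_by_dir.get(current, {}):
            return settings_by_dir[current][KEY]
        parts.pop()
    return ""


# data/a sets the key; data/a/b has a testdata.yaml that does not mention it.
settings = {"data/a": {KEY: "bar"}, "data/a/b": {"some_other_key": 1}}
print(repr(resolve_file_level(settings, "data/a/b")))
print(repr(resolve_key_level(settings, "data/a/b")))
```

With file-level inheritance the lookup stops at data/a/b and bar is not set; with key-level inheritance the lookup continues to data/a and finds bar. When data/a/b/testdata.yaml does not exist at all, both rules find bar, and the open question is only what bar then means.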

Provided validators

The specification says:

All input validators provided will be run on every input file.

Alas, the semantics of the word “provided” is not clear from the rest of the specification. It may mean “all input validators found in input_validators”. Or it may mean that validators named somewhere in testdata.yaml settings files can restrict or extend which validators are “provided”.

Testdata settings inheritance

The specification says:

In each test data group, a file testdata.yaml may be placed to specify how the result of the test data group should be computed. If such a file is not provided for a test data group then the settings for the parent group will be used.

I don’t think that is what is implemented by Problemtools (but it is implemented by BAPCtools). Suggestions:
(i) slightly better as “…if ~~such a file~~ a setting is not provided for a test data group then the…”.
(ii) slightly better as “… to specify settings for a test data group, such as grading or validation”.

Speculative behaviour of `input_validator_flags` / `input_validator(s)`

This should have low priority.

What BAPCtools seems to implement is this:

Key input_validator_flags
Type String or map with the keys name and flags or nonempty list of such maps.
Default empty string.
Comments If a string this is the name of the input validator that will be used for this test data group; no other validators are run. If a map then this is the name as well as the flags that will be passed to the named input validator; no other validators are run. If a list then exactly the named validators in the list are run, with the given flags.

(What is new is the “list” type.) This may indeed be the intended definition of the speculative part of the specification. However, the inheritance rules are very unclear to me; there is no agreement on whether keys in testdata.yaml files are inherited, much less about what happens when those keys are lists of dicts.
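
The three value shapes in the table above can be reduced to one by normalising everything to a list of maps. The sketch below is hypothetical and not BAPCtools code; in particular, treating the empty-string default as “no validator named” is an assumption, since the table does not say what it means.

```python
def normalize(value):
    """Turn a string, a map, or a list of maps into a list of {'name', 'flags'} maps."""
    if isinstance(value, str):
        # Assumption: the empty-string default names no validator at all.
        return [{"name": value, "flags": ""}] if value else []
    if isinstance(value, dict):
        return [{"name": value["name"], "flags": value.get("flags", "")}]
    if isinstance(value, list):
        # Flatten by normalising each entry in order.
        return [entry for item in value for entry in normalize(item)]
    raise TypeError(f"unsupported value: {value!r}")


print(normalize("grammar.ctd"))
print(normalize({"name": "grammar.py", "flags": "--max_n 10 --connected"}))
print(normalize([{"name": "grammar.ctd"}, {"name": "grammar.py", "flags": "--connected"}]))
```

Normalising does not settle inheritance: it remains undecided whether a list in a child group replaces, extends, or is merged with a list inherited from its parent.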


Tagl · 2023-07-21T12:06:00Z

I will join

jsannemo · 2023-07-22T09:38:59Z

Decision: run all of them all the time. input_validator_flags can be used to give specific flags to specific validators, with empty flags for unlisted validators.
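
A minimal sketch of this decision, assuming the flags are represented as a map from validator name to a flag string. That representation, the function name `plan` and the validator names are assumptions for illustration; the comment above does not fix a concrete format.

```python
def plan(all_validators, flags_by_validator):
    """Run every validator; unlisted validators get an empty argument list."""
    return [
        (name, flags_by_validator.get(name, "").split())
        for name in all_validators
    ]


# Hypothetical validators and flags.
validators = ["grammar.ctd", "grammar.py"]
flags = {"grammar.py": "--max_n 10 --connected"}

for name, args in plan(validators, flags):
    print(name, args)
```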

Tagl · 2023-07-22T09:47:13Z

Additional note: The idea is that, to skip validators, validators can be implemented to support something like a --skip flag, and the intent is to add support for that in CTD.

jsannemo · 2023-07-22T10:15:36Z

Decision: there can be only one output validator

thorehusfeldt mentioned this issue Jun 1, 2023

Align semantics of input_validator_flags: foo RagnarGrootKoerkamp/BAPCtools#259

Merged

niemela assigned thorehusfeldt Jul 21, 2023

niemela assigned Tagl Jul 21, 2023

Tagl mentioned this issue Jul 22, 2023

Update spec for validators #61

Merged

eldering closed this as completed in #61 Jul 22, 2023

RagnarGrootKoerkamp mentioned this issue Jul 24, 2023

legacy: cleanup of input_validator_flags #90

Closed
