Specification for generators/ directory. #2

RagnarGrootKoerkamp · 2020-10-23T17:43:06Z

Moved the example generators.yaml here, and made the spec we had written in the BAPCtools repo a bit nicer.

Specification: https://ragnargrootkoerkamp.nl/problem-package-format/generators/spec/generators.html
Updated this a bit, so PTAL. I made it a list of lists instead of a table for the Directory object since tables will make the description a bit too small I think.

Commented examples: https://ragnargrootkoerkamp.nl/problem-package-format/generators/examples/generators.yaml.html
This is unchanged from what we discussed in our meetings.

Description in main text: https://ragnargrootkoerkamp.nl/problem-package-format/generators/spec/problem_package_format#generators

@niemela @eldering @nickygerritsen @deboer-tim @clevengr (Not sure what the current membership status is here, so to be sure.)
FYI: @SuprDewd

RagnarGrootKoerkamp · 2020-10-27T11:48:16Z

Thanks for the thorough review @eldering! Resolved most of the easy fixes.

Btw, is there a way to respond in bulk, like for reviews? -> Oh found it: should have gone to the changes page instead of doing things from the conversation page directly.

eldering · 2020-10-27T12:00:17Z

examples/generators.yaml.md

+                  d: stdout.py d
+# This is forbidden because data: dictionaries may not appear within data: lists.
+             #data:
+             #  c: stdout.py c


I'm still confused... :-/
This commented out data key is nested under the testgroup key in the YAML, right? Then isn't it simply disallowed because you can't have the same data key twice in the dictionary testgroup?

eldering · 2020-10-27T12:02:40Z

examples/generators.yaml.md

+          - f: stdout.py f
+          - g: stdout.py g
+          - h: stdout.py h
+          - i: stdout.py i


Would it make sense to have two different variables? One that refers to the base file path (like {name} currently does) and that should be used for file operations, e.g. when passing it to a visualizer script as argument, so that that can write to {name}.jpg, and one that can be used as a name to put in descriptions, for example.

simonlindholm

Fredrik asked me to leave some review comments.

Note that in the current state we can't use this for the Swedish IOI organization:

* we need testgroup/testcase reuse
* specifying testdata.yaml files by hand is too verbose and error-prone. Could fix this using some validation tool, but it doesn't help with verbosity, and we'd prefer to not introduce more tooling. Some kind of plugin system?

Both these points could be solved backwards compatibly, however.

simonlindholm · 2020-12-07T20:48:03Z

spec/generators.md

+
+* `<generator_name>` must either be a program (file/directory) in `generators/` or else a key in the top level `generators` dictionary.
+* The generator will be invoked with `<arguments>`.
+  Arguments are separated by white space (space, tab, newline). Quoting white space is not supported.


Why not? If we want this in the future, let's make it an error to include a " char

Why not? -> Mostly because it's quite hard to do correctly. The easiest way to 'spec' this is by saying 'you'll get whatever Python's shlex.shlex gives you', but that's not really that great of a spec.
Just saying 'shell style quoting' is also severely underspecified, sadly.

I'd be ok with disallowing ' and " for now.
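
For illustration, the whitespace-only splitting described in the spec text, combined with the suggestion to reject quote characters for now, could be sketched roughly like this. The function name `split_generator_command` is hypothetical and not part of the spec or of any tool.

```python
def split_generator_command(command: str) -> tuple[str, list[str]]:
    """Split '<generator_name> <arguments>' on white space only.

    Hypothetical helper: quote characters are rejected outright, so that
    quoting could still be given a meaning in a later version of the spec.
    """
    for quote in ("'", '"'):
        if quote in command:
            raise ValueError(f"quote character {quote} is not allowed in {command!r}")
    # str.split() without arguments splits on runs of white space
    # (including space, tab and newline) and never yields empty fields.
    parts = command.split()
    if not parts:
        raise ValueError("empty generator command")
    return parts[0], parts[1:]


if __name__ == "__main__":
    print(split_generator_command("stdout.py a  b\tc"))
```

Rejecting quotes up front keeps the rule easy to state, at the cost of not being able to pass arguments that contain white space.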

simonlindholm · 2020-12-07T20:49:38Z

spec/problem_format.md

+Generators are used to generate test cases.
+They are provided in the `generators/` directory together with the file
+`generators/generators.yaml` which specifies how to invoke the generators to generate the test cases.
+This directory must adhere to the [**Generators specification**](./generators) .


There are some legacy problems that have generators/ directories that don't adhere to it. Should we say that a directory without generators.yaml is ignored? (but that tools MAY warn about it)

Right; the intention is that the presence of generators/generators.yaml enables this spec, so that should be made clearer.

simonlindholm · 2020-12-07T20:50:32Z

examples/generators.yaml.md

+
+# Unknown keys are allowed inside directory dictionaries for tooling-specific
+# extensions. This includes both the global scope and explicit directories with type: directory.
+unknown_key: tool_specific_config


an explicit prefix reserved for extensions would make it possible to add things to the spec in the future

Ah good idea. Happy to guarantee that keys prefixed with e.g. . or _ (so .my_key or _my_key) are reserved for extensions.
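
A rough sketch of how tooling might tell reserved extension keys apart from other unknown keys, assuming the `.` and `_` prefixes floated above. The helper `classify_keys` and the `known_keys` parameter are hypothetical; the set of keys the spec actually defines is passed in by the caller rather than assumed here.

```python
# Prefixes suggested in this thread for tool-specific extensions (an
# assumption taken from the discussion, not a settled part of the spec).
EXTENSION_PREFIXES = (".", "_")


def classify_keys(directory: dict, known_keys: set) -> dict:
    """Sort the keys of a directory dictionary into three buckets."""
    result = {"known": [], "extension": [], "unknown": []}
    for key in directory:
        name = str(key)  # YAML keys are not necessarily strings
        if name in known_keys:
            result["known"].append(name)
        elif name.startswith(EXTENSION_PREFIXES):
            result["extension"].append(name)
        else:
            result["unknown"].append(name)
    return result


if __name__ == "__main__":
    example = {"type": "directory", "_my_key": 1, "unknown_key": 2}
    print(classify_keys(example, {"type", "data"}))
```

With a reserved prefix, anything in the "unknown" bucket could later be turned into an error without breaking existing extensions.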

simonlindholm · 2020-12-07T20:53:29Z

examples/generators.yaml.md

+# will determine the required number of digits to use and numbers will be
+# zero-padded accordingly, using a dash as separator from the given name (when
+# the given name is not empty). All items in a given dictionary will get the
+# same number. Use a list of 1-item dictionaries for incremental numbering.


Can we just require that the list has size 1? I don't see a use-case for larger sizes, but I can see it hiding mistakes and making tooling more complex.

Yeah, I can live with that.
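
As a sketch of the numbering rule in the quoted comment, with the restriction to 1-item dictionaries applied: the number of digits follows from the length of the list, numbers are zero-padded, and a dash separates the number from a non-empty name. The function `number_testcases` is hypothetical, and starting the count at 1 is an assumption not stated in this thread.

```python
def number_testcases(data: list) -> list:
    """Return the numbered names for a data: list of 1-item dictionaries."""
    digits = len(str(len(data)))  # e.g. 10 items need 2 digits
    names = []
    for index, item in enumerate(data, start=1):  # assumption: count from 1
        if len(item) != 1:
            raise ValueError(f"list item {index} must contain exactly one key")
        (name,) = item  # unpacks the single key of the dictionary
        prefix = str(index).zfill(digits)
        # The dash is only used when the given name is not empty.
        names.append(f"{prefix}-{name}" if name else prefix)
    return names


if __name__ == "__main__":
    print(number_testcases([{"f": "stdout.py f"}, {"g": "stdout.py g"}]))
```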

simonlindholm · 2020-12-07T20:54:39Z

examples/generators.yaml.md

+          - a: stdout.py a
+            b: stdout.py b
+          - testgroup:
+              type: directory


can we make this implicit if there exists a data key?

Yeah this is a complicated point. The answer in the current setup is no, because testcase: generator.py is equivalent to:

    testcase:
      type: testcase
      data: generator.py

this allows for per-testcase customization of solution and visualizer, although that's super rare in practice of course

Could we rename data to something else for testcases?

Oh my bad: yes, we're actually using input instead of data already for testcases.

There probably is/was a reason for having type: in the first place, but I don't remember it anymore.
One case where it does matter is to distinguish an empty directory object from an empty testcase object. But we could say that directories must always have a data key in them, which may be an empty list/dictionary.

@SuprDewd
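
To make the proposal concrete, here is a minimal sketch of inferring the type when `type:` is omitted, assuming the rule floated above (directories always carry a `data` key, testcases use `input`). This illustrates the idea under discussion, not the behaviour of the spec as written; `infer_type` is a hypothetical name.

```python
def infer_type(value) -> str:
    """Guess whether a YAML value describes a testcase or a directory."""
    if value is None or isinstance(value, str):
        # Empty value or a plain generator command: a testcase.
        return "testcase"
    if isinstance(value, dict):
        if "type" in value:
            return value["type"]  # an explicit type always wins
        # Proposed rule: a directory must have a data key, possibly empty.
        return "directory" if "data" in value else "testcase"
    raise TypeError(f"unexpected value of type {type(value).__name__}")


if __name__ == "__main__":
    print(infer_type({"data": []}))
```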

simonlindholm · 2020-12-07T21:17:56Z

spec/generators.md

+A **Generator** takes one of the following four types/forms:
+
+1. Null / empty
+    * An empty generator means that the testcase is a manual case and must not be modified or deleted by generator tooling. The corresponding `.in` file must be present in the `data/` directory. The corresponding `.ans` may be present, but may also be generated once from a given solution. Note that this form is discouraged. Prefer specifying a path to a `.in` file as below.


It seems like the existence of this form means that if a file exists in data/ but isn't mentioned in generators.yaml (e.g. because of a git pull in a repo where test data isn't checked in) and we run the command to regenerate test data, we can't delete it directly but need to prompt the user. Unless:

* we have tooling keep additional state (it would be nice to document how we expect tooling to work!)
* we add some (per-directory?) key saying whether this feature is in use
* we remove this feature

I don't see a strong use-case for it within secret/; it adds complexity and splits testdata across two directories. I see a weak use-case for it within sample/, since that consists solely of manual testdata, and deduplicating that testdata would make for e.g. slightly smaller version control diffs.

Documenting why we have it, and what best practices are, would be good.

I think the current state is that all testcases must be mentioned in generators.yaml.

However, I find that in practice during development, the easiest way to play with manual testcases is to just put them in data/secret, so BAPCtools does support this. It does complicate things quite a bit though and cleaning up spurious generated files is tricky indeed.

I also added --add-manual and --move-manual flags to my bt generate command to easily update the generators.yaml to match the files in the directory, but I don't like requiring that to be run every time you want to test a submission.
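
One way tooling could notice testcases that exist in data/ but are not listed in generators.yaml is to compare the two sets and warn. The sketch below assumes testcases are identified by their path relative to data/ without the `.in` extension; `unlisted_testcases` is a hypothetical helper and does not describe how BAPCtools implements this.

```python
from pathlib import Path


def unlisted_testcases(data_dir: Path, listed: set) -> list:
    """Return testcases with a .in file on disk that are not in `listed`."""
    found = {
        path.relative_to(data_dir).with_suffix("").as_posix()
        for path in data_dir.rglob("*.in")
    }
    return sorted(found - listed)


if __name__ == "__main__":
    # Warn instead of deleting, since an unlisted file may be a manual case.
    for name in unlisted_testcases(Path("data"), {"sample/1", "sample/2"}):
        print(f"warning: {name} is not mentioned in generators.yaml")
```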

simonlindholm · 2020-12-07T21:37:41Z

examples/generators.yaml.md

+      '3': manual_cases/sample/3.in
+# Every testcase present in the directory must be listed.
+# TOOLING: may still allow unlisted testcases and warn about them.
+     #'4':


Sample files for interactive problems use .interaction. How do we support that?

The manual_cases/sample/3.in example would copy over all of (e.g.) 3.in, 3.ans, 3.png, and 3.interaction, so this works fine.

A generator can also write the .interaction file alongside the .in file, if you want that, by taking the {name} argument.

Note that interactive problems use only .interaction, there's no .in file. So it needs some additional spec text/implementation logic.

simonlindholm · 2020-12-07T21:45:35Z

examples/generators.yaml.md

+          range: 0 25
+          grader_flags: min
+
+# To enable automatic numbering of testcases, data: may also contain a list of


Basing automatic numbering only on whether there are - prefixes seems very easy to screw up as an author. Can we add that tooling SHOULD warn if it sees non-automatically numbered testcases that don't start with numbers?

I'd prefer not to do this. In BAPC we 'traditionally' don't number testcases, and if we do, it's only at the very end.

Personally I find that numbering during development is rather annoying because inserting in the middle renumbers everything.

Also, detecting whether all cases start with (consecutive? zero padded?) numbers is messy.

But tooling is free to still warn about this if it's up to me

Oh, that's annoying. I guess the downside of making a mistake here is small enough that I'll just drop this issue. Short comments though:

(zero padded?)

Detecting missing zero-padding would be nice, and in fact problemtools already does that for names of testdata groups (grep for natural_sort_le). But that's a quality of implementation issue and not something an implementation would be required to do.

But tooling is free to still warn about this if it's up to me

Perhaps as a tooling option that we can stick in contest repo's git root... but by default I don't think tooling should have different opinions on what patterns to warn about.
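
As an illustration of the zero-padding check mentioned above, the sketch below flags a set of names whose plain string order differs from their natural (numeric-aware) order. This is a standalone example and does not reproduce the `natural_sort_le` function in problemtools; both function names here are hypothetical.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer parts for numeric-aware sorting."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so every odd-indexed part is a run of ASCII digits.
    parts = re.split(r"([0-9]+)", name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def has_padding_problem(names: list) -> bool:
    """True if sorting as strings and sorting naturally disagree."""
    return sorted(names) != sorted(names, key=natural_key)


if __name__ == "__main__":
    print(has_padding_problem(["1-a", "2-b", "10-c"]))
    print(has_padding_problem(["01-a", "02-b", "10-c"]))
```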

RagnarGrootKoerkamp · 2020-12-07T21:54:52Z

@simonlindholm

Quick reply; will get into details later this week.

Fredrik asked me to leave some review comments.

Thanks! Good feedback for such a quick look!

Note that in the current state we can't use this for the Swedish IOI organization:

we need testgroup/testcase reuse

We are aware. We did discuss this extensively at some point and we have ideas, but since it's not used in ICPC-style problems, it's hard to know what we're designing for without your involvement.
I actually implemented something but we decided to leave it out for now because it's not trivial to design something intuitive.

specifying testdata.yaml files by hand is too verbose and error-prone. Could fix this using some validation tool, but it doesn't help with verbosity, and we'd prefer to not introduce more tooling. Some kind of plugin system?

Hmm, you're generating testdata.yaml currently? I'd be perfectly happy to allow testdata.yaml: testdata_generator.py <arg1> <arg2> ..., where a string is a generator command and a dictionary is the literal content (or we could put the command under a differently named key).

Both these points could be solved backwards compatibly, however.

Jup, that's the plan :)

simonlindholm · 2020-12-07T22:26:13Z

Hmm, you're generating testdata.yaml currently?

Yes. We generate testdata.yaml files like the following:

/data/testdata.yaml:

on_reject: continue
range: 0 100 # where 100 = sum of scores
grader_flags: ignore_sample

/data/sample/testdata.yaml:

on_reject: continue
range: 0 0
accept_score: 0
grader_flags: first_error
input_validator_flags: n=100 m=100 # user-provided flags

/data/secret/testdata.yaml:

on_reject: continue
range: 0 100 # sum of scores
grader_flags: first_error accept_if_any_accepted

/data/secret/*/testdata.yaml:

on_reject: break
accept_score: 17 # user-provided
range: 0 17 # same as accept_score
grader_flags: min
input_validator_flags: n=100000 m=100000 # user-provided flags, different for each group

testdata.yaml: testdata_generator.py <arg1> <arg2> ... might work; it makes it impossible to sum up testgroup scores, but usually that's 100 and mismatches are already warned about by problemtools IIRC.

thorehusfeldt · 2023-06-02T09:29:07Z

Is there any informed speculation (say, by @simonlindholm or @RagnarGrootKoerkamp) about how to allow the inclusion of the testcases of one testgroup into another?

The testdata_tools toolchain used by Swedish @Kodsport has

group group3 12
include_group group1
tc other_testcase

which includes all the testcases generated for group1 also in group3, and also the testcase other_testcase. This is implemented using symbolic links. (So other_testcase.in only exists once in data/*.) The testcase other_testcase is only generated once, but will be validated several times (because the groups can have different testdata settings).

I find this incredibly useful, in particular the inclusion of entire groups. I can see that both generator and directory could be extended with meaningful keys to make something like this happen, but admit to not having thought this through at all (much less implemented it).

RagnarGrootKoerkamp · 2023-06-02T09:35:46Z

Extending generators.yaml for this sounds reasonable. Since we don't use testdata groups in BAPC/NWERC this hasn't really come up so far, but I think it should be OK to add this.

thorehusfeldt · 2023-06-05T06:31:37Z

@RagnarGrootKoerkamp : was the include key already reserved for exactly the purpose of testgroup and testcase inclusion? (Honest question.)

If so, part of a generators.yaml, extending BAPCtools/doc/generators.yaml, would look like this:

    reject-greedy: # another testgroup
      testdata.yaml:
        accept_score: 12 
      data:
       - m: stdout.py m # m belongs only to reject-greedy
      include:
      - hard_cases_group # include all testcases of hard_cases_group
      - 13_no_visualizer # also include another test case

Is this what you had in mind? The semantics of include: foo is to include foo/data and (recursively) foo/include (but not foo/testdata, I believe.)

Maybe specified like this:

directory :: {
    type: "directory"
    file_config
    "testdata.yaml"?: {
        ...
    }
    data?: data_dict | [...data_dict]
    include?: string | [...string]
    generator_reserved
    ...
}

(string for the names of test cases and groups would benefit from a stricter specification, by the way.)

Issues to clarify:

* The semantics of automatic numbering. I guess that’s not really a problem if anything below include does not get a new automatic number; the testcase is only ever specified once in the whole script and that place determines its number.
* Which testdata settings (testdata.yaml) apply. I think this is clear as well: in the above example, testcase e has accept_score: 25 for the purpose of grading testgroup hard_cases_group and accept_score: 12 for the purpose of grading testgroup reject-greedy.

But there may be many aspects I haven’t thought about. Please help.
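
To make the proposed semantics of include: a bit more tangible, here is a rough sketch that expands a testgroup into its testcases by taking its own data and then, recursively, everything it includes. The flat `groups` mapping and the function `resolve_testcases` are hypothetical simplifications for illustration; nothing like this has been specified or agreed on.

```python
def resolve_testcases(groups: dict, name: str, _stack: tuple = ()) -> list:
    """Expand `name` into a list of testcase names, following include:."""
    if name not in groups:
        return [name]  # not a group, so treat it as a single testcase
    if name in _stack:
        raise ValueError("cyclic include: " + " -> ".join(_stack + (name,)))
    group = groups[name]
    result = list(group.get("data", []))
    for included in group.get("include", []):
        for case in resolve_testcases(groups, included, _stack + (name,)):
            if case not in result:  # a testcase appears once per group
                result.append(case)
    return result


if __name__ == "__main__":
    groups = {
        "hard_cases_group": {"data": ["e", "f"]},
        "reject-greedy": {
            "data": ["m"],
            "include": ["hard_cases_group", "13_no_visualizer"],
        },
    }
    print(resolve_testcases(groups, "reject-greedy"))
```

Only the list of testcases is inherited here; the testdata.yaml settings of the including group would still apply when grading it.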

RagnarGrootKoerkamp · 2023-07-22T10:27:56Z

We decided that this will not be part of the problem package format spec. Instead, the documentation and specification of this will move into the BAPCtools repo.

Specification for generators/ directory.

3e12653

eldering reviewed Oct 25, 2020

Address comments by Jaap.

9121008

eldering reviewed Oct 27, 2020

RagnarGrootKoerkamp added 3 commits October 27, 2020 14:59

more fixes.

9ea7d5c

Add link to how programs are built.

e93fe69

Further clarify that numbered directories may only contain numbered directories.

eca2b96

niemela requested a review from ghamerly November 16, 2020 19:11

RagnarGrootKoerkamp added 2 commits November 25, 2020 15:52

Clarify that {name} refers to the current working directory and includes numbers.

fa6d5ee

Add generators to list of programs that allow build/run scripts.

24aeffe

niemela requested review from austrin and simonlindholm December 7, 2020 19:14

simonlindholm reviewed Dec 7, 2020

eldering reviewed Dec 8, 2020

fix typo

4536bda

SuprDewd mentioned this pull request Apr 29, 2021

Draft implementation of generators.yaml spec Kattis/problemtools#181

Merged

RagnarGrootKoerkamp mentioned this pull request Jul 21, 2023

[meta] Relevant BAPCtools issues #27

Closed

RagnarGrootKoerkamp closed this Jul 22, 2023

RagnarGrootKoerkamp deleted the generators branch July 24, 2023 23:20
