Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support generating included testcases #262

Merged
merged 24 commits into from
Jun 28, 2023
Merged

Conversation

RagnarGrootKoerkamp
Copy link
Owner

@RagnarGrootKoerkamp RagnarGrootKoerkamp commented Jun 21, 2023

(Note: I converted @thorehusfeldt's original issue into a PR.)

Here is a simple generator, say for an addtwonumbers-like problem:

data:
  sample:
    type: directory
    data:
      '1': stdout.py 1 2

  secret:
      type: directory
      data:
        'foo': stdout.py 4 6
      include:
        - 'sample/1'

It uses the include key to include the 1 2 instance called 1 also in the secret data.

This works once:

> bt generate
PROBLEM generatorincludes
[…]
Generate: sample/1   NEW 1.in
Generate: sample/1   NEW 1.ans
Generate: secret/foo NEW foo.in
Generate: secret/foo NEW foo.ans
> ls data/secret
1.ans   1.in    foo.ans foo.in

but not twice

> bt generate
PROBLEM generatorincludes
Traceback (most recent call last):
[...]
  File "/Users/thore/Developer/BAPCtools/bin/generate.py", line 955, in parse
    assert t.path not in self.known_cases

@thorehusfeldt thorehusfeldt mentioned this pull request Jun 20, 2023
2 tasks
@RagnarGrootKoerkamp
Copy link
Owner

Re. ordering (#261 (comment)), this may already happen, but either way we should generate included testcases after all other testcases

@mzuenni
Copy link
Collaborator

mzuenni commented Jun 20, 2023

what is the intended behaviour of include? do all mentioned cases get copied? or could we just add a symlink?

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 20, 2023

@mzuenni : as far as I can tell, there is no authoritative source for the intended semantics of include. However, I think BAPCtools does the right thing.

In particular, and this is not important for the semantics, it creates symlinks to those testcases. In my example above,

> ls -l data/secret
total 16
lrwxr-xr-x  1 thore  staff  15 20 Jun 17:01 1.ans -> ../sample/1.ans
lrwxr-xr-x  1 thore  staff  14 20 Jun 17:01 1.in -> ../sample/1.in
-rw-r--r--  1 thore  staff   3 20 Jun 17:01 foo.ans
-rw-r--r--  1 thore  staff   4 20 Jun 17:01 foo.in

So this already works (except for the can’t-run-it-twice issue reported here.)

More importantly when foo is a testgroup, the expression

bar:
  type: directory
  include: foo

means: include the testcases that are descendants of foo directly as children of bar.

(An alternative choice would be to include below bar a copy of the subtree of testgroups and testcases rooted at foo (and retain its tree structure). But that decision causes all kinds of pain, such as what to do with testdata_yaml-directives in that subtree.)

@RagnarGrootKoerkamp
Copy link
Owner

ok i'm starting to remember things slowly :D

The PR at https://github.com/Kattis/problem-package-format/pull/2/files is the best 'groundtruth' there is at the moment. I suppose I implemented include: but then we decided against merging that into the spec for now as we don't actively use it and then problemtools would also have to support more things.

So yes, I implemented include: in BAPCtools but it was never tested much. But indeed let's try to make it work. I'm happy that it already gets quite close to what you want :)

(side note: even though include may not be in the spec; you can use it without problems once it works, since in the end we anyway export the generated data/ directory. Just may have to be careful with the (non-)preservation of symlinks.)

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 20, 2023

In fact, problemtools has no problems at all with symlinks to other testcases.

The generator script used by the Swedish crowd (https://github.com/Kodsport/testdata_tools) has a directive include_group <previousgroupname>, see here:

https://github.com/Kodsport/testdata_tools/blob/10abfe1431e1adf053d3e71973dc84cd08007aa1/generator_example.sh#L39

as well as the directive tc <previoustestcasename> see

https://github.com/Kodsport/testdata_tools/blob/10abfe1431e1adf053d3e71973dc84cd08007aa1/generator_example.sh#L28

Their semantics is exactly like proposed by include. problemtools and Kattis have worked with this for years, and it is routinely used by the Swedish and Nordic informatics olympiad.

I would really be nice to get this working. (It’s very close already.)

A difference is that testdata_tools only works with two levels, like secret/group1, not secret/foo/bar/baz, and that testdata_tools requires all testcases to have unique (short) names: sample/1 and secret/1 must be the same testcase. (This strikes me as a useful restriction to the specification that simplifies many things, but it’s not something the generators specifikation needs to emulate. bt generate should of course complain if an included testcase creates such a name clash.)

@RagnarGrootKoerkamp
Copy link
Owner

In fact, problemtools has no problems at all with symlinks to other testcases.

great! Unless one of you has a look, i'll get to it by tomorrow probably

@mzuenni
Copy link
Collaborator

mzuenni commented Jun 20, 2023

My guess is that in the second run, the symlinked files get picked up as unlisted manual testcases which are added to the set of known testcases? in that case the assert should be different?

@RagnarGrootKoerkamp
Copy link
Owner

yeah probaly something like that.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 20, 2023

If you want something more meaty to test than the minimal example upthread, here’s a tiny “add two numbers” problem with half a dozen testcases and a complicated test group structure generated by:

solution: /submissions/accepted/th.py
data:
  sample:
    type: directory
    data:
      '1': stdout.py 1 2
      '2': stdout.py 1 -3
  secret:
    type: directory
    data:
      small:
        type: directory
        data:
          positive:
            type: directory
            data:
              'sm-all-pos': stdout.py 4 6
              'sm-zero': stdout.py 4 0
            include:
              - 'sample/1'
          general:
            type: directory
            data:
              'sm-mixed': stdout.py 4 -6
              'sm-all-neg': stdout.py -4 -6
            include:
              - 'sample/2'
      general:
        type: directory
        data:
          'lg': stdout.py 83413870413975664 -11
        include:
          - 'secret/small'

I’m not very sure about the desired root of the include path names.

generatorincludes.zip

@RagnarGrootKoerkamp RagnarGrootKoerkamp self-assigned this Jun 21, 2023
@RagnarGrootKoerkamp
Copy link
Owner

RagnarGrootKoerkamp commented Jun 21, 2023

I'm working on this now. I've merged #261 to avoid merge conflicts later on. Will put some findings here:

When running for the first time with an empty data dir, I get this:

  File "/home/philae/git/bapc/bapctools/bin/generate.py", line 971, in parse
    assert Path(include) in self.known_cases
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

This will probably be fixed by either:

  • processing in order of the yaml file
  • doing a second pass for includes
  • toposorting

I think I prefer the 1st option.


Ah I see; this was because secret/small is a directory and not a file, and the line above has # TODO: This, or self.known_directories. 🙈


Currently the yaml entries are sorted (alphabetically) before processing them. Processing by file-order sounds more intuitive so will remove the sort.


More importantly when foo is a testgroup, the expression

bar:
  type: directory
  include: foo

means: include the testcases that are descendants of foo directly as children of bar.

Does this really already work? (or am I misreading?) For me there are a few more issues (such as only globbing for {include}/*.in, and not in subdirectories as your example requires.


I agree that flattening included directories simplifies things. Also, it prevents 'double inclusion' of the samples as in your generatorincludes.zip example. But that does require a deduplication strategy, since it finds multiple symlinks to sample/1.in. Currently these are not flattened and a symlink-to-symlink-to-sample is created, but flattening is probably nicer.
I'll go with:

  • Resolve symlink-to-symlink to their final non-symlink .in file.
  • Dedup doubly included testcases of the same name as long as they resolve to the same original .in file. Otherwise raise an error.
  • Ensure any created symlink is distinct from every named testcase in the directory. Otherwise error.

A separate issue is how to deal with numbered testcases/directories. This may or may not already work; will play with it after the above works.

@mzuenni
Copy link
Collaborator

mzuenni commented Jun 21, 2023

Python path.glob() can be used to glob recursively (with {include}/**/*.in) as far as i can see. Besides that flattening the symlinks sounds good to me.

@RagnarGrootKoerkamp
Copy link
Owner

I'll disallow including parent directories since that would create infinite recursion.
I could allow including child directories, but not sure this makes sense. @thorehusfeldt wdyt? Not allowing it is somewhat simpler since that removes the requirement that child-directories are generated before the current directory.

@RagnarGrootKoerkamp
Copy link
Owner

I'm also being lazy with warnings for now; symlinks are always written, also if they would overwrite distinct files/symlinks.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 21, 2023

Does this really already work? (or am I misreading?)

It does not. (Single files almost-work, subgroup inclusion does not work.)

. @thorehusfeldt wdyt?

Ancestral includes must be disallowed (they make no sense.)

I can’t see a use-case for descendant-includes, so I’d just disallow them.

The two actually-important use-cases (that I use all the time) are

  1. include every sample testcase somewhere in secret
  2. include all of, e.g., secret/group2 in, e.g., secret/group3 (each of which has no subgroups, only testcases)

Those are the two inclusion mechanisms supported by testdata_tools. Everything else is nice-to-have. I suggest to go for “include all leaves below u as children of v provided u and v don’t have an ancestral relation”.

What is important to understand is how this interacts with testdata_yaml (because group_2 may have other input-validator flags than group_3). So the generator needs to remember to re-validate the “copied” testcase also under the testdata_yaml-setting for group_3, rather than merely under the settings that are in place for group_2.

(There is a nag about output validator flags for “different” occurrences of the “same” testcase. But Kattis does not actually implement those anyway, as I learned somewhat painfully during a recent contest. Kattis seems to run every testcase exactly once and reuse the judgement. So we shouldn’t worry about that.)

@RagnarGrootKoerkamp
Copy link
Owner

RagnarGrootKoerkamp commented Jun 21, 2023

Does this really already work? (or am I misreading?)

It does not. (Single files almost-work, subgroup inclusion does not work.)

ok thanks for clarifying :) It now almost works for me locally.

Those are the two inclusion mechanisms supported by testdata_tools. Everything else is nice-to-have. I suggest to go for “include all leaves below u as children of v provided u and v don’t have an ancestral relation”.

Yes, these will work for sure. I'm rewriting to resolve all includes in code after all (rather than globbing the filesystem), and I think child-inclusions will work after all.

What is important to understand is how this interacts with testdata_yaml (because group_2 may have other input-validator flags than group_3). So the generator needs to remember to re-validate the “copied” testcase also under the testdata_yaml-setting for group_3, rather than merely under the settings that are in place for group_2.

Oh this is a good point I hadn't considered yet. That may imply I have to upgrade included testcases from simple strings to actual 'TestcaseRule' objects. I'll first get things working without this additional validation and then go from there.

(There is a nag about output validator flags for “different” occurrences of the “same” testcase. But Kattis does not actually implement those anyway, as I learned somewhat painfully during a recent contest. Kattis seems to run every testcase exactly once and reuse the judgement. So we shouldn’t worry about that.)

Hmm. So to reword: input validation should always happen for included testcases, but output validation not necessarily, since even though we can validate the provided .ans file / jury submission on the included cases, kattis won't actually rejudge any actual submission with the updated output_validator_flags anyway.

@RagnarGrootKoerkamp
Copy link
Owner

RagnarGrootKoerkamp commented Jun 21, 2023

Ok basic generating should work now. I've converted the isssue into a PR.

Outstanding issues:

  • input-validate included cases with updated flags.
  • Do an up-to-date check for included cases to prevent input-validating them when nothing changed
  • maybe output-validate included cases
  • add logic to not run on included cases. For the default 'pass-or-fail' logic one failing instance is enough, and the modified output-validator-flags isn't supported by kattis anyway as you said. (Maybe we should warn for this though.)
  • I see that you're working on grading in your fork (https://github.com/thorehusfeldt/BAPCtools), so i suppose that will come later.

Here's an extended config for the test problem:

solution: /submissions/accepted/th.py

data:
  sample:
    type: directory
    data:
      "1": stdout.py 1 2
      "2": stdout.py 1 -3
  secret:
    type: directory
    data:
      # ERROR: Forward includes are not possible.
      # forward:
      #   type: directory
      #   include: ["secret/general"]
      small:
        type: directory
        data:
          positive:
            type: directory
            data:
              "sm-all-pos": stdout.py 4 6
              "sm-zero": stdout.py 4 0
            include:
              - "sample/1"
          general:
            type: directory
            data:
              # ERROR: Include of same name as testcase.
              #"2": stdout.py 1 3
              # ERROR: Conflicting includes in secret/general
              #"sm-zero": stdout.py 5 0
              "sm-mixed": stdout.py 4 -6
              "sm-all-neg": stdout.py -4 -6
            include:
              - "sample/2"
        #include:
        # ERROR: Including child is not allowed
        #- "secret/small/positive"
        # ERROR: Including parent is not allowed
        #- "secret"
      general:
        type: directory
        data:
          "lg": stdout.py 83413870413975664 -11
          #"sm-all-pos": stdout.py 4 7
        include:
          - "secret/small"

@RagnarGrootKoerkamp RagnarGrootKoerkamp changed the title generator with include can only run once Support generating included testcases Jun 21, 2023
@RagnarGrootKoerkamp
Copy link
Owner

@thorehusfeldt Ok I think this is in pretty good shape now. I've added caching of validators flags so they should only rerun for includes if arguments changed.

Could you test it out? After generating once rerunning it should be pretty much instant provided there are no errors. Then if you modify the flags in a testgroup.yaml that validator should rerun.

One open question is whether I should by default skip running on included cases. Since I (and kattis) don't support having distinct output validator flags, running once should be OK right.

@RagnarGrootKoerkamp
Copy link
Owner

It seems there are some parallellism issues now: we must make sure testcases have finished processing before we can include them elsewhere. I'll probably rewrite things into two passes after all:

  • first pass in parallel to generate testcases
  • second pass sequential to generate includes. (Not in parellel, because includes can also depend on each other. Waiting for sequential validation should be rare and fast anyway.)

@mzuenni
Copy link
Collaborator

mzuenni commented Jun 22, 2023

Generation is already done in two steps (first all listen then all unlisted) a third step won't harm :D

@RagnarGrootKoerkamp
Copy link
Owner

let's hope this fixed it

@thorehusfeldt
Copy link
Collaborator Author

Alas, I have no success bt generate-ing the example problem generatorincludes (which is zipped above.)

It just gets stuck.

generatorincludes git:(pull_262) > bt generate
PROBLEM generatorincludes
^CFATAL ERROR: Running interrupted         [############------------------------------------------------------------------- 4/20]

I’m on 9a0d821, and on a 2017 Mac, Python 3.10.

@RagnarGrootKoerkamp
Copy link
Owner

RagnarGrootKoerkamp commented Jun 22, 2023

Can you run with -j0? Runtime errors usually make it hang when parallelization is on. (Probably it will still crash but at least it will show the error.)

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 22, 2023

(to success using -j0 moved to slack)

In better news, I have rewritten https://dpop23.kattis.com/contests/dpop-23-1/problems/dpop23.storebededag to use generators.yaml (300+ cases in 2 testgroups), and it works as expected I think.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 22, 2023

An interesting issue is the following:

If, in the running example, we use numbered testcasenames in secret/small/positive

            data:
              - "sm-all-pos": stdout.py 4 6
              - "sm-zero": stdout.py 4 0

what is then the proper value in the later include in secret/small/generate:

            include:
              - "sample/2"
              - "secret/small/positive/sm-zero"

or

            include:
              - "sample/2"
              - "secret/small/positive/2-sm-zero"

Currently, the latter works, and the former predictably throws

  File "/Users/thore/Developer/BAPCtools/bin/util.py", line 467, in read_yaml
    assert path.is_file()
AssertionError

I don’t pretend to have thought this through, but I think I prefer sm-zero (otherwise the yaml-file author needs to count and possibly even rename sm-zero when the other testgroup changes.) Allowing both would also be option.

@RagnarGrootKoerkamp
Copy link
Owner

RagnarGrootKoerkamp commented Jun 23, 2023

I think the commit above has the fixes you also did.

Re. numbered testcases:

  • include: references should never include numbers, since they are meant to be unstable.
  • Numbered cases can (and at least for me often do) have identical names, and so they can have non-unique names when referring to them.
  • Anything included into a numbered directory should be renumbered. Probably by first numbering included cases from 1, and then continuing with generated cases.
  • But renumbering cases introduces non-unique names for them, which sounds problematic given that kattis only runs once for each unique name. (Or do they dedup by symlink target instead of name?) Either way having non-unique names (including number) doesn't sound ideal to me either.

So for now the simplest solution to me sounds like forbidding using include: inside numbered directories, and also forbidding include: to point to numbered directories.
Since the convention is already that all testcases should have unique names, this doesn't sound like a big drawback. (One benefit of numbered directories is that they allow being lazy and reusing names, but you already shouldn't do that it seems.)

A completely different approach would be to have a global numbering instead of per-directory numbering. Then, we could include cases and reuse their existing number. Still the issue remains that testcases may have non-unique names.

TL;DR: Unless you would really like to use it with numbering, let's only allow include: in named (non-numbered) directories and make explicit errors when used in combination with numbering.

@thorehusfeldt
Copy link
Collaborator Author

thorehusfeldt commented Jun 24, 2023

Since the convention is already that all testcases should have unique names

I agree with that observation and also like the convention very much, and would be very happy to move the specification in that direction more explicitly.

Before going there I just want to make sure we aren’t talking past each other about what “unique names” means: “name” here means that the name of a testcase whose input is at secret/hard/foo.in is foo. And unique means “appears exactly once as a testcase name in the problem” (as opposed to “the testgroup”). Right?

@thorehusfeldt
Copy link
Collaborator Author

generators.yaml:

accept_score: 25

should be

accept_score: "25"

(The joys of actually validating against a schema. 25 looks like a number to yaml. But the spec requires string here.)

@RagnarGrootKoerkamp
Copy link
Owner

Oh ok, sure. (I assume the testdata.yaml spec is somewhat well established, and changing it to a number is too late now? And this was not simply an oversight?)

@RagnarGrootKoerkamp
Copy link
Owner

ok I think it's time to merge this; some other changes depend on it, and it seems to work so far

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants