Feature step dependencies are a pain to work with #2
Comments
I guess I agree. How do we parse YAML in R?
Ok, sorry :)
No worries :)
Hi Lars, I agree that the current format of the feature groups is an issue. Furthermore, I don't like YAML so much. Cheers,
Ok, that the example is wrong would have been much clearer in the new format :) I don't see how editing YAML is more painful than editing a non-standard format.
In the end, I can live with YAML.
Again, I don't see how using two different standard formats is worse than using a standard format and a non-standard one. In principle I don't have a problem with using YAML for everything.
What would one of the other files look like in YAML?
Hmm, I guess something like
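A hypothetical sketch of what a runs-style data file could look like in YAML (all field and value names here are illustrative, not taken from the spec):

```yaml
# Hypothetical YAML rendering of an algorithm_runs-style file.
- instance_id: inst_0001
  repetition: 1
  algorithm: solver_a
  runtime: 12.7
  runstatus: ok
- instance_id: inst_0002
  repetition: 1
  algorithm: solver_a
  runtime: 3600.0
  runstatus: timeout
```

Every record repeats every key, which is why such a rendering would be considerably larger than the equivalent ARFF.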
I don't really see a problem with user friendliness -- you're not supposed to edit/write those files manually.
Such a format would blow up our files by more than a factor of 2, I guess. The description.txt is a file I always write manually.
You can forget ARFF for such files immediately.
Yes, everything would be much larger. But as I said, I'm not opposed to keeping everything but
OK. I also asked Matthias whether he likes this new format, and he agreed. Cheers,
Ok, what's your feeling on making the lists proper YAML lists as well? I.e. instead of comma-separated they would be
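Purely for illustration, the two list styles under discussion might look like this (the step and feature names are made up):

```yaml
# Comma-separated values (compact, one line per step):
feature_steps:
  Pre: f1, f2, f3

# Proper YAML list (one line per feature):
feature_steps:
  Pre:
    - f1
    - f2
    - f3
```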
I like the comma-separated form more, since I can look up the feature step corresponding to a feature by looking one line above (and not n lines above).
However, we should change the entire file.
Ok, but presumably you're not going to parse the YAML yourself but use a library? And yes, that would apply to everything -- if the data structure is serialized by a YAML library, we may not even be able to control which type of list we get (and don't need to care). So I guess my real question is whether you're planning to use a library to parse/write the YAML.
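As a sketch of the library route in Python, assuming PyYAML is installed (the keys below are illustrative, not the final spec):

```python
import yaml  # PyYAML; third-party, installed via `pip install pyyaml`

# Hypothetical description.txt fragment in the proposed YAML format.
text = """
scenario_id: EXAMPLE
performance_measures: [runtime]
feature_steps:
  Pre:
    provides: [f1, f2]
"""

# One library call replaces the entire custom parser.
description = yaml.safe_load(text)

print(description["scenario_id"])                       # EXAMPLE
print(description["feature_steps"]["Pre"]["provides"])  # ['f1', 'f2']
```

The same `yaml.safe_dump` round trip would cover writing, so neither the R nor the Python side needs format-specific code.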
Parsing: of course. But I would prefer it if people could still write (smaller) files manually, without programming. Can we do that?
I often have a look into the description.txt files to get a better feeling for the scenarios, e.g., which algorithms are used, how many features are used, and how the features are distributed over the feature groups.
Well, that argument I find slightly strange -- why not use the EDA overview?
Of course you can still read/write the files manually, and that shouldn't even be much more difficult than it is now. But it would be much easier to parse/write programmatically because we can just use YAML libraries.
I mean, we invested lots of time writing scripts for exactly that purpose... web based...
Which, come to think of it, we should rerun to update the web pages at some point.
Proposal: use Travis for that. People open PRs for a new scenario, then Travis builds all the EDA stuff. This even checks the validity of the scenario files. Only then do we merge. The only thing we then have to run manually might be the selector benchmarks.
+1
Ok, so you think that
is harder to read than
Yes, but in the end, I don't feel strongly about this.
Ok, I've updated the spec, converted all the scenarios, and updated the R code. @mlindauer Could you please update the Python code/checker?
I'm on vacation for the next two weeks. I will do it afterwards.
Ok, thanks. No rush :)
It just occurred to me that we should also have a look at the feature_runstatus.arff files for instances that are presolved. The spec doesn't say what should happen to dependent feature steps in this case, and the data is inconsistent. For example, for ASP, feature steps that depend on one that presolved seem to be listed as presolved as well, but the costs aren't given, implying that they weren't actually run. For the SAT data sets, the runstatus of feature steps that depend on one that presolved is listed as unknown (which probably makes more sense in this case).
Hi, I started to implement the new description.txt parser and found an issue. So, the format according to YAML should be: The same issue holds for "maximize" and "performance_type".
The same issue applies to feature_step->"requires" in some scenarios.
In SAT11-HAND it is not OK:
I updated the checker tool (and flexfolio).
Thanks, good catch. Could you fix the files please?
Hi, I fixed it. All scenarios in the master branch are now compatible with the checker tool again. However, I found another issue.
Parsing this file (at least with Python) will give you a dictionary without a defined order of the feature steps. So we either have to change "feature_steps" to a list (which would look unintuitive and ugly imho), or we add another list, such as "feature_step_order". Cheers,
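A sketch of the second option in Python. At the time, dictionaries parsed from YAML carried no ordering guarantee, so an explicit list pins the order; the `feature_step_order` key is the proposal from this comment, not part of the spec:

```python
# Simulated result of parsing the proposed description.txt with a YAML
# library: a mapping of feature steps plus an explicit ordering list.
description = {
    "feature_steps": {
        "Basic": {"provides": ["f1", "f2"]},
        "Pre": {"provides": []},
        "Probing": {"requires": ["Pre"], "provides": ["f3"]},
    },
    # Proposed extra key: the order in which the steps were actually run.
    "feature_step_order": ["Pre", "Basic", "Probing"],
}

# Iterate in the recorded order, regardless of how the mapping is stored.
ordered_steps = [
    (name, description["feature_steps"][name])
    for name in description["feature_step_order"]
]
print([name for name, _ in ordered_steps])  # ['Pre', 'Basic', 'Probing']
```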
Just remind me, what is the order needed for? You can derive any ordering constraints from the provides/requires, right?
If I remember correctly, the problem was the presolved feature steps.
Ok, so let's have a feature status "not computed because instance presolved by previous feature step". We don't need to know what that feature step was, do we?
OK, I agree that we should have something like "not computed because instance presolved by previous feature step".
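The agreed-upon semantics could be sketched roughly like this; the status strings and the helper function are hypothetical, not part of any ASlib tool:

```python
def propagate_presolved(steps, runstatus):
    """Mark every feature step that (transitively) requires a presolving
    step with a dedicated status, instead of leaving it inconsistent.
    `steps` maps step name -> {"requires": [...]}; `runstatus` maps
    step name -> status string for steps that actually ran."""
    changed = True
    while changed:  # repeat until transitive dependents are covered
        changed = False
        for name, spec in steps.items():
            if runstatus.get(name) is not None:
                continue  # status already known
            deps = spec.get("requires", [])
            if any(runstatus.get(d) in ("presolved", "presolved_by_previous_step")
                   for d in deps):
                runstatus[name] = "presolved_by_previous_step"
                changed = True
    return runstatus

status = propagate_presolved(
    {"Pre": {}, "Probing": {"requires": ["Pre"]}},
    {"Pre": "presolved"},
)
print(status["Probing"])  # presolved_by_previous_step
```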
Should the order of the feature steps used when generating the data for the scenarios be part of the metadata? |
Yes?
Ok, then let's do that.
The way feature steps are currently implicitly encoded is a pain. First, you have to read the spec very carefully to understand the semantics (which are the opposite of what at least I would intuitively expect), and modifying the features/feature steps (e.g. for feature filtering) is a complex and error-prone operation.
In particular, to remove a feature step, you have to check all the other feature steps for features that are also provided by the removed step, and if there are any, remove those as well.
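The removal procedure described above can be sketched as follows; the dict layout and step names are made up for illustration:

```python
def remove_feature_step(steps, victim):
    """Remove `victim` and, per the current implicit semantics, also drop
    from every other step the features that `victim` provided.
    `steps` maps step name -> list of feature names."""
    provided = set(steps.pop(victim))
    for name, feats in steps.items():
        # Any feature also provided by the removed step must go too.
        steps[name] = [f for f in feats if f not in provided]
    return steps

steps = {"Pre": ["f1", "f2"], "Probing": ["f2", "f3"]}
remove_feature_step(steps, "Pre")
print(steps)  # {'Probing': ['f3']}
```

Even in this toy form, the cross-step scan is easy to get wrong, which is the point of the explicit encoding proposed below.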
Another, albeit minor, niggle is that the format of description.txt is unnecessarily hard to parse and write because the key-value convention is broken for the feature steps (the key is not a primitive value but constructed from other things).
I propose two changes. First, use YAML for description.txt, which will introduce only minor changes but allow us to use off-the-shelf libraries for parsing and writing rather than having to write custom code. Second, encode feature step dependencies explicitly through `requires` and `provides` keys. Example:
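A hypothetical sketch of such a description.txt fragment (step and feature names other than `Pre` are made up):

```yaml
feature_steps:
  Pre:
    provides: []        # runs preprocessing only; provides no features itself
  Basic:
    requires: [Pre]
    provides: [f1, f2]
  Probing:
    requires: [Pre]
    provides: [f3]
```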
This makes it intuitively clear what `Pre` does and that it doesn't actually provide any features on its own. It also makes the `number_of_feature_steps` attribute redundant, so it could be removed. @mlindauer @berndbischl