
submission: invalidate a single YAML data table upload #34

Closed
GraemeWatt opened this issue May 5, 2021 · 4 comments
@GraemeWatt
Member

An Uploader can currently upload a single YAML data table consisting only of independent_variables and dependent_variables. The HEPData code interprets this as a single YAML file and calls split_files to write a submission.yaml file containing a default comment (No description provided.) and no data tables. The directory containing the submission.yaml file is then deleted by cleanup_old_files(hepsubmission), since no data tables have been processed. The submission still passes validation and the user is sent an email saying the upload was successful, but the record just shows the upload dialogue because no tables have been processed.

The split_files function should be more careful in checking that a YAML document containing independent_variables and dependent_variables also contains the required metadata fields (name, description, keywords), otherwise the upload should be invalidated with a suitable error message returned. The submission documentation on single YAML files should also be clarified.
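The check described above could look something like the following sketch. The helper name and the exact field set are assumptions for illustration, not the actual HEPData code:

```python
# Hypothetical sketch: a YAML document that carries data arrays but
# none of the required table metadata should invalidate the upload
# rather than silently producing an empty submission.

REQUIRED_METADATA = {"name", "description", "keywords"}

def looks_like_bare_data_table(document):
    """True if the document has data arrays but is missing any of the
    required metadata fields (name, description, keywords)."""
    if not isinstance(document, dict):
        return False
    has_data = ("independent_variables" in document
                and "dependent_variables" in document)
    return has_data and not REQUIRED_METADATA <= set(document)

# A bare data table should be flagged:
bare = {"independent_variables": [], "dependent_variables": []}
print(looks_like_bare_data_table(bare))  # True
```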

@alisonrclarke alisonrclarke self-assigned this Sep 15, 2021
@alisonrclarke alisonrclarke transferred this issue from HEPData/hepdata Sep 15, 2021
@alisonrclarke
Contributor

Transferred this issue to hepdata-validator, as the new full-submission validator means it can be fixed there.

@alisonrclarke
Contributor

It looks like such files are being allowed because they validate against the additional_info_schema, which has no required properties and allows additional properties.

I've added a check that at least one item in a submission file validates against the submission_schema, which fixes this issue. However, is it worth checking whether we can make the additional_info_schema stricter?
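The added check might be sketched as follows. This is a hand-rolled stand-in for the real jsonschema validation, and the names are assumptions:

```python
# Hypothetical sketch of the added check: accept a submission stream
# only if at least one YAML document satisfies the table metadata
# required by submission_schema. (The real validator uses jsonschema;
# this stand-in only inspects required keys.)

TABLE_REQUIRED = {"name", "description", "keywords"}

def matches_submission_schema(document):
    """Crude stand-in for validating against submission_schema."""
    return isinstance(document, dict) and TABLE_REQUIRED <= set(document)

def validate_submission_documents(documents):
    """Return (ok, message): reject uploads with no valid table entry."""
    if not any(matches_submission_schema(d) for d in documents):
        return False, ("No tables found in submission: each table must "
                       "include 'name', 'description' and 'keywords'.")
    return True, "OK"

docs = [{"comment": "No description provided."}]  # additional_info only
print(validate_submission_documents(docs)[0])  # False
```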

@GraemeWatt
Member Author

Good point. Most of the properties in additional_info_schema.json were used for migration from the old hepdata.cedar.ac.uk site and should not be present for new submissions. New submissions should only contain comment and additional_resources, but neither is required. Conversion from the oldhepdata format also writes record_ids, although they are not used anywhere. We should also allow an optional hepdata_doi similar to the table_doi in submission_schema.json so that YAML downloads from hepdata.net can be uploaded and pass validation. So I think we could just keep these four properties (comment, additional_resources, record_ids, hepdata_doi) and set "additionalProperties": false, but we'd need to check whether this breaks anything and the test data might need to be updated.
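A stricter additional_info_schema along those lines might look like the fragment below. This is only a sketch, not the actual schema file; the sub-schemas for additional_resources and record_ids are elided here:

```json
{
  "type": "object",
  "properties": {
    "comment": {"type": "string"},
    "additional_resources": {"type": "array"},
    "record_ids": {"type": "array"},
    "hepdata_doi": {"type": "string"}
  },
  "additionalProperties": false
}
```

With "additionalProperties": false, any legacy migration-era property would cause validation to fail, which is why the existing test data might need updating.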

@GraemeWatt
Member Author

I've added a temporary commit to PR HEPData/hepdata#392 to install the new validator branch so that we can check the tests pass on the main repo before releasing v0.3.0 of the validator. Some tests are failing. Can you please take a look? For example, tests/search_test.py::test_reindex_all imports a record with 0 data tables, so I guess we need to allow for this case, i.e. don't require at least 1 data table if validating against the old schema.
