License file and datacite value validation #55

Merged
merged 15 commits into from
Sep 14, 2020

@achilleas-k achilleas-k commented Sep 10, 2020

Validation

License

  • Repositories require a LICENSE file to be able to initiate a request.
  • If the name of the license in the datacite.yml is known, the full text of the LICENSE file is compared against the known licenses in https://gin.g-node.org/G-Node/Info/src/master/licenses.
    • This won't work in many cases. License names in the datacite.yml can appear in many forms (full name, abbreviated name like CC-BY...) and some won't match. If the name in the datacite.yml doesn't match any of the known license files, the request is allowed.
    • The most common licenses used to publish on GIN are caught, so this should be able to catch some license mismatches early (which has been the most common issue lately).
  • I've edited the licensing page on the Wiki to help users get the full text when needed, in case they didn't add it during the repository creation phase: https://gin.g-node.org/G-Node/Info/wiki/Licensing
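
The license check described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `knownLicense` type, the function names, and the whitespace and case normalisation are all assumptions made for the example.

```go
package main

import "strings"

// knownLicense pairs the names a license may appear under with its full text.
// Hypothetical type: it stands in for the license files kept in G-Node/Info.
type knownLicense struct {
	Names []string // accepted spellings, e.g. full name and abbreviation
	Text  string   // full text of the license
}

// normalise lowercases the input and collapses all whitespace runs, so that
// line wrapping and capitalisation do not cause false mismatches.
// (Assumption: the real comparison may be stricter or looser than this.)
func normalise(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// findLicense returns the known license listed under the given name, if any.
func findLicense(name string, known []knownLicense) (knownLicense, bool) {
	for _, lic := range known {
		for _, n := range lic.Names {
			if normalise(n) == normalise(name) {
				return lic, true
			}
		}
	}
	return knownLicense{}, false
}

// licenseMatches reports whether the repository's LICENSE text is consistent
// with the license name given in the datacite.yml. Names that are not
// recognised cannot be checked, so they are allowed through.
func licenseMatches(name, repoText string, known []knownLicense) bool {
	lic, ok := findLicense(name, known)
	if !ok {
		return true
	}
	return normalise(lic.Text) == normalise(repoText)
}
```

Allowing unrecognised names through keeps the check from blocking valid requests, at the cost of missing mismatches for licenses outside the known set.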

Restricted value selection

The datacite.yml keys ResourceType and reference RefType can only take a small number of values. These are listed in the instructional comments of the datacite.yml template. With this PR, the values are also checked during preparation and submission and if they're not valid, the allowed values are shown to the user.
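
A check of this kind might look like the sketch below. The function name `checkRestricted` and the error wording are hypothetical, and the value lists are illustrative only; the authoritative lists are the ones in the datacite.yml template comments. The comparison is case insensitive, as noted in the commit messages.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedValues maps each restricted key to the values it accepts.
// Illustrative values only: check the datacite.yml template for the real lists.
var allowedValues = map[string][]string{
	"ResourceType": {"Dataset", "Software", "DataPaper", "Image", "Text"},
	"RefType":      {"IsSupplementTo", "IsDescribedBy", "IsReferencedBy", "IsVariantFormOf"},
}

// checkRestricted returns nil if the key is unrestricted or the value is one
// of the allowed values (compared case insensitively). Otherwise it returns
// an error that lists the allowed values so they can be shown to the user.
func checkRestricted(key, value string) error {
	allowed, ok := allowedValues[key]
	if !ok {
		return nil // key has no restricted set of values
	}
	for _, a := range allowed {
		if strings.EqualFold(a, value) {
			return nil
		}
	}
	return fmt.Errorf("%s must be one of the following: %s (found %q)",
		key, strings.Join(allowed, ", "), value)
}
```

Because the error already carries the list of allowed values, the same message can be used on both the preparation page and the submission page.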

Code quality

This PR also includes a lot of reshuffling of code. I've moved functions around to different files to better separate them by purpose. They're all still in the same package though.

Some validation flow that was duplicated in both the preparation and submission phases has been converted to a function that's called in both cases.
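
As a rough sketch of that shape (not the actual function from this PR), a shared helper can log the detailed cause itself and return only a short message for display. The `repoInfo` type, the `readAndValidate` name, and the injected `read` and `parse` functions are assumptions made to keep the example self-contained.

```go
package main

import (
	"errors"
	"log"
	"strings"
)

// repoInfo is a hypothetical, heavily reduced stand-in for the parsed datacite.yml.
type repoInfo struct {
	Title   string
	License string
}

// missingFields lists the required fields that are empty.
func missingFields(info *repoInfo) []string {
	var missing []string
	if info.Title == "" {
		missing = append(missing, "title")
	}
	if info.License == "" {
		missing = append(missing, "license")
	}
	return missing
}

// readAndValidate is meant to be called from both the preparation and the
// submission handlers. Detailed errors are logged here; the returned error
// carries only a message that is suitable for showing on the page.
func readAndValidate(read func() ([]byte, error), parse func([]byte) (*repoInfo, error)) (*repoInfo, error) {
	data, err := read()
	if err != nil {
		log.Printf("failed to read datacite.yml: %v", err)
		return nil, errors.New("could not read the datacite.yml file")
	}
	info, err := parse(data)
	if err != nil {
		log.Printf("failed to parse datacite.yml: %v", err)
		return nil, errors.New("the datacite.yml file is not valid")
	}
	if missing := missingFields(info); len(missing) > 0 {
		log.Printf("datacite.yml is missing fields: %v", missing)
		return nil, errors.New("required fields are missing: " + strings.Join(missing, ", "))
	}
	return info, nil
}
```

Each handler then only needs to call the helper and render the returned message, which removes the duplicated logging and error handling.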

The project should be much easier to navigate in general now.

Fail with error both at the preparation and the request stage. Link to new help page with information about licensing and links to license documents.
Takes care of URL path escaping.
This won't work for most variations, but it's a start.
Check if keys with limited choice of values contain a valid value.
Order functions based on request flow. Makes the file easier to navigate... sort of.
The datacite.yml is read, parsed/unmarshalled and validated at two points in the registration flow: during request preparation and after the request is submitted. A lot of code was duplicated for these two parts to manage logging and error reporting separately, but it's much cleaner and easier to work with if we have a single function that manages both.

The user-friendly error message is returned from the new function and can be displayed on the page immediately. More detailed error logging happens inside the function.
Comparison is still made case insensitively.
@achilleas-k achilleas-k linked an issue Sep 10, 2020 that may be closed by this pull request

@mpsonntag mpsonntag left a comment

Very nice PR

@mpsonntag mpsonntag merged commit edc4e44 into G-Node:master Sep 14, 2020
@achilleas-k achilleas-k deleted the more-validation branch September 14, 2020 12:04
Successfully merging this pull request may close these issues.

Try to validate license matching