EWG action: review auto-validation for Principle #4 Versioning #2223

Open
nataled opened this issue Dec 6, 2022 · 9 comments
Assignees
Labels
attn: Editorial WG (Issues pertinent to editorial activities, such as ontology reviews and principles)
attn: Technical WG (Issues pertinent to technical activities, such as maintenance of website, PURLs, and tools)
automated validation of principles (Issues for the editorial WG pertinent to automating the validation of the Principles)

Comments

@nataled
Contributor

nataled commented Dec 6, 2022

No description provided.

@nataled nataled added the attn: Editorial WG Issues pertinent to editorial activities, such as ontology reviews and principles label Dec 6, 2022
@nataled nataled self-assigned this Dec 6, 2022
@cstoeckert
Contributor

P4 - automated check needs to account for both methods for version IRIs. Also needs to do at least some spot checks for resolvability.

Req: (1) Unique version IRI that (2) resolves to correct file and (3) has format -date- or -semantic-. If date, must be yyyy-mm-dd format.
Test: (1) Stated version IRI (as opposed to unique) but (2) no resolve test done and (3) only allows date format to fully pass.
Changes needed: Is version IRI unique? (probably this can be tested just by round-tripping back to downloaded file, which will also test #2). Must allow semantic type of format (but note that there’s no consistency to how this is done in the current set of ontologies; this basically means we allow everything!).
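
As a rough illustration of a format check that accepts both methods, here is a minimal sketch. It is not the dashboard's actual code. It assumes the version is the second-to-last path segment of the version IRI, and it assumes "semantic" means digits separated by periods, which the principle does not spell out. The function name and the example.org IRIs are made up for the example.

```python
import re
from datetime import date

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
# Assumption: a semantic version is digits separated by periods, nothing else.
SEMANTIC_RE = re.compile(r"^\d+(\.\d+)*$")


def classify_version_iri(iri: str) -> str:
    """Return 'date', 'semantic', or 'unrecognized' for a version IRI.

    Assumes the layout .../<version>/<file>, so the version is the
    second-to-last path segment.
    """
    parts = iri.rstrip("/").split("/")
    if len(parts) < 2:
        return "unrecognized"
    version = parts[-2]
    if DATE_RE.match(version):
        try:
            # Reject strings that look like dates but are not real calendar dates.
            date.fromisoformat(version)
        except ValueError:
            return "unrecognized"
        return "date"
    if SEMANTIC_RE.match(version):
        return "semantic"
    return "unrecognized"


if __name__ == "__main__":
    for iri in (
        "http://example.org/obo/ex/2023-06-15/ex.owl",
        "http://example.org/obo/ex/4.1.108/ex.owl",
        "http://example.org/obo/ex/latest/ex.owl",
    ):
        print(iri, classify_version_iri(iri))
```

A pattern this permissive is close to "we allow everything" numeric, so it only catches versions that are neither dates nor dotted numbers.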

@nlharris nlharris added the automated validation of principles Issues for the editorial WG pertinent to the automating the validation of the Principles. label Dec 6, 2022
@nataled
Contributor Author

nataled commented Dec 20, 2022

Related issue: #1016

@nataled
Contributor Author

nataled commented Jan 17, 2023

From Principle:

  1. The released ontology MUST have a version IRI.
  2. Version IRI MUST be unique to the stated version.
  3. The ontology SHOULD have an owl:versionInfo statement
  4. Version IRI must be unique to the stated version.
  5. The version IRI MUST use a date format (NS/YYYY-MM-DD/ontology.owl) OR use a semantic versioning format (e.g., NS/NN.n/ontology.owl).
  6. The version IRI MUST resolve to an ontology artifact that is associated with the same version identifier as used in the version IRI.
  7. Regardless of the versioning system used for the artifact, PURLs pointing to the version MUST be date format.

From Automated check:

  1. The released ontology must have a version IRI. CHECK IS OKAY
  2. The version IRI should follow a dated format (NS/YYYY-MM-DD/ontology.owl) CHECK IS PARTIAL in that there's no allowance for semantic versions (but see below for question).

NEEDED:

  • Many missing parts as indicated above. However, a few implementation issues are unclear. For example, how would one go about using a semantic versioning system for the artifact but use a date system for the PURL? In other words, how would this be stated using OBO and OWL? Also, as written, it isn't super clear that there is a difference between the version IRI and the PURL.
  • In a previous comment it's stated "there’s no consistency to how [semantic versioning] is done in the current set of ontologies". Would be useful to survey these to see if there's something concrete that can be said in terms of format.

@nataled
Contributor Author

nataled commented Jan 17, 2023

EWG discussion:

  • In what way could a version IRI be non-unique? Possibly the stated version is the same as that of a previous release.
  • How could we know if the version IRI wasn't unique? Resolving is a partial check since the resolved file could be correct, but the error is in a different file.

@nataled
Contributor Author

nataled commented Feb 14, 2023

Account of semantic version types, to see if there are any consistent elements usable for automated checks. These were found by reviewing the Dashboard specifically for Warnings about versioning or Fails when the version doesn't resolve. Warnings for ontologies that use a date-based version system but not the recommended syntax are not included below.

chebi 220
gsso 2.0.5
mmo 2.39
mod 1.031.6
ms 4.1.108
pr 67.0
pw 7.52
rs 6.107
spd 1.0
swo 1.7
xco 4.46

In all cases only digits and periods are used. Note that the Maintenance principle can use a diff for this field to determine if changes were made. Unclear at the moment how to calculate the age of the last release without relying on the versionIRI in date format.
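
The "digits and periods only" observation can be checked mechanically against the list above. The sketch below does that, and also shows one possible way to compare two such versions by splitting on periods. Treating each part as an integer (so that "031" equals 31 and "2.39" sorts after "2.5") is an assumption on my part, not something the ontologies document, and `version_key` is a made-up helper name.

```python
import re

# Version strings copied from the survey above.
SURVEYED = {
    "chebi": "220",
    "gsso": "2.0.5",
    "mmo": "2.39",
    "mod": "1.031.6",
    "ms": "4.1.108",
    "pr": "67.0",
    "pw": "7.52",
    "rs": "6.107",
    "spd": "1.0",
    "swo": "1.7",
    "xco": "4.46",
}

DIGITS_AND_PERIODS = re.compile(r"^\d+(?:\.\d+)*$")


def version_key(version: str) -> tuple:
    """Split on periods and compare parts numerically (an assumed ordering)."""
    return tuple(int(part) for part in version.split("."))


# Ontologies whose version string does not fit the observed pattern.
nonconforming = [
    name for name, version in SURVEYED.items()
    if not DIGITS_AND_PERIODS.match(version)
]

if __name__ == "__main__":
    print("nonconforming:", nonconforming)
    print("changed:", version_key("67.0") != version_key("68.0"))
```

This only tells us whether the version changed or which one is larger under the assumed ordering. It does not help with the age of the last release, which still seems to need a date from somewhere.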

@nataled
Contributor Author

nataled commented Jun 6, 2023

To TWG: The above is the result of review of the P4 dashboard check vs principle. General recommendations are given, but we need to create specific issues/tasks with your input.

@nataled nataled added the attn: Technical WG Issues pertinent to technical activities, such as maintenance of website, PURLs, and tools label Jun 6, 2023
@matentzn
Contributor

I am happy to help with this issue, but I am not sure what aspects you would like comments on. Can you create a comment that has a list of bullet points with everything you would like comments on?

Here is one thing I personally think is a tiiiny bit too restrictive:

Regardless of the versioning system used for the artifact, PURLs pointing to the version MUST be date format.

I don't see any real reason for this restriction, and I feel this puts too much of a burden on groups using semantic versioning.

What else do you need input on?

@nataled
Contributor Author

nataled commented Jun 15, 2023

@matentzn agree about the restrictiveness, and this actually is no longer applicable so that's something that was going to change anyway.

@matentzn Below is a list of requirements and recommendations from the Principle; the capitalized note after each item indicates whether or not the dashboard has the proper check:

  • The released ontology MUST have a version IRI. GOOD: AUTOMATED CHECK DOES THIS
  • Version IRI MUST be unique to the stated version. BAD: AUTOMATED CHECK DOES NOT TEST THIS
  • The ontology SHOULD have an owl:versionInfo statement. BAD: AUTOMATED CHECK DOES NOT TEST THIS
  • Version IRI must be unique to the stated version. BAD: AUTOMATED CHECK DOES NOT TEST THIS
  • The version IRI MUST use a date format (NS/YYYY-MM-DD/ontology.owl) OR use a semantic versioning format (e.g., NS/NN.n/ontology.owl). BAD: AUTOMATED CHECK FALSELY REPORTS ERROR FOR SEMANTIC VERSIONING Note that all current semantic versioning makes use solely of numbers and periods, nothing else.
  • The version IRI MUST resolve to an ontology artifact that is associated with the same version identifier as used in the version IRI. BAD: AUTOMATED CHECK DOES NOT TEST THIS
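
To make the first and third items concrete, here is a minimal sketch of reading `owl:versionIRI` and `owl:versionInfo` from an RDF/XML file using only the Python standard library. It is an illustration under assumptions, not how the dashboard works: it assumes RDF/XML with `owl:Ontology` as a direct child of the root, and the sample document and example.org IRIs are invented. A real check would more likely use an OWL or RDF library so that other serializations are covered.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented sample, not a real ontology header.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/obo/ex.owl">
    <owl:versionIRI rdf:resource="http://example.org/obo/ex/68.0/ex.owl"/>
    <owl:versionInfo>68.0</owl:versionInfo>
  </owl:Ontology>
</rdf:RDF>"""


def version_metadata(rdfxml: str) -> dict:
    """Return the declared version IRI and versionInfo, or None for each if absent."""
    root = ET.fromstring(rdfxml)
    ontology = root.find(f"{{{OWL}}}Ontology")
    if ontology is None:
        return {"version_iri": None, "version_info": None}
    iri_el = ontology.find(f"{{{OWL}}}versionIRI")
    info_el = ontology.find(f"{{{OWL}}}versionInfo")
    return {
        "version_iri": iri_el.get(f"{{{RDF}}}resource") if iri_el is not None else None,
        "version_info": info_el.text if info_el is not None else None,
    }


if __name__ == "__main__":
    meta = version_metadata(SAMPLE)
    print("has version IRI (MUST):", meta["version_iri"] is not None)
    print("has versionInfo (SHOULD):", meta["version_info"] is not None)
```

A missing version IRI would map to a failure and a missing `owl:versionInfo` to a warning, following the MUST and SHOULD wording of the Principle.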

Questions:

  • Are the version IRI and the versioned PURL meant to be the same string? For example, for PRO the artifact for the latest release is https://proconsortium.org/download/release_68.0/pro_reasoned.owl but the PURL is http://purl.obolibrary.org/obo/pr/68.0/pr.owl (we state in the OWL file that the PURL is the version IRI; is that correct?).
  • In what way could a version IRI be non-unique? Perhaps it could happen if, for example, the version IRI for the artifact that corresponds to version 2023-06-15 states that the version is 2023-04-23, which is a real already-existing release? How could we know this?

In general, each issue needs its own tracker number and, ideally, at least one suggestion of how to implement the check. Other than the questions, this is what we need help with.

@matentzn
Contributor

Excellent @nataled, thanks for breaking that down for me. Three more comments before I get to your questions:

Version IRI MUST be unique to the stated version. BAD: AUTOMATED CHECK DOES NOT TEST THIS

Can you explain what the phrase "unique to the stated version" means? That there are no two files with the same version IRI? If so, this is not really automatically testable.

The ontology SHOULD have an owl:versionInfo statement. BAD: AUTOMATED CHECK DOES NOT TEST THIS

Definitely request that here: https://github.com/OBOFoundry/OBO-Dashboard/issues and we will explain how we will deal with it!

The version IRI MUST use a date format (NS/YYYY-MM-DD/ontology.owl) OR use a semantic versioning format (e.g., NS/NN.n/ontology.owl). BAD: AUTOMATED CHECK FALSELY REPORTS ERROR FOR SEMANTIC VERSIONING Note that all current semantic versioning makes use solely of numbers and periods, nothing else.

YES definitely request this check at https://github.com/OBOFoundry/OBO-Dashboard/issues and we will deal with it, and explain how it will be implemented.

The version IRI MUST resolve to an ontology artifact that is associated with the same version identifier as used in the version IRI. BAD: AUTOMATED CHECK DOES NOT TEST THIS

This is a great test to have. It's a bit tricky to implement, but doable. It could be a tad expensive though, because it means we need to download every ontology twice. Make an issue and we will discuss it internally.
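
To show what "download every ontology twice" could look like, here is a minimal sketch of the round trip: read the declared version IRI from the file we already have, fetch that IRI, and confirm the fetched file declares the same version IRI. It is only an illustration. The regex assumes RDF/XML with the usual `owl:` and `rdf:` prefixes, the function names are made up, and the fetcher is passed in so the logic can be tried without network access.

```python
import re
import urllib.request
from typing import Callable, Optional

# Assumption: RDF/XML serialization using the conventional owl:/rdf: prefixes.
VERSION_IRI_RE = re.compile(r'<owl:versionIRI\s+rdf:resource="([^"]+)"')


def declared_version_iri(rdfxml: str) -> Optional[str]:
    """Return the version IRI declared in the document, if any."""
    match = VERSION_IRI_RE.search(rdfxml)
    return match.group(1) if match else None


def round_trip_ok(rdfxml: str, fetch: Callable[[str], str]) -> bool:
    """True if the version IRI resolves to a file declaring that same version IRI."""
    declared = declared_version_iri(rdfxml)
    if declared is None:
        return False
    resolved = fetch(declared)  # this is the second download
    return declared_version_iri(resolved) == declared


def http_fetch(url: str) -> str:
    """One possible fetcher; large ontologies would need streaming instead."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read().decode("utf-8")
```

Since the version IRI sits in the ontology header, a cheaper variant might read only the first part of the resolved file rather than the whole thing, though that is a guess about what would be acceptable.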

Are the version IRI and the versioned PURL meant to be the same string? For example, for PRO the artifact for the latest release is https://proconsortium.org/download/release_68.0/pro_reasoned.owl but the PURL is http://purl.obolibrary.org/obo/pr/68.0/pr.owl (we state in the OWL file that the PURL is the version IRI; is that correct?).

They are meant to be the same string. I have heard this confusion often, though; the principle would benefit from a sentence clarifying that the version IRI is supposed to be a versioned PURL. Great question!

In what way could a version IRI be non-unique? Perhaps it could happen if, for example, the version IRI for the artifact that corresponds to version 2023-06-15 states that the version is 2023-04-23, which is a real already-existing release? How could we know this?

I stumbled across this formulation as well. I think this is too difficult to test, at least the way that I understand it. There could be two issues on uniqueness I can think of:

  1. Two or more physical files are associated with the same version IRI (at least one of them will violate the resolution rule, i.e. that their version IRI must resolve to itself)
  2. The same file (URL location) has the same version IRI in different commits, which means it is theoretically possible that it was downloaded at different timepoints looking different (ask me again if you don't understand this, it's important I think)

To guard against both, you can use a formulation like: "the versioned PURL should point to a single file, at a single state - i.e. there should not exist two files, or the same file at different states, with the same versioned PURL" (I can see it's hard to understand this phrase; maybe you can phrase it better).
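
For what it's worth, both cases reduce to the same observable symptom: the same version IRI seen with different file contents. A minimal sketch of detecting that, assuming someone keeps a record of (version IRI, downloaded bytes) pairs over time, which nothing does today as far as this thread says. The function name is made up, and comparing raw bytes is an assumption that would also flag harmless re-serializations.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def conflicting_version_iris(
    observations: Iterable[Tuple[str, bytes]]
) -> Dict[str, Set[str]]:
    """Map each version IRI seen with more than one distinct content digest
    to the set of SHA-256 digests observed for it."""
    digests: Dict[str, Set[str]] = defaultdict(set)
    for version_iri, content in observations:
        digests[version_iri].add(hashlib.sha256(content).hexdigest())
    return {iri: seen for iri, seen in digests.items() if len(seen) > 1}


if __name__ == "__main__":
    observed = [
        ("http://example.org/obo/ex/2023-04-23/ex.owl", b"first download"),
        ("http://example.org/obo/ex/2023-04-23/ex.owl", b"later, different content"),
        ("http://example.org/obo/ex/2023-06-15/ex.owl", b"stable content"),
        ("http://example.org/obo/ex/2023-06-15/ex.owl", b"stable content"),
    ]
    for iri in conflicting_version_iris(observed):
        print("same version IRI, different content:", iri)
```

This can only report conflicts among files that were actually observed, so it is a partial check at best, which matches the point that uniqueness is not fully testable.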
