@cthoyt
Collaborator

@cthoyt cthoyt commented Jun 17, 2022

Closes #1967

What: This PR crawls the git history from this repo to figure out when each ontology metadata file was created, then annotates it into the frontmatter of the ontology metadata. This is applicable to inactive, active, orphaned, and obsolete ontologies. This information is purely for technical purposes, and not meant to be displayed on the website. This PR also adds the corresponding field to the metadata JSON schema for validation purposes.
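
For illustration, here is a minimal sketch of how such a crawl could look. It is not the script from this PR: the function names `parse_first_added` and `git_date_added` are hypothetical, and it assumes that the commit which added a file can be found with `git log --diff-filter=A` and that the earliest author date is the one of interest.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import Optional


def parse_first_added(log_output: str) -> Optional[date]:
    """Return the earliest YYYY-MM-DD date found in `git log` output, if any."""
    dates = [
        date.fromisoformat(line.strip())
        for line in log_output.splitlines()
        if line.strip()
    ]
    return min(dates) if dates else None


def git_date_added(path: Path, repo: Path) -> Optional[date]:
    """Ask git for the commit(s) that added `path`; return the earliest date.

    Hypothetical helper: it must run inside a git checkout of the repository.
    """
    result = subprocess.run(
        ["git", "log", "--diff-filter=A", "--date=short", "--format=%ad", "--", str(path)],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_first_added(result.stdout)
```

Because this sketch does not pass `--follow`, a file that was renamed would likely report the date of the rename rather than the date its original name was first committed.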

Why: To apply stricter standards to new ontologies, it makes sense to have a way to avoid applying them retroactively to old ontologies (which might not be able to update in a timely manner before the new standards are imposed). Therefore, all new OBO Foundry standards can be optionally tagged with the date when they go active, and ontologies added before that date don't necessarily have to conform.
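
The date-based rule described above could be sketched roughly as follows. The function name `standard_applies` and its parameters are hypothetical, and the sketch assumes that an ontology added on the effective date itself counts as new.

```python
from datetime import date
from typing import Optional


def standard_applies(date_added: date, effective: Optional[date]) -> bool:
    """Decide whether a standard is enforced for an ontology.

    A standard without an effective date applies to every ontology.
    Otherwise it applies only to ontologies added on or after that date
    (assumption: the effective date itself is inclusive).
    """
    return effective is None or date_added >= effective
```

With this rule, an ontology added on 2015-07-28 would be exempt from a standard tagged with 2022-06-17, while one added afterwards would have to conform.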

How

Note: for some reason, the script failed to add the metadata for a few ontologies, which I then added manually:

ERROR	AISM added: 'added' is a required property
ERROR	APOLLO_SV added: 'added' is a required property
ERROR	EPIO added: 'added' is a required property
ERROR	FIDEO added: 'added' is a required property
ERROR	NCIT added: 'added' is a required property
ERROR	OMO added: 'added' is a required property
ERROR	OOSTT added: 'added' is a required property
ERROR	XLMOD added: 'added' is a required property

cthoyt added 3 commits June 17, 2022 17:11
@cthoyt cthoyt marked this pull request as ready for review June 17, 2022 15:21
@cthoyt cthoyt added the ontology metadata Issues related to ontology metadata label Jun 17, 2022
Contributor

@matentzn matentzn left a comment

You are absolutely nuts and I love this PR. Big approve.

I will circle it to operations now via email, but I don't see how there should be an issue.

@matentzn
Contributor

Tentative merge date: Friday 24th June.

Contributor

@cmungall cmungall left a comment

I think the ones dated 2015-07-28 are wrong/misleading (as are some dated a month or so after that). That is the date when I created the first version of the registry on GitHub, which is purely a technical detail. Many of the ontologies were part of the first OBO in 2003.

Let's leave blank any field we don't know the answer for. If anyone has the inclination to do the archaeology, they can do it in a separate PR.

ontology/peco.md Outdated
repository: https://github.com/Planteome/plant-experimental-conditions-ontology
preferredPrefix: PECO
depicted_by: http://planteome.org/sites/default/files/garland_logo.PNG
added: 2017-06-05
Contributor

This isn't correct. This was a prefix move from EO. PECO/EO have been in OBO for over a decade.

Collaborator Author

@cthoyt cthoyt Jun 21, 2022

Well, let's say that this is indeed the first date that the peco.md file was committed to this git repository. I think we can save quite a bit of time (and still accomplish the original goal of adding it) if we change the communication about this field rather than the content itself.

@cmungall
Contributor

If anyone wants to do some archaeology, I am attaching @selewis's email with all the ontologies in the original OBO, sent March 2003.

gobo.txt

Or if anyone can figure out how to get CVS archives from sourceforge we could mine that too.

@cthoyt
Collaborator Author

cthoyt commented Jun 21, 2022

> If anyone wants to do some archaeology, I am attaching @selewis's email with all the ontologies in the original OBO, sent March 2003.
>
> gobo.txt
>
> Or if anyone can figure out how to get CVS archives from sourceforge we could mine that too.

I looked at this for 60 seconds and I was properly scared

@matentzn
Contributor

To make it even scarier, you could look at https://web.archive.org/web/*/obofoundry.org

@cthoyt cthoyt changed the title from "Add added field to ontologies" to "Add github_date_added field to ontology metadata" Jun 22, 2022
@cthoyt
Collaborator Author

cthoyt commented Jun 22, 2022

From my understanding, the two requests for changes were to either:

  1. Manually figure out the dates that the pre-GitHub OBO Foundry ontologies were added/accepted into OBO as well as resolve all file renames to the added date of their original names
  2. Leave the added field either blank or some sentinel value that isn't a date for ontologies added on 2015-07-28 (the date of creation of this GitHub repo)

The problem with Option 1 is that the file Chris attached in #1969 (comment) is very difficult to read through, and after trying unsuccessfully to download the OBO SourceForge repository with SVN, I determined that this option was too much effort. The problem with Option 2 is that making this field optional means there is no way to test its integrity: new ontologies would simply omit the field, and we would be no further along than before.

What I explicitly want to capture is the date on which the file was created on GitHub. This is a good enough proxy for date added to OBO Foundry for the purposes of enabling us to apply potential new metadata standards based on the date the ontology is added, i.e., exert stricter standards on new ontologies while not forcing old ones to make updates.

Rather than making a technical solution based on options 1 and 2, I opted to update the way this improvement is communicated. In 9ee8179, I renamed the field added to github_date_added and further updated the entry in the metadata schema to explain that this isn't the same thing as date added to the OBO Foundry. Therefore, this field explicitly reflects when the file was added to GitHub, and not necessarily the date added to OBO Foundry. This is good enough for what I want to be able to accomplish, and is canonically correct.
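
To make the integrity argument concrete, a check on a single metadata record might look like the sketch below. It does not reproduce the actual metadata JSON schema, which is not shown in this thread; the helper name `check_date_field` is hypothetical, and the sketch assumes the frontmatter has already been parsed into a mapping in which the value may be either a string or a date object.

```python
from datetime import date
from typing import Any, List, Mapping

FIELD = "github_date_added"


def check_date_field(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems with the date field (empty if it is fine)."""
    value = record.get(FIELD)
    if value is None:
        # Requiring the field is what makes it testable for every ontology.
        return [f"'{FIELD}' is a required property"]
    if isinstance(value, date):
        # Some YAML parsers already convert date-like values to date objects.
        return []
    try:
        date.fromisoformat(str(value))
    except ValueError:
        return [f"'{FIELD}' is not an ISO 8601 date: {value!r}"]
    return []
```

A required field lets a check like this fail loudly for any record that omits it, which an optional field could not do.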

Nico mentioned that this could be calculated on-the-fly using the same git command that I encoded in https://github.com/cthoyt/OBOFoundry.github.io/blob/7fdface2c60757ee680f63264adb35aaff980df5/util/add_dates.py#L20, but this would only work if the data from the repository is in a git context (e.g., what if we want to consume the data directly, or what if it gets put into a Python package?), so I think that having this explicit is still important.

@cthoyt cthoyt requested review from cmungall and matentzn June 22, 2022 09:21
Contributor

@matentzn matentzn left a comment

I am fine with this, but let's make sure we get at least one or two others to chime in.

Remember all: this information is purely for technical purposes, not for display on the website.

@hoganwr
Contributor

hoganwr commented Jun 22, 2022 via email

@ddooley
Contributor

ddooley commented Jul 12, 2022

Damion: Being part of OBO involves some work over the years. I like an annotation that indicates when an ontology was added to the OBO Library.
Use an empty value OR a bogus start date, e.g. 1000-01-01 (whichever is best for development). Will make an issue vote.

Allowance for ontologies that can't use GitHub (but Charlie's aim: when was the metadata record added to the OBO Library).

@matentzn matentzn marked this pull request as draft August 19, 2022 10:06
@matentzn
Contributor

Want to salvage anything from this PR @cthoyt ?

@cthoyt
Collaborator Author

cthoyt commented Nov 15, 2022

> Want to salvage anything from this PR @cthoyt ?

Despite getting #2146 merged, the only way forward is to have a fully complete field across all ontologies that says when they were added, so that we can progressively add stricter standards for newer ontologies. So the idea in this PR isn't done yet.

@cthoyt
Collaborator Author

cthoyt commented Jan 29, 2023

As of #2277, there's a simpler way of adding new checks that only apply to new ontologies going forward, so I'm abandoning this PR.

@cthoyt cthoyt closed this Jan 29, 2023
@cthoyt cthoyt deleted the add-dates branch January 29, 2023 21:53
Labels

ontology metadata Issues related to ontology metadata

Development

Successfully merging this pull request may close these issues.

Add "date added" field to ontologies

5 participants