Put redirects in place for all salmon URIs we used but are changing #120

Open
amoeba opened this issue Nov 19, 2021 · 10 comments
Assignees
Labels
SALMO salmon data Open salmon data mobilization project

Comments

Collaborator

amoeba commented Nov 19, 2021

We're going to make some tweaks to the term URIs in the salmon ontology but we don't want to break any of the URIs we've already used. We decided to throw in redirects for any deprecated terms in order to avoid the mess in the ontology we'd see if we kept the terms in and deprecated them.

All the URIs are managed under the purl.dataone.org namespace so I'll apply a config change there. I'll probably go with a RewriteMap and use a static list to redirect terms.
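A minimal sketch of how the static list for a RewriteMap could be produced, assuming a simple old-to-new lookup. The function name, the example entry, and the Apache lines in the comment are illustrative assumptions, not the actual purl.dataone.org configuration.

```python
def rewrite_map_lines(mapping):
    """Return 'key value' lines, the plain-text layout Apache's RewriteMap reads."""
    return [f"{old} {new}" for old, new in sorted(mapping.items())]


# Hypothetical entry; real targets would come from the finished ontology.
example = {"salmon_000127": "http://purl.dataone.org/odo/salmon_00000127"}
lines = rewrite_map_lines(example)
print("\n".join(lines))

# Assumed (untested) Apache usage; the map name and path are placeholders:
#   RewriteMap salmonmap txt:/path/to/salmon-redirects.txt
#   RewriteRule ^/odo/(salmon_[0-9]+)$ ${salmonmap:$1} [R=301,L]
```

Keeping the list in a separate text file means new redirects can be added without touching the rewrite rules themselves.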

@amoeba amoeba self-assigned this Nov 19, 2021
@amoeba amoeba added SALMO salmon data Open salmon data mobilization project labels Nov 19, 2021
Collaborator Author

amoeba commented Nov 19, 2021

Here's a full list of the URIs we've annotated with to date:

http://purl.dataone.org/odo/salmon_000127
http://purl.dataone.org/odo/salmon_000128
http://purl.dataone.org/odo/salmon_000129
http://purl.dataone.org/odo/salmon_000130
http://purl.dataone.org/odo/salmon_000131
http://purl.dataone.org/odo/salmon_000132
http://purl.dataone.org/odo/salmon_000133
http://purl.dataone.org/odo/salmon_000142
http://purl.dataone.org/odo/salmon_000186
http://purl.dataone.org/odo/salmon_000187
http://purl.dataone.org/odo/salmon_000188
http://purl.dataone.org/odo/salmon_000189
http://purl.dataone.org/odo/salmon_000200
http://purl.dataone.org/odo/salmon_000216
http://purl.dataone.org/odo/salmon_000235
http://purl.dataone.org/odo/salmon_000239
http://purl.dataone.org/odo/salmon_000240
http://purl.dataone.org/odo/salmon_000241
http://purl.dataone.org/odo/salmon_000242
http://purl.dataone.org/odo/salmon_000243
http://purl.dataone.org/odo/salmon_000480
http://purl.dataone.org/odo/salmon_000481
http://purl.dataone.org/odo/salmon_000492
http://purl.dataone.org/odo/salmon_000493
http://purl.dataone.org/odo/salmon_000504
http://purl.dataone.org/odo/salmon_000520
http://purl.dataone.org/odo/salmon_000525
http://purl.dataone.org/odo/salmon_000527
http://purl.dataone.org/odo/salmon_000529
http://purl.dataone.org/odo/salmon_000569
http://purl.dataone.org/odo/salmon_000570
http://purl.dataone.org/odo/salmon_000582
http://purl.dataone.org/odo/salmon_000630
http://purl.dataone.org/odo/salmon_000642
http://purl.dataone.org/odo/salmon_000647
http://purl.dataone.org/odo/salmon_000659
http://purl.dataone.org/odo/salmon_000663
http://purl.dataone.org/odo/salmon_000665
http://purl.dataone.org/odo/salmon_000666
http://purl.dataone.org/odo/salmon_000667
http://purl.dataone.org/odo/salmon_000668
http://purl.dataone.org/odo/salmon_000669
http://purl.dataone.org/odo/salmon_000670
http://purl.dataone.org/odo/salmon_000671
http://purl.dataone.org/odo/salmon_000680
http://purl.dataone.org/odo/salmon_000681
http://purl.dataone.org/odo/salmon_000691
http://purl.dataone.org/odo/salmon_000692
http://purl.dataone.org/odo/salmon_000693
http://purl.dataone.org/odo/salmon_000694
http://purl.dataone.org/odo/salmon_000695
http://purl.dataone.org/odo/salmon_000696
http://purl.dataone.org/odo/salmon_000697
http://purl.dataone.org/odo/salmon_000698
http://purl.dataone.org/odo/salmon_000699
http://purl.dataone.org/odo/salmon_000700
http://purl.dataone.org/odo/salmon_000701
http://purl.dataone.org/odo/salmon_000705
http://purl.dataone.org/odo/salmon_000709
http://purl.dataone.org/odo/salmon_000710
http://purl.dataone.org/odo/salmon_000711
http://purl.dataone.org/odo/salmon_000712
http://purl.dataone.org/odo/salmon_000713
http://purl.dataone.org/odo/salmon_000718
http://purl.dataone.org/odo/salmon_000719
http://purl.dataone.org/odo/salmon_000720
http://purl.dataone.org/odo/salmon_000721
http://purl.dataone.org/odo/salmon_000727
http://purl.dataone.org/odo/salmon_000728
http://purl.dataone.org/odo/salmon_000729
http://purl.dataone.org/odo/salmon_000754
http://purl.dataone.org/odo/salmon_000755
http://purl.dataone.org/odo/salmon_000777
http://purl.dataone.org/odo/salmon_000780
http://purl.dataone.org/odo/salmon_000782
http://purl.dataone.org/odo/salmon_000783
http://purl.dataone.org/odo/salmon_000785

Collaborator Author

amoeba commented Apr 20, 2022

Of the above IRIs, all but four could be easily matched with the corresponding 8-wide variant (salmon_000127 -> salmon_00000127). My process was to compare the labels in all annotations we've issued with the label in the ontology. I'm not done and will figure out what we need to do with the rest of these tomorrow:

I think these two terms got dropped by accident. They aren't in the ontology, but we do have a term labeled "Commercial fishery harvest count".

[Screenshot: Screen Shot 2022-04-19 at 4 01 37 PM]

  • http://purl.dataone.org/odo/salmon_000783 labeled "Subsistence fishery harvest count"
  • http://purl.dataone.org/odo/salmon_000785 labeled "Sport fishery harvest count"
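The label comparison described above could be sketched roughly like this. It is a guess at the approach, not the script actually used. The helper names are hypothetical, and the small dictionaries stand in for the real annotation and ontology data.

```python
import re

BASE = "http://purl.dataone.org/odo/"


def widen(iri):
    """Pad a 6-digit salmon term IRI to the 8-digit form; leave other IRIs alone."""
    return re.sub(r"salmon_(\d{6})$", r"salmon_00\g<1>", iri)


def find_mismatches(annotation_labels, ontology_labels):
    """List old IRIs whose padded form is missing or carries a different label."""
    mismatches = []
    for old_iri, label in annotation_labels.items():
        onto_label = ontology_labels.get(widen(old_iri))
        if onto_label is None or onto_label.lower() != label.lower():
            mismatches.append((old_iri, label, onto_label))
    return mismatches


# Illustrative stand-in data, not the full annotation set.
annotation_labels = {
    BASE + "salmon_000129": "Post-orbit to fork of tail length",
    BASE + "salmon_000783": "Subsistence fishery harvest count",
}
ontology_labels = {
    BASE + "salmon_00000129": "Post-orbit to fork of tail length",
}
mismatches = find_mismatches(annotation_labels, ontology_labels)
print(mismatches)
```

Anything the check returns needs a manual look, which is what the remaining terms in this thread got.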

Collaborator Author

amoeba commented May 5, 2022

  • http://purl.dataone.org/odo/salmon_000713: Every time we annotated with this IRI, the valueLabel was "Age class 2.2 recruits" even though the attribute was named "R2.5" with a definition of "number of age 2.5 recruits", so this was clearly a mixup somewhere in the source spreadsheets. The annotations should point to http://purl.dataone.org/odo/salmon_00000713 (Age class 2.5 recruits), and we should follow up with @jeanetteclark about making updates to the EML. I'm not going to issue a sameAs here.

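The last two issues that remain are the lost terms mentioned above:

  • http://purl.dataone.org/odo/salmon_000783 labeled "Subsistence fishery harvest count"
  • http://purl.dataone.org/odo/salmon_000785 labeled "Sport fishery harvest count"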

I'm going to follow up with @mpsaloha about these.

Collaborator Author

amoeba commented May 5, 2022

@mpsaloha emailed this:

http://purl.dataone.org/odo/salmon_000647 labeled "Fish stock name"
http://purl.dataone.org/odo/salmon_00000674 is the individual for Rainbow trout, so I think the number might have been transposed in the Google Sheet.
Resolution: sameas salmon_000647 to salmon_00000674, leave salmon_00000647 untouched

I wasn't quite clear on the resolution you suggested for the "coastal rainbow trout" issue. GUID for rainbow trout in the "latest" is 00000647, not 00000674 as I think you are suggesting above. However your proposed solution sounds correct.

You also stated:

The last two issues that remain are the lost terms mentioned above,

http://purl.dataone.org/odo/salmon_000783 labeled "Subsistence fishery harvest count"
http://purl.dataone.org/odo/salmon_000785 labeled "Sport fishery harvest count"
I'm going to follow up with @mpsaloha about these.

These may have been added by Sam in that brief period where Sam and I were out-of-sync. I did try to "sync" up with Sam's version when we realized we were both adding items, so I might have missed adding those two entries then.

I think you can safely add these to the Ontology as subclasses under salmon:00000491 "Salmon harvest count", as the 783 and 785 GUIDs are not yet taken. (Obviously tempting to axiomatize these further...)

Thanks for finding these issues!

cheers,
Mark

Collaborator Author

amoeba commented May 6, 2022

Thanks @mpsaloha, my mistake there. Let me try that again...

We annotated with SALMON:000647 in 5 EML docs, and each time the valueLabel was "Fish stock name" and the EML attribute associated with the annotation was about "Fish Stock" or similar. If I pad that IRI to SALMON:00000647, that's the individual for rainbow trout. We have a term in the ontology, "Fish stock name" (SALMON:00000674), so I'm assuming there was a transposition in the spreadsheet. I'm going to sameAs SALMON:000647 with SALMON:00000674.

I'll add those two subclasses. Thanks for the look-over.
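To make the resolution concrete, here is a small sketch of a mapping that pads by default and lets hand-checked exceptions win. The function name and the short input list are assumptions for illustration.

```python
def build_mapping(old_ids, overrides):
    """Map each 6-wide term ID to its 8-wide ID, unless an override says otherwise."""
    mapping = {}
    for old in old_ids:
        number = old.split("_")[1]
        default = f"salmon_{int(number):08d}"
        mapping[old] = overrides.get(old, default)
    return mapping


# The one transposition identified in this thread.
overrides = {"salmon_000647": "salmon_00000674"}
mapping = build_mapping(["salmon_000127", "salmon_000647"], overrides)
print(mapping)
```

Keeping the exceptions in their own dictionary leaves a record of which mappings were decided by hand rather than by padding.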

Collaborator Author

amoeba commented Jun 11, 2022

Okay, so for all of the 77 terms we've already annotated with, I've got a matching term in the ontology. I started in on tweaking the ontology to put in those mappings, ran into some trouble, and could use some advice.

I first tried putting in an owl:equivalentClass to do the mapping and Protégé gives me this:

[Screenshot: Screen Shot 2022-06-10 at 5 25 06 PM]

The first "Post-orbit to fork of tail length" is the term in the ontology and the salmon_000129 is the term we annotated with that I mapped to it. If we do this for the 75 mappings we need to do, we're going to be adding a lot of noise to the ontology, or at least the Protégé view of the ontology. Beyond the noise, another downside is that BioPortal probably won't return a definition for the old term so things like the popovers in MetacatUI wouldn't work as well as they could. I'm not sure if semantic search would work completely, but it would at least work partially.

It makes sense to me that asserting owl:equivalentClass from A to B implies both A and B are classes so we're effectively creating a class with no information about it when we do this.

This got me thinking about maybe just deprecating the old term so I tried the OBO Foundry method for term deprecation and you get this view:

[Screenshot: Screen Shot 2022-06-10 at 5 32 25 PM]

This adds the same amount of noise to the view but it is more clear what's going on because the deprecated term has annotation properties, axioms, etc. However, we did talk about whether we consider this change deprecation and I think the consensus was that it wasn't.

All that said, I don't really like either because we end up with a messy ontology and we haven't even released the first version yet.

There are some alternatives though:

  1. Just put in redirects under purl.dataone.org and be done. Semantic search wouldn't work at all for datasets with the bad annotations, the annotation popovers on dataset landing pages wouldn't have definitions but clicking on the IRIs would redirect to the right spot in BioPortal or wherever.
  2. Skip the mappings and just fix all ~55 datasets that have bad annotations. We'd still put in redirects just in case someone got an old EML doc but the latest versions would match the ontology.
  3. Revert the IRI changes that got us into this mess and have a mix of 6-wide and 8-wide IRIs in the ontology. I kinda like this idea.
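For option 2, rewriting the old IRIs in a document might look something like the sketch below. It treats the EML as plain text and assumes the annotation sits in a `valueURI` element, as in the sample string. The function name and the mapping are hypothetical.

```python
import re

# Matches only 6-wide IRIs; the lookahead keeps 8-wide IRIs from matching.
OLD_IRI = re.compile(r"http://purl\.dataone\.org/odo/(salmon_\d{6})(?!\d)")


def fix_annotations(text, mapping):
    """Replace each known 6-wide IRI in text with its mapped 8-wide IRI."""
    def swap(match):
        new_id = mapping.get(match.group(1))
        if new_id is None:
            return match.group(0)  # unknown term: leave it untouched
        return "http://purl.dataone.org/odo/" + new_id
    return OLD_IRI.sub(swap, text)


# Assumed shape of an annotation value in an EML document.
doc = '<valueURI label="Fish stock name">http://purl.dataone.org/odo/salmon_000647</valueURI>'
fixed = fix_annotations(doc, {"salmon_000647": "salmon_00000674"})
print(fixed)
```

A real update would also need to go through whatever process the repository uses for publishing new dataset versions, which this sketch ignores.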

All are welcome to comment. Either way I'd like us to spend some time on next week's Salmantics call coming to a decision.

amoeba added a commit that referenced this issue Jun 11, 2022
Ref: #120

These got removed at some point so this is just me adding them back using a similar definition and axiomatization.
Collaborator Author

amoeba commented Jul 15, 2022

We talked about this in this week's Salmantics call and came up with a solution which I'll outline below. I need to follow up with @mpsaloha to make sure we're all in agreement.

  1. Use an alignment ontology to align all of our deprecated terms to active terms in the ontology
    a. Add all deprecated terms to the alignment ontology, asserting owl:equivalentClass and owl:sameAs as appropriate, pointing to correct terms
    b. Add labels in the alignment ontology so classes/individuals don't show up as opaque IRIs
    c. Add an rdfs:comment to each term (and perhaps other terms) to indicate it's deprecated
    d. Add owl:deprecated true to each deprecated term
  2. Put in redirects at the purl.dataone.org web server to automatically send users resolving deprecated terms to the correct location
  3. Plan to load both the main ontology and the alignment ontology in the DataONE and Metacat indexing components so query expansion works
  4. Plan to update all affected datasets with the correct IRIs
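As a rough illustration of step 1, one alignment entry could be emitted as Turtle along these lines. This is a sketch under assumptions, not the script referenced later in the thread. The function name, the comment wording, and the example pairing are placeholders.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)
BASE = "http://purl.dataone.org/odo/"


def alignment_entry(old_id, new_id, label, is_individual=False):
    """Build Turtle for one deprecated term aligned to its active term.

    Assumes the label contains no double quotes or backslashes.
    """
    predicate = "owl:sameAs" if is_individual else "owl:equivalentClass"
    rdf_type = "owl:NamedIndividual" if is_individual else "owl:Class"
    return (
        f"<{BASE}{old_id}> a {rdf_type} ;\n"
        f'    rdfs:label "{label}" ;\n'
        f'    rdfs:comment "Deprecated; use {BASE}{new_id} instead." ;\n'
        f"    owl:deprecated true ;\n"
        f"    {predicate} <{BASE}{new_id}> .\n"
    )


# Illustrative pairing of a 6-wide term with its 8-wide counterpart.
turtle = PREFIXES + "\n" + alignment_entry(
    "salmon_000129", "salmon_00000129", "Post-orbit to fork of tail length"
)
print(turtle)
```

Classes get `owl:equivalentClass` and individuals get `owl:sameAs`, which mirrors the "as appropriate" wording in step 1a.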

I'm planning on having most of this done by next Thursday.

amoeba added a commit that referenced this issue Jul 16, 2022
Ref #120

This isn't yet done but it's close.
Collaborator Author

amoeba commented Jul 16, 2022

I made good progress on a script to generate the alignment from the deprecated terms list in 3dd08fc. I'll pick this up next week.

Collaborator Author

amoeba commented Jul 23, 2022

I PR'd a merge to develop of the salmon ontologies in #124 and they're ready for another set of eyes. Items 2-4 above aren't done yet, but I'm working on them next.

Collaborator

mpsaloha commented Oct 11, 2022 via email
