Dupe ids and URI issue #10

Closed
janelomax opened this issue Oct 1, 2019 · 19 comments

@janelomax

Hi Allyson - hope you are well, long time no see!

One of our pharma customers is using SWO and spotted a dupe id: SWO_0000075

In OLS they have slightly different URIs:

http://www.ebi.ac.uk/efo/swo/SWO_0000075
http://www.ebi.ac.uk/swo/SWO_0000075

(the http://www.ebi.ac.uk/efo/swo/ URIs don't resolve anywhere)

thanks

Jane

@allysonlister allysonlister self-assigned this Oct 1, 2019
@allysonlister allysonlister added this to the v1.7 milestone Oct 1, 2019
@allysonlister
Owner

Thanks Jane - lovely to hear from you! I've added this to the current milestone for our imminent next release, so this is perfect timing. There are some big changes coming up, mostly in how we import EDAM and the general tidiness of the ontology.

Now, at a very quick count there are 712 classes that have the "http://www.ebi.ac.uk/efo/swo" prefix and 554 that have the standard "http://www.ebi.ac.uk/swo" prefix. I'm not sure why the efo prefix came in... @jamesmalone do you remember any reason from SWO history for these?

I'm happy to reconcile and remove the http://www.ebi.ac.uk/efo/swo prefix and replace with http://www.ebi.ac.uk/swo but as you've pointed out Jane, this will require checking for duplicates and therefore needs to be done carefully.

If James also agrees that we should clean these IRIs, I'll try to do it this week one evening as I'd like to get the 1.7 release out very soon.

Jane - if you or your pharma customers would like a peek at the new SWO, your thoughts would be appreciated. It isn't published as a release, but the current working copy of the pre-release file can always be found at https://github.com/allysonlister/swo/blob/master/dev/ontology/swo-merged.owl

Thanks! Ally :)

@drmarkreuter

Hi Ally, indeed, the URIs http://www.ebi.ac.uk/efo/swo/SWO_nnnnnnn don't resolve. It's frustrating because some of the software URIs are fine (http://www.ebi.ac.uk/swo/SWO_0000015 for Excel 2002), but most are dead. Happy to shoot over a list of the URIs I've tested and their status codes, if that's helpful. Many thanks, Mark.

@allysonlister
Owner

allysonlister commented Oct 1, 2019

I've created a list of efo URIs from SWO (see attached). Happy to have your list, and compare them if that helps? Please note I'm working from the upcoming release file, which is currently available at https://github.com/allysonlister/swo/blob/master/dev/ontology/swo-merged.owl

Here's the list of 661 URIs that begin with http://www.ebi.ac.uk/efo/swo:
efo-swo-1.txt

For each of these I'd need to:

  1. Check for a pre-existing "http://www.ebi.ac.uk/swo" IRI that matches it. If present, I'd need to generate a new SWO_ class number. Otherwise, just update the IRI and retain the class number.
  2. Transfer any annotation/axioms to the "new" class with the proper IRI.
  3. Deprecate the old "efo-swo" IRI, using appropriate replaced-by and owl deprecation flags.
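
The clash check in step 1 can be sketched in Python. This is a minimal sketch, not the script actually used for SWO: it assumes both sets of IRIs are already available as plain strings, and `plan_refactor`, `local_id` and `first_new_number` are hypothetical names introduced only for illustration.

```python
EFO_NS = "http://www.ebi.ac.uk/efo/swo/"
SWO_NS = "http://www.ebi.ac.uk/swo/"


def local_id(iri):
    """Return the trailing identifier of an IRI, e.g. 'SWO_0000075'."""
    return iri.rsplit("/", 1)[-1]


def plan_refactor(efo_iris, swo_iris, first_new_number):
    """Map each efo-swo IRI to its replacement in the standard namespace.

    IDs that do not clash are retained; clashing IDs get a fresh SWO_ number.
    first_new_number is an assumed starting point for newly minted IDs.
    """
    taken = {local_id(iri) for iri in swo_iris}
    efo_ids = {local_id(iri) for iri in efo_iris}
    next_number = first_new_number
    mapping = {}
    for old in sorted(efo_iris):
        ident = local_id(old)
        if ident in taken:
            # Clash: mint the next number not used in either namespace.
            candidate = f"SWO_{next_number:07d}"
            while candidate in taken or candidate in efo_ids:
                next_number += 1
                candidate = f"SWO_{next_number:07d}"
            ident = candidate
        taken.add(ident)
        mapping[old] = SWO_NS + ident
    return mapping
```

The resulting dictionary only records the old-to-new decision; transferring axioms (step 2) and deprecating the old IRIs (step 3) would still need an ontology-aware tool.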

@drmarkreuter

Thanks! Here are the results of my testing this morning.
SWOtesting1_20191001.zip

@drmarkreuter

SWOtest2_20191001.zip
I quickly tested the list of 661 URIs. All 404 :-(
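
A check like this can be reproduced with the Python standard library alone. The sketch below is an assumption about how such a test might be written, not the script behind the attached zip files; `status_code` and its `opener` parameter are hypothetical names, and the opener is injectable so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def status_code(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status for url, or None if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions that carry the code.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout and similar.
        return None
```

Note that `urlopen` follows redirects, so the value returned is the status of the final response, and some servers answer HEAD differently from GET.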

@allysonlister
Owner

That's OK - it makes sense that they're 404 - I expect there was a batch creation of SWO classes some time ago, and the "efo" URI was used instead of the standard one. I'd just like confirmation from James before I make any such large changes to the IRIs. Thanks!

@drmarkreuter

oo, since you're working on SWO, could you add some new terms? URIs for BCBio and ShinyNGS would be good.

This was referenced Oct 1, 2019
@allysonlister
Owner

@drmarkreuter thanks for the suggestion - you'll find I've asked you to provide some basic info on each at #11 and #12 :-)

@allysonlister
Owner

I've spoken with @TheOntologist @jamesmalone and Helen Parkinson. James' thought is that, because SWO was created under the auspices of EFO, some of the IRIs have that structure. However, as they don't resolve, I'm going to go ahead and make the changes. This will be a multi-step process, and the code used to generate the changes together with mappings will be stored in its own directory in the dev/ folder.

Broadly, here's what will happen:

  1. Refactoring of all "efo" IRIs as per the mapping file (formatted for ROBOT rename) at https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/refactor-efo-swo-mappings.csv . This step will transfer all of the axioms / annotations etc to their new IRIs. There are two types of mappings in this csv file. The first 115 mappings will have new SWO IRIs as otherwise the ID used would clash with an existing ID in the "swo" namespace (either active or obsolete). The rest will not clash, and therefore will be able to retain their previous ID.

  2. Adding back all the original IRIs as obsolete terms with appropriate deprecated annotation. The list of 638 obsolete IRIs is in https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/efo-swo-1.txt and the sparql that will be used to insert the annotation is at https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/deprecation-annotation.ru
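
As a rough illustration of the two artefacts described above, the sketch below generates a two-column mapping file and a SPARQL update from an old-to-new IRI dictionary. It is not the content of the files in dev/IriRefactor: the header row, the use of IAO_0100001 ("term replaced by") for the replacement link, and the function names `mappings_csv` and `deprecation_update` are all assumptions made for this example.

```python
import csv
import io

# IAO "term replaced by" annotation property (assumed choice for this sketch).
TERM_REPLACED_BY = "http://purl.obolibrary.org/obo/IAO_0100001"


def mappings_csv(mapping):
    """Render old/new IRI pairs as two-column CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Old IRI", "New IRI"])
    for old, new in sorted(mapping.items()):
        writer.writerow([old, new])
    return buf.getvalue()


def deprecation_update(mapping):
    """Render a SPARQL update that re-adds each old IRI as a deprecated class."""
    lines = ["PREFIX owl: <http://www.w3.org/2002/07/owl#>", "", "INSERT DATA {"]
    for old, new in sorted(mapping.items()):
        lines.append(f"  <{old}> a owl:Class ;")
        lines.append("    owl:deprecated true ;")
        lines.append(f"    <{TERM_REPLACED_BY}> <{new}> .")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The two outputs would then be handed to ROBOT, for instance `robot rename` with its `--mappings` option and `robot query` with its `--update` option; the exact file format ROBOT expects should be checked against its documentation.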

The file for step 1 is ready, and I'm just working on step 2 now. I'll let you know when I've made the change.

@allysonlister
Owner

After Step 1 above, there remained 14 clashes in the SWO namespace:

In all cases, the updated IRI as suggested above was manually added to https://github.com/allysonlister/swo/blob/master/dev/IriRefactor/refactor-efo-swo-mappings.csv and the mappings were re-run. Equally, we sorted out #18 #16 and #17 in this mapping file.

@allysonlister
Owner

This is closed now, but I would really appreciate @janelomax and @drmarkreuter checking the resulting SWO pre-release file. As you requested the IRI change, please can you download https://github.com/allysonlister/swo/blob/master/dev/ontology/swo-merged.owl and see if you are happy with these changes? Many IRIs were modified.

If you notice any issues, please reopen this ticket.

Please note that, as this is still a pre-release, the new IRIs will not be resolvable in OLS etc, as they won't be loaded/indexed by those sites until the actual release happens.

Thanks very much!

@janelomax
Author

I loaded it into Protege and checked a few of the ids we have been discussing and all looks good to me!

@drmarkreuter

Checking now. I'm no ontology expert, but I'll spend some time on this this morning. Thanks for adding those terms!

@allysonlister
Owner

Thanks guys, I appreciate it. It's important to have a few people look at such a big change, as I don't want to miss anything. :-)

@drmarkreuter

Potential issue with SWO_0000199 (GenePix Pro 4.0). robot couldn't convert from owl to obo...
image
However, I can convert to ttl, checking this now...

@drmarkreuter

I don't see any issues...??
image

@allysonlister
Owner

Thanks @drmarkreuter - I'm not familiar with OBO conversions, but the error message seems to indicate that there is more than one "name" in a class. If "name" in OBO corresponds to rdfs:labels, then I do think that there may be a few cases in SWO where more than one label has crept in over time. However, http://www.ebi.ac.uk/swo/SWO_0000199 is not one of these; it has a single label. Could it be that the obo converter is seeing the old obsolete IRI http://www.ebi.ac.uk/efo/swo/SWO_0000199 as "the same class"?
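
One way to probe that hypothesis is to list every local ID that appears under more than one namespace. The sketch below assumes the class IRIs have already been extracted into a list of strings, and `shared_local_ids` is a hypothetical helper name. Whether the OBO converter really collapses such IRIs into one term is not confirmed here.

```python
from collections import defaultdict


def shared_local_ids(iris):
    """Group IRIs by trailing identifier and keep IDs seen under several IRIs."""
    by_id = defaultdict(set)
    for iri in iris:
        by_id[iri.rsplit("/", 1)[-1]].add(iri)
    return {ident: sorted(group) for ident, group in by_id.items() if len(group) > 1}
```

After the refactoring, every obsoleted efo IRI whose ID was retained would be expected to show up in this report alongside its replacement, so a hit on SWO_0000199 alone would not explain why only that class fails.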

In any case, the output you're looking at after a ttl conversion looks correct. If you can figure out exactly what the robot converter is upset about in that class, I can look into changing things, but the OWL itself is sound as far as I can tell.

Thanks - I think we're good with the refactoring!
