Create and document selection criteria for inclusion in the ontology #7

Open
mellybelly opened this issue Feb 27, 2019 · 9 comments

@mellybelly
Collaborator

We should have a modular strategy for inclusion of ontology content. What we don't want is a random smattering of content from numerous resources as these will not have good semantic interoperability. We want to be able to do analytics on these data, not just link them.

My understanding is that the goal here is purely an application ontology for our discovery tools. As such, we should not need any new content.

Some considerations:
For library-related document types, we should use whatever is standard in the LinkedData4Libraries project or in MARC. @eichmann, please advise on this.

There could be a priority order, such as VIVO-ISF > OBI > NCIT, etc. I would avoid one-off content from domain-specific sources, such as a FlyBase controlled vocabulary.
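A priority order like this could be applied mechanically when one output type has candidate terms in several ontologies. The following is only a minimal sketch: the order is the example given above, not a decision, and the candidate terms and the `pick_preferred` helper are hypothetical.

```python
# Assumed priority order, copied from the example in this comment.
# The real order is still to be decided.
PRIORITY = ["VIVO-ISF", "OBI", "NCIT"]


def pick_preferred(candidates, priority=PRIORITY):
    """Return the candidate from the highest-priority ontology, or None.

    Candidates from ontologies that are not in the priority list
    (for example, a one-off domain-specific vocabulary) are ignored.
    """
    rank = {name: position for position, name in enumerate(priority)}
    ranked = [c for c in candidates if c["ontology"] in rank]
    if not ranked:
        return None
    return min(ranked, key=lambda c: rank[c["ontology"]])


# Hypothetical candidate mappings for a single output type.
candidates = [
    {"ontology": "NCIT", "term": "example-ncit-term"},
    {"ontology": "OBI", "term": "example-obi-term"},
    {"ontology": "FlyBase CV", "term": "example-flybase-term"},
]

best = pick_preferred(candidates)
print(best)
```

With this input the OBI candidate is chosen, because OBI ranks above NCIT and the FlyBase vocabulary is not in the list at all.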

We should choose content that is resolvable, has definitions and synonyms, and is updated regularly.
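These criteria could be recorded per source ontology so that gaps are visible at a glance. This is a rough sketch, assuming someone fills in the flags by hand; the `SourceProfile` class, the `unmet_criteria` helper, and the example source are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Hand-curated answers to the inclusion criteria for one source."""
    name: str
    resolvable: bool
    has_definitions: bool
    has_synonyms: bool
    updated_regularly: bool


def unmet_criteria(source):
    """List the criteria that a source does not satisfy."""
    checks = {
        "resolvable": source.resolvable,
        "definitions": source.has_definitions,
        "synonyms": source.has_synonyms,
        "updated regularly": source.updated_regularly,
    }
    return [label for label, ok in checks.items() if not ok]


# Hypothetical source, for illustration only.
example = SourceProfile(
    name="ExampleVocab",
    resolvable=True,
    has_definitions=False,
    has_synonyms=True,
    updated_regularly=False,
)

missing = unmet_criteria(example)
print(example.name, "is missing:", missing)
```

A source with an empty list meets all four criteria; anything else is a candidate for re-mapping or a request to the source's maintainers.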

Let's also think about the modules for import purposes: we can't maintain things very well if there is no modularity, and it's also more useful to others if it's modular.

@marijane
Member

marijane commented Feb 27, 2019

  • I think the Outputs & Activities Mapping spreadsheet might be a good place to prioritize which ontologies we want to borrow from: https://docs.google.com/spreadsheets/d/1Mw8gK2NUGM8po7GGRtJTShRM2QNFoR19VJrjQuVCDW8/edit#gid=1412211690
  • We're using ROBOT to create most of the extraction modules. You can see them in the src/ontology/imports folder. They've all been generated from downloadable OWL files for the source ontologies, so they are resolvable, but I do think some of them may be missing definitions. There are a couple of ontologies from which we are only taking a single term. We have also created modules for them, but manually rather than with ROBOT.
  • We need to make some decisions about what kind of extraction we want. The initial extractions brought in a bunch of extra content, basically everything mentioned in class annotations, I think.
  • There are some output types for which we have not been able to find existing ontology terms, so those have been created as new classes.

@nicolevasilevsky
Collaborator

From my discussion with Melissa, it sounds like we want to extract terms from only a limited number of ontologies, and if we cannot find existing terms, we should make new term requests to the ontologies we will use.

@nicolevasilevsky
Collaborator

nicolevasilevsky commented Feb 27, 2019

I created a new tab in the spreadsheet; see NISO output-ROOmapping2019-02-27.

I am making note of all the ontologies we've used so far, and noting where we need new term requests (marked as NTR).

@marijane, we can discuss further when we meet today at 4pm, and the rest of us can discuss when we meet tomorrow.

@nicolevasilevsky
Collaborator

nicolevasilevsky commented Feb 27, 2019

In this spreadsheet, you can see we are currently using the following ontologies in ROO:

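| Ontology | Number of classes in ROO currently |
| -------- | ---------------------------------- |
| Bibo     | 6                                  |
| BRO      | 1                                  |
| CLO      | 1                                  |
| Edam     | 1                                  |
| Fabio    | 1                                  |
| IAO      | 4                                  |
| MeSH     | 1                                  |
| NCIT     | 55                                 |
| OBI      | 3                                  |
| OMIT     | 17                                 |
| SIO      | 3                                  |
| VIVO-ISF | 32                                 |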

There are 35 terms that didn't map to existing terms. We added these as new classes with ROO IDs, which I guess can be considered placeholders. We'll need to request new terms from our preferred ontologies once we determine what those are.

Questions/Blockers

  1. @mellybelly and all: which ontologies do we want to remove from the list above (and re-map those terms)?
  2. What other ontologies/vocabularies should we try to use?
    2a. What is the priority for each ontology, i.e., VIVO-ISF > NCIt, etc.?
  3. Can someone review the mappings (when they are redone) and confirm they are happy with the mappings from the ontologies we do decide to use?

@nicolevasilevsky
Collaborator

@tricfran can we discuss this ticket on our call tomorrow?

@mellybelly
Collaborator Author

Can we first hear from some of the LinkedData4Libraries folks (@eichmann, @marijane) or folks at NWU about what is used most for bibliographic info?

Will review the spreadsheet in the meantime.
@marijane, don't we want the class annotations?

@nicolevasilevsky
Collaborator

Thanks for reviewing the spreadsheet, @mellybelly.

@marijane
Member

@eichmann can you post some links to the LD4L stuff you showed us in the call today?

@mellybelly we want the labels/comments/etc., but ROBOT appears to have pulled in all of the logical definitions as well, and everything referenced in them. Whole trees of stuff got pulled in, and I'm not sure we want that. You can see what I mean if you open the OWL file in Protege.
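One option that might keep the modules smaller is ROBOT's MIREOT extraction method, which, as far as I understand it, keeps the requested terms with their annotations and ancestors rather than following every class mentioned in logical axioms. Below is a small sketch that only builds the command line and does not run it. The file names and the term CURIE are placeholders, and the flags should be checked against the ROBOT `extract` documentation before use.

```python
import shlex


def mireot_command(source_owl, lower_terms, output_owl, upper_term=None):
    """Build (but do not run) a `robot extract` call using MIREOT.

    lower_terms are the terms we want to import; upper_term optionally
    caps how far up the hierarchy the extraction goes.
    """
    cmd = ["robot", "extract", "--method", "MIREOT", "--input", source_owl]
    if upper_term is not None:
        cmd += ["--upper-term", upper_term]
    for term in lower_terms:
        cmd += ["--lower-term", term]
    cmd += ["--output", output_owl]
    return cmd


# Placeholder paths and term ID, not the project's real files.
cmd = mireot_command(
    source_owl="source.owl",
    lower_terms=["obo:IAO_0000030"],
    output_owl="imports/example_import.owl",
)
print(shlex.join(cmd))
```

Comparing the resulting module with the current one in Protege would show whether the extra trees disappear.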

@mellybelly
Collaborator Author

OK, I made notes in the document. I think we should prioritize bibliographic resources, resources with synonyms/text definitions, and those where we'd use more than one term.
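To make the "more than one term" point concrete, here is a quick tally using the class counts posted earlier in this thread. It is just a sketch over those numbers, nothing more.

```python
# Class counts per source ontology, as posted earlier in this thread.
counts = {
    "Bibo": 6, "BRO": 1, "CLO": 1, "Edam": 1, "Fabio": 1, "IAO": 4,
    "MeSH": 1, "NCIT": 55, "OBI": 3, "OMIT": 17, "SIO": 3, "VIVO-ISF": 32,
}
NEW_ROO_CLASSES = 35  # terms with no existing mapping, also from the thread

# Sources that contribute a single class are candidates for re-mapping.
single_term_sources = sorted(name for name, n in counts.items() if n == 1)
imported_total = sum(counts.values())

print("Single-term sources:", single_term_sources)
print("Imported classes:", imported_total)
print("Imported plus new ROO classes:", imported_total + NEW_ROO_CLASSES)
```

Five of the twelve sources (BRO, CLO, Edam, Fabio, MeSH) contribute only one class each, so those are the obvious ones to review first.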

We also need some basic requirements analysis for what this application ontology's structure should be. There is no benefit to having an ontology if there is no classification; otherwise it's just a tag library (which might be fine also!).

4 participants