
Some ontologies aren't being transformed fully because some OWL files contain imports to other OWL files #75

Closed
justaddcoffee opened this issue Oct 1, 2021 · 4 comments · Fixed by #97
Labels
bug Something isn't working enhancement New feature or request

Comments

@justaddcoffee
Collaborator

Describe the desired behavior

Some OWL files contain imports to other OWL files, and KGX does not seem to follow these imports. For example, here is the OWL representation of Upheno:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
    <!ENTITY obo "http://purl.obolibrary.org/obo/" >
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
    <!ENTITY oboInOwl "http://www.geneontology.org/formats/oboInOwl#" >
]>


<rdf:RDF xmlns="&obo;x-bfo.owl#"
     xml:base="&obo;x-bfo.owl"
     xmlns:obo="http://purl.obolibrary.org/obo/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Ontology rdf:about="&obo;upheno.owl">
        <owl:imports rdf:resource="&obo;upheno/metazoa.owl"/>
    </owl:Ontology>
</rdf:RDF>

Note this block:

    <owl:Ontology rdf:about="&obo;upheno.owl">
        <owl:imports rdf:resource="&obo;upheno/metazoa.owl"/>
    </owl:Ontology>

which points to upheno/metazoa.owl, where all the good stuff is.

Because of this, kg-obo currently transforms upheno into the following JSON, which is not terribly useful:

{
    "nodes": [
        {
            "id": "OBO:upheno.owl",
            "type": "owl:Ontology",
            "category": [
                "biolink:NamedThing"
            ],
            "provided_by": [
                "uphenolm7m33re"
            ]
        },
        {
            "id": "OBO:upheno/metazoa.owl",
            "category": [
                "biolink:NamedThing"
            ],
            "provided_by": [
                "uphenolm7m33re"
            ]
        }
    ],
    "edges": [
        {
            "subject": "OBO:upheno.owl",
            "predicate": "owl:imports",
            "object": "OBO:upheno/metazoa.owl",
            "relation": "owl:imports",
            "knowledge_source": [
                "uphenolm7m33re"
            ]
        }
    ]
}

Additional context

I don't think supporting this is critical for the immediate use case driving development, i.e. kg-idg.

For now, we could look for imports like this in the XML and abort the transform with an error if any are present.
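A minimal sketch of that interim check, using only Python's standard `xml.etree.ElementTree`. It assumes the input is RDF/XML like the upheno.owl file above and that imports use the `<owl:imports rdf:resource="..."/>` form; `find_owl_imports`, `ensure_no_imports`, and `UnresolvedImportsError` are hypothetical names, not part of kg-obo or KGX. The parser expands entities such as `&obo;` because they are declared in the file's DOCTYPE.

```python
import xml.etree.ElementTree as ET
from typing import List

# Clark-notation names that ElementTree uses for owl:imports and rdf:resource.
OWL_IMPORTS = "{http://www.w3.org/2002/07/owl#}imports"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


class UnresolvedImportsError(Exception):
    """Hypothetical error for OWL files whose content lives in imported files."""


def find_owl_imports(rdf_xml: str) -> List[str]:
    """Return IRIs from <owl:imports rdf:resource="..."/> elements anywhere in the file.

    Entities such as &obo; declared in the internal DOCTYPE are expanded by the parser.
    """
    root = ET.fromstring(rdf_xml)
    return [
        elem.get(RDF_RESOURCE)
        for elem in root.iter(OWL_IMPORTS)
        if elem.get(RDF_RESOURCE) is not None
    ]


def ensure_no_imports(rdf_xml: str, name: str) -> None:
    """Raise before the KGX transform if the ontology imports other OWL files."""
    imports = find_owl_imports(rdf_xml)
    if imports:
        raise UnresolvedImportsError(
            f"{name} has owl:imports that would not be followed: {', '.join(imports)}"
        )
```

Raising before the transform means the ontology is reported as a failure instead of silently producing a near-empty graph like the JSON above.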

Eventually, we will want to parse the XML, find these imports, download the imported OWL files, and feed them to KGX along with the "main" OWL file.
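One way to sketch that longer-term approach is to follow imports transitively and collect every ontology in the import closure. This assumes every imported file is also RDF/XML reachable by its IRI; `resolve_import_closure` and `fetch_url` are hypothetical names, and the actual hand-off to KGX is left out because its invocation is not specified here. The `fetch` parameter can be swapped for a local cache or a test double.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict

OWL_IMPORTS = "{http://www.w3.org/2002/07/owl#}imports"
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def fetch_url(iri: str) -> str:
    """Download an ontology over HTTP(S); urlopen follows common redirects by default."""
    with urllib.request.urlopen(iri) as resp:
        return resp.read().decode("utf-8")


def resolve_import_closure(
    root_iri: str,
    root_text: str,
    fetch: Callable[[str], str] = fetch_url,
) -> Dict[str, str]:
    """Return {ontology IRI: RDF/XML text} for the root and all transitive imports.

    Breadth-first; each IRI is fetched at most once, so import cycles terminate.
    Assumes every imported file is also RDF/XML.
    """
    closure: Dict[str, str] = {root_iri: root_text}  # insertion order = discovery order
    queue = deque([root_iri])
    while queue:
        current = queue.popleft()
        for elem in ET.fromstring(closure[current]).iter(OWL_IMPORTS):
            iri = elem.get(RDF_RESOURCE)
            if iri is None or iri in closure:
                continue  # skip non-resource imports and ontologies already collected
            closure[iri] = fetch(iri)
            queue.append(iri)
    return closure
```

Called with the upheno.owl IRI and its text, this would be expected to collect `upheno/metazoa.owl` plus whatever that file imports in turn; the collected files could then be written to disk and passed to KGX together with the main file.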

@cmungall @matentzn @caufieldjh

@justaddcoffee justaddcoffee added bug Something isn't working enhancement New feature or request labels Oct 1, 2021
@caufieldjh
Collaborator

See #76 for a temporary fix.

@caufieldjh
Collaborator

This relates back to #21 in terms of pre-processing.

@justaddcoffee
Collaborator Author

justaddcoffee commented Oct 1, 2021

@caufieldjh
Collaborator

For reference, as of the last build with the temp fix in #76:

05:35:01  INFO:kg-obo:Successfully transformed 133: ['bfo', 'chebi', 'doid', 'go', 'obi', 'pato', 'pr', 'xao', 'zfa', 'aeo', 'agro', 'aism', 'amphx', 'apo', 'aro', 'bco', 'bspo', 'bto', 'cdno', 'cheminf', 'chmo', 'cio', 'cl', 'clao', 'clo', 'clyh', 'cmo', 'cob', 'ddanat', 'ddpheno', 'dpo', 'dron', 'ecao', 'eco', 'ecocore', 'ecto', 'emapa', 'eupath', 'exo', 'fao', 'fbbt', 'fbcv', 'fbdv', 'fma', 'fovt', 'gecko', 'geno', 'gno', 'hancestro', 'hao', 'hom', 'hsapdv', 'hso', 'htn', 'iao', 'ico', 'ido', 'labo', 'ma', 'mco', 'mi', 'miapa', 'mmo', 'mmusdv', 'mod', 'mondo', 'mop', 'mp', 'mpath', 'mro', 'nbo', 'ncbitaxon', 'ncit', 'ncro', 'oae', 'oarcs', 'obcs', 'obib', 'ogg', 'ogms', 'ohd', 'olatdv', 'omo', 'omp', 'omrse', 'opl', 'opmi', 'ornaseq', 'ovae', 'pco', 'pdro', 'pdumdv', 'plana', 'planp', 'ppo', 'pw', 'rbo', 'rs', 'rxno', 'so', 'spd', 'stato', 'swo', 'symp', 'taxrank', 'trans', 'tto', 'uberon', 'uo', 'vt', 'vto', 'wbbt', 'wbls', 'wbphenotype', 'xco', 'xpo', 'zeco', 'zfs', 'zp', 'hp', 'sbo', 'scdo', 'txpo', 'sibo', 'fix', 'rex', 'ehdaa2', 'upa', 'ero', 'idomal', 'miro', 'tads', 'tgma']
05:35:01  INFO:kg-obo:Failed to transform 60: ['po', 'apollo_sv', 'caro', 'cdao', 'chiro', 'cido', 'cro', 'cteno', 'cto', 'cvdo', 'dideo', 'duo', 'envo', 'fbbi', 'fideo', 'flopo', 'foodon', 'fypo', 'genepio', 'geo', 'iceo', 'ino', 'maxo', 'mf', 'mfmo', 'mfoem', 'mfomd', 'micro', 'mpio', 'ms', 'nomen', 'oba', 'ogsf', 'ohmi', 'ohpi', 'omit', 'one', 'ons', 'ontoneo', 'oostt', 'peco', 'phipo', 'poro', 'psdo', 'pso', 'ro', 'sepio', 'to', 'upheno', 'vo', 'xlmod', 'fobi', 'gsso', 'kisao', 'mamo', 'vario', 'ogi', 'ceph', 'gaz', 'rnao']

The previous build had 30 failed transforms, so the 30 additional failures are ontologies that contain import statements.

@caufieldjh caufieldjh linked a pull request Oct 13, 2021 that will close this issue