INOH Cleaner #218

Closed
IgorRodchenkov opened this Issue Jun 29, 2015 · 5 comments

IgorRodchenkov commented Jun 29, 2015

http://inoh.hgc.jp (www.inoh.org no longer resolves; the domain name has probably been lost)

INOH BioPAX L3 looks good in general (there are warnings reported by the biopax-validator); let's fix the following in the cleaner:

  • it looks like xrefs, specifically UniProt unification xrefs, are attached to physical entities (PEs, e.g., Protein) and are missing from the corresponding entity references (ERs, e.g., ProteinReference); let's move the UnificationXrefs to the ERs or convert them to RelationshipXrefs as follows:
    • if PE's xref.db="INOH" -> keep as is, won't convert/copy;
    • if PE's xref.db="UniProt" -> move to the ProteinReference if exists (normally does);
    • convert the rest of UnificationXrefs to RelationshipXrefs and keep where they belong;
  • (invalid.id.format) optionally fix xref.id values for GO, MI, SO by adding the prefix to the id, e.g., "12345" -> "GO:12345" (by the way, the validator/normalizer also auto-fixes such cases);
  • if a CV, such as InteractionVocabulary, has a term that starts with one of the CV's xref.id values followed by ":", remove that 'id:' prefix from the term (e.g., "IEV_0000183:Transcription" must become "Transcription");
  • finally, convert shared unification xrefs to relationship ones (will affect most of 'INOH' xrefs).
  • shall we change xref.db names like "http://www.inoh.org (MolecularRoleOntology)" to "IMR" (http://www.ontobee.org/browser/index.php?o=IMR)? This ontology/db is neither in MIRIAM nor at obofoundry.org, but it can be seen at http://www.berkeleybop.org/ontologies/imr.owl and http://inoh.hgc.jp/ontologies/MoleculeRoleOntology.obo, and the OWL URIs still resolve (e.g., http://purl.obolibrary.org/obo/IMR_0702625).
  • also, shall we change xref.db names like "http://www.inoh.org (EventOntology)" to "IEV" (http://www.ontobee.org/browser/index.php?o=IEV, http://www.berkeleybop.org/ontologies/iev.owl, http://inoh.hgc.jp/ontologies/EventOntology.obo), the ontology with IEV_1234567-style ids?
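
The per-xref rules listed above can be sketched as three small pure functions. This is only an illustration in plain Python, not the actual cleaner code (which works on a Paxtools model); the function names and the returned action labels are made up for this sketch, and treating a UniProt xref on a PE that has no ProteinReference as "the rest" is an assumption.

```python
# Databases whose ids should carry a "DB:" prefix (GO, MI, SO per the list above).
PREFIXED_DBS = {"GO", "MI", "SO"}


def plan_pe_unification_xref(db, has_protein_reference=True):
    """Decide what to do with a UnificationXref found on a physical entity.

    The returned labels are hypothetical names used only in this sketch.
    """
    name = db.strip().lower()
    if name == "inoh":
        return "keep"
    if name == "uniprot" and has_protein_reference:
        return "move_to_entity_reference"
    # Assumption: UniProt xrefs without a ProteinReference fall under "the rest".
    return "convert_to_relationship_xref"


def fix_id_prefix(db, xref_id):
    """Add the missing 'DB:' prefix to GO, MI and SO ids; leave other ids alone."""
    db = db.strip().upper()
    if db in PREFIXED_DBS and not xref_id.upper().startswith(db + ":"):
        return f"{db}:{xref_id}"
    return xref_id


def strip_id_from_term(term, xref_ids):
    """Drop a leading '<xref id>:' from a controlled vocabulary term."""
    for xref_id in xref_ids:
        prefix = xref_id + ":"
        if term.startswith(prefix):
            return term[len(prefix):].strip()
    return term
```

For example, `fix_id_prefix("GO", "12345")` gives `"GO:12345"`, and `strip_id_from_term("IEV_0000183:Transcription", ["IEV_0000183"])` gives `"Transcription"`.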

IgorRodchenkov commented Jul 7, 2015

Also discovered that INOH BioPAX has the following issues:

  • different objects using the same URI (these look like plain duplicate xrefs; this can be solved by setting simpleIoHandler.mergeDuplicates(true) or by removing all such duplicates manually);
  • many weird PublicationXrefs using db="UniProt" (and, e.g., id="P63092"; see, e.g., GPCR_signaling-pertussis_toxin-.owl), attached to, e.g., Evidence objects; these in fact cause the Normalizer to fail (URI clash between PX and PR), so we'd replace those with RelationshipXrefs and also have to fix the BioPAX Normalizer (to skip all PublicationXrefs);
  • it seems all ProteinReferences in INOH were supposed to be generic (but instead a PR has multiple xrefs pointing to protein IDs of different species); we'd convert all such PRs to proper generic PRs (i.e., create a new PR per UniProt ID and attach it to the original generic one via the memberEntityReference property);
  • there are Xrefs attached to objects that have no such property (e.g., Stoichiometry); these will be auto-ignored by the BioPAX reader (and warnings get logged).
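
The generic ProteinReference conversion from the list above could look roughly like this. It is a minimal sketch on plain dictionaries rather than Paxtools objects; the function name, the dictionary layout and the member URI scheme (`<generic URI>_<UniProt ID>`) are assumptions made for illustration, and "X00001" in the usage line is a made-up placeholder ID.

```python
def split_into_generic(pr_uri, uniprot_ids):
    """Return (generic_pr, member_prs) for a PR that carries several UniProt IDs.

    Each member PR gets exactly one UniProt ID; the original PR keeps no
    UniProt xrefs and lists the members via 'memberEntityReference'.
    """
    ids = sorted(set(uniprot_ids))
    if len(ids) < 2:
        # Zero or one ID: nothing to split, the PR keeps what it has.
        return {"uri": pr_uri, "uniprot": ids, "memberEntityReference": []}, []
    members = [
        # Hypothetical URI scheme for the new member PRs.
        {"uri": f"{pr_uri}_{acc}", "uniprot": [acc], "memberEntityReference": []}
        for acc in ids
    ]
    generic = {
        "uri": pr_uri,
        "uniprot": [],
        "memberEntityReference": [m["uri"] for m in members],
    }
    return generic, members
```

Calling `split_into_generic("pr1", ["P63092", "X00001"])` would yield one generic PR with two members, while a PR with a single UniProt ID is returned unchanged.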

IgorRodchenkov added a commit that referenced this issue Jul 7, 2015

IgorRodchenkov added a commit that referenced this issue Jul 8, 2015

IgorRodchenkov added a commit that referenced this issue Jul 17, 2015

Refs issue #218 - improved INOH Cleaner and tests: makes member PRs when a protein has several uniprot uni. xrefs; also fixes previously forgotten xrefs of complexes. Also, some dependencies were bumped.

IgorRodchenkov commented Jul 20, 2015

Another type of problem: there are many TemplateReactions that have a Complex as the value of the 'template' property, which is apparently wrong. Only objects of type Dna, Rna, RnaRegion or DnaRegion are allowed there; other types get ignored by the Paxtools parser (and another BioPAX/OWL parser might simply fail).

(Note: INOH BioPAX RDF/XML files were created somehow ignoring such critical errors; data files that contain illegal use of OWL properties are simply impossible to write with Paxtools Java library.)
Example INOH files that have this sort of error are, e.g., BMP2_signaling_TGF-beta_MV.owl ("id1354461388_Transcription") and FGF8_Mouse.owl ("id814152498_Transcription" and several more).

This cannot be fixed with Paxtools; perhaps the Validator could do it (with some new code added there), or it could be fixed with a custom script based on regex or RDF tools...
Ideally, this must be addressed by the data provider, INOH.

Addition.
After some analysis (with Gary B.), we've decided to remove from the model all such illegal TemplateReaction objects that have a "gene" Complex (of two DNAs: coding and responsive element) as 'template'. These are hardly useful to the public, for there are no standard gene/sequence identifiers (there are xrefs to the INOH and IGS ontologies, which cannot be found any more, and the DnaReferences are trivial, with name 'Dna' and no xrefs).
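
The removal rule can be illustrated as a filter over a flat list of objects. This is a sketch under assumptions, not the cleaner itself: objects are plain dictionaries with hypothetical `uri`, `type` and `template` keys, and the "gene1" URI in the usage example is made up.

```python
# Types allowed as TemplateReaction 'template' values, as listed in the comment above.
ALLOWED_TEMPLATE_TYPES = {"Dna", "DnaRegion", "Rna", "RnaRegion"}


def drop_illegal_template_reactions(objects):
    """Return (kept_objects, removed_uris).

    A TemplateReaction is dropped when its 'template' points to an object
    whose type is not one of the allowed nucleic acid types.
    """
    by_uri = {o["uri"]: o for o in objects}

    def is_illegal(obj):
        if obj["type"] != "TemplateReaction":
            return False
        template = by_uri.get(obj.get("template"))
        return template is not None and template["type"] not in ALLOWED_TEMPLATE_TYPES

    removed = {o["uri"] for o in objects if is_illegal(o)}
    kept = [o for o in objects if o["uri"] not in removed]
    return kept, removed


model = [
    {"uri": "gene1", "type": "Complex"},
    {"uri": "id1354461388_Transcription", "type": "TemplateReaction", "template": "gene1"},
]
kept, removed = drop_illegal_template_reactions(model)
```

A real cleanup would also have to detach the removed reactions from any pathways, pathway steps and controls that refer to them; the sketch leaves that out.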

IgorRodchenkov commented Jul 28, 2015

Also, e.g., in BMP2_signaling_TGF-beta_MV.owl, Pathway has pathwayOrder (steps) but no pathwayComponent values. Shall we copy reactions from the listed steps to pathwayComponent?
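
If we go that way, the copy step could be sketched as below. It assumes plain dictionaries instead of Paxtools objects and uses the BioPAX L3 property names pathwayOrder, stepProcess and pathwayComponent as keys; the function name and the choice to leave pathways that already have components untouched are assumptions of this sketch.

```python
def fill_pathway_components(pathway, steps_by_uri):
    """Copy processes from the pathway's steps into 'pathwayComponent'.

    Pathways that already list components are returned unchanged.
    """
    if pathway.get("pathwayComponent"):
        return pathway
    components = []
    for step_uri in pathway.get("pathwayOrder", []):
        step = steps_by_uri.get(step_uri, {})
        for process in step.get("stepProcess", []):
            # Preserve step order and skip processes shared by several steps.
            if process not in components:
                components.append(process)
    return {**pathway, "pathwayComponent": components}
```

The function returns a new dictionary and does not modify its input, so the original pathway stays available for comparison.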

IgorRodchenkov added a commit that referenced this issue Jul 28, 2015

Refs issue #218 (INOH cleaner), fixes issue #219 (see BasicController.java); and also I updated dependencies, spring-security config, maven deploy configuration (to use OSSRH), etc.

IgorRodchenkov commented Jul 29, 2015

OK, we've done what we could. Closing the issue for now (let's re-open it if we need to do more later).

IgorRodchenkov commented Aug 7, 2015

We downloaded and saved most (if not all) of the INOH v4 data
for future use/fix here.
