fix wrong order of entities being parsed in metakg construction #39

Merged
merged 1 commit into from
Sep 16, 2021


marcodarko
Contributor

MetaKG construction is creating operations with the subject and object in the wrong order, contrary to the order in which they are stored in the file being read. This PR fixes that order, so queries that previously returned nothing now get results.
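To illustrate the kind of ordering at stake, here is a minimal sketch, not the actual code changed in this PR. It assumes the file being read is a nested mapping keyed subject category -> object category -> list of predicates; the `predicates` data and the `build_operations` helper are hypothetical names used only for this example.

```python
# Hypothetical predicates mapping. ASSUMPTION: the outer key is the subject
# category and the inner key is the object category.
predicates = {
    "biolink:SmallMolecule": {
        "biolink:MolecularActivity": ["biolink:correlated_with"],
    },
}


def build_operations(predicates):
    """Flatten the nested mapping into one operation per (subject, object, predicate)."""
    operations = []
    for subject, objects in predicates.items():  # outer key: subject category
        for obj, preds in objects.items():       # inner key: object category
            for predicate in preds:
                operations.append(
                    {"subject": subject, "object": obj, "predicate": predicate}
                )
    return operations


ops = build_operations(predicates)
```

Swapping `subject` and `obj` in the loop body would produce operations pointing the wrong way, which is the class of bug described above.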

@colleenXu
Contributor

colleenXu commented Sep 15, 2021

Note:

  • Automat TRAPI v1.1 APIs are doing predicate expansion. This leads to "duplication" of edges in BTE, since BTE doesn't recognize that it is getting the same data from a "related_to" query as it did from the corresponding "correlated_with" query.
  • They may also be doing node expansion (which I think is maybe less of a problem for BTE?)

If this is also happening for Automat TRAPI v1.2 APIs, it will likely have to be addressed in a separate issue, maybe in api-response-transform, so that BTE doesn't keep records from Automat APIs where the record's predicate field does not match the SmartAPI metadata predicate.

Example: POST to https://automat.renci.org/covidkopkg/1.1/query. Even though the query below uses the "related_to" predicate, all the returned edges actually have the correlated_with predicate. BTE will take a response like this and assume all the edges have the "related_to" predicate when they actually don't...

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:1935"],
                    "categories":["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories":["biolink:MolecularActivity"]
                }
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            }
        }
    }
}
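A rough sketch of how the mismatch could be detected, assuming the response follows the TRAPI layout where `message.knowledge_graph.edges` maps edge ids to objects with a `predicate` field. The `mismatched_edges` helper and the sample response (including its edge id and object id) are hypothetical and made up for illustration; this is not BTE's actual code.

```python
def mismatched_edges(response, queried_predicate):
    """Return the edges whose predicate differs from the one that was queried."""
    message = response.get("message") or {}
    knowledge_graph = message.get("knowledge_graph") or {}
    edges = knowledge_graph.get("edges") or {}
    return {
        edge_id: edge
        for edge_id, edge in edges.items()
        if edge.get("predicate") != queried_predicate
    }


# Made-up response fragment: the query asked for related_to,
# but the returned edge carries correlated_with.
sample_response = {
    "message": {
        "knowledge_graph": {
            "edges": {
                "edge_a": {
                    "subject": "PUBCHEM.COMPOUND:1935",
                    "object": "EXAMPLE:placeholder_object",
                    "predicate": "biolink:correlated_with",
                }
            }
        }
    }
}

mismatched = mismatched_edges(sample_response, "biolink:related_to")
```

Edges returned by a check like this could either be dropped or relabeled with the predicate the API actually reported; which is appropriate is a design decision left open here.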

@colleenXu
Contributor

colleenXu commented Sep 15, 2021

A re-review of BTE's ability to query Automat APIs is described below (the previous review was here).

The APIs being called by these queries have changed:

  • CHEBI:3558 -> Gene uses covidkop, ctd, pharos
  • HP:0004382 -> Disease uses Biolink, Uberongraph, (and covidkop)
  • MONDO:0009747 -> Gene uses Robokop (and covidkop, Pharos, Uberongraph)
  • CHEBI:30830 -> Disease uses CORD19, hetio (and biolink, covidkop, ctd, robokop)
  • MONDO:0011565 -> Disease uses (biolink, cord19, covidkop, robokop, uberongraph)
  • UBERON:0001905 -> AnatomicalEntity uses (covidkop, robokop, uberongraph)

These queries now get results from the APIs:

  • NCBITaxon:105667 -> MolecularEntity uses FoodDB
  • SequenceVariant CAID:CA16727036 -> Gene uses GTEx
  • NCBIGene:728882 -> GeneFamily uses HGNC
  • CHEBI:6896 -> Pathway uses HMDB
  • GO:0006629 -> BiologicalProcess uses Human-GOA
  • Protein UniProtKB:Q8JPQ9 -> Protein uses Intact
  • UBERON:0005453 -> AnatomicalEntity uses ontological-hierarchy
  • AnatomicalEntity UBERON:0015048 -> AnatomicalEntity uses text-mining
  • Protein UniProtKB:A3KCJ9 -> OrganismTaxon uses viral proteome

Not getting results from example queries for:

  • Gtopdb
  • Panther

@colleenXu
Contributor

colleenXu commented Sep 15, 2021

My suggestion is to merge and push to prod ASAP. Also, the automated tests for this particular module are passing on my local machine.

Noting though:

  • tests / queries from other modules may break and need fixing because they were built on incorrect operations that are now fixed
  • Automat APIs seem to be doing automatic predicate and semantic-type expansion. This may result in some unwanted behavior (essentially “duplicated” info) - which I suggest revisiting AFTER we migrate to TRAPI v1.2 APIs.

@marcodarko
Contributor Author

I think we also expand the predicate filter, so if they expand it too, I don't think that's a problem, is it? If we both have correlated_with as the predicate, it should get results, if I'm right.

@newgene newgene merged commit 87333a5 into main Sep 16, 2021
@newgene newgene deleted the fix-entity-switch branch September 16, 2021 04:57
@colleenXu
Contributor

In the example above, the problem is that we are querying their API for both related_to edges and correlated_with edges, getting the same data set for both, and not recognizing that they're the same.

Instead, we are saying there's a set of edges with related_to plus a set of edges with correlated_with...
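One possible way to recognize the overlap, sketched under the assumption that each returned edge is a dict with `subject`, `predicate`, and `object` fields and that the predicate reported by the API (not the queried one) is trusted. The `merge_unique_edges` helper and the sample edges are hypothetical, not part of BTE.

```python
def merge_unique_edges(*edge_lists):
    """Merge edge lists, keeping one edge per (subject, predicate, object) triple."""
    seen = set()
    merged = []
    for edges in edge_lists:
        for edge in edges:
            key = (edge["subject"], edge["predicate"], edge["object"])
            if key not in seen:
                seen.add(key)
                merged.append(edge)
    return merged


# Made-up results: both queries return the same underlying edge.
same_edge = {
    "subject": "PUBCHEM.COMPOUND:1935",
    "predicate": "biolink:correlated_with",
    "object": "EXAMPLE:placeholder_object",
}
from_related_to_query = [dict(same_edge)]
from_correlated_with_query = [dict(same_edge)]

merged = merge_unique_edges(from_related_to_query, from_correlated_with_query)
```

With this keying, the two result sets collapse into a single edge instead of being counted once per queried predicate.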

@colleenXu
Contributor

Looks like TRAPI v1.2 APIs might not actually do this...so it's not a problem.
