solr semantic category filter appears to not work #52

Open
lhannest opened this issue Jun 30, 2017 · 4 comments

Comments

@lhannest
Contributor

lhannest commented Jun 30, 2017

I am trying to modify GolrAssociationQuery to allow searching for associations where either the subject or the object matches an ID, and then also filter by semantic categories. This filtering should apply only to the concept that doesn't match the ID. The logic looks like this:

filter_queries.append(
    '(' + subject_id_filter + ' AND ' + object_category_filter + ')'
    + ' OR '
    + '(' + object_id_filter + ' AND ' + subject_category_filter + ')'
)

Here I have implemented this, and you can see how I'm building these filters: https://github.com/lhannest/ontobio/blob/4b32b28ac83fe2e8aad0366c799490e520b4b87a/ontobio/golr/golr_query.py#L765-L783

My problem is that filtering on categories isn't working (I'm dropping the second disjunct here for simplicity). When I set the filter query to ['(subject:"NCBIGene:84570" AND object_category:"biological process")'], I expect to get associations where the subject is NCBIGene:84570 and the object_category contains "biological process". Instead, I get many associations where the object_category is simply "cellular component".

Even more odd, when I instead set the object_category to "cellular component", no associations are returned.

On the other hand, when I search for the object_category "gene", the filter seems to work perfectly well.
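
For reference, the filter construction described in this comment can be sketched in isolation. This is a minimal sketch, not the code in the linked branch: `field_filter` and `build_bidirectional_filter` are hypothetical helper names. The field names `subject` and `object_category` come from the example filter above, while `object` and `subject_category` are assumed here by symmetry.

```python
def field_filter(field, value):
    """Return a Solr-style field:"value" clause with the value quoted."""
    # Escape backslashes and double quotes so the value stays one phrase.
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return '{}:"{}"'.format(field, escaped)


def build_bidirectional_filter(concept_id, category):
    """Match associations where one side is concept_id and the other side has the category."""
    forward = '({} AND {})'.format(
        field_filter('subject', concept_id),
        field_filter('object_category', category),
    )
    # 'object' and 'subject_category' are assumed field names (by symmetry).
    backward = '({} AND {})'.format(
        field_filter('object', concept_id),
        field_filter('subject_category', category),
    )
    return forward + ' OR ' + backward


fq = build_bidirectional_filter('NCBIGene:84570', 'biological process')
print(fq)
```

Printing the string before sending it makes it easy to paste the same filter into a direct Solr query and compare results.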

@cmungall
Contributor

This is likely because GO annotations are currently not available in the Monarch golr index; they are available via the AmiGO golr, which follows a different schema.

Now we could have simply left the client to navigate two different APIs, but one of the goals of biolink is to provide a high level API that can be used over different databases, at least those that are similar enough, and to provide a convenient one-stop-shop endpoint.

Currently the reference implementation of biolink will introspect the category to determine how to route the query behind the scenes. This all works if you stick to the high-level categories exposed by biolink. Here "function" is the category that encompasses all GO annotations (this is potentially confusing: the GO cell component hierarchy represents material entities, not function, but the semantics of the association is that of a functional association).

Monarch uses more granular categories, but the biolink contract (which needs to be better documented) is for a smaller set of categories.
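
The routing described above can be pictured roughly as follows. This is only an illustrative sketch: `choose_index`, the constant names, and the string labels are invented here and are not taken from the biolink reference implementation. The only thing taken from the comment is that the high-level "function" category covers GO annotations, which live in the AmiGO golr rather than the Monarch golr.

```python
# Hypothetical labels for the two backends; not real configuration values.
MONARCH_GOLR = 'monarch-golr'
AMIGO_GOLR = 'amigo-golr'

# High-level categories assumed to be served by AmiGO. Per the comment,
# "function" covers all GO annotations.
AMIGO_CATEGORIES = {'function'}


def choose_index(object_category):
    """Pick which index a query is sent to, based on the object category."""
    if object_category in AMIGO_CATEGORIES:
        return AMIGO_GOLR
    # Anything that is not a recognised high-level category falls through here.
    return MONARCH_GOLR


print(choose_index('function'))
print(choose_index('biological process'))
```

In this sketch a granular category such as "biological process" is not in the high-level set, so it falls through to the default index.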

I think it's going to be most straightforward if you implement this as two calls. E.g.

def exec(...):
    if bidirectional:
        assocs1 = self.exec(bidirectional=False, subject=subject, ...)
        assocs2 = self.exec(bidirectional=False, object=subject, ...)
        return assocs1 + assocs2

This is perhaps less efficient, but the stratification simplifies the logic, as you don't need to mesh your OR logic with the logic of how the queries are implemented.
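
A runnable toy version of the two-call idea, for illustration only. The in-memory `ASSOCIATIONS` list, `search`, and `search_bidirectional` are all made up for this sketch and stand in for the real query class and index.

```python
# Toy in-memory data standing in for an index; the records are made up.
ASSOCIATIONS = [
    {'subject': 'X:1', 'object': 'Y:1'},
    {'subject': 'Y:2', 'object': 'X:1'},
    {'subject': 'Z:1', 'object': 'Z:2'},
]


def search(subject=None, obj=None):
    """Hypothetical one-directional search: match on subject and/or object ID."""
    return [
        a for a in ASSOCIATIONS
        if (subject is None or a['subject'] == subject)
        and (obj is None or a['object'] == obj)
    ]


def search_bidirectional(concept_id):
    """Run the one-directional search twice and concatenate the results."""
    as_subject = search(subject=concept_id)
    as_object = search(obj=concept_id)
    return as_subject + as_object


results = search_bidirectional('X:1')
print(results)
```

Note that in this sketch an association whose subject and object are both the queried ID would be returned twice, so a real implementation might need to deduplicate.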

@cmungall
Contributor

I'll leave this open as a placeholder for improving the docs here. Comments from @kshefchek and others are welcome.

@lhannest
Contributor Author

That could work; the only problem is that it would make pagination more awkward. What I could do is get half a page from one call and half a page from the other. But then it would be possible, and probably even common, to receive much less than a full page of data. I've been trying to avoid this (we've run into this sort of thing before), but maybe it's not worth the effort.
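
The short-page effect is easy to reproduce with a toy example. This is a sketch with made-up data; `half_page_from_each` is a hypothetical helper, and the two lists stand in for the full result sets of the two one-directional calls.

```python
def half_page_from_each(first, second, page, page_size):
    """Take half a page from each result list and concatenate them."""
    half = page_size // 2
    start = page * half
    return first[start:start + half] + second[start:start + half]


# Made-up result sets: very uneven sizes in the two directions.
as_subject = ['s{}'.format(i) for i in range(2)]   # only 2 matches this way round
as_object = ['o{}'.format(i) for i in range(50)]   # 50 matches the other way round

page0 = half_page_from_each(as_subject, as_object, page=0, page_size=10)
page1 = half_page_from_each(as_subject, as_object, page=1, page_size=10)
print(len(page0), len(page1))
```

With a page size of 10, the first page holds 7 items and the second only 5, even though 52 results exist in total.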

@cmungall
Contributor

good point. I don't have a satisfactory answer yet...
