Merge pull request #2813 from antgonza/redbiom.util.ids_from-exact-False

redbiom.util.ids_from exact True->False
charles-cowart committed Feb 21, 2019
2 parents bc4eed2 + c3e4cf4 commit d7604e81d5081eeec364e63dcdc407ae1500ace9
@@ -14,6 +14,7 @@
 import redbiom.util
 import redbiom.fetch
 from tornado.gen import coroutine, Task
+from tornado.web import HTTPError

 from qiita_core.util import execute_as_transaction
 from qiita_db.util import generate_study_list_without_artifacts
@@ -58,7 +59,7 @@ def _redbiom_feature_search(self, query, contexts):
         study_artifacts = defaultdict(lambda: defaultdict(list))
         query = [f for f in query.split(' ')]
         for ctx in contexts:
-            for idx in redbiom.util.ids_from(query, True, 'feature', ctx):
+            for idx in redbiom.util.ids_from(query, False, 'feature', ctx):
                 aid, sample_id = idx.split('_', 1)
                 sid = sample_id.split('.', 1)[0]
                 study_artifacts[sid][aid].append(sample_id)
@@ -71,7 +72,12 @@ def _redbiom_taxon_search(self, query, contexts):
             # find the features with those taxonomies and then search
             # those features in the samples
             features = redbiom.fetch.taxon_descendents(ctx, query)
-            for idx in redbiom.util.ids_from(features, True, 'feature', ctx):
+            # from empirical evidence we saw that when we return more than 600
+            # features we'll reach issue #2312 so avoiding saturating the
+            # workers and raise this error quickly
+            if len(features) > 600:
+                raise HTTPError(504)
+            for idx in redbiom.util.ids_from(features, False, 'feature', ctx):
                 aid, sample_id = idx.split('_', 1)
                 sid = sample_id.split('.', 1)[0]
                 study_artifacts[sid][aid].append(sample_id)
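
The two search loops above share the same id handling, which can be sketched in isolation. This is only an illustration, assuming each returned id has the form `<artifact id>_<sample id>` and that the sample id starts with the study id followed by a dot (as the test data, e.g. `1.SKM3.640197`, suggests). The names `group_ids`, `too_many_features` and `MAX_FEATURES` are hypothetical and not part of Qiita or redbiom.

```python
from collections import defaultdict

# Hypothetical constant mirroring the 600-feature threshold used in the diff.
MAX_FEATURES = 600


def group_ids(ids):
    """Group '<artifact>_<sample>' ids as study -> artifact -> [sample ids]."""
    study_artifacts = defaultdict(lambda: defaultdict(list))
    for idx in ids:
        # Split only on the first underscore: sample ids may contain more.
        aid, sample_id = idx.split('_', 1)
        # The study id is whatever precedes the first dot of the sample id.
        sid = sample_id.split('.', 1)[0]
        study_artifacts[sid][aid].append(sample_id)
    return study_artifacts


def too_many_features(features):
    """Return True when the feature list exceeds the assumed threshold."""
    return len(features) > MAX_FEATURES


grouped = group_ids(['5_1.SKD2.640178', '5_1.SKM3.640197', '4_1.SKM3.640197'])
```

In the handler itself the threshold check raises `HTTPError(504)` before any ids are fetched, so an oversized taxon query fails fast instead of tying up a worker.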
@@ -15,22 +15,29 @@ Search Options
 --------------
 * **Metadata**:

-* The search will be on the full metadata.
-* The metadata search engine uses natural language processing to search for word stems within a samples metadata. A word stem disregards modifiers and plurals, so for instance, a search for "antibiotics" will actually perform a search for "antibiot". Similarly, a search for "crying" will actually search for "cry". The words specified can be combined with set-based operations, so for instance, a search for "antibiotics & crying" will obtain the set of samples in which each sample has "antibiot" in its metadata as well as "cry". N.B., the specific category in which a stem is found is not assured to be the same, "antibiot" could be in one category and "cry" in another. A set intersection can be performed with "&", a union with "|" and a difference with "-".
-* In addition to the stem-based search, value based searches can also be a applied. These use a Python-like grammar and allow for a rich set of comparisons to be performed based on a metadata category of interest. For example, "where qiita_study_id == 10317" will find all samples which have the qiita_study_id metadata category, and in which the value for that sample is "10317."
-* Examples:
+* The search will be on the **full metadata**.
+* **Natural language processing:** The metadata search engine uses natural language processing to search for word stems within a sample metadata. A word stem disregards modifiers and plurals, so for instance, a search for *antibiotics* will actually perform a search for *antibiot*. Similarly, a search for *crying* will actually search for *cry*. The words specified can be combined with set-based operations, so for instance, a search for *antibiotics & crying* will obtain the set of samples in which each sample has *antibiot* in its metadata as well as *cry*.

-* Find all samples in which the word infant exists, as well as antibiotics, where the infants are under a year old:
+N.B., the specific category in which a stem is found is not assured to be the same, *antibiot* could be in one category and *cry* in another. A set intersection can be performed with "&", a union with "|" and a difference with "-".
+* **Value search:** In addition to the stem-based search, value based searches can also be applied. These use a Python-like grammar and allow for a rich set of comparisons to be performed based on a metadata category of interest. For example, *where qiita_study_id == 10317* will find all samples which have the *qiita_study_id* metadata category, and in which the value for that sample is *10317*.
+* **Examples:**
+
+* Find all samples in which both the word 'infant', as well as 'antibiotics' exist, and where the infants are under a year old:

 * *infant & antibiotics where age_years <= 1*

-* Find all samples only belonging to the EMP in which the ph is under 7 for a variety of sample types:
+* Find all samples only belonging to the EMP in which the pH is under 7, for a variety of sample types:

+* soil:
+*soil where ph < 7 and emp_release1 == 'True'*
+
+* ocean water:
+*water & ocean where ph > 7 and emp_release1 == 'True'*
+
-* soil: *soil where ph < 7 and emp_release1 == 'True'*
-* ocean water: *water & ocean where ph > 7 and emp_release1 == 'True'*
-* non-ocean water: *water - ocean where ph > 7 and emp_release1 == 'True'*
+* non-ocean water:
+*water - ocean where ph > 7 and emp_release1 == 'True'*

-* Or instead of ph you could search for a different metadata category:
+* Or instead of pH you could search for a different metadata category:

 * *water & ocean where salinity > 20*
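
The set-based operators described in the metadata section behave like ordinary set algebra over sample ids. The sketch below only illustrates that idea and is not how redbiom is implemented: the `index` dictionary and the sample ids `s1` to `s4` are made up, and stemming is assumed to have already happened.

```python
# Toy inverted index: word stem -> set of sample ids (all values are made up).
index = {
    'water': {'s1', 's2', 's3'},
    'ocean': {'s2', 's3', 's4'},
}

# "water & ocean": samples whose metadata contains both stems.
both = index['water'] & index['ocean']

# "water | ocean": samples whose metadata contains either stem.
either = index['water'] | index['ocean']

# "water - ocean": samples with "water" but without "ocean".
non_ocean = index['water'] - index['ocean']
```

A `where` clause would then filter the resulting set further by metadata values, which this sketch does not attempt to model.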

@@ -43,17 +50,18 @@ Search Options

 * **Feature**:

-* The search will be on all the features, in specific: OTU ids for close reference and exact sequences for deblur.
+* The search will be on all the features, in specific: **OTU ids for closed reference** or **exact sequences for deblur**.

-* Examples:
+* **Examples:**

-* Find all samples in which the Greengenes feature 4479944 is found: "4479944"
+* Find all samples in which the Greengenes feature 4479944 is found: *4479944*
+* Find all samples in which the sequence exists: *TACGAAGGGTGCAAGCATTACTCGGAATTACTGGGCGTAAAGCGTGCGTAGGTGGTTCGTTAAGTCTGATGTGAAAGCCCTGGGCTCAACCTGGGAACTG*

 * **Taxon**:

-* The search will be only on closed reference and based on the taxonomies available. Only exact matches are returned. Note that currently only the Greengenes taxonomy is searchable, and that it requires nomenclature of a rank prefix, two underscores, and then the name.
+* The search will be **only on closed reference data** and based on the taxonomies available. Only exact matches are returned. Note that currently **only the Greengenes taxonomy** is searchable, and that it requires nomenclature of a rank prefix, two underscores, and then the name.

-* Examples:
+* **Examples:**

-* Find all samples in which the genera Escherichia is found: "g__Escherichia"
-* Find all samples in which the order Clostridiales is found: "o__Clostridiales"
+* Find all samples in which the genera Escherichia is found: *g__Escherichia*
+* Find all samples in which the phylum Tenericutes is found: *p__Tenericutes*
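
The taxon nomenclature described above (a rank prefix, two underscores, then the name) can be checked with a small pattern before a query is submitted. This is a minimal sketch, assuming the usual single-letter Greengenes rank prefixes `k`, `p`, `c`, `o`, `f`, `g` and `s`. The helper `looks_like_greengenes_taxon` is hypothetical and is not part of Qiita or redbiom.

```python
import re

# Assumed Greengenes rank prefixes: kingdom, phylum, class, order, family,
# genus, species, each followed by two underscores and a non-empty name.
TAXON_PATTERN = re.compile(r'[kpcofgs]__\S+')


def looks_like_greengenes_taxon(term):
    """Return True if term has the form '<rank prefix>__<name>'."""
    return TAXON_PATTERN.fullmatch(term) is not None
```

For example, `g__Escherichia` and `p__Tenericutes` pass this check, while a bare `Escherichia` does not.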
@@ -174,104 +174,9 @@
 <!-- Date to be fixed once we fix: https://github.com/biocore/qiita/issues/2773 -->
 Redbiom only searches on public data. Last update: December 18th, 2018. Note that you will only be able to expand and add artifacts to analyses if you are signed into Qiita.
 <br/><br/>
-<button class="btn btn-info btn-sm" data-toggle="collapse" data-target="#redbiom-help">Help and examples?</button>
+<a href="{% raw qiita_config.portal_dir %}/static/doc/html/redbiom.html" class="btn btn-info btn-sm" target="_blank">Help and examples?</a>
 <br/>
 </small>
-<div id="redbiom-help" class="collapse">
-<br/>
-We have 3 search options:
-<ul>
-<li>
-<b>Metadata</b><br/>
-The search will be on the full metadata.
-<br/><br/>
-The metadata search engine uses natural language processing to search for
-word stems within a samples metadata. A word stem disregards modifiers and
-plurals, so for instance, a search for "antibiotics" will actually perform
-a search for "antibiot". Similarly, a search for "crying" will actually
-search for "cry". The words specified can be combined with set-based
-operations, so for instance, a search for "antibiotics &amp; crying" will
-obtain the set of samples in which each sample has "antibiot" in its
-metadata as well as "cry". N.B., the specific category in which a stem is
-found is not assured to be the same, "antibiot" could be in one category
-and "cry" in another. A set intersection can be performed with "&amp;", a
-union with "|" and a difference with "-".
-<br/><br/>
-In addition to the stem-based search, value based searches can also be a
-applied. These use a Python-like grammar and allow for a rich set of
-comparisons to be performed based on a metadata category of interest. For
-example, "where qiita_study_id == 10317" will find all samples which have
-the qiita_study_id metadata category, and in which the value for that
-sample is "10317."
-<br/><br/>
-Examples:
-<br/>
-<ul>
-<li>
-Find all samples in which the word infant exists, as well as antibiotics,
-where the infants are under a year old:
-<ul>
-<li>
-<i>infant &amp; antibiotics where age_years <= 1</i>
-</li>
-</ul>
-</li>
-<li>
-Find all samples only belonging to the EMP in which the ph is under 7 for a variety of sample types:
-<ul>
-<li>soil: <i>soil where ph < 7 and emp_release1 == 'True'</i></li>
-<li>ocean water: <i>water &amp; ocean where ph > 7 and emp_release1 == 'True'</i></li>
-<li>non-ocean water: <i>water - ocean where ph > 7 and emp_release1 == 'True'</i></li>
-</ul>
-<li>Or instead of ph you could search for a different metadata category:</li>
-<ul>
-<li><i>water &amp; ocean where salinity > 20</i></li>
-</ul>
-</li>
-<li>Some other interesting examples:
-<ul>
-<li><i>feces &amp; canine</i></li>
-<li><i>(beer | cider | wine | alcohol)</i></li>
-<li><i>where sample_type == 'stool'</i></li>
-<li><i>usa where sample_type == 'stool' and host_taxid == 9606</i></li>
-</ul>
-</li>
-</li>
-</li>
-</ul>
-</li>
-<li>
-<b>Feature</b>:<br/>
-The search will be on all the features, in specific: OTU ids for close reference and exact sequences for deblur.
-<br/><br/>
-Examples:
-<br/>
-<ul>
-<li>
-Find all samples in which the Greengenes feature 4479944 is found: "4479944"
-</li>
-</ul>
-</li>
-<li><b>Taxon</b>:<br/>
-The search will be only on closed reference and based on the taxonomies available.
-Only exact matches are returned. Note that currently only the Greengenes taxonomy is
-searchable, and that it requires nomenclature of a rank prefix, two underscores, and then the
-name.
-<br/><br/>
-Examples:
-<br/>
-<ul>
-<li>
-Find all samples in which the genera Escherichia is found: "g__Escherichia"
-</li>
-<li>
-Find all samples in which the order Clostridiales is found: "o__Clostridiales"
-</li>
-</ul>
-</li>
-</ul>
-<br/>
-</div>
 <br/>
 <form data-toggle="validator" role="form" id="submitForm">
 <div class="form-group row">
@@ -89,7 +89,8 @@ def test_post_taxon(self):
         }
         data = deepcopy(DATA)
         data[0]['artifact_biom_ids'] = {
-            '5': ['1.SKM3.640197'], '4': ['1.SKM3.640197']}
+            '5': ['1.SKD2.640178', '1.SKM3.640197'],
+            '4': ['1.SKM3.640197', '1.SKD2.640178']}
         response = self.post('/redbiom/', post_args)
         exp = {'status': 'success', 'message': '', 'data': data}
         self.assertEqual(response.code, 200)
