Pick OTUs for both the full and the trimmed data at approximately genus-level resolution (97% similarity) using SortMeRNA, closed-reference against Greengenes 13_8.

In [1]:
import os
import multiprocessing

import qiime_default_reference as qdr

import americangut.util as agu
import americangut.notebook_environment as agenv

Before we go too far, let's make sure the files we need are present.

In [2]:
filtered_sequences       = agu.get_existing_path(agenv.paths['filtered-sequences'])
filtered_sequences_100nt = agu.get_existing_path(agenv.paths['filtered-sequences-100nt'])

greengenes_reference_sequences = qdr.get_reference_sequences()
greengenes_reference_taxonomy  = qdr.get_reference_taxonomy()

Let's also make sure that the output files we're about to create do not already exist.

In [3]:
gg_otus            = agu.get_new_path(agenv.paths['gg-otus'])
gg_otus_biom       = agu.get_new_path(agenv.paths['gg-otus-biom'])
gg_otus_100nt      = agu.get_new_path(agenv.paths['gg-otus-100nt'])
gg_otus_100nt_biom = agu.get_new_path(agenv.paths['gg-otus-100nt-biom'])
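The two checks above (inputs must be present, outputs must not yet exist) can be illustrated with a minimal sketch. The helpers `require_existing` and `require_new` and the file names below are hypothetical stand-ins written for this example; the actual `agu.get_existing_path` and `agu.get_new_path` may resolve paths or report errors differently.

```python
import os
import tempfile


def require_existing(path):
    """Return path if it exists; otherwise raise IOError (hypothetical helper)."""
    if not os.path.exists(path):
        raise IOError("Expected input is missing: %s" % path)
    return path


def require_new(path):
    """Return path if nothing exists there yet; otherwise raise IOError (hypothetical helper)."""
    if os.path.exists(path):
        raise IOError("Output already exists: %s" % path)
    return path


# Throwaway directory with one file that exists and one path that does not.
workdir = tempfile.mkdtemp()
present = os.path.join(workdir, 'filtered-sequences.fna')
open(present, 'w').close()
absent = os.path.join(workdir, 'otus')

print(require_existing(present) == present)
print(require_new(absent) == absent)
```

Failing early on either condition avoids starting a long OTU picking run that would later stop on a missing input or overwrite earlier results.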

We're now going to set up a parameters file for the OTU picking runs. It is possible to specify a precomputed SortMeRNA index by giving its path in the environment variable `$AG_SMR_INDEX`. We use an environment variable because it makes it much easier to inject an index during continuous integration testing.

In [4]:
_params_file = agu.get_path('sortmerna_pick_params.txt')

# Look the index up once; it is optional and may be unset.
_smr_index = agenv.get_sortmerna_index()

with open(_params_file, 'w') as f:
    f.write("pick_otus:otu_picking_method sortmerna\n")
    f.write("pick_otus:similarity 0.97\n")
    f.write("pick_otus:threads %d\n" % multiprocessing.cpu_count())

    # Only point SortMeRNA at a precomputed index when one was supplied.
    if _smr_index:
        f.write("pick_otus:sortmerna_db %s\n" % _smr_index)
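As a rough illustration of the optional-index logic, the sketch below reads `AG_SMR_INDEX` from a mapping and builds the parameters text as a string. Both function names are hypothetical, and the lookup is only an assumption about how `agenv.get_sortmerna_index()` might behave; the parameter lines themselves are the ones written in the cell above.

```python
import os


def sortmerna_index_from_env(environ=None):
    """Return the value of AG_SMR_INDEX, or None if it is unset or empty.

    Hypothetical stand-in; agenv.get_sortmerna_index() may differ.
    """
    environ = os.environ if environ is None else environ
    return environ.get('AG_SMR_INDEX') or None


def build_pick_params(threads, index=None):
    """Return the parameters-file text for a SortMeRNA closed-reference run."""
    lines = [
        "pick_otus:otu_picking_method sortmerna",
        "pick_otus:similarity 0.97",
        "pick_otus:threads %d" % threads,
    ]
    # The index line is written only when an index path was supplied.
    if index:
        lines.append("pick_otus:sortmerna_db %s" % index)
    return "\n".join(lines) + "\n"


# Example with an explicit mapping so the result does not depend on the shell.
index = sortmerna_index_from_env({'AG_SMR_INDEX': '/tmp/example-index'})
print(build_pick_params(4, index))
```

Passing the environment in as a mapping keeps the helper easy to exercise in tests, which matches the continuous integration motivation given above.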

And now we can actually pick the OTUs. This will take some time. Note that we issue two separate commands because we pick against both the untrimmed and the trimmed data.

In [5]:
!pick_closed_reference_otus.py -i $filtered_sequences \
                               -o $gg_otus \
                               -r $greengenes_reference_sequences \
                               -t $greengenes_reference_taxonomy \
                               -p $_params_file
                
!pick_closed_reference_otus.py -i $filtered_sequences_100nt \
                               -o $gg_otus_100nt \
                               -r $greengenes_reference_sequences \
                               -t $greengenes_reference_taxonomy \
                               -p $_params_file

Traceback (most recent call last):
  File "/Users/mcdonadt/miniconda3/envs/agdev/bin/pick_closed_reference_otus.py", line 233, in <module>
    main()
  File "/Users/mcdonadt/miniconda3/envs/agdev/bin/pick_closed_reference_otus.py", line 224, in main
    status_update_callback=status_update_callback)
  File "/Users/mcdonadt/miniconda3/envs/agdev/lib/python2.7/site-packages/qiime/workflow/upstream.py", line 506, in run_pick_closed_reference_otus
    close_logger_on_success=close_logger_on_success)
  File "/Users/mcdonadt/miniconda3/envs/agdev/lib/python2.7/site-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError: 

*** ERROR RAISED DURING STEP: Make OTU table
Command run was:
 make_otu_table.py -i agp_processing/otus/gg-13_8-97-percent-otus/sortmerna_picked_otus/filtered-sequences_otus.txt -t /Users/mcdonadt/miniconda3/envs/agdev/lib/python2.7/site-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_ot
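The two invocations differ only in their input and output paths, so they could be generated from a single helper. The sketch below only builds the argument lists and does not run anything; the helper name and the placeholder paths are hypothetical, while the flags are the ones used in the cell above.

```python
def closed_reference_command(input_fp, output_dir, ref_seqs, ref_tax, params_fp):
    """Build the argument list for one pick_closed_reference_otus.py run."""
    return [
        'pick_closed_reference_otus.py',
        '-i', input_fp,
        '-o', output_dir,
        '-r', ref_seqs,
        '-t', ref_tax,
        '-p', params_fp,
    ]


# Placeholder paths for illustration only; the notebook uses the variables
# defined in the earlier cells instead.
runs = [
    ('full.fna', 'otus-full'),
    ('trimmed-100nt.fna', 'otus-100nt'),
]
commands = [
    closed_reference_command(seqs, out, 'ref.fasta', 'ref-tax.txt', 'params.txt')
    for seqs, out in runs
]

for command in commands:
    print(' '.join(command))
```

An argument list of this shape could be handed to `subprocess.check_call`, which raises `CalledProcessError` on a non-zero exit status, so a failing step like the one in the traceback above would stop the notebook at that point.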

And we'll end with some sanity checking of the outputs.

In [None]:
assert os.stat(gg_otus_biom).st_size > 0
assert os.stat(gg_otus_100nt_biom).st_size > 0
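When a workflow step fails, as in the traceback above, the output file may not exist at all, in which case `os.stat` raises `OSError` before the assertion is evaluated. A slightly more informative check, sketched here with a hypothetical helper and throwaway file names, reports whether each output is missing or empty.

```python
import os
import tempfile


def find_bad_outputs(paths):
    """Return (path, reason) pairs for outputs that are missing or empty."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append((path, 'missing'))
        elif os.stat(path).st_size == 0:
            problems.append((path, 'empty'))
    return problems


# Throwaway files: one with content, one empty, one never created.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, 'good.biom')
empty = os.path.join(workdir, 'empty.biom')
missing = os.path.join(workdir, 'missing.biom')
with open(good, 'w') as f:
    f.write('{}')
open(empty, 'w').close()

for path, reason in find_bad_outputs([good, empty, missing]):
    print("%s: %s" % (os.path.basename(path), reason))
```

In the notebook, the same helper could be called with `[gg_otus_biom, gg_otus_100nt_biom]` and followed by an assertion that the returned list is empty.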