## Prepare unassigned OTUs for re-analysis

Based on the results of a metaBEAT run, generate a new BIOM table containing only the OTUs that were not assigned a taxonomy, and prepare a fasta file containing the corresponding sequences.

Example files:

- `global_queries.fasta` - fasta file containing all query sequences (global centroids), as produced by e.g. an initial metaBEAT run
- `test-OTU-taxonomy.biom` - taxonomy-annotated OTU BIOM table in JSON format from a metaBEAT run (not the taxonomy-collapsed BIOM table)

Load the necessary functions. These functions are available as of metaBEAT version '0.97.4-global' (commit: 9110e5a3f4a979e85733f83cb0388b00586544f6).

```python
import metaBEAT_global_misc_functions as mb
```

Read in the BIOM file.

```python
table = mb.load_BIOM('test-OTU-taxonomy.biom', informat='json')
```

```
Specified BIOM input format 'json' - ok!
```


```python
# double-check that we've got a table
# print(table)
```
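If you want to look inside the table without metaBEAT, a JSON-format BIOM file is plain JSON. The sketch below assumes the BIOM 1.0 JSON layout (a `rows` list of OTUs with their metadata, a `columns` list of samples, and a sparse `data` list of `[row, column, count]` triplets) and assumes that the assignment is stored under a `taxonomy` metadata key. `summarise_biom` and the toy table are hypothetical and only illustrate the structure; they are not part of metaBEAT.

```python
import json

# Made-up example in the BIOM 1.0 JSON layout (trimmed: real files also carry
# fields such as "generated_by" and "date"). Assumption: the taxonomy lives
# under a "taxonomy" key in each row's metadata.
TOY_BIOM_JSON = """
{
  "id": "toy",
  "format": "Biological Observation Matrix 1.0.0",
  "type": "OTU table",
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [3, 2],
  "rows": [
    {"id": "OTU_1", "metadata": {"taxonomy": ["Taxon_A"]}},
    {"id": "OTU_2", "metadata": {"taxonomy": ["unassigned"]}},
    {"id": "OTU_3", "metadata": {"taxonomy": ["unassigned"]}}
  ],
  "columns": [
    {"id": "sample_A", "metadata": null},
    {"id": "sample_B", "metadata": null}
  ],
  "data": [[0, 0, 120], [1, 1, 7], [2, 0, 3], [2, 1, 5]]
}
"""

def summarise_biom(biom):
    """Return ({OTU id: taxonomy}, {sample id: total count}) for a BIOM 1.0 dict."""
    otu_tax = {row["id"]: (row.get("metadata") or {}).get("taxonomy")
               for row in biom["rows"]}
    sample_ids = [col["id"] for col in biom["columns"]]
    if biom["matrix_type"] == "sparse":
        # sparse: each entry is [row index, column index, value]
        entries = biom["data"]
    else:
        # dense: one list of values per row
        entries = [(r, c, v) for r, values in enumerate(biom["data"])
                   for c, v in enumerate(values)]
    totals = dict.fromkeys(sample_ids, 0)
    for _, col_idx, value in entries:
        totals[sample_ids[col_idx]] += value
    return otu_tax, totals

biom = json.loads(TOY_BIOM_JSON)
otu_tax, totals = summarise_biom(biom)
print(otu_tax)
print(totals)  # {'sample_A': 123, 'sample_B': 12}
```

For the real file, `json.load(open('test-OTU-taxonomy.biom'))` would take the place of the toy string, provided the file follows the layout assumed here.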

Extract only the OTUs that were not assigned a taxonomy, i.e. those labelled 'unassigned'.

```python
unassigned_table = mb.BIOM_return_by_tax_level(taxlevel='unassigned', BIOM=table, invert=False)
```

```
Found taxonomy metadata with OTUs - ok!
```


```python
# check the observation metadata in the new table to confirm that only unassigned OTUs remain
# print(unassigned_table.metadata(axis='observation'))
```
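Conceptually, this filtering step keeps the rows (OTUs) whose taxonomy marks them as unassigned and drops everything else. A minimal sketch on a plain BIOM 1.0 style dict, assuming a sparse matrix and assuming that unassigned OTUs carry the string `'unassigned'` in a `taxonomy` metadata list. `filter_biom_rows` and `is_unassigned` are hypothetical helpers, not the metaBEAT implementation. The row indices in `data` have to be renumbered after rows are dropped, which is the step that is easiest to get wrong.

```python
import copy

def is_unassigned(row):
    """Assumption: unassigned OTUs have 'unassigned' in their taxonomy list."""
    taxonomy = (row.get("metadata") or {}).get("taxonomy") or []
    return "unassigned" in taxonomy

def filter_biom_rows(biom, keep):
    """Return a copy of a sparse BIOM 1.0 dict with only the rows where keep(row) is True."""
    if biom["matrix_type"] != "sparse":
        raise ValueError("this sketch only handles sparse tables")
    kept = [i for i, row in enumerate(biom["rows"]) if keep(row)]
    # map old row index -> new row index so the sparse triplets stay consistent
    new_index = {old: new for new, old in enumerate(kept)}
    out = copy.deepcopy(biom)
    out["rows"] = [out["rows"][i] for i in kept]
    out["data"] = [[new_index[r], c, v] for r, c, v in biom["data"] if r in new_index]
    out["shape"] = [len(kept), biom["shape"][1]]
    return out

# made-up toy table, not real data
toy = {
    "matrix_type": "sparse",
    "shape": [3, 2],
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["Taxon_A"]}},
        {"id": "OTU_2", "metadata": {"taxonomy": ["unassigned"]}},
        {"id": "OTU_3", "metadata": {"taxonomy": ["unassigned"]}},
    ],
    "columns": [{"id": "sample_A", "metadata": None},
                {"id": "sample_B", "metadata": None}],
    "data": [[0, 0, 120], [1, 1, 7], [2, 0, 3], [2, 1, 5]],
}

unassigned = filter_biom_rows(toy, is_unassigned)
print([row["id"] for row in unassigned["rows"]])  # ['OTU_2', 'OTU_3']
print(unassigned["data"])  # [[0, 1, 7], [1, 0, 3], [1, 1, 5]]
```

Passing `lambda row: not is_unassigned(row)` instead would return the complementary table of assigned OTUs.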

From the fasta file, extract only the sequences whose IDs appear in the reduced table.

```python
mb.extract_fasta_by_BIOM_OTU_ids(in_fasta='global_queries.fasta',
                                 BIOM=unassigned_table,
                                 out_fasta='unassigned_only.fasta')
```

```
Looking to extract 647 sequences
Parsing global_queries.fasta
identified 647 target sequences .. OK!
Writing sequences to file: unassigned_only.fasta
```
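The extraction amounts to reading the fasta file and keeping only the records whose IDs are OTU IDs in the reduced table. A minimal standard-library sketch, assuming the OTU ID is the first whitespace-delimited word of each fasta header; `read_fasta`, `extract_by_ids` and `to_fasta` are hypothetical helpers and the sequences are made up. Reporting the IDs that were not found plays the same role as the "Looking to extract 647 sequences ... identified 647 target sequences" check in the log above.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs; the id is the first word after '>'."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)

def extract_by_ids(fasta_lines, wanted_ids):
    """Return (matching records, set of wanted IDs that were not found)."""
    wanted = set(wanted_ids)
    hits = [(sid, seq) for sid, seq in read_fasta(fasta_lines) if sid in wanted]
    missing = wanted - {sid for sid, _ in hits}
    return hits, missing

def to_fasta(records, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    out = []
    for sid, seq in records:
        out.append(">" + sid)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)

# made-up input; multi-line sequences are joined by read_fasta
toy_fasta = """>OTU_1
ACGTACGT
>OTU_2 size=12
GGCCAA
TTGG
>OTU_3
ACACAC
""".splitlines()

hits, missing = extract_by_ids(toy_fasta, ["OTU_2", "OTU_3", "OTU_9"])
print(missing)  # {'OTU_9'}
print(to_fasta(hits))
```

With real files, an open file handle can be passed directly as `fasta_lines`, e.g. `extract_by_ids(open('global_queries.fasta'), otu_ids)`.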


Remove the taxonomy metadata from the table. This step is required if you want to use the table as input for another metaBEAT run.

```python
unassigned_table_notax = mb.drop_BIOM_taxonomy(unassigned_table)
```

```python
# double-check that the taxonomy is gone
# print(unassigned_table_notax.metadata(axis='observation'))
```
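Dropping the taxonomy amounts to clearing that key from every row's metadata while leaving the OTU IDs and counts untouched. A sketch on a BIOM 1.0 style dict, assuming the annotation is stored under `taxonomy`; `drop_taxonomy` is a hypothetical helper, not the metaBEAT function, and storing `None` when no metadata is left relies on BIOM 1.0 allowing `null` row metadata.

```python
import copy

def drop_taxonomy(biom, key="taxonomy"):
    """Return a copy of a BIOM 1.0 dict with `key` removed from all row metadata."""
    out = copy.deepcopy(biom)
    for row in out["rows"]:
        meta = row.get("metadata") or {}
        remaining = {k: v for k, v in meta.items() if k != key}
        # use None (JSON null) rather than an empty dict when nothing is left
        row["metadata"] = remaining or None
    return out

toy = {
    "rows": [
        {"id": "OTU_2", "metadata": {"taxonomy": ["unassigned"]}},
        # "note" is a made-up extra field, kept to show only taxonomy is removed
        {"id": "OTU_3", "metadata": {"taxonomy": ["unassigned"], "note": "x"}},
    ],
    "columns": [{"id": "sample_A", "metadata": None}],
    "data": [[0, 0, 3], [1, 0, 5]],
}

denovo = drop_taxonomy(toy)
print([row["metadata"] for row in denovo["rows"]])  # [None, {'note': 'x'}]
```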

Write the reduced table without taxonomy metadata (i.e. a de novo table) to file, in both JSON and TSV format.

```python
mb.write_BIOM(BIOM=unassigned_table_notax, target_prefix='unassigned_only_denovo', outfmt=['json','tsv'])
```

```
Writing 'unassigned_only_denovo.biom'
Writing 'unassigned_only_denovo.tsv'
```
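The TSV output is a flat, human-readable view of the same counts. The sketch below renders a sparse BIOM 1.0 style dict as a tab-separated OTU-by-sample table with a `#OTU ID` header row, the layout of the classic OTU table text format. Whether metaBEAT's TSV adds further lines (for example a comment line or a metadata column) is not shown here, so treat `biom_to_tsv` and the exact layout as an assumption.

```python
def biom_to_tsv(biom):
    """Render a sparse BIOM 1.0 style dict as an OTU-by-sample TSV string."""
    n_rows, n_cols = biom["shape"]
    # expand the sparse [row, column, value] triplets into a dense matrix of zeros
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in biom["data"]:
        dense[r][c] = v
    header = ["#OTU ID"] + [col["id"] for col in biom["columns"]]
    lines = ["\t".join(header)]
    for row, counts in zip(biom["rows"], dense):
        lines.append("\t".join([row["id"]] + [str(v) for v in counts]))
    return "\n".join(lines) + "\n"

# made-up toy table, not real data
toy = {
    "shape": [2, 2],
    "rows": [{"id": "OTU_2", "metadata": None}, {"id": "OTU_3", "metadata": None}],
    "columns": [{"id": "sample_A", "metadata": None},
                {"id": "sample_B", "metadata": None}],
    "data": [[0, 1, 7], [1, 0, 3], [1, 1, 5]],
}

print(biom_to_tsv(toy))
```

Counts read from a real BIOM table may be stored as floats, in which case they would be printed as e.g. `7.0`.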


The files `unassigned_only_denovo.biom` and `unassigned_only.fasta` can be used as input for a new metaBEAT run.
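Before starting the new run, it can be worth confirming that the two files describe the same OTUs. A minimal sketch, assuming the OTU ID is the first word of each fasta header and that the table IDs have already been read (for the JSON table, the `id` field of each entry in `rows`); `check_inputs` and `fasta_ids` are hypothetical helpers, not part of metaBEAT, and the input below is made up.

```python
from collections import Counter

def fasta_ids(lines):
    """Return the first word after '>' for every FASTA header line."""
    return [line[1:].split()[0] for line in lines if line.startswith(">")]

def check_inputs(table_ids, fasta_lines):
    """Compare BIOM OTU IDs with FASTA IDs and report any mismatches."""
    ids = fasta_ids(fasta_lines)
    counts = Counter(ids)
    return {
        "missing_from_fasta": sorted(set(table_ids) - set(ids)),
        "extra_in_fasta": sorted(set(ids) - set(table_ids)),
        "duplicated_in_fasta": sorted(i for i, n in counts.items() if n > 1),
    }

# made-up inputs that contain one of each kind of problem
report = check_inputs(["OTU_2", "OTU_3"],
                      [">OTU_2", "GGCC", ">OTU_4", "ACAC", ">OTU_2", "GGCC"])
print(report)
# all three lists should be empty before the files are used for a new run
```

For the real files, `table_ids` could be built with `[row['id'] for row in json.load(fh)['rows']]` from the JSON table, and the fasta file handle passed as `fasta_lines`.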

For example, to BLAST the sequences against the full GenBank nucleotide database (`nt`):
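```bash
# In a Jupyter notebook cell, prefix the command with '!' to run it from the notebook.
# '&> log' writes both stdout and stderr to the file 'log'.
metaBEAT_global.py \
-B unassigned_only_denovo.biom \
--g_queries unassigned_only.fasta \
--blast --blast_db ~/path/to/your/nt --min_ident 0.85 \
-o unassigned_only &> log
```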