#### Metagenomics

This notebook uses QIIME 2 and its Artifact API to analyze 
microbiome data from a microbiome transplant study. [1]


In [1]:
import pandas as pd

from qiime2.metadata.metadata import Metadata, CategoricalMetadataColumn
from qiime2.sdk import Artifact, PluginManager, Result


In [2]:
pm = PluginManager()
demux_plugin = pm.plugins['demux']
demux_summarize = demux_plugin.actions['summarize']
pm.plugins

{'alignment': <qiime2.plugin.plugin.Plugin at 0x7fba84a46ee0>,
 'composition': <qiime2.plugin.plugin.Plugin at 0x7fba84420610>,
 'cutadapt': <qiime2.plugin.plugin.Plugin at 0x7fba843c3ee0>,
 'dada2': <qiime2.plugin.plugin.Plugin at 0x7fba843c3160>,
 'deblur': <qiime2.plugin.plugin.Plugin at 0x7fb9d86a0640>,
 'demux': <qiime2.plugin.plugin.Plugin at 0x7fba66c9ee80>,
 'diversity': <qiime2.plugin.plugin.Plugin at 0x7fba43d0e280>,
 'diversity-lib': <qiime2.plugin.plugin.Plugin at 0x7fba43d2fc10>,
 'emperor': <qiime2.plugin.plugin.Plugin at 0x7fba43d2fd90>,
 'feature-classifier': <qiime2.plugin.plugin.Plugin at 0x7fba43d0ec10>,
 'feature-table': <qiime2.plugin.plugin.Plugin at 0x7fba332ddac0>,
 'fragment-insertion': <qiime2.plugin.plugin.Plugin at 0x7fba332ddc70>,
 'gneiss': <qiime2.plugin.plugin.Plugin at 0x7fba332cdb20>,
 'longitudinal': <qiime2.plugin.plugin.Plugin at 0x7fb9df259e20>,
 'metadata': <qiime2.plugin.plugin.Plugin at 0x7fb9df23c9d0>,
 'phylogeny': <qiime2.plugin.plugin.Plugin

In [3]:
print(demux_summarize.description)
demux_summarize_signature = demux_summarize.signature
print(demux_summarize_signature.inputs)
print(demux_summarize_signature.parameters)
print(demux_summarize_signature.outputs)

Summarize counts per sample for all samples, and generate interactive positional quality plots based on `n` randomly selected sequences.
OrderedDict([('data', ParameterSpec(qiime_type=SampleData[SequencesWithQuality | PairedEndSequencesWithQuality | JoinedSequencesWithQuality], view_type=<class 'q2_demux._summarize._visualizer._PlotQualView'>, default=NOVALUE, description='The demultiplexed sequences to be summarized.'))])
OrderedDict([('n', ParameterSpec(qiime_type=Int, view_type=<class 'int'>, default=10000, description='The number of sequences that should be selected at random for quality score plots. The quality plots will present the average positional qualities across all of the sequences selected. If input sequences are paired end, plots will be generated for both forward and reverse reads for the same `n` sequences.'))])
OrderedDict([('visualization', ParameterSpec(qiime_type=Visualization, view_type=None, default=NOVALUE, description=NOVALUE))])
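
The description of `n` above says that the quality plots are built from a random subsample of the sequences. The sketch below illustrates that idea in plain Python. It is not the q2-demux implementation: the function name `positional_quality_summary`, the returned tuple layout, and the toy Phred scores are all assumptions made for illustration.

```python
import random
import statistics


def positional_quality_summary(quality_scores, n=10000, seed=0):
    """Summarize per-position quality over up to `n` randomly chosen reads.

    `quality_scores` is a list of per-read lists of integer Phred scores.
    Returns one (position, min, median, max, reads_covering) tuple per position.
    Hypothetical helper, not part of QIIME 2.
    """
    rng = random.Random(seed)
    # Sample without replacement; use every read if fewer than n exist.
    chosen = rng.sample(quality_scores, min(n, len(quality_scores)))
    longest = max((len(read) for read in chosen), default=0)
    summary = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute.
        column = [read[pos] for read in chosen if len(read) > pos]
        summary.append(
            (pos, min(column), statistics.median(column), max(column), len(column))
        )
    return summary


# Toy reads with made-up scores (not taken from the tutorial data).
toy_reads = [[38, 37, 30], [36, 35], [40, 20, 10, 2]]
for row in positional_quality_summary(toy_reads, n=3):
    print(row)
```

Each tuple corresponds roughly to one box in a positional quality plot; the last element shows how many of the sampled reads actually reach that position.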


The code below plots the per-position quality scores of the sequencing reads 
as box plots, based on 10,000 randomly selected sequences (the default `n`), 
and then removes noise from the data with DADA2.

In [4]:
seqs1 = Result.load('fmt-tutorial-demux-1-10p.qza')
sum_data1 = demux_summarize(seqs1)

sum_data1.visualization



In [5]:
seqs2 = Result.load('fmt-tutorial-demux-2-10p.qza')
sum_data2 = demux_summarize(seqs2)

print(dir(sum_data2))
print(type(sum_data2.visualization))
sum_data2.visualization



['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_asdict', '_fields', '_result', 'count', 'index', 'visualization']
<class 'qiime2.sdk.result.Visualization'>


In [8]:
dada2_plugin = pm.plugins['dada2']
dada2_denoise_single = dada2_plugin.actions['denoise_single']

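# Denoise each sequencing run separately: drop the first 13 bases of every
# read and truncate reads at position 150.
quality_control1 = dada2_denoise_single(
    demultiplexed_seqs=seqs1,
    trunc_len=150,
    trim_left=13,
)

quality_control2 = dada2_denoise_single(
    demultiplexed_seqs=seqs2,
    trunc_len=150,
    trim_left=13,
)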


Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/qiime2/coniglio/data/5db90b62-c6e4-4c09-8c79-c0cdfbe2cea0/data --output_path /tmp/tmppx0sklmb/output.tsv.biom --output_track /tmp/tmppx0sklmb/track.tsv --filtered_directory /tmp/tmppx0sklmb --truncation_length 150 --trim_left 13 --max_expected_errors 2.0 --truncation_quality_score 2 --max_length Inf --pooling_method independent --chimera_method consensus --min_parental_fold 1.0 --allow_one_off False --num_threads 1 --learn_min_reads 1000000 --homopolymer_gap_penalty NULL --band_size 16

R version 4.2.3 (2023-03-15) 


Loading required package: Rcpp


DADA2: 1.26.0 / Rcpp: 1.0.10 / RcppParallel: 5.1.6 
2) Filtering .........................................................................
3) Learning Error Rates
45139719 total bases in 329487 reads from 73 samples will be used for learning the error rates.
4) Denoise samples 
.........................................................................
5) Remove chimeras (method = consensus)
6) Report read numbers through the pipeline
7) Write output
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada.R --input_directory /tmp/qiime2/coniglio/data/5f459ea8-c6a8-438e-b42f-35e4f86166d7/data --output_path /tmp/tmpegfz6v61/output.tsv.biom --output_track /tmp/tmpegfz6v61/track.tsv --filtered_directory /tmp/tmpegfz6v61 --truncation_length 150 --trim_left 13 --max_expected_errors 2.0 --truncation_qu

Loading required package: Rcpp


DADA2: 1.26.0 / Rcpp: 1.0.10 / RcppParallel: 5.1.6 
2) Filtering ................................................
3) Learning Error Rates
21584213 total bases in 157549 reads from 48 samples will be used for learning the error rates.
4) Denoise samples 
................................................
5) Remove chimeras (method = consensus)
6) Report read numbers through the pipeline
7) Write output
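
To make the effect of `trunc_len=150` and `trim_left=13` concrete, here is a minimal sketch of the positional part of the filtering step. It is a simplified model, not DADA2's code: the helper `trim_and_truncate` is hypothetical, and it ignores the quality-based filters that also appear in the command above (`--max_expected_errors`, `--truncation_quality_score`).

```python
def trim_and_truncate(read, trunc_len=150, trim_left=13):
    """Keep positions [trim_left, trunc_len) of a read.

    Reads shorter than `trunc_len` are discarded (returned as None).
    A `trunc_len` of 0 is treated here as "no truncation".
    Hypothetical helper for illustration only.
    """
    if trunc_len and len(read) < trunc_len:
        return None  # too short to truncate at trunc_len, so it is dropped
    end = trunc_len if trunc_len else len(read)
    return read[trim_left:end]


# Toy reads (synthetic, not from the tutorial data).
full_read = "A" * 151
short_read = "A" * 120

print(len(trim_and_truncate(full_read)))  # 137
print(trim_and_truncate(short_read))      # None
```

Under this model every retained read has length `trunc_len - trim_left`, which is 137 bases for the settings used in this notebook.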


In [14]:
metadata_plugin = pm.plugins['metadata']
metadata_tabulate = metadata_plugin.actions['tabulate']

In [15]:
stats_metadata1 = metadata_tabulate(
    input=quality_control1.denoising_stats.view(Metadata)
)
stats_metadata1.visualization

In [16]:
stats_metadata2 = metadata_tabulate(
    input=quality_control2.denoising_stats.view(Metadata)
)
stats_metadata2.visualization

In [19]:
ft_plugin = pm.plugins['feature-table']
ft_merge = ft_plugin.actions['merge']
ft_merge_seqs = ft_plugin.actions['merge_seqs']
ft_summarize = ft_plugin.actions['summarize']
ft_tab_seqs = ft_plugin.actions['tabulate_seqs']

In [13]:
table_merge = ft_merge(
    tables=[quality_control1.table, quality_control2.table]
)
seqs_merge = ft_merge_seqs(
    data=[
        quality_control1.representative_sequences,
        quality_control2.representative_sequences,
    ]
)

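
Conceptually, merging the two per-run feature tables yields one table that covers every sample and every feature. The sketch below shows that idea with plain dictionaries, assuming the two runs share no sample IDs. It is not the q2-feature-table implementation: `merge_tables` and the toy counts are hypothetical, and the way the real `merge` action treats overlapping samples depends on its `overlap_method` parameter, which this sketch does not reproduce.

```python
def merge_tables(tables):
    """Merge {sample_id: {feature_id: count}} tables into one.

    Raises ValueError if a sample ID appears in more than one table.
    Features missing from a sample are filled with a count of 0.
    Hypothetical helper for illustration only.
    """
    merged = {}
    for table in tables:
        overlap = merged.keys() & table.keys()
        if overlap:
            raise ValueError(
                f"Sample IDs present in more than one table: {sorted(overlap)}"
            )
        for sample_id, counts in table.items():
            merged[sample_id] = dict(counts)
    # Union of all features, so every sample reports every feature.
    features = sorted({f for counts in merged.values() for f in counts})
    return {
        sample_id: {f: counts.get(f, 0) for f in features}
        for sample_id, counts in merged.items()
    }


# Toy tables with made-up counts (not from the tutorial data).
run1 = {"s1": {"featA": 5, "featB": 2}}
run2 = {"s2": {"featB": 7, "featC": 1}}
print(merge_tables([run1, run2]))
```

Features seen in only one run (`featA`, `featC` in the toy data) appear in the merged table with a count of 0 for samples from the other run.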


In [17]:
ft_sum = ft_summarize(table=table_merge.merged_table)
ft_sum.visualization

In [20]:
tab_seqs = ft_tab_seqs(data=seqs_merge.merged_data)
tab_seqs.visualization