This notebook contains all of the commands needed to run the standard QIIME2 analyses on the merged feature table and representative sequences. These are the files named 'merged_table.qza' and 'merged_representative_sequences.qza' in the folder that can be downloaded from: https://doi.org/10.6084/m9.figshare.12217682

This assumes that you have:
- the full length SILVA 16S classifier saved as 'ref_alignments/classifier_silva_132_99_16S.qza' (https://docs.qiime2.org/2019.10/data-resources/ - follow the link for 'Silva 132 99% OTUs full-length sequences')
- the SEPP reference alignment saved as 'ref_alignments/sepp-refs-silva-128.qza' (https://docs.qiime2.org/2019.10/data-resources/ - follow the link for 'Silva 128 SEPP reference database')

In [None]:
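import os
# Launch Jupyter from a shell where 'conda activate qiime2-2019.10' has already been run:
# os.system starts a new subshell for every call, so activating the environment here would not persist.
os.system('qiime --version')  # confirms that QIIME2 is available to the commands below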
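
Each os.system call in this notebook starts its own subshell and only returns an exit status, so a failed step is easy to miss. The following is a minimal sketch of a helper that raises an error when a command fails. The name `run_checked` and the optional `conda run -n <env>` prefix are assumptions for illustration and are not part of the original notebook.

```python
import subprocess
import sys


def run_checked(args, conda_env=None):
    """Run a command given as a list of arguments and return its stdout.

    If conda_env is given, the command is wrapped in 'conda run -n <env>'
    (an assumption about the local conda setup, not part of the notebook).
    Raises RuntimeError when the command exits with a non-zero status.
    """
    args = list(args)
    if conda_env is not None:
        args = ["conda", "run", "-n", conda_env] + args
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text rather than bytes
    )
    if completed.returncode != 0:
        raise RuntimeError(
            "Command failed ({}): {}\n{}".format(
                completed.returncode, " ".join(args), completed.stderr
            )
        )
    return completed.stdout


# Harmless demonstration that does not need QIIME2 to be installed.
print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
```

With this in place, a step such as the table summary could be written as `run_checked(['qiime', 'feature-table', 'summarize', '--i-table', 'merged_table.qza', '--o-visualization', 'merged_table_summary.qzv'])`, which stops the notebook at the first step that fails.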

1. Summarize the combined feature tables (this is to check that everything looks OK after the merges, and can be skipped if not necessary)

In [None]:
os.system('qiime feature-table summarize \
            --i-table merged_table.qza  \
            --o-visualization merged_table_summary.qzv')

2. Classify the features (this step will probably take the longest: it may take a day or more, and it may not be feasible on a local computer)

In [None]:
os.system('qiime feature-classifier classify-sklearn \
            --i-reads merged_representative_sequences.qza \
            --i-classifier ref_alignments/classifier_silva_132_99_16S.qza \
            --p-n-jobs 12 \
            --output-dir taxa')

3. Export the classification (this writes 'taxa/taxonomy.tsv') to look at the assigned taxonomy

In [None]:
os.system('qiime tools export \
            --input-path taxa/classification.qza \
            --output-path taxa')

4. Filter low abundance features

In [None]:
os.system('qiime feature-table filter-features \
            --i-table merged_table.qza \
            --p-min-frequency 10 \
            --p-min-samples 1 \
            --o-filtered-table merged_table_filtered.qza')
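
To make the two thresholds concrete, here is a small sketch of the rule on a toy table. It is not QIIME2's implementation, and the feature names and counts are made up: a feature is kept if its total count across all samples is at least 10 and it is observed in at least 1 sample.

```python
def filter_features(table, min_frequency=10, min_samples=1):
    """Keep features with enough total reads that occur in enough samples.

    table maps a feature ID to its list of per-sample counts.
    """
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts)                           # summed over all samples
        n_samples = sum(1 for c in counts if c > 0)   # samples containing the feature
        if total >= min_frequency and n_samples >= min_samples:
            kept[feature_id] = counts
    return kept


# Hypothetical toy table: three features across three samples.
toy_table = {
    "feature_a": [6, 5, 0],   # total 11, kept
    "feature_b": [3, 0, 0],   # total 3, removed
    "feature_c": [0, 0, 12],  # total 12, kept
}
print(sorted(filter_features(toy_table)))  # ['feature_a', 'feature_c']
```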

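5. Filter potential contaminants (mitochondria and chloroplasts) and features that are not classified beyond the kingdom level (in the SILVA 132 labels, 'D_0__' is the kingdom and 'D_1__' the phylum, so only features with at least a phylum-level label are kept)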

In [None]:
os.system('qiime taxa filter-table \
            --i-table merged_table_filtered.qza \
            --i-taxonomy taxa/classification.qza \
            --p-include D_1__ \
            --p-exclude mitochondria,chloroplast \
            --o-filtered-table merged_table_filtered_contamination.qza')
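
The include and exclude terms can be pictured as a simple text match on each taxonomy string. The sketch below assumes case-insensitive substring matching and uses made-up taxonomy strings in the SILVA 132 label style. It illustrates the logic only and is not the QIIME2 code.

```python
def keep_taxon(taxon, include=("D_1__",), exclude=("mitochondria", "chloroplast")):
    """Return True if the taxonomy string passes the include/exclude terms.

    Assumes case-insensitive substring matching (an assumption for this sketch).
    """
    lowered = taxon.lower()
    has_included = any(term.lower() in lowered for term in include)
    has_excluded = any(term.lower() in lowered for term in exclude)
    return has_included and not has_excluded


# Hypothetical taxonomy strings for illustration.
examples = [
    "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria",
    "D_0__Bacteria",
    "D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast",
    "Unassigned",
]
for taxon in examples:
    print(keep_taxon(taxon), taxon)
```

Only the first string is kept: the second and fourth have no phylum-level label, and the third contains an excluded term.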

6. Summarize the filtered table and find out how many features you have as well as the maximum sample depth (this is the "Maximum Frequency" in the "Frequency per sample" section)

In [None]:
os.system('qiime feature-table summarize \
            --i-table merged_table_filtered_contamination.qza \
            --o-visualization merged_table_filtered_contamination_summary.qzv')
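
For reference, the two numbers to read off the summary correspond to the following calculation, sketched here on a made-up toy table (the sample and feature names are hypothetical): the number of features is the number of rows, and the maximum sample depth is the largest per-sample total.

```python
def sample_depths(table, sample_ids):
    """Return the total read count of each sample.

    table maps a feature ID to its list of counts, ordered as in sample_ids.
    """
    depths = {sample_id: 0 for sample_id in sample_ids}
    for counts in table.values():
        for sample_id, count in zip(sample_ids, counts):
            depths[sample_id] += count
    return depths


# Hypothetical toy table: three features across three samples.
toy_samples = ["s1", "s2", "s3"]
toy_table = {
    "feature_a": [6, 5, 0],
    "feature_c": [0, 0, 12],
    "feature_d": [4, 1, 3],
}
depths = sample_depths(toy_table, toy_samples)
print("Number of features:", len(toy_table))
print("Maximum sample depth:", max(depths.values()))
```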

7. Obtain rarefaction curves for the samples (--p-max-depth should be set to the maximum sample depth found in step 6)

In [None]:
os.system("qiime diversity alpha-rarefaction \
            --i-table merged_table_filtered_contamination.qza \
            --p-max-depth 995391 \
            --p-steps 20 \
            --p-metrics 'observed_otus' \
            --o-visualization merged_rarefaction_curves.qzv")

8. Remove samples that have fewer than 2000 reads

In [None]:
os.system('qiime feature-table filter-samples \
            --i-table merged_table_filtered_contamination.qza \
            --p-min-frequency 2000 \
            --o-filtered-table  merged_table_final.qza')

9. Rarefy the remaining samples to 2000 reads

In [None]:
os.system('qiime feature-table rarefy \
            --i-table merged_table_final.qza \
            --p-sampling-depth 2000 \
            --o-rarefied-table merged_table_final_rarefied.qza')
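
Steps 8 and 9 together amount to dropping shallow samples and then randomly subsampling every remaining sample, without replacement, down to the same number of reads. The sketch below shows that idea with the standard library on made-up counts. It is an illustration under those assumptions, not QIIME2's implementation, and it uses a depth of 5 rather than 2000 to keep the example small.

```python
import random
from collections import Counter


def rarefy(sample_counts, depth, seed=0):
    """Subsample each sample to exactly `depth` reads without replacement.

    sample_counts maps a sample ID to a dict of feature ID -> count.
    Samples with fewer than `depth` reads are dropped.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rarefied = {}
    for sample_id, counts in sample_counts.items():
        if sum(counts.values()) < depth:
            continue  # too shallow: removed, as in step 8
        # One list entry per read, labelled with the feature it belongs to.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        rarefied[sample_id] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


# Hypothetical toy data.
toy_counts = {
    "s1": {"feature_a": 4, "feature_b": 3},  # 7 reads, kept and subsampled
    "s2": {"feature_a": 2},                  # 2 reads, dropped
}
result = rarefy(toy_counts, depth=5)
print(result)
```

Because the subsampling is random, features with very few reads in a sample can disappear from that sample after rarefying.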

10. Filter the sequences to contain only those that are in the rarefied feature table

In [None]:
os.system('qiime feature-table filter-seqs \
            --i-data merged_representative_sequences.qza \
            --i-table merged_table_final_rarefied.qza \
            --o-filtered-data  representative_sequences_final_rarefied.qza')

11. Export feature table and sequences

In [None]:
os.system('qiime tools export \
            --input-path representative_sequences_final_rarefied.qza \
            --output-path exports')
os.system("sed -i -e '1 s/Feature/#Feature/' -e '1 s/Taxon/taxonomy/' taxa/taxonomy.tsv")
os.system('qiime tools export \
            --input-path merged_table_final_rarefied.qza \
            --output-path exports')
os.system('biom add-metadata \
            -i exports/feature-table.biom \
            -o exports/feature-table_w_tax.biom \
            --observation-metadata-fp taxa/taxonomy.tsv \
            --sc-separated taxonomy')
os.system('biom convert \
            -i exports/feature-table_w_tax.biom \
            -o exports/feature-table_w_tax.txt \
            --to-tsv \
            --header-key taxonomy')
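
The sed call above only edits the first line of 'taxa/taxonomy.tsv' so that biom recognises the columns. Since `sed -i` behaves differently between GNU and BSD sed, the same header change can be sketched in Python. This assumes the exported header reads 'Feature ID', 'Taxon', 'Confidence' separated by tabs, and the function names are hypothetical.

```python
def rewrite_taxonomy_header(header_line):
    """Apply the same two substitutions as the sed command to one header line."""
    header_line = header_line.replace("Feature", "#Feature", 1)  # first match only
    return header_line.replace("Taxon", "taxonomy", 1)


def rewrite_taxonomy_file(path):
    """Rewrite the first line of a taxonomy TSV in place."""
    with open(path) as handle:
        lines = handle.readlines()
    if lines and not lines[0].startswith("#"):  # skip if already rewritten
        lines[0] = rewrite_taxonomy_header(lines[0])
    with open(path, "w") as handle:
        handle.writelines(lines)


print(rewrite_taxonomy_header("Feature ID\tTaxon\tConfidence"))
```

Calling `rewrite_taxonomy_file('taxa/taxonomy.tsv')` would then replace the sed line. Unlike the sed command, this version can be run twice without adding a second '#'.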

12. Obtain a phylogenetic tree using SEPP fragment insertion and the SILVA reference database

In [None]:
os.system('qiime fragment-insertion sepp \
            --i-representative-sequences representative_sequences_final_rarefied.qza \
            --i-reference-database ref_alignments/sepp-refs-silva-128.qza \
            --o-tree insertion_tree_rarefied.qza \
            --o-placements insertion_placements_rarefied.qza \
            --p-threads 12')

13. Export the resulting insertion tree

In [None]:
os.system('qiime tools export \
            --input-path insertion_tree_rarefied.qza \
            --output-path exports')

14. The files inside the exports folder should then be copied to the folder in which the subsequent analyses will be carried out, e.g.

In [None]:
os.system('for i in exports/* ; do cp "$i" paper_data_20-04-14/qiime_output/; done')
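
The copy in step 14 can also be done without a shell loop. A minimal sketch using the standard library follows. The function name `copy_exports` is hypothetical, and the target folder is created if it does not exist, which the shell command does not do.

```python
import os
import shutil


def copy_exports(source_dir, target_dir):
    """Copy every regular file in source_dir into target_dir.

    Returns the sorted list of file names that were copied.
    """
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(source_dir)):
        source_path = os.path.join(source_dir, name)
        if os.path.isfile(source_path):  # subfolders are skipped
            shutil.copy(source_path, target_dir)
            copied.append(name)
    return copied
```

For the folders used in this notebook, the call would be `copy_exports('exports', 'paper_data_20-04-14/qiime_output')`.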

Optional further diversity analyses (these give some metrics and QIIME2 visualizations that can be viewed on the QIIME2 website, https://view.qiime2.org, but if you include all of the samples that we have, the website does not cope well with the >2000 samples).
To run these, you will need to provide a metadata file containing all samples (called 'metadata.txt' in the command below). You can take the metadata file that we used from the 'python_analysis_20-04-14' folder and add your samples to it.

In [None]:
os.system('qiime diversity core-metrics-phylogenetic \
            --i-table merged_table_final_rarefied.qza \
            --i-phylogeny insertion_tree_rarefied.qza \
            --p-sampling-depth 2000 \
            --m-metadata-file metadata.txt \
            --p-n-jobs 12 \
            --output-dir diversity')
os.system('qiime tools export \
            --input-path diversity/weighted_unifrac_distance_matrix.qza \
            --output-path diversity')
os.system('mv diversity/distance-matrix.tsv exports/weighted_unifrac_not_agglom.tsv')
os.system('qiime tools export \
            --input-path diversity/unweighted_unifrac_distance_matrix.qza \
            --output-path diversity')
os.system('mv diversity/distance-matrix.tsv exports/unweighted_unifrac_not_agglom.tsv')