Alpha and beta diversity

Gavin Douglas edited this page Apr 13, 2016 · 13 revisions

We use QIIME scripts for creating diversity plots.

The first step is to summarize the final BIOM file that was created after filtering out low-confidence OTUs:

biom summarize-table -i clustering/otu_table_high_conf.biom -o clustering/otu_table_high_conf_summary.txt

More information on biom (i.e. biom-format) commands is available at http://biom-format.org

We are especially interested in the sorted list of read counts under the "Counts/sample detail:" heading, because the next step is to normalize the OTU table so that every sample has the same depth ("X" below). Ideally you would normalize to the depth of the shallowest sample; however, if that sample's depth is very low you would throw out too much information, so it can be better to choose a higher depth and lose the shallowest samples instead. This is how the command is run (you first need to make a directory called "final_otu_tables"):

mkdir final_otu_tables
single_rarefaction.py -i clustering/otu_table_high_conf.biom -o final_otu_tables/otu_table.biom -d X
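One way to weigh candidate values for X is to count how many samples would survive at each depth, since samples with fewer than X reads are left out of the rarefied table. The following is a minimal Python sketch, not part of QIIME or biom: the summary excerpt, sample names and counts are made up, and the parsing assumes the "Counts/sample detail:" layout of one "sample: count" pair per line, which may differ between biom versions.

```python
# Hypothetical excerpt of a biom summary; sample names and counts are made up.
SUMMARY = """\
Num samples: 4
Counts/sample detail:
 S1: 950.0
 S2: 4200.0
 S3: 5100.0
 S4: 7300.0
"""


def read_sample_counts(summary_text):
    """Return {sample_id: count} parsed from the 'Counts/sample detail:' block."""
    counts = {}
    in_detail = False
    for line in summary_text.splitlines():
        if line.startswith("Counts/sample detail:"):
            in_detail = True
            continue
        if in_detail and ":" in line:
            sample_id, value = line.rsplit(":", 1)
            # Tolerate values such as "950", "950.0" or "4,200.000".
            counts[sample_id.strip()] = int(float(value.replace(",", "")))
    return counts


def samples_kept(counts, depth):
    """Sample IDs with at least `depth` reads, i.e. those that survive rarefaction."""
    return sorted(s for s, n in counts.items() if n >= depth)


counts = read_sample_counts(SUMMARY)
# Try each observed sample depth as a candidate value for X.
for depth in sorted(set(counts.values())):
    kept = samples_kept(counts, depth)
    print(depth, len(kept), kept)
```

In this made-up example, rarefying to the shallowest sample keeps all four samples but only 950 reads each, while a depth of 4200 drops one sample and keeps far more reads per remaining sample.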

Other normalization methods such as [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) can be used instead if you want to avoid throwing out data (run "normalize_table.py -h" for more details):

normalize_table.py -i clustering/otu_table_high_conf.biom -a DESeq2 -z -o final_otu_tables/otu_table_deseq2_norm_no_negatives.biom

Normalizing with DESeq2 is not yet mainstream for this kind of data, although that could be changing.


Once the OTU table has been normalized, the diversity plots can be generated. Note that these steps require metadata to be present in the mapping file as separate columns (e.g. a column that classifies samples as healthy vs disease).

Also, if you want to compare subsets of your samples based on metadata, you should first run the biom "subset-table" command to create a new BIOM file (i.e. you do not need to re-run OTU-picking to compare a different subset of your data). See the 16S tutorial for an example of how to do this.
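Subsetting needs a list of the sample IDs to keep. As a rough illustration, the Python sketch below pulls the IDs that match one metadata value out of a tab-separated mapping file. The mapping content, the "Status" column and its values are hypothetical, and the file is abbreviated (a real QIIME mapping file has more columns). The printed IDs, one per line, could be saved to a file and given to "subset-table"; run "biom subset-table -h" for the exact options.

```python
import csv
import io

# Hypothetical, abbreviated mapping file; the "Status" column is made up.
MAPPING = (
    "#SampleID\tBarcodeSequence\tStatus\n"
    "S1\tACGT\thealthy\n"
    "S2\tTGCA\tdisease\n"
    "S3\tGATC\thealthy\n"
)


def sample_ids_where(mapping_text, column, value):
    """Return the sample IDs whose metadata `column` equals `value`."""
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter="\t")
    return [row["#SampleID"] for row in reader if row[column] == value]


ids = sample_ids_where(MAPPING, "Status", "healthy")
print("\n".join(ids))
```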

This command will generate UniFrac beta diversity plots (weighted and unweighted):

beta_diversity_through_plots.py -m map.txt -t clustering/rep_set.tre -i final_otu_tables/otu_table.biom -o plots/bdiv_otu

You will be able to open the generated HTML files (e.g. plots/bdiv_otu/weighted_unifrac_emperor_pcoa_plot/index.html) in your browser. Note that you will not be able to view the plots unless the "emperor_required_resources" directory is in the same directory as index.html.

The alpha diversity rarefaction plot can be generated by:

alpha_rarefaction.py -i final_otu_tables/otu_table.biom -o plots/alpha_rarefaction_plot -t clustering/rep_set.tre -m map.txt --min_rare_depth X1 --max_rare_depth X2 --num_steps X3

Note that the values for min_rare_depth, max_rare_depth and num_steps depend on the number of sequences per sample in your normalized OTU table. You can create a BIOM summary of the normalized table to help choose these values (the max value is typically around the depth that you normalized to with the single_rarefaction.py script).
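To get a feel for how the three values interact, it can help to list the depths they imply before running the script. This Python sketch simply spaces depths evenly between the minimum and maximum. It is an approximation for planning purposes only; the function name is hypothetical and QIIME's own step calculation may round differently.

```python
def rarefaction_depths(min_depth, max_depth, num_steps):
    """Evenly spaced depths from min_depth up to at most max_depth."""
    if num_steps < 1 or not 0 < min_depth < max_depth:
        raise ValueError("need num_steps >= 1 and 0 < min_depth < max_depth")
    # Integer step size, never smaller than one sequence.
    step = max((max_depth - min_depth) // num_steps, 1)
    return list(range(min_depth, max_depth + 1, step))


# Example values only: X1 = 10, X2 = 4200 (the rarefaction depth), X3 = 10.
depths = rarefaction_depths(10, 4200, 10)
print(depths)
```

A larger num_steps gives a smoother rarefaction curve at the cost of a longer run time.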

Finally, several other tools can be useful for downstream analyses, such as STAMP, Phinch and phyloseq. Before using your data with these tools, some minor reformatting is needed.

To convert the BIOM file to STAMP format run:

biom_to_stamp.py -m taxonomy final_otu_tables/otu_table.biom >final_otu_tables/otu_table.spf
sed -i 's/f__Erysipelotrichaceae\tg__Clostridium/f__Erysipelotrichaceae\tg__"Clostridium"/g' final_otu_tables/otu_table.spf

The second command (sed) is necessary because a few OTUs place the genus Clostridium within a different family (Erysipelotrichaceae), so we change the genus name to "Clostridium" (with quotes) in that family to distinguish the two.
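The effect of the substitution is easiest to see on a couple of rows. The Python sketch below applies the same literal replacement as the sed command to two simplified, made-up rows (a real STAMP profile has more taxonomy columns and one count column per sample): only the Clostridium entry under Erysipelotrichaceae gains quotes, and the other row is left untouched.

```python
# Hypothetical, simplified STAMP rows; taxonomy, OTU IDs and counts are made up.
ROWS = [
    "f__Erysipelotrichaceae\tg__Clostridium\tOTU_1\t12",
    "f__Clostridiaceae\tg__Clostridium\tOTU_2\t30",
]

# Same literal search and replacement text as the sed command (tabs included).
SEARCH = "f__Erysipelotrichaceae\tg__Clostridium"
REPLACEMENT = 'f__Erysipelotrichaceae\tg__"Clostridium"'

fixed = [row.replace(SEARCH, REPLACEMENT) for row in ROWS]
for row in fixed:
    print(row)
```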

Adding sample metadata to the BIOM file allows it to be used with Phinch, phyloseq and other tools:

biom add-metadata -i final_otu_tables/otu_table.biom -o final_otu_tables/otu_table_with_metadata.biom -m map.txt

To run additional QIIME analyses on your data, such as testing for significant differences in beta diversities, check out this page.
