# Measures of Alpha Diversity 

First, we'll ssh into our machine. Then we'll download the data from Dropbox:

In [None]:
wget https://www.dropbox.com/s/bomdx08n71lb9z1/may-data-for-class.tgz

Next, we need to extract the archive.

In [None]:
tar -xvzf may-data-for-class.tgz

Our first processing step will be to count the number of sequences in each sample. This is important because we need to determine what overall sampling depth to use for our analysis. 

In [None]:
count_seqs.py -i all_seqs_all_samples.fna_rep_set.fasta
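
As a quick cross-check on the total reported by count_seqs.py, the header lines of a FASTA file can be counted with standard shell tools. The sketch below uses a made-up three-sequence file (the sequences and the temporary file are illustrative only); on the real data you would point grep at all_seqs_all_samples.fna_rep_set.fasta instead.

```shell
# Toy FASTA file (made-up sequences, for illustration only)
tmp_fasta=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$tmp_fasta"

# Every FASTA record starts with a '>' header line, so counting
# those lines gives the number of sequences in the file.
n_seqs=$(grep -c '^>' "$tmp_fasta")
echo "$n_seqs sequences"

rm -f "$tmp_fasta"
```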

Next, we need to move into this directory:

In [None]:
cd uclust_taxa_0.9_10_0.90/

Summarize these counts using a summary table. Omitting the output option (-o) prints the summary table to your terminal window; including -o stores it in a .txt file.

In [None]:
biom summarize_table -i OTU_table_singletonfiltered.biom -o summary_OTU_table_singletonfiltered.txt 
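
The smallest and largest per-sample counts in the summary are the numbers to look at when deciding on sampling depths. The sketch below pulls them out with awk. It runs on a mock summary file: the layout (lines such as " Min: 1200.0") is an assumption about what biom summarize_table writes, and every number in it is made up, so check your own .txt file before relying on it.

```shell
# Mock summary file; layout and numbers are illustrative assumptions,
# not output from the class data.
summary=$(mktemp)
cat > "$summary" <<'EOF'
Num samples: 3
Num observations: 10
Total count: 9600

Counts/sample summary:
 Min: 1200.0
 Max: 5400.0
 Median: 3000.000
 Mean: 3200.000

Counts/sample detail:
 sampleA: 1200.0
 sampleB: 3000.0
 sampleC: 5400.0
EOF

# Pull out the smallest and largest per-sample counts as integers.
min_depth=$(awk '$1 == "Min:" {print int($2)}' "$summary")
max_depth=$(awk '$1 == "Max:" {print int($2)}' "$summary")
echo "min=$min_depth max=$max_depth"

rm -f "$summary"
```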

Instead of choosing just one sampling depth, we decided to create multiple rarefaction curves at different sampling depths. We will run this command in “parallel”, which means that the data will be processed using multiple cores (30 out of our 32).

Parameters of the command that we used:

- -m: minimum sampling depth (look at the output from the summary table to decide; the command below uses 8203)
- -x: maximum sampling depth
- -i: input OTU table
- -s: step size between sampling depths
- -n: number of iterations at each sampling depth
- -o: output (a directory, not a file, because this produces many files)
- -O: number of “jobs,” or cores


In [None]:
parallel_multiple_rarefactions.py -i OTU_table_singletonfiltered.biom -m 8203 -x 38070 -s 5000 -n 10 -o multiple_rare_8203-38070/ -O 30
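
To get a feel for how much output this produces, the arithmetic behind -m, -x, -s, and -n can be sketched with seq. This assumes that depths run from the minimum up to the maximum in increments of the step size, with -n tables at each depth; the variable names are ours, not part of QIIME.

```shell
min_depth=8203   # -m
max_depth=38070  # -x
step=5000        # -s
iterations=10    # -n

# Depths from min up to (at most) max, in increments of step.
depths=$(seq "$min_depth" "$step" "$max_depth")
echo "$depths"

# Number of depths times iterations per depth gives the table count.
n_depths=$(echo "$depths" | wc -l | tr -d ' ')
n_tables=$((n_depths * iterations))
echo "$n_depths depths x $iterations iterations = $n_tables rarefied tables"
```

Under that assumption the deepest sampling depth is 33203, because the next step (38203) would exceed the maximum of 38070.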

Now that we have our rarefied tables, we will calculate some alpha diversity metrics. We chose to calculate Shannon diversity, Chao1, and observed OTUs ('observed_otus'), but others could have been chosen as well. Full documentation on the measures of alpha diversity that can be calculated in QIIME can be found here: http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html

In [None]:
parallel_alpha_diversity.py -i multiple_rare_8203-38070/ -o alpha_diversity/ -m 'shannon','chao1','observed_otus' -O 30

Since we rarefied at multiple depths, and then calculated alpha diversity measures for every rarefied table, we now have many separate result files. Our next step is to collate those results, which produces one file per metric.

In [None]:
collate_alpha.py -i alpha_diversity/ -o collated_alpha/
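
Each collated file can then be summarized with ordinary text tools. The sketch below averages one sample's values over the iterations at each depth. It assumes a tab-separated layout with the sampling depth in the second column, the iteration in the third, and one column per sample after that; the mock file, sample names, and values are made up, so inspect the real files in collated_alpha/ before adapting it.

```shell
# Mock collated file (assumed layout, made-up values).
collated=$(mktemp)
printf '\tsequences per sample\titeration\tsampleA\tsampleB\n' > "$collated"
printf 'alpha_rarefaction_100_0.txt\t100\t0\t2.0\t3.0\n' >> "$collated"
printf 'alpha_rarefaction_100_1.txt\t100\t1\t4.0\t5.0\n' >> "$collated"
printf 'alpha_rarefaction_200_0.txt\t200\t0\t6.0\t7.0\n' >> "$collated"

# Average the sampleA column (field 4) over iterations at each depth,
# skipping the header row, then sort the depths numerically.
means=$(awk -F '\t' 'NR > 1 { sum[$2] += $4; n[$2]++ }
  END { for (d in sum) printf "%d\t%.1f\n", d, sum[d] / n[d] }' "$collated" | sort -n)
echo "$means"

rm -f "$collated"
```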

Now we can make plots to visualize our analyses! 

In [None]:
make_rarefaction_plots.py -i collated_alpha/ -m map6.txt -o rarefaction_plots/

To download our plots in a single transfer, we first pack the output directory into a compressed tgz archive:

In [None]:
tar -cvzf rarefaction_plots.tgz rarefaction_plots/
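
Before copying the archive anywhere, it can be worth confirming what went into it. A minimal sketch, using a throwaway directory with hypothetical file names rather than the real plots:

```shell
# Build a throwaway directory and archive (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/demo_plots"
echo "<html></html>" > "$workdir/demo_plots/rarefaction_plots.html"
tar -czf "$workdir/demo_plots.tgz" -C "$workdir" demo_plots

# -t lists the archive contents without extracting anything.
listing=$(tar -tzf "$workdir/demo_plots.tgz")
echo "$listing"

rm -rf "$workdir"
```

On the real data, the equivalent check would be tar -tzf rarefaction_plots.tgz.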

Almost done! The final step is to copy this file onto our desktop. Run this command from a terminal on your local machine, not from the ssh session.

In [None]:
scp -r -i ~/path/to/key.pem ubuntu@yourpublicDNS:~/may-data-for-class/uclust_taxa_0.9_10_0.90/rarefaction_plots.tgz ~/Desktop
