# Microbial forensics

In this tutorial, we'll use QIIME to analyze a subset of a real-world data set in which human-associated microbial communities were shown to have forensic potential: investigators may be able to determine who touched an object based on the "microbial fingerprint" that person leaves behind. This forensic study was originally published in [Fierer et al. (2010)](http://www.pnas.org/content/early/2010/03/01/1000162107.full.pdf).

## Getting started

Set up the environment and obtain the tutorial data.

In [None]:
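from os import chdir
from os.path import exists, join
from functools import partial
from IPython.display import FileLinks, FileLink

# NOTE: `working_dir` is not defined in this cell. Set it to the directory
# where the tutorial data should be stored before running this cell.
chdir(working_dir)

# Download and unpack the tutorial data only if it is not already present.
if not exists('microbial-forensics'):
    !wget https://github.com/biocore/qiime-workshops/raw/master/mahidol-university-thailand-2015/data/microbial-forensics.tar.gz
    !tar -xzf microbial-forensics.tar.gz

tutorial_dir = 'microbial-forensics/'
chdir(tutorial_dir)
# NOTE: `name` is also not defined in this cell; it is used to build the URL
# prefix under which the notebook server exposes the tutorial files.
FileLink = partial(FileLink, url_prefix=join('exercises', name, tutorial_dir))
FileLinks = partial(FileLinks, url_prefix=join('exercises', name, tutorial_dir))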

## Analysis steps

To see what data files are present, and view or download them, we can use ``FileLinks`` (or ``FileLink``, if providing a single file path). Execute this cell and open the `forensic-map.txt` file to view the sample metadata associated with this study.

In [None]:
FileLinks('.')

The first step in a QIIME analysis is to prepare the mapping file and validate it. In this case, the mapping file has been prepared for us, so we just need to validate it:

In [None]:
!validate_mapping_file.py -o vmf-out -m forensic-map.txt

There should be no errors or warnings reported.
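
To give a feel for what validation involves, here is a minimal sketch of a few such checks in Python. It assumes the QIIME 1 mapping file convention of a tab-separated file whose header starts with `#SampleID` and ends with `Description`. The function name `check_mapping` and the toy input are hypothetical, and the sketch covers only a small part of what `validate_mapping_file.py` checks.

```python
import csv

def check_mapping(lines):
    """Return a list of problems found in a QIIME-style mapping file.

    `lines` is any iterable of text lines (hypothetical helper, not part of QIIME).
    """
    problems = []
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    if not rows:
        return ["file is empty"]
    header = rows[0]
    # Comment lines after the header start with '#'; skip them.
    data = [row for row in rows[1:] if not row[0].startswith("#")]
    if header[0] != "#SampleID":
        problems.append("first column must be #SampleID")
    if header[-1] != "Description":
        problems.append("last column must be Description")
    if len(set(header)) != len(header):
        problems.append("duplicate column names")
    sample_ids = [row[0] for row in data]
    if len(set(sample_ids)) != len(sample_ids):
        problems.append("duplicate sample IDs")
    if any(len(row) != len(header) for row in data):
        problems.append("row length does not match header")
    return problems

# Toy input with made-up values (not taken from forensic-map.txt).
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n"
    "S1\tAAAA\tCATG\tfirst sample\n"
    "S1\tCCCC\tCATG\trepeated sample id\n"
)
problems = check_mapping(example.splitlines())
print(problems)
```

An empty list means none of these particular checks failed; the real script remains the authority on whether a mapping file is valid.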

Next we'll run open-reference OTU picking at 88% identity using [pick_open_reference_otus.py](http://qiime.org/scripts/pick_open_reference_otus.html). We use a low percent identity here to keep the tutorial runtime short. This step will take a few minutes to run.

In [None]:
!pick_open_reference_otus.py -i forensic-seqs.fna -o otus -r 88_otus.fasta -s 0.88

We can now review the output generated by this command. Many of these files are intermediate data files and are not useful to view directly, but do open the log file (listed at the top as ``log...txt``) to get an idea of what information it stores (e.g., how can you compute the run time from the information in the log file?).

In [None]:
FileLink("otus/index.html")
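
Computing a run time from a log comes down to subtracting two timestamps. The sketch below shows one way to do that in Python. It assumes the log brackets the run with lines of the form `Logging started at HH:MM:SS on DD Mon YYYY` and `Logging stopped at HH:MM:SS on DD Mon YYYY`; check your own log file and adjust the pattern if it differs. The function name `run_time_seconds` and the example timestamps are made up for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp format; adjust if your log file looks different.
STAMP = re.compile(
    r"Logging (started|stopped) at (\d{2}:\d{2}:\d{2}) on (\d{1,2} \w{3} \d{4})"
)

def run_time_seconds(log_text):
    """Return the number of seconds between the start and stop stamps."""
    times = {}
    for kind, clock, date in STAMP.findall(log_text):
        times[kind] = datetime.strptime(
            "{} {}".format(date, clock), "%d %b %Y %H:%M:%S"
        )
    return int((times["stopped"] - times["started"]).total_seconds())

# Made-up log excerpt, not real output from this tutorial.
example_log = (
    "Logging started at 14:02:11 on 05 Jan 2015\n"
    "(commands and parameters would be listed here)\n"
    "Logging stopped at 14:09:47 on 05 Jan 2015\n"
)
elapsed = run_time_seconds(example_log)
print(elapsed)  # 456 seconds, i.e. 7 minutes 36 seconds
```

Parsing the date as well as the clock time keeps the result correct for runs that cross midnight.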

Next we'll view summary statistics of the OTU table that was created by the previous command:

In [None]:
!biom summarize-table -i otus/otu_table_mc2_w_tax_no_pynast_failures.biom

The summary shows how many sequences were obtained per sample. One thing we need to do here is choose an even sampling depth for our diversity analyses. Any samples with fewer than that number of sequences will be discarded, and any samples with more than that number of sequences will be randomly subsampled (without replacement, i.e., *rarefied*) to contain that number of sequences. Choose an even sampling depth (hint: it should probably be over 100).
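
The rule described above can be sketched in a few lines of Python. This is a toy illustration, not QIIME's implementation: the function name `rarefy`, the sample and OTU labels, and the counts are all made up.

```python
import random
from collections import Counter

def rarefy(otu_counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` sequences without replacement.

    Returns None when the sample has fewer than `depth` sequences,
    mirroring how such samples are discarded.
    """
    total = sum(otu_counts.values())
    if total < depth:
        return None
    # Build one list entry per sequence, labelled by its OTU.
    pool = [otu for otu, n in sorted(otu_counts.items()) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    return dict(Counter(picked))

# Made-up counts for three samples (totals: 200, 80, 150).
table = {
    "sample_a": {"otu1": 120, "otu2": 60, "otu3": 20},
    "sample_b": {"otu1": 30, "otu2": 50},
    "sample_c": {"otu2": 90, "otu3": 60},
}

depth = 150
rarefied = {s: rarefy(counts, depth) for s, counts in table.items()}
kept = {s: counts for s, counts in rarefied.items() if counts is not None}
print(sorted(kept))  # sample_b has only 80 sequences, so it is dropped

# How many samples survive at a few candidate depths?
totals = {s: sum(counts.values()) for s, counts in table.items()}
retained = {d: sum(1 for t in totals.values() if t >= d) for d in (50, 150, 200)}
print(retained)
```

The last few lines show the trade-off behind the choice of depth: a higher depth keeps more sequences per sample but discards more samples.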

We'll next run [`core_diversity_analyses.py`](http://qiime.org/scripts/core_diversity_analyses.html), which runs several different diversity analysis commands, including alpha and beta diversity. Use the even sampling depth you chose in the previous step to replace `EVEN-SAMPLING-DEPTH` in the command below.

In [None]:
!core_diversity_analyses.py -i otus/otu_table_mc2_w_tax_no_pynast_failures.biom -o cdout -t otus/rep_set.tre -m forensic-map.txt -e EVEN-SAMPLING-DEPTH 

We can now view the output of our diversity analyses. This output will be used to answer some of the questions below.

In [None]:
FileLink('cdout/index.html')

## Exercises

1. What was the minimum number of sequences per sample? What was the maximum number of sequences per sample? What even sampling depth did you choose, and why?
2. How long did ``pick_open_reference_otus.py`` take to run (to the second)? Review the log file and compute the run time from information in that file. How long did ``core_diversity_analyses.py`` take to run (again, to the second)?
3. Which subjects had the most observed OTUs on average? See *Alpha diversity results* in the ``core_diversity_analyses.py`` output.
4. The focus of the *Fierer 2010* paper was to show that it is possible to match an individual to the objects they touch based on the microbial communities that the individual leaves behind. The Unweighted UniFrac Emperor plots (linked from the ``core_diversity_analyses.py`` ``index.html`` file) will allow you to figure out which subject (`M2`, `M3`, or `M9`) touched which keyboard (`K1`, `K2`, or `K3`). Match the individuals to the keyboard they touched, and explain how you came to this answer. There is one right answer to this question.
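
Reading clusters off a PCoA plot amounts to asking which subject's samples lie closest to each object's samples. The sketch below expresses that idea with mean pairwise distances. It is an illustration only: the function name `match_objects`, the labels, and the distance values are all made up and are not taken from the tutorial data, so it does not reveal the answer to the exercise.

```python
def mean(values):
    values = list(values)
    return sum(values) / float(len(values))

def match_objects(pairwise, object_samples, subject_samples):
    """Map each object to the subject with the smallest mean distance to it."""
    matches = {}
    for obj, obj_ids in object_samples.items():
        avg = {}
        for subj, subj_ids in subject_samples.items():
            avg[subj] = mean(
                pairwise[frozenset((o, s))] for o in obj_ids for s in subj_ids
            )
        matches[obj] = min(avg, key=avg.get)
    return matches

# Made-up sample groupings and distances between pairs of samples.
object_samples = {"keyboard_x": ["kx1", "kx2"], "keyboard_y": ["ky1"]}
subject_samples = {"subject_a": ["a1", "a2"], "subject_b": ["b1"]}
pairwise = {
    frozenset(("kx1", "a1")): 0.40, frozenset(("kx1", "a2")): 0.45,
    frozenset(("kx2", "a1")): 0.42, frozenset(("kx2", "a2")): 0.38,
    frozenset(("kx1", "b1")): 0.75, frozenset(("kx2", "b1")): 0.80,
    frozenset(("ky1", "a1")): 0.78, frozenset(("ky1", "a2")): 0.74,
    frozenset(("ky1", "b1")): 0.35,
}

matches = match_objects(pairwise, object_samples, subject_samples)
print(sorted(matches.items()))
```

With real data, the distances would come from the unweighted UniFrac distance matrix rather than being typed in by hand, and a visual check of the Emperor plot remains a useful sanity check on any such summary.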