In [13]:
#first run the other notebook to make the figs, then this to generate the reports

In [14]:
import IPython
#this snippet inserts a link to toggle the code visibility
from IPython.display import HTML, display
display(HTML("""
<script>
    var code_show=true; //true -> hide code at first

    function code_toggle() {
        $('div.prompt').hide(); // always hide prompt

        if (code_show){
            $('div.input').hide();
        } else {
            $('div.input').show();
        }
        code_show = !code_show;
    }
    $( document ).ready(code_toggle);
</script>
<a href="javascript:code_toggle()">[Toggle Code]</a>
"""))


<img style="float: right" height="50%" width="50%" src="img/logo.svg">

### Your Beyond Bacteria Sample Deep Shotgun Metagenomics Report

Thank you for your participation in the American Gut Project through the Beyond
Bacteria Perk. We appreciate your patience while we transitioned the sample
processing for this perk from our collaborators to our in-house team at UC San
Diego.

 __DISCLAIMER: The following report is intended FOR RESEARCH USE ONLY and is not a diagnostic test of
any kind. It should *NOT* be used to inform any clinical, medical, or otherwise health- or 
lifestyle-related decision-making, behavior, or activity. As scientists we do our best to
vet our data, ensure data integrity and provide the latest and best tools and analyses
available, but we do not provide any medical or clinical information or advice and no
information on any specific organism found in the sample you provided is intended to be used
for this purpose.__

### Background Methods and Processing Information

In the Knight Lab, part of the Center for Microbiome Innovation at UC San Diego,
we extracted DNA from your sample using the Earth Microbiome Project (EMP) standardized
protocol that was recently published in the November 2017 edition of [Nature](https://www.nature.com/articles/nature24621)
and is available on the [EMP website](http://www.earthmicrobiome.org/protocols-and-standards/dna-extraction-protocol/) for easy reference. The extracted metagenomic DNA was
then prepared for sequencing using our state-of-the-art shotgun library preparation
and sequenced on a HiSeq4000.

<img style="float: right" height="100%" width="100%" src="img/Shotgun_overview.svg">

Following sequencing, your sample was processed using the [Oecophylla](https://github.com/biocore/oecophylla) sequencing analysis pipeline, which is under development in the Knight Lab, on our supercomputing cluster 'Barnacle'. The cluster is housed in the SDSC data center and managed by Knight Lab systems administrators. It includes 1024 Intel Ivy Bridge compute cores and 384 AMD compute cores, with 12 TB of total RAM and a 10GbE compute network. Storage includes 250 TB of primary storage with an equal amount of dedicated backup for the different file systems.

Unlike the amplicon sequencing data we use for standard AGP kit processing, which only detects bacteria whose 16S rRNA gene matches the patterns we commonly look for, deeper shotgun metagenomic sequencing detects all genomic DNA in the sample regardless of the type of organism present. This means that not only can we pick up microbes other than bacteria, but we also have a lot more data to sort through, and we can go beyond the operational taxonomic units (OTUs) reported for our standard kit assessment to determine species and sometimes even strain-level identity for the microbes in your sample.

If you would like to access your raw (.fastq) or processed (.biom) data, please contact us at info@americangut.org and we can provide you with instructions to access these large files. This data has been filtered to remove sequences that did not pass our quality control parameters, including sequences that matched the human genome or our sequencing controls, so these will not be present in the data.

In the Oecophylla pipeline, the sequencing reads are matched to phyla, genera, and species using the [Kraken
database](https://ccb.jhu.edu/software/kraken/), and functional pathways are identified using [HUMAnN2](http://huttenhower.sph.harvard.edu/humann2).

### Results

The vast majority of the organisms detected in your sample were bacterial, and although many species were identified in your sample, a few types of organisms tended to dominate most of the stool sample you provided.
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_phylum.svg">
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_genera.svg">
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_species.svg">

Since the whole metagenomic sequencing pipeline is able to capture organisms beyond bacteria, we have highlighted the most commonly detected viruses and fungi as well, though overall these were in much lower abundance.

Compared to the bacteriome, relatively little is known about the rapidly evolving viruses that dwell on the boundary between life and abiotic existence. This is reflected in the large number of unassigned, putative viruses detected from their DNA:

<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_virus_families.svg">
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_virus_genera.svg">
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_virus_species.svg">

The fungal microbiome has been studied in detail primarily through model organisms and the food we consume, such as beer, wine, and cheese, which often contain fungi, or mushrooms and truffles, which are entirely fungal. We found a small amount of fungi in your stool sample, and the top organisms are highlighted here:

<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_fungi_phyla.svg">
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_fungi_genera.svg">
<img style="float: right" height="100%" width="100%" src="./BB_static_plots/10317.000017160_fungi_species.svg">

You can explore all the organisms detected in your sample in the QIIME 2 View below:

In [15]:
#displays the individual's combined_profile biom summary to explore the organisms detected
biom_summary_url = 'https://view.qiime2.org/visualization/?src=https%3A%2F%2Fdl.dropbox.com%2Fs%2Fzwkm1w22wqc67nw%2Fbiom_summary.qzv%3Fdl%3D1'
biom_summary_iframe = '<iframe src="' + biom_summary_url + '" width="100%" height="800"></iframe>'
IPython.display.HTML(biom_summary_iframe)

The raw table of these values, viewable in Excel, is [here](raw/tsv/10317.000017160_raw_counts.tsv). Advanced users can access the [QIIME 2 Artifact (.qza)](raw/qza/10317.000017160_kraken_comb_prof_table.qza) or [BIOM (.biom)](raw/biom/10317.000017160_kraken_comb_prof.biom) files.

### Functional pathways

Determining which pathways and processes are actually active in a sample is best done using alternative methods, but looking at the gene pathways detected in the sample can give us insight into the functional potential of the microbial community. Life is complex, from the smallest organism to the largest, and the microbes that live in us have adapted to the unique environment of the human body. To do so, they rely on a huge variety of functional pathways to keep growing, multiplying, helping, and sometimes harming us.

The top 10 pathways detected in the provided stool sample are below:
<img style="float: right" height="100%" width="100%" src="img/10317.000017160_function.svg">

The raw table of these values, viewable in Excel, is [here](raw/tsv/10317.000017160_pathway_counts.tsv). Advanced users can access the [QIIME 2 Artifact (.qza)](raw/qza/10317.000017160_humann2_cpm_table.qza) or [BIOM (.biom)](raw/biom/10317.000017160_humann2_cpm.biom) files.

### Your Sample in Context

In our latest round of participation for this perk we had 17 total participants
who all provided stool samples. Your sample was combined for analysis with these
participants and ~500 other AGP participants whose samples have been processed
for deeper shotgun metagenomic sequencing through sponsorship from outside partners.

To visualize your sample in the context of these individuals, we have compressed all of the information about the composition of your sample into a single point and measured its similarity to the other individuals to display in a Principal Coordinates Analysis (PCoA) plot, just like the figures that you have likely seen Rob Knight present in his [TED talk](https://youtu.be/i-icXZ2tMRM?t=8m34s).

In the plots below, your sample is represented by the enlarged diamond-shaped point. You can compare where your sample sits relative to a variety of other information by clicking on the 'Color' tab in the menu on the right side of the plot and changing the category from 'country' to any of the other survey information our participants report.

The first plot shows a comparison of Jaccard distance, which only considers the presence or absence of organisms in a sample from an individual. You can think of this as measuring 'who's there' in the sample. For example, if Forest A and Forest B each have lions, tigers, and bears, they are more similar to each other than a third forest (Forest C) that has lions and tigers, but no bears. So on this plot, two dots are closer together if they have similar organisms inside their samples, regardless of how many there are.
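
For readers who like to see the arithmetic, the forest example can be written out as a minimal sketch. This is only an illustration of the Jaccard idea, not the code used to produce the plot; the function name `jaccard_distance` and the forest contents are made up for this example.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of organism names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty samples are treated as identical (an assumption for this sketch).
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical forests from the example above: only presence or absence matters.
forest_a = {"lions", "tigers", "bears"}
forest_b = {"lions", "tigers", "bears"}
forest_c = {"lions", "tigers"}

d_ab = jaccard_distance(forest_a, forest_b)  # same animals in both forests
d_ac = jaccard_distance(forest_a, forest_c)  # Forest C is missing bears

print("A vs B:", d_ab)
print("A vs C:", round(d_ac, 3))
```

A distance of 0 means two samples contain exactly the same organisms, and larger values mean fewer organisms are shared.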

In [16]:
### Here's a plot of your sample relative to others based only on 'who's there'

In [17]:
jaccard_url = 'https://view.qiime2.org/visualization/?src=https%3A%2F%2Fdl.dropbox.com%2Fs%2Ffpx069q25tykihr%2Fjaccard_emperor_preset.qzv%3Fdl%3D1'
jaccard_iframe = '<iframe src="' + jaccard_url + '" width="100%" height="800"></iframe>'
IPython.display.HTML(jaccard_iframe)

The second plot shows a comparison of Bray-Curtis distance, which takes into account not just who's there, but how many of each organism are in your sample. Using the example above, if Forest A has 50 lions, 48 tigers, and 2 bears while Forest B has 50 lions, 25 tigers, and 25 bears, then Forest A is more similar to Forest C (50 lions, 50 tigers, and 0 bears) than Forest B is to Forest C, even though Forests A and B contain the same kinds of animals.
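
The same forest numbers can be worked through in a short sketch. It uses the standard Bray-Curtis formula (the sum of absolute count differences divided by the sum of all counts) and is meant only as an illustration, not as the code behind the plot; the function name `bray_curtis` and the dictionaries are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two {organism: count} dictionaries."""
    organisms = set(u) | set(v)
    difference = sum(abs(u.get(o, 0) - v.get(o, 0)) for o in organisms)
    total = sum(u.get(o, 0) + v.get(o, 0) for o in organisms)
    # Two empty samples are treated as identical (an assumption for this sketch).
    return difference / total if total else 0.0


# Counts taken from the forest example in the text.
forest_a = {"lions": 50, "tigers": 48, "bears": 2}
forest_b = {"lions": 50, "tigers": 25, "bears": 25}
forest_c = {"lions": 50, "tigers": 50, "bears": 0}

d_ac = bray_curtis(forest_a, forest_c)  # (0 + 2 + 2) / 200
d_bc = bray_curtis(forest_b, forest_c)  # (0 + 25 + 25) / 200

print("A vs C:", d_ac)
print("B vs C:", d_bc)
```

Because Forest A has only a handful of bears, its counts are much closer to Forest C's than Forest B's are, so the distance from A to C comes out smaller.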

It can be a little confusing to think about these relationships so it's best to just focus on the fact that we have different ways of measuring how similar two samples are to each other, and the different measurements will tell us different things about the community and its structure. 

In [18]:
### Here's a plot of your sample relative to others based on 'who's there' and how many there are

In [19]:
braycurtis_url = 'https://view.qiime2.org/visualization/?src=https%3A%2F%2Fdl.dropbox.com%2Fs%2Fvww6m1dk8ih09iy%2Fbray_curtis_emperor_preset.qzv%3Fdl%3D1'
braycurtis_iframe = '<iframe src="' + braycurtis_url + '" width="100%" height="800"></iframe>'
IPython.display.HTML(braycurtis_iframe)

### Conclusions

No matter what you find in your gut, remember that these methods and techniques are on the cutting edge of research and will require years of validation and confirmation before we'll know how to interpret them to guide diets, lifestyles, and healthcare decisions. While there are some emerging clinical trials that make use of microbiome information, we would like to stress again that no clinical, medical, or otherwise health- or lifestyle-related decision-making, behavior, or activity is advised based on this data. As with any health matter that arises, if you find something of concern, please remember that this report is research-grade only and contact your local primary care provider to discuss your specific health conditions and options.

Thank you again for your participation in the American Gut Project. We look forward to your feedback and continued support for this citizen-science-led initiative.