# "Fecal microbiota transplant" tutorial

**_note:_ This guide assumes you have QIIME 2 installed (e.g. using [this procedure](https://docs.qiime2.org/2020.2/install/native/)). To execute the script properly, open this notebook in a Jupyter Notebook from within a conda QIIME 2 environment.**

**_note:_ This tutorial is an adaptation of the same [tutorial](https://docs.qiime2.org/2020.2/tutorials/fmt/) that may be found on the official [QIIME 2 docs website](https://docs.qiime2.org/2020.2/). The original tutorial uses the QIIME 2 CLI interface.**

Instead of the CLI, this tutorial uses the Artifact API, a Python 3 application programming interface (API) for QIIME 2. The Artifact API supports interactive computing with QIIME 2 using the Python 3 programming language. The API is automatically generated, and its availability depends on which QIIME 2 plugins are currently installed. It has been optimized for use in the Jupyter Notebook. The Artifact API is part of the QIIME 2 framework; no additional software needs to be installed to use it.

The notebook was tested using the 2020.2 version of QIIME 2.

This document is intended to be run after [the moving pictures tutorial](https://docs.qiime2.org/2020.2/tutorials/moving-pictures/). It is designed to introduce a few new ideas, and to be an exercise in applying the tools that were explored in that document.

The data used in this tutorial is derived from a [Fecal Microbiome Transplant study](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-016-0225-7) where children under the age of 18 with autism and gastrointestinal disorders, as measured by the Autism Diagnostic Interview-Revised (ADI-R) and Gastrointestinal Symptom Rating Scale (GSRS), respectively, were treated with fecal microbiota transplant in an attempt to reduce the severity of their behavioral and gastrointestinal symptoms. We tracked changes in their microbiome, several metrics of the severity of autism including the Parent Global Impressions-III (PGI-III) and the Childhood Autism Rating Scale (CARS), and the severity of their gastrointestinal symptoms through their GSRS score over an eighteen-week period. The microbiome was tracked through collection of weekly fecal swab samples (collected by swabbing used toilet paper) and less frequent stool samples (collected as whole stool). In the full study, which was a phase 1 clinical trial designed to test the safety of the treatment, eighteen individuals received the treatment, and twenty individuals were followed as controls. The controls did not receive the treatment, but were monitored to track normal temporal variation in the gut microbiome. The fecal material that was transplanted during treatment was also sequenced in this study.

This tutorial dataset is a subsample of the data generated for this study. It includes data from five individuals who received treatment and five controls. Between six and sixteen samples are included per individual, including stool and fecal swab samples for each individual, and samples before and after FMT treatment. Five samples of the transplanted fecal material are also included.

These data were sequenced on two Illumina MiSeq sequencing runs. As in the Moving Pictures tutorial, we’ll use [DADA2](https://www.ncbi.nlm.nih.gov/pubmed/27214047) to perform initial quality control and generate our `FeatureTable[Frequency]` and `FeatureData[Sequence]` objects. However, the DADA2 denoising process is only applicable to a single sequencing run at a time, so we need to run this on a per sequencing run basis and then merge the results. We’ll work through this initial step, and then pose several questions that can be answered as an exercise.


# Importing necessary modules



In [2]:
import qiime2

In [3]:
from qiime2.plugins import demux, dada2, metadata, feature_table

# Creating a new directory

Create a directory to work in called `qiime2-fmt-tutorial` and change to that directory:


In [6]:
workdir='/path/to/your/directory/qiime2-fmt-tutorial/'

In [13]:
!mkdir -p $workdir
%cd $workdir
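
Shell commands prefixed with `!` each run in their own subshell, so a directory change made that way does not carry over to later cells. As an alternative, the directory can be created and entered from Python itself. The following is a minimal sketch; the temporary base directory is only a stand-in for illustration and should be replaced with your own path.

```python
import os
import tempfile
from pathlib import Path

# Stand-in location for illustration only; replace with your own path.
base = Path(tempfile.mkdtemp())
workdir = base / "qiime2-fmt-tutorial"

# Equivalent of `mkdir -p`: create parents, do not fail if it already exists.
workdir.mkdir(parents=True, exist_ok=True)

# Change the working directory of the running Python process (the kernel),
# so the change persists for all later cells.
os.chdir(workdir)
print(Path.cwd())
```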

# Obtaining and importing data files

As in the Moving Pictures study, you should begin your analysis by familiarizing yourself with the [sample metadata](https://data.qiime2.org/2020.2/tutorials/fmt/sample_metadata.tsv). You can again access the sample metadata as a Google Spreadsheet. Notice that there are three tabs in this spreadsheet. The first tab (called sample-metadata) contains all of the clinical metadata.



In [14]:
!wget -O $workdir/"sample-metadata.tsv" \
  "https://data.qiime2.org/2020.2/tutorials/fmt/sample_metadata.tsv"

Next, download the demultiplexed sequences that we’ll use in this analysis. To learn how to start a QIIME 2 analysis from fastq-formatted sequence data, see the [importing data tutorial](https://docs.qiime2.org/2020.2/tutorials/importing/). 


We’ll need to download two sets of demultiplexed sequences, each corresponding to one of the sequencing runs.

You can choose either a 1% subsample of the reads or a 10% subsample of the reads. If you’re just trying to gain experience with preparing and combining multiple sequencing runs of data, you can work with the 1% subsample data so that the commands will run very quickly.

If you’re using this tutorial to gain additional experience in generating and interpreting QIIME 2 analysis results, you should work with the 10% subsample data so that the results will be supported by more sequence data (1% of the reads is likely not sufficient to support some of the findings of the original study).
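
The choice between the two subsamples only changes the suffix of the file name on the server. A small sketch of how the download could be parameterized from Python instead of `wget` is shown below; `demux_url` and `download_demux` are hypothetical helper names introduced here for illustration, and the URL pattern is taken from the download commands in this notebook.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://data.qiime2.org/2020.2/tutorials/fmt"


def demux_url(run, subsample):
    """Return the URL for one sequencing run (1 or 2) and subsample ('1p' or '10p')."""
    if run not in (1, 2):
        raise ValueError("run must be 1 or 2")
    if subsample not in ("1p", "10p"):
        raise ValueError("subsample must be '1p' or '10p'")
    return f"{BASE_URL}/fmt-tutorial-demux-{run}-{subsample}.qza"


def download_demux(workdir, subsample):
    """Download both runs into workdir as fmt-tutorial-demux-<run>.qza."""
    paths = []
    for run in (1, 2):
        target = Path(workdir) / f"fmt-tutorial-demux-{run}.qza"
        urllib.request.urlretrieve(demux_url(run, subsample), target)
        paths.append(target)
    return paths


# Only builds the URL; call download_demux(workdir, "10p") to fetch the files.
print(demux_url(1, "10p"))
```

Because both subsamples are saved under the same local file names, downloading one after the other overwrites the first.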

# 10% subsample data

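Now download the demultiplexed sequences that we’ll use in this analysis. The following two commands will do it for you. Run the cells from either this section or the 1% section, not both: the two sections save to the same file names, so the later download overwrites the earlier one.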

In [15]:
!wget -O $workdir/"fmt-tutorial-demux-1.qza" \
  "https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-1-10p.qza"

--2020-04-15 09:24:05--  https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-1-10p.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-1-10p.qza [following]
--2020-04-15 09:24:06--  https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-1-10p.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.201.224
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.201.224|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20536136 (20M) [binary/octet-stream]
Saving to: ‘/home/user/qiime2-moving-pictures-tutorial//fmt-tutorial-demux-1.qza’


2020-04-15 09:25:02 (361 KB/s) - ‘/home/user/qiime2-moving-pictures-tutorial//fmt-tutorial-demux-1.qza’ sa

In [26]:
!wget -O $workdir/"fmt-tutorial-demux-2.qza" \
  "https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-2-10p.qza"

--2020-03-18 10:55:32--  https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-2-10p.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-2-10p.qza [following]
--2020-03-18 10:55:33--  https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-2-10p.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.212.120
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.212.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8633726 (8.2M) [binary/octet-stream]
Saving to: ‘fmt-tutorial-demux-2.qza’


2020-03-18 10:55:41 (1.12 MB/s) - ‘fmt-tutorial-demux-2.qza’ saved [8633726/8633726]



# 1% subsample data


In [28]:
!wget -O $workdir/"fmt-tutorial-demux-1.qza" \
  "https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-1-1p.qza"

--2020-03-18 10:55:41--  https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-1-1p.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-1-1p.qza [following]
--2020-03-18 10:55:42--  https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-1-1p.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.250.32
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.250.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2165499 (2.1M) [binary/octet-stream]
Saving to: ‘fmt-tutorial-demux-1.qza’


2020-03-18 10:55:48 (458 KB/s) - ‘fmt-tutorial-demux-1.qza’ saved [2165499/2165499]



In [27]:
!wget -O $workdir/"fmt-tutorial-demux-2.qza" \
  "https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-2-1p.qza"

--2020-04-15 09:31:04--  https://data.qiime2.org/2020.2/tutorials/fmt/fmt-tutorial-demux-2-1p.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-2-1p.qza [following]
--2020-04-15 09:31:05--  https://s3-us-west-2.amazonaws.com/qiime2-data/2020.2/tutorials/fmt/fmt-tutorial-demux-2-1p.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.216.56
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.216.56|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 935392 (913K) [binary/octet-stream]
Saving to: ‘/home/user/qiime2-moving-pictures-tutorial//fmt-tutorial-demux-2.qza’


2020-04-15 09:31:07 (490 KB/s) - ‘/home/user/qiime2-moving-pictures-tutorial//fmt-tutorial-demux-2.qza’ saved [9

# Importing data as a qiime2 artifact

All data used as input to QIIME 2 is in the form of QIIME 2 artifacts, which contain information about the type of data and the source of the data. The downloaded `.qza` files are already artifacts, so the first thing we need to do is load them, along with the sample metadata, into Python objects.



In [17]:
sample_metadata = qiime2.Metadata.load(workdir+'/sample-metadata.tsv')

In [18]:
fmt_tutorial_demux_1 = qiime2.Artifact.load(workdir+'/fmt-tutorial-demux-1.qza')

In [19]:
fmt_tutorial_demux_2 = qiime2.Artifact.load(workdir+'/fmt-tutorial-demux-2.qza')
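
To see where an artifact keeps the information about its type, it can help to look inside the file without QIIME 2. The sketch below assumes the usual archive layout, in which a `.qza` file is a zip archive with a single root directory (named by the artifact's UUID) that holds a `metadata.yaml` file. `read_artifact_metadata` is a hypothetical helper, and the demo archive, including its UUID and field values, is a made-up stand-in rather than one of the tutorial files.

```python
import tempfile
import zipfile
from pathlib import Path


def read_artifact_metadata(path):
    """Return the lines of <root>/metadata.yaml from a .qza or .qzv archive."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # metadata.yaml sits directly inside the single root directory.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                return archive.read(name).decode("utf-8").splitlines()
    raise ValueError("no metadata.yaml found in archive")


# Build a tiny stand-in archive (hypothetical UUID and values).
demo = Path(tempfile.mkdtemp()) / "demo.qza"
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "0000-demo-uuid/metadata.yaml",
        "uuid: 0000-demo-uuid\n"
        "type: SampleData[SequencesWithQuality]\n"
        "format: SingleLanePerSampleSingleEndFastqDirFmt\n",
    )

for line in read_artifact_metadata(demo):
    print(line)
```

Pointing the same function at a downloaded file such as `fmt-tutorial-demux-1.qza` should show the semantic type that `qiime2.Artifact.load` reports through the artifact's `type` attribute.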

# Sequence quality control

We’ll begin by performing quality control on the demultiplexed sequences using [DADA2](https://www.ncbi.nlm.nih.gov/pubmed/27214047), but this time we’ll run the `denoise_single` method on each set of demultiplexed sequences individually. Again, we’ll want to start by visualizing sequence quality for some of the samples in each run. When we run `denoise_single`, we need to use the same values for `trunc_len` and `trim_left` for both runs, so when looking at the visualizations that result from these two commands, think about what values would make sense for these parameters for both runs.

In [20]:
demux_summary_1 = demux.visualizers.summarize(fmt_tutorial_demux_1)

In [21]:
demux_summary_2 = demux.visualizers.summarize(fmt_tutorial_demux_2)

### Output visualizations:

In [22]:
demux_summary_1.visualization

In [24]:
demux_summary_2.visualization

### Question

Based on the plots you see in `demux_summary_1.visualization` and `demux_summary_2.visualization`, what values would you choose for `trunc_len` and `trim_left` in this case? How do these plots compare to those generated in the moving pictures tutorial?


Here the quality seems relatively low in the first few bases, and then seems to stay relatively high through the end of the reads. We’ll therefore trim the first 13 bases from each sequence and truncate the sequences at 150 bases. Since the reads are 151 bases long, this results in very little truncation of the sequences.
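
The effect of these two parameters on a single read can be sketched with plain string slicing. This is only a simplified model of the trimming step, not DADA2's implementation; `trim_and_truncate` is a hypothetical helper. It follows the documented behavior that reads shorter than `trunc_len` are discarded and that a `trunc_len` of 0 disables truncation.

```python
def trim_and_truncate(read, trim_left, trunc_len):
    """Keep positions trim_left..trunc_len of a read.

    Returns None when the read is shorter than trunc_len (such reads are
    discarded). A trunc_len of 0 means no truncation or length filtering.
    """
    if trunc_len and len(read) < trunc_len:
        return None
    end = trunc_len if trunc_len else len(read)
    return read[trim_left:end]


read = "A" * 151  # a read as long as those in this dataset
kept = trim_and_truncate(read, trim_left=13, trunc_len=150)
print(len(kept))  # 137 bases remain: positions 13 through 149
```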


In [25]:
denoised_sequences_1 = dada2.methods.denoise_single(fmt_tutorial_demux_1, trim_left = 13, trunc_len = 150)

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /tmp/qiime2-archive-dwefcyu1/5db90b62-c6e4-4c09-8c79-c0cdfbe2cea0/data /tmp/tmpg2t0es41/output.tsv.biom /tmp/tmpg2t0es41/track.tsv /tmp/tmpg2t0es41 150 13 2.0 2 Inf consensus 1.0 1 1000000 NULL 16



In [26]:
denoised_sequences_2 = dada2.methods.denoise_single(fmt_tutorial_demux_2, trim_left = 13, trunc_len = 150)

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /tmp/qiime2-archive-zvr04t4i/e7ae20b7-0d6c-4251-a0d6-2fa6463e11f1/data /tmp/tmp4l27d8ri/output.tsv.biom /tmp/tmp4l27d8ri/track.tsv /tmp/tmp4l27d8ri 150 13 2.0 2 Inf consensus 1.0 1 1000000 NULL 16



# Viewing denoising stats

The `denoise_single` commands return basic statistics about the denoising process, and can be visualized with the following:

In [47]:
denoise_stats_1 = metadata.visualizers.tabulate(denoised_sequences_1.denoising_stats.view(qiime2.Metadata))
denoise_stats_1.visualization

In [48]:
denoise_stats_2 = metadata.visualizers.tabulate(denoised_sequences_2.denoising_stats.view(qiime2.Metadata))
denoise_stats_2.visualization

# Merging denoised data

The `denoise_single` command is the last step in this analysis that needs to be run separately for each sequencing run. We’re therefore ready to merge the artifacts generated by those two commands. First we’ll merge the two `FeatureTable[Frequency]` artifacts, and then we’ll merge the two `FeatureData[Sequence]` artifacts. This is possible because the feature ids generated in each run of `denoise_single` are directly comparable (in this case, the feature id is the md5 hash of the sequence defining the feature).

In [30]:
merged_table = feature_table.methods.merge([denoised_sequences_1.table,
                                            denoised_sequences_2.table])

In [28]:
rep_seqs = feature_table.methods.merge_seqs([denoised_sequences_1.representative_sequences,
                                             denoised_sequences_2.representative_sequences])
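
Why the merge works can be sketched with the standard library alone. Because a feature id is the md5 hash of its sequence, the same sequence observed in two runs gets the same id, and the merged table simply contains the union of the features. The toy sequences, counts, and sample ids below are made up for illustration, `feature_id` and `build_table` are hypothetical helpers, and sample ids are assumed not to overlap between runs.

```python
import hashlib
from collections import Counter


def feature_id(sequence):
    """Return the md5 hex digest used as the id of a sequence."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def build_table(sample_counts):
    """Map sample id -> Counter of feature id -> frequency."""
    return {
        sample: Counter({feature_id(seq): n for seq, n in seqs.items()})
        for sample, seqs in sample_counts.items()
    }


# Hypothetical tables from two runs; "ACGT" is observed in both.
run_1 = build_table({"sample-a": {"ACGT": 10, "GGCC": 3}})
run_2 = build_table({"sample-b": {"ACGT": 7, "TTAA": 5}})

# With disjoint sample ids, merging is a union of the per-sample rows.
merged = {**run_1, **run_2}

features_1 = set().union(*run_1.values())
features_2 = set().union(*run_2.values())
all_features = set().union(*merged.values())
print(len(features_1), len(features_2), len(all_features))
```

The merged feature count is smaller than the sum of the per-run counts whenever the runs share sequences, which is worth keeping in mind when comparing per-run and merged summaries.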

Next, we’ll generate a summary of the merged `FeatureTable[Frequency]` artifact.

In [2]:
visualization_table = feature_table.visualizers.summarize(merged_table.merged_table, sample_metadata)
visualization_table.visualization


### Question

> Based on the information in the table summary above, what value will you choose for the `sampling_depth` parameter when you run `diversity.pipelines.core_metrics_phylogenetic` (`qiime diversity core-metrics-phylogenetic` in the CLI)?
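
One way to reason about this question is to tabulate how many samples survive each candidate depth, since samples whose total frequency is below the sampling depth are dropped when the table is rarefied. The per-sample totals below are hypothetical placeholders; read the real ones from the table summary. `samples_retained` is a helper name introduced only for this sketch.

```python
def samples_retained(sample_totals, depth):
    """Return the ids of samples with at least `depth` total frequency."""
    return sorted(s for s, total in sample_totals.items() if total >= depth)


# Hypothetical per-sample totals, for illustration only.
totals = {"s1": 1200, "s2": 950, "s3": 4300, "s4": 310}

for depth in (300, 900, 1200):
    kept = samples_retained(totals, depth)
    # Sequences retained after rarefying = depth * number of samples kept.
    print(depth, len(kept), depth * len(kept))
```

A higher depth keeps more sequences per sample but discards more samples, so the choice is a trade-off between the two.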


### Question

> Generate summaries of the tables for the individual runs of `dada2.methods.denoise_single`. How many features were defined in the first run? How many features were defined in the second run? How do these numbers compare to the total number of features after merging?


We’ll also generate a summary of the merged `FeatureData[Sequence]` artifact. You can use this summary to obtain additional information about specific features of interest as you proceed through the analysis.

In [32]:
rep_seq_viz = feature_table.visualizers.tabulate_seqs(rep_seqs.merged_data)
rep_seq_viz.visualization

# Saving visualisation locally

In [35]:
rep_seq_viz.visualization.save(workdir+'/rep-seqs.qzv')

'/home/user/qiime2-moving-pictures-tutorial//rep-seqs.qzv'

# Diversity analysis

Now that you have `FeatureTable[Frequency]` and `FeatureData[Sequence]` objects, you’re ready to begin exploring the composition of these samples in the context of their metadata. Refer to the [moving pictures](https://docs.qiime2.org/2020.2/tutorials/moving-pictures/) tutorial to derive the specific commands that you’ll run. Several questions concern longitudinal changes in the microbiome of individuals; review the actions described in the [q2-longitudinal tutorial](https://docs.qiime2.org/2020.2/tutorials/longitudinal/) to learn about methods for longitudinal analysis that are supported in QIIME 2.

Below are some specific questions to answer about this data, grouped into a few categories. Try to collect at least one specific result to support your answer to each question.

1. The personal human microbiome.
    1. Do samples differ in composition by subject-id (i.e., across individual)?
    2. Do samples differ in richness by subject-id?
    3. Do samples differ in evenness by subject-id?
    4. Do richness, evenness, composition, and UniFrac distance change in individuals between baseline and the end of the study? Does this differ between individuals receiving FMT and control subjects? (Hint: try the paired difference/distance methods described in the [q2-longitudinal tutorial](https://docs.qiime2.org/2020.2/tutorials/longitudinal/).)
    5. Do richness, evenness, composition, and UniFrac distance change over time and in relation to FMT treatment and other subject metadata? Are these metrics more variable over time in treatment or control groups? (Hint: these questions concern longitudinal measurements.)

2. Microbiota engraftment. 
    1. At approximately what week in the study do microbiome samples in individuals who receive treatment appear most similar to FMT donors in terms of unweighted UniFrac distances? (Hint: Try plotting the data with `qiime emperor plot`. Pay close attention to the color tab and visibility menu.) 
    2. At approximately what week in the study do microbiome samples in individuals who receive treatment appear most similar to FMT donors in terms of Bray-Curtis distances? 
    3. Is this pattern stronger based on unweighted UniFrac or Bray-Curtis distance? Based on what you know about these metrics, what does this suggest to you about what is changing in the microbiome with fecal microbiota transplant? Use the Jaccard and weighted UniFrac distance Emperor plots to help you refine this idea.

3. Experimental design: Comparing stool and swab sample collection methods.
    1. What feature(s) differ most in abundance between the stool and swab samples? What taxonomy is associated with those feature ids based on their best BLAST hits, and based on the results of Naive Bayes feature classification with the QIIME 2 `q2-feature-classifier` plugin?
    2. Is the microbial composition of stool and swab samples significantly different based on either unweighted UniFrac or Bray-Curtis distances between samples (yes, no, or not possible to say with the current information)?
    3. Do the donated fecal material samples appear more similar in composition to the stool or swab samples?
    4. Does community richness differ between stool samples and swab samples? Does community evenness differ between stool samples and swab samples?

4. How many samples were sequenced in each sequencing run? Do you observe any systematic differences in the samples across sequencing runs?
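
For the engraftment questions, it helps to recall how an abundance-based metric and a presence/absence metric respond to different kinds of change. The sketch below implements the textbook definitions of Bray-Curtis dissimilarity and Jaccard distance on toy count profiles. The donor and recipient profiles are invented for illustration, and this is not the code path QIIME 2 uses.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two feature -> count mappings."""
    features = set(a) | set(b)
    numerator = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    denominator = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return numerator / denominator


def jaccard(a, b):
    """Jaccard distance based only on which features are present."""
    present_a = {f for f, n in a.items() if n > 0}
    present_b = {f for f, n in b.items() if n > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


# Toy profiles (hypothetical counts).
donor = {"f1": 50, "f2": 50}
shifted_abundance = {"f1": 95, "f2": 5}       # same members, new proportions
new_member = {"f1": 50, "f2": 49, "f3": 1}    # one rare feature added

print(bray_curtis(donor, shifted_abundance), jaccard(donor, shifted_abundance))
print(bray_curtis(donor, new_member), jaccard(donor, new_member))
```

A shift in proportions moves Bray-Curtis but leaves Jaccard at zero, while gaining a rare member moves Jaccard far more than Bray-Curtis. Unweighted and weighted UniFrac differ from each other in an analogous way, with phylogenetic branch lengths taken into account.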


# Acknowledgements

The data in this tutorial was initially presented in: Microbiota Transfer Therapy alters gut ecosystem and improves gastrointestinal and autism symptoms: an open-label study. Dae-Wook Kang, James B. Adams, Ann C. Gregory, Thomas Borody, Lauren Chittick, Alessio Fasano, Alexander Khoruts, Elizabeth Geis, Juan Maldonado, Sharon McDonough-Means, Elena L. Pollard, Simon Roux, Michael J. Sadowsky, Karen Schwarzberg Lipson, Matthew B. Sullivan, J. Gregory Caporaso and Rosa Krajmalnik-Brown. Microbiome (2017) 5:10. DOI: 10.1186/s40168-016-0225-7.