## TransPAT - 16S OTU
Authors: Victoria Ruiz & Thomas Battaglia

### Introduction
This notebook contains the code for the 16S OTU analyses in the accompanying manuscript. It includes only the code used to generate the figures presented in the manuscript. The full dataset is publicly available in QIITA under study ID [10527](https://qiita.ucsd.edu/study/description/10527). Details of how the data were generated are given in the **Methods** section of the manuscript. The OTU table in the `data` folder has been filtered to remove any OTU with a relative abundance below 0.01%.
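The notebook does not show how the 0.01% filter was applied. As a minimal sketch, assuming the BIOM table has first been exported to a tab-separated "classic" OTU table (for example with `biom convert --to-tsv`) and that "relative abundance" means an OTU's share of all reads in the table, the filter can be written in plain `awk`. The helper name `filter_otus` and the toy counts are illustrative only, not the authors' method.

```bash
#!/bin/sh
# Sketch (assumption): keep only OTUs whose total count is at least 0.01%
# (fraction 0.0001) of all counts in the table. Expects a tab-separated
# classic OTU table whose header lines start with '#'.
# filter_otus is a hypothetical helper, not a QIIME script.
set -eu

MIN_FRACTION=0.0001

filter_otus() {
  # Pass 1 (NR == FNR): sum every count in the table.
  # Pass 2: print header lines, then OTU rows whose row total >= threshold.
  awk -v frac="$MIN_FRACTION" -F '\t' '
    NR == FNR {
      if ($0 !~ /^#/) { for (i = 2; i <= NF; i++) grand += $i }
      next
    }
    /^#/ { print; next }
    {
      row = 0
      for (i = 2; i <= NF; i++) row += $i
      if (row >= frac * grand) print
    }
  ' "$1" "$1"
}

# Toy table (made-up counts): 110000 reads in total, so the cut-off is 11 reads.
toy=$(mktemp)
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\nOTU_A\t60000\t40000\nOTU_B\t4000\t5990\nOTU_C\t4\t6\n' > "$toy"
filtered=$(filter_otus "$toy")
rm -f "$toy"
printf '%s\n' "$filtered"   # OTU_C (10 reads in total) is dropped
```

A non-numeric trailing column such as taxonomy evaluates to 0 in `awk`, so it does not change the totals. If the threshold was instead applied per sample, the row test would need to compare each sample's count with that sample's own total.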

### Make Taxonomic Abundance Plots (Figure 4d)
These commands generate the plots in the multi-panel **Figure 4d**. The OTU table is first split by Sex and Treatment, and the plotting command is then run on each of the four resulting subsets (Male/Female × Control/PAT). The final PDF combines these plots across treatment groups and sexes.

**Note:** The bar plot colors were changed in post-processing, after the figure was generated, to highlight the bacteria of interest.

```bash
# Split the table by Treatment and Gender
split_otu_table.py \
-i data/transfer_m0001.biom \
-o data/per_study_otu_tables \
-m data/transpat_mapping.txt \
-f Sex,Treatment
```

```bash
# Make a directory to store the results
mkdir -p analysis/taxa_plots/

# Plot barplot with only Male-Control
summarize_taxa_through_plots.py \
-i data/per_study_otu_tables/transfer_m0001__Sex_Male_Treatment_Control__.biom \
-o analysis/taxa_plots/male_control \
-m data/per_study_otu_tables/transpat_mapping__Sex_Male_Treatment_Control__.txt \
-c Days_post_transfer \
--sort

# Plot barplot with only Male-PAT
summarize_taxa_through_plots.py \
-i data/per_study_otu_tables/transfer_m0001__Sex_Male_Treatment_PAT__.biom \
-o analysis/taxa_plots/male_pat1 \
-m data/per_study_otu_tables/transpat_mapping__Sex_Male_Treatment_PAT__.txt \
-c Days_post_transfer \
--sort

# - - - - - - - - - - - - - - - - - - - - - - - - - - #

# Plot barplot with only Female-Control
summarize_taxa_through_plots.py \
-i data/per_study_otu_tables/transfer_m0001__Sex_Female_Treatment_Control__.biom \
-o analysis/taxa_plots/female_control \
-m data/per_study_otu_tables/transpat_mapping__Sex_Female_Treatment_Control__.txt \
-c Days_post_transfer \
--sort

# Plot barplot with only Female-PAT
summarize_taxa_through_plots.py \
-i data/per_study_otu_tables/transfer_m0001__Sex_Female_Treatment_PAT__.biom \
-o analysis/taxa_plots/female_pat1 \
-m data/per_study_otu_tables/transpat_mapping__Sex_Female_Treatment_PAT__.txt \
-c Days_post_transfer \
--sort
```
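The four plotting calls differ only in the Sex and Treatment values, so they can also be generated in a loop. The sketch below is one way to do this in POSIX `sh`, assuming the file-name pattern written by `split_otu_table.py` in the first cell; `plot_subset` and the `DRY_RUN` switch are hypothetical additions, not part of the original workflow. By default it only prints the commands; setting `DRY_RUN=0` runs them and requires QIIME 1.x on the `PATH`.

```bash
#!/bin/sh
# Sketch only: loops over the four Sex x Treatment subsets created by
# split_otu_table.py, reusing the file names shown in the cells above.
# plot_subset and DRY_RUN are hypothetical helpers, not part of QIIME.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 runs it (needs QIIME 1.x).
set -eu

DRY_RUN="${DRY_RUN:-1}"
split_dir="data/per_study_otu_tables"
out_dirs=""   # newline-separated list of output folders, in run order

plot_subset() {
  subset="Sex_${1}_Treatment_${2}"
  # Reproduce the output folder names used above (male_control, male_pat1, ...)
  sex_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $2 in
    PAT) trt_lc="pat1" ;;
    *)   trt_lc=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]') ;;
  esac
  out="analysis/taxa_plots/${sex_lc}_${trt_lc}"
  out_dirs="${out_dirs}${out}
"
  # Replace this function's positional parameters with the full command line
  set -- summarize_taxa_through_plots.py \
    -i "${split_dir}/transfer_m0001__${subset}__.biom" \
    -o "$out" \
    -m "${split_dir}/transpat_mapping__${subset}__.txt" \
    -c Days_post_transfer \
    --sort
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    mkdir -p analysis/taxa_plots
    "$@"
  fi
}

for sex in Male Female; do
  for treatment in Control PAT; do
    plot_subset "$sex" "$treatment"
  done
done
```

Building the command in the positional parameters (`set --`) keeps each argument quoted correctly without resorting to `eval`.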

### Make Beta diversity PCoA (Figure 4f)
These commands generate the plots in the multi-panel **Figure 4f**. A 3D PCoA plot is generated with the Emperor tool, and the figure panels are different views of this plot. The plot can be found at `analysis/bdiv_pcoa/unweighted_unifrac_emperor_pcoa_plot/index.html`.

```bash
# Create analysis folder
mkdir -p analysis

# Run beta diversity through plots
beta_diversity_through_plots.py \
-i data/transfer_m0001.biom \
-o analysis/bdiv_pcoa \
-m data/transpat_mapping.txt \
-t data/rep_set.tre
```
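The Emperor plot referenced for this figure is built from unweighted UniFrac distances. For reference, the standard definition (Lozupone & Knight, 2005) of the unweighted UniFrac distance between samples $A$ and $B$ is given below, where $b_i$ is the length of branch $i$ in the tree passed with `-t` (`data/rep_set.tre`) and $A_i$, $B_i$ are the numbers of reads in each sample that descend from branch $i$. The distance is the fraction of the branch length covered by either sample that is unique to one of them; PCoA then embeds the resulting sample-by-sample distance matrix in a few principal coordinates, which Emperor displays in 3D.

$$
d_{\mathrm{UU}}(A,B) = \frac{\sum_{i=1}^{n} b_i \,\bigl|\,\mathbb{1}[A_i > 0] - \mathbb{1}[B_i > 0]\,\bigr|}{\sum_{i=1}^{n} b_i \,\max\bigl(\mathbb{1}[A_i > 0],\ \mathbb{1}[B_i > 0]\bigr)}
$$

Because only presence or absence enters the formula, unweighted UniFrac is sensitive to rare lineages, which is one reason the low-abundance filter applied to the table can affect it.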



### Differentially abundant taxa with LEfSe (Figure S2-d)
These commands generate the raw data for the heatmap-style taxa list in supplemental **Figure S2-d**. A custom tool, **Koeken**, was developed to make it easier to run **LEfSe** from the command line across multiple timepoints; installation instructions are available at [Koeken (Github)](https://github.com/twbattaglia/koeken). The resulting folder contains the intermediate files from running **LEfSe**. The final table was imported into R, and the heatmap was created with the `aheatmap` function from the **NMF** R package.

```bash
# Make a directory to store the results
mkdir -p analysis/lefse

# Run Koeken between Control vs PAT1 pups
koeken.py \
-i data/transfer_m0001.biom \
-o analysis/lefse/control_pat_overtime \
-m data/transpat_mapping.txt \
--class Treatment \
--split Days_post_transfer \
--compare Control PAT \
--level 7
```

```
Koeken v0.2.6: Linear Discriminant Analysis (LEfSe) on a Longitudinal Microbial Dataset.
Written by Thomas W. Battaglia (tb1280@nyu.edu)

LEfSe Credits: "Metagenomic biomarker discovery and explanation"
Nicola Segata, Jacques Izard, Levi Waldron, Dirk Gevers, Larisa Miropolsky, Wendy S Garrett, and Curtis Huttenhower
Genome Biology, 12:R60, 2011

Running QIIME's summarize_taxa.py... 

Number of significantly discriminative features: 44 ( 44 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 44


Number of significantly discriminative features: 6 ( 6 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 6


Number of significantly discriminative features: 33 ( 33 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 33


Number of significantly discriminative features: 30 ( 30 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 30


Number of signi
```
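The console log reports, for each LEfSe run (presumably one per `Days_post_transfer` group, since Koeken splits on that column), how many features pass the LDA threshold, but it does not label which timepoint each count belongs to. A minimal sketch for tabulating these counts, assuming the console output has been saved to a file (for example with `tee`); the helper name `count_lda_features` is hypothetical.

```bash
#!/bin/sh
# Sketch: pull one "abs LDA score > 2.0" feature count per LEfSe run out of a
# saved Koeken console log. count_lda_features is a hypothetical helper.
set -eu

count_lda_features() {
  # Keep only the post-LDA lines, split on ':' and print the trailing number
  awk -F ':' '/^Number of discriminative features with abs LDA score/ { print $NF + 0 }' "$1"
}

# Example input: the first four runs printed in the cell above
log=$(mktemp)
cat > "$log" <<'EOF'
Number of significantly discriminative features: 44 ( 44 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 44
Number of significantly discriminative features: 6 ( 6 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 6
Number of significantly discriminative features: 33 ( 33 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 33
Number of significantly discriminative features: 30 ( 30 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 30
EOF
counts=$(count_lda_features "$log")
rm -f "$log"
printf '%s\n' "$counts"
```

The counts come out in the order the runs appear in the log; matching them to timepoints still requires the per-run files in the Koeken output folder.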

```bash
## Prettify the output across timepoints
pretty_lefse.py \
-i analysis/lefse/control_pat_overtime/lefse_output/run_lefse \
-o analysis/lefse/control_pat_overtime/control_pat_heatmap/ \
-c "Control"
```

```
Prettifing the table. Please wait...

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
```
