# American Gut OTU vs Deblur 

This notebook is one of several that explore differences between OTU-based and deblur-based analyses of the American Gut cohort.


Import dependencies:

In [7]:
import qiime 
import pandas as pd
import numpy as np
import skbio

## Beta Diversity

We begin by comparing the beta diversity of the OTU and deblur data. We first visualize each data set with Emperor PCoA plots, then perform a Procrustes analysis between the two (described in more detail below).

#### OTUs (unweighted)

Download the unweighted UniFrac principal coordinates and the metadata from the June 23, 2016 release on the FTP site in order to create an Emperor plot of beta diversity:

In [None]:
!curl -OL ftp://ftp.microbio.me/AmericanGut/ag-June-23-2016/06-beta/notrim/10k/ag/unweighted_unifrac_ag-pc.txt
!curl -OL ftp://ftp.microbio.me/AmericanGut/ag-June-23-2016/01-raw/metadata.txt


Create the OTU unweighted Emperor PCoA plot:

In [None]:
!make_emperor.py -i "unweighted_unifrac_ag-pc.txt" -m "metadata.txt" -o "emperor_OTU-unweighted"

#### OTUs (weighted)

Download the weighted UniFrac principal coordinates from the June 23, 2016 release on the FTP site in order to create an Emperor plot of beta diversity:

In [None]:
!curl -OL ftp://ftp.microbio.me/AmericanGut/ag-June-23-2016/06-beta/notrim/10k/ag/weighted_unifrac_ag-pc.txt

Create the OTU weighted Emperor PCoA plot to visualize beta diversity (3D):

In [None]:
!make_emperor.py -i "weighted_unifrac_ag-pc.txt" -m "metadata.txt" -o "emperor_OTU-weighted"

#### Deblur (unweighted)

Copy the deblur unweighted principal coordinates from barnacle with scp, run from a terminal on the local machine:

scp jgeier@barnacle.ucsd.edu:/home/jona1883/scratch/AG_paper_deblurred_july_22_2016/bdiv/neg_even2k_unw_unifrac/unweighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000_pc.txt /Users/justingeier/Documents/Justine_AG


Notes: (1) change the file paths as appropriate; (2) you must be logged out of barnacle and run scp from a new local terminal; (3) use scp -r to copy a whole directory.

Create the deblur unweighted Emperor PCoA plot:

In [None]:
!make_emperor.py -i "unweighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000_pc.txt" -m "metadata.txt" -o "emperor_deblur-unweighted"

#### Deblur (weighted)

Copy the deblur weighted principal coordinates from barnacle with scp, run from a terminal on the local machine:

scp jgeier@barnacle.ucsd.edu:/home/jona1883/scratch/AG_paper_deblurred_july_22_2016/bdiv/neg_even2k_w_unifrac/weighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000_pc.txt /Users/justingeier/Documents/Justine_AG/

Notes: (1) you must be logged out of barnacle and run scp from a new local terminal; (2) use scp -r to copy a whole directory.

Create the deblur weighted Emperor PCoA plot:

In [None]:
!make_emperor.py -i "weighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000_pc.txt" -m "metadata.txt" -o "emperor_deblur-weighted"

### Procrustes Analysis

Now that we have PCoA plots for both the OTU and deblur data, we perform a Procrustes analysis to compare the two. Procrustes analysis shows how much the beta diversity results differ between the OTU and deblur methods. The comparison is visualized by superimposing the PCoA coordinates from the OTU data onto the PCoA coordinates from the deblur data and drawing a line between each pair of corresponding points (the same sample in both data sets). The length of each line reflects the closeness of the fit for that sample (short line = good fit; long line = poor fit).
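To make the idea of "fit" concrete, here is a minimal sketch of a Procrustes superimposition using only NumPy. It is not QIIME's implementation and it omits the Monte Carlo permutations requested with `-r`; the function name `procrustes_fit` and the toy coordinates are hypothetical stand-ins for real PCoA axes. It centers and rescales both point sets, finds the best rotation with an SVD, and reports the summed squared residual (often written M^2) along with the per-sample distances that correspond to the line lengths in the plot.

```python
import numpy as np


def procrustes_fit(reference, other):
    """Superimpose `other` onto `reference`; return both point sets and M^2."""
    ref = np.asarray(reference, dtype=float)
    oth = np.asarray(other, dtype=float)
    # Remove translation: center each configuration on its centroid.
    ref = ref - ref.mean(axis=0)
    oth = oth - oth.mean(axis=0)
    # Remove overall size: scale each configuration to unit Frobenius norm.
    ref = ref / np.linalg.norm(ref)
    oth = oth / np.linalg.norm(oth)
    # The best rotation/reflection comes from the SVD of the cross-product matrix.
    u, singular_values, vt = np.linalg.svd(oth.T.dot(ref))
    rotation = u.dot(vt)
    # For unit-norm inputs the optimal scale is the sum of the singular values.
    fitted = singular_values.sum() * oth.dot(rotation)
    disparity = float(np.sum((ref - fitted) ** 2))
    return ref, fitted, disparity


# Hypothetical toy coordinates standing in for the first two PCoA axes.
otu_like = np.array([[0.0, 0.0], [1.0, 0.2], [0.8, 1.1], [-0.3, 0.9], [0.4, -0.7]])

# A rotated, rescaled, shifted copy: same shape, so the fit should be near perfect.
theta = np.pi / 3
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
deblur_like = 2.5 * otu_like.dot(rot) + np.array([4.0, -1.0])

ref, fitted, disparity = procrustes_fit(otu_like, deblur_like)
# Per-sample distances play the role of the line lengths in the Procrustes plot.
line_lengths = np.linalg.norm(ref - fitted, axis=1)
print("M^2 for a transformed copy: %g" % disparity)

# Move one sample: its line grows and M^2 increases.
perturbed = deblur_like.copy()
perturbed[2] += np.array([1.5, -1.0])
_, fitted_p, disparity_p = procrustes_fit(otu_like, perturbed)
print("M^2 after moving one sample: %g" % disparity_p)
```

An M^2 near 0 means the two ordinations have essentially the same shape, while values approaching 1 mean they share little structure.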

#### Procrustes Unweighted

In [None]:
!transform_coordinate_matrices.py -i "unweighted_unifrac_ag-pc.txt","unweighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000_pc.txt" -r 999 -o "procrustes_results-unweighted"

In [None]:
!make_emperor.py -c -i "procrustes_results-unweighted" -o "procrustes_results-unweighted/plots/" -m "metadata.txt"

#### Procrustes Weighted

In [3]:
!transform_coordinate_matrices.py -i 'weighted_unifrac_ag-pc.txt','weighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000_pc.txt' -r 1000 -o "procrustes_results-weighted"

Traceback (most recent call last):
  File "/Users/justingeier/miniconda2/envs/qiime-conda/bin/transform_coordinate_matrices.py", line 4, in <module>
    __import__('pkg_resources').run_script('qiime==1.9.1', 'transform_coordinate_matrices.py')
  File "/Users/justingeier/miniconda2/envs/qiime-conda/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script
    
  File "/Users/justingeier/miniconda2/envs/qiime-conda/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script
    
  File "/Users/justingeier/miniconda2/envs/qiime-conda/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/transform_coordinate_matrices.py", line 187, in <module>
    main()
  File "/Users/justingeier/miniconda2/envs/qiime-conda/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/transform_coordinate_matrices.py", line 149, in main
    max_dimensions=num_dimensions)
  File "/Users/j

In [30]:
!make_emperor.py -c -i "procrustes_results-weighted" -o "procrustes_results-weighted/plots/" -m "metadata.txt"

Error in make_emperor.py: Could not use any of the files in the input directory.


## Mantel Tests

Next we perform Mantel tests, which measure the correlation between the two distance matrices (created from the OTU and deblur data) and assess its statistical significance with a permutation test.
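As a rough illustration of what a Mantel test computes, the sketch below implements a two-sided permutation version with NumPy alone. It is an assumption-laden stand-in, not scikit-bio's code: the function name `mantel_sketch`, the seed, and the toy point sets are all hypothetical. It correlates the upper triangles of two distance matrices, then repeatedly shuffles the sample order of one matrix to see how often a correlation at least as strong arises by chance.

```python
import numpy as np


def mantel_sketch(x, y, permutations=999, seed=0):
    """Return (Pearson r, two-sided permutation p-value, n) for two distance matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    # Distance matrices are symmetric, so only the upper triangle is informative.
    upper = np.triu_indices(n, k=1)
    x_flat = x[upper]
    observed = np.corrcoef(x_flat, y[upper])[0, 1]
    rng = np.random.RandomState(seed)
    hits = 0
    for _ in range(permutations):
        # Shuffle sample labels: permute rows and columns of y together.
        order = rng.permutation(n)
        permuted = y[order][:, order]
        stat = np.corrcoef(x_flat, permuted[upper])[0, 1]
        if abs(stat) >= abs(observed):
            hits += 1
    # Count the observed statistic itself so the p-value is never exactly 0.
    p_value = (hits + 1.0) / (permutations + 1.0)
    return observed, p_value, n


def euclidean_distances(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Hypothetical toy data: 12 samples, plus a slightly noisy copy of the same samples.
toy_rng = np.random.RandomState(42)
points = toy_rng.uniform(size=(12, 2))
noisy_points = points + 0.01 * toy_rng.normal(size=points.shape)

dm_a = euclidean_distances(points)
dm_b = euclidean_distances(noisy_points)

r, p, n = mantel_sketch(dm_a, dm_b)
print("r=%.4f p=%.3f n=%d" % (r, p, n))

# Comparing a matrix with itself trivially gives a correlation of 1.0.
r_self, _, _ = mantel_sketch(dm_a, dm_a)
```

In this sketch, 999 permutations make 1/1000 = 0.001 the smallest attainable p-value, so a reported 0.001 means that no permutation matched the observed correlation.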

### Mantel Test (weighted)

In [5]:
! curl -OL ftp://ftp.microbio.me/AmericanGut/ag-June-23-2016/06-beta/notrim/10k/ag/weighted_unifrac_ag.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  477M  100  477M    0     0  9508k      0  0:00:51  0:00:51 --:--:-- 10.3M


In [8]:
otu_weighted_mantel = skbio.DistanceMatrix.read('weighted_unifrac_ag.txt')

In [9]:
deblur_weighted_mantel = skbio.DistanceMatrix.read('weighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000.txt')

In [10]:
weighted_otu_ids = set(otu_weighted_mantel.ids)

In [11]:
weighted_deblur_ids = set(deblur_weighted_mantel.ids)

In [12]:
weighted_shared_ids = set.intersection(weighted_otu_ids, weighted_deblur_ids)

In [13]:
filtered_otu_weighted_mantel = otu_weighted_mantel.filter(weighted_shared_ids)

In [14]:
filtered_deblur_weighted_mantel = deblur_weighted_mantel.filter(weighted_shared_ids)

In [15]:
skbio.stats.distance.mantel(filtered_otu_weighted_mantel, filtered_deblur_weighted_mantel)

(0.8432902389437168, 0.001, 6017)

### Mantel Test (unweighted)

In [16]:
! curl -OL ftp://ftp.microbio.me/AmericanGut/ag-June-23-2016/06-beta/notrim/10k/ag/unweighted_unifrac_ag.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  597M  100  597M    0     0  6146k      0  0:01:39  0:01:39 --:--:-- 9098k


In [17]:
otu_unweighted_mantel = skbio.DistanceMatrix.read('unweighted_unifrac_ag.txt')

In [18]:
deblur_unweighted_mantel = skbio.DistanceMatrix.read('unweighted_unifrac_ag.deblur.neg.min10.withtax.bloom.filtered.even2000.txt')

In [19]:
otu_ids = set(otu_unweighted_mantel.ids)

In [20]:
deblur_ids = set(deblur_unweighted_mantel.ids)

In [21]:
shared_ids = set.intersection(otu_ids, deblur_ids)

In [23]:
filtered_otu_unweighted_mantel = otu_unweighted_mantel.filter(shared_ids)

In [24]:
filtered_deblur_unweighted_mantel = deblur_unweighted_mantel.filter(shared_ids)

In [25]:
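skbio.stats.distance.mantel(filtered_otu_unweighted_mantel, filtered_deblur_unweighted_mantel)

Note: the output recorded for this cell, (1.0, 0.001, 6017), came from a call that passed the filtered OTU matrix as both arguments, which trivially yields a correlation of 1.0. Rerun the cell to obtain the OTU-versus-deblur result.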