The two previous notebooks looked at overall p values and alpha diversity for a single category. Now, we'll look at beta diversity, which lets us compare community structure between samples.
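
To make the idea of a between-sample distance concrete, here is a minimal sketch using the Jaccard distance, a simple presence/absence metric. This is an illustrative stand-in only: the analysis in this notebook uses a precomputed unweighted distance matrix, and the sample names, OTU names, and counts below are hypothetical.

```python
def jaccard_distance(sample_a, sample_b):
    """Presence/absence distance between two samples.

    Each sample is a dict mapping an OTU id to its count.
    Returns 0.0 for identical membership and 1.0 for no shared OTUs.
    """
    present_a = {otu for otu, count in sample_a.items() if count > 0}
    present_b = {otu for otu, count in sample_b.items() if count > 0}
    union = present_a | present_b
    if not union:
        # Two empty samples are treated as identical.
        return 0.0
    shared = present_a & present_b
    return 1.0 - float(len(shared)) / len(union)


# Hypothetical toy counts, not American Gut data.
sample_1 = {'otu_a': 10, 'otu_b': 5, 'otu_c': 0}
sample_2 = {'otu_a': 3, 'otu_b': 0, 'otu_c': 7}
sample_3 = {'otu_a': 8, 'otu_b': 2, 'otu_c': 0}

d_12 = jaccard_distance(sample_1, sample_2)  # one of three OTUs is shared
d_13 = jaccard_distance(sample_1, sample_3)  # same OTUs present in both
```

Samples with the same members have a distance of zero regardless of abundance, which is the sense in which an unweighted metric compares community membership rather than composition.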

In [1]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import seaborn as sn
import skbio

import americangut.diversity_analysis as div

from americangut.ag_data import AgData
from americangut.ag_data_dictionary import ag_data_dictionary



In [2]:
from matplotlib import rcParams

%matplotlib inline

# Formats the axes using seaborn so they will be white and have ticks
# on the bottom of the axes.
sn.set_style('ticks', {'axes.facecolor': 'none'})

# Sets up the plotting parameters so that plots use Helvetica by default,
# falling back to Arial
rcParams['font.family'] = 'sans-serif'
rcParams['font.sans-serif'] = ['Helvetica', 'Arial']
rcParams['text.usetex'] = True

Next, let's select the data set and rarefaction depth we wish to use.

In [3]:
bodysite = 'fecal'
sequence_trim = '100nt'
rarefaction_depth = '10k'

use_subset = True
use_one_sample = True

Let's pick the category to interrogate.

In [4]:
group_name = 'ALCOHOL_FREQUENCY'

Now, let's read the files associated with the data and load the data dictionary entry for the group.

In [5]:
fecal_data = AgData(bodysite=bodysite, 
                    trim=sequence_trim, 
                    depth=rarefaction_depth, 
                    sub_participants=use_subset, 
                    one_sample=use_one_sample)

group = ag_data_dictionary[group_name]

/Users/jwdebelius/Repositories/American-Gut/ipynb/primary-processing/agp_processing/11-packaged/fecal/100nt/sub_participants/one_sample/10k


In [6]:
fecal_data.data_dir

'/Users/jwdebelius/Repositories/American-Gut/ipynb/primary-processing/agp_processing/11-packaged/fecal/100nt/sub_participants/one_sample/10k'

We're going to start by cleaning up the data. So, let's remove any samples that might be outliers (in rounds 1-21, there is a sample with alpha diversity seven standard deviations above the mean and four standard deviations above the next highest sample).

We'll also clean up the mapping column as needed, to make analysis easier.

In [7]:
fecal_data.drop_alpha_outliers()
fecal_data.clean_up_column(group)
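
The kind of filter described above can be sketched as follows. This is not the implementation of `drop_alpha_outliers`, which is not shown in this notebook; the function name `drop_high_outliers`, the `max_sd` threshold, and the toy values are all assumptions made for illustration.

```python
import statistics


def drop_high_outliers(alpha, max_sd=3.0):
    """Drop samples whose alpha diversity is far above the mean.

    `alpha` maps a sample id to an alpha diversity value and needs at
    least two entries. A sample is kept if its value is no more than
    `max_sd` sample standard deviations above the mean.
    """
    values = list(alpha.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {sample: value for sample, value in alpha.items()
            if (value - mean) <= max_sd * sd}


# Hypothetical toy data: 19 typical samples and one extreme value.
typical = [9.0, 10.0, 11.0] * 6 + [10.0]
alpha = {'sample_%02d' % i: value for i, value in enumerate(typical)}
alpha['outlier'] = 40.0

kept = drop_high_outliers(alpha, max_sd=3.0)
```

Note that the outlier itself inflates both the mean and the standard deviation, so a very strict threshold can only be exceeded when there are many samples.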

Now that we have the data loaded, let's use a post-hoc test to evaluate comparisons between groups. For now, we're going to use the QIIME script, [`make_distance_boxplots.py`](http://qiime.org/scripts/make_distance_boxplots.html).

In [8]:
save_dir = 'beta_diversity/%(bodysite)s/%(participant_set)s/%(samples_per_participants)s' % fecal_data.data_set
!mkdir -p $save_dir
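
As a rough illustration of how the path template above expands, the sketch below fills it from a dictionary. The dictionary is a hypothetical stand-in for `fecal_data.data_set`: the real keys and values come from `AgData`, and the values here are assumed. The directory is created under a temporary folder so the example has no side effects on the working directory.

```python
import os
import tempfile

# Hypothetical stand-in for `fecal_data.data_set` (values are assumed).
data_set = {
    'bodysite': 'fecal',
    'participant_set': 'sub_participants',
    'samples_per_participants': 'one_sample',
}

# Each %(key)s placeholder is replaced by the matching dictionary value.
# Extra keys in the dictionary are ignored; missing keys raise a KeyError.
template = ('beta_diversity/%(bodysite)s/%(participant_set)s/'
            '%(samples_per_participants)s')
save_dir = template % data_set

# Equivalent of `mkdir -p`: create all missing parent directories.
full_dir = os.path.join(tempfile.mkdtemp(), save_dir)
if not os.path.isdir(full_dir):
    os.makedirs(full_dir)
```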

In [None]:
!make_distance_boxplots.py -o $save_dir -m $fecal_data.map_fp -d $fecal_data.unweighted_fp -f $fecal_data.name
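
Distance boxplots of this kind compare distances between samples in the same group with distances between samples in different groups. The sketch below shows that split on a toy distance matrix. It is an illustration of the idea, not the QIIME implementation; the function name `split_distances`, the sample ids, the group labels, and the distances are all hypothetical.

```python
from itertools import combinations


def split_distances(distances, groups):
    """Split pairwise distances into within-group and between-group lists.

    `distances` maps a frozenset of two sample ids to their distance.
    `groups` maps each sample id to its group label.
    """
    within, between = [], []
    for sample_a, sample_b in combinations(sorted(groups), 2):
        distance = distances[frozenset((sample_a, sample_b))]
        if groups[sample_a] == groups[sample_b]:
            within.append(distance)
        else:
            between.append(distance)
    return within, between


# Hypothetical group labels and distances.
groups = {'s1': 'group_a', 's2': 'group_a', 's3': 'group_b', 's4': 'group_b'}
distances = {
    frozenset(('s1', 's2')): 0.30,
    frozenset(('s1', 's3')): 0.70,
    frozenset(('s1', 's4')): 0.75,
    frozenset(('s2', 's3')): 0.65,
    frozenset(('s2', 's4')): 0.80,
    frozenset(('s3', 's4')): 0.40,
}

within, between = split_distances(distances, groups)
```

If group membership is associated with community structure, the within-group distances will tend to be smaller than the between-group distances, which is what the boxplots are meant to make visible.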

We can use the generated data to make distance plots, 