# Analyzing Feb 2019 16S Sequencing Data
## Written by Kathryn Cogert, Feb 5th, 2019

### Background:
USEARCH was used to clean up the sequencing data collected from the following samples.

Mini column samples:
 - KC AMX D46 (Day 46 granules from large anammox column reactor, taken just before mini column inoculation)
 - KC AC Minicolumn Upper (Day 23 granules taken from minicolumn. Red granules stripped of AC only)
 - KC AC Minicolumn Mixed (Day 23 granules taken from minicolumn. Mixture of black and red granules)
 - KC AC Batch (Granules from the failed batch experiment)

Red & Grey Anammox Granules:
 - KC AMX D46 (Day 46 granules from large anammox column reactor, taken just before mini column inoculation)
 - KC Red (Red granules taken from large anammox column reactor)
 - KC Grey (Grey granules taken from large anammox column reactor)

### Objective:
This analysis writes the sequencing results to a CSV for data visualization in R. The data were preprocessed with USEARCH (see [Pipeline.ipynb](Pipeline.ipynb)).



In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed, interact_manual
import numpy as np
import ipywidgets as widgets
%matplotlib inline
plt.rcParams['figure.figsize'] = (10,7)

### Step 1) Load in OTU counts per sample table

In [2]:
otu_count = pd.read_csv('AMX.zotutab.txt', sep='\t').rename(columns={'#OTU ID':'OTU #'})
otu_count.head()

Unnamed: 0,OTU #,KC.AC.MiniColumn.Upper,KC.AMX.D46,KC.AMX.Grey,KC.AMX.Red,KC.AC.Batch,KC.AC.MiniColumn.Mixed
0,Otu1,4547,6856,1276,8603,3822,3209
1,Otu34,312,229,364,119,10,150
2,Otu18,797,494,374,296,903,624
3,Otu22,634,486,407,203,320,535
4,Otu331,9,2,10,1,1,1


### Step 2) Load in the OTU taxonomy assignment table & data clean

In [6]:
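# The SINTAX file is tab-separated with no header; column 0 holds the OTU label
# and column 3 holds the comma-separated taxonomy string that is split below.
otu_id = pd.read_table('AMX.otus.sintax', header=None, delimiter='\t', index_col=None)
# Split the taxonomy into at most six ranks (n must be passed by keyword in pandas >= 2.0).
otu_id_clean = otu_id. \
join(otu_id[3].str.split(',', n=5, expand=True). \
rename(columns={0:'Domain',1:'Phylum',2:'Class',3:'Order',4:'Family',5:'Genus'})). \
drop(columns=[1,2,3]). \
rename(columns={0:'OTU #'})
# Strip the rank prefixes ('d:', 'p:', ...); regex=True is needed because
# pandas >= 2.0 treats the pattern as a literal string by default.
otu_id_clean.iloc[:,1:] = otu_id_clean.iloc[:,1:].apply(lambda x: x.str.replace('[a-z]:','', regex=True))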

otu_id_clean.head()

Unnamed: 0,OTU #,Domain,Phylum,Class,Order,Family,Genus
0,Otu3,Bacteria,Bacteroidetes,Bacteroidia,Cytophagales,Microscillaceae,uncultured
1,Otu2,Bacteria,Proteobacteria,Gammaproteobacteria,Betaproteobacteriales,Rhodocyclaceae,Denitratisoma
2,Otu1,Bacteria,Planctomycetes,Brocadiae,Brocadiales,Scalinduaceae,Candidatus_Scalindua
3,Otu5,Bacteria,Planctomycetes,Brocadiae,Brocadiales,Scalinduaceae,Candidatus_Scalindua
4,Otu4,Bacteria,Proteobacteria,Gammaproteobacteria,Betaproteobacteriales,Rhodocyclaceae,Denitratisoma
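
To make the split in Step 2 easier to follow, here is a minimal sketch on made-up data. The frame `toy`, its `taxonomy` column, and the labels `OtuA` and `OtuB` are hypothetical stand-ins for the fourth column of the SINTAX file, not values from this dataset. It passes `n` and `regex` by keyword, which recent pandas versions require.

```python
import pandas as pd

# Hypothetical rows imitating the taxonomy column of a SINTAX file.
toy = pd.DataFrame({
    'OTU #': ['OtuA', 'OtuB'],
    'taxonomy': [
        'd:Bacteria,p:Planctomycetes,c:Brocadiae,o:Brocadiales,'
        'f:Scalinduaceae,g:Candidatus_Scalindua',
        'd:Bacteria,p:Planctomycetes,c:Phycisphaerae',
    ],
})

ranks = ['Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus']

# Split on commas into at most six pieces, one per rank.
split = toy['taxonomy'].str.split(',', n=5, expand=True)
# Pad to six columns in case no row is resolved down to genus.
split = split.reindex(columns=range(len(ranks)))
split.columns = ranks

# Strip the one-letter rank prefix ("d:", "p:", ...) from every cell.
split = split.apply(lambda col: col.str.replace(r'^[a-z]:', '', regex=True))

toy_clean = pd.concat([toy[['OTU #']], split], axis=1)
print(toy_clean)
```

Ranks that SINTAX could not assign come out as missing values, which is why Step 4 later fills them with a label.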


### Step 3) Merge two dataframes together and normalize columns

In [8]:
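df = otu_id_clean.merge(right=otu_count, on='OTU #')
# Columns 0-6 are 'OTU #' plus the six taxonomic ranks; sample columns start at index 7.
sample_cols = df.columns[7:]
# Total reads per sample, taken before the counts are normalized.
colsum = df[sample_cols].sum().to_frame().transpose()
# Convert raw counts to relative abundance so that each sample column sums to 1.
df[sample_cols] = df[sample_cols] / df[sample_cols].sum()
df.sort_values('OTU #').head()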

Unnamed: 0,OTU #,Domain,Phylum,Class,Order,Family,Genus,KC.AC.MiniColumn.Upper,KC.AMX.D46,KC.AMX.Grey,KC.AMX.Red,KC.AC.Batch,KC.AC.MiniColumn.Mixed
2,Otu1,Bacteria,Planctomycetes,Brocadiae,Brocadiales,Scalinduaceae,Candidatus_Scalindua,0.097341,0.112858,0.027443,0.227852,0.090724,0.084726
9,Otu10,Bacteria,Planctomycetes,Phycisphaerae,,,,0.035023,0.030535,0.007398,0.007045,0.046572,0.038812
99,Otu100,Bacteria,Bacteroidetes,Bacteroidia,Flavobacteriales,Cryomorphaceae,uncultured,0.001927,0.000741,0.000409,2.6e-05,0.0,0.001663
100,Otu101,Bacteria,Proteobacteria,Gammaproteobacteria,Betaproteobacteriales,Nitrosomonadaceae,mle17,0.000257,0.000823,0.001204,0.000689,0.001258,0.00037
103,Otu102,Bacteria,Firmicutes,Clostridia,Clostridiales,,,2.1e-05,0.0,2.2e-05,0.0,0.004178,2.6e-05
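
The normalization in Step 3 divides each sample column by its own total, turning read counts into relative abundances. The following is a small sketch of that idea on invented numbers; `counts`, `Sample1`, `Sample2`, and `depth` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical read counts for three OTUs in two samples.
counts = pd.DataFrame({
    'OTU #': ['OtuA', 'OtuB', 'OtuC'],
    'Sample1': [50, 30, 20],
    'Sample2': [5, 0, 15],
})
sample_cols = ['Sample1', 'Sample2']

# Record the sequencing depth of each sample before normalizing it away.
depth = counts[sample_cols].sum()

# Dividing a DataFrame by a Series aligns on column labels,
# so every column is divided by its own total.
rel = counts.copy()
rel[sample_cols] = counts[sample_cols] / depth

print(rel)
print(rel[sample_cols].sum())
```

Checking that every normalized column sums to 1 is a cheap sanity test before exporting the table.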


### Step 4) Note unidentified organisms as such

In [10]:
df.fillna('Unidentified', inplace=True)
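
Calling `fillna` on the whole frame would also write the label into a sample column if one happened to contain a missing value. As a possible alternative, the sketch below restricts the fill to the taxonomy columns. The frame `toy` and the list `rank_cols` are hypothetical and only illustrate the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one OTU lacking family and genus assignments.
toy = pd.DataFrame({
    'OTU #': ['OtuA', 'OtuB'],
    'Family': ['Scalinduaceae', np.nan],
    'Genus': ['Candidatus_Scalindua', np.nan],
    'Sample1': [0.9, 0.1],
})

# Fill only the taxonomy columns so the abundance columns stay numeric.
rank_cols = ['Family', 'Genus']
toy[rank_cols] = toy[rank_cols].fillna('Unidentified')

print(toy)
```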

### Step 5) Save as CSV

In [11]:
df.to_csv('AC_Column_Abundances.csv')
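
By default `to_csv` also writes the row index as an unnamed first column, which then appears as an extra column when the file is read back. If that column is not wanted downstream, passing `index=False` is one option. The sketch below uses an in-memory buffer and a hypothetical frame `toy` rather than the real output file.

```python
import io

import pandas as pd

# Hypothetical two-row abundance table.
toy = pd.DataFrame({'OTU #': ['OtuA', 'OtuB'], 'Sample1': [0.9, 0.1]})

# Write without the row index, then read the text back in.
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)

print(list(back.columns))
```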

### Supplemental

An interactive plot was previously used to visualize relative abundance by taxonomic rank.

In [113]:
@interact(Rank=['Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus'],
         Experiment = ['Activated Carbon', 'Granule Color'])
def compare_v(Rank='Order', Experiment='Activated Carbon'): 
    #to_plot = (df.groupby([Rank]).sum()/ df. \
    #           groupby([Rank]).sum().sum()). \
    #reindex(sorted(df.columns), axis=1)
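    # numeric_only=True keeps only the abundance columns; otherwise recent pandas
    # versions also "sum" the text columns and the transposed frame cannot be plotted.
    to_plot = df.groupby([Rank]).sum(numeric_only=True).sort_values('KC.AMX.D46', 
                                                   ascending=False).transpose()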
    if Experiment=='Activated Carbon':
        to_plot = to_plot.loc[['KC.AMX.D46',
                               'KC.AC.MiniColumn.Upper',
                               'KC.AC.MiniColumn.Mixed', 
                               'KC.AC.Batch']]
    elif Experiment == 'Granule Color':
        to_plot = to_plot.loc[['KC.AMX.D46',
                               'KC.AMX.Grey', 
                               'KC.AMX.Red']]
    fig=to_plot.plot(kind='bar', stacked=True)
    
    #for i in [0,2,4,6,8, 10]: 
    # Omit KC.AMXRT results b/c not in v10 results.
    #    fig=to_plot.iloc[i:i+2,:].plot(kind='bar', stacked=True)
    plt.title(Experiment, color='white', fontsize=14, fontweight='bold')
    plt.xticks(color='white', size=16, rotation=45)
    plt.yticks(np.arange(0,1.1,0.1), color='white')
    fig.legend(bbox_to_anchor=(0.75, -0.1), ncol=2)


