# Summary:

This notebook is for visualizing antibiotic resistance gene tables generated by [ABRicate](https://github.com/tseemann/abricate) and [SRST2](https://github.com/katholt/srst2). 

    
# Example Use Case:

In this example, the complete Shakya et al. 2013 metagenome is compared to small, medium, and large subsamples of itself after conservative or aggressive read filtering and assembly with SPAdes or MEGAHIT. Datasets are named according to their metagenome content, relative degree of read filtering, and, where appropriate, the assembler used. SRST2 is appropriate for analysis of antibiotic resistance genes (ARGs) in reads, while ABRicate is useful for analysis of ARGs in contigs.

* SRR606249 = Accession number for the complete Shakya et al. 2013 metagenome
* subset50 = 50% of the complete Shakya et al. 2013 metagenome
* subset25 = 25% of the complete Shakya et al. 2013 metagenome
* subset10 = 10% of the complete Shakya et al. 2013 metagenome
* pe.trim2 = Conservative read filtering
* pe.trim30 = Aggressive read filtering
* megahit = MEGAHIT assembly
* spades = SPAdes assembly 
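
The naming convention above can be unpacked programmatically. The following is a minimal sketch, not part of `antibiotic_res`: the regular expression and the `parse_dataset_name` helper are hypothetical and inferred only from the filenames that appear in this notebook.

```python
import re

# Hypothetical pattern inferred from names such as
# "SRR606249_subset25_1.trim30_megahit_abricate.tab".
NAME_PATTERN = re.compile(
    r"(?P<accession>SRR\d+)"               # accession of the metagenome
    r"(?:_subset(?P<subset>\d+))?"         # optional subsample percentage
    r"_1\.trim(?P<trim>\d+)"               # read filtering level (2 or 30)
    r"(?:_(?P<assembler>megahit|spades))?" # optional assembler
)


def parse_dataset_name(filename):
    """Split a result filename into its dataset components."""
    match = NAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized dataset name: {filename}")
    parts = match.groupdict()
    # A name without a "subsetNN" token is the complete metagenome (100%).
    parts["subset"] = int(parts["subset"]) if parts["subset"] else 100
    parts["trim"] = int(parts["trim"])
    return parts


print(parse_dataset_name("SRR606249_subset25_1.trim30_megahit_abricate.tab"))
```

For read-level (SRST2) files, which carry no assembler token, `assembler` is returned as `None`.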


# Objectives:

* Create table with all of the genes found 
* Count the total number of genes found for each dataset
* Count the number of unique genes found per dataset
* Compare unique genes found using a presence/absence table
* Compare results from reads and assemblies

In [1]:
from antibiotic_res import *

# Analysis of antibiotic resistance genes in contigs using ABRicate

In [23]:
concat_abricate_files('*tab').to_csv('concatenated_abricate_results.csv')
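
The internals of `concat_abricate_files` are not shown in this notebook. As a rough sketch of what such a step might look like, the hypothetical `concat_tables` helper below reads every file matching a glob pattern, tags each row with its source file, and stacks the results. It assumes tab-separated input with a header row.

```python
import glob
import os

import pandas as pd


def concat_tables(pattern, sep="\t"):
    """Read every file matching `pattern` and stack them into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path, sep=sep)
        # Record the source file so rows stay traceable after concatenation.
        frame["filename"] = os.path.basename(path)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return pd.concat(frames, ignore_index=True)
```

Keeping a `filename` column is what makes the per-dataset counts and the presence/absence comparison possible later on.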

In [3]:
calc_total_genes_abricate()

filename,GENE
SRR606249_1.trim30_spades_abricate.tab,15
SRR606249_subset10_1.trim2_spades_abricate.tab,15
SRR606249_subset25_1.trim2_spades_abricate.tab,15
SRR606249_1.trim2_spades_abricate.tab,14
SRR606249_subset25_1.trim30_megahit_abricate.tab,14
SRR606249_subset25_1.trim30_spades_abricate.tab,14
SRR606249_1.trim30_megahit_abricate.tab,13
SRR606249_subset25_1.trim2_megahit_abricate.tab,13
SRR606249_subset10_1.trim30_spades_abricate.tab,12
SRR606249_subset50_1.trim2_spades_abricate.tab,12


In [4]:
calculate_unique_genes_abricate()

filename,GENE
SRR606249_1.trim2_spades_abricate.tab,12
SRR606249_1.trim30_spades_abricate.tab,12
SRR606249_subset25_1.trim30_spades_abricate.tab,12
SRR606249_subset25_1.trim2_spades_abricate.tab,11
SRR606249_subset25_1.trim30_megahit_abricate.tab,11
SRR606249_1.trim30_megahit_abricate.tab,10
SRR606249_subset25_1.trim2_megahit_abricate.tab,10
SRR606249_subset50_1.trim2_spades_abricate.tab,10
SRR606249_1.trim2_megahit_abricate.tab,9
SRR606249_subset50_1.trim30_spades_abricate.tab,9
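
The total and unique counts above presumably differ in whether repeated hits of the same gene are counted every time or only once. A minimal pandas sketch of that distinction follows. The toy table, its file names, and its gene names are illustrative placeholders, and this is not necessarily how `antibiotic_res` computes the counts.

```python
import pandas as pd

# Toy stand-in for a concatenated results table; all values are placeholders.
hits = pd.DataFrame({
    "filename": ["a.tab", "a.tab", "a.tab", "b.tab", "b.tab"],
    "GENE": ["geneA", "geneA", "geneB", "geneB", "geneC"],
})

# Total hits per file: every row counts, including repeated genes.
total_genes = hits.groupby("filename")["GENE"].count().sort_values(ascending=False)

# Unique genes per file: each gene is counted once per file.
unique_genes = hits.groupby("filename")["GENE"].nunique().sort_values(ascending=False)

print(total_genes)
print(unique_genes)
```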


In [5]:
create_abricate_presence_absence_gene_table()

Unnamed: 0,SRR606249_subset10_1.trim30_megahit_abricate.tab,SRR606249_1.trim2_megahit_abricate.tab,SRR606249_subset25_1.trim30_spades_abricate.tab,SRR606249_subset25_1.trim2_spades_abricate.tab,SRR606249_1.trim30_spades_abricate.tab,SRR606249_subset25_1.trim2_megahit_abricate.tab,SRR606249_subset25_1.trim30_megahit_abricate.tab,SRR606249_subset50_1.trim30_spades_abricate.tab,SRR606249_subset10_1.trim30_spades_abricate.tab,SRR606249_subset50_1.trim2_megahit_abricate.tab,SRR606249_1.trim30_megahit_abricate.tab,SRR606249_subset10_1.trim2_spades_abricate.tab,SRR606249_subset10_1.trim2_megahit_abricate.tab,SRR606249_1.trim2_spades_abricate.tab,SRR606249_subset50_1.trim30_megahit_abricate.tab,SRR606249_subset50_1.trim2_spades_abricate.tab
qepA_1,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False
blaTEM-116_4,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False
blaOXA-48_2,False,True,False,False,True,True,True,False,False,False,True,False,False,False,False,True
catB7_1,True,False,True,False,False,False,True,False,True,False,False,True,True,False,False,False
lsa(A)_2,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True
blaOXA-181_1,False,True,True,True,True,True,True,True,False,True,True,False,False,True,True,True
vat(A)_1,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True,True
vat(F)_1,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False
tet(O)_3,False,False,False,False,True,False,False,True,False,False,False,False,False,True,False,True
msr(D)_2,False,True,True,True,True,False,False,True,False,False,True,False,False,True,False,True
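
A presence/absence table like the one above can be built from a long-format table of hits. The sketch below uses `pandas.crosstab` on placeholder data; it is one possible approach under that assumption, not the confirmed implementation of `create_abricate_presence_absence_gene_table`.

```python
import pandas as pd

# Toy long-format hits table; file and gene names are placeholders.
hits = pd.DataFrame({
    "filename": ["a.tab", "a.tab", "a.tab", "b.tab", "b.tab"],
    "GENE": ["geneA", "geneA", "geneB", "geneB", "geneC"],
})

# crosstab counts hits per (gene, file); "> 0" turns the counts into True/False.
presence = pd.crosstab(hits["GENE"], hits["filename"]) > 0

print(presence)
```

Rows are genes and columns are datasets, so a gene shared by every dataset appears as a row of `True` values.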


In [6]:
np.version.version

'1.14.5'

In [7]:
interactive_map_abricate()

clustergrammer_widget(network='{"row_nodes": [{"name": "qepA_1", "ini": 18, "clust": 0, "rank": 0, "rankvar": …

In [8]:
interactive_table_abricate() 

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

In [24]:
df = pd.read_csv('concatenated_abricate_results.csv')
qgrid.show_grid(df, show_toolbar=True)

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

# Analysis of SRST2 results

In [9]:
concat_srst2_txt("srst2/*results.txt")

Unnamed: 0,DB,LsaA_MLS,MphD_MLS,Sample,TEM-1D_Bla,allele,annotation,clusterid,coverage,depth,diffs,divergence,filename,gene,length,maxMAF,seqid,uncertainty
0,ARGannot.r1,,,SRR606249_subset10,,MphD_1613,no;no;MphD;MLS;NC_017312;2292413-2291580;834,228.0,90.528,3.798,5snp79holes,0.662,srst2/SRR606249_subset10_1.trim2.fq.gz__fullge...,MphD_MLS,834.0,0.25,1613.0,edge0.0
1,ARGannot.r1,,,SRR606249_subset50,,LsaA_298,no;no;LsaA;MLS;AY225127;41-1537;1497,33.0,100.0,17.648,26snp,1.737,srst2/SRR606249_subset50_1.trim30.fq.gz__fullg...,LsaA_MLS,1497.0,0.071,298.0,
2,ARGannot.r1,,,SRR606249_subset50,,MphD_1613,no;no;MphD;MLS;NC_017312;2292413-2291580;834,228.0,100.0,17.692,6snp,0.719,srst2/SRR606249_subset50_1.trim30.fq.gz__fullg...,MphD_MLS,834.0,0.04,1613.0,
3,,LsaA_298*,MphD_1613*,SRR606249_subset50,,,,,,,,,srst2/SRR606249_subset50_1.trim30.fq.gz__genes...,,,,,
4,,,,SRR606249_subset10,,,,,,,,,srst2/SRR606249_subset10_1.trim30.fq.gz__genes...,,,,,
5,,LsaA_298*,MphD_1613*,SRR606249,TEM-116_967*?,,,,,,,,srst2/SRR606249_1.trim30.fq.gz__genes__ARGanno...,,,,,
6,,LsaA_298*,MphD_1613*,SRR606249,TEM-116_967*?,,,,,,,,srst2/SRR606249_1.trim2.fq.gz__genes__ARGannot...,,,,,
7,ARGannot.r1,,,SRR606249_subset25,,LsaA_298,no;no;LsaA;MLS;AY225127;41-1537;1497,33.0,100.0,8.917,26snp,1.737,srst2/SRR606249_subset25_1.trim2.fq.gz__fullge...,LsaA_MLS,1497.0,0.5,298.0,
8,ARGannot.r1,,,SRR606249_subset25,,MphD_1613,no;no;MphD;MLS;NC_017312;2292413-2291580;834,228.0,100.0,9.616,6snp,0.719,srst2/SRR606249_subset25_1.trim2.fq.gz__fullge...,MphD_MLS,834.0,0.167,1613.0,
9,ARGannot.r1,,,SRR606249_subset25,,LsaA_298,no;no;LsaA;MLS;AY225127;41-1537;1497,33.0,100.0,8.01,26snp,1.737,srst2/SRR606249_subset25_1.trim30.fq.gz__fullg...,LsaA_MLS,1497.0,0.125,298.0,


In [15]:
calc_total_genes_srst2()  # chain .to_csv(<path>) to save the table

In [11]:
calculate_unique_genes_srst2()  # chain .to_csv(<path>) to save the table

filename,gene
srst2/SRR606249_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,3
srst2/SRR606249_1.trim30.fq.gz__fullgenes__ARGannot.r1__results.txt,3
srst2/SRR606249_subset25_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,2
srst2/SRR606249_subset25_1.trim30.fq.gz__fullgenes__ARGannot.r1__results.txt,2
srst2/SRR606249_subset50_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,2
srst2/SRR606249_subset50_1.trim30.fq.gz__fullgenes__ARGannot.r1__results.txt,2
srst2/SRR606249_subset10_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,1
srst2/SRR606249_1.trim2.fq.gz__genes__ARGannot.r1__results.txt,0
srst2/SRR606249_1.trim30.fq.gz__genes__ARGannot.r1__results.txt,0
srst2/SRR606249_subset10_1.trim2.fq.gz__genes__ARGannot.r1__results.txt,0
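
The zero counts above belong to the `__genes__` summary reports, which have no `gene` column values in the concatenated table. One way to keep them out of the counts is to filter on the filename before grouping. This is a sketch on placeholder data, assuming the `__fullgenes__` token reliably marks the detailed reports.

```python
import pandas as pd

# Toy stand-in for the concatenated SRST2 table; all values are placeholders.
results = pd.DataFrame({
    "filename": [
        "sampleA__fullgenes__db__results.txt",
        "sampleA__fullgenes__db__results.txt",
        "sampleA__genes__db__results.txt",
    ],
    "gene": ["geneA", "geneB", None],
})

# Keep only rows that come from the detailed "fullgenes" reports.
is_fullgenes = results["filename"].str.contains("__fullgenes__", regex=False)
unique_genes = results[is_fullgenes].groupby("filename")["gene"].nunique()

print(unique_genes)
```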


In [12]:
create_srst2_presence_absence_gene_table()

Unnamed: 0,srst2/SRR606249_subset50_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,srst2/SRR606249_1.trim30.fq.gz__fullgenes__ARGannot.r1__results.txt,srst2/SRR606249_subset50_1.trim30.fq.gz__fullgenes__ARGannot.r1__results.txt,srst2/SRR606249_subset25_1.trim30.fq.gz__fullgenes__ARGannot.r1__results.txt,srst2/SRR606249_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,srst2/SRR606249_subset10_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt,srst2/SRR606249_subset25_1.trim2.fq.gz__fullgenes__ARGannot.r1__results.txt
LsaA_MLS,True,True,True,True,True,False,True
MphD_MLS,True,True,True,True,True,True,True
TEM-1D_Bla,False,True,False,False,True,False,False


In [13]:
interactive_map_srst2()

clustergrammer_widget(network='{"row_nodes": [{"name": "qepA_1", "ini": 18, "clust": 0, "rank": 0, "rankvar": …

In [14]:
interactive_table_srst2()

QgridWidget(grid_options={'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': True, 'defau…

# Conclusions: 


We analyzed and compared predicted antibiotic resistance genes (ARGs) in reads and contigs. To determine whether quality filtering and sequencing depth affected ARG detection, we compared conservative and aggressive trimming across the complete metagenome and its subsamples. More genes were detected following assembly than in the reads. Three genes, vat(F), tet(O), and blaTEM-116_4, were detected only in SPAdes assemblies.
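
The assembler-specific observation can be checked directly against a presence/absence table. The sketch below copies four columns (the complete-metagenome assemblies) and five rows from the ABRicate presence/absence output earlier in this notebook, then selects genes found in at least one SPAdes assembly and in no MEGAHIT assembly. Because it covers only a subset of the datasets, it illustrates the check rather than reproducing the full comparison.

```python
import pandas as pd

# Values copied from the ABRicate presence/absence table in this notebook,
# restricted to the four complete-metagenome assemblies.
presence = pd.DataFrame(
    {
        "SRR606249_1.trim2_megahit_abricate.tab": [False, True, True, False, False],
        "SRR606249_1.trim30_megahit_abricate.tab": [False, True, True, False, False],
        "SRR606249_1.trim2_spades_abricate.tab": [True, False, True, True, True],
        "SRR606249_1.trim30_spades_abricate.tab": [True, True, True, True, True],
    },
    index=["blaTEM-116_4", "blaOXA-48_2", "lsa(A)_2", "vat(F)_1", "tet(O)_3"],
)

# Split the columns by assembler using the token in the filename.
spades = presence.filter(like="_spades_")
megahit = presence.filter(like="_megahit_")

# Found in at least one SPAdes assembly and in no MEGAHIT assembly.
spades_only = presence.index[spades.any(axis=1) & ~megahit.any(axis=1)]

print(list(spades_only))
```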