## Unipept lowest common ancestor peptide analysis

### This tool returns the taxonomic lowest common ancestor (LCA) for a given tryptic peptide. Here we run all of our de novo (PeaksDN), database search (Comet), and de novo-assisted database search (PeaksDB) peptides through it to determine their specificity and their ability to identify organismal and functional source.

### You can run the `pept2lca` command through the [web server](https://unipept.ugent.be/datasets) or from the command line interface ([info here](https://unipept.ugent.be/clidocs)).
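
Besides the web server and the CLI, the same lookup can in principle be scripted. The sketch below only builds a request URL and does not contact any server. The endpoint and the `equate_il` and `extra` parameter names are my assumptions about the public Unipept API and should be checked against the Unipept documentation before use; `build_pept2lca_url` is a hypothetical helper, not part of Unipept.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint of the public Unipept API; verify against the Unipept docs.
PEPT2LCA_URL = "https://api.unipept.ugent.be/api/v1/pept2lca.json"


def build_pept2lca_url(peptides, equate_il=True, extra=True):
    """Return a GET URL requesting the LCA of each tryptic peptide.

    Hypothetical helper: the parameter names are assumed, not confirmed here.
    """
    # One "input[]" entry per peptide, followed by the two assumed flags.
    params = [("input[]", peptide) for peptide in peptides]
    params.append(("equate_il", str(equate_il).lower()))
    params.append(("extra", str(extra).lower()))
    return PEPT2LCA_URL + "?" + urlencode(params)


# Two peptides taken from the de novo results shown later in this notebook.
url = build_pept2lca_url(["VLGQNEAVNAVSNALR", "SNLGALER"])
print(url)
```

Building the URL separately from sending it keeps the sketch testable offline; the actual request could then be made with any HTTP client.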

### I exported the LCA results to .csv files and placed them in my /analysis/unipept directory:

In [1]:
cd /home/millieginty/Documents/git-repos/2017-etnp/analyses/pronovo-2020/unipept/SKQ17-PTMopt/

/home/millieginty/Documents/git-repos/2017-etnp/analyses/pronovo-2020/unipept/SKQ17-PTMopt


In [2]:
# LIBRARIES
import os
# import pandas library for working with tabular data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import kde
# import regular expression (regex) library
import re
# check pandas version
pd.__version__

'1.0.5'

### 243: 965 m, 0.3 µm McLane pump MS/MS

### There were phylum-level peptide IDs from the PeaksDB, Comet, and de novo peptides for this sample.

In [3]:
ls De-novo/

231-PTMopt-DN80_lca.csv      273-PTMopt-DN80-lca-tax.csv
231-PTMopt-DN80-lca-tax.csv  278-PTMopt-DN80_lca.csv
233-PTMopt-DN80_lca.csv      278-PTMopt-DN80-lca-tax.csv
233-PTMopt-DN80-lca-tax.csv  378-PTMopt-DN80_lca.csv
243-PTMopt-DN80_lca.csv      378-PTMopt-DN80-lca-tax.csv
243-PTMopt-DN80-lca-tax.csv  Phylum-out/
273-PTMopt-DN80_lca.csv


In [4]:
dn80_lca243 = pd.read_csv('De-novo/243-PTMopt-DN80-lca-tax.csv')

# drop the peptides that aren't specific to the phylum level
dn80_phy243 = dn80_lca243[dn80_lca243.phylum.notnull()]

# drop everything else
dn80_phy243 = dn80_phy243[['peptide', 'phylum']].copy()

print('Peptides specific to the Phylum level:', len(dn80_phy243))
phylum = len(dn80_phy243)

#Cyanobacteria
print('Peptide specific to Cyanobacteria:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Cyanobacteria')]))

Cyanobacteria = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Cyanobacteria')])

#Fungi and fungi-like
print('Peptide specific to Ascomycota:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Ascomycota')]))
print('Peptide specific to Basidiomycota:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Basidiomycota')]))
print('Peptide specific to Chytridiomycota:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Chytridiomycota')]))
print('Peptide specific to Mucoromycota:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Mucoromycota')]))
print('Peptide specific to Zoopagomycota:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Zoopagomycota')]))

Ascomycota = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Ascomycota')])
Basidiomycota = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Basidiomycota')])
Chytridiomycota= len(dn80_phy243[dn80_phy243['phylum'].str.contains('Chytridiomycota')])
Mucoromycota = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Mucoromycota')])
Zoopagomycota = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Zoopagomycota')])

#Heterotrophic bacteria
print('Peptide specific to Actinobacteria:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Actinobacteria')]))
print('Peptide specific to Bacteroidetes:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Bacteroidetes')]))
print('Peptide specific to Firmicutes:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Firmicutes')]))
print('Peptide specific to Proteobacteria:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Proteobacteria')]))
print('Peptide specific to Candidatus Marinimicrobia:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Candidatus Marinimicrobia')]))

Actinobacteria = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Actinobacteria')])
Bacteroidetes = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Bacteroidetes')])
Firmicutes = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Firmicutes')])
Proteobacteria = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Proteobacteria')])
Candidatus_Marinimicrobia = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Candidatus Marinimicrobia')])

#Nitrospirae (note: Nitrospina itself belongs to the phylum Nitrospinae, which this pattern does not match)
print('Peptide specific to Nitrospirae:', len(dn80_phy243[dn80_phy243['phylum'].str.contains('Nitrospirae')]))

Nitrospirae = len(dn80_phy243[dn80_phy243['phylum'].str.contains('Nitrospirae')])

# make a dictionary of phylum output
phy_db80_243 = {"Cyanobacteria" : Cyanobacteria, "Ascomycota" : Ascomycota, "Basidiomycota" : Basidiomycota,\
                   "Chytridiomycota" : Chytridiomycota, "Mucoromycota" : Mucoromycota, \
                "Zoopagomycota" : Zoopagomycota, "Actinobacteria" : Actinobacteria, "Bacteroidetes" : Bacteroidetes, \
               "Firmicutes" : Firmicutes, "Proteobacteria" : Proteobacteria, "Nitrospirae" : Nitrospirae, \
              "Candidatus Marinimicrobia" : Candidatus_Marinimicrobia}

# make phylum specific dataframes
dn80_243_cyano = dn80_phy243[dn80_phy243['phylum'].str.contains('Cyanobacteria')]
dn80_243_fungi = dn80_phy243[dn80_phy243['phylum'].str.contains('Ascomycota|Basidiomycota|Chytridiomycota|Mucoromycota|Zoopagomycota')]
dn80_243_hetb = dn80_phy243[dn80_phy243['phylum'].str.contains('Actinobacteria|Bacteroidetes|Firmicutes|Proteobacteria|Candidatus Marinimicrobia')]
dn80_243_nitro = dn80_phy243[dn80_phy243['phylum'].str.contains('Nitrospirae')]

# save as a csv
dn80_phy243.to_csv("De-novo/Phylum-out/243/243-PTMopt-DN80-lca-phy.csv")
dn80_243_cyano.to_csv("De-novo/Phylum-out/243/243-PTMopt-DN80-lca-cyano.csv")
dn80_243_fungi.to_csv("De-novo/Phylum-out/243/243-PTMopt-DN80-lca-fungi.csv")
dn80_243_hetb.to_csv("De-novo/Phylum-out/243/243-PTMopt-DN80-lca-hetb.csv")
dn80_243_nitro.to_csv("De-novo/Phylum-out/243/243-PTMopt-DN80-lca-nitro.csv")

dn80_243_hetb.head()

Peptides specific to the Phylum level: 85
Peptide specific to Cyanobacteria: 0
Peptide specific to Ascomycota: 5
Peptide specific to Basidiomycota: 2
Peptide specific to Chytridiomycota: 1
Peptide specific to Mucoromycota: 0
Peptide specific to Zoopagomycota: 0
Peptide specific to Actinobacteria: 7
Peptide specific to Bacteroidetes: 10
Peptide specific to Firmicutes: 15
Peptide specific to Proteobacteria: 26
Peptide specific to Candidatus Marinimicrobia: 0
Peptide specific to Nitrospirae: 0


Unnamed: 0,peptide,phylum
19,VLGQNEAVNAVSNALR,Proteobacteria
32,SNLGALER,Actinobacteria
35,LLLLGFYK,Firmicutes
68,TTTWTLLR,Proteobacteria
80,ALQPLGDR,Proteobacteria


In [5]:
phy_db80_243

{'Cyanobacteria': 0,
 'Ascomycota': 5,
 'Basidiomycota': 2,
 'Chytridiomycota': 1,
 'Mucoromycota': 0,
 'Zoopagomycota': 0,
 'Actinobacteria': 7,
 'Bacteroidetes': 10,
 'Firmicutes': 15,
 'Proteobacteria': 26,
 'Nitrospirae': 0,
 'Candidatus Marinimicrobia': 0}
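
The counting cell above repeats the same filter once per phylum and is then copied for each search engine. A more compact alternative is sketched below under two assumptions: `count_phyla` is a hypothetical helper that is not used elsewhere in this notebook, and it counts exact matches on the `phylum` column, whereas the cells here use `str.contains` (substring matching). The toy table holds made-up rows, not real results.

```python
import pandas as pd

# Phyla tracked in this notebook, in the same order as the dictionaries.
PHYLA = [
    "Cyanobacteria", "Ascomycota", "Basidiomycota", "Chytridiomycota",
    "Mucoromycota", "Zoopagomycota", "Actinobacteria", "Bacteroidetes",
    "Firmicutes", "Proteobacteria", "Nitrospirae", "Candidatus Marinimicrobia",
]


def count_phyla(lca_table, phyla=PHYLA):
    """Count peptides per phylum using exact matches on the 'phylum' column."""
    # Keep only peptides that were resolved to the phylum level.
    assigned = lca_table.loc[lca_table["phylum"].notnull(), "phylum"]
    # Tally every phylum, then keep the tracked ones (0 where none were found).
    counts = assigned.value_counts().reindex(phyla, fill_value=0)
    return counts.astype(int).to_dict()


# Toy table with hypothetical peptides, only to show the call.
toy = pd.DataFrame({
    "peptide": ["AAAK", "CCCK", "DDDR", "EEEK"],
    "phylum": ["Proteobacteria", None, "Firmicutes", "Proteobacteria"],
})
toy_counts = count_phyla(toy)
print(toy_counts["Proteobacteria"], toy_counts["Firmicutes"])
```

With a helper like this, each search engine would need a single call, for example `count_phyla(dn80_lca243)`, instead of a copied cell.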

In [6]:
pdb_lca243 = pd.read_csv('PeaksDB/243-PTMopt-PeaksDB-lca-tax.csv')

# drop the peptides that aren't specific to the phylum level
pdb_phy243 = pdb_lca243[pdb_lca243.phylum.notnull()]

# drop everything else
pdb_phy243 = pdb_phy243[['peptide', 'phylum']].copy()

print('Peptides specific to the Phylum level:', len(pdb_phy243))
phylum = len(pdb_phy243)

#Cyanobacteria
print('Peptide specific to Cyanobacteria:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Cyanobacteria')]))

Cyanobacteria = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Cyanobacteria')])

#Fungi and fungi-like
print('Peptide specific to Ascomycota:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Ascomycota')]))
print('Peptide specific to Basidiomycota:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Basidiomycota')]))
print('Peptide specific to Chytridiomycota:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Chytridiomycota')]))
print('Peptide specific to Mucoromycota:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Mucoromycota')]))
print('Peptide specific to Zoopagomycota:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Zoopagomycota')]))

Ascomycota = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Ascomycota')])
Basidiomycota = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Basidiomycota')])
Chytridiomycota= len(pdb_phy243[pdb_phy243['phylum'].str.contains('Chytridiomycota')])
Mucoromycota = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Mucoromycota')])
Zoopagomycota = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Zoopagomycota')])

#Heterotrophic bacteria
print('Peptide specific to Actinobacteria:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Actinobacteria')]))
print('Peptide specific to Bacteroidetes:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Bacteroidetes')]))
print('Peptide specific to Firmicutes:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Firmicutes')]))
print('Peptide specific to Proteobacteria:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Proteobacteria')]))
print('Peptide specific to Candidatus Marinimicrobia:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Candidatus Marinimicrobia')]))

Actinobacteria = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Actinobacteria')])
Bacteroidetes = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Bacteroidetes')])
Firmicutes = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Firmicutes')])
Proteobacteria = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Proteobacteria')])
Candidatus_Marinimicrobia = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Candidatus Marinimicrobia')])

#Nitrospirae (note: Nitrospina itself belongs to the phylum Nitrospinae, which this pattern does not match)
print('Peptide specific to Nitrospirae:', len(pdb_phy243[pdb_phy243['phylum'].str.contains('Nitrospirae')]))

Nitrospirae = len(pdb_phy243[pdb_phy243['phylum'].str.contains('Nitrospirae')])

# make a dictionary of phylum output
phy_pdb_243 = {"Cyanobacteria" : Cyanobacteria, "Ascomycota" : Ascomycota, "Basidiomycota" : Basidiomycota,\
                   "Chytridiomycota" : Chytridiomycota, "Mucoromycota" : Mucoromycota, \
                "Zoopagomycota" : Zoopagomycota, "Actinobacteria" : Actinobacteria, "Bacteroidetes" : Bacteroidetes, \
               "Firmicutes" : Firmicutes, "Proteobacteria" : Proteobacteria, "Nitrospirae" : Nitrospirae, \
              "Candidatus Marinimicrobia" : Candidatus_Marinimicrobia}

# make phylum specific dataframes
pdb_243_cyano = pdb_phy243[pdb_phy243['phylum'].str.contains('Cyanobacteria')]
pdb_243_fungi = pdb_phy243[pdb_phy243['phylum'].str.contains('Ascomycota|Basidiomycota|Chytridiomycota|Mucoromycota|Zoopagomycota')]
pdb_243_hetb = pdb_phy243[pdb_phy243['phylum'].str.contains('Actinobacteria|Bacteroidetes|Firmicutes|Proteobacteria|Candidatus Marinimicrobia')]
pdb_243_nitro = pdb_phy243[pdb_phy243['phylum'].str.contains('Nitrospirae')]

# save as a csv
pdb_phy243.to_csv("PeaksDB/Phylum-out/243/243-PTMopt-pdb-lca-phy.csv")
pdb_243_cyano.to_csv("PeaksDB/Phylum-out/243/243-PTMopt-pdb-lca-cyano.csv")
pdb_243_fungi.to_csv("PeaksDB/Phylum-out/243/243-PTMopt-pdb-lca-fungi.csv")
pdb_243_hetb.to_csv("PeaksDB/Phylum-out/243/243-PTMopt-pdb-lca-hetb.csv")
pdb_243_nitro.to_csv("PeaksDB/Phylum-out/243/243-PTMopt-pdb-lca-nitro.csv")

pdb_phy243.head()

Peptides specific to the Phylum level: 30
Peptide specific to Cyanobacteria: 0
Peptide specific to Ascomycota: 2
Peptide specific to Basidiomycota: 0
Peptide specific to Chytridiomycota: 0
Peptide specific to Mucoromycota: 0
Peptide specific to Zoopagomycota: 0
Peptide specific to Actinobacteria: 2
Peptide specific to Bacteroidetes: 0
Peptide specific to Firmicutes: 0
Peptide specific to Proteobacteria: 12
Peptide specific to Candidatus Marinimicrobia: 0
Peptide specific to Nitrospirae: 0


Unnamed: 0,peptide,phylum
7,VTVEEPFYVRPEEHPGAL,Nitrospinae
8,EEEVGLDLAQNGER,Thaumarchaeota
10,TQFYNDEPEALEYGENFLVHR,Nitrospinae
11,ADEVVAAYDSGR,Proteobacteria
24,VGNPLDTYPDR,Nitrospinae
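
The table head above lists several `Nitrospinae` peptides, while the cell reports 0 for `Nitrospirae`. These are different phylum names, so the `Nitrospirae` pattern cannot match them. The short check below rebuilds only the five rows shown above (not the full PeaksDB table) to illustrate the difference; whether Nitrospinae should be counted is a decision left to the analysis.

```python
import pandas as pd

# Rows copied from the pdb_phy243.head() output above.
head = pd.DataFrame({
    "peptide": [
        "VTVEEPFYVRPEEHPGAL",
        "EEEVGLDLAQNGER",
        "TQFYNDEPEALEYGENFLVHR",
        "ADEVVAAYDSGR",
        "VGNPLDTYPDR",
    ],
    "phylum": [
        "Nitrospinae",
        "Thaumarchaeota",
        "Nitrospinae",
        "Proteobacteria",
        "Nitrospinae",
    ],
})

# Pattern used in the cell above: matches none of these rows.
n_spirae = int(head["phylum"].str.contains("Nitrospirae").sum())
# Pattern for the phylum that contains Nitrospina.
n_spinae = int(head["phylum"].str.contains("Nitrospinae").sum())

print(n_spirae, n_spinae)  # 0 3
```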


In [7]:
comet_lca243 = pd.read_csv('Comet/243-PTMopt-Comet-lca-tax.csv')

# drop the peptides that aren't specific to the phylum level
comet_phy243 = comet_lca243[comet_lca243.phylum.notnull()]

# drop everything else
comet_phy243 = comet_phy243[['peptide', 'phylum']].copy()

print('Peptides specific to the Phylum level:', len(comet_phy243))
phylum = len(comet_phy243)

#Cyanobacteria
print('Peptide specific to Cyanobacteria:', len(comet_phy243[comet_phy243['phylum'].str.contains('Cyanobacteria')]))

Cyanobacteria = len(comet_phy243[comet_phy243['phylum'].str.contains('Cyanobacteria')])

#Fungi and fungi-like
print('Peptide specific to Ascomycota:', len(comet_phy243[comet_phy243['phylum'].str.contains('Ascomycota')]))
print('Peptide specific to Basidiomycota:', len(comet_phy243[comet_phy243['phylum'].str.contains('Basidiomycota')]))
print('Peptide specific to Chytridiomycota:', len(comet_phy243[comet_phy243['phylum'].str.contains('Chytridiomycota')]))
print('Peptide specific to Mucoromycota:', len(comet_phy243[comet_phy243['phylum'].str.contains('Mucoromycota')]))
print('Peptide specific to Zoopagomycota:', len(comet_phy243[comet_phy243['phylum'].str.contains('Zoopagomycota')]))

Ascomycota = len(comet_phy243[comet_phy243['phylum'].str.contains('Ascomycota')])
Basidiomycota = len(comet_phy243[comet_phy243['phylum'].str.contains('Basidiomycota')])
Chytridiomycota= len(comet_phy243[comet_phy243['phylum'].str.contains('Chytridiomycota')])
Mucoromycota = len(comet_phy243[comet_phy243['phylum'].str.contains('Mucoromycota')])
Zoopagomycota = len(comet_phy243[comet_phy243['phylum'].str.contains('Zoopagomycota')])

#Heterotrophic bacteria
print('Peptide specific to Actinobacteria:', len(comet_phy243[comet_phy243['phylum'].str.contains('Actinobacteria')]))
print('Peptide specific to Bacteroidetes:', len(comet_phy243[comet_phy243['phylum'].str.contains('Bacteroidetes')]))
print('Peptide specific to Firmicutes:', len(comet_phy243[comet_phy243['phylum'].str.contains('Firmicutes')]))
print('Peptide specific to Proteobacteria:', len(comet_phy243[comet_phy243['phylum'].str.contains('Proteobacteria')]))
print('Peptide specific to Candidatus Marinimicrobia:', len(comet_phy243[comet_phy243['phylum'].str.contains('Candidatus Marinimicrobia')]))

Actinobacteria = len(comet_phy243[comet_phy243['phylum'].str.contains('Actinobacteria')])
Bacteroidetes = len(comet_phy243[comet_phy243['phylum'].str.contains('Bacteroidetes')])
Firmicutes = len(comet_phy243[comet_phy243['phylum'].str.contains('Firmicutes')])
Proteobacteria = len(comet_phy243[comet_phy243['phylum'].str.contains('Proteobacteria')])
Candidatus_Marinimicrobia = len(comet_phy243[comet_phy243['phylum'].str.contains('Candidatus Marinimicrobia')])

#Nitrospirae (note: Nitrospina itself belongs to the phylum Nitrospinae, which this pattern does not match)
print('Peptide specific to Nitrospirae:', len(comet_phy243[comet_phy243['phylum'].str.contains('Nitrospirae')]))

Nitrospirae = len(comet_phy243[comet_phy243['phylum'].str.contains('Nitrospirae')])

# make a dictionary of phylum output
phy_comet_243 = {"Cyanobacteria" : Cyanobacteria, "Ascomycota" : Ascomycota, "Basidiomycota" : Basidiomycota,\
                   "Chytridiomycota" : Chytridiomycota, "Mucoromycota" : Mucoromycota, \
                "Zoopagomycota" : Zoopagomycota, "Actinobacteria" : Actinobacteria, "Bacteroidetes" : Bacteroidetes, \
               "Firmicutes" : Firmicutes, "Proteobacteria" : Proteobacteria, "Nitrospirae" : Nitrospirae, \
              "Candidatus Marinimicrobia" : Candidatus_Marinimicrobia}


# make phylum specific dataframes
comet_243_cyano = comet_phy243[comet_phy243['phylum'].str.contains('Cyanobacteria')]
comet_243_fungi = comet_phy243[comet_phy243['phylum'].str.contains('Ascomycota|Basidiomycota|Chytridiomycota|Mucoromycota|Zoopagomycota')]
comet_243_hetb = comet_phy243[comet_phy243['phylum'].str.contains('Actinobacteria|Bacteroidetes|Firmicutes|Proteobacteria|Candidatus Marinimicrobia')]
comet_243_nitro = comet_phy243[comet_phy243['phylum'].str.contains('Nitrospirae')]

# save as a csv
comet_phy243.to_csv("Comet/Phylum-out/243/243-PTMopt-comet-lca-phy.csv")
comet_243_cyano.to_csv("Comet/Phylum-out/243/243-PTMopt-comet-lca-cyano.csv")
comet_243_fungi.to_csv("Comet/Phylum-out/243/243-PTMopt-comet-lca-fungi.csv")
comet_243_hetb.to_csv("Comet/Phylum-out/243/243-PTMopt-comet-lca-hetb.csv")
comet_243_nitro.to_csv("Comet/Phylum-out/243/243-PTMopt-comet-lca-nitro.csv")

comet_phy243.head()

Peptides specific to the Phylum level: 3
Peptide specific to Cyanobacteria: 0
Peptide specific to Ascomycota: 0
Peptide specific to Basidiomycota: 0
Peptide specific to Chytridiomycota: 0
Peptide specific to Mucoromycota: 0
Peptide specific to Zoopagomycota: 0
Peptide specific to Actinobacteria: 0
Peptide specific to Bacteroidetes: 0
Peptide specific to Firmicutes: 0
Peptide specific to Proteobacteria: 0
Peptide specific to Candidatus Marinimicrobia: 0
Peptide specific to Nitrospirae: 0


Unnamed: 0,peptide,phylum
3,EEEVGLDLAQNGER,Thaumarchaeota
11,VTVEEPFYVRPEEHPGAL,Nitrospinae
17,LDLLDEAASSLR,Chloroflexi


In [8]:
# make a dataframe from the de novo (PeaksDN), PeaksDB, and Comet dictionaries

phy_243 = pd.DataFrame({'phy_db80_243':pd.Series(phy_db80_243),'phy_pdb_243':pd.Series(phy_pdb_243), \
                        'phy_comet_243':pd.Series(phy_comet_243)})

uni_243 = phy_243.T

uni_243['Fungi tot'] = uni_243['Ascomycota'] + uni_243['Basidiomycota'] + uni_243['Chytridiomycota'] + \
                        uni_243['Mucoromycota'] + uni_243['Zoopagomycota']

uni_243['Het tot'] = uni_243['Actinobacteria'] + uni_243['Bacteroidetes'] + uni_243['Firmicutes'] + \
                        uni_243['Proteobacteria'] + uni_243['Candidatus Marinimicrobia']


uni_243.head()

Unnamed: 0,Cyanobacteria,Ascomycota,Basidiomycota,Chytridiomycota,Mucoromycota,Zoopagomycota,Actinobacteria,Bacteroidetes,Firmicutes,Proteobacteria,Nitrospirae,Candidatus Marinimicrobia,Fungi tot,Het tot
phy_db80_243,0,5,2,1,0,0,7,10,15,26,0,0,8,58
phy_pdb_243,0,2,0,0,0,0,2,0,0,12,0,0,2,14
phy_comet_243,0,0,0,0,0,0,0,0,0,0,0,0,0,0
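
The group totals in the table above can also be computed from column lists, which avoids writing out each addition. This is a minimal sketch that rebuilds the per-phylum counts from the table above; the names `FUNGI`, `HET_BACTERIA`, `uni`, and `totals` are introduced here for illustration only.

```python
import pandas as pd

FUNGI = ["Ascomycota", "Basidiomycota", "Chytridiomycota",
         "Mucoromycota", "Zoopagomycota"]
HET_BACTERIA = ["Actinobacteria", "Bacteroidetes", "Firmicutes",
                "Proteobacteria", "Candidatus Marinimicrobia"]

columns = ["Cyanobacteria"] + FUNGI + [
    "Actinobacteria", "Bacteroidetes", "Firmicutes", "Proteobacteria",
    "Nitrospirae", "Candidatus Marinimicrobia",
]

# Per-phylum counts copied from the uni_243 table above (one row per search).
rows = {
    "phy_db80_243": [0, 5, 2, 1, 0, 0, 7, 10, 15, 26, 0, 0],
    "phy_pdb_243": [0, 2, 0, 0, 0, 0, 2, 0, 0, 12, 0, 0],
    "phy_comet_243": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
uni = pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Row-wise sums over each group of phyla.
totals = pd.DataFrame({
    "Fungi tot": uni[FUNGI].sum(axis=1),
    "Het tot": uni[HET_BACTERIA].sum(axis=1),
})
print(totals)
```

Keeping the group membership in lists also makes it easier to add or remove a phylum from a group in one place.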
