## Evaluating the PTMs across peptides from different cellular compartments

### Beginning with:

    Exported peptide lists (.csv files) that contain the amino acids with their modifications. We want to combine peptides from the following:
    
     - from trypsin and no-digest searches
     - from DB and DN searches
     
    From 8 samples (4 timepoints and 2 treatments, trypsin-digested and naturally digested):
    
     - 322: Day 0 trypsin digested
     - 323: Day 2 trypsin digested
     - 324: Day 5 trypsin digested
     - 325: Day 12 trypsin digested
     - 329: Day 0 undigested
     - 330: Day 2 undigested
     - 331: Day 5 undigested
     - 332: Day 12 undigested
    
### Want:

    Text files with all the stripped (no mod) peptides for the following modifications:
        
        - Lysine acetylation
        - Asparagine deamidation
        - Arginine methylation
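
As an illustration of what "stripped" means here, the following is a minimal sketch that lists the modifications on one peptide and removes them. It assumes the tag format seen in the exported peptide strings, a residue letter followed by `(+mass)`. The names `MOD_SITE`, `MOD_TAG`, `list_mods` and `strip_mods` are hypothetical helpers, not part of the original workflow.

```python
import re

# Assumed tag format, taken from the exported peptide strings: RESIDUE(+mass)
MOD_SITE = re.compile(r"([A-Z])\(([+-][0-9.]+)\)")  # residue plus its mass shift
MOD_TAG = re.compile(r"\([^)]*\)")                  # one parenthesised tag at a time

def list_mods(peptide):
    """Return (residue, mass shift) pairs in order of appearance."""
    return MOD_SITE.findall(peptide)

def strip_mods(peptide):
    """Return the bare amino-acid sequence with every tag removed."""
    return MOD_TAG.sub("", peptide)

peptide = "VVSAAHC(+57.02)YK(+14.02)SR(+14.02)"
print(list_mods(peptide))
print(strip_mods(peptide))
```

Because `[^)]*` cannot cross a closing parenthesis, each tag is removed on its own and the residues between two tags are kept.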
        

In [1]:
# LIBRARIES
import os
os.getcwd()
# pandas for working with tabular data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import kde
# regular expressions (regex)
import re
# check pandas version
pd.__version__

'1.0.5'

## Combine 322

In [2]:
cd /home/millieginty/Documents/git-repos/rot-mayer/data/processed/PTM-cellular-compartment/to-combine/T0/

/home/millieginty/Documents/git-repos/rot-mayer/data/processed/PTM-cellular-compartment/to-combine/T0


In [29]:
cat TW_322_T0_trypsin_noenz_combine_PTMopt_DB_FDR1_mod_peptides.txt TW_322_T0_trypsin_combine_PTMopt_DB_FDR1_mod_peptides.txt TW_322_T0_trypsin_combine_PTMopt_DN_mod_peptides.txt TW_322_T0_trypsin_noenz_combine_PTMopt_DN_mod_peptides.txt > all_322.csv
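
The same concatenation can be done in Python, which may be useful where the `cat` shell alias is unavailable. This is a minimal sketch, not the method used in this notebook: `combine_peptide_lists` is a hypothetical helper, and the demo file names are placeholders written to a temporary directory. It assumes one peptide per line.

```python
import tempfile
from pathlib import Path

def combine_peptide_lists(paths, out_path):
    """Concatenate one-peptide-per-line files into out_path, skipping blank lines."""
    peptides = []
    for path in paths:
        with open(path) as handle:
            peptides.extend(line.strip() for line in handle if line.strip())
    Path(out_path).write_text("\n".join(peptides) + "\n")
    return len(peptides)

# Demo with placeholder files; the first one deliberately lacks a trailing newline
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "db_mod_peptides.txt")
    second = Path(tmp, "dn_mod_peptides.txt")
    first.write_text("LPQVEGTGGDVQPSQDLVR\nAAIGPGIGQGNAAGQ")
    second.write_text("VIGQN(+.98)EAVDAVSNAIR\n")
    out = Path(tmp, "all_demo.csv")
    n_peptides = combine_peptide_lists([first, second], out)
    combined = out.read_text().splitlines()

print(n_peptides, combined)
```

Reading line by line keeps the last peptide of one file separate from the first peptide of the next, even when a file does not end with a newline.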

In [46]:
# read in the combined datafile as a dataframe

all_322 = pd.read_csv("all_322.csv", header = None)

all_322.columns = ['Peptide']

print('Total peptides:', len(all_322))

all_322.head()

Total peptides: 10445


Unnamed: 0,Peptide
0,LPQVEGTGGDVQPSQDLVR
1,AAIGPGIGQGNAAGQ
2,VIGQNEAVDAVSNAIR
3,STEFDNILIVGPIAGK
4,GPAPLPLALAHLD
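
The filter-and-strip step is the same for all three modifications, so it can be written once and looped. The sketch below assumes a dataframe with a single `Peptide` column, as above. `MOD_TAGS` and `stripped_peptides` are hypothetical names, and the demo rows are peptide strings that appear in this notebook.

```python
import pandas as pd

# Hypothetical lookup: output label -> literal tag as written in the peptide strings
MOD_TAGS = {
    "n-deam": "N(+.98)",
    "k-acet": "K(+42.01)",
    "r-meth": "R(+14.02)",
}

def stripped_peptides(df, tag):
    """Unique stripped sequences of the peptides in df['Peptide'] that carry `tag`."""
    # regex=False matches the tag literally, so "(" , "+" and "." need no escaping
    hits = df.loc[df["Peptide"].str.contains(tag, regex=False), "Peptide"]
    # remove every (+mass) tag individually, then drop repeated sequences
    stripped = hits.str.replace(r"\([^)]*\)", "", regex=True)
    return stripped.drop_duplicates().reset_index(drop=True)

demo = pd.DataFrame({"Peptide": [
    "VIGQN(+.98)EAVDAVSNAIR",
    "VSN(+.98)VLVK(+42.01)",
    "K(+42.01)AEYWVK",
    "GLGSGGSIR(+14.02)",
    "LPQVEGTGGDVQPSQDLVR",
]})

results = {label: stripped_peptides(demo, tag).tolist() for label, tag in MOD_TAGS.items()}
for label, seqs in results.items():
    print(label, seqs)
```

Each returned Series could then be written with `to_csv(path, header=False, index=False)`, matching the text files produced in this notebook.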


In [79]:
# keep the rows whose peptide carries a deamidated asparagine, N(+.98)

keep = [r"N\(\+\.98"]

# .copy() so the new column is added to an independent dataframe rather than a slice
N_deam_322 = all_322[all_322.Peptide.str.contains('|'.join(keep))].copy()

# strip the modification tags; [^)]* stops at the first ")" so each tag is removed
# on its own and the residues between two tags are kept

N_deam_322['stripped_peptide'] = N_deam_322['Peptide'].str.replace(r"\([^)]*\)", "", regex=True).drop_duplicates()

print('Number of deamidated asparagine peptides:', len(N_deam_322))

# keep only the stripped peptide column; dropna() removes the repeated sequences
ndeam_322_sp = N_deam_322[["stripped_peptide"]].dropna()

# write to txt file

ndeam_322_sp.to_csv('322-T0-combined-n-deam-stripped-peptides.txt', header=False, index=False)

N_deam_322.head()

Number of deamidated asparagine peptides: 1461


Unnamed: 0,Peptide,stripped_peptide
105,VIGQN(+.98)EAVDAVSNAIR,VIGQNEAVDAVSNAIR
167,LGEHN(+.98)IDVLEGN,LGEHNIDVLEGN
195,N(+.98)NPVLIGEPGVGK,NNPVLIGEPGVGK
212,SNGDGVIDIN(+.98)DK,SNGDGVIDINDK
275,IITHPNFNGN(+.98)TL,IITHPNFNGNTL


In [81]:
# keep the rows whose peptide carries an acetylated lysine, K(+42.01)

keep = [r"K\(\+42\.01"]

# .copy() so the new column is added to an independent dataframe rather than a slice
K_acet_322 = all_322[all_322.Peptide.str.contains('|'.join(keep))].copy()

# strip the modification tags; [^)]* stops at the first ")" so each tag is removed
# on its own and the residues between two tags are kept

K_acet_322['stripped_peptide'] = K_acet_322['Peptide'].str.replace(r"\([^)]*\)", "", regex=True).drop_duplicates()

print('Number of lysine acetylation peptides:', len(K_acet_322))

# keep only the stripped peptide column; dropna() removes the repeated sequences
kacet_322_sp = K_acet_322[["stripped_peptide"]].dropna()

# write to txt file

kacet_322_sp.to_csv('322-T0-combined-k-acet-stripped-peptides.txt', header=False, index=False)

K_acet_322.head()

Number of lysine acetylation peptides: 420


Unnamed: 0,Peptide,stripped_peptide
1837,TTEEK(+42.01)R,TTEEKR
1890,VSN(+.98)VLVK(+42.01),VSNVLVK
1919,K(+42.01)AEYWVK,KAEYWVK
1949,WELEVK(+42.01),WELEVK
1964,VSLNK(+42.01)VYK,VSLNKVYK


In [82]:
# keep the rows whose peptide carries a methylated arginine, R(+14.02)

keep = [r"R\(\+14\.02"]

# .copy() so the new column is added to an independent dataframe rather than a slice
R_meth_322 = all_322[all_322.Peptide.str.contains('|'.join(keep))].copy()

# strip the modification tags; [^)]* stops at the first ")" so each tag is removed
# on its own and the residues between two tags are kept

R_meth_322['stripped_peptide'] = R_meth_322['Peptide'].str.replace(r"\([^)]*\)", "", regex=True).drop_duplicates()

print('Number of methylated arginine peptides:', len(R_meth_322))

# keep only the stripped peptide column; dropna() removes the repeated sequences
rmeth_322_sp = R_meth_322[["stripped_peptide"]].dropna()

# write to txt file

rmeth_322_sp.to_csv('322-T0-combined-r-meth-stripped-peptides.txt', header=False, index=False)

R_meth_322.head()

Number of methylated arginine peptides: 495


Unnamed: 0,Peptide,stripped_peptide
844,GLGSGGSIR(+14.02),GLGSGGSIR
1469,VVSAAHC(+57.02)YK(+14.02)SR(+14.02),VVSAAHCYKSR
1881,LLETENP(+15.99)R(+14.02),LLETENPR
1955,R(+14.02)DTVLFN(+.98)R(+14.02),RDTVLFNR
2063,VYGP(+15.99)DER(+14.02),VYGPDER


## Combine 329

In [83]:
cat TW_329_T0_trypsin_noenz_combine_PTMopt_DN_mod_peptides.txt TW_329_T0_undig_noenz_combine_PTMopt_DB_FDR1_mod_peptides.txt > all_329.csv

In [84]:
# read in the combined datafile as a dataframe

all_329 = pd.read_csv("all_329.csv", header = None)

all_329.columns = ['Peptide']

print('Total peptides:', len(all_329))

all_329.head()

Total peptides: 2118


Unnamed: 0,Peptide
0,HLDVDDSGK
1,DKFDEETK
2,DPN(+.98)LPLK(+42.01)H
3,P(+15.99)KEKFE
4,DHGEVVVK


In [85]:
# keep the rows whose peptide carries a deamidated asparagine, N(+.98)

keep = [r"N\(\+\.98"]

# .copy() so the new column is added to an independent dataframe rather than a slice
N_deam_329 = all_329[all_329.Peptide.str.contains('|'.join(keep))].copy()

# strip the modification tags; [^)]* stops at the first ")" so each tag is removed
# on its own and the residues between two tags are kept

N_deam_329['stripped_peptide'] = N_deam_329['Peptide'].str.replace(r"\([^)]*\)", "", regex=True).drop_duplicates()

print('Number of deamidated asparagine peptides:', len(N_deam_329))

# keep only the stripped peptide column; dropna() removes the repeated sequences
ndeam_329_sp = N_deam_329[["stripped_peptide"]].dropna()

# write to txt file

ndeam_329_sp.to_csv('329-T0-combined-n-deam-stripped-peptides.txt', header=False, index=False)

N_deam_329.head()

Number of deamidated asparagine peptides: 364


Unnamed: 0,Peptide,stripped_peptide
2,DPN(+.98)LPLK(+42.01)H,DPNLPLKH
7,SPN(+.98)N(+.98)SLK,SPNNSLK
8,YN(+.98)PDLPLLGH,YNPDLPLLGH
11,DN(+.98)ADQERF,DNADQERF
12,DPN(+.98)LPLVAH,DPNLPLVAH


In [86]:
# keep the rows whose peptide carries an acetylated lysine, K(+42.01)

keep = [r"K\(\+42\.01"]

# .copy() so the new column is added to an independent dataframe rather than a slice
K_acet_329 = all_329[all_329.Peptide.str.contains('|'.join(keep))].copy()

# strip the modification tags; [^)]* stops at the first ")" so each tag is removed
# on its own and the residues between two tags are kept

K_acet_329['stripped_peptide'] = K_acet_329['Peptide'].str.replace(r"\([^)]*\)", "", regex=True).drop_duplicates()

print('Number of lysine acetylation peptides:', len(K_acet_329))

# keep only the stripped peptide column; dropna() removes the repeated sequences
kacet_329_sp = K_acet_329[["stripped_peptide"]].dropna()

# write to txt file

kacet_329_sp.to_csv('329-T0-combined-k-acet-stripped-peptides.txt', header=False, index=False)

K_acet_329.head()

Number of lysine acetylation peptides: 90


Unnamed: 0,Peptide,stripped_peptide
2,DPN(+.98)LPLK(+42.01)H,DPNLPLKH
17,WLVK(+42.01)LP,WLVKLP
81,TN(+.98)QQLSK(+42.01),TNQQLSK
146,K(+42.01)PLFDLKDR(+15.99)P,KPLFDLKDRP
154,K(+42.01)YDPDLPLLGH,KYDPDLPLLGH


In [87]:
# keep the rows whose peptide carries a methylated arginine, R(+14.02)

keep = [r"R\(\+14\.02"]

# .copy() so the new column is added to an independent dataframe rather than a slice
R_meth_329 = all_329[all_329.Peptide.str.contains('|'.join(keep))].copy()

# strip the modification tags; [^)]* stops at the first ")" so each tag is removed
# on its own and the residues between two tags are kept

R_meth_329['stripped_peptide'] = R_meth_329['Peptide'].str.replace(r"\([^)]*\)", "", regex=True).drop_duplicates()

print('Number of methylated arginine peptides:', len(R_meth_329))

# keep only the stripped peptide column; dropna() removes the repeated sequences
rmeth_329_sp = R_meth_329[["stripped_peptide"]].dropna()

# write to txt file

rmeth_329_sp.to_csv('329-T0-combined-r-meth-stripped-peptides.txt', header=False, index=False)

R_meth_329.head()

Number of methylated arginine peptides: 75


Unnamed: 0,Peptide,stripped_peptide
36,LGELVR(+14.02)P,LGELVRP
83,LLVVC(+57.02)R(+14.02)TP(+15.99),LLVVCRTP
102,R(+14.02)PWTQT,RPWTQT
104,R(+14.02)KPSDPEE,RKPSDPEE
145,VLGADDR(+14.02)N,VLGADDRN
