## Manipulation of Peaks de novo results from ETNP 2017 P2 sample LC-MS/MS data using Python

### Starting with:

Results from separate Peaks notebooks manipulating raw de novo outputs:

- [231: 100 m suspended (GF75 0.3 um)](https://github.com/MeganEDuffy/2017-etnp/blob/master/notebooks/SKQ17-Peaks/SKQ17-231-100m-0.3-PeaksDN.ipynb)
- [233: 265 m suspended (GF75 0.3 um)](https://github.com/MeganEDuffy/2017-etnp/blob/master/notebooks/SKQ17-Peaks/SKQ17-233-265m-0.3-PeaksDN.ipynb)
- [243: 965 m suspended (GF75 0.3 um)](https://github.com/MeganEDuffy/2017-etnp/blob/master/notebooks/SKQ17-Peaks/SKQ17-243-965m-0.3-PeaksDN.ipynb)
- [378: 100 m sediment trap](https://github.com/MeganEDuffy/2017-etnp/blob/master/notebooks/SKQ17-Peaks/SKQ17-378-100m-trap-PeaksDN.ipynb)
- [278: 265 m sediment trap](https://github.com/MeganEDuffy/2017-etnp/blob/master/notebooks/SKQ17-Peaks/SKQ17-278-265m-trap-PeaksDN.ipynb)
- [273: 965 m sediment trap](https://github.com/MeganEDuffy/2017-etnp/blob/master/notebooks/SKQ17-Peaks/SKQ17-273-965m-trap-PeaksDN.ipynb)

### Goal: plot dataset-wide trends in AA composition and PTMs

In [1]:
cd /home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/

/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt


In [6]:
# LIBRARIES
import os
# regular expressions (regex)
import re

# pandas for working with tabular data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import kde

# check the pandas version
pd.__version__

'1.0.5'

In [2]:
ls

ETNP-SKQ17-231-100m-0.3-JA2_DN50.csv
ETNP-SKQ17-231-100m-0.3-JA2_DN50_ptm.csv
ETNP-SKQ17-231-100m-0.3-JA2_DN50_stripped.csv
ETNP-SKQ17-231-100m-0.3-JA2_DN50_stripped_peptides.fas
ETNP-SKQ17-231-100m-0.3-JA2_DN50_stripped_peptides.txt
ETNP-SKQ17-231-100m-0.3-JA2_DN50_totals.csv
ETNP-SKQ17-231-100m-0.3-JA2_DN80_stripped_peptides.fas
ETNP-SKQ17-231-100m-0.3-JA2_DN80_stripped_peptides.txt
ETNP-SKQ17-233-265m-0.3-JA2_DN50_stripped.csv
ETNP-SKQ17-233-265m-0.3-JA4_DN50.csv
ETNP-SKQ17-233-265m-0.3-JA4_DN50_ptm.csv
ETNP-SKQ17-233-265m-0.3-JA4_DN50_stripped.csv
ETNP-SKQ17-233-265m-0.3-JA4_DN50_stripped_peptides.txt
ETNP-SKQ17-233-265m-0.3-JA4_DN50_totals.csv
ETNP-SKQ17-243-965m-0.3-JA14_DN50.csv
ETNP-SKQ17-243-965m-0.3-JA14_DN50_ptm.csv
ETNP-SKQ17-243-965m-0.3-JA14_DN50_stripped.csv
ETNP-SKQ17-243-965m-0.3-JA14_DN50_stripped_peptides.txt
ETNP-SKQ17-243-965m-0.3-JA14_DN50_totals.csv
ETNP-SKQ17-243-965m-0.3-JA14_DN80_stripped_peptides.txt
ETNP-SKQ17-273-965m-trap_DN50.csv
ETNP
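
The listing above shows one `_DN50_totals.csv` file per sample. As a hedged alternative to typing out each path by hand, the sketch below collects those files with a glob pattern and derives the sample label from the file name. The helper names `sample_label` and `combine_totals` are hypothetical, and the code assumes that every totals file follows the `<depth>m-0.3` / `<depth>m-trap` naming seen in the listing and that its first column is a saved index.

```python
import re
from pathlib import Path

import pandas as pd


def sample_label(filename):
    """Turn e.g. 'ETNP-SKQ17-231-100m-0.3-JA2_DN50_totals.csv' into '100 m sus'."""
    # Assumption: depth is followed by either '0.3' (suspended) or 'trap'.
    match = re.search(r"-(\d+)m-(0\.3|trap)", filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    depth, kind = match.groups()
    return f"{depth} m {'sus' if kind == '0.3' else 'trap'}"


def combine_totals(data_dir):
    """Read every *_DN50_totals.csv in data_dir into one labelled dataframe."""
    frames = []
    for path in sorted(Path(data_dir).glob("*_DN50_totals.csv")):
        # Assumption: the first column is the index written by to_csv.
        df = pd.read_csv(path, index_col=0)
        df.insert(0, "Sample", sample_label(path.name))
        frames.append(df)
    # ignore_index=True gives each row its own index value instead of repeating 0.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Calling `combine_totals(".")` from the data directory would return one row per totals file, ordered by file name rather than by depth, so the rows may need re-sorting before plotting.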

In [61]:
# combine the totals CSVs

# read the totals CSV for each sample into a dataframe named after its run number, using the pandas read_csv function
peaks231 = pd.read_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-231-100m-0.3-JA2_DN50_totals.csv")
peaks233 = pd.read_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-233-265m-0.3-JA4_DN50_totals.csv")
peaks243 = pd.read_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-243-965m-0.3-JA14_DN50_totals.csv")
peaks378 = pd.read_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-378-100m-trap_DN50_totals.csv")
peaks278 = pd.read_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-278-265m-trap_DN50_totals.csv")
peaks273 = pd.read_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-273-965m-trap_DN50_totals.csv")


frames = [peaks231, peaks233, peaks243, peaks378, peaks278, peaks273]

# concatenate dataframes
peakstot = pd.concat(frames, sort=False)
del peakstot['Unnamed: 0']

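# add a column of sample labels (same order as frames)
sample = ['100 m sus','265 m sus','965 m sus', '100 m trap', '265 m trap', '965 m trap']

peakstot['Sample'] = sample

# note: set_index returns a new dataframe; the result is not assigned, so peakstot is unchanged
peakstot.set_index('Sample')

# move the Sample column to the front
col_name="Sample"
first_col = peakstot.pop(col_name)
peakstot.insert(0, col_name, first_col)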

# write to a csv
peakstot.to_csv("/home/millieginty/Documents/git-repos/2017-etnp/data/pro2020/ETNP-SKQ17/PEAKS-PTMopt/ETNP-SKQ17-combine_totals.csv")

#look at the dataframe
peakstot.head(6)

| | Sample | A | C | D | E | F | G | H | I | K | ... | k-meth | r-meth | % C w/ carb. | % M w/ oxid | % N w/ deam | % Q w/ deam | % K w/ hydr | % P w/ hydr | % K w/ meth | % R w/ meth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 m sus | 3927 | 626 | 2010 | 2762 | 1551 | 2101 | 1048 | 0 | 5403 | ... | 796 | 1047 | 1.0 | 0.43424 | 0.202742 | 0.029278 | 0.135665 | 0.389918 | 0.147326 | 0.287637 |
| 0 | 265 m sus | 2930 | 571 | 1496 | 2010 | 1332 | 1596 | 1007 | 0 | 4594 | ... | 735 | 1064 | 1.0 | 0.386171 | 0.192341 | 0.024651 | 0.142795 | 0.35241 | 0.159991 | 0.339178 |
| 0 | 965 m sus | 986 | 232 | 527 | 683 | 456 | 521 | 446 | 0 | 1804 | ... | 359 | 535 | 1.0 | 0.397866 | 0.198492 | 0.034335 | 0.20898 | 0.374486 | 0.199002 | 0.429719 |
| 0 | 100 m trap | 1102 | 231 | 651 | 1071 | 266 | 546 | 234 | 0 | 1434 | ... | 258 | 302 | 1.0 | 0.49505 | 0.255269 | 0.052747 | 0.154114 | 0.38725 | 0.179916 | 0.342792 |
| 0 | 265 m trap | 5491 | 1881 | 5153 | 5959 | 2792 | 4386 | 2693 | 0 | 10971 | ... | 1783 | 2960 | 1.0 | 0.409785 | 0.233269 | 0.026387 | 0.166165 | 0.39942 | 0.162519 | 0.491449 |
| 0 | 965 m trap | 4262 | 1332 | 3400 | 4505 | 2686 | 2827 | 2200 | 0 | 9404 | ... | 1433 | 2434 | 1.0 | 0.368797 | 0.217347 | 0.02902 | 0.163654 | 0.387119 | 0.152382 | 0.482649 |
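
With the combined table in hand, one possible first look at the PTM trends is a grouped bar chart per sample. The sketch below is only an illustration: it hard-codes two of the PTM columns with the values shown in the table above rather than reading the CSV, and it assumes the `% ...` columns hold fractions between 0 and 1 (as the `1.0` in `% C w/ carb.` suggests). The variable name `ptm` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the combined table above (two of the PTM columns).
ptm = pd.DataFrame(
    {
        "Sample": ["100 m sus", "265 m sus", "965 m sus",
                   "100 m trap", "265 m trap", "965 m trap"],
        "% M w/ oxid": [0.43424, 0.386171, 0.397866, 0.49505, 0.409785, 0.368797],
        "% N w/ deam": [0.202742, 0.192341, 0.198492, 0.255269, 0.233269, 0.217347],
    }
).set_index("Sample")

# One group of bars per sample, one bar per PTM column.
ax = ptm.plot(kind="bar", rot=45, figsize=(7, 4))
ax.set_xlabel("")
# Assumption: the stored values are fractions, not percentages.
ax.set_ylabel("Fraction of residues modified")
ax.legend(title="PTM column")
plt.tight_layout()
```

The same pattern should extend to the other `% ...` columns, or to the amino acid count columns once they are normalized by each sample's total.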
