## Objective:
Manipulate the Cytoscape (Shannon et al. 2003) CSV output of GNPS molecular networking (Wang et al. 2016) to identify ions that are unique to a single sample group and detected in all three of its replicates. These ions then need to be classified as true positives, false positives or contamination by extracting the EIC (extracted ion chromatogram).
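
EIC extraction itself is not carried out in this notebook. As a minimal sketch of the idea, assuming the scans have already been loaded as `(retention time, m/z array, intensity array)` tuples, the intensity inside a narrow m/z window can be summed scan by scan. The function name `extract_eic`, the `ppm_tol` parameter and the toy scans are hypothetical and only serve as illustration.

```python
import numpy as np

def extract_eic(scans, target_mz, ppm_tol=10.0):
    """Sum the intensity within +/- ppm_tol of target_mz for every scan.

    scans: iterable of (rt, mz_array, intensity_array) tuples (assumed layout).
    Returns two arrays: retention times and summed intensities.
    """
    half_width = target_mz * ppm_tol * 1e-6  # ppm tolerance expressed in m/z units
    rts, intensities = [], []
    for rt, mz_values, intensity_values in scans:
        mz_values = np.asarray(mz_values, dtype=float)
        intensity_values = np.asarray(intensity_values, dtype=float)
        in_window = np.abs(mz_values - target_mz) <= half_width
        rts.append(rt)
        intensities.append(intensity_values[in_window].sum())
    return np.array(rts), np.array(intensities)

# Toy scans (made-up values): only the first two contain a peak near m/z 500.
toy_scans = [
    (10.0, [500.000, 740.655], [100.0, 0.0]),
    (11.0, [500.001, 740.655], [250.0, 40.0]),
    (12.0, [499.900, 740.655], [999.0, 0.0]),
]
rt, eic = extract_eic(toy_scans, target_mz=500.0, ppm_tol=10.0)
```

Comparing such a trace between a sample group and the control is one way the true positive, false positive or contamination call could be supported.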

## References
Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., ... & Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498-2504.

Wang, M., Carver, J. J., Phelan, V. V., Sanchez, L. M., Garg, N., Peng, Y., ... & Bandeira, N. (2016). Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature Biotechnology, 34(8), 828-837.

In [2]:
import os
print(os.getcwd())
import csv
import pandas as pd
import numpy as np

/Users/vincentn/Documents/PhD/Heterolgous_expression/Heterologous_expression_lassos_SMM_agar


# Functions to manipulate Cytoscape CSV output

In [4]:
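def ImportAndStandardiseCSV(input_file):
    """Read a Cytoscape node-table CSV from the working directory and keep five renamed columns."""
    # Build the path portably instead of concatenating strings with "/".
    PosMS_df = pd.read_csv(os.path.join(os.getcwd(), input_file))
    # Keep only the columns of interest; .copy() makes the result independent of PosMS_df.
    PosMS_df2 = PosMS_df[["ATTRIBUTE_SampleGroup","ATTRIBUTE_SampleName","RTMean",
                      "precursor mass","ATTRIBUTE_NP-Class"]].copy()
    # Replace the GNPS/Cytoscape attribute names with shorter, readable ones (same order).
    PosMS_df2.columns = ['Sample Group','Triplicates','Mean RT/s','m/z','NP Class']
    return PosMS_df2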

In [5]:
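def MakeSingleGroupDf(input_df):
    """Keep rows whose Sample Group is exactly one name in all_cluster_list and add a sorted replicate list."""
    # .copy() makes the filtered frame independent, so adding a column does not raise SettingWithCopyWarning.
    single_group_df = input_df[input_df["Sample Group"].isin(all_cluster_list)].copy()
    # Sort the comma-separated replicate names, e.g. "P12_3,P12_2,P12_1" becomes "P12_1,P12_2,P12_3".
    single_group_df["sorted_Triplicates"] = single_group_df["Triplicates"].str.split(',').apply(sorted).str.join(',').str.strip(',')
    return single_group_df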

In [6]:
def MakeUniqueIonDf(single_group_df):
    """Keep ions detected in all three replicates of their sample group, sorted by group."""
    unique_ion_df = single_group_df[single_group_df["sorted_Triplicates"].isin(all_nested_triplicates_list)]
    unique_ion_df = unique_ion_df.sort_values("Sample Group")
    return unique_ion_df
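
To make the filtering logic of the functions above easier to follow, here is a small self-contained sketch that applies the same three steps to a toy table. The table contents and the names `toy_df`, `toy_groups` and `toy_complete_triplicates` are made up for illustration and are not taken from the real Cytoscape export.

```python
import pandas as pd

# Toy stand-in for the standardised Cytoscape table (hypothetical values).
toy_df = pd.DataFrame({
    "Sample Group": ["P5", "P5", "P10", "P5,P6"],
    "Triplicates": ["P5_3,P5_1,P5_2", "P5_2", "P10_2,P10_3,P10_1", "P5_1,P6_1,P6_2"],
})
toy_groups = ["P5", "P10"]
toy_complete_triplicates = ["P5_1,P5_2,P5_3", "P10_1,P10_2,P10_3"]

# Step 1: keep rows assigned to exactly one sample group ("P5,P6" is dropped).
single = toy_df[toy_df["Sample Group"].isin(toy_groups)].copy()

# Step 2: sort the replicate names so their order in the export does not matter.
single["sorted_Triplicates"] = (
    single["Triplicates"].str.split(",").apply(sorted).str.join(",")
)

# Step 3: keep rows whose sorted replicate string is a complete triplicate.
unique = single[single["sorted_Triplicates"].isin(toy_complete_triplicates)]
print(unique)
```

Only the first and third toy rows survive: the second row is missing two replicates and the fourth belongs to more than one sample group.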


In [7]:
all_cluster_list = ["P5","P6","P7","P8","P9","P10","P11","P12","P13","P14","P15","poJ"]
# One entry per replicate; P9 and P10 are listed individually and P14_3 is included.
all_triplicates_list = ["P5_1","P5_2","P5_3","P6_1","P6_2","P6_3","P7_1","P7_2","P7_3",
                    "P8_1","P8_2","P8_3","P9_1","P9_2","P9_3","P10_1","P10_2","P10_3",
                    "P11_1","P11_2","P11_3","P12_1","P12_2","P12_3",
                    "P13_1","P13_2","P13_3","P14_1","P14_2","P14_3","P15_1","P15_2","P15_3",
                    "poJ_1","poJ_2","poJ_3"]
all_nested_triplicates_list = ["P5_1,P5_2,P5_3","P6_1,P6_2,P6_3","P7_1,P7_2,P7_3",
                               "P8_1,P8_2,P8_3","P9_1,P9_2,P9_3","P10_1,P10_2,P10_3",
                               "P11_1,P11_2,P11_3","P12_1,P12_2,P12_3","P13_1,P13_2,P13_3",
                           "P14_1,P14_2,P14_3","P15_1,P15_2,P15_3","poJ_1,poJ_2,poJ_3"]

## 20220905and20220927_Del14_all_lassos_repeat

In [29]:
PosMS_Del14_all_df = ImportAndStandardiseCSV("20220905and20220927_Del14_all_lassos_sample_groups.csv")
PosMS_Del14_all_df_single_group = MakeSingleGroupDf(PosMS_Del14_all_df)
PosMS_Del14_all_df_unique_ion = MakeUniqueIonDf(PosMS_Del14_all_df_single_group)
PosMS_Del14_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
1692,P11,"P11_1,P11_2,P11_3",578.002333,319.119,Lassopeptide,"P11_1,P11_2,P11_3"
2639,P11,"P11_1,P11_2,P11_3",577.907645,366.175,Lassopeptide,"P11_1,P11_2,P11_3"
813,P11,"P11_1,P11_2,P11_3",1376.075,585.254,Lassopeptide,"P11_1,P11_2,P11_3"
1229,P11,"P11_1,P11_2,P11_3",660.884333,225.142,Lassopeptide,"P11_1,P11_2,P11_3"
2455,P11,"P11_1,P11_2,P11_3",457.510471,337.129,Lassopeptide,"P11_1,P11_2,P11_3"
1408,P11,"P11_1,P11_2,P11_3",462.3745,289.11,Lassopeptide,"P11_1,P11_2,P11_3"
2009,P12,"P12_3,P12_2,P12_1",1457.012,766.67,Lassopeptide,"P12_1,P12_2,P12_3"
2658,P12,"P12_3,P12_2,P12_1",1461.216,741.659,Lassopeptide,"P12_1,P12_2,P12_3"
2560,P13,"P13_1,P13_3,P13_2",379.1724,380.08,Lassopeptide,"P13_1,P13_2,P13_3"
2666,P14,"P14_2,P14_1,P14_3",1401.953333,740.655,Lassopeptide,"P14_1,P14_2,P14_3"


In [30]:
print(len(PosMS_Del14_all_df))
print(len(PosMS_Del14_all_df[PosMS_Del14_all_df["m/z"] > 500]))
print(len(PosMS_Del14_all_df[PosMS_Del14_all_df["m/z"] > 1000]))

2908
1317
76
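
The three numbers printed above are the total number of rows and the numbers of rows with m/z above 500 and above 1000. Because the same three `print` calls are repeated for every dataset in this notebook, a small helper could return the counts in one call. This is only a sketch: the function name `count_above_mz` and its parameters are hypothetical, and the toy m/z values are illustrative.

```python
import pandas as pd

def count_above_mz(df, thresholds=(500, 1000), mz_col="m/z"):
    """Return the total row count and the number of rows above each m/z threshold."""
    counts = {"all": len(df)}
    for threshold in thresholds:
        counts[f"m/z > {threshold}"] = int((df[mz_col] > threshold).sum())
    return counts

# Toy table with four ions (illustrative values only).
toy = pd.DataFrame({"m/z": [319.119, 585.254, 766.67, 1272.46]})
print(count_above_mz(toy))
# → {'all': 4, 'm/z > 500': 3, 'm/z > 1000': 1}
```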


In [31]:
PosMS_Del14_all_df_single_group[PosMS_Del14_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
2188,P11,P11_2,490.41925,1272.46,Lassopeptide,P11_2
949,P12,P12_1,1481.78,1159.97,Lassopeptide,P12_1
1484,P12,P12_1,1479.895,1159.97,Lassopeptide,P12_1
2894,P13,P13_2,1452.48,1813.44,Lassopeptide,P13_2
2371,P15,"P15_2,P15_1",741.6185,1139.4,Lassopeptide,"P15_1,P15_2"
633,P5,P5_2,1344.865,1216.04,Lassopeptide,P5_2
699,P5,P5_2,1327.905,1216.04,Lassopeptide,P5_2
2377,P5,P5_2,1359.075,1216.04,Lassopeptide,P5_2
1511,P6,P6_2,1294.82,1211.78,Lassopeptide,P6_2
2072,P7,P7_1,1485.855,1188.0,Lassopeptide,P7_1


## 20221021_Del14_all_lassos_neg_mzML

In [32]:
NegMS_Del14_all_df = ImportAndStandardiseCSV("20221021_Del14_all_lassos_neg_sample_groups.csv")
NegMS_Del14_all_df_single_group = MakeSingleGroupDf(NegMS_Del14_all_df)
NegMS_Del14_all_df_unique_ion = MakeUniqueIonDf(NegMS_Del14_all_df_single_group)
NegMS_Del14_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
955,P10,"P10_2,P10_3,P10_1",277.62,401.163,Lassopeptide,"P10_1,P10_2,P10_3"
1336,P11,"P11_2,P11_3,P11_1",390.204833,433.085,Lassopeptide,"P11_1,P11_2,P11_3"
100,P13,"P13_3,P13_1,P13_2",673.4735,911.495,Lassopeptide,"P13_1,P13_2,P13_3"
950,P6,"P6_2,P6_3,P6_1",335.099357,721.392,Lassopeptide,"P6_1,P6_2,P6_3"
668,poJ,"poJ_3,poJ_2,poJ_1",1000.306667,321.205,Control,"poJ_1,poJ_2,poJ_3"
1185,poJ,"poJ_3,poJ_2,poJ_1",613.310857,362.167,Control,"poJ_1,poJ_2,poJ_3"


In [33]:
print(len(NegMS_Del14_all_df))
print(len(NegMS_Del14_all_df[NegMS_Del14_all_df["m/z"] > 500]))
print(len(NegMS_Del14_all_df[NegMS_Del14_all_df["m/z"] > 1000]))

1518
695
101


In [34]:
NegMS_Del14_all_df_single_group[NegMS_Del14_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
1244,P10,"P10_2,P10_1",381.167,1066.07,Lassopeptide,"P10_1,P10_2"
403,P10,"P10_2,P10_3",1116.015,1111.7,Lassopeptide,"P10_2,P10_3"
1189,P10,"P10_2,P10_1",550.838,1265.97,Lassopeptide,"P10_1,P10_2"
1152,P10,P10_3,223.2735,1134.47,Lassopeptide,P10_3
1430,P12,"P12_2,P12_3",273.994,1196.01,Lassopeptide,"P12_2,P12_3"
284,P13,"P13_3,P13_1",654.09,1055.5,Lassopeptide,"P13_1,P13_3"
742,P13,P13_3,1457.955,1065.92,Lassopeptide,P13_3
1030,P13,P13_3,654.397,1055.5,Lassopeptide,P13_3
909,P5,P5_3,257.7715,1089.46,Lassopeptide,P5_3
752,P8,"P8_2,P8_3",257.5005,1013.76,Lassopeptide,"P8_2,P8_3"


## 20220914and20220927_M1154_all_lassos

In [35]:
PosMS_M1154_all_df = ImportAndStandardiseCSV("20220914and20220927_M1154_all_lassos_sample_groups.csv")
PosMS_M1154_all_df_single_group = MakeSingleGroupDf(PosMS_M1154_all_df)
PosMS_M1154_all_df_unique_ion = MakeUniqueIonDf(PosMS_M1154_all_df_single_group)
PosMS_M1154_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
463,P10,"P10_1,P10_2,P10_3",446.61175,606.367,Lassopeptide,"P10_1,P10_2,P10_3"
821,P10,"P10_1,P10_2,P10_3",794.728,586.298,Lassopeptide,"P10_1,P10_2,P10_3"
728,P13,"P13_2,P13_3,P13_1",1003.393333,141.109,Lassopeptide,"P13_1,P13_2,P13_3"
205,P15,"P15_1,P15_2,P15_3",1437.3135,740.654,Lassopeptide,"P15_1,P15_2,P15_3"
667,P15,"P15_1,P15_2,P15_3",1444.97,740.654,Lassopeptide,"P15_1,P15_2,P15_3"
266,P7,"P7_2,P7_3,P7_1",320.4155,446.133,Lassopeptide,"P7_1,P7_2,P7_3"
322,P7,"P7_2,P7_3,P7_1",1453.026667,822.731,Lassopeptide,"P7_1,P7_2,P7_3"
844,P7,"P7_2,P7_3,P7_1",1343.633333,579.52,Lassopeptide,"P7_1,P7_2,P7_3"
137,P9,"P9_1,P9_3,P9_2",1427.8525,122.096,Lassopeptide,"P9_1,P9_2,P9_3"
992,P9,"P9_1,P9_3,P9_2",932.177,294.205,Lassopeptide,"P9_1,P9_2,P9_3"


In [36]:
print(len(PosMS_M1154_all_df))
print(len(PosMS_M1154_all_df[PosMS_M1154_all_df["m/z"] > 500]))
print(len(PosMS_M1154_all_df[PosMS_M1154_all_df["m/z"] > 1000]))

1355
582
100


In [37]:
PosMS_M1154_all_df_single_group[PosMS_M1154_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
598,P10,P10_3,1177.355,1070.09,Lassopeptide,P10_3
384,P10,P10_2,1138.51,1070.09,Lassopeptide,P10_2
12,P13,P13_1,1374.385,1293.35,Lassopeptide,P13_1
18,P14,P14_3,1428.715,1440.38,Lassopeptide,P14_3
102,P7,P7_1,1463.1175,1295.35,Lassopeptide,P7_1
145,P9,"P9_1,P9_2",274.387667,1302.47,Lassopeptide,"P9_1,P9_2"
693,P9,P9_3,272.582,1302.46,Lassopeptide,P9_3
697,P9,"P9_1,P9_3",273.199,1302.46,Lassopeptide,"P9_1,P9_3"
991,P9,"P9_3,P9_2",271.314,1302.47,Lassopeptide,"P9_2,P9_3"
1334,P9,P9_2,275.12,1302.47,Lassopeptide,P9_2


## 20221021_M1154_all_lassos

In [38]:
NegMS_M1154_all_df = ImportAndStandardiseCSV("20221021_M1154_all_lassos_neg_sample_groups.csv")
NegMS_M1154_all_df_single_group = MakeSingleGroupDf(NegMS_M1154_all_df)
NegMS_M1154_all_df_unique_ion = MakeUniqueIonDf(NegMS_M1154_all_df_single_group)
NegMS_M1154_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
88,P10,"P10_1,P10_3,P10_2",504.169583,361.092,Lassopeptide,"P10_1,P10_2,P10_3"
676,P10,"P10_1,P10_3,P10_2",285.994,1006.36,Lassopeptide,"P10_1,P10_2,P10_3"
602,P10,"P10_1,P10_3,P10_2",1115.592,1197.58,Lassopeptide,"P10_1,P10_2,P10_3"
1342,P10,"P10_1,P10_3,P10_2",308.104941,311.113,Lassopeptide,"P10_1,P10_2,P10_3"
408,P10,"P10_1,P10_3,P10_2",382.847125,378.079,Lassopeptide,"P10_1,P10_2,P10_3"
493,P10,"P10_1,P10_3,P10_2",1264.208125,384.933,Lassopeptide,"P10_1,P10_2,P10_3"
377,P10,"P10_1,P10_3,P10_2",1239.793333,649.508,Lassopeptide,"P10_1,P10_2,P10_3"
361,P10,"P10_1,P10_3,P10_2",178.0492,573.2,Lassopeptide,"P10_1,P10_2,P10_3"
240,P10,"P10_1,P10_3,P10_2",1359.390712,384.933,Lassopeptide,"P10_1,P10_2,P10_3"
190,P10,"P10_1,P10_3,P10_2",157.485588,456.116,Lassopeptide,"P10_1,P10_2,P10_3"


In [39]:
print(len(NegMS_M1154_all_df))
print(len(NegMS_M1154_all_df[NegMS_M1154_all_df["m/z"] > 500]))
print(len(NegMS_M1154_all_df[NegMS_M1154_all_df["m/z"] > 1000]))

1544
786
145


In [40]:
NegMS_M1154_all_df_single_group[NegMS_M1154_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
670,P10,P10_2,933.377,1063.05,Lassopeptide,P10_2
1001,P10,"P10_1,P10_3",1183.981667,1189.58,Lassopeptide,"P10_1,P10_3"
946,P10,P10_1,657.814,1033.98,Lassopeptide,P10_1
1467,P10,P10_1,947.7785,1063.55,Lassopeptide,P10_1
800,P10,"P10_1,P10_3",1115.93,1197.58,Lassopeptide,"P10_1,P10_3"
1136,P10,P10_1,884.0185,1099.06,Lassopeptide,P10_1
676,P10,"P10_1,P10_3,P10_2",285.994,1006.36,Lassopeptide,"P10_1,P10_2,P10_3"
1090,P10,"P10_1,P10_3",431.5275,1033.98,Lassopeptide,"P10_1,P10_3"
1031,P10,"P10_1,P10_3",180.3525,1033.98,Lassopeptide,"P10_1,P10_3"
1164,P10,P10_1,947.6855,1063.55,Lassopeptide,P10_1


## 20221001and20220927_M1152_all_lassos

In [36]:
PosMS_M1152_all_df = ImportAndStandardiseCSV("20221001and20220927_M1152_all_lassos_sample_groups.csv")
PosMS_M1152_all_df_single_group = MakeSingleGroupDf(PosMS_M1152_all_df)
PosMS_M1152_all_df_unique_ion = MakeUniqueIonDf(PosMS_M1152_all_df_single_group)
PosMS_M1152_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
304,P12,"P12_1,P12_3,P12_2",132.78725,316.187,Lassopeptide,"P12_1,P12_2,P12_3"
557,P12,"P12_1,P12_3,P12_2",131.597333,316.186,Lassopeptide,"P12_1,P12_2,P12_3"
992,P7,"P7_2,P7_1,P7_3",491.352667,183.101,Lassopeptide,"P7_1,P7_2,P7_3"
235,P9,"P9_3,P9_1,P9_2",1074.1775,307.202,Lassopeptide,"P9_1,P9_2,P9_3"
305,P9,"P9_3,P9_1,P9_2",1048.961,307.202,Lassopeptide,"P9_1,P9_2,P9_3"
525,P9,"P9_3,P9_1,P9_2",1301.8775,178.106,Lassopeptide,"P9_1,P9_2,P9_3"
797,P9,"P9_3,P9_1,P9_2",873.802167,132.101,Lassopeptide,"P9_1,P9_2,P9_3"
1056,P9,"P9_3,P9_1,P9_2",1011.9944,132.101,Lassopeptide,"P9_1,P9_2,P9_3"
1161,P9,"P9_3,P9_1,P9_2",898.8856,132.101,Lassopeptide,"P9_1,P9_2,P9_3"
1431,P9,"P9_3,P9_1,P9_2",1062.6656,307.202,Lassopeptide,"P9_1,P9_2,P9_3"


In [37]:
print(len(PosMS_M1152_all_df))
print(len(PosMS_M1152_all_df[PosMS_M1152_all_df["m/z"] > 500]))
print(len(PosMS_M1152_all_df[PosMS_M1152_all_df["m/z"] > 1000]))

1487
532
63


In [38]:
PosMS_M1152_all_df_single_group[PosMS_M1152_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
1150,P13,P13_3,1416.035,1515.44,Lassopeptide,P13_3
201,P15,P15_2,1424.045,1739.48,Lassopeptide,P15_2
298,P15,"P15_1,P15_2",274.0295,1302.47,Lassopeptide,"P15_1,P15_2"
769,P5,"P5_3,P5_2",1224.05,1081.07,Lassopeptide,"P5_2,P5_3"
206,P5,P5_1,1402.909091,1072.31,Lassopeptide,P5_1
1334,P5,P5_1,1411.006429,1076.3,Lassopeptide,P5_1
823,P5,P5_1,1403.645,1075.3,Lassopeptide,P5_1
1327,P5,P5_1,1406.49,1075.3,Lassopeptide,P5_1
142,P6,P6_1,1411.665417,1147.32,Lassopeptide,P6_1
336,P6,P6_1,1419.208889,1146.32,Lassopeptide,P6_1


## 20221001_M1152_all_lassos_neg

In [39]:
NegMS_M1152_all_df = ImportAndStandardiseCSV("20221001_M1152_all_lassos_neg_sample_groups.csv")
NegMS_M1152_all_df_single_group = MakeSingleGroupDf(NegMS_M1152_all_df)
NegMS_M1152_all_df_unique_ion = MakeUniqueIonDf(NegMS_M1152_all_df_single_group)
NegMS_M1152_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
254,P8,"P8_1,P8_3,P8_2",806.082667,1033.96,Lassopeptide,"P8_1,P8_2,P8_3"
791,poJ,"poJ_1,poJ_2,poJ_3",1155.03,1097.69,Control,"poJ_1,poJ_2,poJ_3"


In [40]:
print(len(NegMS_M1152_all_df))
print(len(NegMS_M1152_all_df[NegMS_M1152_all_df["m/z"] > 500]))
print(len(NegMS_M1152_all_df[NegMS_M1152_all_df["m/z"] > 1000]))

967
329
60


In [41]:
NegMS_M1152_all_df_single_group[NegMS_M1152_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
778,P11,P11_2,756.9435,1090.92,Lassopeptide,P11_2
130,P12,P12_1,239.9245,1037.33,Lassopeptide,P12_1
786,P13,P13_1,1428.095,1065.91,Lassopeptide,P13_1
471,P5,P5_2,202.8245,1052.05,Lassopeptide,P5_2
254,P8,"P8_1,P8_3,P8_2",806.082667,1033.96,Lassopeptide,"P8_1,P8_2,P8_3"
41,P9,P9_3,1428.99,1065.91,Lassopeptide,P9_3
691,P9,"P9_1,P9_2",1460.0,1065.91,Lassopeptide,"P9_1,P9_2"
461,poJ,"poJ_1,poJ_3",1146.7225,1097.69,Control,"poJ_1,poJ_3"
791,poJ,"poJ_1,poJ_2,poJ_3",1155.03,1097.69,Control,"poJ_1,poJ_2,poJ_3"
838,poJ,poJ_1,173.369,1048.01,Control,poJ_1


## 20221001_M1152_all_lassos_neg_MGFallpeaks

In [42]:
NegMS_M1152_all_df_MGFallpeaks = ImportAndStandardiseCSV("20221001_M1152_all_lassos_neg_MGFallpeaks_sample_groups.csv")
NegMS_M1152_all_df_single_group_MGFallpeaks = MakeSingleGroupDf(NegMS_M1152_all_df_MGFallpeaks)
NegMS_M1152_all_df_unique_ion_MGFallpeaks = MakeUniqueIonDf(NegMS_M1152_all_df_single_group_MGFallpeaks)
NegMS_M1152_all_df_unique_ion_MGFallpeaks


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
491,P8,"P8_2,P8_1,P8_3",48364.933333,1033.96,Lassopeptide,"P8_1,P8_2,P8_3"
529,poJ,"poJ_2,poJ_3,poJ_1",69301.666667,1097.69,Control,"poJ_1,poJ_2,poJ_3"


In [43]:
print(len(NegMS_M1152_all_df_MGFallpeaks))
print(len(NegMS_M1152_all_df_MGFallpeaks[NegMS_M1152_all_df_MGFallpeaks["m/z"] > 500]))
print(len(NegMS_M1152_all_df_MGFallpeaks[NegMS_M1152_all_df_MGFallpeaks["m/z"] > 1000]))

963
325
58


In [44]:
NegMS_M1152_all_df_single_group_MGFallpeaks[NegMS_M1152_all_df_single_group_MGFallpeaks["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
238,P11,P11_2,45416.616667,1090.92,Lassopeptide,P11_2
158,P12,P12_1,14395.5,1037.33,Lassopeptide,P12_1
363,P13,P13_1,85685.75,1065.91,Lassopeptide,P13_1
745,P5,P5_2,12169.5,1052.05,Lassopeptide,P5_2
491,P8,"P8_2,P8_1,P8_3",48364.933333,1033.96,Lassopeptide,"P8_1,P8_2,P8_3"
114,P9,P9_3,85739.4,1065.91,Lassopeptide,P9_3
792,P9,"P9_1,P9_2",87599.75,1065.91,Lassopeptide,"P9_1,P9_2"
299,poJ,poJ_1,10402.1,1048.01,Control,poJ_1
346,poJ,"poJ_3,poJ_1",68803.4,1097.69,Control,"poJ_1,poJ_3"
529,poJ,"poJ_2,poJ_3,poJ_1",69301.666667,1097.69,Control,"poJ_1,poJ_2,poJ_3"


## 20221021_TK24_all_lassos_pos_sample_groups

In [51]:
PosMS_TK24_all_df = ImportAndStandardiseCSV("20221021_TK24_all_lassos_pos_sample_groups.csv")
PosMS_TK24_all_df_single_group = MakeSingleGroupDf(PosMS_TK24_all_df)
PosMS_TK24_all_df_unique_ion = MakeUniqueIonDf(PosMS_TK24_all_df_single_group)
PosMS_TK24_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
885,P13,"P13_3,P13_2,P13_1",1257.9775,141.112,Lassopeptide,"P13_1,P13_2,P13_3"
662,P15,"P15_1,P15_3,P15_2",1028.953333,949.212,Lassopeptide,"P15_1,P15_2,P15_3"
1193,P9,"P9_2,P9_1,P9_3",1407.483333,670.469,Lassopeptide,"P9_1,P9_2,P9_3"


In [52]:
print(len(PosMS_TK24_all_df))
print(len(PosMS_TK24_all_df[PosMS_TK24_all_df["m/z"] > 500]))
print(len(PosMS_TK24_all_df[PosMS_TK24_all_df["m/z"] > 1000]))

1309
627
59


In [53]:
PosMS_TK24_all_df_single_group[PosMS_TK24_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
916,P12,P12_2,1420.66,1588.45,Lassopeptide,P12_2
1063,poJ,poJ_3,1479.145,1740.49,Control,poJ_3


## 20221021_TK24_all_lassos_neg_sample_groups

In [48]:
NegMS_TK24_all_df = ImportAndStandardiseCSV("20221021_TK24_all_lassos_neg_sample_groups.csv")
NegMS_TK24_all_df_single_group = MakeSingleGroupDf(NegMS_TK24_all_df)
NegMS_TK24_all_df_unique_ion = MakeUniqueIonDf(NegMS_TK24_all_df_single_group)
NegMS_TK24_all_df_unique_ion


Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
729,P11,"P11_2,P11_1,P11_3",306.638333,351.078,Lassopeptide,"P11_1,P11_2,P11_3"
15,P14,"P14_1,P14_3,P14_2",524.488,181.083,Lassopeptide,"P14_1,P14_2,P14_3"
187,poJ,"poJ_1,poJ_2,poJ_3",1442.879333,118.921,Control,"poJ_1,poJ_2,poJ_3"
294,poJ,"poJ_1,poJ_2,poJ_3",847.383333,394.887,Control,"poJ_1,poJ_2,poJ_3"
411,poJ,"poJ_1,poJ_2,poJ_3",677.011314,394.887,Control,"poJ_1,poJ_2,poJ_3"
551,poJ,"poJ_1,poJ_2,poJ_3",677.493167,394.887,Control,"poJ_1,poJ_2,poJ_3"


In [49]:
print(len(NegMS_TK24_all_df))
print(len(NegMS_TK24_all_df[NegMS_TK24_all_df["m/z"] > 500]))
print(len(NegMS_TK24_all_df[NegMS_TK24_all_df["m/z"] > 1000]))

821
364
62


In [50]:
NegMS_TK24_all_df_single_group[NegMS_TK24_all_df_single_group["m/z"] > 1000].sort_values("Sample Group")

Unnamed: 0,Sample Group,Triplicates,Mean RT/s,m/z,NP Class,sorted_Triplicates
103,P5,P5_1,408.486,1033.97,Lassopeptide,P5_1
132,P5,"P5_2,P5_1",286.3445,1033.97,Lassopeptide,"P5_1,P5_2"
581,P5,P5_1,619.1885,1033.97,Lassopeptide,P5_1
613,P5,P5_1,524.584,1033.97,Lassopeptide,P5_1
200,poJ,poJ_3,182.921,1033.97,Control,poJ_3
360,poJ,poJ_3,580.8235,1033.97,Control,poJ_3
481,poJ,poJ_3,1460.97,1033.97,Control,poJ_3
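
In the table above, m/z 1033.97 appears both in P5 and in the poJ control, which makes it a candidate for contamination rather than a product specific to P5. One possible way to flag such cases automatically is sketched below. The function name `flag_control_overlap`, the `in_control` column and the 0.01 m/z tolerance are assumptions made for illustration, and retention time is deliberately ignored.

```python
import numpy as np
import pandas as pd

def flag_control_overlap(df, control_group="poJ", mz_tol=0.01):
    """Add a boolean column marking ions whose m/z also occurs in the control group."""
    control_mz = df.loc[df["Sample Group"] == control_group, "m/z"].to_numpy()
    mz = df["m/z"].to_numpy()
    if control_mz.size == 0:
        in_control = np.zeros(len(df), dtype=bool)
    else:
        # Compare every ion against every control ion (rows x control ions).
        in_control = (np.abs(mz[:, None] - control_mz[None, :]) <= mz_tol).any(axis=1)
    out = df.copy()
    out["in_control"] = in_control
    return out

# Small example using m/z values that occur in the TK24 negative-mode tables.
toy = pd.DataFrame({
    "Sample Group": ["P5", "P14", "poJ"],
    "m/z": [1033.97, 181.083, 1033.97],
})
flagged = flag_control_overlap(toy)
print(flagged)
```

Control rows are flagged as well, since they trivially match themselves; filtering on `Sample Group` afterwards leaves only the sample ions shared with the control.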
