# pgfinder End-to-end Demo

## Introduction
This notebook demonstrates running pgfinder end-to-end using data held in the [pgfinder repository on GitHub](https://github.com/Mesnage-Org/Mass-Spec-pgfinder-Analysis/). It runs against the latest release of pgfinder; see the [release notes](https://github.com/Mesnage-Org/pgfinder/releases) for what that release contains.

## How to run it:
If you're running this via `mybinder`, open the `Cell` menu and select `Run All`.

Import required libraries:

In [1]:
from ipysheet import from_dataframe
import pandas as pd
import pgfinder.matching as matching
import pgfinder.pgio as pgio

In [2]:
masses_file_name = "data/masses/e_coli_monomer_masses.csv"
mq_filepath = "data/maxquant_test_data.txt"
output_file_name = "eg_output.csv"
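
Before running the remaining cells, it can help to confirm that the two input files are where the notebook expects them. The following is a minimal sketch using only the standard library; `missing_inputs` is a hypothetical helper for illustration, not part of pgfinder, and the paths are assumed to be relative to the notebook's working directory.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Paths copied from the cell above.
inputs = [
    "data/masses/e_coli_monomer_masses.csv",
    "data/maxquant_test_data.txt",
]

missing = missing_inputs(inputs)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```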

Read and display the monoisotopic mass list (first 10 rows):

In [5]:
theo_masses = pgio.theo_masses_reader(masses_file_name)
display(from_dataframe(theo_masses.head(10)))

[Fri, 27 May 2022 12:07:00] [INFO    ] [pgfinder] Theoretical masses loaded from      : data/masses/e_coli_monomer_masses.csv


Unnamed: 0,Structure,Monoisotopicmass
0,gm|0,498.2061
1,gm (x2)|0,976.386
2,gm (x3)|0,1454.5659
3,gm (x4)|0,1932.7458
4,gm (x5)|0,2410.9257
5,gm (x6)|0,2889.1056
6,gm (x7)|0,3367.2855
7,gm-AE|1,698.2858
8,gm-AEJ|1,870.3706
9,gm-AEJA|1,941.4077
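
The first seven rows form a regular series: each additional `gm` unit adds the same mass. A quick way to sanity-check the loaded values is to difference consecutive rows, as in the sketch below. The numbers are copied by hand from the table above rather than read from `theo_masses`, so the block runs on its own.

```python
# Monoisotopic masses of gm, gm (x2), ..., gm (x7), copied from the table above.
gm_masses = [498.2061, 976.386, 1454.5659, 1932.7458, 2410.9257, 2889.1056, 3367.2855]

# Difference between each consecutive pair, rounded to the table's 4 decimal places.
increments = [round(b - a, 4) for a, b in zip(gm_masses, gm_masses[1:])]

print(increments)
```

Every increment comes out as 478.1799, so a value that broke this pattern would point to a problem in the mass list.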


Read and display the mass spectrometry data (first 10 rows):

In [6]:
raw_data = pgio.ms_file_reader(mq_filepath)
display(from_dataframe(raw_data.head(10)))

[Fri, 27 May 2022 12:09:51] [INFO    ] [pgfinder] Mass spectroscopy file loaded from : data/maxquant_test_data.txt


Sheet(cells=(Cell(column_end=0, column_start=0, numeric_format='0[.]0', row_end=9, row_start=0, squeeze_row=Fa…

Run matching:

In [7]:
# Modifications to consider during matching
mod_test = ['Sodium', 'Nude', 'DeAc']
results = matching.data_analysis(raw_data, theo_masses, 0.5, mod_test, 10)

[Fri, 27 May 2022 12:09:54] [INFO    ] [pgfinder] Filtering theoretical masses by observed masses


[Fri, 27 May 2022 12:09:57] [INFO    ] [pgfinder] Building custom search file


[Fri, 27 May 2022 12:09:57] [INFO    ] [pgfinder] Generating variants


[Fri, 27 May 2022 12:09:57] [INFO    ] [pgfinder] Matching


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] Cleaning data


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] Processing 182 Sodium Adducts


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 2695 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6966 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 3264 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 3264 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 3264 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 3390 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 3390 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 6377 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] drop idx: 3026 has already been removed


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] No ^K+ found


[Fri, 27 May 2022 12:10:00] [INFO    ] [pgfinder] No ^m found
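
The `data_analysis` call above passes two bare numbers, `0.5` and `10`, and this notebook does not say what they mean. The sketch below assumes the last one is a mass tolerance in parts per million (ppm) and illustrates what matching within such a tolerance involves. It is not pgfinder's implementation: `ppm_error` and `match_within_ppm` are hypothetical helpers, and the observed mass is made up for illustration.

```python
def ppm_error(observed, theoretical):
    """Signed mass error of an observed mass relative to a theoretical mass, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def match_within_ppm(observed, candidates, tolerance_ppm):
    """Return the (name, mass) candidates whose mass lies within tolerance_ppm of observed."""
    return [
        (name, mass)
        for name, mass in candidates
        if abs(ppm_error(observed, mass)) <= tolerance_ppm
    ]


# A few theoretical masses from the mass list shown earlier in this notebook.
candidates = [
    ("gm|0", 498.2061),
    ("gm-AEJ|1", 870.3706),
    ("gm-AEJA|1", 941.4077),
]

# Hypothetical observed mass, chosen only for illustration.
observed = 941.4100

print(match_within_ppm(observed, candidates, tolerance_ppm=10))
```

Because the tolerance is relative, the absolute window widens with mass: 10 ppm is about 0.005 at a mass of 498 and about 0.009 at a mass of 941.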


Display the results (first 10 rows):

In [9]:
display(from_dataframe(results.head(10)))

Unnamed: 0,ID,rt,rt_length,mwMonoisotopic,theo_mwMonoisotopic,inferredStructure,maxIntensity
2816,2816,11.38,35.81,963.38822,963.3896,Na+ gm-AEJA|1,20043.0
2955,2955,17.497,12.875,963.38796,963.3896,Na+ gm-AEJA|1,6282.6
3626,3626,16.502,14.596,1154.4543,1154.4521,Na+ gm-AEJFD|1,105260.0
8285,8285,16.502,6.129,1154.4624,1154.4521,Na+ gm-AEJFD|1,7594.0
3744,3744,19.998,13.697,1154.4547,1154.4521,Na+ gm-AEJFD|1,23646.0
3711,3711,19.103,9.141,1191.5295,1191.5205,Na+ gm-AEJIW|1,17434.0
8156,8156,15.26,36.38,991.42483,"991.4252,991.4248","Na+ gm-AEJV|1,gm-AEJY (Deacetyl) |1",628180.0
2857,2857,15.26,19.878,991.42425,"991.4252,991.4248","Na+ gm-AEJV|1,gm-AEJY (Deacetyl) |1",347940.0
3430,3430,9.099,11.976,1183.5189,"1183.5121,1183.5122","Na+ gm-AEJYE|1,Na+ gm-AEJYK|1",396600.0
2375,2375,4.741,5.399,934.37543,934.3755,gm (x2) (Deacetyl) |0,6300.1
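
The `inferredStructure` labels carry the modification alongside the base structure (`Na+ ...`, `... (Deacetyl)`). Comparing a modified entry's `theo_mwMonoisotopic` with the unmodified mass from the mass list shows the mass shift each label implies. The sketch below does this arithmetic for two rows; the values are copied by hand from the two tables in this notebook, and the pairing of modified and unmodified names is inferred from the labels.

```python
# Theoretical masses of unmodified structures, from the mass list shown earlier.
unmodified = {"gm-AEJA|1": 941.4077, "gm (x2)|0": 976.386}

# theo_mwMonoisotopic of modified structures, from the results table above,
# each paired with the unmodified structure it appears to derive from.
modified = {
    "Na+ gm-AEJA|1": ("gm-AEJA|1", 963.3896),
    "gm (x2) (Deacetyl) |0": ("gm (x2)|0", 934.3755),
}

# Mass shift implied by each modification, rounded to the tables' 4 decimal places.
shifts = {
    name: round(mass - unmodified[base], 4)
    for name, (base, mass) in modified.items()
}

for name, shift in shifts.items():
    print(f"{name}: {shift:+.4f}")
```

The sodium entry sits 21.9819 above its unmodified form, consistent with a sodium replacing a proton. The deacetylated entry sits 42.0105 below, consistent with the loss of an acetyl group.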


Save output:

In [None]:
results.to_csv(output_file_name)
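
To check the saved file, it can be read back and summarised. The sketch below uses only the standard library and counts rows per `inferredStructure` value; `count_structures` is a hypothetical helper. So that the block runs without pgfinder, it writes a small stand-in file with three rows and a subset of the columns from the results table above instead of reading `eg_output.csv`.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def count_structures(csv_path):
    """Count rows per inferredStructure value in a results CSV."""
    with open(csv_path, newline="") as handle:
        return Counter(row["inferredStructure"] for row in csv.DictReader(handle))


# Stand-in for eg_output.csv: three rows copied from the results table above.
sample = (
    "ID,rt,inferredStructure\n"
    "2816,11.38,Na+ gm-AEJA|1\n"
    "2955,17.497,Na+ gm-AEJA|1\n"
    '8156,15.26,"Na+ gm-AEJV|1,gm-AEJY (Deacetyl) |1"\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample_output.csv"
    path.write_text(sample)
    counts = count_structures(path)

print(counts.most_common())
```

Rows with several candidate structures keep them in one quoted, comma-separated field, so the `csv` module returns them as a single value rather than splitting them.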