## `A Jupyter notebook for SIRIUS and FBMN integration.`

#### `1) Load the feature matrix and annotate it with scan numbers (used to match SIRIUS, spectral match and GNPS annotations)`
This notebook requires a .graphml file generated from FBMN. Once the FBMN job (run with the GNPSexport files) has finished, save the graphml file under the directory results/GNPSexport and run the following cells to add information to the Cytoscape file:

In [None]:
# Integrating into Graphml
import requests
import pandas as pd
import networkx as nx
import glob
import os
import sys
from pyteomics import mgf, auxiliary
from src.export_feature_matrix import export

Load feature matrix.

In [None]:
matrix= pd.read_csv(os.path.join("results", "interim", "FeatureMatrix.tsv"), sep="\t")
matrix["id"]= matrix["id"].astype(str)
matrix

Convert the MGF file into a DataFrame to match the feature IDs with scan numbers, then annotate the feature matrix with those scan numbers.

In [None]:
path= os.path.join("results", "GNPSexport", "MSMS.mgf")
file= mgf.MGF(source=path, use_header=True, convert_arrays=2, read_charges=True, read_ions=False, dtype=None, encoding=None)
parameters=[]
for spectrum in file:
    parameters.append(spectrum['params'])
mgf_file= pd.DataFrame(parameters)
mgf_file["feature_id"]= mgf_file["feature_id"].str.replace(r"e_", "")


# display(mgf_file)

matrix["SCANS"] = ""
for i, id in zip(matrix.index, matrix["id"]):
    hits = []
    for scan, feature_id in zip(mgf_file["scans"], mgf_file["feature_id"]): 
        if feature_id==id:
            hit = f"{scan}"
            if hit not in hits:
                hits.append(hit)
    matrix.loc[i, "SCANS"] = " ## ".join(hits)

# display(matrix)
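
The nested loop above compares every feature with every spectrum. As a minimal sketch of the same idea, the mapping can also be built in a single pass with a dictionary. The helper name `scans_by_feature` and the toy lists are assumptions for illustration; in the notebook the inputs would be `mgf_file["scans"]` and `mgf_file["feature_id"]`.

```python
from collections import defaultdict

def scans_by_feature(scans, feature_ids, sep=" ## "):
    """Map each feature ID to its unique scan numbers, joined with `sep`."""
    grouped = defaultdict(list)
    for scan, feature_id in zip(scans, feature_ids):
        scan = str(scan)
        # Keep each scan once per feature, preserving first-seen order
        if scan not in grouped[feature_id]:
            grouped[feature_id].append(scan)
    return {fid: sep.join(hits) for fid, hits in grouped.items()}

# Toy data (assumed, not taken from a real MGF file)
toy_scans = ["1", "2", "3", "3"]
toy_feature_ids = ["100", "200", "100", "100"]
mapping = scans_by_feature(toy_scans, toy_feature_ids)
```

Under these assumptions, the result could be applied to the feature matrix with something like `matrix["SCANS"] = matrix["id"].map(mapping).fillna("")`.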

#### `2) SIRIUS, CSI:FingerID, CANOPUS and in-house Spectral Matching integration to GraphML file`
Add the SIRIUS, CSI:FingerID, CANOPUS and in-house spectral matching information to the graphml file from FBMN. Note that the cell below looks for `.graphml` files in the `resources` directory, so the file has to be available there (or the path in the cell adjusted):

In [None]:
file_list= glob.glob(os.path.join("resources", "*.graphml"))

for file in file_list:
    G = nx.read_graphml(file)
    for i, row in matrix.iterrows():
        # Trim the padding left by the " ## " separator so scans match node IDs
        scans = [s.strip() for s in row["SCANS"].split("#") if s.strip()]
        if scans:
            for term in ["SIRIUS_molecularFormula",
                         "SIRIUS_explainedIntensity",
                         "CSI:FingerID_molecularFormula",
                         "CSI:FingerID_name",
                         "CSI:FingerID_InChI",
                         "CSI:FingerID_smiles",
                         "CANOPUS_pathway",
                         "CANOPUS_superclass",
                         "CANOPUS_class",
                         "CANOPUS_most specific class",
                         "SpectralMatch",
                         "SpectralMatch_smiles"]:
                for col in [col for col in matrix.columns if col.endswith(term)]:
                    if not pd.isna(row[col]):
                        for scan in scans:
                            if scan in G.nodes:
                                G.nodes[scan][col] = str(row[col])

    nx.write_graphml(G, os.path.join("results", "FBMN_SIRIUS-CSI-CANOPUS_SpectralMatches.graphml"))
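
The `SCANS` values are written in step 1 with the separator ` ## `, so splitting on `#` yields empty and space-padded pieces that have to be dropped or trimmed before they can match node IDs in the graph. A small sketch of that parsing, with a hypothetical helper name `split_scans`:

```python
def split_scans(field, sep="#"):
    """Split a SCANS field into clean scan strings, skipping empty pieces."""
    return [part.strip() for part in str(field).split(sep) if part.strip()]

single = split_scans("12")          # one scan, no separator
multiple = split_scans("12 ## 13")  # two scans joined with " ## "
empty = split_scans("")             # feature without any scan
```

Without the trimming step, a field such as `"12 ## 13"` would produce `"12 "` and `" 13"`, which do not equal the node IDs `"12"` and `"13"`.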

#### `3) Add GNPS library hits to the feature matrix`

This step is optional: it is intended for users who do not have an MGF file downloaded, or who want to add complementary MS/MS library matches. It requires the library results `.tsv` file (downloadable from the GNPS results dashboard). Move the `.tsv` table into the `resources` directory.

In [None]:
file_list = glob.glob(os.path.join("resources", "*.tsv"))

for file in file_list:
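    df = pd.read_csv(file, sep="\t")
    # Discard library hits acquired in negative ion mode
    df.drop(df.index[df['IonMode'] == "negative"], inplace=True)
    # df.drop(df.index[df['MZErrorPPM'] > 10.0], inplace=True)
    # Keep one row per compound; if several .tsv files are present, the last one read is used
    GNPS = df.drop_duplicates(subset="Compound_Name", keep='first')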
GNPS.head()

Add the GNPS matches to the matrix:

In [None]:
gnps = []
for i, row in matrix.iterrows():
    # Trim the padding left by the " ## " separator used in the SCANS column
    scans = [s.strip() for s in row["SCANS"].split("#") if s.strip()]
    hits = []
    for scan in scans:
        # Collect the compound names of all library hits for this scan
        hits.extend(GNPS[GNPS["#Scan#"] == int(scan)]["Compound_Name"].tolist())
    gnps.append("##".join(hits))
matrix["GNPS"] = gnps

export(matrix)

matrix
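
The lookup above filters the GNPS table once per scan. As a rough sketch of an alternative, the hits can first be indexed by scan number and then looked up directly. The helper names `index_hits` and `annotate` and the toy hits are assumptions for illustration; in the notebook the pairs would come from the `#Scan#` and `Compound_Name` columns.

```python
from collections import defaultdict

def index_hits(rows):
    """Group compound names by scan number from (scan, name) pairs."""
    index = defaultdict(list)
    for scan, name in rows:
        index[int(scan)].append(name)
    return index

def annotate(scans_field, index, sep="##"):
    """Join the compound names of all scans listed in a SCANS field."""
    scans = [s.strip() for s in scans_field.split("#") if s.strip()]
    names = [name for scan in scans for name in index.get(int(scan), [])]
    return sep.join(names)

# Toy library hits (assumed, not real GNPS results)
toy_hits = [(12, "Caffeine"), (13, "Theobromine"), (40, "Adenosine")]
toy_index = index_hits(toy_hits)
annotation = annotate("12 ## 13", toy_index)
no_match = annotate("99", toy_index)
```

Scans without a library hit contribute nothing, so features without any match end up with an empty string rather than a run of separators.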