# Preprocessing for the pancreatic cancer model (Eduati 2020)

Model from the paper: doi: 10.15252/msb.209690 (link: https://www.ncbi.nlm.nih.gov/pubmed/32073727)

## Input file
The `Eduati2020.sbml` file was obtained with CellNOptR by converting the [SIF file](https://github.com/saezlab/ModelingMPS/blob/master/data/PKN_curated.sif) provided by the authors.

CellNOptR does not support converting a .sif file directly into a .txt file of Boolean rules, so I use the CoLoMoTo tools (bioLQM, GINsim and PyBoolNet) here to make the conversion from SBML.

(I also tried using GINsim to convert SBML-qual to .bnet text, but the resulting file did not look correct.)
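For readers without CellNOptR, the SIF-to-rules step can be roughly approximated in plain Python. The sketch below is not what CellNOptR does: it assumes a whitespace-separated three-column SIF (`source sign target`, with sign `1` or `-1`), OR-combines all regulators of each target, and keeps nodes that never appear as a target as self-looping inputs (`X = X`). It cannot express AND gates such as `AktM = PDPK1 & PIP3`, so its output would still need manual curation. `sif_to_rules` is a hypothetical helper.

```python
from collections import defaultdict


def sif_to_rules(sif_text):
    """Build one OR-combined 'target = rule' line per node from a 3-column SIF.

    Assumption: every non-empty line is 'source sign target', sign being 1 or -1.
    Nodes that are never a target are kept as inputs with a self-loop (X = X).
    """
    regulators = defaultdict(list)  # target -> list of literals, in file order
    nodes = []  # all nodes, in order of first appearance
    for line in sif_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        source, sign, target = fields
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)
        literal = source if int(sign) > 0 else f"! {source}"
        if literal not in regulators[target]:
            regulators[target].append(literal)
    rules = []
    for node in nodes:
        rhs = " | ".join(regulators[node]) if node in regulators else node
        rules.append(f"{node} = {rhs}")
    return "\n".join(rules)


print(sif_to_rules("EGF 1 EGFR\nEGFR 1 PI3K\nPTEN -1 PIP3\nPI3K 1 PIP3"))
```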

In [7]:
# Use the CoLoMoTo tools to convert between formats
# (run inside the CoLoMoTo Docker image, started with: colomoto-docker --bind .)
import biolqm
import ginsim
from pyboolnet.file_exchange import bnet2primes, primes2bnet
import re
from rpy2.robjects.packages import importr
boolnet = importr("BoolNet")



In [4]:
model_lqm = biolqm.load("input_files/Eduati2020.sbml")
model_lrg = biolqm.to_ginsim(model_lqm)
ginsim.show(model_lrg)

In [13]:
model_bnet = primes2bnet(biolqm.to_pyboolnet(model_lqm))
model_bnet

'EGF,          \nEGFR,         \nPDPK1,        \nPDPKa,        \nTNF,          \nTNFRs,        \n\nA20,          NFkB\nAPC,          Apaf1 & Cas9 & Mito | !cIAPs\nAkt,          AktM & AktP\nAktM,         PDPK1 & PIP3\nAktP,         PDPK1 & PIP3\nApaf1,        p53\nBAD,          p53 | !RSK | !Akt\nBID,          JNK & p53 | Cas8 & p53 | !BclX\nBclX,         !p53 | STAT | NFkB | !BAD\nCas12,        Cas7\nCas3,         !cIAPs | Cas8 | Cas6 | APC\nCas6,         !cIAPs | Cas3\nCas7,         !cIAPs | Cas8 | APC\nCas8,         complexIIB | complexIIA | Cas6\nCas9,         !cIAPs | !ERK | Cas3 | Cas12 | !Akt\nERK,          MEK\nIKKs,         complexI | PDPK1 | Akt | !A20\nIkB,          NFkB | !IKKs\nJAK,          EGFR\nJNK,          JNKK\nJNKK,         MEKK1 | !Akt\nMEK,          RAF | PDPK1\nMEKK1,        complexI | RAS\nMdm2,         p53 | Akt\nMito,         !BclX | BID\nNFkB,         NIK | !IkB\nNIK,          !cIAPs\nPI3K,         RAS | EGFR\nPIP3,         !PTEN | PI3K\nPTEN,         p53\nRA

In [14]:
# Replace ',' with '='
model_bnm = model_bnet.replace(',', '=')

# Save to a text file
with open("input_files/Eduati2020_raw.txt", "w") as file:
    file.write(model_bnm)

print(model_bnm)

EGF=          
EGFR=         
PDPK1=        
PDPKa=        
TNF=          
TNFRs=        

A20=          NFkB
APC=          Apaf1 & Cas9 & Mito | !cIAPs
Akt=          AktM & AktP
AktM=         PDPK1 & PIP3
AktP=         PDPK1 & PIP3
Apaf1=        p53
BAD=          p53 | !RSK | !Akt
BID=          JNK & p53 | Cas8 & p53 | !BclX
BclX=         !p53 | STAT | NFkB | !BAD
Cas12=        Cas7
Cas3=         !cIAPs | Cas8 | Cas6 | APC
Cas6=         !cIAPs | Cas3
Cas7=         !cIAPs | Cas8 | APC
Cas8=         complexIIB | complexIIA | Cas6
Cas9=         !cIAPs | !ERK | Cas3 | Cas12 | !Akt
ERK=          MEK
IKKs=         complexI | PDPK1 | Akt | !A20
IkB=          NFkB | !IKKs
JAK=          EGFR
JNK=          JNKK
JNKK=         MEKK1 | !Akt
MEK=          RAF | PDPK1
MEKK1=        complexI | RAS
Mdm2=         p53 | Akt
Mito=         !BclX | BID
NFkB=         NIK | !IkB
NIK=          !cIAPs
PI3K=         RAS | EGFR
PIP3=         !PTEN | PI3K
PTEN=         p53
RAF=          RAS
RAS=          SOS
RSK=
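`model_bnet.replace(',', '=')` rewrites every comma in the string, which is safe here only because the bnet rules contain no other commas. A slightly more defensive sketch (hypothetical helper `bnet_to_bnmpy`) splits each line at its first comma only, drops the column padding added by `primes2bnet`, and returns separately the nodes whose rule came out empty, such as EGF and EGFR above.

```python
def bnet_to_bnmpy(bnet_text):
    """Convert 'target, rule' lines into 'target = rule' lines.

    Returns (rules_text, empty_nodes); empty_nodes lists targets with no rule.
    """
    rules, empty = [], []
    for line in bnet_text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        target, _, rule = line.partition(",")  # split at the first comma only
        target, rule = target.strip(), rule.strip()  # drop column padding
        if rule:
            rules.append(f"{target} = {rule}")
        else:
            empty.append(target)  # needs a rule added by hand
    return "\n".join(rules), empty


rules, empty = bnet_to_bnmpy("EGF,          \nA20,          NFkB\n")
print(rules)  # A20 = NFkB
print(empty)  # ['EGF']
```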

I realized that the SBML file I obtained from CellNOptR contains some errors: the rules for several nodes (e.g., EGF and EGFR) were lost in the conversion. I therefore added them back manually and saved the result as `Eduati2020_checked.txt`, which is the file read in the next cell.
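Before editing by hand, it helps to list exactly which nodes need attention. This sketch (hypothetical helper `find_rule_gaps`) takes text in the `target = rule` form written to `Eduati2020_raw.txt` and reports nodes with an empty rule, plus nodes that appear in some rule but have no rule of their own.

```python
import re


def find_rule_gaps(rules_text):
    """Return (nodes with an empty rule, nodes used in rules but never defined)."""
    defined, used, empty = set(), set(), []
    for line in rules_text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        target, rule = (s.strip() for s in line.split("=", 1))
        defined.add(target)
        if not rule:
            empty.append(target)
        # Node names are runs of letters, digits and underscores
        used.update(re.findall(r"[A-Za-z0-9_]+", rule))
    return empty, sorted(used - defined)


print(find_rule_gaps("EGF=\nJAK= EGFR\nBAD= p53 | !RSK\nRSK="))
# (['EGF', 'RSK'], ['EGFR', 'p53'])
```

For example, `find_rule_gaps(open("input_files/Eduati2020_raw.txt").read())` would list the nodes to fix in the raw file.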

In [16]:
# Formatting the model file to comply with BNMPy
with open("input_files/Eduati2020_checked.txt", "r") as file:
    content = file.readlines()

processed_lines = []
processed_lines_boolnet = ['targets, factors']  # header line expected by BoolNet
for line in content:
    line = line.strip()  # Remove leading/trailing spaces
    if not line:
        continue  # Skip blank lines
    # Ensure spaces around '='
    line = re.sub(r'\s*=\s*', ' = ', line)
    # Ensure spaces around !, &, |, (, ), including adjacent operators such as '|!'
    line = re.sub(r'\s*([!&|()])\s*', r' \1 ', line)
    # Remove multiple spaces but keep single spaces
    line = re.sub(r'\s+', ' ', line).strip()

    # Also save another BoolNet-format file ('target, rule')
    line_boolnet = line.replace('=', ',')
    processed_lines.append(line)
    processed_lines_boolnet.append(line_boolnet)

cleaned_content = "\n".join(processed_lines)
cleaned_content_boolnet = "\n".join(processed_lines_boolnet)

print(cleaned_content)

EGF = EGF
EGFR = EGF
PDPK1 = PDPKa
PDPKa = PDPKa
TNF = TNF
TNFRs = NFkB | TNF
A20 = NFkB
APC = Apaf1 & Cas9 & Mito | ! cIAPs
Akt = AktM & AktP
AktM = PDPK1 & PIP3
AktP = PDPK1 & PIP3
Apaf1 = p53
BAD = p53 | ! RSK | ! Akt
BID = JNK & p53 | Cas8 & p53 | ! BclX
BclX = ! p53 | STAT | NFkB | ! BAD
Cas12 = Cas7
Cas3 = ! cIAPs | Cas8 | Cas6 | APC
Cas6 = ! cIAPs | Cas3
Cas7 = ! cIAPs | Cas8 | APC
Cas8 = complexIIB | complexIIA | Cas6
Cas9 = ! cIAPs | ! ERK | Cas3 | Cas12 | ! Akt
ERK = MEK
IKKs = complexI | PDPK1 | Akt | ! A20
IkB = NFkB | ! IKKs
JAK = EGFR
JNK = JNKK
JNKK = MEKK1 | ! Akt
MEK = RAF | PDPK1
MEKK1 = complexI | RAS
Mdm2 = p53 | Akt
Mito = ! BclX | BID
NFkB = NIK | ! IkB
NIK = ! cIAPs
PI3K = RAS | EGFR
PIP3 = ! PTEN | PI3K
PTEN = p53
RAF = RAS
RAS = SOS
RSK = ERK
SOS = ! ERK | EGFR
STAT = JAK
cFLIP = NFkB
cIAPs = ! Cas3 & ! Cas6 | NFkB | ! Mito
complexI = TNFRs
complexIIA = complexI | ! cFLIP
complexIIB = complexI | ! cIAPs
p53 = ! Mdm2 | JNK | ERK
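A quick well-formedness check on the cleaned rules: the sketch below maps `!`, `&`, `|` to Python's `not`, `and`, `or` (which follow the usual NOT > AND > OR precedence) and asks Python to compile each right-hand side. It assumes every node name is a valid Python identifier that is not a keyword, which holds for the names printed above. `malformed_rules` is a hypothetical helper; it checks syntax only, not whether referenced nodes exist.

```python
def malformed_rules(rules_text):
    """Return lines whose right-hand side is not a syntactically valid Boolean expression."""
    bad = []
    for line in rules_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if "=" not in line:
            bad.append(line)  # not a 'target = rule' line at all
            continue
        rule = line.split("=", 1)[1]
        # Map the rule operators onto Python's not/and/or
        expr = rule.replace("!", " not ").replace("&", " and ").replace("|", " or ")
        try:
            compile(expr.strip(), "<rule>", "eval")
        except SyntaxError:
            bad.append(line)  # also catches empty rules such as 'EGF = '
    return bad


print(malformed_rules("A = B & ! C\nX = B & | C"))  # ['X = B & | C']
```

Run on `cleaned_content`, an empty list means every rule parses.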


In [17]:
# Save the cleaned rules (BNMPy format) and the BoolNet-format copy
with open("input_files/Eduati2020.txt", "w") as file:
    file.write(cleaned_content)

with open("input_files/Eduati2020_boolnet.txt", "w") as file:
    file.write(cleaned_content_boolnet)

Save the checked model to an SBML file using BoolNet:

In [18]:
net = boolnet.loadNetwork("input_files/Eduati2020_boolnet.txt")
boolnet.toSBML(net, "input_files/Eduati2020_checked.sbml")

0
0


In [19]:
model_lqm = biolqm.load("input_files/Eduati2020_checked.sbml")
model_lrg = biolqm.to_ginsim(model_lqm)
ginsim.show(model_lrg)

### Output a CSV for BioTapestry visualization

In [2]:
import pandas as pd
import re

def parse_boolean_network(file_path):
    # Initialize lists to store the relationships
    sources = []
    targets = []
    signs = []
    
    with open(file_path, 'r') as file:
        for line in file:
            # Skip empty lines
            if line.strip() == '':
                continue
                
            # Parse the rule
            parts = line.strip().split('=', 1)
            if len(parts) != 2:
                continue
                
            target = parts[0].strip()
            rule = parts[1].strip()
            
            # Split the rule by OR conditions
            or_parts = rule.split('|')
            
            for or_part in or_parts:
                # Split the AND conditions
                and_parts = or_part.split('&')
                
                for part in and_parts:
                    part = part.strip()
                    
                    # Skip empty parts
                    if not part:
                        continue
                        
                    # Check if the part is negated
                    is_negated = '!' in part
                    
                    # Remove '!', parentheses and whitespace to get the source name
                    source = re.sub(r'[!()\s]', '', part)
                    
                    # Skip if empty after removing operators
                    if not source:
                        continue
                        
                    # Add the relationship
                    sources.append(source)
                    targets.append(target)
                    signs.append('negative' if is_negated else 'positive')
    
    # Create and return the DataFrame
    df = pd.DataFrame({
        'Source Name': sources,
        'Target Name': targets,
        'Sign': signs
    })
    
    return df

# Parse the network and save to CSV
file_path = "input_files/Eduati2020.txt"
network_df = parse_boolean_network(file_path)

# Save to CSV
output_path = "Eduati2020_interactions.csv"
network_df.to_csv(output_path, index=False)

print(f"Network parsed and saved to {output_path}")
print(f"Total interactions found: {len(network_df)}")
print(network_df.head())

Network parsed and saved to Eduati2020_interactions.csv
Total interactions found: 95
  Source Name Target Name      Sign
0         EGF         EGF  positive
1         EGF        EGFR  positive
2       PDPKa       PDPK1  positive
3       PDPKa       PDPKa  positive
4         TNF         TNF  positive
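The parser above records one row per literal, so a regulator that occurs twice in a rule (p53 in the BID rule) yields a duplicate edge. A token-based variant, sketched below as the hypothetical helper `rule_edges`, also copes with parentheses (which appear in the standardized model further down) and drops duplicate edges. It marks a regulator as negative only when a `!` directly precedes it; negation of a whole parenthesised group is not propagated.

```python
import re

import pandas as pd

# A token is either a node name or one of the operators ! & | ( )
TOKEN = re.compile(r"[A-Za-z0-9_]+|[!&|()]")


def rule_edges(rules_text):
    """Return a de-duplicated (source, target, sign) table from 'target = rule' lines."""
    rows = []
    for line in rules_text.splitlines():
        if "=" not in line:
            continue
        target, rule = (s.strip() for s in line.split("=", 1))
        negate_next = False
        for tok in TOKEN.findall(rule):
            if tok == "!":
                negate_next = True
            elif tok in ("&", "|", "(", ")"):
                negate_next = False  # '!' only applies to the node right after it
            else:
                rows.append((tok, target, "negative" if negate_next else "positive"))
                negate_next = False
    df = pd.DataFrame(rows, columns=["Source Name", "Target Name", "Sign"])
    return df.drop_duplicates(ignore_index=True)


print(rule_edges("BID = JNK & p53 | Cas8 & p53 | ! BclX"))
```

Usage: `rule_edges(open("input_files/Eduati2020.txt").read()).to_csv("Eduati2020_interactions.csv", index=False)`.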


## Standardize the model

In [1]:
with open('../input_files/Eduati2020.txt', 'r') as f:
    original_string = f.read()
print(original_string)

EGF = EGF
EGFR = EGF
PDPK1 = PDPK1
TNF = TNF
TNFRs = NFkB | TNF
A20 = NFkB
APC = Apaf1 & Cas9 & Mito | ! cIAPs
Akt = AktM & AktP
AktM = PDPK1 & PIP3
AktP = PDPK1 & PIP3
Apaf1 = p53
BAD = p53 | ! RSK | ! Akt
BID = JNK & p53 | Cas8 & p53 | ! BclX
BclX = ! p53 | STAT | NFkB | ! BAD
Cas12 = Cas7
Cas3 = ! cIAPs | Cas8 | Cas6 | APC
Cas6 = ! cIAPs | Cas3
Cas7 = ! cIAPs | Cas8 | APC
Cas8 = complexIIB | complexIIA | Cas6
Cas9 = ! cIAPs | ! ERK | Cas3 | Cas12 | ! Akt
ERK = MEK
IKKs = complexI | PDPK1 | Akt | ! A20
IkB = NFkB | ! IKKs
JAK = EGFR
JNK = JNKK
JNKK = MEKK1 | ! Akt
MEK = RAF | PDPK1
MEKK1 = complexI | RAS
Mdm2 = p53 | Akt
Mito = ! BclX | BID
NFkB = NIK | ! IkB
NIK = ! cIAPs
PI3K = RAS | EGFR
PIP3 = ! PTEN | PI3K
PTEN = p53
RAF = RAS
RAS = SOS
RSK = ERK
SOS = ! ERK | EGFR
STAT = JAK
cFLIP = NFkB
cIAPs = ! Cas3 & ! Cas6 | NFkB | ! Mito
complexI = TNFRs
complexIIA = complexI | ! cFLIP
complexIIB = complexI | ! cIAPs
p53 = ! Mdm2 | JNK | ERK


In [2]:
from BNMPy import BMatrix
mapping = '../input_files/Eduati2020_curation.xlsx'
new_string = BMatrix.rename_nodes(original_string, mapping)
print(new_string)

EGF = EGF
EGFR = EGF
PDPK1 = PDPK1
TNF = TNF
TNFRSF1A = NFKB1 | TNF
TNFAIP3 = NFKB1
APC = APAF1 & CASP9 & Mito | ! BIRC2
AKT1 = AktM & AktP
AktM = PDPK1 & PIP3
AktP = PDPK1 & PIP3
APAF1 = TP53
BAD = TP53 | ! RPS6KA1 | ! AKT1
BID = MAPK8 & TP53 | CASP8 & TP53 | ! BCL2L1
BCL2L1 = ! TP53 | STAT3 | NFKB1 | ! BAD
CASP12 = CASP7
CASP3 = ! BIRC2 | CASP8 | CASP6 | APC
CASP6 = ! BIRC2 | CASP3
CASP7 = ! BIRC2 | CASP8 | APC
CASP8 = (RIPK1 & TRADD & FADD & CASP8 & CFLAR) | (TRADD & RIPK1 & FADD & CASP8) | CASP6
CASP9 = ! BIRC2 | ! MAPK1 | CASP3 | CASP12 | ! AKT1
MAPK1 = MAP2K1
IKBKB = (RIPK1 & TRADD & TRAF2) | PDPK1 | AKT1 | ! TNFAIP3
NFKBIA = NFKB1 | ! IKBKB
JAK1 = EGFR
MAPK8 = MAP2K4
MAP2K4 = MAP3K1 | ! AKT1
MAP2K1 = RAF1 | PDPK1
MAP3K1 = (RIPK1 & TRADD & TRAF2) | KRAS
MDM2 = TP53 | AKT1
Mito = ! BCL2L1 | BID
NFKB1 = MAP3K14 | ! NFKBIA
MAP3K14 = ! BIRC2
PIK3CA = KRAS | EGFR
PIP3 = ! PTEN | PIK3CA
PTEN = TP53
RAF1 = KRAS
KRAS = SOS1
RPS6KA1 = MAPK1
SOS1 = ! MAPK1 | EGFR
STAT3 = JAK1
CFLAR = NFKB1
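The core of a rename is whole-name substitution: `Akt` must become `AKT1` without touching `AktM`. The sketch below (hypothetical helper `rename_nodes_sketch`, taking a plain dict rather than the Excel mapping) shows only that one-to-one step; it is not BMatrix's implementation. Judging from the output above, `BMatrix.rename_nodes` also replaces complex nodes such as `complexI` with an AND of their members, which this sketch does not attempt.

```python
import re


def rename_nodes_sketch(rules_text, mapping):
    """Rename whole node names only, so 'Akt' is not rewritten inside 'AktM'."""
    # Each maximal run of name characters is looked up as a whole; unknown names are kept
    return re.sub(r"[A-Za-z0-9_]+", lambda m: mapping.get(m.group(0), m.group(0)), rules_text)


mapping = {"Akt": "AKT1", "p53": "TP53"}
print(rename_nodes_sketch("AktM = PDPK1 & PIP3\nBAD = p53 | ! Akt", mapping))
```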

In [None]:
with open('../input_files/Eduati2020_standardized.txt', 'w') as f:
    f.write(new_string)