# Phenotype Score Calculation

This notebook demonstrates how to calculate phenotype scores from different types of simulation results:
1. Boolean network (BN) steady-state results (a dictionary of fixed points and cyclic attractors)
2. Probabilistic Boolean network (PBN) steady-state results (a NumPy array)
3. Network update/simulation results (a NumPy array)
4. pandas Series/DataFrame

Phenotype scoring uses ProxPath to determine whether each gene up-regulates or down-regulates a phenotype, then combines those signed relationships with the simulated gene values to produce one score per phenotype.


In [1]:
import sys
import os
import numpy as np
import pandas as pd

sys.path.append('./src')
import BNMPy

## Available phenotypes

First, let's see what phenotypes are available in SIGNOR.

In [2]:
# Get available phenotypes
BNMPy.get_phenotypes()

There are 201 phenotypes
There are 4905 genes
Available phenotypes: ['ACROSOME_ASSEMBLY' 'ACTIN_CYTOSKELETON_REORGANIZATION'
 'ACTION_POTENTIAL_' 'ADIPOGENESIS' 'ALTERNATIVE_SPLICING_REGULATION'
 'AMYLOID_FIBRIL_FORMATION' 'ANGIOGENESIS' 'APOPTOSIS' 'ARDS'
 'AUTOPHAGOSOME_FORMATION' 'AUTOPHAGY' 'AXONAL_GROWTH_CONE_FORMATION'
 'B_CELL_MATURATION' 'B-LYMPHOCYTE_DIFF' 'BASOPHIL_DIFF'
 'BONE_MINERALIZATION' 'BROWN_ADIPOGENESIS' 'CARTILAGE_DEVELOPMENT'
 'CELL_ADHESION' 'CELL_CYCLE_BLOCK' 'CELL_CYCLE_EXIT'
 'CELL_CYCLE_PROGRESS_' 'CELL_DEATH' 'CELL_GROWTH' 'CELL_KILLING'
 'CELL_MIGRATION' 'CELL_POLARITY' 'CELL_SHAPE' 'CENTROMERE_ASSEMBLY'
 'CENTROSOME_SEPARATION' 'CEREBRAL_CORTEX_DEVELOPMENT'
 'CHAPERONE-MEDIATED_AUTOPHAGY' 'CHAPERONE-MEDIATED_PROTEIN_FOLDING'
 'CHEMOATTRACTION_OF_AXON' 'CHEMOREPULSION_OF_AXON' 'CHEMOTAXIS'
 'CHROMATINE_CONDENSATION' 'CHROMOSOME_SEGREGATION' 'CILIUM_ASSEMBLY'
 'CILIUM_MOVEMENT' 'CITRIC_ACID_CYCLE'
 'CLEARANCE_OF_FOREIGN_INTRACELLULAR_DNA' 'COLLOID' 'CYTOKINE

## Get phenotype score formulas

Passing a list of genes and phenotypes with `simulation_results=None` returns the formula used to calculate each phenotype score, without evaluating it.

In [3]:
# Get formulas without simulation results
formulas = BNMPy.phenotype_scores(
    genes=['TP53', 'MYC', 'BCL2','CASP3','PI3K','PTEN','KRAS','EGFR'],
    phenotypes=['APOPTOSIS', 'PROLIFERATION', 'DIFFERENTIATION'],
    simulation_results=None
)

print("\nPhenotype score formulas:")
for phenotype, formula in formulas.items():
    print(f"{phenotype}: {formula}")

EGFR has dual effects on PROLIFERATION
Path found for 3 phenotypes: ['APOPTOSIS' 'PROLIFERATION' 'DIFFERENTIATION']

Phenotype score formulas:
APOPTOSIS: - BCL2 + CASP3 -  EGFR -  KRAS -  PI3K + PTEN + TP53
PROLIFERATION: - CASP3 + EGFR + KRAS + MYC + PI3K -  TP53
DIFFERENTIATION: MYC + TP53


## ProxPath gene-phenotype relationships

You can also explore the underlying ProxPath relationships.

In [4]:
# Get detailed ProxPath information
pheno_df = BNMPy.proxpath(
    genes=['TP53', 'MYC', 'BCL2','CASP3','PI3K','PTEN','KRAS','EGFR'],
    phenotypes=['APOPTOSIS']
)
# this displays the shortest path from the gene to the phenotype
pheno_df

Path found for 1 phenotypes: ['APOPTOSIS']


| | EndPathways | QueryNode | EndNode | Path_String | relations_path | Path_Score | Path_Length | Final_Effect | Effect | n | mean | sd | zscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2913 | APOPTOSIS | BCL2 | APOPTOSIS | BCL2--\|APOPTOSIS | SIGNOR-249611 | 0.3 | 1 | -1 | down-regulates | 75493 | 1.737625 | 0.526944 | -2.728233 |
| 3557 | APOPTOSIS | CASP3 | APOPTOSIS | CASP3-->APOPTOSIS | SIGNOR-89244 | 0.3 | 1 | 1 | up-regulates | 75493 | 1.737625 | 0.526944 | -2.728233 |
| 3312 | APOPTOSIS | EGFR | APOPTOSIS | EGFR-->STAT5A--\|APOPTOSIS | SIGNOR-146852;SIGNOR-256583 | 0.48 | 2 | -1 | down-regulates | 75493 | 1.737625 | 0.526944 | -2.386641 |
| 3539 | APOPTOSIS | KRAS | APOPTOSIS | KRAS-->PIK3CA-->AKT--\|APOPTOSIS | SIGNOR-175204;SIGNOR-244429;SIGNOR-260215 | 0.6 | 3 | -1 | down-regulates | 75493 | 1.737625 | 0.526944 | -2.158912 |
| 3531 | APOPTOSIS | PI3K | APOPTOSIS | PI3K-->AKT--\|APOPTOSIS | SIGNOR-254950;SIGNOR-260215 | 0.531 | 2 | -1 | down-regulates | 75493 | 1.737625 | 0.526944 | -2.289856 |
| 3322 | APOPTOSIS | PTEN | APOPTOSIS | PTEN--\|PIP3-->AKT--\|APOPTOSIS | SIGNOR-228145;SIGNOR-236490;SIGNOR-260215 | 0.7 | 3 | 1 | up-regulates | 75493 | 1.737625 | 0.526944 | -1.969139 |
| 2894 | APOPTOSIS | TP53 | APOPTOSIS | TP53-->APOPTOSIS | SIGNOR-255678 | 0.3 | 1 | 1 | up-regulates | 75493 | 1.737625 | 0.526944 | -2.728233 |
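
The `Final_Effect` column lines up with the signs in the APOPTOSIS formula printed earlier. The sketch below rebuilds that formula from the seven rows shown above, using plain pandas rather than any BNMPy call. It assumes each gene enters the formula with the sign of its `Final_Effect`. This is consistent with the outputs in this notebook, but it is not confirmed to be how BNMPy builds the formula internally.

```python
import pandas as pd

# Rows copied from the ProxPath output above (only the two columns needed here).
paths = pd.DataFrame({
    "QueryNode": ["BCL2", "CASP3", "EGFR", "KRAS", "PI3K", "PTEN", "TP53"],
    "Final_Effect": [-1, 1, -1, -1, -1, 1, 1],
})

# Assumption: a gene contributes "+ GENE" if Final_Effect is 1 and "- GENE" if it is -1.
terms = [
    f"{'+' if effect > 0 else '-'} {gene}"
    for gene, effect in zip(paths["QueryNode"], paths["Final_Effect"])
]
formula = " ".join(terms)
print(formula)
```

Apart from spacing, the resulting string matches the APOPTOSIS formula returned by `BNMPy.phenotype_scores` for the same gene list.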


## Boolean network example

Compute the steady state of a Boolean network (BN), then calculate phenotype scores from it.

In [8]:
# load the network from a file

file = '../input_files/Eduati2020.txt'
bn = BNMPy.load_network(file)
print(f"network genes: {bn.nodeDict.keys()}")

No initial state provided, using a random initial state
Network loaded successfully. There are 47 genes in the network.
network genes: dict_keys(['AKT1', 'APAF1', 'APC', 'AktM', 'AktP', 'BAD', 'BCL2L1', 'BID', 'BIRC2', 'CASP12', 'CASP3', 'CASP6', 'CASP7', 'CASP8', 'CASP9', 'CFLAR', 'EGF', 'EGFR', 'FADD', 'IKBKB', 'JAK1', 'KRAS', 'MAP2K1', 'MAP2K4', 'MAP3K1', 'MAP3K14', 'MAPK1', 'MAPK8', 'MDM2', 'Mito', 'NFKB1', 'NFKBIA', 'PDPK1', 'PIK3CA', 'PIP3', 'PTEN', 'RAF1', 'RIPK1', 'RPS6KA1', 'SOS1', 'STAT3', 'TNF', 'TNFAIP3', 'TNFRSF1A', 'TP53', 'TRADD', 'TRAF2'])


In [9]:
# compute the steady state of the network
calc = BNMPy.SteadyStateCalculator(bn)

# Set experimental conditions (e.g., activate EGF, inhibit AKT1)
calc.set_experimental_conditions(
    stimuli=['EGF'],
    inhibitors=['AKT1']
)

# Compute the steady state; the seed makes the run reproducible
steady_state = calc.compute_steady_state(n_runs=10,n_steps=20000,seed=99)

Found 0 fixed points and 2 cyclic attractors
--------------------------------
No fixed points found
--------------------------------
Cyclic attractors: 
Cyclic attractor 1: [[1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]]
Cyclic attractor 2: [[0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1], [0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]]
--------------------------------
Node order: dict_keys(['AKT1', 'APAF1', 'APC', 'AktM', 'AktP', 'BAD', 'BCL2L1', 'BID', 'BIRC2', 'CASP12', 'CASP3', 'CASP6', 'CASP7', 'CASP8', 'CASP9', 'CFLAR', 'EGF', 'EGFR', 'FADD', 'IKBKB', 'JAK1', 

In [10]:
# Calculate phenotype scores from BN steady state dictionary
# Each state of each attractor gets its own row in the output DataFrame

phenotype_scores_bn = BNMPy.phenotype_scores(
    phenotypes=['APOPTOSIS', 'PROLIFERATION','DIFFERENTIATION'],
    simulation_results=steady_state,
    network=bn
)

print(phenotype_scores_bn)

AKT1 has dual effects on PROLIFERATION
EGF has dual effects on PROLIFERATION
IKBKB has dual effects on APOPTOSIS
IKBKB has dual effects on PROLIFERATION
MAP2K4 has dual effects on APOPTOSIS
MAPK1 has dual effects on DIFFERENTIATION
MAPK8 has dual effects on APOPTOSIS
MAPK8 has dual effects on PROLIFERATION
NFKBIA has dual effects on APOPTOSIS
NFKBIA has dual effects on PROLIFERATION
PDPK1 has dual effects on APOPTOSIS
PDPK1 has dual effects on PROLIFERATION
STAT3 has dual effects on PROLIFERATION
Path found for 3 phenotypes: ['APOPTOSIS' 'DIFFERENTIATION' 'PROLIFERATION']
                 APOPTOSIS  DIFFERENTIATION  PROLIFERATION
Cycle_1_State_1        0.0              1.0            6.0
Cycle_1_State_2       -1.0              1.0            7.0
Cycle_2_State_1        1.0              0.0            5.0
Cycle_2_State_2        0.0              0.0            6.0
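
Because a cyclic attractor contributes one row per state, it can be convenient to summarize each attractor with a single value. The sketch below rebuilds the table above by hand and averages the scores over the states of each cycle. Averaging is only one possible summary, chosen here for illustration. It is not something BNMPy is known to do, and the `attractor` label is a hypothetical name introduced for this example.

```python
import pandas as pd

# Scores copied from the output above.
scores = pd.DataFrame(
    {
        "APOPTOSIS": [0.0, -1.0, 1.0, 0.0],
        "DIFFERENTIATION": [1.0, 1.0, 0.0, 0.0],
        "PROLIFERATION": [6.0, 7.0, 5.0, 6.0],
    },
    index=["Cycle_1_State_1", "Cycle_1_State_2", "Cycle_2_State_1", "Cycle_2_State_2"],
)

# Strip the "_State_N" suffix so rows from the same cycle share a label.
attractor = pd.Index(
    [label.rsplit("_State_", 1)[0] for label in scores.index],
    name="attractor",
)

# Mean score over the states of each cyclic attractor.
per_attractor = scores.groupby(attractor).mean()
print(per_attractor)
```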


## PBN steady state

For probabilistic Boolean networks (PBNs), the steady state is returned as a NumPy array with one value per gene.
Because these values are frequencies averaged over multiple runs, the array describes a single state, and phenotype scoring returns a single row.

In [11]:
# convert the BN to a PBN
pbn_string,_ = BNMPy.BN2PBN(bn, prob=0.5)
pbn = BNMPy.load_network(pbn_string)

# Create steady state calculator for PBN
calc = BNMPy.SteadyStateCalculator(pbn)

# Set the same experimental conditions
calc.set_experimental_conditions(
    stimuli=['EGF'],
    inhibitors=['AKT1']
)

# Compute steady state using Monte Carlo (returns numpy array)
steady_state = calc.compute_stationary_mc(
    n_runs=3, 
    n_steps=10000, 
    seed=99
)
steady_state

No initial state provided, using a random initial state
PBN loaded successfully. There are 47 genes in the network.


array([0.        , 0.56981937, 0.58508298, 0.51576351, 0.48536959,
       0.71972272, 0.84676398, 0.84189829, 0.81730321, 0.89695394,
       0.94894354, 0.95434246, 0.97220556, 0.97213891, 0.98493635,
       0.86969273, 0.86962607, 0.9340132 , 0.74838366, 0.86082783,
       0.89755382, 0.8671599 , 0.90868493, 0.94181164, 0.93521296,
       0.55615544, 0.72825435, 0.83389989, 0.85309605, 0.86062787,
       0.75838166, 0.7773112 , 0.77724455, 0.49723389, 0.57695128,
       0.69206159, 0.78211024, 0.72425515, 0.72698794, 0.83776578,
       0.86675998, 0.86682663, 0.38279011, 0.67166567, 0.81190429,
       0.74091848, 0.70685863])
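
To make "averaged frequencies" concrete, the sketch below computes per-gene ON frequencies from a small, made-up binary trajectory. It is a toy illustration in plain NumPy with hypothetical data. It does not reproduce the internals of `compute_stationary_mc`, which are not shown in this notebook.

```python
import numpy as np

# Hypothetical binary trajectory: rows are time steps, columns are three genes.
trajectory = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
])

# Fraction of time steps in which each gene is ON (value 1).
frequencies = trajectory.mean(axis=0)
print(frequencies)
```

A value near 1 means the gene is ON in almost every sampled step, and a value of 0 means it is never ON, as with the first entry of the array above.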

In [12]:
# Calculate phenotype scores from numpy array
phenotype_scores_pbn = BNMPy.phenotype_scores(
    phenotypes=['APOPTOSIS', 'PROLIFERATION'],
    simulation_results=steady_state,
    network=pbn
)

phenotype_scores_pbn

AKT1 has dual effects on PROLIFERATION
EGF has dual effects on PROLIFERATION
IKBKB has dual effects on APOPTOSIS
IKBKB has dual effects on PROLIFERATION
MAP2K4 has dual effects on APOPTOSIS
MAPK8 has dual effects on APOPTOSIS
MAPK8 has dual effects on PROLIFERATION
NFKBIA has dual effects on APOPTOSIS
NFKBIA has dual effects on PROLIFERATION
PDPK1 has dual effects on APOPTOSIS
PDPK1 has dual effects on PROLIFERATION
STAT3 has dual effects on PROLIFERATION
Path found for 2 phenotypes: ['APOPTOSIS' 'PROLIFERATION']


| | APOPTOSIS | PROLIFERATION |
|---|---|---|
| State_1 | 0.381057 | 5.313671 |


## Multiple states from a 2D NumPy array

You can also score multiple states at once by passing a 2D NumPy array (rows = states, columns = genes) together with the matching list of gene names.

In [13]:
# Create multiple states (e.g., from different time points or conditions)
# Rows represent different states, columns represent genes
multiple_states = np.array([
    [1, 0, 0, 1, 0, 0],  # State 1
    [1, 0, 0, 0, 1, 0],  # State 2
    [0, 1, 1, 0, 0, 1],  # State 3
])
gene_names = ['TP53','MYC','BCL2','CASP3','KRAS','EGFR']

# Calculate phenotype scores for all states at once
phenotype_scores_multi = BNMPy.phenotype_scores(
    genes=gene_names,
    phenotypes=['APOPTOSIS', 'PROLIFERATION'],
    simulation_results=multiple_states
)
phenotype_scores_multi

EGFR has dual effects on PROLIFERATION
Path found for 2 phenotypes: ['APOPTOSIS' 'PROLIFERATION']


| | APOPTOSIS | PROLIFERATION |
|---|---|---|
| State_1 | 2.0 | -2.0 |
| State_2 | 0.0 | 0.0 |
| State_3 | -2.0 | 2.0 |
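
These values can be checked by hand. The sketch below assumes each score is the signed sum given by the formulas printed in the formula section, restricted to the six genes used here, and evaluates it for all states with a single matrix product. It uses only NumPy, and the `signs` dictionary and `weights` matrix are names introduced for this example, not part of BNMPy.

```python
import numpy as np

gene_names = ["TP53", "MYC", "BCL2", "CASP3", "KRAS", "EGFR"]
states = np.array([
    [1, 0, 0, 1, 0, 0],  # State 1
    [1, 0, 0, 0, 1, 0],  # State 2
    [0, 1, 1, 0, 0, 1],  # State 3
])

# Signs copied from the printed formulas; genes absent from a formula get weight 0.
signs = {
    "APOPTOSIS": {"TP53": 1, "BCL2": -1, "CASP3": 1, "KRAS": -1, "EGFR": -1},
    "PROLIFERATION": {"TP53": -1, "MYC": 1, "CASP3": -1, "KRAS": 1, "EGFR": 1},
}

# Weight matrix with one row per gene and one column per phenotype.
weights = np.array([[signs[p].get(g, 0) for p in signs] for g in gene_names])

# Each row of the product holds the (APOPTOSIS, PROLIFERATION) scores of one state.
scores = states @ weights
print(scores)
```

Under this assumption the product reproduces the table above: (2, -2) for State_1, (0, 0) for State_2, and (-2, 2) for State_3.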
