In [None]:
import re
import numpy as np
import pandas as pd


text = """ Step 1: Finding the Edges

Here are the identified edges, based on known biological mechanisms in the Sachs et al. dataset and validated signaling pathways:


---

1. (PIP2 → PIP3):
PIP2 is phosphorylated by PI3K to produce PIP3. This is a well-known transformation in the PI3K/Akt pathway.


2. (PIP3 → Akt):
PIP3 recruits Akt to the plasma membrane where it becomes activated by phosphorylation, making it a direct upstream regulator of Akt activity.


3. (PIP2 → PLCg):
PLCg hydrolyzes PIP2 into DAG and IP3. Thus, PIP2 availability is essential for PLCg action.


4. (PLCg → pIP3):
Upon activation, PLCg cleaves PIP2 into IP3 (also referred to as pIP3) and DAG. This step links membrane signaling to calcium fluxes.


5. (PKC → Raf):
Activated PKC can stimulate Raf, initiating the MAPK cascade, including activation of Erk.


6. (Raf → Erk):
Raf phosphorylates MEK, which then activates Erk. This is the canonical MAPK signaling pathway.


7. (PKA → Erk):
PKA can have cross-talk effects with the MAPK pathway, modulating Erk activation depending on context.


8. (PKA → Akt):
There is evidence of PKA influencing Akt via indirect modulation of PI3K or via feedback inhibition mechanisms.


9. (PKC → Jnk):
PKC is known to activate stress-response kinases like Jnk via DAG-mediated pathways.


10. (PKC → p38):
PKC can activate p38 MAPK through intermediary kinases under stress or cytokine signals.


11. (Jnk → Erk):
Cross-talk between Jnk and Erk pathways has been reported, especially under certain stimuli that activate multiple MAPK branches.


12. (p38 → Jnk):
p38 and Jnk are both stress-activated kinases and can modulate each other’s activity in response to shared upstream signals.


13. (PLCg → PKC):
PLCg generates DAG, which directly activates PKC, linking receptor-mediated signaling to downstream kinase activation.


14. (pIP3 → PKC):
IP3/pIP3 increases intracellular calcium, which is required (along with DAG) for full PKC activation.




---

Step 2: Reflecting on Domain Knowledge – Finalized Set of Causal Edges

Each edge below is validated by established biochemical pathways and reflects causal direction based on experimental evidence in the Sachs dataset:

1. (PIP2, PIP3) : PIP2 is phosphorylated by PI3K to produce PIP3, a critical step in activating Akt signaling.
2. (PIP3, Akt) : PIP3 recruits Akt to the membrane where it is activated by phosphorylation.
3. (PIP2, PLCg) : PIP2 is the substrate for PLCg, which cleaves it into signaling molecules DAG and IP3.
4. (PLCg, pIP3) : PLCg cleaves PIP2 to generate IP3 (pIP3), a key second messenger in calcium signaling.
5. (PLCg, PKC) : PLCg activity produces DAG, which activates PKC.
6. (pIP3, PKC) : pIP3 (IP3) increases cytosolic calcium, which helps fully activate PKC.
7. (PKC, Raf) : PKC activation can lead to Raf activation, starting the MAPK cascade.
8. (Raf, Erk) : Raf phosphorylates MEK, which activates Erk, a central component of the MAPK pathway.
9. (PKA, Erk) : PKA modulates Erk activity, often via crosstalk with MAPK signaling.
10. (PKA, Akt) : PKA may affect Akt through PI3K regulation or feedback inhibition mechanisms.
11. (PKC, Jnk) : PKC can activate Jnk under stress or cytokine stimulation.
12. (PKC, p38) : PKC also activates the p38 MAPK pathway through shared intermediates.
13. (Jnk, Erk) : Jnk can influence Erk activity via stress pathway cross-talk.
14. (p38, Jnk) : p38 modulates Jnk signaling in the broader MAPK stress response network.


---

Let me know if you'd like to visualize this DAG or explore how to infer this structure automatically using causal discovery algorithms like PC, GES, or NOTEARS.
"""

# Extract edges of the form (A → B)
edges_step1 = re.findall(r'\((\w+)\s*→\s*(\w+)\)', text)

# Extract edges of the form (A,B)
step2_text = text.split("Step 2")[1]
edges_step2 = re.findall(r'\((\w+),\s*(\w+)\)', step2_text)

all_nodes = sorted(set(sum(edges_step1 + edges_step2, ())))
node_index = {node: i for i, node in enumerate(all_nodes)}

# Initialize adjacency matrix
adj = np.zeros((len(all_nodes), len(all_nodes)), dtype=int)

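# Mark every edge proposed in Step 1 with a 1
for src, tgt in edges_step1:
    adj[node_index[src], node_index[tgt]] = 1

# Upgrade an edge to 2 only if Step 2 confirms it
for src, tgt in edges_step2:
    if adj[node_index[src], node_index[tgt]] == 1:
        adj[node_index[src], node_index[tgt]] = 2

# Keep only edges confirmed in both steps (value 2), then normalize them to 1
adj[adj != 2] = 0
adj[adj == 2] = 1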
adj_df = pd.DataFrame(adj, index=all_nodes, columns=all_nodes)

# Build the edge list from the pruned matrix.
# Surviving entries were normalized to 1 above, so test for 1 (testing for 2 yields an empty list).
edge_list = [(src, tgt) for src in all_nodes for tgt in all_nodes if adj_df.loc[src, tgt] == 1]

# adjacency matrix and final edge list
print("Pruned Adjacency Matrix (only edges present in both Step 1 and Step 2):\n")
print(adj_df)

print("\nFinal Edge List (A → B) after pruning:\n")
for edge in edge_list:
    print(f"{edge[0]} → {edge[1]}")

Pruned Adjacency Matrix (only edges present in both Step 1 and Step 2):

      Akt  Erk  Jnk  PIP2  PIP3  PKA  PKC  PLCg  Raf  p38  pIP3
Akt     0    0    0     0     0    0    0     0    0    0     0
Erk     0    0    0     0     0    0    0     0    0    0     0
Jnk     0    1    0     0     0    0    0     0    0    0     0
PIP2    0    0    0     0     1    0    0     1    0    0     0
PIP3    1    0    0     0     0    0    0     0    0    0     0
PKA     1    1    0     0     0    0    0     0    0    0     0
PKC     0    0    1     0     0    0    0     0    1    1     0
PLCg    0    0    0     0     0    0    1     0    0    0     1
Raf     0    1    0     0     0    0    0     0    0    0     0
p38     0    0    1     0     0    0    0     0    0    0     0
pIP3    0    0    0     0     0    0    1     0    0    0     0

Final Edge List (A → B) after pruning:

Jnk → Erk
PIP2 → PIP3
PIP2 → PLCg
PIP3 → Akt
PKA → Akt
PKA → Erk
PKC → Jnk
PKC → Raf
PKC → p38
PLCg → PKC
PLCg → pIP3
Raf → Erk
p38 → Jnk
pIP3 → PKC
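The parsed text calls the result a DAG, so it may be worth checking that the pruned edges really contain no cycle. The sketch below is one way to do that with Kahn's algorithm. It is not part of the original cell: `topological_order` is a hypothetical helper name, and the edge list is transcribed by hand from the nonzero entries of the pruned adjacency matrix above (row = source, column = target).

```python
from collections import deque

# Edges read off the pruned adjacency matrix above (row = source, column = target).
# Hand-transcribed for illustration; in the notebook you could pass edge_list instead.
edges = [
    ("Jnk", "Erk"), ("PIP2", "PIP3"), ("PIP2", "PLCg"), ("PIP3", "Akt"),
    ("PKA", "Akt"), ("PKA", "Erk"), ("PKC", "Jnk"), ("PKC", "Raf"),
    ("PKC", "p38"), ("PLCg", "PKC"), ("PLCg", "pIP3"), ("Raf", "Erk"),
    ("p38", "Jnk"), ("pIP3", "PKC"),
]


def topological_order(edges):
    """Return a topological order of the nodes, or None if the graph has a cycle."""
    nodes = sorted({node for edge in edges for node in edge})
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for src, tgt in edges:
        children[src].append(tgt)
        indegree[tgt] += 1

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    queue = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some nodes were never released, they sit on a cycle.
    return order if len(order) == len(nodes) else None


order = topological_order(edges)
print("acyclic" if order is not None else "contains a cycle")
```

If the function returns `None`, at least one cycle survived the pruning and the graph should not be treated as a DAG by downstream causal-discovery code.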