
### **1. Graph Representation**

A graph \( G \) is represented as:

$$
G = (V, E)
$$

Where:
- \( V \) is the set of nodes (vertices), \( |V| = n \),
- \( E \subseteq V \times V \) is the set of edges.

Node and edge features are represented as:
- Node features: \( \mathbf{X} \in \mathbb{R}^{n \times d_x} \), whose row \( \mathbf{x}_u \) is the \( d_x \)-dimensional feature vector of node \( u \).
- Edge features: \( \mathbf{A} \in \mathbb{R}^{|E| \times d_e} \), with one \( d_e \)-dimensional row per edge. For shortest-path problems this can be a single scalar weight (\( d_e = 1 \)).
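The container below is a minimal sketch of the representation above. It assumes nodes are the integers \( 0, \dots, n-1 \) and that edges are stored as directed pairs \( (v, u) \). The names `Graph` and `in_neighbors` are hypothetical, and it uses plain Python lists instead of a tensor library so that the shapes are explicit.

```python
from dataclasses import dataclass

@dataclass
class Graph:
    """G = (V, E) with node features X (n x d_x) and edge features A (|E| x d_e).

    Nodes are the integers 0..n-1, and row k of A holds the features of edges[k].
    """
    num_nodes: int                   # n = |V|
    edges: list[tuple[int, int]]     # E as directed pairs (v, u), i.e. v -> u
    X: list[list[float]]             # node features, one row x_u per node
    A: list[list[float]]             # edge features, one row per edge

    def __post_init__(self):
        # Enforce the shapes stated in the definitions above.
        if len(self.X) != self.num_nodes or len(self.A) != len(self.edges):
            raise ValueError("X must have n rows and A must have |E| rows")
        for v, u in self.edges:
            if not (0 <= v < self.num_nodes and 0 <= u < self.num_nodes):
                raise ValueError(f"edge ({v}, {u}) references an unknown node")

    def in_neighbors(self, u):
        """Return (v, k) for every edge k = (v, u) that enters node u."""
        return [(v, k) for k, (v, dst) in enumerate(self.edges) if dst == u]

g = Graph(
    num_nodes=3,
    edges=[(0, 1), (1, 2), (0, 2)],
    X=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # d_x = 2
    A=[[4.0], [1.0], [7.0]],                 # d_e = 1: a scalar edge weight
)
print(g.in_neighbors(2))  # [(1, 1), (0, 2)]
```

`in_neighbors` returns the edge index alongside each neighbor, so the matching row of \( \mathbf{A} \) can be looked up.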

---

### **2. Bellman-Ford Updates**

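The Bellman-Ford algorithm computes shortest-path distances from a source node \( s \). It initialises \( d_s^{(0)} = 0 \) and \( d_u^{(0)} = \infty \) for every \( u \neq s \), then repeatedly relaxes every node:

$$
d_u^{(t)} = \min \Big( d_u^{(t-1)}, \; \min_{v \in N(u)} \big( d_v^{(t-1)} + w_{v \to u} \big) \Big)
$$

Where:
- \( d_u^{(t)} \): Estimated distance from \( s \) to \( u \) after \( t \) iterations,
- \( N(u) \): In-neighbors of \( u \), i.e. nodes \( v \) with \( (v, u) \in E \),
- \( w_{v \to u} \): Weight of the edge from \( v \) to \( u \).

After at most \( n - 1 \) iterations, \( d_u^{(t)} \) equals the true shortest-path distance, provided no negative-weight cycle is reachable from \( s \).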
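The sketch below shows the classic algorithm, assuming edges are given as directed `(v, u)` pairs with a parallel list of weights. The function name `bellman_ford` is illustrative. Each round updates all nodes synchronously from the previous round's distances, which is how one GNN layer behaves.

```python
import math

def bellman_ford(num_nodes, edges, weights, source):
    """Shortest distances from `source`; math.inf marks unreachable nodes.

    edges[k] = (v, u) is a directed edge whose weight weights[k] is w_{v->u}.
    """
    d = [math.inf] * num_nodes
    d[source] = 0.0
    # After t synchronous rounds, d[u] is the best path using at most t edges,
    # so n - 1 rounds suffice when there is no negative cycle.
    for _ in range(num_nodes - 1):
        new_d = d[:]
        for (v, u), w in zip(edges, weights):
            # Relaxation: keep d_u or improve it through in-neighbor v.
            new_d[u] = min(new_d[u], d[v] + w)
        if new_d == d:
            break  # no change, so the distances have converged
        d = new_d
    # One extra pass: any further improvement implies a negative cycle.
    for (v, u), w in zip(edges, weights):
        if d[v] + w < d[u]:
            raise ValueError("negative-weight cycle reachable from source")
    return d

print(bellman_ford(3, [(0, 1), (1, 2), (0, 2)], [4.0, 1.0, 7.0], source=0))
# [0.0, 4.0, 5.0]
```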

In a GNN, the same update can be expressed as message passing, with a min aggregator playing the role of the relaxation step:

1. **Message Aggregation**:
   $$ 
   m_u^{(t)} = \min_{v \in N(u)} \big( \mathbf{h}_v^{(t-1)} + \mathbf{w}_{v \to u} \big)
   $$

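   Here:
   - \( \mathbf{h}_v^{(t-1)} \): Hidden state of node \( v \) after the previous step \( t-1 \),
   - \( \mathbf{w}_{v \to u} \): Edge weight or edge-feature vector, which must have the same dimension as \( \mathbf{h}_v^{(t-1)} \) for the sum to be defined,
   - The \( \min \) is taken element-wise over the incoming messages.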

2. **Node Update**:
   $$ 
   \mathbf{h}_u^{(t)} = \sigma \big( \mathbf{W}_u \cdot \big[ \mathbf{h}_u^{(t-1)} \, \Vert \, m_u^{(t)} \big] \big)
   $$

   Where:
   - \( \sigma \): Non-linear activation (e.g., ReLU),
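   - \( \mathbf{W}_u \): Learnable weight matrix. In a standard GNN it is shared by all nodes, so the subscript does not denote a separate matrix per node,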
   - \( \Vert \): Concatenation operator.
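One message-passing step can be sketched in plain Python as follows. This is a minimal sketch under three stated assumptions: \( \sigma \) is ReLU, edge vectors already have the hidden size \( k \), and a node with no in-neighbors receives a zero message (a hypothetical convention; the text does not specify one). The function names are illustrative.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, xi) for xi in x]

def message_passing_step(h, edges, edge_vecs, W):
    """One Bellman-Ford-style layer: min-aggregate, concatenate, transform.

    h[u]         : h_u^{(t-1)}, a list of length k
    edges[j]     : directed edge (v, u)
    edge_vecs[j] : w_{v->u}, also of length k so it can be added to h_v
    W            : shared k x 2k weight matrix applied to [h_u || m_u]
    """
    n, k = len(h), len(h[0])
    # Element-wise min over incoming messages h_v + w_{v->u}.
    m = [[math.inf] * k for _ in range(n)]
    for (v, u), w in zip(edges, edge_vecs):
        m[u] = [min(a, hv + wv) for a, hv, wv in zip(m[u], h[v], w)]
    h_new = []
    for u in range(n):
        # Assumption: a node with no in-neighbors receives a zero message.
        m_u = [0.0 if math.isinf(a) else a for a in m[u]]
        # List `+` concatenates, mirroring h_u || m_u.
        h_new.append(relu(matvec(W, h[u] + m_u)))
    return h_new

# Hand-picked W = [[0, 1]] passes the message through unchanged. On this graph
# that matches one Bellman-Ford relaxation of the distances [0, 4, 9].
h = [[0.0], [4.0], [9.0]]
edges = [(0, 1), (1, 2), (0, 2)]
edge_vecs = [[4.0], [1.0], [7.0]]
print(message_passing_step(h, edges, edge_vecs, [[0.0, 1.0]]))
# [[0.0], [4.0], [5.0]]
```

In a trained model, \( \mathbf{W}_u \) is learned rather than hand-picked. The example only shows that min aggregation can carry the relaxation signal.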

---

### **3. Encode-Process-Decode Framework**

1. **Encoding**:
   Raw node features \( \mathbf{x}_u \) are mapped to a latent space:
   $$
   \mathbf{h}_u^{(0)} = \sigma \big( \mathbf{W}_{enc} \cdot \mathbf{x}_u \big)
   $$

   Here, \( \mathbf{W}_{enc} \) is the encoder's learnable weight matrix.

2. **Processing**:
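   Apply the message-passing update from Section 2 for \( T \) iterations, \( t = 1, \dots, T \), where \( m_u^{(t)} \) is the min-aggregated message: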
   $$
   \mathbf{h}_u^{(t)} = \sigma \big( \mathbf{W}_u \cdot \big[ \mathbf{h}_u^{(t-1)} \, \Vert \, m_u^{(t)} \big] \big)
   $$

3. **Decoding**:
   After \( T \) iterations, the latent state \( \mathbf{h}_u^{(T)} \) is decoded to produce the final prediction:
   $$
   \hat{y}_u = \sigma \big( \mathbf{W}_{dec} \cdot \mathbf{h}_u^{(T)} \big)
   $$

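   Where:
   - \( \mathbf{W}_{dec} \): Learnable weight matrix of the decoder,
   - \( \sigma \): Output activation. When \( \hat{y}_u \) is a probability, as the binary cross-entropy loss in Section 5 requires, this should be a sigmoid rather than a ReLU,
   - \( \hat{y}_u \): Predicted value for node \( u \).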
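The three stages can be chained as in the sketch below. It assumes ReLU for the encoder and processor, a sigmoid decoder, randomly initialised (untrained) weights, and the same zero-message convention for nodes without in-neighbors. All function and parameter names are illustrative.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, xi) for xi in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode_process_decode(X, edges, edge_vecs, params, T):
    """Encode, run T min-aggregation message-passing steps, decode.

    X            : n x d_x node features
    edge_vecs[j] : w_{v->u} for edges[j] = (v, u), length k (latent size)
    params       : 'W_enc' (k x d_x), 'W_u' (k x 2k), 'W_dec' (1 x k)
    Returns one probability y_hat_u per node.
    """
    n = len(X)
    # 1. Encoding: h_u^(0) = ReLU(W_enc x_u)
    h = [relu(matvec(params["W_enc"], x)) for x in X]
    k = len(h[0])
    # 2. Processing: the same W_u is reused at every node and every step.
    for _ in range(T):
        m = [[math.inf] * k for _ in range(n)]
        for (v, u), w in zip(edges, edge_vecs):
            m[u] = [min(a, hv + wv) for a, hv, wv in zip(m[u], h[v], w)]
        # Assumption: no in-neighbors -> zero message.
        m = [[0.0 if math.isinf(a) else a for a in mu] for mu in m]
        h = [relu(matvec(params["W_u"], h[u] + m[u])) for u in range(n)]
    # 3. Decoding with a sigmoid so that each y_hat_u lies in (0, 1).
    return [sigmoid(matvec(params["W_dec"], hu)[0]) for hu in h]

rng = random.Random(0)
d_x, k = 2, 4
params = {
    "W_enc": rand_matrix(k, d_x, rng),
    "W_u": rand_matrix(k, 2 * k, rng),
    "W_dec": rand_matrix(1, k, rng),
}
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2), (0, 2)]
edge_vecs = [[0.1] * k, [0.2] * k, [0.3] * k]
y_hat = encode_process_decode(X, edges, edge_vecs, params, T=3)
print(y_hat)  # three probabilities; weights are untrained, so values are arbitrary
```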

---

### **4. Reinforcement Learning (Actor-Critic)**

#### **Policy Network (Actor):**
The policy network outputs a probability distribution over actions \( a \):
$$
\pi(a | \mathbf{s}) = \text{softmax} \big( \mathbf{W}_{actor} \cdot \mathbf{s} \big)
$$
Where:
- \( \mathbf{s} \): Current state embedding,
- \( \mathbf{W}_{actor} \): Actor network's weight matrix.

#### **Value Network (Critic):**
The critic estimates the value of a state:
$$
V(\mathbf{s}) = \mathbf{W}_{critic} \cdot \mathbf{s}
$$
Where:
- \( \mathbf{W}_{critic} \): Critic network's weight matrix.

#### **Loss Functions:**
1. **Policy Loss**:
   $$
   \mathcal{L}_{policy} = -\sum_{t} \log \pi(a_t | \mathbf{s}_t) \cdot \big( R_t - V(\mathbf{s}_t) \big)
   $$

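   Where:
   - \( R_t \): Discounted cumulative return from step \( t \) (Section 6),
   - \( V(\mathbf{s}_t) \): Value estimate of the state,
   - \( R_t - V(\mathbf{s}_t) \): Advantage estimate. It is treated as a constant when differentiating the policy loss, so this term updates only the actor.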

2. **Value Loss**:
   $$
   \mathcal{L}_{value} = \frac{1}{2} \sum_{t} \big( R_t - V(\mathbf{s}_t) \big)^2
   $$

3. **Total Loss**:
   $$
   \mathcal{L} = \mathcal{L}_{policy} + \alpha \cdot \mathcal{L}_{value}
   $$

   Where \( \alpha > 0 \) sets the weight of the value loss relative to the policy loss.
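The three losses for one trajectory can be computed as sketched below. This assumes linear actor and critic heads exactly as written above, integer action indices, and an illustrative default \( \alpha = 0.5 \) that the text does not specify. Plain Python has no autograd, so treating the advantage as a constant appears only as a comment.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def actor_critic_loss(states, actions, returns, W_actor, W_critic, alpha=0.5):
    """Return (L, L_policy, L_value) for one trajectory.

    states[t]  : state embedding s_t (list of floats)
    actions[t] : index of the action a_t taken at step t
    returns[t] : discounted return R_t
    W_actor    : num_actions x dim matrix; W_critic: 1 x dim matrix
    alpha      : value-loss weight (0.5 is an assumed default, not from the text)
    """
    l_policy, l_value = 0.0, 0.0
    for s, a, R in zip(states, actions, returns):
        pi = softmax([dot(row, s) for row in W_actor])  # pi(. | s_t)
        V = dot(W_critic[0], s)                         # V(s_t)
        # With autograd, detach the advantage so the policy loss
        # does not push gradients into the critic.
        advantage = R - V
        l_policy += -math.log(pi[a]) * advantage
        l_value += 0.5 * (R - V) ** 2
    return l_policy + alpha * l_value, l_policy, l_value

# With all-zero weights the policy is uniform over 2 actions and V = 0.
total, l_pol, l_val = actor_critic_loss([[1.0, 2.0]], [0], [2.0],
                                        W_actor=[[0.0, 0.0], [0.0, 0.0]],
                                        W_critic=[[0.0, 0.0]])
print(round(l_pol, 4), l_val, round(total, 4))  # 1.3863 2.0 2.3863
```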

---

### **5. Supervised Learning**

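For supervised learning, each prediction \( \hat{y}_u \in (0, 1) \) is compared to a binary ground-truth label \( y_u \in \{0, 1\} \). Averaging over the \( N \) labelled nodes, indexed by \( i \), gives the binary cross-entropy loss: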
$$
\mathcal{L}_{BCE} = - \frac{1}{N} \sum_{i=1}^{N} \big( y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \big)
$$
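A direct translation of the formula is sketched below. The clipping constant `eps` is an implementation choice that is not part of the formula. It prevents \( \log 0 \) when a prediction is exactly 0 or 1. The function name is illustrative.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over N labelled nodes.

    Predictions are clipped to [eps, 1 - eps] so that log(0) is never taken.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally long, non-empty label and prediction lists")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

print(round(bce_loss([1, 0], [0.9, 0.2]), 4))  # 0.1643
```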

---

### **6. Discounted Cumulative Returns**

For reinforcement learning, discounted cumulative returns \( R_t \) are calculated as:
$$
R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}
$$

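Where:
- \( r_{t+k} \): Reward received at step \( t+k \),
- \( \gamma \in [0, 1] \): Discount factor for future rewards. Choosing \( \gamma < 1 \) keeps the infinite sum finite when rewards are bounded.

For a finite episode, rewards after the final step are zero. The returns can therefore be computed backwards with the recursion \( R_t = r_t + \gamma R_{t+1} \), taking \( R_t = 0 \) past the last step.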
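For a finite episode, the backward recursion yields every \( R_t \) in a single pass. The sketch below assumes rewards are given as a list ordered by time step. The function name is illustrative.

```python
def discounted_returns(rewards, gamma):
    """R_t = sum_k gamma^k r_{t+k}, truncated at the end of the episode.

    Computed backwards with R_t = r_t + gamma * R_{t+1}, which takes O(T) time
    instead of the O(T^2) of summing each R_t separately.
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # R_{t+1}; zero past the final step
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```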

---

### **Summary Flow**

1. Input graph \( G \) includes:
   - Node features \( \mathbf{X} \),
   - Edge features \( \mathbf{A} \),
   - Edge set \( E \), which defines the adjacency structure.
2. Encode features:
   $$
   \mathbf{h}_u^{(0)} = \sigma(\mathbf{W}_{enc} \cdot \mathbf{x}_u)
   $$
3. Propagate messages \( T \) times:
   $$
   m_u^{(t)} = \min_{v \in N(u)} \big( \mathbf{h}_v^{(t-1)} + \mathbf{w}_{v \to u} \big)
   $$
   $$
   \mathbf{h}_u^{(t)} = \sigma \big( \mathbf{W}_u \cdot \big[ \mathbf{h}_u^{(t-1)} \, \Vert \, m_u^{(t)} \big] \big)
   $$
4. Decode to predict:
   $$
   \hat{y}_u = \sigma(\mathbf{W}_{dec} \cdot \mathbf{h}_u^{(T)})
   $$

