# Differentiable Pathfinding with Dense Matrix Operations

This notebook demonstrates how to solve pathfinding problems using differentiable programming. Instead of traditional graph algorithms, we use gradient descent to learn the optimal sequence of moves through a graph.

## Key Concepts
- **Differentiable discrete optimization**: Making discrete pathfinding continuous and learnable
- **Matrix-based transitions**: Using adjacency matrices to constrain valid moves
- **Parameter learning**: Learning step parameters that encode movement decisions

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

## Graph Definition

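Define the graph structure using an adjacency matrix. This 5-node graph is based on the example on page 590 of CLRS (3rd edition) with one edge removed. The removed edge is 2-3 in CLRS's one-based vertex numbering, which is edge 1-2 in the zero-based numbering used here; the edge 2-3 listed below is a different edge and is still present.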

**Graph structure:**
- Nodes: 0, 1, 2, 3, 4  
- Edges: 0-1, 0-4, 1-3, 1-4, 2-3, 3-4

The adjacency matrix constrains movement to only valid connections between nodes.

In [2]:
# CLRS 3rd edition, page 590, with the 2-3 edge removed
# (2-3 in CLRS's 1-based numbering, i.e. 1-2 in the 0-based indices below)
adjacency_matrix = torch.tensor(
    [
        [0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 1, 0],
    ],
    dtype=torch.float32,
)
adjacency_matrix

tensor([[0., 1., 0., 0., 1.],
        [1., 0., 0., 1., 1.],
        [0., 0., 0., 1., 0.],
        [0., 1., 1., 0., 1.],
        [1., 1., 0., 1., 0.]])
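
As a quick sanity check, the edge list can be recovered from the matrix itself. The sketch below repeats the matrix so it runs on its own; the names `is_symmetric`, `upper`, and `edges` are introduced here for illustration and are not part of the original notebook.

```python
import torch

# Same matrix as above, repeated so the snippet is self-contained.
adjacency_matrix = torch.tensor(
    [
        [0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 1, 0],
    ],
    dtype=torch.float32,
)

# An undirected graph has a symmetric adjacency matrix.
is_symmetric = torch.equal(adjacency_matrix, adjacency_matrix.T)

# Keep the strict upper triangle so each undirected edge appears once.
upper = torch.triu(adjacency_matrix, diagonal=1)
edges = [tuple(pair) for pair in upper.nonzero().tolist()]

print(is_symmetric)  # True
print(edges)  # [(0, 1), (0, 4), (1, 3), (1, 4), (2, 3), (3, 4)]
```

The recovered list matches the edges given in the graph description, and there is no edge between nodes 1 and 2.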

## Experiment 1: Learning with Non-Zero Initialization

**Problem Setup:**
- Start at node 0: `[1, 0, 0, 0, 0]`
- Target node 2: `[0, 0, 1, 0, 0]`
- Need to find a three-step path such as 0 → 1 → 3 → 2 (0 → 4 → 3 → 2 is equally short)
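
A brute-force enumeration shows which walks the optimizer could converge to. This is a plain-Python sketch, independent of the training code; the `neighbors` dictionary and the `walks` helper are hypothetical names written out by hand from the edge list above.

```python
# Neighbor lists for the edges 0-1, 0-4, 1-3, 1-4, 2-3, 3-4.
neighbors = {
    0: [1, 4],
    1: [0, 3, 4],
    2: [3],
    3: [1, 2, 4],
    4: [0, 1, 3],
}


def walks(start, target, steps):
    """Return every walk of exactly `steps` edges from start to target."""
    if steps == 0:
        return [[start]] if start == target else []
    found = []
    for nxt in neighbors[start]:
        for tail in walks(nxt, target, steps - 1):
            found.append([start] + tail)
    return found


# No walk of one or two edges reaches node 2 from node 0.
shorter_walks = walks(0, 2, 1) + walks(0, 2, 2)
three_step_walks = walks(0, 2, 3)

print(shorter_walks)  # []
print(three_step_walks)  # [[0, 1, 3, 2], [0, 4, 3, 2]]
```

Three steps is therefore the minimum, and two walks of that length exist. Both pass through node 3 on the last move.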

**Algorithm:**
1. **State representation**: A 5-element vector with one entry per node; the start and target are one-hot, while intermediate states are real-valued
2. **Learnable parameters**: Three step vectors, one per move, each holding one weight per node (all initialized to 0.2)
3. **Forward pass**: For each step, multiply the current state elementwise by that step's weights, then multiply by the adjacency matrix
4. **Loss**: MSE between final state and target position

**Key insight**: The adjacency matrix multiplication `adjacency_matrix @ (state * step_params)` ensures only valid graph moves are possible.
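
To make the update rule concrete, the sketch below applies it once and then three times with every weight fixed at 0.2, without any training. It uses `torch.nn.functional.mse_loss`, which computes the same mean squared error as `nn.MSELoss()`; the variable names `after_one_step` and `initial_loss` are introduced here for illustration.

```python
import torch
import torch.nn.functional as F

adjacency_matrix = torch.tensor(
    [
        [0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 1, 0],
    ],
    dtype=torch.float32,
)
start = torch.tensor([1.0, 0.0, 0.0, 0.0, 0.0])
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])
step_params = torch.full((5,), 0.2)

# One step: only node 0 holds mass, so only its neighbors (1 and 4) receive any.
after_one_step = adjacency_matrix @ (start * step_params)
print(after_one_step)  # non-zero only at indices 1 and 4

# Three steps with the same weights, followed by the MSE loss.
state = start
for _ in range(3):
    state = adjacency_matrix @ (state * step_params)
initial_loss = F.mse_loss(state, target)
print(f"{initial_loss.item():.4f}")  # 0.1944
```

Mass never appears on a node that is not adjacent to a node already holding mass, which is the sense in which the matrix product restricts moves to graph edges.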

In [3]:
# Starting node (one-hot)
start = torch.tensor([1.0, 0.0, 0.0, 0.0, 0.0])
# Target node (one-hot)
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])
# Three learnable weight vectors, one per step (unconstrained, so not true probabilities)
step1 = nn.Parameter(torch.tensor([0.2, 0.2, 0.2, 0.2, 0.2]))
step2 = nn.Parameter(torch.tensor([0.2, 0.2, 0.2, 0.2, 0.2]))
step3 = nn.Parameter(torch.tensor([0.2, 0.2, 0.2, 0.2, 0.2]))
optimizer = optim.SGD([step1, step2, step3], lr=0.1)

# Training loop
found_correct = False
for epoch in range(1000):
    optimizer.zero_grad()
    # Forward pass: apply transitions
    state = start
    for step_params in [step1, step2, step3]:
        # Raw weights are used (no softmax) so they can go negative and cancel unwanted mass
        # step_probs = torch.softmax(step_params, dim=0)  # Proper probabilities
        state = adjacency_matrix @ (state * step_params)
        # state = state / (state.sum() + 1e-8)  # Normalize to maintain probability mass
    # Loss: MSE to target
    loss = nn.MSELoss()(state, target)
    # Backward pass
    loss.backward()
    optimizer.step()

    # Check whether each step's largest weight sits on the expected path 0 → 1 → 3
    if not found_correct:
        step1_choice = torch.argmax(step1.data).item()
        step2_choice = torch.argmax(step2.data).item()
        step3_choice = torch.argmax(step3.data).item()

        if step1_choice == 0 and step2_choice == 1 and step3_choice == 3:
            print(f"Found correct answer at step {epoch}")
            found_correct = True

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

print(f"\nFinal state ({torch.argmax(state)}): {state}")
print(f"Step 1 ({torch.argmax(step1.data)}) probs: {step1.data}")
print(f"Step 2 ({torch.argmax(step2.data)}) probs: {step2.data}")
print(f"Step 3 ({torch.argmax(step3.data)}) probs: {step3.data}")

Found correct answer at step 0
Epoch 0, Loss: 0.1944
Epoch 100, Loss: 0.1207
Epoch 200, Loss: 0.0000
Epoch 300, Loss: 0.0000
Epoch 400, Loss: 0.0000
Epoch 500, Loss: 0.0000
Epoch 600, Loss: 0.0000
Epoch 700, Loss: 0.0000
Epoch 800, Loss: 0.0000
Epoch 900, Loss: 0.0000

Final state (2): tensor([-8.9432e-08,  1.7890e-07,  1.0000e+00, -8.9432e-08,  1.7881e-07],
       grad_fn=<MvBackward0>)
Step 1 (0) probs: tensor([0.9725, 0.2000, 0.2000, 0.2000, 0.2000])
Step 2 (1) probs: tensor([0.2000, 0.7019, 0.2000, 0.2000, 0.7019])
Step 3 (3) probs: tensor([-7.3255e-01, -4.3828e-08,  2.0000e-01,  7.3256e-01, -8.7199e-08])
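
The printed weights can be replayed through the forward pass to see how they produce the target. The sketch below copies the values from the output above, rounded (entries printed as roughly 1e-8 are entered as 0.0), so the result matches the target only approximately; `learned_steps` and `contributions` are names introduced for this illustration.

```python
import torch

adjacency_matrix = torch.tensor(
    [
        [0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 1, 0],
    ],
    dtype=torch.float32,
)
start = torch.tensor([1.0, 0.0, 0.0, 0.0, 0.0])
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])

# Rounded copies of the learned weights printed above.
learned_steps = [
    torch.tensor([0.9725, 0.2, 0.2, 0.2, 0.2]),
    torch.tensor([0.2, 0.7019, 0.2, 0.2, 0.7019]),
    torch.tensor([-0.73255, 0.0, 0.2, 0.73256, 0.0]),
]

state = start
contributions = []
for step_params in learned_steps:
    weighted = state * step_params  # mass each node sends out this step
    contributions.append(weighted)
    state = adjacency_matrix @ weighted  # mass each node receives

print(contributions[-1])  # about -1 at node 0 and +1 at node 3
print(state)  # close to the one-hot target at node 2
```

In the last step, nodes 0 and 3 send out nearly equal mass with opposite signs. Nodes 1 and 4 are adjacent to both, so the two contributions cancel there, while node 2 is adjacent only to node 3 and keeps the full positive mass. Step 2 weights nodes 1 and 4 equally, so the solution uses both three-step routes rather than a single path.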


## Experiment 2: Zero Initialization Problem

**Hypothesis**: What happens if we initialize all parameters to zero?

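This experiment demonstrates a failure mode similar to the **dead neuron problem** in neural networks. The forward pass multiplies the three step vectors together, so the gradient of each one is scaled by the values of the others. When all parameters start at zero:
- All gradients are exactly zero
- No learning occurs (parameters remain at zero)  
- The algorithm gets stuck and cannot find the solution

**Result**: Loss remains constant at 0.2000 (the MSE between an all-zero final state and the one-hot target, which is 1/5), showing the importance of proper parameter initialization.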
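
The vanishing gradients can be checked directly with autograd before running the full loop. The helper `step_gradients` below is a hypothetical name introduced for this sketch; it runs one forward and backward pass with every weight set to a given initial value and returns the loss and the three gradients.

```python
import torch
import torch.nn.functional as F

adjacency_matrix = torch.tensor(
    [
        [0, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 1, 1, 0, 1],
        [1, 1, 0, 1, 0],
    ],
    dtype=torch.float32,
)
start = torch.tensor([1.0, 0.0, 0.0, 0.0, 0.0])
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])


def step_gradients(init_value):
    """Return (loss, gradients) after one pass with all weights at init_value."""
    steps = [torch.full((5,), init_value, requires_grad=True) for _ in range(3)]
    state = start
    for step_params in steps:
        state = adjacency_matrix @ (state * step_params)
    loss = F.mse_loss(state, target)
    loss.backward()
    return loss.item(), [s.grad for s in steps]


zero_loss, zero_grads = step_gradients(0.0)
nonzero_loss, nonzero_grads = step_gradients(0.2)

print(f"{zero_loss:.4f}", [g.abs().sum().item() for g in zero_grads])
print(f"{nonzero_loss:.4f}", [g.abs().sum().item() for g in nonzero_grads])
```

With zero initialization every gradient entry is zero, so SGD has nothing to follow. With 0.2 each step vector receives a non-zero gradient.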

In [4]:
# Starting node (one-hot)
start = torch.tensor([1.0, 0.0, 0.0, 0.0, 0.0])
# Target node (one-hot)
target = torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0])
# Three learnable weight vectors, one per step, all initialized to zero
step1 = nn.Parameter(torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0]))
step2 = nn.Parameter(torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0]))
step3 = nn.Parameter(torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0]))
optimizer = optim.SGD([step1, step2, step3], lr=0.1)

# Training loop
found_correct = False
for epoch in range(1000):
    optimizer.zero_grad()
    # Forward pass: apply transitions
    state = start
    for step_params in [step1, step2, step3]:
        # Raw weights are used (no softmax) so they can go negative and cancel unwanted mass
        # step_probs = torch.softmax(step_params, dim=0)  # Proper probabilities
        state = adjacency_matrix @ (state * step_params)
        # state = state / (state.sum() + 1e-8)  # Normalize to maintain probability mass
    # Loss: MSE to target
    loss = nn.MSELoss()(state, target)
    # Backward pass
    loss.backward()
    optimizer.step()

    # Check whether each step's largest weight sits on the expected path 0 → 1 → 3
    if not found_correct:
        step1_choice = torch.argmax(step1.data).item()
        step2_choice = torch.argmax(step2.data).item()
        step3_choice = torch.argmax(step3.data).item()

        if step1_choice == 0 and step2_choice == 1 and step3_choice == 3:
            print(f"Found correct answer at step {epoch}")
            found_correct = True

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

print(f"\nFinal state ({torch.argmax(state)}): {state}")
print(f"Step 1 ({torch.argmax(step1.data)}) probs: {step1.data}")
print(f"Step 2 ({torch.argmax(step2.data)}) probs: {step2.data}")
print(f"Step 3 ({torch.argmax(step3.data)}) probs: {step3.data}")

Epoch 0, Loss: 0.2000
Epoch 100, Loss: 0.2000
Epoch 200, Loss: 0.2000
Epoch 300, Loss: 0.2000
Epoch 400, Loss: 0.2000
Epoch 500, Loss: 0.2000
Epoch 600, Loss: 0.2000
Epoch 700, Loss: 0.2000
Epoch 800, Loss: 0.2000
Epoch 900, Loss: 0.2000

Final state (0): tensor([0., 0., 0., 0., 0.], grad_fn=<MvBackward0>)
Step 1 (0) probs: tensor([0., 0., 0., 0., 0.])
Step 2 (0) probs: tensor([0., 0., 0., 0., 0.])
Step 3 (0) probs: tensor([0., 0., 0., 0., 0.])


## Key Takeaways

1. **Differentiable pathfinding**: Discrete optimization problems can be made continuous and solved with gradient descent
2. **Adjacency constraints**: Matrix multiplication naturally enforces graph structure constraints  
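3. **Initialization matters**: All-zero initialization is a stationary point of the loss, so every gradient vanishes and nothing is learned
4. **Negative parameters**: The weights are unconstrained, so they can turn negative and act as "inhibitory" terms; in Experiment 1, step 3 learned about -0.73 on node 0 and +0.73 on node 3, so their contributions to nodes 1 and 4 cancel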
5. **Matrix operations**: Dense matrix operations can encode complex discrete decision-making processes

**Applications**: This approach can be extended to larger graphs, multiple agents, or dynamic pathfinding scenarios where traditional algorithms might be insufficient.