# Notebook 2: Training the Synapse Agent

**Objective:** To train a PPO agent on our `NetworkRoutingEnv` and save the resulting model for later analysis.

This notebook will:
1. Set up constants for training (topology file, model save path, and number of training timesteps).
2. Instantiate the environment and the PPO agent with our custom feature extractor.
3. Run the main training loop.
4. Save the final trained model.

### 1. Imports and Setup

In [1]:
import sys
import os

# Add the project root to the Python path
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir)))
from synapse.network_env import NetworkRoutingEnv
from synapse.agent import create_agent
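The `sys.path` line above assumes the notebook runs from a directory exactly one level below the project root. A slightly more robust sketch walks up from the current directory until it finds the `synapse/` package. The helper names `find_project_root` and `add_project_root_to_path` are hypothetical, and the sketch assumes the package directory sits directly in the project root.

```python
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "synapse") -> Path:
    """Return the first directory at or above `start` that contains `marker/`.

    Hypothetical helper: assumes the `synapse` package lives directly in the project root.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}/' directory found at or above {start}")


def add_project_root_to_path(start: Optional[Path] = None) -> Path:
    """Locate the project root and put it at the front of sys.path."""
    root = find_project_root(start or Path.cwd())
    # Insert at the front so a local checkout of `synapse` shadows any installed copy.
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Calling `add_project_root_to_path()` in place of the `sys.path.append(...)` line would let the notebook be opened from any subdirectory of the project. Inserting rather than appending means the local package takes precedence over an installed one.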

### 2. Define Constants and Parameters

Keeping these in one place makes it easy to run new experiments.

In [2]:
TOPOLOGY_FILE = '../data/topologies/nsfnet.gml'
MODEL_SAVE_DIR = '../models/'
MODEL_SAVE_PATH = os.path.join(MODEL_SAVE_DIR, 'synapse_ppo_nsfnet.zip')

# Total number of environment steps to train the agent for.
# 2,000 steps is only a smoke test to confirm the pipeline runs end to end.
# Use 100,000 for a quick experiment, then increase to 500,000 or 1,000,000 for the final model;
# a substantially longer run is justified for the results reported in the paper.
TRAINING_TIMESTEPS = 2000

# Create the model directory if it doesn't exist
os.makedirs(MODEL_SAVE_DIR, exist_ok=True)
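Because PPO collects experience in fixed-size rollouts, the number of steps actually run can exceed `TRAINING_TIMESTEPS`. The sketch below estimates the training budget, assuming `create_agent` builds a Stable-Baselines3 `PPO` model (the log messages in the next cell suggest it does) with SB3's default `n_steps=2048`, `batch_size=64`, and `n_epochs=10`, and a single environment. These values are assumptions about `create_agent`; substitute its actual settings if they differ. `ppo_budget` is a hypothetical helper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOBudget:
    iterations: int       # rollout-collection + policy-update cycles
    env_steps: int        # environment steps actually collected
    gradient_steps: int   # minibatch updates summed over all epochs and iterations


def ppo_budget(total_timesteps, n_steps=2048, n_envs=1, batch_size=64, n_epochs=10):
    """Estimate PPO work for a given total_timesteps (defaults mirror SB3's PPO defaults)."""
    rollout_size = n_steps * n_envs
    # Training keeps collecting full rollouts until at least total_timesteps steps are done.
    iterations = math.ceil(total_timesteps / rollout_size)
    # Each rollout is split into minibatches and revisited n_epochs times.
    minibatches = math.ceil(rollout_size / batch_size)
    return PPOBudget(
        iterations=iterations,
        env_steps=iterations * rollout_size,
        gradient_steps=iterations * n_epochs * minibatches,
    )
```

Under these assumptions, `ppo_budget(2000)` gives a single iteration of 2,048 environment steps and 320 minibatch updates, so the current setting exercises the pipeline but is far too short to learn a useful policy. `ppo_budget(100_000)` gives 49 iterations (100,352 steps).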

### 3. Instantiate Environment and Agent

In [3]:
# Create the environment
env = NetworkRoutingEnv(graph_file=TOPOLOGY_FILE)

# Create the PPO agent
agent = create_agent(env)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
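Before starting a long run, it can help to confirm that the observations the environment returns actually fit its declared `observation_space`. The sketch below assumes `NetworkRoutingEnv` follows the Gymnasium API (`reset()` returning `(obs, info)` and `step()` returning a 5-tuple), which current Stable-Baselines3 versions expect. `smoke_test_env` is a hypothetical helper; SB3's own `check_env` (in `stable_baselines3.common.env_checker`) performs a more thorough set of checks.

```python
def smoke_test_env(env, n_steps=10, seed=0):
    """Take random actions and check every observation against env.observation_space.

    Hypothetical helper assuming a Gymnasium-style env. Raises ValueError on the first
    observation outside the declared space; returns the number of observations checked.
    """
    checked = 0

    def check(observation, where):
        nonlocal checked
        if not env.observation_space.contains(observation):
            raise ValueError(
                f"{where}: observation {observation!r} is not in {env.observation_space!r}"
            )
        checked += 1

    obs, _info = env.reset(seed=seed)
    check(obs, "reset()")
    for step in range(n_steps):
        action = env.action_space.sample()
        obs, _reward, terminated, truncated, _info = env.step(action)
        check(obs, f"step {step}")
        if terminated or truncated:
            # Start a new episode so the remaining steps are still exercised.
            obs, _info = env.reset()
            check(obs, f"reset() after step {step}")
    return checked
```

Running `smoke_test_env(env)` right after creating the environment turns a mismatch between the declared space and the returned observations into a readable error, instead of a reshape failure deep inside training.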


### 4. Train the Agent

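This is the main training loop. The agent interacts with the environment for the specified number of timesteps, periodically updating its policy to maximize the expected cumulative reward, i.e., to learn an effective routing policy. TensorBoard logging appears to be configured inside `create_agent` (hence the `Logging to ./tensorboard_logs/...` line in the output below), so training progress can be monitored in real time.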

In [4]:
print(f"Starting training for {TRAINING_TIMESTEPS} timesteps...")

agent.learn(
    total_timesteps=TRAINING_TIMESTEPS,
    progress_bar=True # Show a progress bar during training
)

print("\nTraining complete.")

Starting training for 2000 timesteps...
Logging to ./tensorboard_logs/synapse_ppo/PPO_17


ValueError: cannot reshape array of size 14 into shape (1,1)
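Only the last line of the traceback is shown, so the exact cause cannot be read off directly. The message is NumPy's standard reshape error, and the log lines above indicate a Stable-Baselines3 PPO model running a single environment. SB3's rollout buffer reshapes observations from a `Discrete` observation space to `(n_envs, 1)`, so one plausible cause is that the observation space is declared as `Discrete` while the environment returns a 14-element array (14 matches the node count of the NSFNET topology commonly used in routing studies). The sketch below only reproduces the NumPy error under that assumption; it does not inspect the actual environment, and the observation contents are hypothetical.

```python
import numpy as np

N_ENVS = 1  # one environment inside DummyVecEnv, as the log reports

# Hypothetical observation: one feature per NSFNET node (14 values).
obs = np.zeros(14, dtype=np.float32)

# Batch shape used for a Discrete observation space: one integer per environment.
discrete_obs_shape = (N_ENVS, 1)

try:
    obs.reshape(discrete_obs_shape)
    message = None
except ValueError as err:
    message = str(err)
print(message)  # cannot reshape array of size 14 into shape (1,1)

# With a 14-dimensional Box-style space, the batch shape is consistent with the data.
box_obs_shape = (N_ENVS, obs.size)
batched = obs.reshape(box_obs_shape)
print(batched.shape)  # (1, 14)
```

If this is the cause, declaring `observation_space` as a `gymnasium.spaces.Box` whose shape matches what `reset()` and `step()` return should remove the error; running SB3's `check_env` on a freshly created `NetworkRoutingEnv` is a quick way to confirm.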

### 5. Save the Trained Model

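After training, we save the model to a `.zip` archive containing the learned policy weights together with the optimizer state and training hyperparameters. This allows us to load it later for evaluation without retraining. Run this cell only after the training cell above finishes without errors; in the run recorded here, training stopped with a `ValueError`, so this cell has not been executed yet.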

In [None]:
agent.save(MODEL_SAVE_PATH)
print(f"Model saved to: {MODEL_SAVE_PATH}")
env.close()
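Stable-Baselines3 saves models as a zip archive; in recent versions it contains, among other entries, a `data` entry with the serialized hyperparameters and a `policy.pth` file with the policy weights. The hypothetical helper below is a quick sanity check that the save succeeded, assuming that layout; it does not load the model.

```python
import zipfile
from pathlib import Path

# Entries assumed to be present in a Stable-Baselines3 model archive.
EXPECTED_ENTRIES = {"data", "policy.pth"}


def inspect_sb3_archive(path):
    """Return the sorted entry names of a saved model archive.

    Raises FileNotFoundError if the file is missing, and ValueError if it is not a
    zip file or lacks the expected SB3 entries.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No saved model at {path}")
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    missing = EXPECTED_ENTRIES - names
    if missing:
        raise ValueError(f"{path} is missing expected entries: {sorted(missing)}")
    return sorted(names)
```

To reuse the model, SB3's `PPO.load(MODEL_SAVE_PATH, env=...)` restores it. Because the cell above closes `env`, pass a freshly created `NetworkRoutingEnv`, and keep `synapse` on the Python path so the custom feature extractor class can be found when the archive is unpickled.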

### 6. Monitor with TensorBoard

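To visualize the learning progress (e.g., rewards, episode length), open a terminal and run the command below. The `Logging to ./tensorboard_logs/...` path printed during training is relative to the notebook's working directory, not necessarily the project root (`/synapse-marl-routing/`), so run the command from the directory that contains this notebook, or pass the full path to `tensorboard_logs/` via `--logdir`:

```bash
tensorboard --logdir ./tensorboard_logs/
```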

Then open the URL it prints (usually `http://localhost:6006/`) in your web browser. In the Scalars dashboard, look for `rollout/ep_rew_mean`, the mean episode reward over recent episodes (recorded because the environment is wrapped in a `Monitor`). In a healthy training run, this value typically trends upward over time.
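"Trending upward" can be made a little more concrete by fitting a least-squares line to the logged values and checking the sign of its slope. The sketch below assumes the `rollout/ep_rew_mean` scalar has been exported as CSV from the TensorBoard Scalars dashboard with `Step` and `Value` columns (the column names are an assumption about the export format). The helper names are hypothetical, and a positive slope is only a rough signal, since episode rewards in RL are noisy.

```python
import csv
from typing import Iterable, List, Tuple


def load_scalar_csv(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Parse (step, value) pairs from an exported scalar CSV (assumed 'Step'/'Value' columns)."""
    reader = csv.DictReader(lines)
    return [(float(row["Step"]), float(row["Value"])) for row in reader]


def trend_slope(points: List[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of value versus training step."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same step")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Opening the exported file (with `newline=""`, as the `csv` module recommends) and passing it to `load_scalar_csv` yields the points; `trend_slope` then gives reward change per training step. Comparing the slope over the first and second halves of a run is a simple way to see whether learning has plateaued.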