# Load RL Policy

---

## Define Policy

### Policy Options

- [Documentation](https://docs.ray.io/en/latest/rllib/rllib-saving-and-loading-algos-and-policies.html)
- `policy_dir` is the path to the latest saved version of our policy and its weights.
- `checkpoint_dir` is the path to the last checkpoint from our training process.
- We won't use `checkpoint_dir` here, but it is useful for restoring the previous training state and configuration so that training can continue if needed.

<span style="color:#ef4444">
&#x2B55; When loading these checkpoints, it's best to use the same version of Python that was used in the training environment.
</span>
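
One way to act on this warning is to check the interpreter version before loading anything. The sketch below is only an illustration: `TRAINING_PYTHON` and `same_python_version` are hypothetical names, and the `(3, 10)` value is a placeholder to replace with the version actually used for training.

```python
import sys

# Assumption: placeholder value; set this to the (major, minor) version used for training.
TRAINING_PYTHON = (3, 10)


def same_python_version(expected=TRAINING_PYTHON, actual=None):
    """Return True when the interpreter's (major, minor) matches `expected`."""
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(expected)


if not same_python_version():
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, "
        f"but the checkpoint is assumed to come from "
        f"{TRAINING_PYTHON[0]}.{TRAINING_PYTHON[1]}"
    )
```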

In [2]:
checkpoint_dir = "sumo_3d_demo/ray_results/Sheridan/PPO/PPO_Sheridan_0c5d9_00000_0_2024-11-26_15-42-31/checkpoint_000018"

policy_dir = "sumo_3d_demo/ray_results/Sheridan/PPO/PPO_Sheridan_0c5d9_00000_0_2024-11-26_15-42-31/checkpoint_000018/policies/default_policy"
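
The two paths above differ only by the `policies/default_policy` suffix, so `policy_dir` can also be derived from `checkpoint_dir` rather than typed twice. A minimal sketch, assuming the policy was saved under RLlib's default policy ID (`default_policy`), as it is in the paths above:

```python
from pathlib import PurePosixPath

checkpoint_dir = PurePosixPath(
    "sumo_3d_demo/ray_results/Sheridan/PPO/"
    "PPO_Sheridan_0c5d9_00000_0_2024-11-26_15-42-31/checkpoint_000018"
)

# Per-policy checkpoints live under <checkpoint>/policies/<policy_id>
policy_dir = checkpoint_dir / "policies" / "default_policy"

print(policy_dir)
```

Pass `str(policy_dir)` wherever a plain string path is expected.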

### How to load any policy

In [2]:
from ray.rllib.policy.policy import Policy

policy = Policy.from_checkpoint(policy_dir)

print(policy)

PPOTorchPolicy


### How to load any Algorithm checkpoint

In [None]:
from ray.rllib.algorithms.algorithm import Algorithm
algorithm = Algorithm.from_checkpoint(checkpoint_dir)
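
A restored `Algorithm` also gives access to the policy it contains through `algorithm.get_policy()`. The helper below is a sketch with a hypothetical name (`policy_from_algorithm`); it assumes `get_policy` returns `None` for an unknown policy ID and turns that into an explicit error.

```python
def policy_from_algorithm(algorithm, policy_id="default_policy"):
    """Fetch one policy from a restored Algorithm, failing loudly if it is missing."""
    policy = algorithm.get_policy(policy_id)
    if policy is None:
        # Assumption: an unknown ID yields None rather than raising on its own
        raise KeyError(f"No policy named {policy_id!r} in this algorithm")
    return policy


# Intended use with the objects defined above:
#   policy = policy_from_algorithm(Algorithm.from_checkpoint(checkpoint_dir))
```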

---

## Load PPO policy for SUMO


### Explanation of the Code

1. **Import Required Libraries**:  
   - `sumo_rl`: Used to interact with the SUMO simulation environment.  
   - `ray.rllib.policy`: For loading and using a pre-trained RL policy.  
   - `numpy`: Handles numerical operations like array transformations.  

2. **Load Pre-trained Policy**:  
   - The policy is loaded from our last checkpoint using `Policy.from_checkpoint(policy_dir)`.

3. **Initialize SUMO-RL Environment**:  
   - The SUMO simulation requires a network file (`osm.net.xml`) and a route file (`osm.rou.xml`).  
   - The GUI is enabled (`use_gui=True`), and the simulation runs for up to 80,000 simulated seconds (`num_seconds=80000`).

4. **Environment Reset**:  
   - The environment is reset using `env.reset()`, providing initial observations (`observations`) for all agents.

5. **Action Loop**:  
   - While agents are active in the environment:  
     - **Action Computation**: For each agent:  
       - Observations are converted to a NumPy array (`obs_array`) for compatibility.  
       - The pre-trained policy computes actions based on observations using `policy.compute_single_action()`.  
     - **Environment Step**:  
       - The computed actions are passed to the environment using `env.step(actions)`, which returns:  
         - `observations`: New state information for agents.  
         - `rewards`: Immediate rewards for actions.  
         - `terminations` and `truncations`: Per-agent flags indicating whether an episode has ended or has been cut short.  
         - `infos`: Additional environment data.  

6. **Reward Logging**:  
   - Rewards for each agent are printed for monitoring performance.
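
The action computation in step 5 can be factored into a small helper. This is a sketch, not the notebook's code: `compute_actions` and its `preprocess` argument are hypothetical names, and it passes `explore=False` on the assumption that evaluation should be deterministic, whereas the cell below relies on the policy's default exploration setting.

```python
def compute_actions(policy, observations, preprocess=lambda obs: obs, explore=False):
    """Map each agent's observation to an action using one shared policy."""
    actions = {}
    for agent, obs in observations.items():
        # compute_single_action returns (action, state_outs, extra_info);
        # only the action is needed here.
        action, _state_outs, _extra = policy.compute_single_action(
            preprocess(obs), explore=explore
        )
        actions[agent] = action
    return actions


# Intended use inside the loop, with NumPy imported as np:
#   actions = compute_actions(
#       policy, observations,
#       preprocess=lambda o: np.asarray(o, dtype=np.float32),
#   )
```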


In [None]:
import sumo_rl
import numpy as np
from ray.rllib.policy.policy import Policy

policy = Policy.from_checkpoint(policy_dir)

# Initialize SUMO-RL environment
env = sumo_rl.parallel_env(
    net_file='sumo_3d_demo/osm.net.xml',
    route_file='sumo_3d_demo/osm.rou.xml',
    use_gui=True,
    num_seconds=80000
)

# Reset the environment
observations, infos = env.reset()

while env.agents:
    # Compute actions using the loaded policy
    actions = {}
    for agent, obs in observations.items():
        # Convert observation to appropriate format
        obs_array = np.array(obs).astype(np.float32)
        
        # Compute the action using the policy
        # Extract only the action
        action, _, _ = policy.compute_single_action(obs_array) 

        actions[agent] = action
    
    # Step the environment
    observations, rewards, terminations, truncations, infos = env.step(actions)
    
    # Optional: Log rewards or other metrics
    for agent, reward in rewards.items():
        print(f"Agent: {agent}, Reward: {reward}")
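
Printing every reward at every step becomes noisy over a long run. As an alternative, the per-step `rewards` dictionary can be folded into running totals per agent. The sketch below uses made-up agent names (`tls_a`, `tls_b`) and reward values purely for illustration; `add_rewards` is a hypothetical helper.

```python
from collections import defaultdict


def add_rewards(totals, rewards):
    """Add one step's per-agent rewards to the running totals."""
    for agent, reward in rewards.items():
        totals[agent] += reward
    return totals


totals = defaultdict(float)

# Two illustrative steps; in the loop above this would be add_rewards(totals, rewards)
add_rewards(totals, {"tls_a": -1.5, "tls_b": 0.5})
add_rewards(totals, {"tls_a": -0.5, "tls_b": 0.25})

print(dict(totals))
```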


---

## Run 3D Unity Script

- Before we can run the Unity script, we need to modify our `sumo_rl` custom launcher.
- In the modified script we force the TraCI server onto port 4001 and reduce the step length to 0.5.
- We also specify the execution order so that our policy takes priority over the Unity script.
- This Unity script is from [traffic3d](https://traffic3d.org/sumo.html)
  - Make sure to install traffic3d and go through a test run.
  - Once that works, we can modify it to account for our policy script.
  - In the `Scripts` folder, modify the following:
    - `ControlCommands.cs`: change `CMD_GETVERSION` to `CMD_SETORDER`
    - Then in `SumoManager.cs` add the following to the if statement on line 44: `client.Control.SetOrder(1);`
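
The modified launcher itself is not shown here, so the following is only a rough sketch of what these settings amount to at the SUMO level. `build_sumo_cmd` is a hypothetical helper; `--step-length` and `--num-clients` are SUMO command-line options, and the fixed port is passed to `traci.start`.

```python
def build_sumo_cmd(net_file, route_file, step_length=0.5, num_clients=2, use_gui=True):
    """Build a SUMO command line for a multi-client TraCI session."""
    binary = "sumo-gui" if use_gui else "sumo"
    return [
        binary,
        "-n", net_file,
        "-r", route_file,
        "--step-length", str(step_length),   # simulation step length
        "--num-clients", str(num_clients),   # SUMO waits for this many TraCI clients
    ]


cmd = build_sumo_cmd("sumo_3d_demo/osm.net.xml", "sumo_3d_demo/osm.rou.xml")
print(" ".join(cmd))

# With TraCI installed, this could be launched on the fixed port with:
#   traci.start(cmd, port=4001)
#   traci.setOrder(0)  # the Unity client then registers with SetOrder(1)
```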

In [1]:
import sumo_rl
from pathlib import Path
import shutil

# Get the path of the installed `sumo_rl` package
sumo_rl_path = Path(sumo_rl.__file__).parent

# Path to the existing `env.py` file in the installed package
original_env_path = sumo_rl_path / "environment" / "env.py"

# Path to your modified `env.py` file (placed in the same directory as your notebook)
modified_env_path = Path("modified_env.py")

# Ensure the modified file exists
if not modified_env_path.exists():
    raise FileNotFoundError(f"Modified env.py not found at {modified_env_path}")

# Overwrite the installed `env.py` with the modified version
shutil.copy(modified_env_path, original_env_path)
print(f"Replaced {original_env_path} with your modified `env.py` from {modified_env_path}")


Replaced /Users/jamesb/.pyenv/versions/3.10.12/lib/python3.10/site-packages/sumo_rl/environment/env.py with your modified `env.py` from modified_env.py
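
Overwriting a file inside `site-packages` is hard to undo without reinstalling the package. A more cautious variant keeps a copy of the original first. The sketch below is an assumption about how that could look (`replace_with_backup` is a hypothetical helper) and demonstrates it on throwaway files in a temporary directory rather than on the real package.

```python
import shutil
import tempfile
from pathlib import Path


def replace_with_backup(modified: Path, target: Path) -> Path:
    """Copy `modified` over `target`, saving the original as `<target>.bak` first."""
    backup = target.with_name(target.name + ".bak")
    # Back up only once so a second run does not overwrite the pristine original
    if not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy(modified, target)
    return backup


# Demonstration on temporary files
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "env.py"
    modified = Path(tmp) / "modified_env.py"
    target.write_text("original")
    modified.write_text("modified")
    backup = replace_with_backup(modified, target)
    demo_result = (target.read_text(), backup.read_text())

print(demo_result)
```

With the notebook's variables, the call would be `replace_with_backup(modified_env_path, original_env_path)`.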


<span style="color:#ef4444">
&#x2B55; After running the previous cell, you have to restart your kernel so that the modified `env.py` is imported.
</span>

In [None]:
policy_dir = "sumo_3d_demo/ray_results/Sheridan/PPO/PPO_Sheridan_0c5d9_00000_0_2024-11-26_15-42-31/checkpoint_000018/policies/default_policy"

In [3]:
import sumo_rl
import numpy as np
from ray.rllib.policy.policy import Policy

policy = Policy.from_checkpoint(policy_dir)

# Initialize SUMO-RL environment with multiple client support
env = sumo_rl.parallel_env(
    net_file='sumo_3d_demo/osm.net.xml',
    route_file='sumo_3d_demo/osm.rou.xml',
    use_gui=True,
    num_seconds=80000,
    port=4001,          # Use the same port for SUMO and Unity
    num_clients=2,      # Allow two clients (policy script and Unity)
)
# Reset the environment
observations, infos = env.reset()

while env.agents:
    # Compute actions using the loaded policy
    actions = {}
    for agent, obs in observations.items():
        # Convert observation to appropriate format
        obs_array = np.array(obs).astype(np.float32)
        
        # Compute the action using the policy
        # Extract only the action
        action, _, _ = policy.compute_single_action(obs_array) 

        actions[agent] = action
    
    # Step the environment
    observations, rewards, terminations, truncations, infos = env.step(actions)
    
    # Optional: Log rewards or other metrics
    for agent, reward in rewards.items():
        print(f"Agent: {agent}, Reward: {reward}")


 Retrying in 1 seconds
Step #0.00 (0ms ?*RT. ?UPS, TraCI: 9ms, vehicles TOT 0 ACT 0 BUF 0)                      
 Retrying in 1 seconds




In [7]:
import traci
import numpy as np

# Connect to the existing SUMO instance
traci_port = 4001
traci.init(traci_port)

# Set the execution order for this client (e.g., 0 for Python client)
traci.setOrder(0)

# Get the list of traffic light IDs
tls_ids = traci.trafficlight.getIDList()


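try:
    # Advance the simulation until the kernel is interrupted
    while True:
        traci.simulationStep()
finally:
    # Close the connection when done (this also runs on KeyboardInterrupt)
    traci.close()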

 Retrying in 1 seconds
Could not connect to TraCI server at localhost:4001 [Errno 61] Connection refused
 Retrying in 1 seconds


KeyboardInterrupt: 