## Basic RL usage

### Initializing environments


#### **Environment settings**

- Environments are initialized with the `BaseEnv` class, which wraps the `nocturne` simulator in a basic RL interface built from the provided traffic scenario(s).

---
> 📝 The `env_config.yaml` file defines our environment settings, such as the action space, observation space and traffic scenarios to use.
---

See `configs/env_config.yaml` for all the details!

In [4]:
import os

import yaml
from nocturne.envs.base_env import BaseEnv

# Move one directory up so that the relative path ./configs resolves
os.chdir('..')

# Load environment settings
with open("./configs/env_config.yaml", "r") as stream:
    env_config = yaml.safe_load(stream)

# Initialize environment
env = BaseEnv(config=env_config)

In [None]:
print(f'controlling agents # {[agent.id for agent in env.controlled_vehicles]}')

#### **Data**

- Within `env_config.yaml`, we specify the path to the folder containing the traffic scenarios to use as follows:

```yaml
# Path to folder with traffic scene(s) from which to create an environment
data_path: ../data
```
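Since `yaml.safe_load` returns a plain dictionary for a mapping like this, the scenario folder can also be changed in code before the environment is constructed. The following is only a sketch: the helper `with_data_path` and the stand-in dictionary are hypothetical, and `data_path` is the only key taken from the text above.

```python
from copy import deepcopy
from pathlib import Path


def with_data_path(env_config, data_path):
    """Return a copy of the config dict that points at another scenario folder.

    Hypothetical helper for illustration, not part of the nocturne code base.
    """
    new_config = deepcopy(env_config)
    new_config["data_path"] = str(Path(data_path))
    return new_config


# Stand-in for the dict returned by yaml.safe_load (only `data_path` is from the text)
env_config = {"data_path": "../data"}
local_config = with_data_path(env_config, "./data")

# Relative paths resolve against the current working directory
print(Path(local_config["data_path"]).resolve())
```

Copying the dictionary leaves the loaded settings untouched, so the same base config can be reused for several environments.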

- [Here](https://github.com/facebookresearch/nocturne/tree/main#downloading-the-dataset) are the instructions to access the complete dataset of traffic scenes. 

- The data folder also contains a file named `valid_files.json`. This file lists the names of all valid traffic scenarios, together with the IDs of the vehicles in each scenario that are invalid. These vehicles are excluded from our experiments.

For simplicity, our data folder currently includes a single traffic scenario with two vehicles. Both vehicles can be used, so our `valid_files.json` looks like this:

```json
{
    "example_scenario.json": []
}
```

### Interacting with the environment

The classic agent-environment loop of reinforcement learning is implemented as follows:

In [None]:
# Reset
obs_dict = env.reset()

# Track controlled agents, finished agents and per-agent returns
agent_ids = list(obs_dict.keys())
dead_agent_ids = []
rewards = {agent_id: 0 for agent_id in agent_ids}

for step in range(1000):

    # Sample a random action for every agent that is still active
    action_dict = {
        agent_id: env.action_space.sample()
        for agent_id in agent_ids
        if agent_id not in dead_agent_ids
    }

    # Step in env
    obs_dict, rew_dict, done_dict, info_dict = env.step(action_dict)

    # Accumulate the rewards of the agents that acted
    for agent_id in action_dict.keys():
        rewards[agent_id] += rew_dict[agent_id]

    # Update dead agents (skip the "__all__" entry, which is not an agent)
    for agent_id, is_done in done_dict.items():
        if agent_id != "__all__" and is_done and agent_id not in dead_agent_ids:
            dead_agent_ids.append(agent_id)

    # Reset if all agents are done
    if done_dict["__all__"]:
        print(f'Done after {env.step_num} steps -- total return in episode: {rewards}')
        obs_dict = env.reset()
        agent_ids = list(obs_dict.keys())
        dead_agent_ids = []
        rewards = {agent_id: 0 for agent_id in agent_ids}

# Close environment
env.close()
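The bookkeeping in the loop above (per-agent returns, agents that finish early, the `"__all__"` flag) can be tried out without the simulator. The sketch below uses a hypothetical `ToyMultiAgentEnv` that only mimics the dictionary-based `reset`/`step` interface used above; it is not part of nocturne, and the `run_episode` helper is likewise an illustrative assumption.

```python
import random


class ToyMultiAgentEnv:
    """Hypothetical stand-in mimicking the dict-based interface used above."""

    def __init__(self, lifetimes):
        # lifetimes: {agent_id: number of steps until that agent is done}
        self.lifetimes = dict(lifetimes)
        self.step_num = 0
        self.done_agents = set()

    def reset(self):
        self.step_num = 0
        self.done_agents = set()
        return {agent_id: 0.0 for agent_id in self.lifetimes}

    def step(self, action_dict):
        self.step_num += 1
        obs, rew, done = {}, {}, {}
        for agent_id in action_dict:
            obs[agent_id] = float(self.step_num)
            rew[agent_id] = 1.0  # constant reward keeps the returns easy to check
            done[agent_id] = self.step_num >= self.lifetimes[agent_id]
            if done[agent_id]:
                self.done_agents.add(agent_id)
        done["__all__"] = len(self.done_agents) == len(self.lifetimes)
        return obs, rew, done, {}


def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return the per-agent return."""
    obs_dict = env.reset()
    returns = {agent_id: 0.0 for agent_id in obs_dict}
    dead_agent_ids = set()
    for _ in range(max_steps):
        # Only agents that are still active receive an action
        action_dict = {
            agent_id: policy(obs_dict[agent_id])
            for agent_id in returns
            if agent_id not in dead_agent_ids
        }
        obs_dict, rew_dict, done_dict, _ = env.step(action_dict)
        for agent_id in action_dict:
            returns[agent_id] += rew_dict[agent_id]
            if done_dict[agent_id]:
                dead_agent_ids.add(agent_id)
        if done_dict["__all__"]:
            break
    return returns


toy_env = ToyMultiAgentEnv({"a": 3, "b": 5})
episode_returns = run_episode(toy_env, policy=lambda obs: random.choice([0, 1]))
print(episode_returns)
```

Wrapping the rollout in a function keeps the per-episode state local, so nothing has to be reset by hand between episodes.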

### Accessing information about the environment

In [None]:
# The observation space 
env.observation_space


In [None]:
# The action space that each controlled agent samples from
env.action_space


In [None]:
# Which agents are controlled?
env.controlled_vehicles
