# Final Project: Application of Reinforcement Learning in Autonomous Driving
<font size=4>**ELEC-473: Deep Reinforcement Learning**</font>

<font size=2>Jiaqi Guo: JGR9647; 　Yimin Han: YHA1926; 　Ruocheng Jiao: RJB5625</font>


## Background and Motivation
**Autonomous vehicles (AVs)** are expected to enter society in the near future and to be fully available in some regions as early as 2050. However, the impact of adopting autonomous driving technology is not yet well understood. Analyzing the partial adoption of autonomy raises numerous technical challenges: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks.

**Reinforcement learning (RL)** has a wide range of applications, one of which is the control of AVs. It permits decoupling the mathematical modeling of the system dynamics from the design of the control law. In our project, we study the feasibility of deep RL for traffic control in a hypothetical single-lane circular track scenario. Our goal is to make the traffic flow smoother and safer, which we pursue by maximizing our reward function:

$$\text{Total Rewards} = 1.0 \times \text{Desired Speed} + 0.1 \times \text{Time Headways}$$
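
As a rough illustration of how the weighted sum above combines its two terms, the sketch below assumes that each term has already been computed as a scalar score. The function name `total_reward` and its argument names are hypothetical; the actual terms are computed by the Flow environment, not by this snippet.

```python
def total_reward(desired_speed_term, time_headway_term,
                 speed_weight=1.0, headway_weight=0.1):
    """Weighted sum of the two reward terms (weights taken from the formula above)."""
    return speed_weight * desired_speed_term + headway_weight * time_headway_term


# Example with made-up scores for the two terms (illustrative values only)
print(total_reward(0.8, 0.5))
```

With these weights, the desired-speed term dominates, and the time-headway term acts as a smaller correction.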

## Methodology
This section introduces the two deep RL algorithms applied in this project, PPO and DDPG, and compares them.
### PPO
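Proximal Policy Optimization (PPO) is an on-policy policy-gradient algorithm. It maximizes a clipped surrogate objective that limits how far each update can move the new policy away from the policy that collected the data.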
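
For reference, a minimal sketch of the per-sample clipped surrogate objective from the PPO paper is given here. It is illustrative only and is not the RLlib implementation used later in this notebook; the function name `clipped_surrogate` is hypothetical, and `clip_eps=0.2` is an assumed, commonly used value that this notebook does not set.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- new-policy probability divided by old-policy probability
    advantage -- advantage estimate for the sampled action
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + eps) * A
print(clipped_surrogate(1.5, 1.0))
```

The `min` keeps the objective pessimistic: the policy gains nothing from pushing the ratio beyond the clipping range in the direction that the advantage favors.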
### DDPG
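Deep Deterministic Policy Gradient (DDPG) is an off-policy actor-critic algorithm for continuous action spaces. It learns a deterministic policy (the actor) together with a Q-function (the critic), using a replay buffer and target networks.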
### Model Comparison
(Write Comparison here)

## Experiment Detail
This section describes the experimental setup in detail.

In [5]:
from flow.networks import RingNetwork  # ring road network class
from flow.core.params import NetParams, InitialConfig  # network and initial-configuration parameters
from flow.core.params import VehicleParams  # vehicle parameters
from flow.core.params import SumoParams, EnvParams  # simulator and environment parameters
from flow.networks.ring import ADDITIONAL_NET_PARAMS  # network-specific parameters
from flow.controllers import IDMController, ContinuousRouter  # human-driver dynamics and routing
from flow.controllers import RLController  # controller for the RL-driven vehicle
from flow.envs import WaveAttenuationEnv, WaveAttenuationPOEnv  # ring-road environments


In [None]:
network_name = RingNetwork # ring road network class
name = "training_example" # name of the network
net_params = NetParams(additional_params=ADDITIONAL_NET_PARAMS)
initial_config = InitialConfig(spacing="uniform", perturbation=1) # initial configuration of vehicles

#### Adding Trainable Autonomous Vehicles

In [None]:
vehicles = VehicleParams()
vehicles.add("human",
             acceleration_controller=(IDMController, {}),
             routing_controller=(ContinuousRouter, {}),
             num_vehicles=21)

The above addition to the `VehicleParams` object accounts for only 21 of the 22 vehicles placed in the network. We now add one trainable autonomous vehicle whose actions are dictated by an RL agent. This is done by specifying an `RLController` as the vehicle's acceleration controller.

In [None]:
vehicles.add(veh_id="rl",
             acceleration_controller=(RLController, {}),
             routing_controller=(ContinuousRouter, {}),
             num_vehicles=1)

#### Setting up an Environment
##### EnvParams

`EnvParams` specifies environment and experiment-specific parameters that either affect the training process or the dynamics of various components within the network. For the environment `WaveAttenuationPOEnv`, these parameters are used to dictate bounds on the accelerations of the autonomous vehicles, as well as the range of ring lengths (and accordingly network densities) the agent is trained on.
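
To make the link between ring length and network density concrete, the short calculation below assumes the 22 vehicles set up above (21 human-driven plus one RL vehicle) and the ring-length bounds of 220 m and 270 m used in this notebook's `EnvParams`. The helper name `density_veh_per_km` is hypothetical.

```python
NUM_VEHICLES = 22            # 21 human-driven + 1 RL vehicle
RING_LENGTHS_M = (220, 270)  # ring-length bounds used in this notebook


def density_veh_per_km(num_vehicles, ring_length_m):
    """Vehicles per kilometre on a single-lane ring of the given length."""
    return num_vehicles * 1000.0 / ring_length_m


for length in RING_LENGTHS_M:
    print(f"{length} m ring: {density_veh_per_km(NUM_VEHICLES, length):.1f} veh/km")
# 220 m ring: 100.0 veh/km
# 270 m ring: 81.5 veh/km
```

Shorter rings pack the same vehicles more tightly, so training over a range of lengths exposes the agent to a range of densities.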

Finally, it is important to specify here the *horizon* of the experiment, which is the duration of one episode (during which the RL agent acquires data).

In [None]:
sim_params = SumoParams(sim_step=0.1, render=False)
# Define horizon as a variable to ensure consistent use across notebook
HORIZON = 100  # number of steps per rollout; change this to change the dataset size
env_params = EnvParams(
    # length of one rollout
    horizon=HORIZON,
    additional_params={
        # maximum acceleration of autonomous vehicles
        "max_accel": 1,
        # maximum deceleration of autonomous vehicles
        "max_decel": 1,
        # bounds on the ranges of ring road lengths the autonomous vehicle 
        # is trained on
        "ring_length": [220, 270],
    },
)
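
As a quick sanity check on the values above, the snippet below converts the horizon from steps into simulated time. It assumes one environment step per simulation step and ignores any warm-up steps, neither of which is configured in this notebook.

```python
SIM_STEP = 0.1   # seconds of simulated time per step (sim_step above)
HORIZON = 100    # steps per rollout (HORIZON above)

episode_seconds = HORIZON * SIM_STEP
print(f"One rollout covers {episode_seconds:.1f} s of simulated time")
# One rollout covers 10.0 s of simulated time
```

Under these assumptions each rollout is short, so a larger `HORIZON` may be needed for stop-and-go waves to develop within an episode.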

#### Initializing a Gym Environment
For PPO we use the environment `WaveAttenuationPOEnv`, which trains autonomous vehicles to attenuate the formation and propagation of waves in a partially observable, variable-density ring road. For DDPG we use `WaveAttenuationEnv`, as selected in the cell below. To create the Gym environment, the only necessary parameters are the environment name plus the previously defined variables. These are defined as follows:

In [None]:
env_name = WaveAttenuationEnv          # use with the DDPG algorithm
#env_name = WaveAttenuationPOEnv       # use with the PPO algorithm
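
Because the environment has to be switched by hand whenever the algorithm changes, one option is to keep the pairing in a single lookup. The sketch below is a hypothetical helper, not part of Flow; it works with class names as strings so that it runs without Flow installed.

```python
# Hypothetical lookup: algorithm name -> name of the Flow environment class,
# following the pairing given in the comments of the cell above
ENV_FOR_ALGORITHM = {
    "DDPG": "WaveAttenuationEnv",
    "PPO": "WaveAttenuationPOEnv",
}


def env_name_for(alg_run):
    """Return the environment class name paired with the given algorithm."""
    try:
        return ENV_FOR_ALGORITHM[alg_run]
    except KeyError:
        raise ValueError(f"No environment configured for algorithm {alg_run!r}") from None


print(env_name_for("PPO"))  # WaveAttenuationPOEnv
```

In the notebook itself, the returned name would still need to be mapped to the imported class before being passed to `flow_params`.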

In [None]:
flow_params = dict(
    exp_tag=name, # name of the experiment
    env_name=env_name, # name of the flow environment the experiment is running on
    network=network_name, # name of the network class the experiment uses
    simulator='traci', # simulator that is used by the experiment
    sim=sim_params, # simulation-related parameters
    env=env_params, # environment related parameters
    net=net_params, # network-related parameters
    veh=vehicles, # vehicles to be placed in the network at the start of a rollout
    initial=initial_config # initialization/reset
)

## Running RL Experiments
#### Initializing Ray
Here, we initialize Ray and experiment-based constant variables specifying parallelism in the experiment as well as experiment batch size in terms of number of rollouts.

In [None]:
import json
import ray
try:
    from ray.rllib.agents.agent import get_agent_class
except ImportError:
    from ray.rllib.agents.registry import get_agent_class
from ray.tune import run_experiments
from ray.tune.registry import register_env
from flow.utils.registry import make_create_env
from flow.utils.rllib import FlowParamsEncoder

In [None]:
N_CPUS = 2  # number of parallel workers
N_ROLLOUTS = 1  # number of rollouts per training iteration
ray.init(num_cpus=N_CPUS)

#### Configuration and Setup
Here, we copy and modify the default configuration for the [PPO algorithm](https://arxiv.org/abs/1707.06347). The agent is configured with the specified number of parallel workers, a batch size corresponding to `N_ROLLOUTS` rollouts (each of which has length `HORIZON` steps), a discount rate $\gamma$ of 0.999, two hidden layers of size 16, Generalized Advantage Estimation with $\lambda$ of 0.97, and the other parameters set below.
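
The batch-size arithmetic described here can be checked in isolation. The snippet below reuses the values of `HORIZON` and `N_ROLLOUTS` from this notebook and the same expressions that appear in the PPO configuration cell; it does not require Ray.

```python
HORIZON = 100     # steps per rollout
N_ROLLOUTS = 1    # rollouts per training iteration

train_batch_size = HORIZON * N_ROLLOUTS
sgd_minibatch_size = min(16 * 1024, train_batch_size)
print(train_batch_size, sgd_minibatch_size)  # 100 100
```

With a single 100-step rollout per iteration, the minibatch cap of 16 * 1024 never applies, so each SGD pass uses the whole batch.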

Once `config` contains the desired parameters, a JSON string corresponding to the `flow_params` specified above is generated. The `FlowParamsEncoder` maps objects to string representations so that the experiment can be reproduced later. That string representation is stored within the `env_config` section of the `config` dictionary. Later, `config` is written out to the file `params.json`.
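
The idea of mapping objects to string representations can be illustrated with a custom `json.JSONEncoder` from the standard library. This is only a sketch of the general technique, not Flow's `FlowParamsEncoder`; the names `ClassNameEncoder`, `DemoNetwork`, and `DemoParams` are hypothetical stand-ins.

```python
import json


class ClassNameEncoder(json.JSONEncoder):
    """Hypothetical encoder: classes become their names, objects their attributes."""

    def default(self, obj):
        if isinstance(obj, type):
            return obj.__name__      # serialize a class by its name
        if hasattr(obj, "__dict__"):
            return vars(obj)         # serialize an instance by its attributes
        return super().default(obj)  # fall back to the standard error


class DemoNetwork:
    """Stand-in for a network class such as RingNetwork."""


class DemoParams:
    """Stand-in for a parameter object such as EnvParams."""

    def __init__(self, horizon):
        self.horizon = horizon


params = {"network": DemoNetwork, "env": DemoParams(horizon=100)}
encoded = json.dumps(params, cls=ClassNameEncoder, sort_keys=True, indent=4)
print(encoded)
```

Storing names and plain attribute dictionaries keeps the saved configuration human-readable, at the cost of needing a matching decoding step to rebuild the objects.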

Next, we call `make_create_env` and pass in the `flow_params` to return a function we can use to register our Flow environment with Gym.

#### DDPG Algo (Parameter Specification)
All the variables used for DDPG training are defined here:

In [None]:
alg_run = "DDPG"

agent_cls = get_agent_class(alg_run)
config = agent_cls._default_config.copy()
config["num_workers"] = N_CPUS - 1  # number of parallel workers
config["gamma"] = 0.999  # discount rate

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)

#### PPO Algo (Parameter Specification)
All the variables used for PPO training are defined here:

In [None]:
alg_run = "PPO"

agent_cls = get_agent_class(alg_run)
config = agent_cls._default_config.copy()
config["num_workers"] = N_CPUS - 1  # number of parallel workers
config["train_batch_size"] = HORIZON * N_ROLLOUTS  # batch size
config["gamma"] = 0.999  # discount rate
config["model"].update({"fcnet_hiddens": [16, 16]})  # size of hidden layers in network
config["use_gae"] = True  # using generalized advantage estimation
config["lambda"] = 0.97  # GAE lambda parameter
config["sgd_minibatch_size"] = min(16 * 1024, config["train_batch_size"])
config["kl_target"] = 0.02  # target KL divergence
config["num_sgd_iter"] = 10  # number of SGD iterations
config["horizon"] = HORIZON  # rollout horizon

# save the flow params for replay
flow_json = json.dumps(flow_params, cls=FlowParamsEncoder, sort_keys=True,
                       indent=4)  # generating a string version of flow_params
config['env_config']['flow_params'] = flow_json  # adding the flow_params to config dict
config['env_config']['run'] = alg_run

# Call the utility function make_create_env to be able to 
# register the Flow env for this experiment
create_env, gym_name = make_create_env(params=flow_params, version=0)

# Register as rllib env with Gym
register_env(gym_name, create_env)
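
To show what the `use_gae`, `gamma`, and `lambda` settings control, here is a minimal pure-Python sketch of Generalized Advantage Estimation for a single rollout. It is illustrative only and is not RLlib's implementation; the function name `gae_advantages` and the sample rewards and value estimates are made up.

```python
def gae_advantages(rewards, values, gamma=0.999, lam=0.97):
    """Generalized Advantage Estimation for a single rollout.

    `values` holds one more entry than `rewards`: the last entry is the
    value estimate of the state reached after the final step
    (0.0 if the episode ended there).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later TD errors
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


rewards = [1.0, 1.0, 1.0]      # made-up rewards
values = [0.5, 0.5, 0.5, 0.0]  # made-up value estimates
print(gae_advantages(rewards, values))
```

Setting `lam` closer to 1 relies more on observed returns (lower bias, higher variance), while values closer to 0 rely more on the value estimates.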

#### Training
The following cell trains the agent using the most recently defined `alg_run` and `config`.

In [6]:
trials = run_experiments({
    flow_params["exp_tag"]: {
        "run": alg_run,
        "env": gym_name,
        "config": {
            **config
        },
        "checkpoint_freq": 1,  # number of iterations between checkpoints
        "checkpoint_at_end": True,  # generate a checkpoint at the end
        "max_failures": 999,
        "stop": {  # stopping conditions
            "training_iteration": 1,  # number of iterations to stop after
        },
    },
})


### Visualizing the results


## Result Analysis


## Conclusion
