(c) Copyright 2023 Enzo Alexander Cording - https://github.com/EnzoCording - GNU GPL v3.0

This pipeline walks through the full functionality of FleetRL:

1) Creating a custom use-case
    - Updating your data path
    - Changing environment settings if needed
    - Generating your own vehicle schedules
2) Training an RL agent
3) Building benchmark charging strategies
4) Comparing the RL agent to the benchmarks

This code can also be run from a .py file. In that case, wrap the code in a main guard, because SubprocVecEnv starts worker processes:

    if __name__ == "__main__":
        #code here
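
A minimal sketch of that wrapping, assuming the notebook cells are moved into a hypothetical `main()` function (the name and its return value are illustrative only). On platforms that start worker processes with a fresh interpreter, each worker re-imports the script, and the guard keeps the pipeline from being launched again by every worker.

```python
def main() -> str:
    # Hypothetical entry point: the cells of this notebook (settings,
    # environment creation, training, evaluation) would be placed here.
    run_name = "Test_run_custom"
    return run_name


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is
    # imported (for example by a worker process).
    main()
```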

**Importing dependencies**

In [None]:
import datetime as dt
import numpy as np
import math
import matplotlib.pyplot as plt
from typing import Literal
import pandas as pd
import time
import os

from fleetrl.fleet_env.fleet_environment import FleetEnv
from fleetrl.benchmarking.benchmark import Benchmark
from fleetrl.benchmarking.uncontrolled_charging import Uncontrolled
from fleetrl.benchmarking.distributed_charging import DistributedCharging
from fleetrl.benchmarking.night_charging import NightCharging
from fleetrl.benchmarking.linear_optimization import LinearOptimization

from fleetrl.agent_eval.evaluation import Evaluation
from fleetrl.agent_eval.basic_evaluation import BasicEvaluation

from stable_baselines3.common.vec_env import VecNormalize, SubprocVecEnv
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, ProgressBarCallback, BaseCallback
from stable_baselines3.common.logger import HParam

from pink import PinkActionNoise
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise, NormalActionNoise

**Creating a custom use-case**
Go through this step by step and check the documentation if needed. The docs specify what type of input data is required, what format it should be in, etc.
The code below is commented to provide the most essential information.

Docs: fleetrl.readthedocs.io

**General settings**
Under general settings, you can adjust how many vehicles to optimize for, whether to create new schedules, how long the episodes should be, etc.
A pre-trained agent is available for the 1-EV environments, so you can try it out first: benchmark its performance and evaluate your environment before training your own agents. Once you are sure everything is set up correctly, increase the number of EVs and train your own agents.

In [None]:
# define fundamental parameters
# data path to inputs folder with schedule csv, prices, load, pv, etc.
input_data_path: str = "inputs"  # NOTE: either use "/" (linux) or "\\" (windows)
run_name: str = "Test_run_custom"  # Change this name or make it dynamic, e.g. with a timestamp
n_train_steps = 48  # number of hours in a training episode
n_eval_steps = 48  # number of hours in one evaluation episode
n_eval_episodes = 1  # number of episodes for evaluation
n_evs = 1  # number of evs
n_envs = 2  # number of envs parallel - has to be equal to 1, if train_freq = (1, episode) or default setting
time_steps_per_hour = 4  # temporal resolution
use_case: str = "custom"  # for file name - lmd=last mile delivery, by default can insert "lmd", "ct", "ut", "custom"
custom_schedule_name = "1_lkw.csv"  # name for custom schedule if you have generated one. If you want to generate one this time, this field will be ignored
scenario: Literal["arb", "tariff"] = "tariff"  # arbitrage or tariff. Arbitrage allows for bidirectional spot trading, no fees. Tariff models commercial tariff
gen_new_schedule = True  # generate a new schedule - refer to schedule generator documentation and adjust statistics in config.json
gen_new_test_schedule = True  # generate a new schedule for agent testing

real_time = False  # Experimental - leave False for now but in the future FleetRL will be able to handle real-time data with arbitrary time resolution
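
The settings above can be turned into a few derived values. The following is a small sketch, not part of FleetRL: the timestamped run name and the use of `pathlib` are illustrative choices, and whether FleetRL accepts a `Path` object for its data path is not verified here (convert with `str()` if needed).

```python
import time
from pathlib import Path

n_train_steps = 48        # hours per training episode (as in the cell above)
time_steps_per_hour = 4   # temporal resolution (as in the cell above)

# An episode of 48 hours at 15-minute resolution has 192 environment steps
steps_per_episode = n_train_steps * time_steps_per_hour

# Hypothetical dynamic run name: base name plus a Unix timestamp
run_name = f"Test_run_custom_{int(time.time())}"

# pathlib inserts the correct separator for the operating system,
# which avoids hard-coding "/" or "\\"
schedule_path = Path("inputs") / "1_lkw.csv"

print(steps_per_episode, run_name, schedule_path)
```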

**Training settings**
These lower-level settings change training-related parameters. Refer to the documentation of FleetRL and stable-baselines3 for further details. By default, observations are normalized by SB3, which uses rolling (running-average) normalization. Alternatively, FleetRL can apply absolute (min-max) normalization.

Adapt total training steps and saving interval for a full run.

In [None]:
# training parameters
norm_obs_in_env = False  # normalize observations within FleetRL (max, min normalization)
vec_norm_obs = True  # normalize observations in SB3 (rolling normalization)
vec_norm_rew = True  # normalize rewards in SB3 (rolling normalization)

# Total steps should be set to 1e6 or 5e6 for a full run. Check TensorBoard for a stagnating reward signal and stop training at that point to avoid overfitting
total_steps = int(1e3)  # total training time steps

# Specifies how often an intermediate artifact is saved. For a full run, every 50k - 100k steps is recommended, so you can backtrack to the best model
saving_interval = int(5e2)  # interval for saving the model (an int, since it is passed to model.learn as a step count)
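
The difference between the two normalization styles can be illustrated with NumPy. This is a simplified sketch and not the SB3 or FleetRL implementation: `RunningNorm` and `min_max_normalize` are hypothetical names, and the price data is synthetic. Rolling normalization estimates mean and variance from the data seen so far, whereas min-max normalization needs bounds that are known in advance.

```python
import numpy as np


class RunningNorm:
    """Simplified running mean/variance normalizer (illustrative only)."""

    def __init__(self, shape, epsilon=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 0
        self.epsilon = epsilon

    def update(self, batch):
        # Merge the statistics of a new batch into the running statistics
        batch = np.asarray(batch, dtype=float)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        total = self.count + b_count
        delta = b_mean - self.mean
        new_mean = self.mean + delta * b_count / total
        m2 = self.var * self.count + b_var * b_count + delta**2 * self.count * b_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs):
        # Zero mean and unit variance with respect to the data seen so far
        return (obs - self.mean) / np.sqrt(self.var + self.epsilon)


def min_max_normalize(obs, low, high):
    # Absolute normalization to [0, 1] with fixed, known bounds
    return (obs - low) / (high - low)


rng = np.random.default_rng(42)
prices = rng.uniform(0.0, 100.0, size=(1000, 1))  # synthetic observations

norm = RunningNorm(shape=(1,))
for chunk in np.array_split(prices, 10):
    norm.update(chunk)  # statistics are refined chunk by chunk

print(norm.normalize(prices[:3]))
print(min_max_normalize(prices[:3], 0.0, 100.0))
```

Because the running statistics change during training, they have to be stored together with the agent, otherwise a loaded agent would see observations on a different scale.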

**Parameters for environment object creation**
Further settings can be adjusted below; see the comments and docs for more detailed explanations.
Most important:
- Episode length: How long an episode is in hours - at least 36 hours are recommended so the agent always sees one passage of a night for night charging
- include_building, include_pv, include_price: These adjust the shape of the observations to make the problem simpler or more complex
- price_lookahead, bl_pv_lookahead: These dictate how much knowledge into the future the agent has on price, building load and PV in hours
- Time picker: Use random during training: This way, a new episode always starts at a random point in the dataframe
- Deg_emp: For simple degradation, set to True
- Ignore_x_reward: Set accordingly with Include_x... or deactivate certain parts of the reward function to adjust problem complexity
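
To make the episode length, lookahead and time picker settings more concrete, here is a rough sketch on synthetic data. It is an illustration under assumptions, not FleetRL's logic: `pick_start` is a hypothetical helper, and the sketch assumes one price value per time step, whereas FleetRL may choose start points and present lookahead values differently.

```python
import numpy as np

time_steps_per_hour = 4
episode_length = 48   # hours
price_lookahead = 8   # hours

rng = np.random.default_rng(42)
# Synthetic price series for one year at 15-minute resolution
prices = rng.uniform(20.0, 120.0, size=365 * 24 * time_steps_per_hour)

episode_steps = episode_length * time_steps_per_hour
lookahead_steps = price_lookahead * time_steps_per_hour


def pick_start(mode: str) -> int:
    """Return the first time step of an episode (hypothetical helper)."""
    if mode == "static":
        return 0  # always start at the same point
    # "random": any start that leaves room for the episode plus the lookahead
    last_valid = len(prices) - episode_steps - lookahead_steps
    return int(rng.integers(0, last_valid + 1))


start = pick_start("random")
t = start + 10  # some step inside the episode
visible_prices = prices[t:t + lookahead_steps]  # future prices visible at step t

print(episode_steps, lookahead_steps, len(visible_prices))
```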

In [None]:
# environment arguments - adjust settings if necessary
# additional settings can be changed in the config files
env_config = {"data_path": input_data_path,
              # Specify file names: there is a naming convention for default files, otherwise, custom name is used
              "schedule_name": (str(n_evs) + "_" + str(use_case) + ".csv") if use_case != "custom" else custom_schedule_name,
              "building_name": "load_" + str(use_case) + ".csv" if use_case != "custom" else "load_lmd.csv",
              "pv_name": None,  # if separate file for PV inputs, specify here, otherwise, uses "PV" column in building_name
              # Define use case
              "use_case": use_case,
              # Change observation space
              "include_building": True,  # False removes building load from Observation
              "include_pv": True,  # False removes PV from Observation
              "include_price": True,  # False removes electricity prices from Observation
              "price_lookahead": 8,  # Hours seen into the future
              "bl_pv_lookahead": 4,  # Hours seen into the future
              "time_steps_per_hour": 4,  # Time resolution
              # Specify time picker: "eval", "static", or "random" are implemented
              "time_picker": "random",  # Pick a random starting day in the schedule dataframe
              # Pick degradation methodology: True selects the empirical degradation model (derived from experimental degradation data)
              "deg_emp": False,  # empirical degradation calculation
              # Shape reward function
              "ignore_price_reward": False,  # True sets price-related reward coefficient to 0
              "ignore_invalid_penalty": False,  # True ignores penalties on invalid actions (charging an empty spot)
              "ignore_overcharging_penalty": False,  # True ignores penalties on charging signals above target SOC
              "ignore_overloading_penalty": False,  # True ignores grid connection overloading penalty
              # Set episode length during training
              "episode_length": n_train_steps,  # in hours
              # Additional parameters
              "normalize_in_env": norm_obs_in_env,  # Conduct normalization within FleetRL.
              "verbose": 0,  # Print statements, can slow down FPS
              "aux": True,  # Include auxiliary data (recommended). Check documentation for more information.
              "log_data": False,  # Log data (Makes most sense for evaluation runs)
              "calculate_degradation": True,  # Calculate SOH degradation (Can slow down FPS)
              # Target SOC
              "target_soc": 0.85,  # Signals that would charge above target SOC are clipped.
              # settings regarding the generation of evs
              "gen_schedule": gen_new_schedule,  # generate a new schedule
              "gen_start_date": "2021-01-01 00:00",  # if new schedule, start date
              "gen_end_date": "2021-12-31 23:59:59",  # if new schedule, end date
              "gen_name": "my_custom_schedule.csv",  # name of newly generated schedule
              "gen_n_evs": 1,  # number of EVs in new schedule, per EV it takes ca. 10-20 min.
              # seed for random number generation
              "seed": 42,  # Seed for RNG - can be set to None so always random
              # flag to optionally use real-time functions in FleetRL: no resampling of data, taking it in as is and
              # using the event manager module to decide whether to perform an update or not
              "real_time": real_time,
              # if you are comparing cars with different bess sizes, use this to norm their reward function range
              "max_batt_cap_in_all_use_cases": 600,
              "init_battery_cap": 600,
              # initial state of health of the battery
              "init_soh": 1.0,
              "min_laxity": 1.75,
              "obc_max_power": 250,
              # custom schedule timing settings, mean and standard deviation
              "custom_weekday_departure_time_mean": 7,
              "custom_weekday_departure_time_std": 1,
              "custom_weekday_return_time_mean": 19,
              "custom_weekday_return_time_std": 1,
              "custom_weekend_departure_time_mean": 9,
              "custom_weekend_departure_time_std": 1.5,
              "custom_weekend_return_time_mean": 17,
              "custom_weekend_return_time_std": 1.5,
              "custom_earliest_hour_of_departure": 3,
              "custom_latest_hour_of_departure": 11,
              "custom_earliest_hour_of_return": 12,
              "custom_latest_hour_of_return": 23,
              # custom distance settings
              "custom_weekday_distance_mean": 300,
              "custom_weekday_distance_std": 25,
              "custom_weekend_distance_mean": 150,
              "custom_weekend_distance_std": 25,
              "custom_minimum_distance": 20,
              "custom_max_distance": 400,
              # custom consumption data for vehicle
              "custom_consumption_mean": 1.3,
              "custom_consumption_std": 0.167463672468669,
              "custom_minimum_consumption": 0.3,
              "custom_maximum_consumption": 2.5,
              "custom_maximum_consumption_per_trip": 500,
              # custom ev-related settings
              "custom_ev_charger_power_in_kw": 120,
              "custom_ev_battery_size_in_kwh": 600,
              "custom_grid_connection_in_kw": 500
              }

# commercial tariff scenario: fixed fee on top of the spot price (+10 ct/kWh) and a 50% mark-up
# the feed-in tariff is based on the PV feed-in tariff, with a 25% deduction
if scenario == "tariff":
    env_config["spot_markup"] = 10
    env_config["spot_mul"] = 1.5
    env_config["feed_in_ded"] = 0.25
    env_config["price_name"] = "spot_2021_new.csv"
    env_config["tariff_name"] = "fixed_feed_in.csv"

# arbitrage scenario, up and down prices are spot price, no markups or taxes
elif scenario == "arb":
    env_config["spot_markup"] = 0
    env_config["spot_mul"] = 1
    env_config["feed_in_ded"] = 0
    env_config["price_name"] = "spot_2021_new.csv"
    env_config["tariff_name"] = "spot_2021_new_tariff.csv"
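
The effect of the three price parameters can be sketched as follows. This is an assumption about how the values combine, not FleetRL's confirmed formula: the sketch adds the markup first and then applies the multiplier, and it treats the deduction as a fraction of the feed-in tariff. The function names and the example prices are hypothetical, and all prices are taken to be in the same unit.

```python
def buy_price(spot: float, spot_markup: float, spot_mul: float) -> float:
    # ASSUMPTION: the fixed markup is added first, then the multiplier is applied
    return (spot + spot_markup) * spot_mul


def feed_in_price(tariff: float, feed_in_ded: float) -> float:
    # ASSUMPTION: the deduction is a fraction of the feed-in tariff
    return tariff * (1 - feed_in_ded)


# "tariff" scenario: markup of 10 and multiplier of 1.5, 25% feed-in deduction
print(buy_price(50.0, spot_markup=10, spot_mul=1.5))   # 90.0
print(feed_in_price(80.0, feed_in_ded=0.25))           # 60.0

# "arb" scenario: buying and selling at the unchanged spot price
print(buy_price(50.0, spot_markup=0, spot_mul=1))      # 50.0
print(feed_in_price(50.0, feed_in_ded=0))              # 50.0
```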

**Environment object creation**
Vectorized environments (VecEnv) are created to enable multi-processing.

- train_vec_env: For agent training
- eval_vec_env: For agent evaluation during training on the same csv file (70% training data, 30% evaluation data)

In [None]:
env_kwargs = {"env_config": env_config}

train_vec_env = make_vec_env(FleetEnv,
                             n_envs=n_envs,
                             vec_env_cls=SubprocVecEnv,
                             env_kwargs=env_kwargs,
                             seed=env_config["seed"])

train_norm_vec_env = VecNormalize(venv=train_vec_env,
                                  norm_obs=vec_norm_obs,
                                  norm_reward=vec_norm_rew,
                                  training=True,
                                  clip_reward=10.0)

env_config["time_picker"] = "eval"

if gen_new_schedule:
    env_config["gen_schedule"] = False
    env_config["schedule_name"] = env_config["gen_name"]

env_kwargs = {"env_config": env_config}

eval_vec_env = make_vec_env(FleetEnv,
                            n_envs=n_envs,
                            vec_env_cls=SubprocVecEnv,
                            env_kwargs=env_kwargs,
                            seed=env_config["seed"])

eval_norm_vec_env = VecNormalize(venv=eval_vec_env,
                                 norm_obs=vec_norm_obs,
                                 norm_reward=vec_norm_rew,
                                 training=True,
                                 clip_reward=10.0)

If you want to conduct simple analytics on a FleetEnv object, it is recommended to create it directly instead of via make_vec_env (which makes debugging more difficult due to sub-processes).

FleetEnv objects can be created with a one-liner by specifying where the config file is located.

In [None]:
# As an alternative, FleetEnv objects can be created via config.json files.
test_env = FleetEnv("config.json")
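
A config file can be produced from a settings dictionary with the standard `json` module. The sketch below assumes that the file holds the same keys as `env_config`; the exact schema expected by FleetRL should be checked in the docs. Only a small subset of keys is shown, and the file is written to a temporary folder for illustration.

```python
import json
import tempfile
from pathlib import Path

# A small subset of the settings used above, for illustration only
env_config = {
    "data_path": "inputs",
    "use_case": "custom",
    "episode_length": 48,
    "time_picker": "random",
    "seed": 42,
}

# Write the settings to a JSON file in a temporary folder
config_path = Path(tempfile.mkdtemp()) / "config.json"
with open(config_path, "w") as f:
    json.dump(env_config, f, indent=4)

# Reading the file back returns the same dictionary
with open(config_path) as f:
    loaded = json.load(f)

print(loaded)
```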

This creates a schedule for testing the trained agents on unseen data.
It is recommended to generate a testing schedule along with a newly generated training schedule.

In [None]:
if gen_new_test_schedule:
    # generate an evaluation schedule
    test_sched_name = env_config["gen_name"]
    # drop the ".csv" extension if present; str.strip(".csv") would instead
    # strip any of the characters ".", "c", "s", "v" from both ends
    if test_sched_name.endswith(".csv"):
        test_sched_name = test_sched_name[:-len(".csv")]
    test_sched_name = test_sched_name + "_test" + ".csv"

    env_config["gen_schedule"] = True
    env_config["gen_name"] = test_sched_name

    env_kwargs = {"env_config": env_config}

    test_vec_env = make_vec_env(FleetEnv,
                                n_envs=1,
                                vec_env_cls=SubprocVecEnv,
                                env_kwargs=env_kwargs,
                                seed=env_config["seed"])

    env_config["gen_schedule"] = False
    env_config["schedule_name"] = test_sched_name

    env_kwargs = {"env_config": env_config}

test_vec_env = make_vec_env(FleetEnv,
                            n_envs=n_envs,
                            vec_env_cls=SubprocVecEnv,
                            env_kwargs=env_kwargs,
                            seed=env_config["seed"])

test_norm_vec_env = VecNormalize(venv=test_vec_env,
                                 norm_obs=vec_norm_obs,
                                 norm_reward=vec_norm_rew,
                                 training=True,
                                 clip_reward=10.0)
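
The naming rule for the test schedule (insert `_test` before the `.csv` extension) can also be expressed with `pathlib`. `make_test_schedule_name` is a hypothetical helper for illustration and not part of FleetRL.

```python
from pathlib import Path


def make_test_schedule_name(name: str) -> str:
    """Hypothetical helper: 'foo.csv' or 'foo' becomes 'foo_test.csv'."""
    path = Path(name)
    # Path.stem drops the extension; fall back to the full name if there is none
    stem = path.stem if path.suffix == ".csv" else path.name
    return f"{stem}_test.csv"


print(make_test_schedule_name("my_custom_schedule.csv"))
print(make_test_schedule_name("my_custom_schedule"))

# Note: str.strip(".csv") removes characters, not a suffix,
# so "vans.csv".strip(".csv") evaluates to "an"
print("vans.csv".strip(".csv"))
```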

Callbacks are called regularly during training and enable useful functionality such as logging or progress reporting. See the SB3 docs for further information. Note that wandb callbacks are also possible with SB3.

- EvalCallback triggers an evaluation at fixed intervals
- HyperParamCallback logs hyperparameters, which are also visible in TensorBoard
- ProgressBarCallback indicates training progress

In [None]:
eval_callback = EvalCallback(eval_env=eval_norm_vec_env,
                             warn=True,
                             verbose=1,
                             deterministic=True,
                             eval_freq=max(10000 // n_envs, 1),
                             n_eval_episodes=5,
                             render=False,
                             )

class HyperParamCallback(BaseCallback):
    """
    Saves hyperparameters and metrics at start of training, logging to tensorboard
    """

    def _on_training_start(self) -> None:
        hparam_dict = {
            "algorithm": self.model.__class__.__name__,
            "learning rate": self.model.learning_rate,
            "gamma": self.model.gamma,
        }

        metric_dict = {
            "rollout/ep_len_mean": 0,
            "train/value_loss": 0.0,
        }

        self.logger.record(
            "hparams",
            HParam(hparam_dict, metric_dict),
            exclude=("stdout", "log", "json", "csv")
        )

    def _on_step(self) -> bool:
        return True

progress_bar = ProgressBarCallback()

## wandb callback possible, check documentation of SB3 and wandb

In [None]:
hyperparameter_callback = HyperParamCallback()
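
The `eval_freq` expression divides by `n_envs` because the callback is invoked once per step of the vectorized environment, and each such step advances all `n_envs` environments. The arithmetic below is a rough estimate with the values from this notebook; PPO always collects complete rollouts, so the real number of environment steps can exceed `total_steps`.

```python
n_envs = 2
total_steps = int(1e3)

# Number of callback calls between two evaluations
eval_freq = max(10000 // n_envs, 1)

# Each callback call corresponds to n_envs environment steps
timesteps_between_evals = eval_freq * n_envs

# Rough number of evaluations that fit into the configured training budget
n_evaluations = total_steps // timesteps_between_evals

print(eval_freq, timesteps_between_evals, n_evaluations)
```

With the short demo budget of 1e3 steps, this estimate gives no evaluation at all, so evaluation results only show up in TensorBoard for longer runs or a smaller evaluation interval.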

If you use TD3, pink action noise is reported to improve performance.
If you use PPO, the action noise defined below is not used.

In [None]:
# model-related settings
n_actions = train_norm_vec_env.action_space.shape[-1]
param_noise = None
noise_scale = 0.1
seq_len = n_train_steps * time_steps_per_hour
action_noise = PinkActionNoise(noise_scale, seq_len, n_actions)
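
For intuition, pink noise is noise whose power falls off roughly as 1/f, so consecutive samples are correlated over time. The NumPy sketch below generates such a sequence by shaping white noise in the frequency domain. It is an illustration only and not the implementation used by the `pink` package; `pink_noise` is a hypothetical function.

```python
import numpy as np


def pink_noise(seq_len: int, rng: np.random.Generator) -> np.ndarray:
    """Return seq_len samples of approximately 1/f noise with unit variance."""
    white = rng.standard_normal(seq_len)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(seq_len)
    scale = np.zeros_like(freqs)          # zero at f = 0 removes the constant offset
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # power ~ 1/f means amplitude ~ 1/sqrt(f)
    shaped = np.fft.irfft(spectrum * scale, n=seq_len)
    return shaped / shaped.std()


n_train_steps, time_steps_per_hour = 48, 4
seq_len = n_train_steps * time_steps_per_hour  # one noise sequence per episode
noise_scale = 0.1

noise = noise_scale * pink_noise(seq_len, np.random.default_rng(42))
print(noise.shape, noise.std())
```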

You can choose to create your own agent or load an existing one. A pretrained PPO agent exists for a 1-EV environment.

In [None]:
model = PPO.load(path="./rl_agents/trained_agents/LMD_arbitrage_1e6_steps_example_agent/PPO-fleet_LMD_2021_arbitrage_PPO_mul3.zip", env=train_norm_vec_env)

If you want to use a loaded agent, you can skip the following cells and continue with the agent evaluation section.

In [None]:
model = PPO(policy="MlpPolicy",
            verbose=1,  # setting verbose to 0 can improve performance in JupyterLab environments
            env=train_norm_vec_env,
            tensorboard_log="./rl_agents/trained_agents/tb_log")

# optional hyperparameters that might improve performance; to use them, pass them as arguments in the PPO(...) call above
            # gamma=0.99,
            # learning_rate=0.0005,
            # batch_size=128,
            # n_epochs=8,
            # gae_lambda=0.9,
            # clip_range=0.2,
            # clip_range_vf=None,
            # normalize_advantage=True,
            # ent_coef=0.0008,
            # vf_coef=0.5,
            # max_grad_norm=0.5,
            # n_steps=2048)

In [None]:
# NOTE: make the Notebook trusted
%reload_ext tensorboard
%tensorboard --logdir ./rl_agents/trained_agents/tb_log --port 6006 --bind_all

In [None]:
comment = run_name
time_now = int(time.time())
trained_agents_dir = f"./rl_agents/trained_agents/vec_PPO_{time_now}_{run_name}"
logs_dir = f"{trained_agents_dir}/logs/"

if not os.path.exists(trained_agents_dir):
    os.makedirs(trained_agents_dir)

if not os.path.exists(logs_dir):
    os.makedirs(logs_dir)

In [None]:
# model training
# after every saving interval, the model is saved under a unique name that contains the number of steps trained so far
# in addition, the latest model and its normalization statistics are saved to a tmp folder, overwriting the previous files
for i in range(0, int(total_steps / saving_interval)):
    model.learn(total_timesteps=int(saving_interval),  # SB3 expects an integer number of steps
                reset_num_timesteps=False,
                tb_log_name=f"PPO_{time_now}_{comment}",
                callback=[eval_callback, hyperparameter_callback, progress_bar])

    model.save(f"{trained_agents_dir}/{int(saving_interval * (i + 1))}")

    # Don't forget to save the VecNormalize statistics when saving the agent
    tmp_dir = f"{trained_agents_dir}/tmp/"
    model_path = tmp_dir + f"PPO-fleet_{comment}_{time_now}"
    model.save(model_path)
    stats_path = os.path.join(tmp_dir, f"vec_normalize-{comment}_{time_now}.pkl")
    train_norm_vec_env.save(stats_path)
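
For a full run, it can help to check the checkpoint schedule before starting. The sketch below uses example values in the range recommended above (1e6 total steps, a checkpoint every 50k steps) and lists the cumulative number of requested steps after each pass of the loop. It only covers the arithmetic and does not call SB3.

```python
total_steps = int(1e6)      # example value for a full run
saving_interval = int(5e4)  # example value: one checkpoint every 50k steps

# Number of passes through the training loop
n_passes = total_steps // saving_interval

# Cumulative number of requested training steps after each pass
cumulative_steps = [saving_interval * (i + 1) for i in range(n_passes)]

# Steps that would be left untrained if total_steps is not a multiple of the interval
remainder = total_steps - n_passes * saving_interval

print(n_passes, cumulative_steps[0], cumulative_steps[-1], remainder)
```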

**Agent evaluation**
- Sets the time_picker to static, so an evaluation always starts at the same point in time
- Usually at the beginning of the year and then runs for a long time, e.g. a whole year to test the agent's ability to handle large amounts of unseen data
- Make sure that the environment passed to the trained agent includes the unseen schedule
- For that you should use test_norm_vec_env with the _test.csv schedule

- Data is logged to be used for data analytics later. Every datapoint during the evaluation is tracked at every timestep

In [None]:
# environment arguments for evaluation
env_config["time_picker"] = "static"  # Always start the evaluation at the same point in time
env_config["log_data"] = True  # Log data (Makes most sense for evaluation runs)

env_kwargs = {"env_config": env_config}

Creation of Evaluation object

In [None]:
eval: Evaluation = BasicEvaluation(n_steps=n_eval_steps,
                                   n_evs=n_evs,
                                   n_episodes=n_eval_episodes,
                                   n_envs=1)

- Use this if you are using a loaded agent. Otherwise, skip this cell
- If you have just trained an agent, stats_path and model_path have been automatically defined

In [None]:
stats_path = "./rl_agents/trained_agents/LMD_arbitrage_1e6_steps_example_agent/vec_normalize-LMD_2021_arbitrage_PPO_mul3.pkl"
model_path = "./rl_agents/trained_agents/LMD_arbitrage_1e6_steps_example_agent/PPO-fleet_LMD_2021_arbitrage_PPO_mul3.zip"

Saves the log from agent evaluation

In [None]:
rl_log = eval.evaluate_agent(env_kwargs=env_kwargs, norm_stats_path=stats_path, model_path=model_path, seed=env_config["seed"])

The cells below define benchmarks and run them, saving the logs in the same way as RL evaluation
- Uncontrolled charging: Plug in on arrival
- LP: Linear-programming model - requires an LP solver (on Windows you might run into trouble when doing pip install glpk or similar due to C++ or MSVC. In that case you can try using a Conda environment for the repo and it should work.)
  - Gurobi requires a license to be specified
- Dist: Distributed charging, charges such that the car is completely filled up the moment before it leaves, charging the whole time with evenly distributed power
- Night: Charging at night only at set times, e.g. from 1am to 5am
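
The three rule-based strategies can be illustrated for a single vehicle and a single time step. These functions are simplified sketches based on the descriptions above, not FleetRL's benchmark implementations. All function names, parameters and example values are hypothetical, and the sketch charges up to a target SOC.

```python
def uncontrolled_power(soc, target_soc, capacity_kwh, charger_kw, dt_h):
    """Charge at full charger power until the target SOC is reached."""
    energy_needed = max(target_soc - soc, 0.0) * capacity_kwh
    return min(charger_kw, energy_needed / dt_h)


def distributed_power(soc, target_soc, capacity_kwh, charger_kw, hours_until_departure):
    """Spread the required energy evenly over the time left until departure."""
    if hours_until_departure <= 0:
        return 0.0
    energy_needed = max(target_soc - soc, 0.0) * capacity_kwh
    return min(charger_kw, energy_needed / hours_until_departure)


def night_power(hour_of_day, soc, target_soc, charger_kw, start_hour=1, end_hour=5):
    """Charge at full power, but only inside the night window."""
    if soc < target_soc and start_hour <= hour_of_day < end_hour:
        return float(charger_kw)
    return 0.0


# Example: 600 kWh battery at 50% SOC, 85% target, 120 kW charger, 15-minute steps
print(uncontrolled_power(0.5, 0.85, 600, 120, dt_h=0.25))
print(distributed_power(0.5, 0.85, 600, 120, hours_until_departure=10))
print(night_power(2, 0.5, 0.85, 120), night_power(12, 0.5, 0.85, 120))
```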

In [None]:
uncontrolled_charging: Benchmark = Uncontrolled(n_steps=n_eval_steps,
                                                n_evs=n_evs,
                                                n_episodes=n_eval_episodes,
                                                n_envs=1,
                                                time_steps_per_hour=time_steps_per_hour)

uc_log = uncontrolled_charging.run_benchmark(env_kwargs=env_kwargs, use_case=use_case, seed=env_config["seed"])

To try out linear optimisation, glpk must be installed. Alternatively, you can use your Gurobi license: swap out "glpk" for "gurobi" in line 224 of linear_optimization.py.

In [None]:
lp: Benchmark = LinearOptimization(n_steps=n_eval_steps,
                                   n_evs=n_evs,
                                   n_episodes=n_eval_episodes,
                                   n_envs=1,
                                   time_steps_per_hour=time_steps_per_hour)

lp_log = lp.run_benchmark(env_kwargs=env_kwargs, use_case=use_case, seed=env_config["seed"])

In [None]:
dist: Benchmark = DistributedCharging(n_steps=n_eval_steps, n_evs=n_evs, n_episodes=n_eval_episodes, n_envs=1, time_steps_per_hour=time_steps_per_hour)

dist_log = dist.run_benchmark(env_kwargs=env_kwargs, use_case=use_case, seed=env_config["seed"])

In [None]:
night: Benchmark = NightCharging(n_steps=n_eval_steps, n_evs=n_evs, n_episodes=n_eval_episodes, n_envs=1, time_steps_per_hour=time_steps_per_hour)

night_log = night.run_benchmark(env_kwargs=env_kwargs, use_case=use_case, seed=env_config["seed"])

You can plot the average actions of the benchmarks by providing the log dataframes.

In [None]:
lp.plot_benchmark(lp_log)

In [None]:
uncontrolled_charging.plot_benchmark(uc_log)

This showcases some evaluation methods that have already been written for easy comparison.
You can, of course, create new analytics from the log dataframes.

In [None]:
eval.compare(rl_log=rl_log, benchmark_log=uc_log)
eval.compare(rl_log=rl_log, benchmark_log=lp_log)
eval.plot_soh(rl_log=rl_log, benchmark_log=uc_log)
eval.plot_soh(rl_log=rl_log, benchmark_log=dist_log)
eval.plot_soh(rl_log=rl_log, benchmark_log=night_log)
eval.plot_violations(rl_log=rl_log, benchmark_log=uc_log)
eval.plot_action_dist(rl_log=rl_log, benchmark_log=uc_log)

The detailed plot shows what is happening during certain days of the simulation. You can add as many benchmark logs as you have compiled; the figure adjusts automatically based on the number of EVs, number of logs, etc.

In [None]:
detailed_fig = eval.plot_detailed_actions(start_date='2021-01-01 00:00',
                                          end_date='2021-01-04 00:10',
                                          rl_log=rl_log,
                                          uc_log=uc_log,
                                          dist_log=dist_log,
                                          night_log=night_log)

detailed_fig.show()