# Simulations with reinforcement learning

In this notebook you can run simulations with six different reinforcement learning algorithms on several example buildings.

<br>

Things to focus on:
-  Why does a run create more sub_runs than episodes?
-  Only the sub-files of 10 episodes are kept; the rest are deleted. Why?
-  Store the mean rewards in a list, so they can be plotted after the run.
-  The room currently seems to be kept at temperature all the time. How can this be checked? Should the heating be off at weekends and on only during working hours?
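
For the mean rewards item, one option is to read them back from the progress CSV that the logger writes. The snippet below is a minimal sketch: it assumes the CSV has one row per episode and a column called `mean_reward`, which should be checked against the actual file. `mean_rewards_from_csv` is a hypothetical helper, not part of Sinergym.

```python
import csv


def mean_rewards_from_csv(lines, column="mean_reward"):
    """Return the values of `column` as a list of floats.

    `lines` can be an open file object or any iterable of CSV text lines.
    The column name "mean_reward" is an assumption; adjust it to the header
    that the progress file really uses.
    """
    reader = csv.DictReader(lines)
    return [float(row[column]) for row in reader]


# Small in-memory example with made-up numbers, only to show the call.
example_lines = ["episode_num,mean_reward", "1,-0.5", "2,-0.25"]
print(mean_rewards_from_csv(example_lines))
```

With a real run, the same function can be called on an open file, for example `with open(path_to_progress_csv, newline="") as f: rewards = mean_rewards_from_csv(f)`, where `path_to_progress_csv` is a placeholder for the file location in the experiment folder.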





In [49]:
import sinergym
from sinergym.utils.callbacks import LoggerEvalCallback
from sinergym.utils.rewards import ExpReward
from sinergym.utils.wrappers import LoggerWrapper
from datetime import datetime
import gym
from stable_baselines3 import DQN, DDPG, PPO, A2C, SAC, TD3 

from stable_baselines3.common.callbacks import CallbackList
from stable_baselines3.common.vec_env import DummyVecEnv
import numpy as np


Next you can set the variables for the simulation, including the period that you want to simulate.

(For now the year is always 1991. This seems to be related to the current weather file, which is the default in the environment. It can be changed later on.)

<br>

The reward function is also set here. It is currently set to the exponential reward (`ExpReward`).





In [50]:
#environment = "Eplus-demo-v1"
environment  = "Eplus-5Zone-hot-continuous-v1"
#environment = "Eplus-5Zone-mixed-continuous-v1"
weather = "USA_NY_New.York-J.F.Kennedy.Intl.AP.744860_TMY3.epw"



episodes = 25
experiment_date = datetime.today().strftime('%Y-%m-%d %H:%M')

#choose the simulation period
begin_day = 1
begin_month = 1
begin_year = 2022
end_day = 1
end_month = 2
end_year = 2022


# register run name
name = f"{environment}-episodes_{episodes}({experiment_date})"

# Set to one month only to reduce running time
extra_params={'timesteps_per_hour' : 4,
              'runperiod' : (begin_day,begin_month,begin_year,end_day,end_month,end_year)}

#env = gym.make(environment, config_params=extra_params)



env = gym.make(environment,
               weather_file=weather,
               reward=ExpReward,
               config_params=extra_params)
# Optional reward settings (currently disabled):
# reward_kwargs={
#     'temperature_variable': 'Zone Air Temperature (SPACE1-1)',
#     'energy_variable': 'Facility Total HVAC Electricity Demand Rate(Whole Building)',
#     'range_comfort_winter': (20.0, 23.5),
#     'range_comfort_summer': (22.0, 26.0),
#     'energy_weight': 0.5}



[2023-01-05 09:46:56,682] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf ExternalInterface object if it is not present...
[2023-01-05 09:46:56,708] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Updating idf Site:Location and SizingPeriod:DesignDay(s) to w

  logger.warn(


We can also add a wrapper to the environment. Here we use a logger (an extension of ``gym.Wrapper``) to monitor and log the interactions with the environment and save the data to a CSV file.

<br>

TODO: the output of the monitor file still needs to change. It should contain the temperatures of all thermal zones; at the moment only space1-1 is shown.

<br>


Still to check how to do this.

In [51]:
env = LoggerWrapper(env)


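At this point the environment is set up and ready to be used by a learning model. This run uses PPO, but any of the algorithms below can be used (have a look at `DRL_battery.py` and read the *Deep Reinforcement Learning Integration* section of the Sinergym documentation for more detailed information on available DRL algorithms). Note that DQN only supports discrete action spaces, while DDPG, SAC and TD3 only support continuous ones, so the algorithm has to match the chosen environment.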

You can choose the following algorithms:
- DQN
- DDPG
- A2C
- PPO
- SAC
- TD3


In [52]:
#model = DQN('MlpPolicy', env, verbose=1)


In [53]:
model = PPO('MlpPolicy', env, verbose=1)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


In [None]:
#model = A2C('MlpPolicy', env, verbose=1)

Now we need to calculate the number of timesteps in one episode. The evaluation callback and the total training length both depend on it.

In [54]:
n_timesteps_episode = env.simulator._eplus_one_epi_len / \
                      env.simulator._eplus_run_stepsize
#n_timesteps_episode = 300

print(n_timesteps_episode)
print((4*24*365)/12+(4*24))

3072.0
3016.0
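
The first value, 3072, is consistent with a run period that includes both the begin day and the end day: 1 January to 1 February is 32 days, with 24 hours per day and 4 timesteps per hour. The second print is only a rough estimate (one twelfth of a year plus one day). The sketch below reproduces the first number; `timesteps_per_episode` is a hypothetical helper, and the inclusive end day is an assumption inferred from the printed value.

```python
from datetime import date


def timesteps_per_episode(begin, end, timesteps_per_hour):
    """Number of simulation timesteps between two dates.

    Assumes the end day itself is simulated as well (inclusive range).
    """
    days = (end - begin).days + 1
    return days * 24 * timesteps_per_hour


# Same period and resolution as in this notebook.
print(timesteps_per_episode(date(2022, 1, 1), date(2022, 2, 1), 4))  # 3072
```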


Now we need to create a vectorized wrapper for the environment, because the callbacks we are going to use require a vectorized environment.

In [55]:
env_vec = DummyVecEnv([lambda: env])

We are going to use the LoggerEval callback to print and save the best model evaluated during training.

In [56]:
callbacks = []

# Set up Evaluation and saving best model
eval_callback = LoggerEvalCallback(
    env_vec,
    best_model_save_path='best_model/' + name + '/',
    log_path='best_model/' + name + '/',
    eval_freq=n_timesteps_episode * 2,
    deterministic=True,
    render=False,
    n_eval_episodes=2)
callbacks.append(eval_callback)

callback = CallbackList(callbacks)
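
A possible explanation for the extra sub_runs is that the evaluation episodes run on the same environment as the training and therefore also start EnergyPlus runs. The count below is a back-of-the-envelope sketch under that assumption, not a confirmed description of how Sinergym works. It assumes that every evaluation causes `n_eval_episodes + 1` resets (one at the start and one automatic reset after each evaluation episode) and that there is one extra reset at the end of training. `estimate_sub_runs` is a hypothetical helper.

```python
def estimate_sub_runs(episodes, eval_every, n_eval_episodes):
    """Rough estimate of the number of sub_run folders after training.

    episodes        -- number of training episodes
    eval_every      -- an evaluation is run every `eval_every` episodes
    n_eval_episodes -- episodes per evaluation
    All terms besides `episodes` are assumptions (see the text above).
    """
    n_evaluations = episodes // eval_every
    resets_per_evaluation = n_eval_episodes + 1
    return episodes + n_evaluations * resets_per_evaluation + 1


print(estimate_sub_runs(50, 2, 2))  # 126
print(estimate_sub_runs(25, 2, 2))  # 62
```

Under these assumptions the estimate is 126 for 50 episodes, which matches the number noted for the 50-episode run, and 62 for 25 episodes, which matches the sub_run number in the error message at the end of this notebook.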

This is the number of total time steps for the training.

In [57]:
timesteps = episodes * n_timesteps_episode
print(timesteps)

76800.0


In [58]:
print(env.get_zones())

['plenum-1', 'space1-1', 'space2-1', 'space3-1', 'space4-1', 'space5-1']
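
For logging the temperature of every thermal zone, the zone list above could be turned into variable names. The sketch below only builds the names, following the format of the commented-out `temperature_variable` entry earlier in this notebook; how to register them as observation variables depends on the Sinergym version and is not shown. `zone_temperature_variables` is a hypothetical helper.

```python
def zone_temperature_variables(zones):
    """Build one temperature variable name per zone.

    The name format is copied from the 'Zone Air Temperature (SPACE1-1)'
    example in this notebook and is an assumption for the other zones.
    """
    return [f"Zone Air Temperature ({zone.upper()})" for zone in zones]


zones = ['plenum-1', 'space1-1', 'space2-1', 'space3-1', 'space4-1', 'space5-1']
for variable in zone_temperature_variables(zones):
    print(variable)
```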


Now you can train the model with the callbacks defined earlier. This takes a long time to run.

<br>

Running times observed so far:

- Two months, 6 episodes: 17.5 min
- Two months, 3 episodes: almost 8 min
- One month, 50 episodes: 108 min (126 sub_runs created)
- PPO, one month, 100 episodes: 205 min
- PPO, one month, 10 episodes: about 20 min
- PPO, one month, 200 episodes: 576 min
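
To compare these runs, the time per episode can be computed from the numbers above. This is a small sketch using only the three one-month PPO timings noted in this notebook (the 10-episode figure is approximate); `minutes_per_episode` is a hypothetical helper.

```python
def minutes_per_episode(episodes, total_minutes):
    """Average running time of one episode in minutes."""
    return total_minutes / episodes


# (episodes, total minutes) for the one-month PPO runs listed above.
ppo_one_month_runs = [(10, 20), (100, 205), (200, 576)]

for episodes, minutes in ppo_one_month_runs:
    rate = minutes_per_episode(episodes, minutes)
    print(f"{episodes} episodes: {rate:.2f} min per episode")
```

The 200-episode run is noticeably slower per episode than the shorter runs, so the running time did not scale linearly here.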

In [59]:
model.learn(
    total_timesteps=timesteps,
    callback=callback,
    log_interval=1)

[2023-01-05 09:47:05,303] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...
[2023-01-05 09:47:05,955] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:EnergyPlus working directory is in /workspaces/sinergym/examples/Eplus-env-5Zone-hot-continuous-v1-res1/Eplus-env-sub_run1
[2023-01-05 09:47:05,955] EPLUS_EN

  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
  ret = ret.dtype.type(ret / rcount)


[2023-01-05 09:49:42,303] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:EnergyPlus episode completed successfully. 
[2023-01-05 09:49:42,402] EPLUS_ENV_5Zone-hot-continuous-v1_MainThread_ROOT INFO:Creating new EnergyPlus simulation episode...

<stable_baselines3.ppo.ppo.PPO at 0x7f7181a26320>

In [84]:
model.save(env.simulator._env_working_dir_parent + '/' + name)

In [60]:
env.close()


FileNotFoundError: [Errno 2] No such file or directory: '/workspaces/sinergym/examples/Eplus-env-5Zone-hot-continuous-v1-res1/Eplus-env-sub_run62/monitor.csv'