<a href="https://colab.research.google.com/github/intelligent-environments-lab/CityLearn/blob/master/examples/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# QuickStart

Install CityLearn from PyPI with the `pip` command (the cell below pins version `2.0b2`):

In [None]:
pip install CityLearn==2.0b2

## Centralized RBC
Run the following to simulate an environment controlled by a centralized rule-based control (RBC) agent for a single episode:

In [2]:
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.rbc import BasicRBC as RBCAgent

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
model = RBCAgent(env)
model.learn(episodes=1)

# print cost functions at the end of episode
kpis = model.env.evaluate().pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)

| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | Building_5 | District |
| --- | --- | --- | --- | --- | --- | --- |
| annual_peak_average | | | | | | 1.095001 |
| carbon_emissions_total | 1.102525 | 1.121603 | 1.160467 | 1.288898 | 1.156323 | 1.165963 |
| cost_total | 1.051627 | 1.049339 | 1.099391 | 1.238803 | 1.060591 | 1.09995 |
| daily_peak_average | | | | | | 1.127063 |
| discomfort_delta_average | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_maximum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_minimum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| electricity_consumption_total | 1.154034 | 1.201578 | 1.22107 | 1.351636 | 1.251857 | 1.236035 |
| one_minus_load_factor_average | | | | | | 0.995416 |
| ramping_average | | | | | | 1.157388 |
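
The KPI output above comes from reshaping a long-format table into one row per cost function and one column per building. The sketch below reproduces that reshaping on a toy DataFrame. It assumes `evaluate()` returns one row per `(cost_function, name)` pair with a `value` column, as the `pivot` call implies. The numbers and the `unused_kpi` row are made up for illustration and are not CityLearn results.

```python
import pandas as pd

# Toy long-format frame mimicking the assumed layout of the `evaluate()` output.
# All values here are illustrative, not real CityLearn results.
long_kpis = pd.DataFrame({
    'cost_function': [
        'cost_total', 'cost_total',
        'ramping_average', 'ramping_average',
        'unused_kpi', 'unused_kpi',
    ],
    'name': ['Building_1', 'District'] * 3,
    'value': [1.05, 1.10, float('nan'), 1.16, float('nan'), float('nan')],
})

# One row per cost function, one column per building or district.
wide_kpis = long_kpis.pivot(index='cost_function', columns='name', values='value')

# Drop cost functions that have no value for any column.
wide_kpis = wide_kpis.dropna(how='all')
print(wide_kpis)
```

District-only KPIs such as `ramping_average` keep their row because `dropna(how='all')` removes a row only when every column is missing, which is why the building columns can be empty for them.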


## Decentralized-Independent SAC

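Run the following to simulate an environment controlled by decentralized-independent SAC agents. With `episodes=2` and `deterministic_finish=True`, the agents train for 1 episode and then take deterministic actions in the final episode: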

In [3]:
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.sac import SAC as RLAgent

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)

# print cost functions at the end of episode
kpis = model.env.evaluate().pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)

| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | Building_5 | District |
| --- | --- | --- | --- | --- | --- | --- |
| annual_peak_average | | | | | | 1.0 |
| carbon_emissions_total | 1.0 | 1.0 | 1.00534 | 1.0 | 1.007307 | 1.002529 |
| cost_total | 1.0 | 1.0 | 1.004709 | 1.0 | 1.00614 | 1.00217 |
| daily_peak_average | | | | | | 1.000939 |
| discomfort_delta_average | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_maximum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_minimum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| electricity_consumption_total | 1.0 | 1.0 | 1.005122 | 1.0 | 1.007551 | 1.002535 |
| one_minus_load_factor_average | | | | | | 0.99917 |
| ramping_average | | | | | | 0.999745 |


## Decentralized-Cooperative MARLISA

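Run the following to simulate an environment controlled by decentralized-cooperative MARLISA agents. With `episodes=2` and `deterministic_finish=True`, the agents train for 1 episode and then take deterministic actions in the final episode: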

In [4]:
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.marlisa import MARLISA as RLAgent

dataset_name = 'citylearn_challenge_2022_phase_1'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)

# print cost functions at the end of episode
kpis = model.env.evaluate().pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)

| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | Building_5 | District |
| --- | --- | --- | --- | --- | --- | --- |
| annual_peak_average | | | | | | 0.999291 |
| carbon_emissions_total | 1.0 | 1.0 | 1.001302 | 1.010642 | 1.005289 | 1.003447 |
| cost_total | 1.0 | 1.0 | 1.000672 | 1.009545 | 1.004885 | 1.003021 |
| daily_peak_average | | | | | | 1.00067 |
| discomfort_delta_average | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_maximum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| discomfort_delta_minimum | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| electricity_consumption_total | 1.0 | 1.0 | 1.002234 | 1.009856 | 1.004656 | 1.003349 |
| one_minus_load_factor_average | | | | | | 0.998677 |
| ramping_average | | | | | | 0.999372 |


## Stable Baselines3 Reinforcement Learning Algorithms

Install the latest version of Stable Baselines3:

In [None]:
pip install stable-baselines3

Before the environment can be used with Stable Baselines3, it must be wrapped. First, wrap the environment in the `NormalizedObservationWrapper` (see [docs](https://www.citylearn.net/api/citylearn.wrappers.html#citylearn.wrappers.NormalizedObservationWrapper)) so that observations served to the agent are min-max normalized to [0, 1] and cyclical observations, e.g. hour, are encoded using sine and cosine transformations.
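
To make the two normalization schemes concrete, here is a minimal standalone sketch using only the Python standard library. It is not the wrapper's implementation. The helper names `min_max_normalize` and `cyclical_encode`, the temperature bounds, and the rescaling of the sine/cosine pair into [0, 1] are all assumptions made for illustration.

```python
import math


def min_max_normalize(value, low, high):
    """Scale value linearly so that low maps to 0 and high maps to 1."""
    return (value - low) / (high - low)


def cyclical_encode(value, period):
    """Encode a periodic value as a (sine, cosine) pair.

    Each component is shifted from [-1, 1] into [0, 1]; that rescaling is an
    assumption made here so both schemes share the same output range.
    """
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle) + 1.0) / 2.0, (math.cos(angle) + 1.0) / 2.0


# Hypothetical outdoor temperature of 20 with assumed bounds of 5 and 35.
temperature = min_max_normalize(20.0, 5.0, 35.0)

# Hours 23 and 1 are two hours apart on the clock.
hour_23 = cyclical_encode(23, 24)
hour_1 = cyclical_encode(1, 24)

# Distance between the two hours under each encoding.
cyclical_gap = math.dist(hour_23, hour_1)
linear_gap = abs(min_max_normalize(23, 1, 24) - min_max_normalize(1, 1, 24))

print(temperature, cyclical_gap, linear_gap)
```

The cyclical encoding keeps late-night and early-morning hours close together, whereas plain min-max scaling would place them at opposite ends of the range.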

Next, wrap it in the `StableBaselines3Wrapper` (see [docs](https://www.citylearn.net/api/citylearn.wrappers.html#citylearn.wrappers.StableBaselines3Wrapper)), which ensures that observations, actions and rewards are served in a manner compatible with the Stable Baselines3 interface.
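
As a rough illustration of what such compatibility can involve, the sketch below assumes that a central-agent environment serves observations and rewards nested one level deep (one outer entry for the single agent), while a single-agent learner expects a flat observation vector, a scalar reward, and supplies a flat action vector. The helper names and data are hypothetical and this is not the wrapper's code. Plain lists are used here to avoid dependencies, where a real wrapper would more likely return NumPy arrays.

```python
# Hypothetical central-agent outputs: one outer entry for the single agent.
central_observations = [[0.2, 0.5, 0.9]]
central_rewards = [-1.7]


def to_flat_observation(observations):
    """Unwrap the single agent's observation list into a flat vector."""
    return list(observations[0])


def to_scalar_reward(rewards):
    """Unwrap the single agent's reward into a plain float."""
    return float(rewards[0])


def to_nested_action(action):
    """Wrap a flat action vector back into the nested form assumed above."""
    return [list(action)]


flat_observation = to_flat_observation(central_observations)
scalar_reward = to_scalar_reward(central_rewards)
nested_action = to_nested_action([0.1, -0.3])

print(flat_observation, scalar_reward, nested_action)
```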

The following Stable Baselines3 example uses the `baeda_3dem` dataset, which supports building temperature dynamics.

> ⚠️ **NOTE**: `central_agent` in the `env` must be `True` when using Stable Baselines3, as it does not support multiple agents.

In [5]:
from stable_baselines3.sac import SAC
from citylearn.citylearn import CityLearnEnv
from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper

dataset_name = 'baeda_3dem'
env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000)
env = NormalizedObservationWrapper(env)
env = StableBaselines3Wrapper(env)
model = SAC('MlpPolicy', env)
model.learn(total_timesteps=env.time_steps*2)

# evaluate
observations = env.reset()

while not env.done:
    actions, _ = model.predict(observations, deterministic=True)
    observations, _, _, _ = env.step(actions)

kpis = env.evaluate().pivot(index='cost_function', columns='name', values='value')
kpis = kpis.dropna(how='all')
display(kpis)

| cost_function | Building_1 | Building_2 | Building_3 | Building_4 | District |
| --- | --- | --- | --- | --- | --- |
| annual_peak_average | | | | | 0.652018 |
| cost_total | 0.55984 | 0.664455 | 0.694391 | 0.718245 | 0.659233 |
| daily_peak_average | | | | | 0.711316 |
| discomfort_delta_average | 0.023803 | 0.70325 | 3.912871 | 1.861718 | 1.625411 |
| discomfort_delta_maximum | 2.824947 | 2.61878 | 6.868706 | 7.118988 | 4.857855 |
| discomfort_delta_minimum | -4.086712 | -2.64057 | -3.700124 | -2.782747 | -3.302538 |
| discomfort_proportion | 0.384314 | 0.446996 | 0.876979 | 0.637155 | 0.586361 |
| discomfort_too_cold_proportion | 0.247059 | 0.008834 | 0.003654 | 0.004354 | 0.065975 |
| discomfort_too_hot_proportion | 0.137255 | 0.438163 | 0.873325 | 0.632801 | 0.520386 |
| electricity_consumption_total | 0.651177 | 0.691996 | 0.750006 | 0.750259 | 0.710859 |
