# Topic I, Lab 2

*  we will build on the previous lab, which used the CityLearn library
*  we will develop a simple data generator and compare the behaviour of various policies
*  one of the papers discussed Differentiable MPC, so we will also experiment with it

## CityLearn

CityLearn ([Nweye et al., 2024](https://doi.org/10.48550/arXiv.2405.03848)) is an open-source Gymnasium environment for the easy implementation and benchmarking of rule-based control (RBC), reinforcement learning control (RLC), and model-predictive control (MPC) algorithms for distributed energy resources (DERs) in a demand response (DR) setting. CityLearn is used to reshape the aggregated electricity load profile by controlling DERs in a district of diverse buildings, and it allows for multi-agent control and the evaluation of district-level key performance indicators (KPIs). First developed by [Vazquez-Canteli et al.](https://doi.org/10.1145/3360322.3360998), it has been [applied extensively for DER control benchmarking in scenarios of DR, voltage regulation, control policy meta-learning, and transfer learning](https://www.citylearn.net/index.html#applications).

CityLearn is used in [The CityLearn Challenge](https://www.citylearn.net/citylearn_challenge/index.html), an opportunity to compete in investigating the potential of artificial intelligence (AI) and distributed control systems to tackle multiple problems in the built environment. It attracts a multidisciplinary audience including researchers, industry experts, sustainability enthusiasts and AI hobbyists as a means of crowd-sourcing solutions to these problems.

### Environment

<figure class="image">
  <img src="https://github.com/intelligent-environments-lab/CityLearn/blob/master/assets/images/environment.jpg?raw=true"  width="800" alt="An overview of the heating, ventilation and air conditioning systems, energy storage systems, on-site electricity sources and grid interaction in buildings in the CityLearn environment." style="background-color:white;margin:20px;padding:5px">
  <figcaption>Figure: CityLearn building model including electricity sources that power controllable DERs including electric devices and ESSs, used to satisfy thermal and electrical loads as well as provide the grid with energy flexibility. A distinction is made between environment and control aspects of a building to show the transfer of actions from the control agent and reception of measurable observations by the control agent that quantifies the building's states (<a href="https://doi.org/10.48550/arXiv.2405.03848">Nweye et al., 2024</a>).</figcaption>
</figure>

CityLearn models a district of buildings with similar or different loads, electric devices, energy storage systems (ESSs) and electricity sources that satisfy the loads as shown in the figure above. There is no upper limit on the number of buildings in a district and a district can have as few as one building.

There are up to five loads in a building: space cooling, space heating, domestic hot water (DHW) heating, electric equipment, and electric vehicle (EV) loads. The space cooling and heating loads refer to the energy needed to maintain the indoor dry-bulb temperature at its setpoint. A building in CityLearn is modeled as a single thermal zone where space thermal loads affect its indoor dry-bulb temperature. It uses a long short-term memory (LSTM) regression model to approximate its thermal dynamics, which quantifies the effect of the thermal load on temperature. In addition, an occupant model can override the temperature setpoint. The DHW heating load is the total heating energy needed to satisfy hot water end uses such as showers, bathroom and kitchen sinks, and other end uses requiring water heating that are not related to space heating. Electric equipment refers to non-shiftable plug loads such as lighting, entertainment and kitchen appliances. The EV load is the energy required to charge an EV to a scheduled departure state-of-charge (SoC).

Not all loads need to exist in a building, e.g., a building situated in a heating-dominant climate may not have cooling loads year-round. Also, any one or all of these loads may be known a priori from building energy performance simulation (BEPS) or real-world measurement. In these instances, the ideal load must be satisfied. Alternatively, they are controlled loads that are inferred at runtime, e.g., heat pump power control drives the space cooling or heating load.

To satisfy these loads in either the ideal or control-action case, CityLearn makes use of heating, ventilation, and air conditioning (HVAC) systems directly or ESSs through load shifting. The `cooling_device`, `heating_device`, and `dhw_device` are HVAC electric device objects in CityLearn that are used to satisfy the space cooling, space heating and DHW heating loads respectively. The `cooling_device` is a heat pump, while the `heating_device` and `dhw_device` are each either a heat pump or an electric heater. These HVAC systems may be used to charge thermal energy storage (TES) systems in the building.

There are up to five optional and controlled ESSs in a building: the `cooling_storage`, `heating_storage`, `dhw_storage`, `electrical_storage`, and `electric_vehicle` ESS objects. The `cooling_storage`, `heating_storage`, and `dhw_storage` are of the thermal energy storage (TES) DER type and provide space cooling, space heating and DHW heating load shifting flexibility respectively. They are charged by the HVAC device used to meet the thermal load which they service, e.g., the `cooling_device` charges the `cooling_storage`. The `electrical_storage` is a battery energy storage system (BESS) DER type that powers any of the aforementioned electric devices when in discharge mode or is charged by one or more of the electricity sources. The `electric_vehicle` is an EV DER type and performs a similar function to the `electrical_storage`; however, the EV is only available on a schedule defined by its arrival and departure times. The EV can be used in three modes: grid-to-vehicle (G2V), vehicle-to-grid (V2G), and no control (i.e., where the EV acts as a load without any possible control over its charging).

The electric devices are primarily powered by the electric grid. CityLearn, at the time of writing, does not include a grid model, so the power a building is able to draw from the grid at a given time step is unconstrained, except in the case of a power outage. Optionally, a building may have a photovoltaic (PV) system that provides self-generation as a first source of electricity before the grid. The optional `electrical_storage` and `electric_vehicle` are charged by the grid and PV but also augment the electricity supply to the building when in discharge mode. Excess self-generation and excess `electrical_storage` and `electric_vehicle` discharge are sent to the grid as part of the building's net export.


### Control

<table>
    <tr>
        <th>Name</th>
        <th><code>a</code> range</th>
        <th>Description</th>
    </tr>
    <tr>
        <td colspan="3"><strong>Energy storage system</strong></td>
    </tr>
    <tr>
        <td><code>cooling_storage</code></td>
        <td>[-1, 1]</td>
        <td>Proportion of <code>cooling_storage</code> capacity to be charged (<code>a</code> > 0) or discharged (<code>a</code> < 0).</td>
    </tr>
    <tr>
        <td><code>heating_storage</code></td>
        <td>[-1, 1]</td>
        <td>Proportion of <code>heating_storage</code> capacity to be charged (<code>a</code> > 0) or discharged (<code>a</code> < 0).</td>
    </tr>
    <tr>
        <td><code>dhw_storage</code></td>
        <td>[-1, 1]</td>
        <td>Proportion of <code>dhw_storage</code> capacity to be charged (<code>a</code> > 0) or discharged (<code>a</code> < 0).</td>
    </tr>
    <tr>
        <td><code>electrical_storage</code></td>
        <td>[-1, 1]</td>
        <td>Proportion of <code>electrical_storage</code> capacity to be charged (<code>a</code> > 0) or discharged (<code>a</code> < 0).</td>
    </tr>
    <tr>
        <td><code>electric_vehicle_storage</code></td>
        <td>[-1, 1]</td>
        <td>Proportion of <code>electric_vehicle_storage</code> capacity to be charged (<code>a</code> > 0) or discharged (<code>a</code> < 0).</td>
    </tr>
    <tr>
        <td colspan="3"><strong>Electric device</strong></td>
    </tr>
    <tr>
        <td><code>cooling_device</code></td>
        <td>[0, 1]</td>
        <td>Proportion of space <code>cooling_device</code> nominal power to be supplied.</td>
    </tr>
    <tr>
        <td><code>heating_device</code></td>
        <td>[0, 1]</td>
        <td>Proportion of space <code>heating_device</code> nominal power to be supplied.</td>
    </tr>
</table>

The table above summarizes the continuous control action space in CityLearn where there are five ESS-related actions controlling the proportion of storage capacity to be charged or discharged and two HVAC electric device actions controlling the proportion of nominal power to be supplied. There are as many `electric_vehicle_storage` actions as there are EV chargers in a building.
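As a small illustration of these ranges, the sketch below clamps a set of proposed actions to the bounds listed in the table. The bounds are copied from the table; the `ACTION_BOUNDS` dictionary and the `clip_actions` helper are hypothetical names introduced here and are not part of the CityLearn API.

```python
# Bounds copied from the action table above; the helper itself is hypothetical.
ACTION_BOUNDS = {
    "cooling_storage": (-1.0, 1.0),
    "heating_storage": (-1.0, 1.0),
    "dhw_storage": (-1.0, 1.0),
    "electrical_storage": (-1.0, 1.0),
    "electric_vehicle_storage": (-1.0, 1.0),
    "cooling_device": (0.0, 1.0),
    "heating_device": (0.0, 1.0),
}


def clip_actions(actions):
    """Clamp each named action into its documented range."""
    clipped = {}
    for name, value in actions.items():
        low, high = ACTION_BOUNDS[name]
        clipped[name] = min(max(value, low), high)
    return clipped


raw = {"electrical_storage": -1.4, "cooling_device": -0.2, "dhw_storage": 0.3}
print(clip_actions(raw))
```

A policy that outputs unbounded values (for example a neural network without a squashing layer) could pass its output through such a helper before handing it to the environment.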

<figure class="image">
  <img src="https://github.com/intelligent-environments-lab/CityLearn/blob/master/assets/images/gymnasium_interface.jpg?raw=true"  width="200" alt="Farama Foundation Gymnasium interface." style="background-color:white;margin:20px;padding:5px">
  <figcaption>Figure: Farama Foundation Gymnasium interface (<a href="https://zenodo.org/records/10655021">Towers et al., 2023</a>).</figcaption>
</figure>

The CityLearn environment makes use of the Farama Foundation Gymnasium interface for standardized RLC environment design, where there is an observation-action-reward exchange loop between the environment and control agent as the environment transitions from one time step to another. In the current time step, $t$, the control agent receives the environment's observations, $o_t$, and prescribes actions, $a_t$. The actions are applied to the environment to affect the observations at the next time step, $o_{t + 1}$. $o_{t + 1}$ and a reward, $r_{t + 1}$ (from a reward function, $R$), which quantifies the quality of $a_t$ with respect to a control objective or KPI, are returned to the control agent so that it can learn a control policy, $\pi$. $\pi$ maps observations to actions that maximize the cumulative reward over an episode, i.e., the period from initialization ($t = 0$) to the terminal state of the environment, beyond which there are no new observations.
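The exchange loop can be sketched without installing anything. The example below uses a hypothetical `ToyStorageEnv` class, written only for this illustration, that follows the Gymnasium convention of `reset()` returning `(observation, info)` and `step(action)` returning `(observation, reward, terminated, truncated, info)`. The load values and the reward are made up; this is not CityLearn itself.

```python
import random


class ToyStorageEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self, horizon=24, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.soc = 0.5  # state of charge as a fraction of capacity
        return self._observe(), {}

    def _observe(self):
        return {"hour": self.t % 24, "soc": self.soc}

    def step(self, action):
        # Limit the action to [-1, 1] and to what the storage can absorb or deliver.
        action = max(-1.0, min(1.0, action))
        action = max(-self.soc, min(1.0 - self.soc, action))
        self.soc += action
        demand = 1.0 + 0.5 * self.rng.random()  # made-up load
        grid_import = max(0.0, demand + action)
        reward = -grid_import  # less grid import is better
        self.t += 1
        terminated = self.t >= self.horizon
        return self._observe(), reward, terminated, False, {}


env = ToyStorageEnv()
policy_rng = random.Random(1)
observation, info = env.reset()
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = policy_rng.uniform(-1.0, 1.0)  # placeholder for pi(o_t)
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"steps: {env.t}, cumulative reward: {total_reward:.2f}")
```

Replacing the random action with a learned or rule-based function of `observation` is the only change needed to evaluate a different policy in this loop.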

<figure class="image">
  <img src="https://github.com/intelligent-environments-lab/CityLearn/blob/master/assets/images/control_architecture.jpg?raw=true"  width="600" alt="Single-agent (left), independent multi-agent (middle), and coordinated multi-agent (right) control configurations." style="background-color:white;margin:20px;padding:5px">
  <figcaption>Figure: Single-agent (left), independent multi-agent (middle), and coordinated multi-agent (right) control configurations (<a href="https://doi.org/10.48550/arXiv.2405.03848">Nweye et al., 2024</a>).</figcaption>
</figure>

There are three possible control configurations in CityLearn, namely single-agent, independent multi-agent, and coordinated multi-agent, as shown in the figure above. In the single-agent configuration, there is a one-to-many relationship between the control agent and buildings, where a centralized agent collects observations and prescribes actions for all DERs in the district and receives a single reward value each time step to learn a generalized control policy. This is akin to an energy aggregator controlling flexible resources in a distributed manner. The independent multi-agent configuration has a one-to-one agent-building relationship; thus, there are as many rewards as buildings each time step and a unique control policy is learned for each building. The coordinated multi-agent configuration is similar to the independent multi-agent configuration, except that agents can share information to achieve cooperative objectives, e.g., district peak reduction, or competitive objectives, e.g., price bidding in the energy flexibility market.

We emphasize that CityLearn is not limited to RLC algorithms despite its Gymnasium interface, as it also works with simple control algorithms, e.g., RBC, as well as advanced control algorithms, e.g., MPC. In RBC, the reward is not utilized, while in MPC, the reward is akin to the control objective. Also, an RBC policy is static and does not consider the observations in the following time step to update its rules.
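To make the static nature of an RBC policy concrete, here is a minimal sketch of a time-of-day rule for a single storage action. The hours and charge fractions are assumptions chosen for illustration, and `rbc_policy` is a hypothetical name; the rule never looks at rewards or at how the environment responded.

```python
def rbc_policy(hour):
    """Hypothetical rule set: charge storage at night, discharge in the evening peak."""
    if 0 <= hour < 6:
        return 0.1      # charge 10% of capacity per hour overnight
    if 17 <= hour < 21:
        return -0.15    # discharge during the assumed evening peak
    return 0.0          # otherwise idle


schedule = [rbc_policy(h) for h in range(24)]
print(schedule)
```

Because the rule depends only on the hour, the same 24-value schedule repeats every day regardless of weather, load, or price.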

# Task #1 - generating our own data

*  let's consider a house whose insulation is characterized by the heat transfer coefficient U
*  the heat transfer coefficient U is the steady heat flow through 1 m² of a partition separating two media when the temperature difference between them is 1 K (units: W/(m²·K)). https://www.solbet.pl/en/thermal-isolation/
*  using this characterization, we can estimate the amount of heating/cooling needed for a given house from the difference between the inside and outside temperatures
*  let us assume that the inside temperature has to be kept between 20 and 22 degrees C
*  generate one year of outside temperature data, with winters around -10 °C and summers climbing to +40 °C
*  the second important parameter of the house is its surface area


*  write code to generate data for several houses with different surface areas and heat transfer coefficients
*  missing parameters can be assumed or generated based on your own assumptions
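One possible starting point is sketched below. It assumes a steady-state model in which the hourly thermal power is $Q = U \cdot A \cdot \Delta T$, where $\Delta T$ is the gap between the outside temperature and the nearest edge of the 20-22 °C comfort band. The temperature model (a seasonal plus a daily sinusoid with Gaussian noise), the parameter ranges for U and the area, and all function names are assumptions made for this sketch.

```python
import math
import random

HOURS_PER_YEAR = 365 * 24


def outside_temperature(hour_of_year, rng, noise_std=1.0):
    """Seasonal plus daily sinusoid, roughly -10 C in winter to +40 C in summer."""
    day = hour_of_year / 24.0
    seasonal = -20.0 * math.cos(2.0 * math.pi * day / 365.0)  # coldest on day 0
    daily = -5.0 * math.cos(2.0 * math.pi * (hour_of_year % 24 - 3) / 24.0)  # coldest at 03:00
    return 15.0 + seasonal + daily + rng.gauss(0.0, noise_std)


def thermal_load_kwh(t_out, u_value, area_m2, t_low=20.0, t_high=22.0):
    """Hourly heating (below t_low) or cooling (above t_high) energy in kWh."""
    if t_out < t_low:
        gap = t_low - t_out
    elif t_out > t_high:
        gap = t_out - t_high
    else:
        gap = 0.0
    return u_value * area_m2 * gap / 1000.0  # W sustained for one hour -> kWh


def generate_houses(n_houses=3, seed=0):
    """Return one shared temperature series and a list of per-house records."""
    rng = random.Random(seed)
    temperatures = [outside_temperature(h, rng) for h in range(HOURS_PER_YEAR)]
    houses = []
    for i in range(n_houses):
        u_value = rng.uniform(0.2, 1.5)      # W/(m^2 K), assumed range
        area_m2 = rng.uniform(150.0, 500.0)  # envelope area, assumed range
        load = [thermal_load_kwh(t, u_value, area_m2) for t in temperatures]
        houses.append({"id": i, "u_value": u_value, "area_m2": area_m2, "thermal_kwh": load})
    return temperatures, houses


temperatures, houses = generate_houses()
print(f"min {min(temperatures):.1f} C, max {max(temperatures):.1f} C")
```

The model ignores thermal inertia, solar gains, and device efficiency, so the result is thermal energy demand rather than electricity consumption; a coefficient of performance could be added if electricity is what the later tasks should count.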

# Task #2 - adding battery storage and PVs
*  extend the data by adding a battery to each house, with a capacity of 80% to 200% of the house's peak heating/cooling needs
*  extend the data by adding a base load for each house; define a few patterns, e.g. some houses have occupants who stay home until 8am (then go to work) and are home again from 6pm until the next day
*  another pattern could be a small company with peak load from 8am to 5pm
*  the final load of the house will be the thermal energy + (base-load * Noise), where Noise is $\sim N(1,\sigma)$ (feel free to adjust this distribution)
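The bullets above could be sketched as follows. The two occupancy patterns follow the task description, while the base-load magnitudes (1.5 kWh when active, 0.3 kWh otherwise), the truncation of the noise at zero, and the reading of "peak needs" as one hour of peak thermal load are assumptions. All function names are hypothetical, and the thermal series is a short placeholder rather than the Task #1 output.

```python
import random


def base_load_profile(kind):
    """24 hourly base-load values in kWh for two assumed occupancy patterns."""
    profile = []
    for hour in range(24):
        if kind == "residential":
            # occupants at home until 08:00 and again from 18:00
            active = hour < 8 or hour >= 18
        elif kind == "office":
            # small company with activity between 08:00 and 17:00
            active = 8 <= hour < 17
        else:
            raise ValueError(f"unknown pattern: {kind}")
        profile.append(1.5 if active else 0.3)
    return profile


def battery_capacity_kwh(thermal_kwh, rng):
    """Capacity drawn between 80% and 200% of the peak hourly thermal need."""
    return rng.uniform(0.8, 2.0) * max(thermal_kwh)


def total_load(thermal_kwh, kind, sigma, rng):
    """thermal + base_load * noise, with noise ~ N(1, sigma) truncated at zero."""
    profile = base_load_profile(kind)
    return [
        q + profile[h % 24] * max(0.0, rng.gauss(1.0, sigma))
        for h, q in enumerate(thermal_kwh)
    ]


rng = random.Random(0)
thermal = [2.0, 1.5, 1.0, 0.5] * 12  # placeholder 48-hour thermal series in kWh
capacity = battery_capacity_kwh(thermal, rng)
load = total_load(thermal, "residential", sigma=0.1, rng=rng)
print(f"capacity: {capacity:.2f} kWh, first hours: {[round(v, 2) for v in load[:4]]}")
```

Setting `sigma=0` makes the noise factor exactly 1, which gives the deterministic load used in Task #3.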

# Task #3 - DP solution for $\sigma=0$
*  let's consider the only cost-measure for the collection of houses to be total energy taken from grid
*  use LP solvers to find an optimal policy (and optimal cost)
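One way to write this as a linear program is sketched below. It assumes a horizon of $T$ hourly steps, $N$ houses, known loads $\ell_{i,t}$ (deterministic because $\sigma = 0$), optional PV generation $p_{i,t}$ (set to zero if no PV data is generated), battery capacity $C_i$, and a charging efficiency $\eta \in (0, 1]$. The variables $g_{i,t}$ (grid import), $c_{i,t}$ (charge), $d_{i,t}$ (discharge) and $s_{i,t}$ (stored energy) are notation introduced here.

$$
\min_{g,\,c,\,d,\,s} \; \sum_{i=1}^{N} \sum_{t=0}^{T-1} g_{i,t}
$$

subject to

$$
\begin{aligned}
g_{i,t} &\ge \ell_{i,t} - p_{i,t} + c_{i,t} - d_{i,t}, \\
s_{i,t+1} &= s_{i,t} + \eta\, c_{i,t} - d_{i,t} / \eta, \\
0 &\le s_{i,t} \le C_i, \qquad s_{i,0} = 0, \\
g_{i,t},\; c_{i,t},\; d_{i,t} &\ge 0.
\end{aligned}
$$

Because the objective pushes $g_{i,t}$ down, at the optimum it equals $\max(0,\ \ell_{i,t} - p_{i,t} + c_{i,t} - d_{i,t})$, so the program counts only energy actually taken from the grid. Note that without PV (or some other reason to shift energy in time) the battery cannot reduce total grid import, so the optimal policy is then to leave it idle.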

# Task #4 - RL
*  train an RL policy (choose a reasonable $\sigma$)
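As a minimal starting point, the sketch below trains a tabular Q-learning agent on a heavily simplified single-house problem. Everything in it is an assumption made for illustration: the battery has five discrete 1 kWh levels, the PV profile and mean load are made up, the state is `(hour, soc_level)`, and the reward is the negative grid import. It swaps in plain Q-learning as one example of an RL method; the task does not prescribe a specific algorithm.

```python
import random

N_LEVELS = 5                             # discrete state-of-charge levels 0..4 (1 kWh each)
ACTIONS = (-1, 0, 1)                     # discharge, idle, charge by one level
PV = [0.0] * 7 + [2.0] * 10 + [0.0] * 7  # assumed hourly PV output in kWh
BASE = 1.0                               # assumed mean hourly load in kWh


def step(hour, soc, action, sigma, rng):
    """Apply one action and return (next_soc, reward)."""
    new_soc = min(N_LEVELS - 1, max(0, soc + action))
    delta = new_soc - soc  # energy actually moved into the battery, kWh
    load = BASE * max(0.0, rng.gauss(1.0, sigma))
    grid_import = max(0.0, load + delta - PV[hour])
    return new_soc, -grid_import


def train(episodes=3000, sigma=0.2, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning over one-day episodes."""
    rng = random.Random(seed)
    q = {(h, s): [0.0] * len(ACTIONS) for h in range(24) for s in range(N_LEVELS)}
    for _ in range(episodes):
        soc = 0
        for hour in range(24):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(hour, soc)][i])
            new_soc, reward = step(hour, soc, ACTIONS[a], sigma, rng)
            # No future value after the last hour of the episode.
            future = 0.0 if hour == 23 else max(q[(hour + 1, new_soc)])
            q[(hour, soc)][a] += alpha * (reward + gamma * future - q[(hour, soc)][a])
            soc = new_soc
    return q


q_table = train()
greedy = {
    state: ACTIONS[max(range(len(ACTIONS)), key=lambda i: values[i])]
    for state, values in q_table.items()
}
print("action at noon with empty battery:", greedy[(12, 0)])
print("action at 20:00 with full battery:", greedy[(20, 4)])
```

For the full multi-house problem with continuous actions, a function-approximation method would be needed; this table-based version is mainly useful for checking that the reward and dynamics behave as intended.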

# Task #5 - Differentiable MPC
*  follow the idea from the paper to explore the use of Differentiable MPC