# BABY STEPS - Getting Started

```
Author: Chia E Tungom
Email: bamtungom@protonmail.com
```

This notebook demonstrates the basic facets of the CityLearn environment. You can play with it to get familiar with the environment.
Important aspects of the environment that are covered include:

1. Observation Space (dataset)

2. Action Space (discrete or continuous)

3. Model (Policy)

4. Action (steps)

5. Evaluation (reward)

We use general purpose functions common to most RL environments for illustration.

__Note:__ To run this notebook, place it in the root directory of your CityLearn Phase one repository (same directory as requirements.txt)

__Let's Goooooo!!!__

In [2]:
# To run this example, move this file to the main directory of this repository
from citylearn import CityLearn
import matplotlib.pyplot as plt
from pathlib import Path
from agents.marlisa import MARLISA
import numpy as np
from tqdm import tqdm
import time



# 1. Define Environment

The first thing we need to do is create a CityLearn environment. The environment is defined using a json schema and dataset which can be found in the data directory.

In [3]:
# Load environment
climate_zone = 5
params = {'data_path':Path("data/Climate_Zone_"+str(climate_zone)), 
        'building_attributes':'building_attributes.json', 
        'weather_file':'weather_data.csv', 
        'solar_profile':'solar_generation_1kW.csv', 
        'carbon_intensity':'carbon_intensity.csv',
        'building_ids':["Building_"+str(i) for i in [1,2,3,4,5,6,7,8,9]],
        'buildings_states_actions':Path('buildings_state_action_space.json'), 
        'simulation_period': (0, 8760*4-1), 
        'cost_function': ['ramping','1-load_factor','average_daily_peak','peak_demand','net_electricity_consumption','carbon_emissions'], 
        'central_agent': False,
        'save_memory': False }

# observations_spaces and actions_spaces contain the lower and upper bounds of the states and actions, to be provided to the agent to normalize the variables between 0 and 1.
# The bounds can be obtained using observations_spaces[i].low or .high
env = CityLearn(**params)
observations_spaces, actions_spaces = env.get_state_action_spaces()


In [47]:

# Provides information on Building type, Climate Zone, Annual DHW demand, Annual Cooling Demand, Annual Electricity Demand, Solar Capacity, and correlations among buildings
building_info = env.get_building_information()
building_info

{'Building_1': {'building_type': 1,
  'climate_zone': 5,
  'solar_power_capacity (kW)': 120,
  'Annual_DHW_demand (kWh)': 11643.328,
  'Annual_cooling_demand (kWh)': 619542.205,
  'Annual_nonshiftable_electrical_demand (kWh)': 215850.522,
  'Correlations_DHW': {'Building_2': 0.505,
   'Building_3': nan,
   'Building_4': nan,
   'Building_5': 0.338,
   'Building_6': 0.317,
   'Building_7': 0.116,
   'Building_8': 0.186,
   'Building_9': 0.272},
  'Correlations_cooling_demand': {'Building_2': 0.798,
   'Building_3': 0.869,
   'Building_4': 0.802,
   'Building_5': 0.75,
   'Building_6': 0.756,
   'Building_7': 0.67,
   'Building_8': 0.734,
   'Building_9': 0.708},
  'Correlations_non_shiftable_load': {'Building_2': 0.58,
   'Building_3': 0.717,
   'Building_4': 0.566,
   'Building_5': 0.168,
   'Building_6': 0.205,
   'Building_7': -0.162,
   'Building_8': 0.13,
   'Building_9': 0.028}},
 'Building_2': {'building_type': 2,
  'climate_zone': 5,
  'solar_power_capacity (kW)': 0,
  'Annual_D

# 2. OBSERVATION SPACE

The observation space is the data of the environment. This is what the agent sees in order to decide which action to take.

In the observation space, there are 9 different observations corresponding to the 9 buildings. For each building, the observation space is a 1x28 dimensional array of type float32.

1. Use `env.get_state_action_spaces()[0]` to explore the properties of the environment. Index 0 stands for the observation spaces, a list with one entry per building.
2. Use `env.get_state_action_spaces()[0][0].sample()` to see a sample observation for building 1.

In [4]:
observation  = env.get_state_action_spaces()[0]

print(observation[0])
observation[0].sample()

Box([ 1.    1.    1.   -4.15 -4.41 -4.72 -5.34 14.38 14.33 14.29 14.19  0.
 -0.    0.    0.    0.   -0.    0.    0.   16.8  11.17  7.42  0.    0.
  0.    0.    0.    0.  ], [1.2000000e+01 8.0000000e+00 2.4000000e+01 3.6700001e+01 3.6970001e+01
 3.7279999e+01 3.7910000e+01 1.0000000e+02 1.0250000e+02 1.0500000e+02
 1.0999000e+02 4.9162000e+02 5.0142001e+02 5.1122000e+02 5.3081000e+02
 1.2126200e+03 1.2308300e+03 1.2532200e+03 1.2979900e+03 2.6379999e+01
 9.2389999e+01 7.0910004e+01 9.9999962e+01 1.0000000e+00 1.0000000e+00
 1.0000000e+00 3.3668436e+02 6.8709970e-01], (28,), float32)


array([ 5.65421915e+00,  7.93409348e+00,  6.29955769e+00,  2.02119331e+01,
        2.79655533e+01,  2.42183056e+01, -5.20351791e+00,  4.98451920e+01,
        1.00471237e+02,  3.41179314e+01,  5.04825516e+01,  1.70586945e+02,
        1.17957466e+02,  1.46562485e+02,  4.56233795e+02,  1.01897644e+03,
        3.45226166e+02,  5.23505493e+02,  1.14603259e+03,  1.68150330e+01,
        3.47147331e+01,  5.51451302e+01,  2.04869328e+01,  8.36130142e-01,
        5.89760303e-01,  9.75291967e-01,  3.23234528e+02,  5.44102907e-01],
      dtype=float32)

### Getting the Lower and Upper Bound of a Given Observation

You can get the upper and lower bounds of a building's observation space by taking the observation space of that building and reading its `high` or `low` attribute respectively.

You can further index the result to get the bound for a particular observation, e.g. `.high[2]` for the third observation.
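
The comment in the environment setup cell mentions that these bounds are meant for normalizing variables to between 0 and 1. A minimal sketch of that min-max scaling follows, written with plain lists. The function name `normalize` and the shortened example bounds are illustrative and not part of CityLearn.

```python
def normalize(observation, low, high):
    """Min-max scale each value into [0, 1] using per-variable bounds."""
    scaled = []
    for value, lo, hi in zip(observation, low, high):
        if hi == lo:
            # A variable with identical bounds cannot be scaled; map it to 0.
            scaled.append(0.0)
        else:
            scaled.append((value - lo) / (hi - lo))
    return scaled


# First three bounds of building 1 as printed above.
# Reading them as month, day type and hour is an assumption.
low = [1.0, 1.0, 1.0]
high = [12.0, 8.0, 24.0]
print(normalize([1.0, 8.0, 12.5], low, high))
```

A value equal to the lower bound maps to 0 and a value equal to the upper bound maps to 1, so all variables end up on a comparable scale regardless of their units.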


In [52]:
print(env.get_state_action_spaces()[0][0].high)
print(env.get_state_action_spaces()[0][0].low)

[1.2000000e+01 8.0000000e+00 2.4000000e+01 3.6700001e+01 3.6970001e+01
 3.7279999e+01 3.7910000e+01 1.0000000e+02 1.0250000e+02 1.0500000e+02
 1.0999000e+02 4.9162000e+02 5.0142001e+02 5.1122000e+02 5.3081000e+02
 1.2126200e+03 1.2308300e+03 1.2532200e+03 1.2979900e+03 2.6379999e+01
 9.2389999e+01 7.0910004e+01 9.9999962e+01 1.0000000e+00 1.0000000e+00
 1.0000000e+00 3.3668436e+02 6.8709970e-01]
[ 1.    1.    1.   -4.15 -4.41 -4.72 -5.34 14.38 14.33 14.29 14.19  0.
 -0.    0.    0.    0.   -0.    0.    0.   16.8  11.17  7.42  0.    0.
  0.    0.    0.    0.  ]


In [5]:
# The standard gym attributes are less useful in this case: they describe one combined space rather than a single building
print(f'OBSERVATION SPACE for Builiding ONE is {env.observation_space}')
print(f'SAMPLE OBSERVATION SPACE for Builiding {env.observation_space.sample(), len(env.observation_space.sample())}')


OBSERVATION SPACE for Builiding ONE is Box([ 1.    1.    1.   -4.15 -4.41 -4.72 -5.34 14.38 14.33 14.29 14.19  0.
 -0.    0.    0.    0.   -0.    0.    0.   16.8  11.17  7.42  0.    0.
  0.    0.    0.    0.   18.91  7.01  1.92  0.    0.    0.    0.    0.
 17.36 13.69  1.77  0.    0.    0.    0.   16.83  9.27  0.44  0.    0.
  0.    0.    0.   21.48 10.18  3.3   0.    0.    0.    0.    0.   22.
 11.73  3.2   0.    0.    0.    0.    0.   21.47 10.57  3.7   0.    0.
  0.    0.    0.   22.41 10.17  2.6   0.    0.    0.    0.    0.   22.29
 10.16  4.6   0.    0.    0.    0.    0.  ], [1.2000000e+01 8.0000000e+00 2.4000000e+01 3.6700001e+01 3.6970001e+01
 3.7279999e+01 3.7910000e+01 1.0000000e+02 1.0250000e+02 1.0500000e+02
 1.0999000e+02 4.9162000e+02 5.0142001e+02 5.1122000e+02 5.3081000e+02
 1.2126200e+03 1.2308300e+03 1.2532200e+03 1.2979900e+03 2.6379999e+01
 9.2389999e+01 7.0910004e+01 9.9999962e+01 1.0000000e+00 1.0000000e+00
 1.0000000e+00 3.3668436e+02 6.8709970e-01 2.8830000e+01 7

# 3. ACTION SPACE

This shows us the type of actions we can take along with the dimension and property (discrete or continuous) of each action.

The action space in CityLearn is a 1x3 array: there are 3 distinct continuous actions, each in a given range.

- Based on our environment, the action space for a building with all three storage devices is a 1x3 array.
- Each space is of the form `([lower bounds],[upper bounds], (3,), float32)`, which corresponds to `[(lower bound, upper bound), (dimension,), datatype]`.
- __Lower bound__ is the smallest value of an action while __upper bound__ is the largest. The values at a given index of the lower and upper bound arrays give the range of the action at that index.
- Dimension is the number of components in an action, which here is 3 (use `env.get_state_action_spaces()[1][0].sample()` to see an action). `env.get_state_action_spaces()[1]` is the list of action spaces and `env.get_state_action_spaces()[1][0]` is the action space of building 1.
- Datatype is the data type of our action, which here is float32.

The actions that can be taken in a building are
- __"cooling_storage"__: first element,
- __"dhw_storage"__: second element,
- __"electrical_storage"__: third element

__Note: Buildings can have different sets of actions, e.g.__
- _for buildings 3 and 4,_ `{"cooling_storage": true, "dhw_storage": false, "electrical_storage": true}`_, only cooling and electrical storage are available_
- Therefore their action spaces are two-dimensional, with the first and second elements corresponding to cooling and electrical storage respectively
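
To make the index shift concrete, here is a small sketch that derives the position of each action from a building's action flags. It assumes the enabled actions keep the order cooling, DHW, electrical described above. The helper name `action_index_map` is hypothetical and not part of CityLearn.

```python
# Order of the actions as described in this section.
ACTION_ORDER = ["cooling_storage", "dhw_storage", "electrical_storage"]


def action_index_map(action_flags):
    """Map each enabled action name to its position in the action array."""
    enabled = [name for name in ACTION_ORDER if action_flags.get(name, False)]
    return {name: index for index, name in enumerate(enabled)}


# Flags of buildings 3 and 4 from the note above (JSON true/false become True/False).
building_3_flags = {
    "cooling_storage": True,
    "dhw_storage": False,
    "electrical_storage": True,
}
print(action_index_map(building_3_flags))
```

For a building with all three flags set, the same helper would return positions 0, 1 and 2.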

The cell below illustrates the action space(s). Play with it to understand the actions.

`actions_spaces[0].sample()` produces a random action for building 1.

Note: You must pick the action space of a given building in order to sample (use an index, e.g. `actions_spaces[0]`).

In [6]:
actions_spaces  = env.get_state_action_spaces()[1]

print(actions_spaces)

for building in range(9):
    print(f' Sample Action for building {building} is >>> { actions_spaces[building].sample()}')

[Box([-0.5 -0.5 -1. ], [0.5 0.5 1. ], (3,), float32), Box([-0.33333334 -0.33333334 -1.        ], [0.33333334 0.33333334 1.        ], (3,), float32), Box([-0.5 -1. ], [0.5 1. ], (2,), float32), Box([-0.6666667 -1.       ], [0.6666667 1.       ], (2,), float32), Box([-0.2857143 -0.6666667 -1.       ], [0.2857143 0.6666667 1.       ], (3,), float32), Box([-0.6666667  -0.33333334 -1.        ], [0.6666667  0.33333334 1.        ], (3,), float32), Box([-0.5 -0.5 -1. ], [0.5 0.5 1. ], (3,), float32), Box([-0.33333334 -0.33333334 -1.        ], [0.33333334 0.33333334 1.        ], (3,), float32), Box([-0.33333334 -0.33333334 -1.        ], [0.33333334 0.33333334 1.        ], (3,), float32)]
 Sample Action for building 0 is >>> [-0.10754889 -0.39910504  0.8946236 ]
 Sample Action for building 1 is >>> [-0.04923359 -0.30237332 -0.15773708]
 Sample Action for building 2 is >>> [0.35133678 0.46253598]
 Sample Action for building 3 is >>> [-0.44977948  0.6696908 ]
 Sample Action for building 4 is >>> [

In [7]:
# The standard gym attributes are less useful in this case: they describe one combined space for all buildings

print(f' ACTION SPACES {env.action_space}')
print(f' ACTION SPACE for Builiding is {env.action_space.sample()}')

# sample some actions
# for action in range(5):
#     print(f' SAMPLE ACTION SPACE for Builiding ONE >>> {env.action_space[1].sample()}')

# we can observe the actions are continuous in the range [-1,1]

 ACTION SPACES Box([-0.5        -0.5        -1.         -0.33333334 -0.33333334 -1.
 -0.5        -1.         -0.6666667  -1.         -0.2857143  -0.6666667
 -1.         -0.6666667  -0.33333334 -1.         -0.5        -0.5
 -1.         -0.33333334 -0.33333334 -1.         -0.33333334 -0.33333334
 -1.        ], [0.5        0.5        1.         0.33333334 0.33333334 1.
 0.5        1.         0.6666667  1.         0.2857143  0.6666667
 1.         0.6666667  0.33333334 1.         0.5        0.5
 1.         0.33333334 0.33333334 1.         0.33333334 0.33333334
 1.        ], (25,), float32)
 ACTION SPACE for Builiding is [-0.1834386   0.01785553  0.34468722  0.32285252  0.05876467 -0.016072
  0.3878966   0.40696472  0.5071004   0.8418417  -0.08242489 -0.6341966
 -0.74125296  0.25694144 -0.13599339 -0.6389603  -0.43374893  0.16214001
 -0.38510537 -0.05827567  0.19372386 -0.8778577   0.01835014  0.11552128
 -0.8877348 ]


# 4. Define A Model or Agent 

The agent is the policy which decides what action to take given an observation. We can also use rule-based agents. The CityLearn setting is built for multi-agent systems but a single agent can also be used.

Here we just show how to load an agent 

In [8]:
from agents.sac import SAC

# SAC??

# 5. TAKING AN ACTION

As already explained with the action spaces, $n$ buildings will have $n$ actions, with each action corresponding to one building. Therefore our actions should appear as follows:

- The action should be a list with one entry per building. Each entry is itself a list containing the actions to be taken for that building.
- The first, second and third actions for each building correspond to cooling storage, DHW storage and electrical storage respectively.
- For example, for a five-building environment, we could have:

``` python

Actions = [ [0.0, 0.0, 0.0 ], [0.0, 0.0, 0.0 ],  ... , [0.0, 0.0, 0.0 ] ]

```
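
Since not every building has all three actions, the inner lists can differ in length. The sketch below builds a do-nothing action list from per-building dimensions. The dimensions are read off the action spaces printed in section 3, and the helper name `zero_actions` is illustrative only.

```python
def zero_actions(action_dims):
    """Return one list of zeros per building, sized to that building's action space."""
    return [[0.0] * dim for dim in action_dims]


# Action dimensions of the nine buildings (buildings 3 and 4 have two actions).
action_dims = [3, 3, 2, 2, 3, 3, 3, 3, 3]
actions = zero_actions(action_dims)

print(len(actions))                  # one entry per building
print(sum(len(a) for a in actions))  # total number of action values
```

The total number of values matches the `(25,)` shape of `env.action_space` shown earlier.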

We take an action when we want to move one step ahead. We can do this using `env.step(action)`

When we take an action the output contains a tuple with the following:

1. Next State: Returns 9 arrays each corresponding to a building's next state (index order)
2. Reward: An array with nine values, each corresponding to the reward for one building (index order)
3. If the state is a Terminal State
4. Information about the environment 


In [45]:
# print(env_reset(env)["action_space"])
# env_reset(env)["observation_space"]
# env.reset()[0]
import random

env.reset()

#------------------generate random actions for each building -----------
# Buildings 3 and 4 have no DHW storage, so they take 2 actions instead of 3
Actions = []
for buildinG in range(1,10):
    if (buildinG == 3) or (buildinG == 4):
        Actions.append([random.uniform(-1,1) for _ in range(2)])
    else:
        Actions.append([random.uniform(-1,1) for _ in range(3)])

# The rows have different lengths, so request an object array explicitly
Actions = np.array(Actions, dtype=object)
#------------------------------------------------------------------------

print(f' WE are about to take {Actions} \n')
next_state, reward, terminal, info = env.step(Actions)

print(f' NEXT STATE \n {next_state} \n')
print(f' REWARDS {reward} \n')
print(f' TERMINAL OR NOT >> {terminal} \n')
print(f' INFO {info}')

# obs_dict = env_reset(env)
# agent = OrderEnforcingAgent()
# print(agent.register_reset(obs_dict))
# env.step(agent.register_reset(obs_dict))

 WE are about to take [list([-0.803533978392067, -0.9708785736040213, 0.00045900101177709374])
 list([-0.5002960863123926, 0.649633845235247, 0.7087104432107385])
 list([-0.1746678411131779, 0.6681019390379295])
 list([0.8481027891329702, -0.47712984075921194])
 list([-0.2944183629306081, -0.8662764001969054, -0.6632250024315305])
 list([0.3332782476124221, 0.8911335169033676, 0.3679087609471774])
 list([-0.3066792464312331, -0.9133752116445204, -0.5697539264524627])
 list([-0.5628983225272679, -0.6545682089841969, -0.23888611504349977])
 list([-0.6102541963285941, 0.5641690830147785, -0.26051513377919533])] 

 NEXT STATE 
 [array([ 1.00000000e+00,  8.00000000e+00,  2.00000000e+00,  7.61000000e+00,
         9.92000000e+00,  1.47200000e+01,  1.30300000e+01,  9.30000000e+01,
         9.36200000e+01,  8.85800000e+01,  1.01620000e+02,  0.00000000e+00,
         1.67700000e+01,  1.12240000e+02,  0.00000000e+00,  0.00000000e+00,
         1.11000000e+00,  2.04000000e+00, -0.00000000e+00,  1.88



# 6. Evaluating Actions

After taking actions we can evaluate the performance of our agent or agents. The final evaluation can only be done after all steps are completed.
For one iteration, we can only see the reward.

In [46]:
env.cost()

AttributeError: 'list' object has no attribute 'max'

# SAMPLE RUN or LOCAL EVALUATION

Some modifications have been made to the original code. For instance:

- We can run a test for a month, i.e. $30*24$ steps, to quickly evaluate our agent

We add the following code in the evaluation section:

``` python 

    # Skipping to shorten training time
    days = 30*5
    training_steps = 24*days
    skipping = False
```
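
As a quick check of the arithmetic, the sketch below converts a number of days into hourly steps and caps it at the length of the simulation. It assumes one step per hour and that the `simulation_period` of `(0, 8760*4-1)` set in section 1 is inclusive at both ends. The helper name `step_budget` is hypothetical.

```python
HOURS_PER_DAY = 24

# Hourly steps in the simulation period (0, 8760*4-1), assuming inclusive ends.
SIMULATION_STEPS = 8760 * 4


def step_budget(days, simulation_steps=SIMULATION_STEPS):
    """Number of hourly steps to run for `days`, never more than the simulation has."""
    return min(days * HOURS_PER_DAY, simulation_steps)


print(step_budget(30))      # one month
print(step_budget(30 * 5))  # the value used in the snippet above
print(step_budget(10000))   # far beyond the data, so the cap applies
```

With a budget larger than the simulation, a loop that checks both `done` and a step counter would be expected to stop on `done` first.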

In [42]:
#-------------Define Random Action function---------------------------------
def randomActs():
    Actions = []
    for buildinG in range(1,10):
        if (buildinG == 3) or (buildinG == 4):
            # print(" IN building ", buildinG)
            Actions.append([random.uniform(-1,1) for _ in range(2)])
        else:
            # print(" IN building ", buildinG)
            Actions.append([random.uniform(-1,1) for _ in range(3)])

    # The rows have different lengths, so request an object array explicitly
    return np.array(Actions, dtype=object)


#-------------Train----------------------------------------------------------
n_episodes = 1
Days = 10000
# Skipping to shorten training time
training_steps = 24*Days

start = time.time()
for e in tqdm(range(n_episodes)): 
    state = env.reset()
    done = False
    skipping = True
    moves = 0  

    while (not done) and skipping:
        next_state, reward, done, _ = env.step(randomActs())
        state = next_state
        moves += 1
        try:
            print(env.cost())
        except Exception:
            # cost() raised an AttributeError after a single step above; skip printing if it fails
            pass
        # print(done, skipping)
        if moves >= training_steps:
            # print(" TERMINATING AT SET STEP ", episode)
            skipping = False
        
    print('Loss -',env.cost(), 'Simulation time (min) -',(time.time()-start)/60.0)
    # CPU training for 603mins 

100%|██████████| 1/1 [07:18<00:00, 438.08s/it]

({'ramping': 3.6079023, '1-load_factor': 1.371217217793463, 'average_daily_peak': 1.5311601, 'peak_demand': 1.425577, 'net_electricity_consumption': 1.067433, 'carbon_emissions': 1.0771184, 'total': 1.6800680105471608, 'coordination_score': 1.9839641667282577}, {'ramping_last_yr': 3.5773516, '1-load_factor_last_yr': 1.3675885944909316, 'average_daily_peak_last_yr': 1.5559206, 'peak_demand_last_yr': 1.4492459, 'net_electricity_consumption_last_yr': 1.0673436, 'carbon_emissions_last_yr': 1.0795386, 'coordination_score_last_yr': 1.9875266738073643, 'total_last_yr': 1.726359363633722})
Loss - ({'ramping': 3.6079023, '1-load_factor': 1.371217217793463, 'average_daily_peak': 1.5311601, 'peak_demand': 1.425577, 'net_electricity_consumption': 1.067433, 'carbon_emissions': 1.0771184, 'total': 1.6800680105471608, 'coordination_score': 1.9839641667282577}, {'ramping_last_yr': 3.5773516, '1-load_factor_last_yr': 1.3675885944909316, 'average_daily_peak_last_yr': 1.5559206, 'peak_demand_last_yr': 1.




In [40]:
env.cost()

({'ramping': 3.7272313,
  '1-load_factor': 1.3625216173050632,
  'average_daily_peak': 1.5394294,
  'peak_demand': 1.3857098,
  'net_electricity_consumption': 1.0712593,
  'carbon_emissions': 1.0811657,
  'total': 1.6945528336186844,
  'coordination_score': 2.0037230175464806},
 {'ramping_last_yr': 3.7011318,
  '1-load_factor_last_yr': 1.358140732676143,
  'average_daily_peak_last_yr': 1.5539014,
  'peak_demand_last_yr': 1.566414,
  'net_electricity_consumption_last_yr': 1.0716972,
  'carbon_emissions_last_yr': 1.0840944,
  'coordination_score_last_yr': 2.044896996475844,
  'total_last_yr': 1.768610946094414})
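
The `total` entry appears to be the plain average of the six cost functions requested in `params`. The sketch below checks this against the values printed above. It is a consistency check on this output, not a description of CityLearn's internals.

```python
# Values copied from the first dictionary returned by env.cost() above.
costs = {
    "ramping": 3.7272313,
    "1-load_factor": 1.3625216173050632,
    "average_daily_peak": 1.5394294,
    "peak_demand": 1.3857098,
    "net_electricity_consumption": 1.0712593,
    "carbon_emissions": 1.0811657,
}
reported_total = 1.6945528336186844

mean_cost = sum(costs.values()) / len(costs)
print(mean_cost, reported_total)

# The two agree up to the rounding of the printed values.
assert abs(mean_cost - reported_total) < 1e-6
```

The same check also holds for the values printed at the end of the sample run.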

# MARLISA

The MARLISA Algorithm takes inputs that can be classified into two categories

### 1. Environment Parameters

These are parameters specific to the reinforcement learning environment. They give information about the simulation environment that will be used. Details about these environment variables are explored above. A summary follows.

- __'building_ids':__ These are the building identifiers, written in the form `"Building_id"` where id is a building number. A list of building ids looks as follows:
    - `["Building_1", "Building_2", ... , "Building_n"]`
- __'buildings_states_actions':__ This is a JSON file defining the states and actions available for each building, e.g. if a building has
    - `states {day :  False,  temp: True}`, there will be information for temp but not for day
    - `"actions": {"cooling_storage": true, "dhw_storage": true, "electrical_storage": false}`, there is no electrical storage in the building, so no action is required for it
- __'building_info':__ Gives valuable information about a building like (not given in 2022)
    - `building_type`
    - `climate_zone`
    - `solar_power_capacity (kW)`
    - `Annual_DHW_demand (kWh)`
    - `Annual_cooling_demand (kWh)`
    - `Annual_nonshiftable_electrical_demand (kWh)`
    - `etc`
- __'observation_spaces':__ This is information about the observation space of every building in the environment.
    - It contains n entries, where n is the number of buildings
    - Each entry contains the lower and upper bounds for the building's observations along with its dimension and datatype
- __'action_spaces':__ This is information about the action space of every building in the environment
    - It contains n entries, where n is the number of buildings
    - Each entry contains the lower and upper bounds for the building's actions along with its dimension and datatype


### 2. Algorithm Parameters

These are parameters specific to our reinforcement learning algorithm. Details about these parameters can be found in the paper

- `hidden_dim`:[256,256], 
- `discount`:0.99, 
- `tau`:5e-3, 
- `lr`:3e-4, 
- `batch_size`:256, 
- `replay_buffer_capacity`:1e5, 
- `regression_buffer_capacity`:3e4, 
- `start_training`:600, # Start updating actor-critic networks
- `exploration_period`:7500, # Just taking random actions
- `start_regression`:500, # Start training the regression model
- `information_sharing`:True, # If True -> set the appropriate 'reward_function_ma' in reward_function.py
- `pca_compression`:.95, 
- `action_scaling_coef`:0.5, # Actions are multiplied by this factor to prevent too aggressive actions
- `reward_scaling`:5., # Rewards are normalized and multiplied by this factor
- `update_per_step`:2, # How many times the actor-critic networks are updated every hourly time-step
- `iterations_as`:2,# Iterations of the iterative action selection (see MARLISA paper for more info)
- `safe_exploration`:True
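
The three step thresholds above define a schedule for the agent. The sketch below shows one reading of it, assuming each phase switches exactly when the time step reaches its threshold. The exact comparisons in the MARLISA code may differ, and the function name `training_phase` is hypothetical.

```python
# Thresholds copied from the parameter list above.
START_REGRESSION = 500
START_TRAINING = 600
EXPLORATION_PERIOD = 7500


def training_phase(time_step):
    """Report which activities are assumed to be active at a given hourly time step."""
    return {
        "exploring": time_step < EXPLORATION_PERIOD,
        "training_regression": time_step >= START_REGRESSION,
        "updating_networks": time_step >= START_TRAINING,
    }


for step in (100, 550, 1000, 8000):
    print(step, training_phase(step))
```

Under this reading, the regression model and the actor-critic networks both start learning while the agent is still taking exploratory actions.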

## MARLISA CODE Line By Line

## 1. Initialization (`__init__` body)

`[Line 58 to 65]` store the shared information in dictionaries keyed by building id

```python
        self.action_list_ = []
        self.action_list2_ = []
        
        self.time_step = 0
        self.pca_flag = {uid : 0 for uid in building_ids}
        self.regression_flag = {uid : 0 for uid in building_ids}
        self.action_spaces = {uid : a_space for uid, a_space in zip(building_ids, action_spaces)}
        self.observation_spaces = {uid : o_space for uid, o_space in zip(building_ids, observation_spaces)}
```

`[Line 75-83]` Energy size coefficient  for every building (Not Needed in 2022)

```python

        self.energy_size_coef = {}
        self.total_coef = 0
        for uid, info in building_info.items():
            _coef = info['Annual_DHW_demand (kWh)']/.9 + info['Annual_cooling_demand (kWh)']/3.5 + info['Annual_nonshiftable_electrical_demand (kWh)'] - info['solar_power_capacity (kW)']*8760/6.0
            self.energy_size_coef[uid] = max(.3*(_coef + info['solar_power_capacity (kW)']*8760/6.0), _coef)/8760
            self.total_coef += self.energy_size_coef[uid]
            
        for uid in self.energy_size_coef:
            self.energy_size_coef[uid] = self.energy_size_coef[uid]/self.total_coef
```
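
As a worked example, the sketch below applies the same formula to the Building_1 values printed by `env.get_building_information()` in section 1. It computes only the unnormalized coefficient. Dividing by the sum over all buildings is left out because the other buildings' values are truncated in the output above. The function name and argument names are illustrative.

```python
def energy_size_coef(dhw_kwh, cooling_kwh, nonshiftable_kwh, solar_kw):
    """Unnormalized energy size coefficient, following the formula in the snippet above."""
    solar_term = solar_kw * 8760 / 6.0
    coef = dhw_kwh / 0.9 + cooling_kwh / 3.5 + nonshiftable_kwh - solar_term
    # The coefficient is floored at 30% of the demand without the solar offset.
    return max(0.3 * (coef + solar_term), coef) / 8760


# Building_1 values from the building information output.
building_1 = energy_size_coef(
    dhw_kwh=11643.328,
    cooling_kwh=619542.205,
    nonshiftable_kwh=215850.522,
    solar_kw=120,
)
print(round(building_1, 2))
```

For Building_1 the solar offset is smaller than the demand terms, so the second argument of `max` is the one that applies.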

`[Line 86-111]` __Define Encoder__: Set Regression Learner for every building, define Encoding for every observation (think of it as data column) and set target variable to be removed

`[Line 131-145]` __Define Regression Encoder__: for transforming states in regression model

`[Line 149-164]` __Solar Capacity (remove variables if no solar PV)__: removes solar radiation related variables for houses without PV

`[Line 167-179]` __PCA for Dimensionality Reduction__: reduce state space dimension

`[Line 181-192]` __Initialize Network__: set up the critic networks of the actor-critic model

`[Line 195-202]` __Initialize Policy__: set up the policy (actor) network


## 2. Select Action (MARLISA method or function)

- Takes as inputs `states` and `deterministic`
    - `states` is the states of the buildings
    - `deterministic` is a boolean and can be true or false

`[Line 212-220]` shuffles the action and state order, i.e. which building should take the first action. The next building has to be known so that it can be used during information sharing.

`[Line 222-226]` __Initialize coordination variables__: coordination variables are two-dimensional for every building.
- `Capacity Dispatched`: This is the total amount of electricity already dispatched. It is related to the energy size coefficient at every time step
- `Electrical Demand`: Related to the total electricity demand estimated by a prediction algorithm. __Different in Information Sharing and Non Info Sharing Cases__
    - computed as `(total demand - expected demand)/total coeff` for non exploration or non info sharing and as `total_demand/total coeff` for info sharing situations
    - `total demand (non info sharing)` increments expected demand at every time step
    - `total demand (info sharing)` increments expected demand at every time step with _expected demand of first agent minus expected demand of next agent_
    - `expected demand` is the predicted demand at a given time step
        - in information sharing, expected demand for the next agent is `total demand/total coeff`
    - `total coeff` is computed at the start of the simulation for every building

`[Line 230-286]` Explore if it is the exploration period. Safe exploration uses rule-based coordination.

``` python
        if explore:
            for uid, uid_next, state in zip(_building_ids, _building_ids_next, _states):
                if self.safe_exploration:
                    multiplier = 0.4
                    hour_day = state[2]
                    a_dim = len(self.action_spaces[uid].sample())
        
                    act = [0.0 for _ in range(a_dim)]
                    if hour_day >= 7 and hour_day <= 11:
                        act = [-0.05 * multiplier for _ in range(a_dim)]
                    elif hour_day >= 12 and hour_day <= 15:
                        act = [-0.05 * multiplier for _ in range(a_dim)]
                    elif hour_day >= 16 and hour_day <= 18:
                        act = [-0.11 * multiplier for _ in range(a_dim)]
                    elif hour_day >= 19 and hour_day <= 22:
                        act = [-0.06 * multiplier for _ in range(a_dim)]

                    # Early nightime: store DHW and/or cooling energy
                    if hour_day >= 23 and hour_day <= 24:
                        act = [0.085 * multiplier for _ in range(a_dim)]
                    elif hour_day >= 1 and hour_day <= 6:
                        act = [0.1383 * multiplier for _ in range(a_dim)]
```
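
The hour-based rule above can be isolated into a small function for experimentation. This sketch restates the same thresholds with a default `multiplier` of 0.4 and assumes integer hours from 1 to 24. The function name and signature are illustrative and not part of the MARLISA class.

```python
def safe_exploration_action(hour_day, a_dim, multiplier=0.4):
    """Rule-based action for one building, chosen from the hour of the day."""
    if 7 <= hour_day <= 15:
        # The snippet uses the same value for hours 7-11 and 12-15.
        level = -0.05
    elif 16 <= hour_day <= 18:
        level = -0.11
    elif 19 <= hour_day <= 22:
        level = -0.06
    elif 23 <= hour_day <= 24:
        # Early nighttime: store DHW and/or cooling energy.
        level = 0.085
    elif 1 <= hour_day <= 6:
        level = 0.1383
    else:
        level = 0.0
    return [level * multiplier for _ in range(a_dim)]


print(safe_exploration_action(17, 3))
print(safe_exploration_action(3, 2))
```

Every action of a building gets the same value, negative during the day and positive at night, and the length of the list follows the building's action dimension.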

`[Line 287-352]` Uses the MARLISA network to learn


## 3. Add to Buffer (MARLISA method or function)

Takes as input `states, actions, rewards, next_states, done, coordination_vars, coordination_variables_next`