# DRL 

DRL suits the demand charge problem because the demand charge cost does not have to be incorporated into every per-step reward. We incorporate it only into the reward of the final states.

State:
- Last k indoor temperatures of all zones (for now, just the current and last)
- Last k outdoor temperatures (for now, just the current)
- Last k actions (for now, just the current)
- Time of month (for the demand charge)
- Max consumption so far
- Comfortband for the next t steps
- Do-not-exceed limits for the next t steps
- Occupancy for the next t steps
- Price for the next t steps

Actions: 
[0,1,2] x num_zones
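
As a rough illustration of how this joint action space grows with the number of zones, the sketch below enumerates it with the standard library. The helper name `joint_actions` is hypothetical, and no meaning is assigned to the individual action values 0, 1 and 2.

```python
import itertools

NUM_ACTIONS_PER_ZONE = 3  # each zone picks one of {0, 1, 2}


def joint_actions(num_zones):
    """Enumerate every joint action as a tuple with one entry per zone."""
    return list(itertools.product(range(NUM_ACTIONS_PER_ZONE), repeat=num_zones))


actions = joint_actions(2)
print(len(actions))  # 3 ** 2 joint actions for two zones
print(actions[:4])
```

The count is `3 ** num_zones`, so a per-zone factorisation of the action (one discrete choice per zone) keeps the policy output small even when the enumerated joint space is large.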

We limit our observation space to one month, disregarding seasonality.

- Add random Gaussian noise to all temperatures. The noise should be distributed according to our uncertainty (e.g. the historical uncertainty of the outdoor temperature over the last years).
- Should Comfortband/DoNotExceed be set for one month?
- Occupancy should probably have random noise added as well. For now, just assume the schedule.

For the outdoor temperature we want to find the distribution
$$P(T_{t+1} | T_{t})$$ so that we can sample from it.
For now we could assume:
$$P(T_{t+1} | T_{t}) = P(\Delta T_{t+1}), \qquad \Delta T_{t+1} = T_{t+1} - T_{t}$$
where the temperature change $\Delta T_{t+1}$ follows a Gaussian distribution with the same variance as our data.
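
A minimal sketch of this sampling scheme, assuming zero-mean Gaussian one-step temperature changes whose standard deviation is estimated from historical data. The function names and the `history` values are made up for illustration; they are not part of `DataManager`.

```python
import numpy as np


def fit_delta_std(temperatures):
    """Estimate the standard deviation of one-step temperature changes."""
    deltas = np.diff(np.asarray(temperatures, dtype=float))
    return deltas.std(ddof=1)


def sample_outdoor_trajectory(start_temperature, num_steps, delta_std, rng):
    """Sample T_0..T_num_steps with i.i.d. zero-mean Gaussian increments."""
    deltas = rng.normal(0.0, delta_std, size=num_steps)
    # Prepend 0 so the trajectory starts exactly at start_temperature.
    return start_temperature + np.concatenate(([0.0], np.cumsum(deltas)))


history = [60.0, 61.5, 61.0, 62.5, 63.0]  # illustrative values, not real data
delta_std = fit_delta_std(history)
rng = np.random.default_rng(0)
trajectory = sample_outdoor_trajectory(60.0, 96, delta_std, rng)  # one day of 15m steps
print(trajectory.shape)
```

A pure random walk like this drifts without bound over long horizons, so a month-long episode would likely need a mean-reverting or forecast-anchored model instead.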

## How is this adding to MPC
- Demand charges are easier to handle. They do not need to be incorporated into the objective function at every step; the agent is rewarded (or penalized) for them at the end of the month.
- Will learn a much longer predictive horizon.
- Can use more complex models for predicting indoor temperature. MPC would lose the DP possibility if using more complex and higher-order models.
- Could learn underlying effects of occupancy/comfortband that MPC could not catch.
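
To make the first point concrete, here is a small sketch of a reward sequence in which energy cost is charged at every step but the demand charge, based on the maximum consumption so far, is applied only at the final step. The function name, the flat `energy_price` and the `demand_charge_rate` are assumptions for illustration, not values from the notebook.

```python
def episode_rewards(consumptions, energy_price, demand_charge_rate):
    """Per-step rewards; the demand charge is applied only at the last step."""
    rewards = []
    peak = 0.0  # "max consumption so far", tracked as part of the state
    last_step = len(consumptions) - 1
    for step, consumption in enumerate(consumptions):
        peak = max(peak, consumption)
        reward = -energy_price * consumption
        if step == last_step:
            # Terminal penalty proportional to the peak over the whole episode.
            reward -= demand_charge_rate * peak
        rewards.append(reward)
    return rewards


print(episode_rewards([1.0, 3.0, 2.0], energy_price=1.0, demand_charge_rate=10.0))
# → [-1.0, -3.0, -32.0]
```

Because the penalty arrives only at the end, the agent needs the running peak in its observation to attribute the terminal cost to earlier actions.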

In [14]:
import sys
sys.path.append("../")
from DataManager.DataManager import DataManager
from Thermostat import Tstat

In [15]:
import numpy as np
import xbos_services_getter as xsg

import gym, ray
from gym.spaces import MultiDiscrete, Box, Discrete

In [16]:
ray.init(ignore_reinit_error=True)

2019-05-09 16:01:12,317	INFO node.py:497 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-09_16-01-12_316527_38061/logs.
2019-05-09 16:01:12,428	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:61852 to respond...
2019-05-09 16:01:12,544	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:47161 to respond...
2019-05-09 16:01:12,548	INFO services.py:806 -- Starting Redis shard with 6.87 GB max memory.
2019-05-09 16:01:12,559	INFO node.py:511 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-09_16-01-12_316527_38061/logs.
2019-05-09 16:01:12,563	INFO services.py:1441 -- Starting the Plasma object store with 10.31 GB memory using /tmp.


{'node_ip_address': '10.142.38.66',
 'redis_address': '10.142.38.66:61852',
 'object_store_address': '/tmp/ray/session_2019-05-09_16-01-12_316527_38061/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2019-05-09_16-01-12_316527_38061/sockets/raylet',
 'webui_url': None,
 'session_dir': '/tmp/ray/session_2019-05-09_16-01-12_316527_38061'}

In [40]:
class BuildingEnv(gym.Env):
    def __init__(self, env_config):

        self.DataManager = DataManager(env_config["building"], env_config["zones"],
                         env_config["start"], env_config["end"], env_config["window"])
        
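        # Read everything from env_config rather than from notebook globals,
        # so the env also works inside remote workers.
        self.start = env_config["start"]
        self.unix_start = self.start.timestamp() * 1e9  # unix time in nanoseconds
        self.end = env_config["end"]
        self.unix_end = self.end.timestamp() * 1e9  # unix time in nanoseconds
        self.window = env_config["window"]  # timedelta string

        self.building = env_config["building"]
        self.zones = env_config["zones"]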
        
        self.lambda_val = env_config["lambda_val"]

        # assert self.zones == all zones in building. this is because of the thermal model needing other zone temperatures.

        self.curr_timestep = 0

        self.indoor_starting_temperatures = env_config[
            "indoor_starting_temperatures"]  # to get starting temperatures [last, current]
        self.outdoor_starting_temperature = env_config["outdoor_starting_temperature"]

        self.tstats = {}
        for iter_zone in self.zones:
            self.tstats[iter_zone] = Tstat(self.building, iter_zone,
                                           self.indoor_starting_temperatures[iter_zone]["current"],
                                           last_temperature=self.indoor_starting_temperatures[iter_zone]["last"])

        assert 60 * 60 % xsg.get_window_in_sec(self.window) == 0  # window divides an hour
        assert (self.end - self.start).total_seconds() % xsg.get_window_in_sec(
            self.window) == 0  # window divides the timeframe

        # the number of timesteps
        self.num_timesteps = int((self.end - self.start).total_seconds() / xsg.get_window_in_sec(self.window))

        self.unit = env_config["unit"]
        assert self.unit == "F"

        # all zones current and last temperature = 2*num_zones
        # building outside temperature -> make a class for how this behaves = 1
        # timestep -> do one hot encoding of week, day, hour, window  \approx 4 + 7 + 24 + 60*60 / window
        low_bound = [32] * (2 * len(
            self.zones) + 1)  # we could use parametric temperature bounds... for now we will give negative infinite reward
        high_bound = [100] * (2 * len(self.zones) + 1)  # plus one for building

        low_bound += [0] * (self.num_timesteps + 1)  # total timesteps plus the final timestep which wont be executed
        high_bound += [1] * (self.num_timesteps + 1)  # total timesteps plus the final timestep which wont be executed

        self.observation_space = Box(
            low=np.array(low_bound), high=np.array(high_bound), dtype=np.float32)

        self.action_space = MultiDiscrete([3] * len(self.zones))

        self.reset()


    def reset(self):
        self.curr_timestep = 0

        for iter_zone in self.zones:
            self.tstats[iter_zone].reset(self.indoor_starting_temperatures[iter_zone]["current"],
                                         last_temperature=self.indoor_starting_temperatures[iter_zone]["last"])
        self.outdoor_temperature = self.outdoor_starting_temperature

        return self.create_curr_obs()  # obs

    def step(self, action):
        
        self.curr_timestep += 1

        # if we reach the end time.
        if self.curr_timestep == self.num_timesteps:
            return self.create_curr_obs(), 0, True, {}

        # find what new temperature would be. use thermal model with uncertainty. use reset if exceeding
        # do_not_exceed. can't force it to take a different action anymore.

        # update temperatures
        for i, iter_zone in enumerate(self.zones):
            self.tstats[iter_zone].next_temperature(action[i])
        # Update the outdoor temperature once per step, not once per zone.
        self.outdoor_temperature += np.random.normal()  # TODO we should make a thermostat for the outdoor temperature.

        # check that each zone is within its safety temperature band
        for iter_zone in self.zones:
            curr_safety = self.DataManager.do_not_exceed[iter_zone].iloc[self.curr_timestep]
            if not (curr_safety["t_low"] <= self.tstats[iter_zone].temperature <= curr_safety["t_high"]):
                return self.create_curr_obs(), -float('inf'), True, {}  # TODO do we want to add info?

        # get reward by calling discomfort and consumption model ...
        reward = self.get_reward(action)

        return self.create_curr_obs(), reward, False, {}  # obs, reward, done, info

    def get_reward(self, action):
        """Get the reward for the given action with the current observation parameters."""
        # get discomfort across edge
        discomfort = {}
        for iter_zone in self.zones:
            # TODO Check this again since we are a timestep ahead and we want average comfortband and average occupancy over the edge.
            curr_comfortband = self.DataManager.comfortband[iter_zone].iloc[self.curr_timestep]
            curr_occupancy = self.DataManager.occupancy[iter_zone].iloc[self.curr_timestep]
            curr_tstat = self.tstats[iter_zone]
            average_edge_temperature = (curr_tstat.temperature + curr_tstat.last_temperature) / 2.

            discomfort[iter_zone] = self.DataManager.get_discomfort(
                self.building, average_edge_temperature,
                curr_comfortband["t_low"], curr_comfortband["t_high"],
                curr_occupancy)

        # Get consumption across edge
        price = 1  # self.prices.iloc[root.timestep] TODO also add right unit conversion, and duration
        consumption_cost = {self.zones[i]: price * self.DataManager.hvac_consumption[self.zones[i]][action[i]]
                            for i in range(len(self.zones))}

        cost = ((1 - self.lambda_val) * (sum(consumption_cost.values()))) + (
                self.lambda_val * (sum(discomfort.values())))
        return -cost

    def create_curr_obs(self):
        return self._create_obs(self.tstats, self.outdoor_temperature, self.curr_timestep)

    def _create_obs(self, tstats, outdoor_temperature, curr_timestep):
        obs = np.zeros(self.observation_space.low.shape)
        idx = 0
        for iter_zone in self.zones:
            obs[idx] = tstats[iter_zone].last_temperature
            idx += 1
            obs[idx] = tstats[iter_zone].temperature
            idx += 1
        obs[idx] = outdoor_temperature
        idx += 1

        obs[idx + curr_timestep] = 1

        return obs

In [41]:
import numpy as np
import gym
from ray.rllib.models import FullyConnectedNetwork, Model, ModelCatalog
from gym.spaces import Discrete, Box

import ray
from ray import tune
from ray.tune import grid_search

import datetime
import pytz

start = datetime.datetime(year=2019, month=1, day=1).replace(tzinfo=pytz.utc)
end = start + datetime.timedelta(days=1)
window = "15m"
building = "avenal-animal-shelter"
zones = ["hvac_zone_shelter_corridor"]
indoor_starting_temperatures = {iter_zone: {"last": 70, "current": 71} for iter_zone in zones}
outdoor_starting_temperature = 60
unit = "F"
lambda_val = 0.999

config = {
    "start": start,
    "end": end,
    "window": window,
    "building": building,
    "zones": zones,
    "indoor_starting_temperatures": indoor_starting_temperatures,
    "outdoor_starting_temperature": outdoor_starting_temperature,
    "unit": unit,
    "lambda_val": lambda_val
}

In [42]:
e = BuildingEnv(config)


In [43]:
while True:
    obs, rew, done, info = e.step([0])
    if done:
        break

In [44]:
# Can also register the env creator function explicitly with:
# register_env("corridor", lambda config: SimpleCorridor(config))
# ModelCatalog.register_custom_model("my_model", CustomModel)
tune.run(
    "PPO",
    stop={
        "timesteps_total": 10000,
    },
    config={
        "env": BuildingEnv,  # or "corridor" if registered above
        "lr": grid_search([1e-2, 1e-4, 1e-6]),  # try different lrs
        "num_workers": 1,  # parallelism
        "env_config": config,
    },
)
# e = BuildingEnv(config)
# print(e.step([0]))

2019-05-09 16:17:39,779	INFO tune.py:60 -- Tip: to resume incomplete experiments, pass resume='prompt' or resume=True to run()
2019-05-09 16:17:39,780	INFO tune.py:223 -- Starting a new experiment.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs
Memory usage on this node: 19.2/34.4 GB

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 2/12 CPUs, 0/0 GPUs
Memory usage on this node: 19.2/34.4 GB
Result logdir: /Users/daniellengyel/ray_results/PPO
Number of trials: 3 ({'RUNNING': 1, 'PENDING': 2})
PENDING trials:
 - PPO_BuildingEnv_1_lr=0.0001:	PENDING
 - PPO_BuildingEnv_2_lr=1e-06:	PENDING
RUNNING trials:
 - PPO_BuildingEnv_0_lr=0.01:	RUNNING



2019-05-09 16:17:42,541	ERROR trial_runner.py:497 -- Error processing event.
Traceback (most recent call last):
  File "/Users/daniellengyel/ray/python/ray/tune/trial_runner.py", line 446, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/Users/daniellengyel/ray/python/ray/tune/ray_trial_executor.py", line 316, in fetch_result
    result = ray.get(trial_future[0])
  File "/Users/daniellengyel/ray/python/ray/worker.py", line 2197, in get
    raise value
ray.exceptions.RayTaskError: ray_PPOTrainer:train() (pid=38091, host=Daniels-MacBook-Pro-4.local)
  File "pyarrow/serialization.pxi", line 458, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 421, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 272, in pyarrow.lib.SerializedPyObject.deserialize
  File "pyarrow/serialization.pxi", line 171, in pyarrow.lib.SerializationContext._deserialize_callback
ModuleNotFoundError: No module named 'Thermostat'

2019-05

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs
Memory usage on this node: 19.5/34.4 GB
Result logdir: /Users/daniellengyel/ray_results/PPO
Number of trials: 3 ({'ERROR': 3})
ERROR trials:
 - PPO_BuildingEnv_0_lr=0.01:	ERROR, 1 failures: /Users/daniellengyel/ray_results/PPO/PPO_BuildingEnv_0_lr=0.01_2019-05-09_16-17-39z5dnbmm7/error_2019-05-09_16-17-42.txt
 - PPO_BuildingEnv_1_lr=0.0001:	ERROR, 1 failures: /Users/daniellengyel/ray_results/PPO/PPO_BuildingEnv_1_lr=0.0001_2019-05-09_16-17-39ggww599p/error_2019-05-09_16-17-42.txt
 - PPO_BuildingEnv_2_lr=1e-06:	ERROR, 1 failures: /Users/daniellengyel/ray_results/PPO/PPO_BuildingEnv_2_lr=1e-06_2019-05-09_16-17-39_37fbxe0/error_2019-05-09_16-17-42.txt



TuneError: ('Trials did not complete', [PPO_BuildingEnv_0_lr=0.01, PPO_BuildingEnv_1_lr=0.0001, PPO_BuildingEnv_2_lr=1e-06])

In [None]:
# task: Write a function that takes a positive integer n as input and does
# 1) pick positive integer n
# 2) prints the number
# 3) subtracts one
# 4) repeats from 2) until n == 0
# and returns the length of the printed sequence

In [2]:
# task: Write a function that takes a positive integer n as input and does
# 1) pick positive integer n
# 2) if even: prints the number
# 3) if odd: does nothing
# 4) subtracts one
# 5) repeats from 2) until n == 0
# and returns the length of the printed sequence

4


In [5]:
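def countdown(n):
    """Print the even numbers from n down to 1 and return how many were printed."""
    counter = 0
    while n > 0:
        if n % 2 == 0:
            print(n)
            counter += 1
        n -= 1

    # Return the length of the printed sequence, as the task asks.
    return counter

In [6]:
countdown(10)

10
8
6
4
2

5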
