<a href="https://colab.research.google.com/github/Julian-Banks/EEE4022S_BNKJUL001_Thesis/blob/main/PythonWorkspace/EMSv0_2_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Version Notes**

# **v0.2_1**
**Added:**
*  Monitor wrapper
*  DummyVec wrapper
*  Wand (weights and bais) enabled

**Parameters:**
* lowered to 3 predictions  

**To do:**
* Try to use hyperparameter Optimisation
* Try normalise
* Try differnet models
* Find out how the bounds for the obs_space box effect things

# **v0.2**
**Added:**
*  simplified load_forecast and gen_forecast to be power_bal_forecasts.
*  combined current_load and current_gen to also show current_power_balance
*  added proper evaluate call

**Parameters:**
* No changes  

**Results:**
* 5% savings on PPO deterministic = true
* 4.3% savings on PPO determnistic = false

**To do:**
* Try to use hyperparameter Optimisation
* Try normalise
* Try differnet models
* Try see if discount rate can be tweaked - at good level.
* Find out how the bounds for the obs_space box effect things

# **v0.1**
**Added:**
* added real loads, gen, tou_id's

**Parameters:**
* training episode = 6000 timesteps
* testing episode  = 2760 timesteps
* bat_threshold = 100
* bat_cap = 500
* battery_level at reset = bat_cap/2
* num_preds = 24
* Trained PPO for 1.65mil timesteps
* Trained A2C for 1.2 mil timesteps

**Results:**
* PP0 - 3.7% improvement from standby mode Deterministic = False
* PPO - 6.1% Deterministic =  True
* A2C  - -0.3% improvement. And the models after this got worse as training progressed!
**To do:**
* Try lower num_preds
* Try to use hyperparameter Optimisation
* Try normalise
* Try differnet models
* Try see if discount rate can be tweaked




In [10]:
%%capture
!pip install gymnasium
!pip install stable_baselines3[extra]
!pip install wandb
%load_ext tensorboard

In [11]:
! git clone https://github.com/Julian-Banks/EEE4022S_BNKJUL001_Thesis


fatal: destination path 'EEE4022S_BNKJUL001_Thesis' already exists and is not an empty directory.


In [12]:
#to update the rep
%cd /content/EEE4022S_BNKJUL001_Thesis
! git pull

/content/EEE4022S_BNKJUL001_Thesis
Already up to date.


In [13]:
#to save the code afterwards
'''
%cd /content/EEE4022S_BNKJUL001_Thesis

Message = "Changes to EMSv0_2_1"

! git add.
! git commit -m "{Message}"
! git push

'''

'\n%cd /content/EEE4022S_BNKJUL001_Thesis\n\nMessage = "Changes to EMSv0_2_1"\n\n! git add.\n! git commit -m "{Message}"\n! git push\n\n'

In [14]:
#import needed libarys
import gymnasium as gym
import numpy as np
import pandas as pd
from gymnasium import spaces
import datetime
from stable_baselines3 import PPO, A2C, DQN
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder
from google.colab import drive
import os
import wandb
from wandb.integration.sb3 import WandbCallback

#mount the drive
drive.mount('/content/drive')
#define paths to logs and model saves
model_type = "A2C"
version    = "EMSv0_2_1"
model_dir = f"/content/drive/MyDrive/Colab Notebooks/{version}/models/{model_type}/"
log_dir   = f"/content/drive/MyDrive/Colab Notebooks/{version}/models/{model_type}logs/"

#make the appropriate directory if it does not exist
if not os.path.exists(model_dir):
    os.makedirs(model_dir)
if not os.path.exists(log_dir):
    os.makedirs(log_dir)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**Define our environment class!!**

In [15]:
class EMSv0_2_1(gym.Env):
    """Custom Environment that follows gym interface."""

    metadata = {"render_modes": ["human"], "render_fps": 30}

    def __init__(self,bat_threshold = 0.1, bat_cap = 1, actual_load = "none", actual_gen = "none", purchase_price = [1,1,1,1,1,1,1,1,2,2,2,2] , episode_len = 8760,num_preds = 24,render_mode = "none"):

        super(EMSv0_2_1, self).__init__()

        #define time frame
        self.current_step = 0
        self.final_step = int(episode_len)-num_preds-2 #one years worth of steps

        #Might make a function for these
        #fill all of the actual loads NB!!! is just random for now NB!!! is normalised 0-1
        if isinstance(actual_load,str) :
            self.actual_load = np.random.rand(self.final_step+num_preds+1).astype(np.float32) #will load from a file or something
        else:
            self.actual_load = actual_load[:episode_len]

        #fill all of the actual generation steps.
        if isinstance(actual_gen,str):
            self.actual_gen  = np.random.rand(self.final_step+num_preds+1).astype(np.float32) #will load from file or something
        else:
            self.actual_gen  = actual_gen[:episode_len]

        #define the purchase price for every step of the year
        purchase_price = np.array(purchase_price).astype(np.float32)
        repetitions    = (self.final_step+num_preds+1) // len(purchase_price)
        remainder      = (self.final_step+num_preds+1) % len(purchase_price)
        self.purchase_price =np.concatenate([purchase_price]*repetitions+[purchase_price[:remainder]])#need to read in from somewhere

        #define var for storing the excess gen
        self.excess_gen = 0
        #define a var for determine amount purchased per step (dont want to make it total as this will incure growing penalties for the Agent if used in reward structure)
        self.step_purchased = 0
        #define the battery max capacity
        self.bat_cap = bat_cap
        #define the battery low threshold
        self.bat_threshold = np.float32(bat_threshold)
        #define default action
        self.default_action = 0
        #define actions and observations space
        n_actions = 2 # keeping it simple

        self.num_preds = num_preds # day ahead predictions
        self.action_space = spaces.Discrete(n_actions)
        # Dict space to store all the different things
        self.observation_space = spaces.Dict({
                "power_bal_forecast": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1,num_preds), dtype=np.float32),
                "price_forecast": gym.spaces.Box(low=0, high=np.inf, shape=(1,num_preds+1), dtype=np.float32),
                "bat_level": gym.spaces.Box(low=0, high=np.inf, shape=(1,), dtype=np.float32),
                "current_power_bal": gym.spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32),
                })

    def step(self, action):

        #update the current state with the action (needs to be done before current_step is inc since we want to apply the action to the previous step to get the current state)
        self.update_state(action)
        #Calculate reward from the action
        reward = self.calc_reward()

        #inc time step into Future
        self.current_step += 1
        #get next observation (for next time step)
        observation = self.get_obs()
        #Set terminated to False since there are no failure states
        self.terminated = False
        #Check if timelimit reached
        self.truncated = False if self.current_step<self.final_step else True
        #dont know what to put into info for now
        info = {}
        return observation, reward, self.terminated, self.truncated, info

    def reset(self, seed=None, options=None):
        super().reset(seed = seed, options=options)

        self.current_step = 0
        self.terminated = False
        self.truncated = False

        #reset these cause I dont want the model to just memorise the random data (so it gets changed every reset)
        #fill all of the actual loads NB!!! is just random for now NB!!! is normalised 0-1
        #self.actual_load = np.random.rand(self.final_step+self.num_preds+1).astype(np.float32) #will load from a file or something
        #fill all of the actual generation steps.
        #self.actual_gen   = np.random.rand(self.final_step+self.num_preds+1).astype(np.float32) #will load from file or something


        #reset the state
        self.battery_level = self.bat_cap/2
        self.current_load = self.actual_load[0]
        self.excess_gen = 0
        self.step_purchased = 0
        #get the first observation
        observation = self.get_obs()
        #Still don't know what to do with info
        info = {}
        return observation, info

    def render(self):
        #Reaaaaalllyyyy want to render something, Maybe the curent load as a point, the forecasts as a plot and the bat levels as bar
        pass

    def close(self):
        #don't think i need this for my application
        pass

    def update_state(self, action):
        #Update current state with actions
        if action == 0: #do nothing action
            self.standby()
        elif action == 1: #buy from Grid
            self.purchase()
        else:  #error case
            raise ValueError(
              f"Received invalid action = {action} which is not part of the action space."
            )
        #case list for each action?

    def calc_reward(self):
        #Calculate reward based on the state
        reward = -self.step_purchased*self.purchase_price[self.current_step]

        return reward

    def get_obs(self):
        #Fill the observation space with the next observation

        #Get Forecasts Will probaly write a function for this? idk maybe a schlep to return all the info
        load_forecast  = np.array( [self.actual_load[self.current_step+1: self.current_step + self.num_preds+1]] , dtype = np.float32) #will load from a file or something
        if load_forecast.shape != (1,self.num_preds):
            print(f"load_forecast shape is {load_forecast.shape} but it should be {(1, self.num_preds)}. Current step is {self.current_step}")
        gen_forecast   = np.array( [self.actual_gen[self.current_step+1: self.current_step + self.num_preds+1]] , dtype = np.float32) #will load from a file or something
        if gen_forecast.shape != (1,self.num_preds):
            print(f"gen_forecast shape is {gen_forecast.shape} but it should be {(1, self.num_preds)}. Current step is {self.current_step}")
        #calculate the power forecast
        power_bal_forecast = gen_forecast-load_forecast
        #get the prices for the current frame and the next 24 hours. Maybe will cut this down since that seems like a lot of info
        price_forecast = np.array( [self.purchase_price[self.current_step:self.current_step+self.num_preds+1]] , dtype = np.float32)
        #Just for readibility of the dict object
        bat_level      = np.array([self.battery_level] , dtype= np.float32)

        #calculate the current power balance
        current_load   = np.array([self.actual_load[self.current_step]], dtype = np.float32)
        current_gen    = np.array([self.actual_gen[self.current_step]], dtype  = np.float32)
        current_power_bal = current_gen - current_load



        obs = dict({
                "bat_level":      bat_level,
                "current_power_bal" :   current_power_bal,
                "power_bal_forecast":  power_bal_forecast,
                "price_forecast": price_forecast,
        })
        return obs

    def standby(self):
        #ems stands by, load is met by generation, battery and then grid
        #if there is excess generation it is used to charge the batteries

        #define step_gen and step_load for readability
        step_gen  =  self.actual_gen[self.current_step]
        step_load =  self.actual_load[self.current_step]
        battery   =  self.battery_level
        #check for gen meeting load
        if step_load <= step_gen :
            #set the purchased elect to 0 since gen meets load
            self.step_purchased = 0
            #calulate the excess elec that was generated
            step_excess = step_gen - step_load
            #check if battery needs to be charged
            if battery < self.bat_cap :
                #check if the excess amount that was generated is less than the available capacity
                if self.bat_cap-battery-step_excess > 0:
                    self.battery_level += step_excess
                else:
                    #if the excess is greater than the availability then charge till full
                    self.battery_level = self.bat_cap
                    #set step excess to excess minus the amount used to charge
                    step_excess -= (self.bat_cap-battery)
                    self.excess_gen += step_excess
            else:
                #if the battery is full then just inc excess_gen
                self.excess_gen += step_excess
        else:
            #if the generation does not meet load
            step_shortfall = step_load - step_gen
            #checking if battery is above a threshold.
            if battery > self.bat_threshold:
                #check if battery has enough capacity to meet the load
                if battery - step_shortfall >= self.bat_threshold:
                    #if it does then subtract the shortfall from battery level
                    self.battery_level -= step_shortfall
                    #set the purchased variable to 0 since nothing was purchased
                    self.step_purchased = 0
                else:
                    #set the battery to min value and purchase the rest from the grid
                    self.battery_level = self.bat_threshold
                    #calculate how much needs to be purchased
                    step_shortfall -= (battery - self.bat_threshold)
                    self.step_purchased = step_shortfall
            else:
                #no battery available, therefore everything needs to be bought from the grid.
                self.step_purchased = step_shortfall

    def purchase(self):
        #purchase electricity to charge battery even if there is enough generation (I assume this will be used to buy at lower prices)
        #get values for readability
        step_load = self.actual_load[self.current_step]
        step_gen  = self.actual_gen[self.current_step]
        battery = self.battery_level

        #calculate the total power need (the load plus the amount that the battery needs to charge)
        total_need = step_load + (self.bat_cap-battery)
        #if the generation is less than the need then purchase the remainder
        if step_gen<total_need:
            #purchashing the shortfall
            self.step_purchased = total_need - step_gen
            #setting the battery levels to full
            self.battery_level = self.bat_cap
        else:
            #if the gen is enough then set purchase to 0
            self.step_purchased  = 0
            #set the battery to fully charged
            self.battery_level = self.bat_cap
            #inc excess_gen by caluclating the excess between the step gen and the total need (includes amount needed to charge the battery)
            self.excess_gen += (step_gen - total_need)


Check the environment with stable_baselines3 check_env.

In [16]:
from stable_baselines3.common.env_checker import check_env
env = EMSv0_2_1()
check_env(env,warn = True)



**Load in the data for our specific microgrid.**

In [17]:
#need to import data from Github
path_data = "/content/EEE4022S_BNKJUL001_Thesis/PythonWorkspace/dataClean.csv"
data = pd.read_csv(path_data)

path_gen = "/content/EEE4022S_BNKJUL001_Thesis/Generation/BNKJUL001_Thesis_solarGen500kWHomer.csv"
data_gen = pd.read_csv(path_gen)

#Not actually using this rn but will be soon :)
path_shedding = "/content/EEE4022S_BNKJUL001_Thesis/MatlabWorkSpace/loadShedding2022.csv"
data_shedding = pd.read_csv(path_shedding)
load_shedding = data_shedding['LoadShedding'].values.astype(np.float32)

actual_gen = data_gen['PV_Out'].values.astype(np.float32)
actual_load = data['AC'].values.astype(np.float32)
purchase_price = data['tou_id'].values.astype(np.float32)


Evaluate the base model (no EMS, just using standby mode)

In [18]:
#define the base environment
base_env = EMSv0_2_1(episode_len = 6000, actual_load = actual_load, actual_gen = actual_gen, bat_threshold = 100, bat_cap = 500, purchase_price = purchase_price,num_preds = 3)
#going to print out a bunch of things to test the different spaces.
obs,_    = base_env.reset()
print(f"The reset observation space looks like: {obs}")
action_standby = 0
obs,reward,terminated,truncated,info = base_env.step(action_standby)
print(f"After action {action_standby} the observation space looks like {obs}")
print(f"The reward we recieved was {reward}")

The reset observation space looks like: {'bat_level': array([250.], dtype=float32), 'current_power_bal': array([-96.855], dtype=float32), 'power_bal_forecast': array([[-97.053, -97.229, -96.859]], dtype=float32), 'price_forecast': array([[1., 1., 1., 1.]], dtype=float32)}
After action 0 the observation space looks like {'bat_level': array([153.14499], dtype=float32), 'current_power_bal': array([-97.053], dtype=float32), 'power_bal_forecast': array([[-97.229, -96.859, -98.018]], dtype=float32), 'price_forecast': array([[1., 1., 1., 1.]], dtype=float32)}
The reward we recieved was 0.0


  and should_run_async(code)


A loop to get an average reward for the base model only perfoming the standby option

In [19]:
#reset the environment and save the obs
#going to run it 100 times to get a benchmark
#reset score
score = 0
for step in range(1000):
    obs,_    = base_env.reset()
    #ensure that the exit condition is reset
    truncated = False
    #define the action to take
    action_standby = 0

    while not truncated:
        obs,reward,terminated,truncated,info = base_env.step(action_standby)
        score += reward

print(f"Done iteration! Total reward accumulated is: {score/step}")

Done iteration! Total reward accumulated is: -880769.0387836567


Connect to drive, create new directories for the logs and models to be saved in.

Open tensorboard

In [20]:


%load_ext tensorboard
%tensorboard --logdir "{log_dir}" --load_fast=false --port 8008



The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


<IPython.core.display.Javascript object>

**LOAD OR MAKE MODEL HERE!**

In [21]:
config = {
    "policy_type": "MultiInputPolicy",
    "total_timesteps": 1500000,
}

run = wandb.init(
    project="4022_intelligent_ems",
    config=config,
    sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
    monitor_gym=True,  # auto-upload the videos of agents playing the game
    save_code=True,  # optional
)





  and should_run_async(code)
  return LooseVersion(v) >= LooseVersion(check)


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [22]:
#create a new environment to train the model in.
train_env = EMSv0_2_1(episode_len = 6000, actual_load = actual_load, actual_gen = actual_gen, bat_threshold = 100, bat_cap = 500, purchase_price = purchase_price,num_preds = 3)

train_env = Monitor(train_env)
#train_env = DummyVecEnv(train_env)
#Create the model with the MultiInputPolicy, use the training env, verbose is off because tensorboard loging is enabled
model = A2C("MultiInputPolicy",train_env, verbose = 0, tensorboard_log = f"runs/{run.id}") #log_dir

#Load model, fetch the latest (or whichever one you want from the model_dir)
#PPO latest:
#model_load = f"{model_dir}/"
#model  = PPO.load(model_load, env = train_env)

  and should_run_async(code)


**Infinite while loop to train  model.**

In [23]:
wand_name = f"{version}_{model_type}"+datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
for i in range(50):
    #define the name for the specific log

    #make the model learn, set the reset to false so that it keeps its old learning
    model.learn(total_timesteps= 30000,
                tb_log_name = wand_name,
                reset_num_timesteps=False,
                callback = WandbCallback(
                        gradient_save_freq=100,
                        model_save_path=f"models/{run.id}",
                        verbose=2,
                ))
    log_name = f"{version}_{model_type}"+datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    model.save(f"{model_dir}{log_name}")
    #open tensorboard
run.finish()

VBox(children=(Label(value='1.245 MB of 1.245 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
rollout/ep_len_mean,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
rollout/ep_rew_mean,▁▂▃▄▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▇▇▇▇██
time/fps,███▇▇▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▁
train/entropy_loss,████▅▇██▅█▇███▇████▆▅▃▇▇▆▂▅▂█▂█▁▆▁▆▅▃▄▇▇
train/explained_variance,▇▁▇▇▇▇▇▆▇▇▇▇▇█▇█▇█▇█▇▇▇▇█▇▇▇▇▇█▇█▇█▇▇▇█▇
train/learning_rate,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/policy_loss,████▇▅██▆███████████▃▁██▇▂▇▇█▆█▅▇▇▇█▃▇██
train/value_loss,▃▁▁▁▄▁▁▁▂▁▁▆▁▁▂▁▁▁▁▁▃█▁▁▁▅▂▁▁▁▁▂▁▁▂▁▆▁▁▁

0,1
global_step,1500000.0
rollout/ep_len_mean,5995.0
rollout/ep_rew_mean,-881977.8125
time/fps,120.0
train/entropy_loss,-0.35933
train/explained_variance,0.00451
train/learning_rate,0.0007
train/policy_loss,-214.18924
train/value_loss,663746.875


**Define a new test environment and load up the best performing model to test it.**

In [24]:
#define a test environment
test_env = EMSv0_2_1(episode_len = 2760, actual_load = actual_load[6001:], actual_gen = actual_gen[6001:], bat_threshold = 100, bat_cap = 500, purchase_price = purchase_price[6001:],num_preds = 3)
#reset the environment and save the obs

#Load model, fetch the latest (or whichever one you want from the model_dir)
#Best A2C model:/content/drive/MyDrive/Colab Notebooks/EMSv0_2_1/models/PPO/EMSv0_2_1_PPO20231005-102532.zip
#best_PPO_model = "EMSv0_2_1_PPO20231005-102532.zip"


#model_load = f"{model_dir}/{best_PPO_model}"
#model  = PPO.load(model_load, env = test_env)


#first run it with only standby (default)
obs,_    = test_env.reset()
#ensure that the exit condition is reset
truncated = False
#define the action to take
action_standby = 0
#reset score
standby_score = 0
while not truncated:
    #step the model with the action
    obs,reward,terminated,truncated,info = test_env.step(action_standby)
    #accumulate the score
    standby_score += reward

EMS_reward,EMS_std_reward = evaluate_policy(model,test_env,n_eval_episodes = 100,deterministic=True)


print(f"Note: The term does not refer to the cost in rands but rather to the reward as defined by the reward function!")
print(f"Done the Standby Test! Total cost accumulated is: {standby_score}")
print(f"Done applying the trained model! Total cost accumulated is: {EMS_reward} +- {EMS_std_reward}")

savings = EMS_reward - standby_score
print(f"The amount that was saved by applying the EMS agent: {savings}")
print(f"This was saved over a period of {2760/24} days")
print(f"The savings represents {(savings/(-standby_score))*100} % of the cost if no EMS is installed")
print(f"And it represents {(savings/(-EMS_reward))*100} % of the cost if the EMS is installed")




Note: The term does not refer to the cost in rands but rather to the reward as defined by the reward function!
Done the Standby Test! Total cost accumulated is: -309839.33152008057
Done applying the trained model! Total cost accumulated is: -328105.00497436523 +- 0.0
The amount that was saved by applying the EMS agent: -18265.673454284668
This was saved over a period of 115.0 days
The savings represents -5.89520812760367 % of the cost if no EMS is installed
And it represents -5.567020672455686 % of the cost if the EMS is installed
