The main aim of this notebook is to demonstrate the low-level energy_py API. All of the functionality exposed here is also wrapped into a higher-level experiment API (see the README).

The higher-level API wraps more functionality than is shown in this example, e.g. generating log files and writing data to TensorBoard. For the scope of this low-level example, we only use data available locally: the episode rewards (as tracked by the `Runner` class) and the data for the last episode (the `info` dictionary).

This notebook also demonstrates the ability of a DQN agent to learn to optimize a simplified electric battery storage system.

This example uses a constant, repeating electricity price profile combined with a perfect forecast. The agent can therefore memorize the profile, and it operates in a near-Markov environment. More interesting work randomly samples different price rollouts and uses realistic forecasts.

A real-world application of reinforcement learning to battery control would have to deal with both a variable price profile and a non-Markov view of how prices will develop in the future. It could also involve additional reward signals, such as payments for fast frequency response, which would need to be balanced against price arbitrage.

In [1]:
from datetime import datetime
import os
import random

import numpy as np
import pandas as pd
import tensorflow as tf

import energypy

In [2]:
#  define a total number of steps for the experiment to run
TOTAL_STEPS = 10000

#  to set up the agent we use a dictionary
#  a dictionary allows us to easily save the config to csv if we want
agent_config = {
    'discount': 0.97,                 #  the discount rate
    'tau': 0.001,                     #  parameter that controls the copying of weights from online to target network
    'total_steps': TOTAL_STEPS,       #  total number of steps the experiment runs for
    'batch_size': 32,                 #  size of the minibatches used for learning
    'layers': (50, 50),               #  structure of the neural network used to approximate Q(s,a)
    'learning_rate': 0.0001,          #  controls the strength of weight updates during learning
    'epsilon_decay_fraction': 0.3,    #  fraction of total steps over which epsilon is decayed from 1.0 to 0.1
    'memory_fraction': 0.4,           #  size of the replay memory as a fraction of total steps
    'memory_type': 'deque',           #  the replay memory implementation we want
}

#  keep all of the BatteryEnv variables (episode length, efficiency etc) at their defaults
#  no data_path is passed, so the env uses its default state.csv and observation.csv
env = energypy.make_env('battery')

#  set seeds for reproducibility
env.seed(42)
random.seed(42)
np.random.seed(42)

def print_time():
    print(datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
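
The fractional settings in `agent_config` are presumably converted to step counts inside the agent. The sketch below shows that arithmetic under two assumptions not confirmed by the source: epsilon decays linearly (the config comment only says it goes from 1.0 to 0.1), and `agent.memory.size` in the training loop equals the replay memory capacity. `epsilon_at` is a hypothetical helper, not part of energypy.

```python
# Sketch only: assumes linear epsilon decay and that agent.memory.size
# in the training loop equals the replay memory capacity.
TOTAL_STEPS = 10000
EPSILON_DECAY_FRACTION = 0.3
MEMORY_FRACTION = 0.4


def epsilon_at(step, total_steps=TOTAL_STEPS,
               decay_fraction=EPSILON_DECAY_FRACTION,
               start=1.0, end=0.1):
    """Hypothetical linear schedule from start to end over decay_fraction * total_steps."""
    decay_steps = int(total_steps * decay_fraction)
    if step >= decay_steps:
        return end
    return start + (end - start) * step / decay_steps


memory_capacity = int(TOTAL_STEPS * MEMORY_FRACTION)  # transitions held in memory
learning_start = int(memory_capacity * 0.5)           # same warm-up rule as the training loop

print([round(epsilon_at(s), 3) for s in (0, 1500, 3000, 9000)])
print(memory_capacity, learning_start)
```

Under these assumptions, 10,000 total steps give 3,000 steps of epsilon decay, a 4,000-transition replay memory and a 2,000-step warm-up before `agent.learn()` is first called.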

In [3]:
print_time()
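#  reset the graph (without this the notebook must be restarted before each run)
tf.reset_default_graph()
#  graph-level seed, set after the reset because it applies to the current default graph
tf.set_random_seed(42)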

#  initialize Tensorflow machinery
with tf.Session() as sess:
    
    #  add the tf session and the environment to the agent config dictionary
    #  and initialize the agent
    agent_config['sess'] = sess
    agent_config['env'] = env
    agent = energypy.make_agent(agent_id='dqn', **agent_config)
    
    #  initial values for the step and episode number
    step, episode = 0, 0

    #  outer while loop runs through multiple episodes
    rewards = []
    while step < TOTAL_STEPS:
        episode += 1
        done = False
        observation = env.reset()
        
        #  inner while loop runs through a single episode
        episode_rewards = []
        while not done:
            step += 1
            #  select an action
            action = agent.act(observation)
            
            #  take one step through the environment
            next_observation, reward, done, info = env.step(action)
            
            #  store the experience
            agent.remember(observation, action, reward,
                           next_observation, done)
            
            #  moving to the next time step
            observation = next_observation
            #  saving the reward 
            episode_rewards.append(reward)
            
            #  we don't start learning until the memory is half full
            if step > int(agent.memory.size * 0.5):
                train_info = agent.learn()
                            
        rewards.append(sum(episode_rewards))
        
        if episode % 10 == 1:
            print('ep {} {:.2f} % rew {}'.format(episode, 100 * step / TOTAL_STEPS, sum(episode_rewards)))
            
print_time()

2019-03-22 18:34:55
ep 1 23.04 % rew [[1294.24818182]]
2019-03-22 18:35:35
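
The DQN agent above is configured with `memory_type='deque'`, `batch_size=32` and `tau=0.001`, but its internals are not shown in this notebook. The following is a generic sketch (not energypy's code) of the two mechanisms those settings usually control: a deque-backed replay memory that drops its oldest transitions when full, and a soft target-network update that moves each target weight a fraction `tau` towards the online weight. `DequeMemory` and `soft_update` are hypothetical names, and weights are plain NumPy arrays rather than TensorFlow variables.

```python
import random
from collections import deque

import numpy as np


class DequeMemory:
    """Hypothetical fixed-capacity replay memory; the oldest transitions drop out when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def remember(self, observation, action, reward, next_observation, done):
        self.buffer.append((observation, action, reward, next_observation, done))

    def sample(self, batch_size):
        # uniform sample without replacement, stacked into one array per field
        batch = random.sample(list(self.buffer), batch_size)
        return tuple(np.array(field) for field in zip(*batch))

    def __len__(self):
        return len(self.buffer)


def soft_update(online_weights, target_weights, tau):
    """Move each target weight a fraction tau towards its online counterpart."""
    return [tau * online + (1.0 - tau) * target
            for online, target in zip(online_weights, target_weights)]


# fill past capacity: only the most recent 4000 transitions are kept
memory = DequeMemory(capacity=4000)
for t in range(5000):
    obs = np.array([float(t)])
    memory.remember(obs, 0, 0.0, obs + 1.0, False)

batch_obs, batch_actions, batch_rewards, batch_next_obs, batch_dones = memory.sample(batch_size=32)

# one update with tau = 0.001 moves the zero target 0.1 % of the way to the online weights
target = soft_update([np.ones((2, 2))], [np.zeros((2, 2))], tau=0.001)
```

A small `tau` such as 0.001 keeps the target network slowly moving, which is the usual motivation for soft updates over periodic hard copies.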


In [4]:
#  results of the last episode
info = pd.DataFrame(info)
info

,step,state,observation,action,reward,next_state,next_observation,done,electricity_price,old_charge,charge,gross_rate,losses,net_rate
0,0,"[[43.82, 0.00035444199999999995, 2.0]]","[[-0.22027078263726202, -0.1527319338816422, -...",[[-2.0]],[[7.967272727272731]],"[[43.82, 0.000506512, 1.8333333333333333]]","[[-0.22027078263726202, -0.1527319338816422, -...",False,43.82,2,[[1.8333333333333333]],[[-2.000000000000001]],0,[[-2.000000000000001]]
1,1,"[[43.82, 0.000506512, 1.8333333333333333]]","[[-0.22027078263726202, -0.1527319338816422, -...",[[0.0]],[[-0.0]],"[[43.82, 0.000153676, 1.8333333333333333]]","[[-0.22027078263726202, -0.1527319338816422, -...",False,43.82,[[1.8333333333333333]],[[1.8333333333333333]],[[0.0]],0,[[0.0]]
2,2,"[[43.82, 0.000153676, 1.8333333333333333]]","[[-0.22027078263726202, -0.1527319338816422, -...",[[-2.0]],[[7.967272727272731]],"[[43.82, 0.00029357900000000003, 1.66666666666...","[[-0.22027078263726202, -0.1527319338816422, -...",False,43.82,[[1.8333333333333333]],[[1.6666666666666665]],[[-2.000000000000001]],0,[[-2.000000000000001]]
3,3,"[[43.82, 0.00029357900000000003, 1.66666666666...","[[-0.22027078263726202, -0.1527319338816422, -...",[[-2.0]],[[7.967272727272731]],"[[43.82, 0.000106107, 1.4999999999999998]]","[[-0.22027078263726202, -0.1527319338816422, -...",False,43.82,[[1.6666666666666665]],[[1.4999999999999998]],[[-2.000000000000001]],0,[[-2.000000000000001]]
4,4,"[[43.82, 0.000106107, 1.4999999999999998]]","[[-0.22027078263726202, -0.1527319338816422, -...",[[-2.0]],[[7.967272727272731]],"[[43.82, 0.00036606, 1.333333333333333]]","[[-0.22027078263726202, -0.1527319338816422, -...",False,43.82,[[1.4999999999999998]],[[1.333333333333333]],[[-2.000000000000001]],0,[[-2.000000000000001]]
5,5,"[[43.82, 0.00036606, 1.333333333333333]]","[[-0.22027078263726202, -0.1527319338816422, -...",[[-2.0]],[[7.967272727272731]],"[[150.0, 0.0, 1.1666666666666663]]","[[-0.15267041167248643, -0.21883698114390648, ...",False,43.82,[[1.333333333333333]],[[1.1666666666666663]],[[-2.000000000000001]],0,[[-2.000000000000001]]
6,6,"[[150.0, 0.0, 1.1666666666666663]]","[[-0.15267041167248643, -0.21883698114390648, ...",[[-2.0]],[[27.272727272727266]],"[[150.0, 0.000246857, 0.9999999999999997]]","[[-0.15267041167248643, -0.21883698114390648, ...",False,150.00,[[1.1666666666666663]],[[0.9999999999999997]],[[-1.9999999999999996]],0,[[-1.9999999999999996]]
7,7,"[[150.0, 0.000246857, 0.9999999999999997]]","[[-0.15267041167248643, -0.21883698114390648, ...",[[-2.0]],[[27.272727272727266]],"[[150.0, 0.0, 0.833333333333333]]","[[-0.15267041167248643, -0.21883698114390648, ...",False,150.00,[[0.9999999999999997]],[[0.833333333333333]],[[-1.9999999999999996]],0,[[-1.9999999999999996]]
8,8,"[[150.0, 0.0, 0.833333333333333]]","[[-0.15267041167248643, -0.21883698114390648, ...",[[-2.0]],[[27.272727272727266]],"[[150.0, 0.0, 0.6666666666666664]]","[[-0.15267041167248643, -0.21883698114390648, ...",False,150.00,[[0.833333333333333]],[[0.6666666666666664]],[[-1.9999999999999996]],0,[[-1.9999999999999996]]
9,9,"[[150.0, 0.0, 0.6666666666666664]]","[[-0.15267041167248643, -0.21883698114390648, ...",[[-2.0]],[[27.272727272727266]],"[[150.0, 0.0, 0.4999999999999998]]","[[-0.15267041167248643, -0.21883698114390648, ...",False,150.00,[[0.6666666666666664]],[[0.4999999999999998]],[[-1.9999999999999996]],0,[[-1.9999999999999996]]
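
Most cells in the table above hold 1x1 arrays (for example `[[7.967272727272731]]`) rather than plain numbers, which makes the DataFrame awkward to aggregate or plot. A minimal sketch of flattening such data into numeric columns, using three rows copied from the table (steps 0, 1 and 6); `info_sample` and `to_scalar` are illustrative names, and the real `info` dict has more keys than shown.

```python
import numpy as np
import pandas as pd

# three rows copied from the table above (steps 0, 1 and 6)
info_sample = {
    'action': [np.array([[-2.0]]), np.array([[0.0]]), np.array([[-2.0]])],
    'reward': [np.array([[7.967272727272731]]), np.array([[-0.0]]),
               np.array([[27.272727272727266]])],
    'electricity_price': [43.82, 43.82, 150.0],
}


def to_scalar(value):
    """Hypothetical helper: collapse a 1x1 array (or a plain number) to a float."""
    return float(np.asarray(value).squeeze())


flat = {key: [to_scalar(v) for v in values] for key, values in info_sample.items()}
info_df = pd.DataFrame(flat)

print(info_df)
print('total reward:', info_df['reward'].sum())
```

Columns such as `state` and `observation` hold multi-element arrays, so they would need to be excluded or expanded into separate columns before applying the same comprehension to the full `info` dict.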
