# Q-Learning with Battery Example
This notebook demonstrates the ability of a DQN agent to learn to optimize electric battery storage.

This example uses a constant, repeating electricity price profile combined with a perfect forecast. The agent can therefore memorize the profile, and it operates in a near-Markov environment.

A real-world application of reinforcement learning to battery control would have to deal with both a variable price profile and an imperfect (non-Markov) view of what prices will do in the future. It could also involve additional reward signals, such as payments for fast frequency response, that need to be balanced against price arbitrage.

In [1]:
import os
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

import energy_py

In [2]:
#  set random seeds for repeatability
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)

In [3]:
#  define a total number of steps for the experiment to run
TOTAL_STEPS = 400000

#  to setup the agent we use a dictionary
#  a dictionary allows us to easily save the config to csv if we want
agent_config = {
    'discount': 0.97,                 #  the discount rate
    'tau': 0.001,                     #  parameter that controls the copying of weights from online to target network
    'total_steps': TOTAL_STEPS,   
    'batch_size': 32,                 #  size of the minibatches used for learning
    'layers': (50, 50),               #  structure of the neural network used to approximate Q(s,a)
    'learning_rate': 0.0001,          #  controls the strength of weight updates during learning
    'epsilon_decay_fraction': 0.3,    #  fraction of total steps over which epsilon decays from 1.0 to 0.1
    'memory_fraction': 0.4,           #  size of the replay memory as a fraction of total steps
    'memory_type': 'deque',           #  the replay memory implementation we want
               }

#  keep all of the BatteryEnv variables (episode length, efficiency etc) at their defaults
#  we just need to let our env know where our state.csv and observation.csv are (data_path)
env = energy_py.make_env('battery')
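
The two fractional settings in `agent_config` translate into step counts. The sketch below shows that arithmetic, assuming epsilon decays linearly between the endpoints given in the comments (the shape of the schedule inside energy_py is an assumption here). `epsilon_at` and `memory_size` are hypothetical helper names for illustration, not part of the energy_py API.

```python
TOTAL_STEPS = 400000

def epsilon_at(step, total_steps=TOTAL_STEPS, decay_fraction=0.3,
               start=1.0, end=0.1):
    """Exploration rate at a given step, assuming a linear decay (an assumption)."""
    #  number of steps over which epsilon moves from start to end
    decay_steps = round(total_steps * decay_fraction)
    #  progress through the decay period, capped at 1.0 once it is over
    progress = min(step / decay_steps, 1.0)
    return start + progress * (end - start)

def memory_size(total_steps=TOTAL_STEPS, memory_fraction=0.4):
    """Replay memory capacity expressed as a number of experiences."""
    return round(total_steps * memory_fraction)

for step in (0, 60000, 120000, 400000):
    print(step, round(epsilon_at(step), 3))
print('replay memory capacity:', memory_size())
```

Under these assumptions exploration reaches its floor after 120,000 steps and the memory holds 160,000 experiences, so a rule of "learn once the memory is half full" would start learning after roughly 80,000 steps.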

In [4]:
#  initialize Tensorflow machinery
with tf.Session() as sess:
    
    #  Runner is a class that helps us with experiments - tracking rewards, writing environment info to csv and managing TensorBoard
    #  in this notebook we just use it to track rewards for us
    runner = energy_py.Runner(
        sess,  
        {'tb_rl': './tb_rl',
         'ep_rewards': './rewards.csv'},
        TOTAL_STEPS
    )
    
    #  add the tf session and the environment to the agent config dictionary
    #  and initialize the agent
    agent_config['sess'] = sess
    agent_config['env'] = env
    agent = energy_py.make_agent(agent_id='dqn', **agent_config)
    
    #  initial values for the step and episode number
    step, episode = 0, 0

    #  outer while loop runs through multiple episodes
    while step < TOTAL_STEPS:
        episode += 1
        done = False
        observation = env.reset()
        
        #  inner while loop runs through a single episode
        while not done:
            step += 1
            #  select an action
            action = agent.act(observation)
            
            #  take one step through the environment
            next_observation, reward, done, info = env.step(action)
            
            #  store the experience
            agent.remember(observation, action, reward,
                           next_observation, done)
            
            #  moving to the next time step
            observation = next_observation
            #  saving the reward 
            runner.record_step(reward)
            
            #  we don't start learning until the memory is half full
            if step > int(agent.memory.size * 0.5):
                train_info = agent.learn()
            
        runner.record_episode()
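
The call to `agent.learn()` hides the actual update. As a rough illustration of what a DQN learning step typically computes, the sketch below builds the Bellman targets for a minibatch and applies a soft target-network update using `discount` and `tau` from `agent_config`. This is a generic textbook formulation written with NumPy, not a copy of energy_py's implementation, which may differ (for example by using a double Q-learning target). The names `bellman_targets` and `soft_update` are hypothetical.

```python
import numpy as np

def bellman_targets(rewards, next_q_values, dones, discount=0.97):
    """Target per transition: r + discount * max_a' Q_target(s', a')."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    #  greedy value of the next state according to the target network
    max_next_q = np.max(np.asarray(next_q_values, dtype=float), axis=1)
    #  terminal transitions do not bootstrap from the next state
    return rewards + discount * max_next_q * (~dones)

def soft_update(target_weights, online_weights, tau=0.001):
    """Move each target weight array a fraction tau towards the online weights."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_weights, online_weights)]

#  two made-up transitions, the second one terminal
targets = bellman_targets(rewards=[1.0, 2.0],
                          next_q_values=[[0.0, 10.0], [5.0, 3.0]],
                          dones=[False, True])
new_target = soft_update([np.zeros(2)], [np.ones(2)])
```

With a small `tau` such as 0.001 the target network trails the online network slowly, which is the usual motivation for this kind of update: the regression targets change gradually rather than jumping after every learning step.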

KeyboardInterrupt: 

In [None]:
#  energy_py uses TensorBoard for logging - for the scope of this notebook example we will do
#  some simple plotting using matplotlib
plt.plot(runner.global_rewards, label='Total reward per episode [$]')
plt.legend()  #  needed for the label above to be displayed
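
Per-episode rewards are usually noisy while epsilon is still high, so a moving average can make the learning trend easier to read. The sketch below is one simple way to do this with NumPy. `moving_average` is a hypothetical helper, and the reward values are made-up numbers standing in for `runner.global_rewards`.

```python
import numpy as np

def moving_average(values, window=10):
    """Mean over a sliding window; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    #  'valid' keeps only positions where the window fits entirely
    return np.convolve(values, kernel, mode='valid')

#  made-up episode rewards, for illustration only
example_rewards = [0.0, 2.0, 4.0, 6.0]
smoothed = moving_average(example_rewards, window=2)
```

The smoothed series could then be passed to `plt.plot` in place of, or alongside, the raw rewards.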

In [None]:
#  we can also look at what happened in our last episode
ep_hist = pd.DataFrame.from_dict(info)
ep_hist.head()

In [None]:
plt.plot(ep_hist.loc[:, 'new_charge'])

In [None]:
plt.plot(ep_hist.loc[:, 'electricity_price'])