<div style="background:#FFFFAA">
<img src="./logo.jpg" width="150" align="left" border="20">
<h1>L2RPN Starting Kit </h1> 

<br>This code was tested with <br>
Python 3.6.6 |Anaconda custom (64-bit)| (default, Nov 2018, 11:07:29) (https://anaconda.org/)<br>
<i> Adapted for Chalab by Isabelle Guyon from original code of Balázs Kégl</i> <br>
<a href="http://www.datascience-paris-saclay.fr">Paris Saclay Center for Data Science (CDS)</a>
<p>
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". The CDS, CHALEARN, AND/OR OTHER ORGANIZERS OR CODE AUTHORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL AUTHORS AND ORGANIZERS BE LIABLE FOR ANY SPECIAL, 
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. 
</div>

<div style="background:#FFFFAA">
    <h2>Introduction </h2>
    <p> 
     <br>
The goal of this challenge is to apply Reinforcement Learning (RL) to power grid management by designing RL agents that automate the control of the power grid. The data used in this challenge come from <a href="https://github.com/MarvinLer/pypownet">pypownet</a>, a simulator written by Marvin Lerousseau that can emulate a power grid of any size and electrical properties, subject to a set of temporal injections over discretized time-steps.

References and credits: <br>
pypownet was created by Marvin Lerousseau (Pypownet, 2017, https://github.com/MarvinLer/pypownet). The competition protocol was designed by Isabelle Guyon. Our mentors are Balthazar Donon and Antoine Marot. The baseline methods were inspired by work performed by Kimang Khun.
 <br> 
</div>

In [1]:
model_dir = 'example_submission/'
problem_dir = 'ingestion_program/'  
score_dir = 'scoring_program/'
input_dir = 'public_data/'
output_dir = 'output/'
from sys import path; path.append(model_dir); path.append(problem_dir); path.append(score_dir);
path.append(input_dir); path.append(output_dir);
%matplotlib inline
# Auto-reload libraries (comment out the next two lines if this causes problems with pickles in Python 3)
%load_ext autoreload
%autoreload 2
import seaborn as sns; sns.set()
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


<div style="background:#FFFFAA">
    <h1> Step 1: Exploratory data analysis </h1>
<p>
We provide data with the starting kit.
    <br>
</div>

## Electrical grid
<div >
<img src="./ExampleGrid.JPG" width="750" align="left" border="20">
    <br>
    <br>
(courtesy of Marvin Lerousseau)
</div>

During the challenge, a grid of 14 substations is given. 20 lines connect the nodes of the network.

For the following example, we take the case with 11 loads and 5 productions (prods), at the hard level. Furthermore, only the data for January are shown.

In [2]:
data_dir = 'public_data/hard'              # Change this to the directory where you put the input data
!ls $data_dir*

chronics  configuration.yaml  reference_grid.m


For convenience, we load the data as a "pandas" data frame, so we can use "pandas" to explore the data.

<div style="background:#FFFFAA">
<h1>Step 2: Building an Agent</h1>
</div>

<div style="background:#FFFFAA">
    <h2>Loading data with pypownet</h2>
    <p>
We reload the data with the environment class of pypownet.
   <br>
    
To stay in the game, the flow in each line has to stay below a threshold (its thermal limit). Above this threshold, the line overheats, and after a certain amount of overheating, the line breaks. Thermal limits are already defined in pypownet.
</div>
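
The overheating rule described above can be pictured as a simple counter. The sketch below is only an illustration under stated assumptions, not pypownet's implementation: the function `first_break_step` is hypothetical, usage is expressed as flow divided by thermal limit, and we assume that a line breaks after a fixed number of consecutive overloaded timesteps and that the counter resets as soon as the flow drops back under the limit.

```python
def first_break_step(usages, max_consecutive=10):
    """Return the index of the timestep at which the line breaks, or None.

    usages: capacity usage per timestep (flow / thermal limit); > 1 means overflow.
    max_consecutive: assumed number of consecutive overloaded timesteps before a break.
    """
    consecutive = 0
    for step, usage in enumerate(usages):
        if usage > 1.0:
            consecutive += 1
            if consecutive >= max_consecutive:
                return step
        else:
            # Assumption: the overheating counter resets once the flow is under the limit
            consecutive = 0
    return None


# Toy series: a short overflow that recovers, then a long one that breaks the line
usages = [0.8, 1.1, 1.2, 0.9] + [1.05] * 10
print(first_break_step(usages))
```

An agent that wants to avoid line breaks therefore has to react before such a counter runs out, which is why the example agent below watches the capacity usage of each line.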

In [3]:
import os
import pypownet.environment
import pypownet.runner
data_dir = 'public_data'
environment = pypownet.environment.RunEnv(parameters_folder=os.path.abspath(data_dir),
                                          game_level="hard",
                                          chronic_looping_mode='natural', start_id=0,
                                          game_over_mode="soft")

Using custom reward signal CustomRewardSignal of file /home/nicolas/test/Grid2/starting_kit/public_data/reward_signal.py



                     GAME PARAMETERS
    hard_overflow_coefficient: 1.0
    loadflow_backend: pypower
    loadflow_mode: AC
    max_number_loads_game_over: 6
    max_number_prods_game_over: 3
    max_seconds_per_timestep: 1.0
    n_timesteps_consecutive_soft_overflow_breaks: 10
    n_timesteps_hard_overflow_is_broken: 10
    n_timesteps_horizon_maintenance: 48
    n_timesteps_soft_overflow_is_broken: 10



<div style="background:#FFFFAA">
    <h2>Building an agent</h2>
    <p>
We provide examples of agents (for reinforcement learning) in the `starting_kit/example_submission` directory. The example agent is quite naive: it does nothing. Replace it with your own agent.
    </div>

## Scoring the results of an agent

<div style="background:#FFFFAA">
    <br>
    <p>
<b>The metric chosen for your challenge</b> is identified in the "metric.txt" file found in the `scoring_program/` directory. The function "get_metric" first searches for a metric with that name in my_metric.py, then in libscores.py, then in sklearn.metrics.
    <br>
The aim of a reinforcement learning problem is to maximize the reward function.

When the agent is run, two values are returned: the first is the reward of the last timestep, and the second is the cumulative reward over all the iterations of the run. The reward indicates whether or not the game is heading towards a game over.

Specifically, our reward function is composed of 5 subrewards. They describe the proportion of isolated productions, the proportion of isolated loads, the cost of an action, the amount of change between the current grid and the initial grid, and lastly the capacity usage of the lines. 
    </div>
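
To make the relation between subrewards, step reward and cumulative reward concrete, here is a minimal sketch. It assumes, as the example agent in this notebook does, that the reward of a timestep is the plain sum of its subrewards; the helper names and the numbers are illustrative only and are not taken from a real run.

```python
def step_reward(subrewards):
    """Total reward of one timestep: the sum of its subrewards."""
    return sum(subrewards)


def run_rewards(subrewards_per_step):
    """Return (reward of the last timestep, cumulative reward over the run)."""
    totals = [step_reward(subrewards) for subrewards in subrewards_per_step]
    return totals[-1], sum(totals)


# Illustrative values only: five subrewards per timestep, three timesteps
history = [
    [0.0, 0.0, 0.0, 0.0, -0.25],
    [0.0, 0.0, 0.0, 0.0, -0.5],
    [0.0, 0.0, -0.125, 0.0, -0.125],
]
last, cumulative = run_rewards(history)
print(last, cumulative)
```

Since every subreward here is zero or negative, a cumulative reward closer to zero corresponds to a better run.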

In [4]:
from scoring_program import libscores
from libscores import get_metric
metric_name, scoring_function = get_metric()
print('Using scoring metric:', metric_name)
# Uncomment the next line to display the code of the scoring metric
#??scoring_function

Using scoring metric: reward
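
The lookup order described for `get_metric` (first `my_metric.py`, then `libscores.py`, then `sklearn.metrics`) amounts to searching an ordered list of namespaces and returning the first match. The sketch below is not the code of `libscores.py`: `find_metric` is a hypothetical helper, and the three namespaces are stand-ins with made-up contents.

```python
from types import SimpleNamespace


def find_metric(name, namespaces):
    """Return the first callable called `name` found in `namespaces`, searched in order."""
    for namespace in namespaces:
        metric = getattr(namespace, name, None)
        if callable(metric):
            return metric
    raise AttributeError('no metric named %r' % name)


# Stand-ins for my_metric.py, libscores.py and sklearn.metrics (hypothetical contents)
my_metric = SimpleNamespace(reward=lambda rewards: sum(rewards))
libscores_standin = SimpleNamespace(reward=lambda rewards: 0.0, other=lambda rewards: 1.0)
sklearn_standin = SimpleNamespace()

search_order = [my_metric, libscores_standin, sklearn_standin]
chosen = find_metric('reward', search_order)
print(chosen([-0.25, -0.5]))
```

With this ordering, a metric defined in `my_metric.py` shadows one of the same name defined further down the list.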


In [5]:
import time
start = time.time()
end = time.time()
print(end-start)

1.52587890625e-05


In [6]:
import pypownet.agent


class CustomAgent(pypownet.agent.Agent):
    """
    An example of a baseline controller that, at each timestep, switches the status of every power line whose
    capacity usage exceeds 1 (i.e. every overloaded line), and otherwise does nothing.
    """

    def __init__(self, environment):
        super().__init__(environment)
        self.verbose = True

    def act(self, observation):
        # Sanity check: an observation is a structured object defined in the environment file.
        assert isinstance(observation, pypownet.environment.Observation)
        action_space = self.environment.action_space

        # Create template of action with no switch activated (do-nothing action)
        action = action_space.get_do_nothing_action()

        # Select lines to switch: activate the status switch of every overloaded line
        lines_load = observation.get_lines_capacity_usage()
        nb_lines = len(lines_load)
        assert nb_lines == action_space.lines_status_subaction_length
        for i in range(nb_lines):
            if lines_load[i] > 1:
                action_space.set_lines_status_switch_from_id(action=action, line_id=i, new_switch_value=1)
                if self.verbose:
                    # The reward is not known yet at this point; it is simulated and printed below
                    print('Action chosen: switching status of line %d' % i)

        # Test the reward on the environment
        reward_aslist = self.environment.simulate(action, do_sum=False)
        reward = sum(reward_aslist)
        if self.verbose:
            print('reward: [', ', '.join(['%.2f' % c for c in reward_aslist]), '] =', reward)

        # No learning (i.e. self.feed_reward does pass)
        return action

In [7]:
import logging
import sys
import time
start = time.time()
NUMBER_ITERATIONS = 50

submission_dir = 'example_submission'
sys.path.append(submission_dir)

if not os.path.exists(output_dir):
    os.makedirs(output_dir)
log_path = os.path.abspath(os.path.join(output_dir, 'runner.log'))


open(log_path, 'w').close()
submitted_controler = CustomAgent(environment)
# Instantiate a runner that will save the run statistics in the log_path file, to be parsed and processed
# by the scoring program
phase_runner = pypownet.runner.Runner(environment, submitted_controler, verbose=True, vverbose=False,
                                      log_filepath=log_path)
phase_runner.ch.setLevel(logging.ERROR)
# Run the planned experiment of this phase with the submitted model
score = phase_runner.loop(iterations=NUMBER_ITERATIONS)
print("cumulative rewards : {}".format(score))
end = time.time()
print(end-start)

reward: [ -0.00, -0.00, 0.00, -0.00, -0.25 ] = -0.24796949553161035
reward: [ -0.00, -0.00, 0.00, -0.00, -0.23 ] = -0.2336597951442309
reward: [ -0.00, -0.00, 0.00, -0.00, -0.22 ] = -0.21913744390058026
reward: [ -0.00, -0.00, 0.00, -0.00, -0.20 ] = -0.20384386816027017
reward: [ -0.00, -0.00, 0.00, -0.00, -0.19 ] = -0.19069972264196305
reward: [ -0.00, -0.00, 0.00, -0.00, -0.18 ] = -0.17744492936380044
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16367978698921049
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.152185684456322
reward: [ -0.00, -0.00, 0.00, -0.00, -0.14 ] = -0.14216176841593572
reward: [ -0.00, -0.00, 0.00, -0.00, -0.13 ] = -0.1349605806565615
reward: [ -0.00, -0.00, 0.00, -0.00, -0.13 ] = -0.1288497684212152
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.12406685688698504
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.12088946521997883
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.11828113986188324
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.115

<div style="background:#FFFFAA">
    <b>Save the best agent.</b> It should be a class named <code>Submission</code>, saved in "example_submission/submission.py". Uncomment the line <i>%%writefile example_submission/submission.py</i> to save the agent.
</div>

In [8]:
#%%writefile example_submission/submission.py
import pypownet.agent
import pypownet.environment
import numpy as np
import os


class Submission(pypownet.agent.Agent):
    """
    An example of a baseline controller that, at each timestep, switches the status of every power line whose
    capacity usage exceeds 1 (i.e. every overloaded line), and otherwise does nothing.
    """

    def __init__(self, environment):
        super().__init__(environment)
        self.verbose = True

    def act(self, observation):
        # Sanity check: an observation is a structured object defined in the environment file.
        assert isinstance(observation, pypownet.environment.Observation)
        action_space = self.environment.action_space

        # Create template of action with no switch activated (do-nothing action)
        action = action_space.get_do_nothing_action()

        # Select lines to switch: activate the status switch of every overloaded line
        lines_load = observation.get_lines_capacity_usage()
        nb_lines = len(lines_load)
        assert nb_lines == action_space.lines_status_subaction_length
        for i in range(nb_lines):
            if lines_load[i] > 1:
                action_space.set_lines_status_switch_from_id(action=action, line_id=i, new_switch_value=1)
                if self.verbose:
                    # The reward is not known yet at this point; it is simulated and printed below
                    print('Action chosen: switching status of line %d' % i)

        # Test the reward on the environment
        reward_aslist = self.environment.simulate(action, do_sum=False)
        reward = sum(reward_aslist)
        if self.verbose:
            print('reward: [', ', '.join(['%.2f' % c for c in reward_aslist]), '] =', reward)

        # No learning (i.e. self.feed_reward does pass)
        return action

<div style="background:#FFFFAA">
<h1> Step 3: Making a submission </h1> 

<h2> Unit testing </h2> 

It is <b><span style="color:red">important that you test your submission files before submitting them</span></b>. All you have to do to make a submission is modify the file <code>submission.py</code> in the <code>starting_kit/example_submission/</code> directory, then run this test to make sure everything works fine. This is the actual program that will be run on the server to test your submission. 
<br>
Keep the sample code simple.
</div>
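
Before launching the full ingestion program, a quick static check can catch the most common mistake, namely a submission file that does not define a class named `Submission`. The sketch below is not part of the starting kit: `defines_submission` is a hypothetical helper that uses the standard `ast` module, and the requirement that the class has an `act` method is inferred from the example agent in this notebook.

```python
import ast


def defines_submission(source):
    """Return True if `source` defines a top-level class Submission with an act method."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == 'Submission':
            return any(isinstance(item, ast.FunctionDef) and item.name == 'act'
                       for item in node.body)
    return False


# Toy source texts standing in for the content of a submission.py file
good = "class Submission:\n    def act(self, observation):\n        return None\n"
bad = "class MyAgent:\n    def act(self, observation):\n        return None\n"
print(defines_submission(good), defines_submission(bad))
```

On a real file, the same helper can be applied to the text read from `example_submission/submission.py`. It only inspects the source and does not replace the ingestion test below.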

In [9]:
!python $problem_dir/ingestion.py $input_dir $input_dir/res $problem_dir $model_dir

input dir: /home/nicolas/test/Grid2/starting_kit/public_data
output dir: /home/nicolas/test/Grid2/starting_kit/public_data/res
program dir: /home/nicolas/test/Grid2/starting_kit/ingestion_program
submission dir: /home/nicolas/test/Grid2/starting_kit/example_submission
input content ['medium', '__pycache__', 'reward_signal.py', 'hard', 'level0', 'easy', 'res']
output content ['runner.log']
program content ['data_io.py', '__pycache__', 'data_manager.py', 'data_converter.py', 'ingestion.py', 'metadata']
submission content ['my_agents.py', '__pycache__', 'submission.py', 'baseline_agents.py', 'metadata']
Using custom reward signal CustomRewardSignal of file /home/nicolas/test/Grid2/starting_kit/public_data/reward_signal.py

                     GAME PARAMETERS
    hard_overflow_coefficient: 1.0
    loadflow_backend: pypower
    loadflow_mode: AC
    max_number_loads_game_over: 6
    max_number_prods_game_over: 3
    max_seconds_per_timestep: 1.0
    n_timesteps_consecutive_soft_overflow_br

reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.3412070082170262
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.33648028384029344
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.33216499985153103
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3280802749641189
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.3247692225875993
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.32272830561402144
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.31886850845293485
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.3165906741897905
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.3134893072359359
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.31097161411875673
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.30799301128774004
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.30490406048699104
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.30286251079409715
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.299420432574864
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.29568

reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5780354019331515
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5827777309968438
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5884401444336843
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5914332940566519
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5937222360757902
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5921248166681169
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5921570460446898
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5892366487764729
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5875356584073027
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5844260758169285
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5811511274016746
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5767142753728414
reward: [ -0.00, -0.00, 0.00, -0.00, -0.57 ] = -0.573493492135135
reward: [ -0.00, -0.00, 0.00, -0.00, -0.57 ] = -0.5672427429579638
reward: [ -0.00, -0.00, 0.00, -0.00, -0.56 ] = -0.5619744354403

reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.3035709309115596
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.31056534306499506
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.31916940916816056
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3289237604156333
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.3376558800306794
reward: [ -0.00, -0.00, 0.00, -0.00, -0.35 ] = -0.3456116845403208
reward: [ -0.00, -0.00, 0.00, -0.00, -0.35 ] = -0.3528589265444484
reward: [ -0.00, -0.00, 0.00, -0.00, -0.36 ] = -0.3583873673735222
reward: [ -0.00, -0.00, 0.00, -0.00, -0.36 ] = -0.3647631845023519
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.37201868356454115
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.379563666922854
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.38966849346400206
reward: [ -0.00, -0.00, 0.00, -0.00, -0.40 ] = -0.4004675593977906
reward: [ -0.00, -0.00, 0.00, -0.00, -0.41 ] = -0.41300931938113644
reward: [ -0.00, -0.00, 0.00, -0.00, -0.43 ] = -0.42811600

reward: [ -0.00, -0.00, 0.00, -0.00, -0.62 ] = -0.6161222351854103
reward: [ -0.00, -0.00, 0.00, -0.00, -0.62 ] = -0.617101389974894
reward: [ -0.00, -0.00, 0.00, -0.00, -0.62 ] = -0.616998030762571
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.6146399611436636
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.614239718878615
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.614179537387279
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.6124798916359451
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.6096010184264727
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.6083819142253801
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.605817231006342
reward: [ -0.00, -0.00, 0.00, -0.00, -0.60 ] = -0.6040493266942694
reward: [ -0.00, -0.00, 0.00, -0.00, -0.60 ] = -0.59984369156307
reward: [ -0.00, -0.00, 0.00, -0.00, -0.60 ] = -0.5972424796558555
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5933706078770313
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5905430088530624
re

reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3934505274670742
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3906880219341179
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3882091273996838
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3880181696875712
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3873463838632198
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.38286196261524835
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.381084114942165
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.37661511638826667
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.37098204202826185
reward: [ -0.00, -0.00, 0.00, -0.00, -0.36 ] = -0.3646764057626064
reward: [ -0.00, -0.00, 0.00, -0.00, -0.36 ] = -0.35801519763363243
reward: [ -0.00, -0.00, 0.00, -0.00, -0.35 ] = -0.35102339707994035
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.3439129569501285
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.33874559342974603
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3340334

reward: [ -0.00, -0.00, 0.00, -0.00, -0.40 ] = -0.3959537786308647
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3938704313245944
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3901648211357968
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.3860584076150907
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.38325897707028067
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3813162241549944
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3808769463321285
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3810731485067732
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3826039442885082
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3840173143033173
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.38395617392663617
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.38296406500347896
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3809979739538651
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3785360980045292
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.374019583

reward: [ -0.00, -0.00, 0.00, -0.00, -0.27 ] = -0.27281266906007245
reward: [ -0.00, -0.00, 0.00, -0.00, -0.27 ] = -0.2744186008085885
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2767603778912382
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.28234549316725505
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.29130384065089177
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.2993746141461543
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.30886971858027984
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.3161706751539632
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.31817749381896027
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.31843470548160635
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.31443636856220225
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.3066916626087776
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.2999623201699581
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.29274295247960275
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.2875

reward: [ -0.00, -0.00, 0.00, -0.00, -0.18 ] = -0.17792961917780303
reward: [ -0.00, -0.00, 0.00, -0.00, -0.18 ] = -0.17546064561465116
reward: [ -0.00, -0.00, 0.00, -0.00, -0.17 ] = -0.17200113900312125
reward: [ -0.00, -0.00, 0.00, -0.00, -0.17 ] = -0.16904867411047356
reward: [ -0.00, -0.00, 0.00, -0.00, -0.17 ] = -0.1675274805688046
reward: [ -0.00, -0.00, 0.00, -0.00, -0.17 ] = -0.1659445159919406
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16469637160117945
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16346287892193387
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.1620327543837718
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.1614872683898152
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16055463458101796
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15965411622475853
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15806773411732827
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15748246628086035
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15

<div style="background:#FFFFAA">
Also test the scoring program:
    </div>

In [10]:
scoring_output_dir = 'output'
!python $score_dir/evaluate.py $input_dir $scoring_output_dir

public_data/
output
step : 1000, cumulative rewards : -453.92


<div style="background:#FFFFAA">
    <h1> Preparing the submission </h1>

Zip the contents of `sample_code_submission/` (without the directory), or download the challenge public_data and run the command in the previous cell, after replacing sample_data by public_data.
Then zip the contents of `sample_result_submission/` (without the directory).
<b><span style="color:red">Do NOT zip the data with your submissions</span></b>.
</div>

In [11]:
import datetime 
from data_io import zipdir
the_date = datetime.datetime.now().strftime("%y-%m-%d-%H-%M")
sample_code_submission = 'sample_code_submission_' + the_date + '.zip' 
zipdir(sample_code_submission, model_dir) 
print("Submit one of these files:\n" + sample_code_submission + "\n")

Submit one of these files:
sample_code_submission_19-03-07-23-04.zip
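
Zipping the contents "without the directory" means that the files sit at the root of the archive rather than inside a top-level folder. The sketch below illustrates this with the standard `zipfile` module on a throwaway directory. `zip_contents` is a hypothetical helper written for this illustration; it is not the kit's `zipdir` function from `data_io`, whose implementation is not shown here.

```python
import os
import tempfile
import zipfile


def zip_contents(zip_path, directory):
    """Zip the files under `directory`, storing paths relative to it (no top-level folder)."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, arcname=os.path.relpath(full_path, directory))


# Demonstration on a temporary directory containing a single placeholder file
with tempfile.TemporaryDirectory() as tmp:
    submission = os.path.join(tmp, 'example_submission')
    os.makedirs(submission)
    with open(os.path.join(submission, 'submission.py'), 'w') as handle:
        handle.write('# agent code goes here\n')
    zip_path = os.path.join(tmp, 'submission.zip')
    zip_contents(zip_path, submission)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
print(names)
```

The archive lists `submission.py` directly, without an `example_submission/` prefix. Listing the archive with `namelist()` before uploading is a cheap way to confirm both the layout and that no data files were included.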

