<div style="background:#FFFFAA">
<img src="./logo.jpg" width=150 ALIGN="left" border=20>
<h1>L2RPN Starting Kit </h1> 

<br>This code was tested with <br>
Python 3.6.6 |Anaconda custom (64-bit)| (default, Nov 2018, 11:07:29) (https://anaconda.org/)<br>
<i> Adapted for Chalab by Isabelle Guyon from original code of Balázs Kégl</i> <br>
<a href="http://www.datascience-paris-saclay.fr">Paris Saclay Center for Data Science (CDS)</a>
</center>
<p>
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". The CDS, CHALEARN, AND/OR OTHER ORGANIZERS OR CODE AUTHORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL AUTHORS AND ORGANIZERS BE LIABLE FOR ANY SPECIAL, 
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. 
</div>

<div style="background:#FFFFAA">
    <h2>Introduction </h2>
    <p> 
     <br>
The goal of this challenge is to apply Reinforcement Learning to power grid management by designing RL agents that automate the control of the power grid. The data used in this challenge come from <a href="https://github.com/MarvinLer/pypownet">pypownet</a>, a simulator made by Marvin Lerousseau that can emulate a power grid of any size and electrical properties, subject to a set of temporal injections over discretized time-steps.

References and credits: <br>
pypownet was created by Marvin Lerousseau. The competition protocol was designed by Isabelle Guyon. Our mentors are Balthazar Donon and Antoine Marot. Pypownet, 2017. https://github.com/MarvinLer/pypownet. The baseline methods were inspired by work performed by Kimang Khun.
 <br> 
</div>

In [2]:
model_dir = 'example_submission/'
problem_dir = 'ingestion_program/'  
score_dir = 'scoring_program/'
input_dir = 'public_data/'
output_dir = 'output/'
from sys import path
path.extend([model_dir, problem_dir, score_dir, input_dir, output_dir])
%matplotlib inline
# The next two lines enable auto-reloading of libraries (this can cause problems with pickles in Python 3);
# comment them out if needed
%load_ext autoreload
%autoreload 2
import seaborn as sns; sns.set()
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


<div style="background:#FFFFAA">
    <h1> Step 1: Exploratory data analysis </h1>
<p>
The starting kit includes public data in the `public_data/` directory, with one subdirectory per difficulty level.
    <br>
</div>

## Electrical grid
<div >
<img src="./ExampleGrid.JPG" width=750 ALIGN="left" border=20>
    <br>
    <br>
(courtesy of Marvin Lerousseau)
</div>

During the challenge, the agent controls a grid of 14 substations (nodes) connected by 20 power lines.

The following example uses the hard level, in which the grid has 11 loads (consumers) and 5 productions (generators). Only the data for January are shown.
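Because switching lines off changes the topology, part of the grid (and the loads or productions attached to it) can end up cut off from the rest. The sketch below is a hypothetical helper, not part of pypownet: it treats substations as nodes and connected lines as edges, and lists the connected components with a breadth-first search. The 4-node grid in the usage example is a toy, not the challenge grid.

```python
from collections import deque


def connected_components(n_nodes, lines, line_status):
    """Return a list of sets of node ids, one set per connected component.

    n_nodes: number of substations, labeled 0..n_nodes-1.
    lines: list of (origin, extremity) node pairs, one per power line.
    line_status: list of booleans, True if the line is connected.
    """
    adjacency = {node: [] for node in range(n_nodes)}
    for (a, b), online in zip(lines, line_status):
        if online:  # disconnected lines do not link their endpoints
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen = set()
    components = []
    for start in range(n_nodes):
        if start in seen:
            continue
        # Breadth-first search from every node not visited yet
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Toy 4-node example (hypothetical, not the challenge grid)
toy_lines = [(0, 1), (1, 2), (2, 3), (0, 2)]
print(connected_components(4, toy_lines, [True, True, True, True]))   # [{0, 1, 2, 3}]
print(connected_components(4, toy_lines, [True, True, False, True]))  # [{0, 1, 2}, {3}]
```

More than one component after an action means some substation, and anything attached to it, has been isolated.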

In [3]:
data_dir = 'public_data/hard'              # Change this to the directory where you put the input data
!ls $data_dir*

chronics  configuration.yaml  reference_grid.m


For convenience, the data can be loaded as "pandas" data frames and explored with "pandas".
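The loading code itself is not included in this notebook. As a minimal, dependency-free sketch, the helpers below read every `.csv` file of one chronic folder into a dictionary of columns. The folder layout and the `;` separator are assumptions to check against the files under `public_data/hard/chronics`; with pandas, the equivalent for a single file would be `pandas.read_csv(path, sep=';')`, which returns a DataFrame.

```python
import csv
import os


def _to_number(value):
    """Convert a cell to float when possible, otherwise keep the raw string (e.g. dates)."""
    try:
        return float(value)
    except ValueError:
        return value


def load_csv_columns(path, sep=';'):
    """Read a delimited file with a header row into {column_name: [values, ...]}."""
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=sep)
        header = next(reader)
        columns = {name: [] for name in header}
        for row in reader:
            if not row:
                continue  # skip blank lines
            for name, value in zip(header, row):
                columns[name].append(_to_number(value))
    return columns


def load_chronic(chronic_dir, sep=';'):
    """Load every .csv file found in chronic_dir, keyed by file name (assumed layout)."""
    return {
        name: load_csv_columns(os.path.join(chronic_dir, name), sep=sep)
        for name in sorted(os.listdir(chronic_dir))
        if name.endswith('.csv')
    }
```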

<div style="background:#FFFFAA">
<h1>Step 2: Building an Agent</h1>
</div>

<div style="background:#FFFFAA">
    <h2>Loading data with pypownet</h2>
    <p>
We reload the data with pypownet's environment class.
   <br>
    
To win, the flow in every line has to stay below its threshold (the thermal limit). Above this threshold, the line overheats, and after a certain amount of overheating, it breaks. The thermal limits are already defined in pypownet.
</div>
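The overheating rule can be made concrete with a small simulation. This is a hypothetical sketch, not pypownet's implementation: it assumes, as the agent code later in this notebook does, that a capacity usage above 1 means the flow exceeds the thermal limit, and it uses a made-up parameter `max_consecutive` for the number of consecutive overloaded timesteps a line tolerates before breaking. The actual rules and values are defined by pypownet.

```python
def update_overflow_counters(counters, capacity_usage, max_consecutive=3):
    """Advance per-line overflow counters by one timestep.

    counters: consecutive overloaded timesteps seen so far, one int per line.
    capacity_usage: flow / thermal limit, one float per line (> 1.0 means overload).
    max_consecutive: hypothetical tolerance before an overloaded line breaks.
    Returns (new_counters, broken_line_ids).
    """
    new_counters = []
    broken = []
    for line_id, (count, usage) in enumerate(zip(counters, capacity_usage)):
        count = count + 1 if usage > 1.0 else 0  # reset as soon as the line cools down
        if count >= max_consecutive:
            broken.append(line_id)
            count = 0  # assumption: a broken line starts over once it is reconnected
        new_counters.append(count)
    return new_counters, broken


history = [[1.2, 0.5, 1.1], [1.3, 0.6, 0.9], [1.1, 0.7, 1.2]]
counters = [0, 0, 0]
for t, usage in enumerate(history):
    counters, broken = update_overflow_counters(counters, usage)
    print(t, counters, broken)
# 0 [1, 0, 1] []
# 1 [2, 0, 0] []
# 2 [0, 0, 1] [0]
```

Line 2 survives because its overload is interrupted at timestep 1, while line 0 stays overloaded long enough to break.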

In [4]:
import os
import pypownet.environment
import pypownet.runner
data_dir = 'public_data'  
environment = pypownet.environment.RunEnv(parameters_folder=os.path.abspath(data_dir),
                                              game_level="hard",
                                              chronic_looping_mode='natural', start_id=0,
                                              game_over_mode="soft")

Using custom reward signal CustomRewardSignal of file /home/slaerd/Grid/starting_kit/public_data/reward_signal.py


<div style="background:#FFFFAA">
    <h2>Building an agent</h2>
    <p>
We provide example agents (for reinforcement learning) in the `starting_kit/example_submission` directory. The default agent is deliberately trivial: it does nothing. Replace it with your own agent.
    </div>
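When replacing the do-nothing agent, it helps to keep the decision rule separate from the pypownet plumbing so that it can be tested on plain lists. The function below is a hypothetical example of such a rule: it returns the overloaded lines (capacity usage above `threshold`), most loaded first, optionally capped at `max_switches` per timestep. The cap is an illustrative design choice, not part of the provided agents.

```python
def lines_to_switch(capacity_usage, threshold=1.0, max_switches=None):
    """Return ids of lines whose capacity usage exceeds threshold, most loaded first.

    capacity_usage: sequence of floats, one per line (assumed to be flow / thermal limit).
    max_switches: optional cap on the number of line ids returned.
    """
    overloaded = [i for i, usage in enumerate(capacity_usage) if usage > threshold]
    overloaded.sort(key=lambda i: capacity_usage[i], reverse=True)
    if max_switches is not None:
        overloaded = overloaded[:max_switches]
    return overloaded


print(lines_to_switch([0.5, 1.2, 1.05, 0.99]))                  # [1, 2]
print(lines_to_switch([0.5, 1.2, 1.05, 0.99], max_switches=1))  # [1]
```

Inside an agent's `act` method, each returned id would then be passed to `action_space.set_lines_status_switch_from_id(action=action, line_id=i, new_switch_value=1)`, as in the agent code further down.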

## Scoring the results of an agent

<div style="background:#FFFFAA">
    <br>
    <p>
<b>The metric chosen for your challenge</b> is identified in the "metric.txt" file found in the `scoring_program/` directory. The function "get_metric" first searches for a metric with that name in my_metric.py, then in libscores.py, then in sklearn.metrics.
    <br>
The aim of a reinforcement learning problem is to maximize the cumulative reward.

When the agent is run, two values are returned: the first is the reward of the last timestep, and the second is the cumulative reward over all iterations of the run. The reward indicates whether or not the game is heading toward a game over.

Specifically, our reward function is composed of 5 subrewards. They describe the proportion of isolated productions, the proportion of isolated loads, the cost of the action, the amount of change between the current grid and the initial grid, and the usage of the lines' capacity.
    </div>
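As in the agent code below (`reward = sum(reward_aslist)`), the reward of one timestep is the sum of its 5 subrewards, and the cumulative reward is the running total over timesteps. A minimal sketch with made-up subreward values:

```python
from itertools import accumulate


def step_reward(subrewards):
    """Reward of one timestep: the sum of its subrewards."""
    return sum(subrewards)


def cumulative_rewards(per_step_subrewards):
    """Running total of the per-step rewards, one value per timestep."""
    return list(accumulate(step_reward(s) for s in per_step_subrewards))


# Made-up values for three timesteps (5 subrewards each)
steps = [
    [0.0, 0.0, 0.0, 0.0, -0.25],
    [0.0, 0.0, 0.0, 0.0, -0.25],
    [-0.5, 0.0, 0.0, 0.0, 0.0],
]
print(cumulative_rewards(steps))  # [-0.25, -0.5, -1.0]
```

The last element plays the role of the cumulative reward of the run, and `step_reward(steps[-1])` that of the reward of the last timestep.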

In [5]:
from scoring_program import libscores
from libscores import get_metric
metric_name, scoring_function = get_metric()
print('Using scoring metric:', metric_name)
# Uncomment the next line to display the code of the scoring metric
#??scoring_function

Using scoring metric: reward
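The lookup order of `get_metric` can be pictured as a fallback chain over modules. The sketch below is illustrative only, not the code of `libscores.get_metric`: `read_metric_name` and `find_metric` are hypothetical helpers, and the fake modules stand in for my_metric.py and libscores.py.

```python
import types


def read_metric_name(metric_file):
    """Read the metric name stored in a metric.txt-style file."""
    with open(metric_file) as f:
        return f.read().strip()


def find_metric(metric_name, modules):
    """Return (module_name, function) from the first module that defines metric_name."""
    for module in modules:
        func = getattr(module, metric_name, None)
        if callable(func):
            return module.__name__, func
    raise AttributeError('metric %r not found in %s'
                         % (metric_name, [m.__name__ for m in modules]))


# Stand-ins for my_metric.py and libscores.py (sklearn.metrics would come last)
my_metric = types.ModuleType('my_metric')
libscores_stub = types.ModuleType('libscores')
libscores_stub.reward = lambda total: total
print(find_metric('reward', [my_metric, libscores_stub])[0])  # libscores
```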


In [6]:
import time
start = time.time()
# Place the code you want to time between these two calls
end = time.time()
print(end-start)

1.621246337890625e-05


In [7]:
import pypownet.agent
import pypownet.environment


class CustomAgent(pypownet.agent.Agent):
    """
    A simple rule-based controller: at each timestep, it switches the status of every power line whose
    capacity usage exceeds 1 (i.e. it disconnects overloaded lines). It does not learn.
    """

    def __init__(self, environment):
        super().__init__(environment)
        self.verbose = True

    def act(self, observation):
        # Sanity check: an observation is a structured object defined in the environment file.
        assert isinstance(observation, pypownet.environment.Observation)
        action_space = self.environment.action_space

        # Create template of action with no switch activated (do-nothing action)
        action = action_space.get_do_nothing_action()

        # Switch the status of every overloaded line (capacity usage above 1)
        lines_load = observation.get_lines_capacity_usage()
        nb_lines = len(lines_load)
        assert nb_lines == action_space.lines_status_subaction_length
        for i in range(nb_lines):
            if lines_load[i] > 1:
                action_space.set_lines_status_switch_from_id(action=action, line_id=i, new_switch_value=1)
                if self.verbose:
                    print('Action chosen: switching status of line %d' % i)

        # Test the reward of the chosen action on the environment
        reward_aslist = self.environment.simulate(action, do_sum=False)
        reward = sum(reward_aslist)
        if self.verbose:
            print('reward: [', ', '.join(['%.2f' % c for c in reward_aslist]), '] =', reward)

        return action

    # No learning: feed_reward is not overridden (the base implementation does nothing)

In [8]:
import logging
import sys
import time
start = time.time()
NUMBER_ITERATIONS = 50

submission_dir = 'example_submission'
sys.path.append(submission_dir)

if not os.path.exists(output_dir):
    os.makedirs(output_dir)
log_path = os.path.abspath(os.path.join(output_dir, 'runner.log'))


open(log_path, 'w').close()
submitted_controler = CustomAgent(environment)
# Instantiate a runner that saves the run statistics in the log_path file, to be parsed and processed
# by the scoring program
phase_runner = pypownet.runner.Runner(environment, submitted_controler, verbose=True, vverbose=False,
                                      log_filepath=log_path)
phase_runner.ch.setLevel(logging.ERROR)
# Run the planned experiment of this phase with the submitted model
score = phase_runner.loop(iterations=NUMBER_ITERATIONS)
print("cumulative rewards : {}".format(score))
end = time.time()
print(end-start)

reward: [ -0.00, -0.00, 0.00, -0.00, -0.25 ] = -0.24796949553161035
reward: [ -0.00, -0.00, 0.00, -0.00, -0.23 ] = -0.2336597951442309
reward: [ -0.00, -0.00, 0.00, -0.00, -0.22 ] = -0.21913744390058015
reward: [ -0.00, -0.00, 0.00, -0.00, -0.20 ] = -0.20384386816027014
reward: [ -0.00, -0.00, 0.00, -0.00, -0.19 ] = -0.19069972264196308
reward: [ -0.00, -0.00, 0.00, -0.00, -0.18 ] = -0.17744492936380044
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.1636797869892105
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.1521856844563219
reward: [ -0.00, -0.00, 0.00, -0.00, -0.14 ] = -0.1421617684159357
reward: [ -0.00, -0.00, 0.00, -0.00, -0.13 ] = -0.1349605806565615
reward: [ -0.00, -0.00, 0.00, -0.00, -0.13 ] = -0.12884976842121523
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.12406685688698502
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.12088946521997883
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.11828113986188321
reward: [ -0.00, -0.00, 0.00, -0.00, -0.12 ] = -0.115
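To check a run afterwards, the verbose printout above can be parsed back into numbers. This is a hypothetical helper that relies only on the `reward: [ ... ] = total` format shown in this output; it ignores truncated lines, and the format of `runner.log` may differ.

```python
import re

# Matches lines such as "reward: [ -0.00, -0.00, 0.00, -0.00, -0.25 ] = -0.2479..."
REWARD_LINE = re.compile(r'^reward: \[[^\]]*\] = (-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)$')


def parse_rewards(lines):
    """Extract the per-timestep total reward from the agent's verbose printout."""
    totals = []
    for line in lines:
        match = REWARD_LINE.match(line.strip())
        if match:
            totals.append(float(match.group(1)))
    return totals


sample = [
    "reward: [ -0.00, -0.00, 0.00, -0.00, -0.25 ] = -0.25",
    "reward: [ -0.00, -0.00, 0.00, -0.",  # truncated line, ignored
]
print(parse_rewards(sample))  # [-0.25]
```

Summing the parsed values gives the cumulative reward over the printed timesteps.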

<div style="background:#FFFFAA">
    <b>Save the best agent:</b> it should be a class named <code>Submission</code>, saved in "example_submission/submission.py". Uncomment the line <i>%%writefile example_submission/submission.py</i> to save the agent.
</div>

In [9]:
#%%writefile example_submission/submission.py
import pypownet.agent
import pypownet.environment
import numpy as np
import os

class Submission(pypownet.agent.Agent):
    """
    A simple rule-based controller: at each timestep, it switches the status of every power line whose
    capacity usage exceeds 1 (i.e. it disconnects overloaded lines). It does not learn.
    """

    def __init__(self, environment):
        super().__init__(environment)
        self.verbose = True

    def act(self, observation):
        # Sanity check: an observation is a structured object defined in the environment file.
        assert isinstance(observation, pypownet.environment.Observation)
        action_space = self.environment.action_space

        # Create template of action with no switch activated (do-nothing action)
        action = action_space.get_do_nothing_action()

        # Switch the status of every overloaded line (capacity usage above 1)
        lines_load = observation.get_lines_capacity_usage()
        nb_lines = len(lines_load)
        assert nb_lines == action_space.lines_status_subaction_length
        for i in range(nb_lines):
            if lines_load[i] > 1:
                action_space.set_lines_status_switch_from_id(action=action, line_id=i, new_switch_value=1)
                if self.verbose:
                    print('Action chosen: switching status of line %d' % i)

        # Test the reward of the chosen action on the environment
        reward_aslist = self.environment.simulate(action, do_sum=False)
        reward = sum(reward_aslist)
        if self.verbose:
            print('reward: [', ', '.join(['%.2f' % c for c in reward_aslist]), '] =', reward)

        return action

    # No learning: feed_reward is not overridden (the base implementation does nothing)

<div style="background:#FFFFAA">
<h1> Step 3: Making a submission </h1> 

<h2> Unit testing </h2> 

It is <b><span style="color:red">important that you test your submission files before submitting them</span></b>. All you have to do to make a submission is modify the file <code>submission.py</code> in the <code>starting_kit/example_submission/</code> directory, then run this test to make sure everything works fine. This is the actual program that will be run on the server to test your submission. 
<br>
Keep the sample code simple.
</div>

In [10]:
!python $problem_dir/ingestion.py $input_dir $input_dir/res $problem_dir $model_dir

input dir: /home/slaerd/Grid/starting_kit/public_data
output dir: /home/slaerd/Grid/starting_kit/public_data/res
program dir: /home/slaerd/Grid/starting_kit/ingestion_program
submission dir: /home/slaerd/Grid/starting_kit/example_submission
input content ['__pycache__', 'easy', 'hard', 'level0', 'medium', 'res', 'reward_signal.py']
output content ['runner.log']
program content ['__pycache__', 'data_converter.py', 'data_io.py', 'data_manager.py', 'ingestion.py', 'metadata']
submission content ['__pycache__', 'baseline_agents.py', 'metadata', 'my_agents.py', 'submission.py']
Using custom reward signal CustomRewardSignal of file /home/slaerd/Grid/starting_kit/public_data/reward_signal.py
log file path /home/slaerd/Grid/starting_kit/public_data/res/runner.log
reward: [ -0.00, -0.00, 0.00, -0.00, -0.35 ] = -0.3523809517091301
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.3428478104433785
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3306026483518817
reward: [ -0.00, -0.00, 0.00, -0.

reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.3109716141187567
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.30799301128774004
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.30490406048699104
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.3028625107940971
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.299420432574864
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.29568448311992246
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.29308005586688357
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.2901857416227346
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.2867154049608963
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2839368818906478
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.28158733778484746
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2798794305139686
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2778594408764464
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2755322652753245
reward: [ -0.00, -0.00, 0.00, -0.00, -0.27 ] = -0.27329223

reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.581151127401675
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.576714275372841
reward: [ -0.00, -0.00, 0.00, -0.00, -0.57 ] = -0.5734934921351349
reward: [ -0.00, -0.00, 0.00, -0.00, -0.57 ] = -0.5672427429579637
reward: [ -0.00, -0.00, 0.00, -0.00, -0.56 ] = -0.5619744354403654
reward: [ -0.00, -0.00, 0.00, -0.00, -0.55 ] = -0.553533392283193
reward: [ -0.00, -0.00, 0.00, -0.00, -0.55 ] = -0.547657797937016
reward: [ -0.00, -0.00, 0.00, -0.00, -0.54 ] = -0.5406820136926078
reward: [ -0.00, -0.00, 0.00, -0.00, -0.53 ] = -0.5335119956193433
reward: [ -0.00, -0.00, 0.00, -0.00, -0.53 ] = -0.5303563260869693
reward: [ -0.00, -0.00, 0.00, -0.00, -0.52 ] = -0.5219861914223489
reward: [ -0.00, -0.00, 0.00, -0.00, -0.52 ] = -0.5162703685758189
reward: [ -0.00, -0.00, 0.00, -0.00, -0.51 ] = -0.5126806317171041
reward: [ -0.00, -0.00, 0.00, -0.00, -0.51 ] = -0.5068555959459712
reward: [ -0.00, -0.00, 0.00, -0.00, -0.50 ] = -0.5028036313590798

reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.37956366692285404
reward: [ -0.00, -0.00, 0.00, -0.00, -0.39 ] = -0.38966849346400184
reward: [ -0.00, -0.00, 0.00, -0.00, -0.40 ] = -0.4004675593977906
reward: [ -0.00, -0.00, 0.00, -0.00, -0.41 ] = -0.41300931938113605
reward: [ -0.00, -0.00, 0.00, -0.00, -0.43 ] = -0.42811600643018843
reward: [ -0.00, -0.00, 0.00, -0.00, -0.44 ] = -0.44356827353913897
reward: [ -0.00, -0.00, 0.00, -0.00, -0.46 ] = -0.45759136598821987
reward: [ -0.00, -0.00, 0.00, -0.00, -0.47 ] = -0.4699591175056855
reward: [ -0.00, -0.00, 0.00, -0.00, -0.49 ] = -0.4854711877525568
reward: [ -0.00, -0.00, 0.00, -0.00, -0.50 ] = -0.4983569462396026
reward: [ -0.00, -0.00, 0.00, -0.00, -0.51 ] = -0.5087093151093603
reward: [ -0.00, -0.00, 0.00, -0.00, -0.52 ] = -0.5203323324484452
reward: [ -0.00, -0.00, 0.00, -0.00, -0.53 ] = -0.5314887172423381
reward: [ -0.00, -0.00, 0.00, -0.00, -0.55 ] = -0.5455945964241892
reward: [ -0.00, -0.00, 0.00, -0.00, -0.56 ] = -0.556379

reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.6083819142253803
reward: [ -0.00, -0.00, 0.00, -0.00, -0.61 ] = -0.6058172310063421
reward: [ -0.00, -0.00, 0.00, -0.00, -0.60 ] = -0.6040493266942694
reward: [ -0.00, -0.00, 0.00, -0.00, -0.60 ] = -0.5998436915630699
reward: [ -0.00, -0.00, 0.00, -0.00, -0.60 ] = -0.5972424796558556
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5933706078770313
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5905430088530623
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5888444357203081
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.587199978996801
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5855074602229582
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5846624451568957
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5855074602229582
reward: [ -0.00, -0.00, 0.00, -0.00, -0.58 ] = -0.5846324529722529
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5855222468252094
reward: [ -0.00, -0.00, 0.00, -0.00, -0.59 ] = -0.5891390502433

reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3766151163882667
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.37098204202826185
reward: [ -0.00, -0.00, 0.00, -0.00, -0.36 ] = -0.3646764057626063
reward: [ -0.00, -0.00, 0.00, -0.00, -0.36 ] = -0.3580151976336323
reward: [ -0.00, -0.00, 0.00, -0.00, -0.35 ] = -0.35102339707994035
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.3439129569501285
reward: [ -0.00, -0.00, 0.00, -0.00, -0.34 ] = -0.3387455934297459
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3340334307324269
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3309267414097954
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.3274585277069998
reward: [ -0.00, -0.00, 0.00, -0.00, -0.33 ] = -0.32591792055352087
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.32311008566985544
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.32007198819419763
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.3177855442113562
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.3172010

reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.381073148506773
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3826039442885082
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3840173143033173
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.38395617392663617
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3829640650034789
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3809979739538649
reward: [ -0.00, -0.00, 0.00, -0.00, -0.38 ] = -0.3785360980045293
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.3740195831383303
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.3698890118484136
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.3685543204697926
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.36627694056413945
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.3650098285369734
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.3654237311203524
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.3666623631720627
reward: [ -0.00, -0.00, 0.00, -0.00, -0.37 ] = -0.36762188918

reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.3088697185802798
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.3161706751539632
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.31817749381896027
reward: [ -0.00, -0.00, 0.00, -0.00, -0.32 ] = -0.31843470548160635
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.3144363685622022
reward: [ -0.00, -0.00, 0.00, -0.00, -0.31 ] = -0.3066916626087776
reward: [ -0.00, -0.00, 0.00, -0.00, -0.30 ] = -0.29996232016995805
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.29274295247960264
reward: [ -0.00, -0.00, 0.00, -0.00, -0.29 ] = -0.28750520994851925
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2833138710002281
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2807400528911487
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.2781960992421501
reward: [ -0.00, -0.00, 0.00, -0.00, -0.28 ] = -0.27568201005323206
reward: [ -0.00, -0.00, 0.00, -0.00, -0.27 ] = -0.2722149733487606
reward: [ -0.00, -0.00, 0.00, -0.00, -0.27 ] = -0.265593

reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.1634628789219339
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16203275438377174
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16148726838981511
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.16055463458101796
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15965411622475847
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15806773411732825
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15748246628086035
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15692621959791483
reward: [ -0.00, -0.00, 0.00, -0.00, -0.16 ] = -0.15588855986243907
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.15460227042830083
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.15365585506033724
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.15298678569374344
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.15190109712551794
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0.15105410771645467
reward: [ -0.00, -0.00, 0.00, -0.00, -0.15 ] = -0

<div style="background:#FFFFAA">
Also test the scoring program:
    </div>

In [11]:
scoring_output_dir = 'output'
!python $score_dir/evaluate.py $input_dir $scoring_output_dir

public_data/
output
step : 1000, cumulative rewards : -453.92


<div style="background:#FFFFAA">
    <h1> Preparing the submission </h1>

Zip the contents of `example_submission/` (without the directory itself); the next cell does this for you. Alternatively, download the challenge public_data and run the ingestion command of the unit-testing cell above on it.
Then zip the contents of `sample_result_submission/` (without the directory).
<b><span style="color:red">Do NOT zip the data with your submissions</span></b>.
</div>

In [12]:
import datetime 
from data_io import zipdir
the_date = datetime.datetime.now().strftime("%y-%m-%d-%H-%M")
sample_code_submission = 'sample_code_submission_' + the_date + '.zip' 
zipdir(sample_code_submission, model_dir) 
print("Submit one of these files:\n" + sample_code_submission + "\n")

Submit one of these files:
sample_code_submission_19-03-08-16-25.zip
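`zipdir` comes from the kit's `data_io` module, and its exact behavior is not shown here. If you prefer to build the archive yourself, the sketch below uses the standard `zipfile` module to store the files of a directory with paths relative to that directory, so the archive does not contain the top-level folder. `zip_directory_contents` is a hypothetical name, and skipping `__pycache__` is a choice of this sketch. Write the archive outside `source_dir`, otherwise it would try to include itself.

```python
import os
import zipfile


def zip_directory_contents(zip_path, source_dir):
    """Zip every file under source_dir with paths relative to source_dir
    (the top-level directory itself is not stored)."""
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(source_dir):
            dirs[:] = [d for d in dirs if d != '__pycache__']  # skip bytecode caches
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, arcname=os.path.relpath(full_path, source_dir))
```

For example, `zip_directory_contents('my_submission.zip', 'example_submission/')` would produce an archive whose root contains `submission.py` directly.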

