<div>
<img src="./utils/LogoL2RPN.jpg" width="150" align="left" border="10">
<h1>L2RPN Starting Kit </h1> 


ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". THE CDS, CHALEARN, AND/OR OTHER ORGANIZERS OR CODE AUTHORS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO EVENT SHALL AUTHORS AND ORGANIZERS BE LIABLE FOR ANY SPECIAL, 
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE. 
</div>

<div>
    <h2>Introduction </h2>
    <p> 
     <br>
The goal of this challenge is to use Reinforcement Learning approaches to manage a powergrid: the Reinforcement Learning agents will have to automate the control of the powergrid. We use the power network simulator <a href="https://github.com/rte-france/Grid2Op">Grid2Op</a>, which can emulate a powergrid of any size and its electrical properties given the injections (electricity production and consumption) at each time step.

References and credits: <br>
The creator of Grid2Op was Benjamin Donnot. The competition was designed by Isabelle Guyon, Antoine Marot, Benjamin Donnot and Balthazar Donon. Luca Veyrin, Camillo Romero, Marvin Lerousseau and Kimang Khun are distinguished contributors to the L2RPN challenge.
 <br> 
</div>

Import the required libraries:

In [None]:
import datetime
import json
import logging
import os
import subprocess
import sys
import warnings
import zipfile

# Auto-reload edited libraries (this can cause some problems with pickles in Python 3;
# comment out the next two lines if that happens)
%load_ext autoreload
%autoreload 2
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# import seaborn as sns; sns.set()
from grid2op import make

# Silence FutureWarning messages raised by the libraries used below
warnings.simplefilter(action='ignore', category=FutureWarning)

# Paths defined by the starting kit's local `utils` module
from utils import problem_dir, score_dir, input_data_check_dir, output_dir

Let's define a path for our submission:

In [None]:
model_dir = 'example_submissions/submission'

# Create the submission folder if it does not exist yet
os.makedirs(model_dir, exist_ok=True)

Note that the zipped file that you will upload to Codalab can have any name, but **the submission folder absolutely has to be named "submission", as we did here.**

<div>
    <h1> 1 - Loading the environment </h1>
</div>

The first time that you build the environment, Grid2Op will automatically download all the corresponding data (about 300 MB).

We will first create an agent and (re) explain some of the basics of grid2op.

In [None]:
env = make("l2rpn_case14_sandbox")

## Maintenance & Hazards

Grid2op handles maintenance and hazards in the scenarios that the agents will have to solve. However, in this sandbox "competition" there are no hazards and no maintenance operations. Even so, overloads can still occur and have to be managed.

<div>
<h1>2 - Building an Agent</h1>
</div>

We provide simple examples of agents in the `starting-kit/example_submissions` directory. We will show an example here using the simplest agent: the "do nothing" agent, which never modifies anything. To make your own agent, you should create a subclass of the `grid2op.Agent.BaseAgent.BaseAgent` class and implement your own `act` method as shown below.
    
**NB** For the real competition, a repository containing several baseline agents will be open source. We are actively working on it but are currently facing some open source license issues for this sandbox competition.

To make a submission on the challenge, **you must create a folder containing an `__init__.py` script that defines a `make_agent` function (and optionally a `reward` variable, as we will see later).**

The `make_agent` function takes two arguments, `env` and `submission_dir`, and must return an instance of your agent.

That instance (your agent) must implement an `act` method, which has to take the arguments `observation`, `reward` and `done` and must return the chosen action.
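To make this calling convention concrete, here is a minimal sketch of how an evaluation loop could use `make_agent` and `act`. It does not use Grid2Op: `FakeEnv` and `FakeActionSpace` are hypothetical stand-ins written for this illustration only, and the real scoring program may proceed differently.

```python
class FakeActionSpace:
    """Hypothetical stand-in for env.action_space: called with a dict, returns an 'action'."""

    def __call__(self, action_dict):
        return dict(action_dict)


class FakeEnv:
    """Hypothetical stand-in environment exposing only what this sketch needs."""

    def __init__(self, n_steps):
        self.action_space = FakeActionSpace()
        self.n_steps = n_steps
        self._t = 0

    def reset(self):
        self._t = 0
        return {"step": 0}

    def step(self, action):
        # Returns (observation, reward, done, info), ignoring the action
        self._t += 1
        done = self._t >= self.n_steps
        return {"step": self._t}, 1.0, done, {}


class DoNothing:
    """Agent with the required act(observation, reward, done) method."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation, reward, done):
        return self.action_space({})  # the empty dictionary means "do nothing"


def make_agent(env, submission_dir):
    """Entry point: takes the environment and the submission path, returns an agent."""
    return DoNothing(env.action_space)


def run_episode(env, agent):
    """Call agent.act at every step until the episode is done; return the step count."""
    observation, reward, done = env.reset(), 0.0, False
    n_steps = 0
    while not done:
        action = agent.act(observation, reward, done)
        observation, reward, done, _info = env.step(action)
        n_steps += 1
    return n_steps


env_stub = FakeEnv(n_steps=3)
agent = make_agent(env_stub, submission_dir=".")
steps_survived = run_episode(env_stub, agent)
print(steps_survived)
```

The point of the sketch is only the two signatures: `make_agent(env, submission_dir)` is called once, and `act(observation, reward, done)` is called at every time step.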

Your final submission folder **must** look like:
```bash
submission
├── some_other_script.py
├── __init__.py
```

In particular, **the folder must be named "submission"**.
The folder can also include other directories and any data that your scripts need.
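Before zipping, it can be handy to check these two requirements automatically. The helper below is a small sketch using only the standard library; `check_submission_layout` is a hypothetical name, not a function of the starting kit, and it only checks the folder name and the presence of `__init__.py`.

```python
import os
import tempfile


def check_submission_layout(folder):
    """Return a list of problems found in a submission folder (empty if none)."""
    problems = []
    # The folder itself must be called "submission"
    if os.path.basename(os.path.normpath(folder)) != "submission":
        problems.append('folder must be named "submission"')
    # The script read at evaluation time is __init__.py
    if not os.path.isfile(os.path.join(folder, "__init__.py")):
        problems.append("missing __init__.py")
    return problems


# Usage on throwaway directories
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "submission")
    os.makedirs(good)
    open(os.path.join(good, "__init__.py"), "w").close()

    bad = os.path.join(tmp, "my_agent")
    os.makedirs(bad)

    good_report = check_submission_layout(good)
    bad_report = check_submission_layout(bad)

print(good_report)
print(bad_report)
```

An empty list means that the folder passes both checks; it does not guarantee that `make_agent` itself is valid.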

It is also possible to use symbolic links that point to files or folders elsewhere. When the folder is zipped, each symbolic link is replaced by a copy of the file or folder it points to. This lets you keep working in your development directory and simply add symbolic links in your submission directory that point to your files or folders, instead of copying them by hand.

Once that folder is zipped, it will look like:
```bash
any_name.zip
├── submission
│   ├── some_other_script.py
│   ├── __init__.py
├── metadata
```
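As an illustration of how symbolic links end up as real files in the archive, here is a sketch based on the standard `zipfile` module. `zip_submission` is a hypothetical helper, not the starting kit's own zipping code, and it does not create the `metadata` entry shown above. Creating symbolic links may require extra privileges on Windows.

```python
import os
import tempfile
import zipfile


def zip_submission(submission_dir, zip_path):
    """Zip submission_dir under the top-level name "submission", following symlinks."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # followlinks=True makes os.walk descend into symlinked directories;
        # symlinked files are read through the link, so their content is copied.
        for root, _dirs, files in os.walk(submission_dir, followlinks=True):
            for name in files:
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, submission_dir)
                archive.write(full_path, os.path.join("submission", rel_path))
    return zip_path


# Usage: a development folder, and a submission folder that links to it
with tempfile.TemporaryDirectory() as tmp:
    dev_dir = os.path.join(tmp, "dev")
    sub_dir = os.path.join(tmp, "submission")
    os.makedirs(dev_dir)
    os.makedirs(sub_dir)
    with open(os.path.join(dev_dir, "helper.py"), "w") as f:
        f.write("VALUE = 1\n")
    open(os.path.join(sub_dir, "__init__.py"), "w").close()
    os.symlink(os.path.join(dev_dir, "helper.py"), os.path.join(sub_dir, "helper.py"))

    zip_path = zip_submission(sub_dir, os.path.join(tmp, "any_name.zip"))
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())
        helper_content = archive.read("submission/helper.py").decode()

print(names)
print(helper_content)
```

The archive contains `submission/helper.py` as a regular file holding the content of the linked file, so the zip no longer depends on the development directory.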

Below is an example. Let's create three scripts in our folder: `__init__.py`, `submission.py`, in which we will define the variables needed for the submission, and `agents.py`, in which we will define our agent.
We will name the folder `submission`, which is mandatory as discussed above.

**Agents**: This is the `agents.py` script where we define our agent class as a subclass of `grid2op.Agent.BaseAgent.BaseAgent`. It implements the `act` method. Here is an example of an agent that does nothing (it is equivalent to the `DoNothing` agent):

In [None]:
%%writefile example_submissions/submission/agents.py

from grid2op.Agent import BaseAgent

class MyAgent(BaseAgent):
    """
    The template to be used to create an agent: any controller of the power grid is expected to be a subclass of this
    grid2op.Agent.BaseAgent.
    """
    def __init__(self, action_space):
        """Initialize a new agent."""
        BaseAgent.__init__(self, action_space=action_space)

    def act(self, observation, reward, done):
        """The action that your agent will choose depending on the observation, the reward, and whether the state is terminal"""
        # do nothing for example (with the empty dictionary) :
        return self.action_space({})

**Variables for the submission**: In the `submission.py` script, we define the few variables that we need for the submission (the `make_agent` function and, if we want one, the `reward` variable, as explained later in this notebook).

In [None]:
%%writefile example_submissions/submission/submission.py

from .agents import MyAgent
from grid2op.Reward import ConstantReward

def make_agent(env, submission_dir):
    """
    This function will be used by codalab to create your agent. It should accept exactly an environment and a path
    to your submission directory and return a valid agent.
    """
    agent = MyAgent(env.action_space)
    return agent

# reward must be a subclass of grid2op.Reward.BaseReward.BaseReward:
reward = ConstantReward # you can also create your own reward class

**`__init__.py`**: This is the script that will be read by Codalab. Here we simply load the required variables for the submission:

In [None]:
%%writefile example_submissions/submission/__init__.py

from .submission import make_agent, reward

We can see that our folder is correctly set up:

In [None]:
os.listdir(model_dir)

## Baselines

We are actively working on some baselines that you will be able to reuse easily. Stay tuned on our discord server at https://discord.gg/cYsYrPT !

Now that we have correctly defined our agent, we will give a short explanation of how rewards work in grid2op, and then we will proceed to the submission.

## 3 - The Reward and the Score of an agent

### Reward
The Reward is the quantity that your agent will aim to maximize. This is a rather personal choice: you can choose any reward function that you think is adequate.

Grid2op allows for a large variety of such reward functions. You can visit [rewards in grid2op](https://grid2op.readthedocs.io/en/latest/reward.html) for more information about rewards.

In this competition, you can use any of the provided reward functions, or your own, to train your agent and to assess its performance. To do that, you need to have the reward class that you want to use as a `reward` variable in your `__init__.py`. That class must be a subclass of `grid2op.Reward.BaseReward.BaseReward`. See the official help for the competition on our discord https://discord.gg/cYsYrPT if you need some help. 

More examples will be provided for the complete competition.

### Score
The Score is the quantity that is used to compare your agent with the agents of the other participants.

To begin with, we will recall that transporting electricity always generates some energy losses $\mathcal{E}_{loss}(t)$ due to the Joule effect in resistive powerlines at any time $t$:
\begin{equation}
    \mathcal{E}_{loss}(t)=\sum\limits_{l=1}^{n_{l}}r_l*{y_l}(t)^2
\end{equation}
where $r_l$ is the resistance of powerline $l$, $y_l(t)$ is the current flowing through it at time $t$, and $n_l$ is the number of powerlines.
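The loss formula above is a plain sum over the powerlines. The sketch below evaluates it for one time step; the resistances and currents are made-up numbers chosen for illustration, not values from a real grid.

```python
def energy_loss(resistances, currents):
    """Joule losses at one time step: sum over powerlines of r_l * y_l(t)**2."""
    return sum(r * y ** 2 for r, y in zip(resistances, currents))


# Three hypothetical powerlines
resistances = [0.5, 2.0, 1.0]
currents = [2.0, 1.0, 3.0]

loss = energy_loss(resistances, currents)
print(loss)  # 0.5*4 + 2.0*1 + 1.0*9 = 13.0
```

Because the current is squared, the third line dominates the total even though its resistance is not the largest.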

At any time $t$, the operator of the grid is responsible for compensating those energy losses by purchasing on the energy market the corresponding amount of production at the marginal price $p(t)$. We can therefore define the following energy loss cost $c_{loss}(t)$:

\begin{equation}
c_{loss}(t)=\mathcal{E}_{loss}(t)*p(t)
\end{equation}

Then we should consider that the decisions made by the operator can induce costs, especially when they require market players to perform specific actions, as these players should be paid in return. Topological actions (modifying the structure of the grid) are mostly free, since the grid belongs to the powergrid operator and no energy cost is involved. However, energy producers are affected by redispatching actions (having some generators produce more energy and others produce less) and should get paid. When the grid operator asks to redispatch some energy $\mathcal{E}_{redispatch}(t)$, some power plants increase their production by $\mathcal{E}_{redispatch}(t)$ while others compensate by decreasing their production by the same amount to keep the power grid balanced. Hence, the grid operator pays both producers for this redispatched energy at an additional cost $c_{redispatching}(t)$, with a price higher than the marginal price $p(t)$ by some factor $\alpha$:

\begin{equation}
c_{redispatching}(t)=2*\mathcal{E}_{redispatch}(t)*\alpha p(t),\ \alpha \geqslant1
\end{equation}

Indeed, the first producer has to be paid an extra $\mathcal{E}_{redispatch}(t)*\alpha p(t)$ because it has to produce $\mathcal{E}_{redispatch}(t)$ more energy than it had planned to, and the second producer also has to be paid an extra $\mathcal{E}_{redispatch}(t)*\alpha p(t)$ to compensate for the $\mathcal{E}_{redispatch}(t)$ energy that it did not produce and sell.

If no flexibility is identified or integrated into the grid, operational costs related to redispatching can dramatically increase due to renewable energy sources (since the production from these energy sources can vary significantly throughout a year) as was the case recently in Germany with **an avoidable 1 billion €/year increase**.

Hence, we can define our overall operational cost $c_{\text{operations}}(t)$:
\begin{equation}
c_{\text{operations}}(t)=c_{\text{loss}}(t)+c_{\text{redispatching}}(t)
\end{equation}
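As a worked example of the operational cost at a single time step, the sketch below combines the loss cost and the redispatching cost defined above. All numerical values, including $\alpha = 1.5$, are assumptions chosen for illustration; the notebook does not state the values used by the competition.

```python
def loss_cost(e_loss, price):
    """c_loss(t) = E_loss(t) * p(t)."""
    return e_loss * price


def redispatching_cost(e_redispatch, price, alpha):
    """c_redispatching(t) = 2 * E_redispatch(t) * alpha * p(t), with alpha >= 1."""
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    return 2 * e_redispatch * alpha * price


def operations_cost(e_loss, e_redispatch, price, alpha):
    """c_operations(t) = c_loss(t) + c_redispatching(t)."""
    return loss_cost(e_loss, price) + redispatching_cost(e_redispatch, price, alpha)


# Hypothetical values for one time step
cost = operations_cost(e_loss=13.0, e_redispatch=5.0, price=10.0, alpha=1.5)
print(cost)  # 13*10 + 2*5*1.5*10 = 130 + 150 = 280.0
```

In this example, redispatching 5 units of energy costs more than the 13 units of losses, because it is paid twice and at a price above the marginal price.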

Formally, we can define an "episode" $e$ successfully managed by an agent up to a time $t_{\text{end}}$ (on a scenario of maximum length $T_e$) by:
\begin{equation}
e = \left(o_1, a_1, o_2, a_2,\dots, a_{t_{\text{end}}-1}, o_{t_{\text{end}} }\right)
\end{equation}
where $o_t$ represents the observation at time $t$, and $a_t$ the action that the agent took at time $t$. In particular, $o_1$ is the first observation and $o_{t_{\text{end}}}$ is the last one. The scenario ended at time $t_{\text{end}}$, either because there was a game over or because the agent reached the end of the scenario.
An agent can either manage to operate the grid for the entire scenario ($t_{\text{end}} = T_e$) or fail after some time $t_{\text{end}}$ because of a blackout. In case of a blackout, the cost $c_{\text{blackout}}(t)$ at a given time $t$ is proportional to the amount of consumption that was not supplied, $\text{Load}(t)$, at a price higher than the marginal price $p(t)$ by some factor $\beta$:
\begin{equation}
c_{\text{blackout}}(t)=\text{Load}(t)*\beta*p(t), \ \beta \geqslant1
\end{equation}
Notice that $\text{Load}(t) \gg \mathcal{E}_{\text{redispatch}}(t)$ and $\text{Load}(t) \gg \mathcal{E}_{\text{loss}}(t)$,
which means that the cost of a blackout is much higher than the cost of operating the grid normally. It is even higher if we further consider the secondary effects on the economy. More information can be found with <a href="https://www.blackout-simulator.com/">this blackout cost simulator</a>. Furthermore, a blackout does not last forever and power grids restart at some point. For the sake of simplicity, while preserving most of the realism, these additional complexities are not considered here, so the scenario is terminated in case of a game over.

Now we can define our cost $c$ for an episode:
\begin{equation}
c(e)=\sum\limits_{t=1}^{t_{\text{end}}} c_{\text{operations}}(t) + \sum\limits_{t=t_{\text{end}}}^{T_{e}}c_{\text{blackout}}(t)
\end{equation}

The participants are encouraged to operate the grid for as long as possible, and will be penalized for a blackout even after the game is over, until $T_e$, as this is a critical system and safety is paramount.
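The episode cost can be sketched as follows. This is an illustration, not the scoring code of the competition: it assumes that no blackout cost is charged when the agent completes the scenario ($t_{\text{end}} = T_e$), and all numerical values, including $\beta = 10$, are made up.

```python
def blackout_cost(load, price, beta):
    """c_blackout(t) = Load(t) * beta * p(t), with beta >= 1."""
    return load * beta * price


def episode_cost(operation_costs, loads, prices, t_end, beta):
    """Cost of one episode; the lists cover t = 1..T_e (index 0 holds t = 1)."""
    horizon = len(loads)  # T_e
    # Operational costs for t = 1 .. t_end
    cost = sum(operation_costs[:t_end])
    # Assumption: the blackout penalty applies only if the episode ended early,
    # and then covers t = t_end .. T_e as in the formula above.
    if t_end < horizon:
        cost += sum(
            blackout_cost(loads[t - 1], prices[t - 1], beta)
            for t in range(t_end, horizon + 1)
        )
    return cost


# Hypothetical scenario of 4 time steps
operation_costs = [10, 10, 10, 10]
loads = [100, 100, 100, 100]
prices = [2, 2, 2, 2]

full_run = episode_cost(operation_costs, loads, prices, t_end=4, beta=10)
early_blackout = episode_cost(operation_costs, loads, prices, t_end=2, beta=10)
print(full_run)        # 40
print(early_blackout)  # 20 + 3 * 2000 = 6020
```

With these numbers, a blackout at the second step costs about 150 times more than completing the scenario, which illustrates why surviving until $T_e$ matters more than saving a little on operations.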

Finally, participants will be tested on $N$ hidden scenarios of different lengths, varying from one day to one week, and on various situations that proved difficult for our baselines. This is how the agent's behavior is tested in various representative conditions. The overall score to minimize over all the scenarios is:

\begin{equation}
Score=\sum\limits_{i=1}^{N}c(e_i)
\end{equation}

### Rescaling the scores
For the `DoNothing` agent this score was really high on our scenarios, around 33 billion. Since such a large number is hard to read, we decided to apply a linear transformation such that:
- the score is 100 for the best possible agent (an agent that handles all the scenarios, without using redispatching actions, with minimal losses of $1\%$ for all the scenarios)
- the score is 0 for the `DoNothing` agent

This means that:
- the score should be **maximized** rather than minimized
- having a score of 100 is probably out of reach
- having a positive score is already pretty good and means that your agent is better than the `DoNothing` agent
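A linear transformation with these two anchor points can be sketched as below. The `DoNothing` cost of 33 billion comes from the text above; the cost of the best possible agent (`1e9` here) is a made-up placeholder, since the notebook does not give its value.

```python
def rescale_score(cost, do_nothing_cost, best_cost):
    """Linear map sending do_nothing_cost to 0 and best_cost to 100."""
    return 100.0 * (do_nothing_cost - cost) / (do_nothing_cost - best_cost)


DO_NOTHING_COST = 33e9  # order of magnitude quoted above
BEST_COST = 1e9         # hypothetical value, for illustration only

score_do_nothing = rescale_score(33e9, DO_NOTHING_COST, BEST_COST)
score_best = rescale_score(1e9, DO_NOTHING_COST, BEST_COST)
score_halfway = rescale_score(17e9, DO_NOTHING_COST, BEST_COST)
score_worse = rescale_score(40e9, DO_NOTHING_COST, BEST_COST)

print(score_do_nothing, score_best, score_halfway)
print(score_worse < 0)
```

Under this mapping, an agent whose raw cost is higher than that of `DoNothing` gets a negative score.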

### Note on the hidden scenarios

For this sandbox competition, hidden scenarios are defined as follows:
- there are 3 scenarios that last only 1 day
- there are 6 scenarios that last 2 days
- there is 1 scenario that lasts 3 days

Scenarios have been picked carefully to offer different levels of difficulty. Keep in mind that while the provided data (chronics) with which you can train your agent always start at midnight, the data that will be used to evaluate your agent (at test time) can start at arbitrary times.

The duration between two consecutive time steps is fixed at 5 minutes.
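From the 5-minute step and the scenario durations listed above, the number of time steps your agent will face can be worked out directly. The short computation below only uses figures stated in this section.

```python
MINUTES_PER_STEP = 5
STEPS_PER_DAY = 24 * 60 // MINUTES_PER_STEP

# Duration in days -> number of hidden scenarios of that duration
hidden_scenarios = {1: 3, 2: 6, 3: 1}

n_scenarios = sum(hidden_scenarios.values())
total_days = sum(days * count for days, count in hidden_scenarios.items())
total_steps = total_days * STEPS_PER_DAY

print(STEPS_PER_DAY)  # 288
print(n_scenarios)    # 10
print(total_steps)    # 18 days * 288 = 5184
```

An agent that survives every hidden scenario therefore has to act 5184 times in total, which gives an idea of how fast `act` needs to be.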

<div>
<h1> 4 - Making a submission </h1> 

We will see in the next notebook how to submit our agent to Codalab.