# Polder pumping station - machine learning tutorial
## Polder pumping station
A polder closely resembles a bathtub: rainfall slowly fills it with water unless a pump empties it.

![Polder](https://raw.githubusercontent.com/SGnoddeHHDelfland/PolderGemaalBesturing/main/images/polder1.png)

The emptying is done by a polder pumping station, or _poldergemaal_ in Dutch.
It is a small building with a large pump that moves water from the polder into canals, which eventually carry it to the sea.


If too much water is pumped out, the land in the polder has too little water for nature and farmers. The water board therefore sets an ideal water level. The pumping station is operated so that this level is followed as closely as possible by switching the pumps on and off. This works much like the thermostat of an oven or of the central heating in a house.

_This is what a polder pumping station looks like, although it comes in many shapes and sizes._

![Gemaal 1](https://raw.githubusercontent.com/SGnoddeHHDelfland/PolderGemaalBesturing/main/images/gemaal1.jpg)

_Schematic pumping station._

![Gemaal 2](https://raw.githubusercontent.com/SGnoddeHHDelfland/PolderGemaalBesturing/main/images/gemaal2.png)

In this tutorial, you will create a control system that tries to follow this ideal water level as closely as possible while it is raining in the polder. The rain is generated randomly. The target water level is 0.5 m.
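
As a point of comparison for the learned controller, the on/off idea described above can be sketched as a fixed rule: pump whenever the water level is above the target. This is only a minimal sketch and not part of the tutorial's model. The names `threshold_controller` and `simulate` are hypothetical, and the pump and rainfall values are assumed to match the ones used by the environment defined later in this notebook.

```python
import random

TARGET = 0.5      # target water level in metres (from the text above)
PUMP_DROP = 0.2   # assumed drop per timestep while pumping (value used later in the notebook)
MAX_RAIN = 0.25   # assumed upper bound of the random rainfall per timestep


def threshold_controller(water_level, target=TARGET):
    """Return 1 (pump on) when the level is above the target, else 0 (pump off)."""
    return 1 if water_level > target else 0


def simulate(steps=100, start=0.5, seed=0):
    """Simulate the water level under the threshold rule and return its history."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    level = start
    history = [level]
    for _ in range(steps):
        action = threshold_controller(level)
        # Pumping lowers the level, rainfall raises it.
        level = level - action * PUMP_DROP + rng.uniform(0.0, MAX_RAIN)
        history.append(level)
    return history


history = simulate()
print(f"lowest level: {min(history):.2f} m, highest level: {max(history):.2f} m")
```

A rule like this needs no training, which makes it a useful baseline: the trained agent should keep the water level at least roughly as close to 0.5 m.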


## (Deep) Reinforcement learning
We'll be using a technique called deep reinforcement learning. Based on simulations, the method learns which action is best in a given situation. It uses so-called neural networks, which were originally inspired by the workings of the human brain. Models of this kind power some of the strongest chess programs in the world and were the first to beat a top human player at the game of Go.

_A Neural Network_

![nn](https://raw.githubusercontent.com/SGnoddeHHDelfland/PolderGemaalBesturing/main/images/nn.png)


The model consists of a number of parts: 
- Environment: the world the agent interacts with
- Agent: the model that alters the environment
- State: the state the world (environment) is in at a moment
- Reward: the signal the agent learns from, so it matters a lot. It should be high when the agent does what we want and low when it does something we don't want.

![reinforcement learning](https://raw.githubusercontent.com/SGnoddeHHDelfland/PolderGemaalBesturing/main/images/reinforcementlearning.png)
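
The interaction between these parts can be sketched as a simple loop. The example below is a toy illustration in plain Python, not the Tensorforce API used later in this tutorial: `ToyEnvironment`, `RandomAgent` and `run_episode` are hypothetical names, and the agent here does not learn anything, it only acts at random.

```python
import random


class ToyEnvironment:
    """Hypothetical environment whose state is a single number."""

    def __init__(self):
        self.state = 0.0

    def step(self, action):
        # Action 1 nudges the state up, action 0 nudges it down.
        self.state += 0.1 if action == 1 else -0.1
        # Reward of 1 while the state stays close to zero, otherwise 0.
        reward = 1.0 if abs(self.state) < 0.25 else 0.0
        return self.state, reward


class RandomAgent:
    """Hypothetical agent that ignores the state and picks a random action."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([0, 1])


def run_episode(environment, agent, steps=20):
    """Let the agent act in the environment and return the summed reward."""
    total_reward = 0.0
    state = environment.state
    for _ in range(steps):
        action = agent.act(state)                 # agent chooses an action from the state
        state, reward = environment.step(action)  # environment returns new state and reward
        total_reward += reward
    return total_reward


print(run_episode(ToyEnvironment(), RandomAgent()))
```

A learning agent would additionally use the rewards it receives to adjust how it chooses actions.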

## Reward
In this tutorial, the first question you will work on is the reward function. You will write a function that is high when the water level is close to the level we want (0.5 m) and low when it is far from it. In this example, we flip this around and define a penalty: the further the water level is from 0.5 m, the higher the penalty and the more negative the reward.

## Python
The programming is done in a language called Python. It is a relatively easy programming language that is widely used by data scientists.
We use a so-called notebook to run the Python code in the browser. Press the play button or press `shift`+`enter` to run a cell. You can run almost all cells one after another as they are, but there is one cell that you have to edit before running it.

## Hints
There are a number of hints for the first question. Run a hint cell to see the hint.

The first hint follows, but first run the cell below, which loads the hints:

In [None]:
import requests
path_json = r"https://raw.githubusercontent.com/SGnoddeHHDelfland/PolderGemaalBesturing/main/hints.json"
response = requests.get(path_json)
hints = response.json()
def print_hints(number):
    print(hints['hints'][number])


This is your first hint, which we have essentially already covered.

In [None]:
print_hints(0)

Run this cell to install the library we're using. It is installed on Google's servers, not on your own computer. You can ignore all the text that appears below the cell.

In [None]:
!pip install tensorforce

Just run the cell below

In [None]:
from tensorforce.environments import Environment
from tensorforce.agents import Agent
import numpy as np
import plotly.express as px
import pandas as pd
from random import uniform

### Question 1
Write a penalty expression after `penalty = `. It should be a function of the water level (`water_level`).

Note: to raise a number to a power, Python uses `**`, not `^`.
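
To illustrate the difference, here is a short example. In Python, `^` is the bitwise XOR operator: on integers it silently gives a different result, and on floats it raises a `TypeError`. The variable names are only for illustration.

```python
squared = 3 ** 2         # exponentiation: 3 to the power 2
xor_result = 3 ^ 2       # bitwise XOR on integers, not a power
half_squared = 0.5 ** 2  # ** also works on floats

print(squared)       # 9
print(xor_result)    # 1
print(half_squared)  # 0.25

try:
    0.5 ^ 2  # XOR is not defined for floats
except TypeError as error:
    print("TypeError:", error)
```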

In [None]:
def calculate_penalty(water_level):
    penalty = ...
    return penalty
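
Before training, it can help to check that a candidate penalty behaves as intended. The sketch below is a hypothetical helper, not part of the tutorial: `check_penalty` evaluates a function on a grid of water levels between 0 and 1 m and reports where the penalty is lowest. The candidate shown is deliberately a poor one, so it does not give away the answer.

```python
def check_penalty(penalty_function, target=0.5, step=0.05):
    """Return (level with the lowest penalty, whether the penalty looks sensible)."""
    n = int(round(1.0 / step))
    levels = [round(i * step, 10) for i in range(n + 1)]
    penalties = [float(penalty_function(level)) for level in levels]
    lowest = min(penalties)
    best_level = levels[penalties.index(lowest)]
    # The lowest penalty should be at the target level ...
    minimum_at_target = abs(best_level - target) <= step / 2
    # ... and both extremes (0.0 m and 1.0 m) should be penalised more.
    rises_at_edges = penalties[0] > lowest and penalties[-1] > lowest
    return best_level, minimum_at_target and rises_at_edges


def poor_candidate(water_level):
    """Deliberately poor penalty: only high water levels are penalised."""
    return water_level


print(check_penalty(poor_candidate))  # (0.0, False)
```

Once you have filled in `calculate_penalty`, you could call `check_penalty(calculate_penalty)` in the same way. A sensible penalty should report a lowest level of 0.5 and `True`.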

Hint 2

In [None]:
print_hints(1)

Hint 3

In [None]:
print_hints(2)

Hint 4

In [None]:
print_hints(3)

In [None]:
class PolderEnvironment(Environment):
    """Simple polder environment. It is a polder with a single pump attached to it. 
    If the pump is activated, the water level in the polder will drop. 
    Rainfall will cause the water level in the polder to rise. 
    The amount of rainfall is random. 
    The goal is to keep the water level between 0.0 and 1.0 m NAP and to keep this up for at least 100 timesteps.
    """
    def __init__(self):
        super().__init__()
        self.water_level = np.random.uniform(low=0.0, high=1.0, size=(1,))

    def states(self):
        return dict(type='float', shape=(1,),  min_value=0.0, max_value=1.0)

    def actions(self):
        return dict(type='int', num_values=2)

    # Optional, should only be defined if environment has a natural maximum
    # episode length
    def max_episode_timesteps(self):
        return 500


    # Optional
    def close(self):
        super().close()


    def reset(self):
        """Reset state."""
        self.timestep = 0
        self.water_level = np.random.uniform(low=0.0, high=1.0, size=(1,))
        return self.water_level


    def response(self, action):
        """Respond to an action."""
        return self.water_level - (action * 0.2)


    def reward_compute(self):
        penalty = calculate_penalty(self.water_level)
        return -penalty

        # TODO: remove 0

    def rain(self):
        return uniform(0.0,0.25)

    # back-up
    def terminal(self):
        if self.water_level > 1.0 or self.water_level < 0.0:
            return True 
        return False

    def execute(self, actions):
        ## Check the action is either 0 or 1 -- pump on or off.
        assert actions == 0 or actions == 1

        ## Increment timestep
        self.timestep += 1
        
        ## Update the water level: pump response first, then rainfall
        self.water_level = self.response(actions)
        self.water_level += self.rain()

        ## Compute the reward
        reward = self.reward_compute()[0]

        ## The episode goes terminal when the water level leaves the range 0.0-1.0
        ## (see terminal()); max_episode_timesteps limits the episode length separately.
        ## terminal == False means episode is not done
        ## terminal == True means it is done.
        terminal = self.terminal()
        
        return self.water_level, terminal, reward
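
The water balance in this environment gives a feel for what a good controller has to do. The short calculation below uses only the values from `response()` and `rain()` above. The variable names are hypothetical, and the result is an average, not a control rule.

```python
PUMP_DROP = 0.2                  # level drop per timestep while pumping, from response()
RAIN_LOW, RAIN_HIGH = 0.0, 0.25  # bounds of the uniform rainfall, from rain()

# Expected rise of the water level per timestep due to rain
mean_rain = (RAIN_LOW + RAIN_HIGH) / 2

# Fraction of timesteps the pump must run to remove that rain on average
duty_cycle = mean_rain / PUMP_DROP

print(f"mean rain per timestep: {mean_rain:.3f} m")
print(f"required pump duty cycle: {duty_cycle:.1%}")
```

On average the pump therefore has to be on somewhat more than half of the time, so neither "always on" nor "always off" can keep the water level near 0.5 m.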

Create the environment and agent (just run these cells)

In [None]:
environment = Environment.create(
    environment=PolderEnvironment,
    max_episode_timesteps=500)

In [None]:
agent = Agent.create(
    agent='tensorforce', environment=environment, update=64,
    optimizer=dict(optimizer='adam', learning_rate=1e-3),
    objective='policy_gradient', reward_estimation=dict(horizon=1)
)

This cell performs the actual training, so it can take a while.

In [None]:
for _ in range(200):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

Run this cell to see whether it worked and to view the results. If it didn't work, change your penalty function and rerun all the cells from the changed cell onwards.

In [None]:
### Initialize
environment.reset()
environment.water_level = np.random.uniform(low=0.0, high=1.0, size=(1,))
states = environment.water_level

internals = agent.initial_internals()
terminal = False

### Run an episode
temp = [environment.water_level[0]]
while not terminal:
    actions, internals = agent.act(states=states, internals=internals, independent=True)
    states, terminal, reward = environment.execute(actions=actions)
    temp += [states[0]]

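# Collect the recorded water levels in a DataFrame for plotting
df = pd.DataFrame({'water level': temp})
fig = px.line(df, x=df.index, y='water level', title='Reinforcement learning poldergemaal: water level over time',
             labels={
                     "index": "Time"})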
fig.update_yaxes(range = [0,1])
fig.show()
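
Besides looking at the plot, the result can be summarised in a few numbers. The helper below is a hypothetical sketch, not part of the original notebook: `summarize_levels` reports how long the episode lasted and how far the water level strayed from the 0.5 m target. The example values are made up for illustration.

```python
def summarize_levels(levels, target=0.5):
    """Summarise how closely a list of water levels follows the target."""
    deviations = [abs(level - target) for level in levels]
    return {
        "steps": len(levels) - 1,  # the first entry is the starting level
        "mean_abs_deviation": sum(deviations) / len(deviations),
        "max_abs_deviation": max(deviations),
    }


example_levels = [0.50, 0.62, 0.47, 0.55, 0.41]  # made-up values, not real results
print(summarize_levels(example_levels))
```

In the notebook you could pass the recorded list instead, for example `summarize_levels(temp)`. An episode that reaches 500 steps with a small mean deviation indicates that the penalty function worked well.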