# Tuning a PID controller for the Lunar Lander with a genetic algorithm

In [40]:
import gymnasium as gym
import numpy as np
import pygmo as pg

In this exercise, you will implement a PID controller to safely land a shuttle. The parameters of the controller will be optimized using a _genetic algorithm_.

<img src="https://www.gymlibrary.dev/_images/lunar_lander.gif" width="400" align="center">



The [_Lunar Lander_](https://gymnasium.farama.org/environments/box2d/lunar_lander/) simulator is part of the _gym_ collection. Please familiarize yourself with the environment before solving the exercise.

*NOTE*: you may need to run `mamba install box2d-py` and/or `mamba install "gymnasium[box2d]"` and `mamba install pygmo`.

In [None]:
env = gym.make("LunarLander-v3", continuous=True)
state, _ = env.reset()

### PID controller

A _Proportional-Integral-Derivative (PID)_ controller continuously computes an error $e(t)$ as the difference between a desired setpoint (SP) and a measured process variable (PV). The controller attempts to minimize the error over time using a control $a(t)$ of the form
$$a(t) := K_p e(t) + K_i \int_0^t e(\tau) d\tau + K_d \frac{\textrm{d}e(t)}{\textrm{d}t}.$$
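
In a simulation the control law above is evaluated at discrete timesteps. The following is a minimal sketch of one common discretization, assuming a fixed step `dt`, a rectangular approximation of the integral and a backward difference for the derivative. The class name `PID` and its interface are hypothetical and not part of the exercise.

```python
class PID:
    """Discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0      # running approximation of the integral of e
        self.prev_error = None   # no previous error before the first update

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        # Rectangular rule for the integral term
        self.integral += error * self.dt
        # Backward difference for the derivative term (zero on the first call)
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Setting `ki = 0` turns this into a PD controller.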

In our setting, the PVs are the altitude and the angle of the lander. The SP is made of:
- an altitude setpoint equal to $|x_{p}|$, where $x_p$ is the $x$-coordinate of the current position of the lander. This coincides with the horizontal distance to the target, which is at the coordinates $(0,0)$.
- an angle setpoint equal to $\frac{\pi}{4}(x_p + v_x)$, where $v_x$ is the $x$-component of the velocity of the lander.

Here, we will actually implement a PD controller (without the integral component).
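
As a rough illustration of the two setpoints, the sketch below computes the altitude and angle errors together with an approximation of their time derivatives. It assumes that the setpoints change slowly compared with the process variables, so that the derivative of each error is approximated by minus the corresponding velocity. This approximation and the function names are assumptions made for illustration, not the prescribed solution.

```python
import math


def setpoint_errors(x, y, vx, vy, angle, angular_velocity):
    """Return (e_alt, de_alt, e_ang, de_ang) for the lander setpoints."""
    altitude_sp = abs(x)                   # altitude setpoint |x_p|
    angle_sp = (math.pi / 4.0) * (x + vx)  # angle setpoint pi/4 * (x_p + v_x)

    e_alt = altitude_sp - y
    e_ang = angle_sp - angle

    # Assumption: the setpoints vary slowly, so de/dt is taken as -d(PV)/dt
    de_alt = -vy
    de_ang = -angular_velocity
    return e_alt, de_alt, e_ang, de_ang


def pd_term(kp, kd, e, de):
    """PD control law: proportional term plus derivative term."""
    return kp * e + kd * de
```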

### Observation and action spaces

At each timestep the lander has access to its current state, consisting of
- the coordinates of the lander in $x$ and $y$;
- the components of its linear velocity in $x$ and $y$;
- its angle;
- its angular velocity;
- two booleans that represent whether each leg is in contact with the ground (touch sensors on each leg).

As for the actions, the first component of an action determines the throttle of the main engine, while the second component specifies the throttle of the lateral boosters.
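
The layout described above can be made explicit with two small helpers. This is only a sketch: `unpack_state` and `make_action` are hypothetical names, and the index order assumes the observation is laid out exactly as in the list above.

```python
import numpy as np


def unpack_state(state):
    """Name the 8 components of a Lunar Lander observation."""
    x, y, vx, vy, angle, angular_velocity, leg_1, leg_2 = state
    return {
        "x": x, "y": y,                        # position
        "vx": vx, "vy": vy,                    # linear velocity
        "angle": angle,
        "angular_velocity": angular_velocity,
        "leg_1": bool(leg_1),                  # touch sensors
        "leg_2": bool(leg_2),
    }


def make_action(main, lateral):
    """Build a continuous action [main throttle, lateral throttle] in [-1, 1]."""
    action = np.array([main, lateral], dtype=np.float32)
    return np.clip(action, -1.0, 1.0)
```

For example, `make_action(2.0, -3.0)` is clipped to `[1, -1]`.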

1. Write a function called `pd` that takes the state of the lander (8 components) and the array of PD
   parameters (4 components) as inputs (both are `numpy` arrays) and returns a `numpy` array containing the action (2 throttle
   components).

In [42]:
def pd(state, params):
    # ...
        
    # gym wants the controls as a np.array in which each entry
    # belongs to the interval [-1,1]
    a = np.array([a_y, a_angle])
    a = np.clip(a, -1, +1)
    
    # If either leg touches the ground we made it: kill the engines
    if touch_sensor_1 or touch_sensor_2:
        a[:] = 0
    return a

2. Write a function called `play` that plays one episode of the `LunarLander`
   environment, until reaching termination or truncation. The function must take the
   array of the PD parameters and a `gymnasium` environment as
   inputs and return the corresponding total reward (score). The actions during the
   game must be chosen using the `pd` function.
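
For reference, the general shape of an episode loop with the `gymnasium` API is sketched below. In this API `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. The function `run_episode` and its `policy` argument are hypothetical; in the exercise the policy would be built from `pd` and the PD parameters.

```python
def run_episode(policy, env, seed=None):
    """Play one episode and return the total reward (illustrative sketch)."""
    state, _info = env.reset(seed=seed)
    total_reward = 0.0
    terminated = truncated = False
    # An episode ends when the environment reports termination or truncation
    while not (terminated or truncated):
        action = policy(state)
        state, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
    return total_reward
```

A possible call would be `run_episode(lambda s: pd(s, params), env)`.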

In [43]:
def play(params, env):
    
    # ...
    
    return score

3. Write a [`pygmo`](https://esa.github.io/pygmo2/index.html) problem class (see docs
   [here](https://esa.github.io/pygmo2/tutorials/coding_udp_simple.html)) for the tuning
   of the PD controller parameters ($K_p$ and $K_d$ for altitude and angle, 4 in total).
   The `fitness` method must return the average score over 5 episodes, for a given individual (set of PD parameters). Note that `pygmo` minimizes the fitness, so the score should enter it with a negative sign.
   Limit the range of each of the parameters to the interval $[-10,10]$.
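
A user-defined `pygmo` problem is a plain Python class exposing at least `fitness` and `get_bounds`. The sketch below shows one possible structure. The class name `AverageScoreProblem` and the `score_fn` argument (a callable mapping a parameter vector to one episode score) are assumptions made for illustration. Because `pygmo` minimizes, the sketch returns the negated average score.

```python
class AverageScoreProblem:
    """Sketch of a pygmo user-defined problem for controller tuning."""

    def __init__(self, score_fn, dim=4, bound=10.0, n_episodes=5):
        self.score_fn = score_fn      # hypothetical: params -> episode score
        self.dim = dim
        self.bound = bound
        self.n_episodes = n_episodes

    def fitness(self, x):
        # Average the score over several episodes to reduce noise
        scores = [self.score_fn(x) for _ in range(self.n_episodes)]
        mean_score = sum(scores) / len(scores)
        # pygmo minimizes, so a high score must map to a low fitness
        return [-mean_score]

    def get_bounds(self):
        # Box bounds: every parameter lies in [-bound, bound]
        return ([-self.bound] * self.dim, [self.bound] * self.dim)
```

With the `play` function of the exercise, `score_fn` could be `lambda x: play(x, env)`.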

4. Solve the optimization problem using a genetic algorithm (suggested initial settings: 20 generations, 20 individuals). Print the parameters of the PD at the end of the optimization.
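
The general workflow for running a simple genetic algorithm in `pygmo` is sketched below on a toy problem. `Sphere` and `evolve_with_sga` are hypothetical names; the toy problem only stands in for the PD tuning problem. `pygmo` is imported inside the function so that the problem class can be inspected without it.

```python
class Sphere:
    """Toy problem: minimize the sum of squares of 4 variables."""

    def fitness(self, x):
        return [sum(xi * xi for xi in x)]

    def get_bounds(self):
        return ([-10.0] * 4, [10.0] * 4)


def evolve_with_sga(udp, generations=20, pop_size=20, seed=0):
    """Run pygmo's simple genetic algorithm and return the best individual."""
    import pygmo as pg  # imported here so the sketch loads without pygmo

    prob = pg.problem(udp)
    algo = pg.algorithm(pg.sga(gen=generations, seed=seed))
    pop = pg.population(prob, size=pop_size, seed=seed)
    pop = algo.evolve(pop)
    # champion_x is the best decision vector seen, champion_f its fitness
    return pop.champion_x, pop.champion_f[0]
```

Calling `evolve_with_sga(Sphere())` returns the best decision vector found and its fitness; replacing `Sphere()` with the PD tuning problem follows the same pattern.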

In [46]:
# play one game with the best parameters
env_2 = gym.make("LunarLander-v3", render_mode="human", continuous=True)
env_2.reset()
_ = play(best_PD_params, env_2)
env_2.close()

5. Compute the average score over 1000 games obtained with the best PD controller.
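
Since individual episode scores are noisy, it can be useful to report the standard error alongside the mean. A minimal sketch follows, assuming a hypothetical zero-argument callable `score_fn` that plays one game and returns its score.

```python
import math
import statistics


def average_score(score_fn, n_games=1000):
    """Return (mean, standard error of the mean) over n_games episodes."""
    scores = [score_fn() for _ in range(n_games)]
    mean = statistics.fmean(scores)
    # Standard error: sample standard deviation divided by sqrt(n)
    sem = statistics.stdev(scores) / math.sqrt(n_games)
    return mean, sem
```

In the exercise, `score_fn` could be `lambda: play(best_PD_params, env)`.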