# SARSA for continuous learning tasks

SARSA stands for **S**tate **A**ction **R**eward **S**tate **A**ction. The name encodes how the action values are updated: the agent takes an action in a state, observes the reward, and then uses the current value estimate of the action it will take in the next state.

The tabular SARSA update is: 

$$Q(S, A) \leftarrow Q(S, A) + \alpha \left( R + \gamma Q(S^{*}, A^{*}) - Q(S, A) \right)$$

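Here $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $(S^{*}, A^{*})$ are the next state and the action chosen in it. This update rule works for tabular problems with a finite set of states, but it has to be adapted when the number of states is infinite.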
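As a minimal sketch of the tabular update above, the snippet below applies one SARSA step to a small Q-table stored as a NumPy array. The function name `sarsa_update`, the table size, and the default values of `alpha` and `gamma` are assumptions made for illustration only.

```python
import numpy as np


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Apply one tabular SARSA update to Q in place and return the TD error."""
    # Target: observed reward plus the discounted value of the next state-action pair
    td_target = r + gamma * Q[s_next, a_next]
    # Difference between the target and the current estimate
    td_error = td_target - Q[s, a]
    # Move the current estimate a fraction alpha towards the target
    Q[s, a] += alpha * td_error
    return td_error


# Toy table (hypothetical): 3 states, 2 actions, all values start at zero
Q = np.zeros((3, 2))
td_error = sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
print(Q[0, 1], td_error)
```

Because the table is indexed by discrete states, this form only works when the states can be enumerated.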

# Continuous learning task - stabilizing a moon lander 

Imagine that we are tasked with developing an algorithm to stabilize a moon lander. We want the moon lander to steer itself onto a landing pad on the ground.

The state of the moon lander can be plotted on a 2D plot and is defined by its coordinates $(x, y)$. The coordinates of the goal state (the landing pad) are $(x^{*}, y^{*})$.

The physics of the moon lander are heavily simplified. It can take only one action per time step, and the available actions are: 

$$A = \{-1, 0, 1\}$$ 

Where: 

-1 means that the rockets fire to the right, moving the moon lander to the left.

0 means that the agent does not fire the rockets.

+1 means that the rockets fire to the left, moving the moon lander to the right. 

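At each time step the position of the lander is updated as follows, where $a_{t} \in A$ is the chosen action and $unif_{[-0.1, 0.1]}$ denotes a draw from the uniform distribution on $[-0.1, 0.1]$ (the wind): 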

$$x_{t + 1} = x_{t} + a_{t} \cdot 0.1 + unif_{[-0.1, 0.1]}$$

$$y_{t + 1} = y_{t} - 0.05$$
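The two transition equations above can be sketched as a single function. This is only an illustration under stated assumptions: the function name `step`, the `descent` parameter, and the use of a seeded `np.random.default_rng` generator are not part of the original setup.

```python
import numpy as np


def step(x, y, action, rng, rocket_strength=0.1, wind_strength=0.1, descent=0.05):
    """Return the next (x, y) position for one time step of the simplified lander."""
    if action not in (-1, 0, 1):
        raise ValueError("action must be one of -1, 0, 1")
    # Wind: a uniform draw from [-wind_strength, wind_strength]
    wind = rng.uniform(-wind_strength, wind_strength)
    # Horizontal move: rocket push in the chosen direction plus the wind
    x_next = x + action * rocket_strength + wind
    # Vertical move: constant descent, independent of the action
    y_next = y - descent
    return x_next, y_next


rng = np.random.default_rng(0)  # seeded only to make the example repeatable
x, y = 2.0, 50.0
x_next, y_next = step(x, y, action=-1, rng=rng)
print(x_next, y_next)
```

With `action=-1` the lander moves 0.1 to the left before the wind is added, so `x_next` lies between 1.8 and 2.0, while `y_next` is always 0.05 lower than `y`.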

The moon lander always starts at a random location generated by: 

$$ (unif_{[-10, 10]}, 50) $$

The landing pad is at the point $(0, 0)$, that is, $(x^{*}, y^{*}) = (0, 0)$.
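A quick back-of-the-envelope check of these numbers, assuming (this is not stated above) that an episode ends once the lander reaches $y = 0$. With $T$ denoting the number of time steps in such an episode:

$$T = \frac{50}{0.05} = 1000, \qquad T \cdot 0.1 = 100 \geq 10$$

Under that assumption the descent takes 1000 time steps. Firing the rockets in the same direction at every step would shift the lander by 100 units horizontally before the wind is taken into account, which is far more than the largest possible starting offset of 10. Since the wind term is symmetric around zero, the landing pad should be reachable from any starting position.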


In [None]:
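# NumPy is used for random number generation
import numpy as np


class MoonLander:
    """Simplified moon lander that drifts horizontally while descending."""

    def __init__(
            self,
            rocket_strength: float = 0.1,
            wind_strength: float = 0.1,
            actions: tuple = (-1, 0, 1)
            ):
        # Horizontal displacement caused by one rocket burst
        self.rocket_strength = rocket_strength
        # Half-width of the uniform wind noise added to x at every step
        self.wind_strength = wind_strength
        # Available actions: -1 (move left), 0 (no thrust), 1 (move right)
        self.actions = list(actions)

        # Initiating the lander
        self.init_lander()

    def init_lander(self, starting_y: float = 50):
        # The starting x coordinate is drawn uniformly from [-10, 10]
        starting_x = np.random.uniform(-10, 10)
        self.starting_point = (starting_x, starting_y)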
        