# Environment setup

- [The book's GitHub repository](https://github.com/TikhonJelvis/RL-book)

- First, move to the directory with the codebase:

   ```cd rl-book```

- Then, create and activate a Python virtual environment, either with `venv`:

   ```python3 -m venv .venv```

   ```source .venv/bin/activate```

   or with conda:

   ```conda create -n {env_name} python```

   ```conda activate {env_name}```

- Once the environment is activated, you can install the right versions of each Python dependency.

   ```pip install -r requirements.txt```

- Once the environment is set up, you can confirm that it works by running the framework's automated tests.

   ```python -m unittest discover```



## Classes and interfaces

- There are always two parts to answering the question of how to model something in code:

    - Understanding the domain concept that you are modeling.

    - Figuring out how to express that concept with features and patterns provided by your programming language.

- One approach would be to keep Probability implicit. Whenever we have a random variable, we could call a function and get a random result.

```python
from random import randint

def six_sided():
    return randint(1, 6)

def roll_dice():
    return six_sided() + six_sided()
```

- This works, but it's pretty limited. We can't do anything except get one outcome at a time. This only captures a slice of how we think about Probability: there's randomness but we never even mentioned probability distributions.
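To see the limitation concretely: with only `roll_dice`, even a simple question such as "how likely is a total of 7?" means writing a sampling loop by hand. The sketch below repeats the two functions from above and estimates that probability by brute force; the helper `estimate_probability_of_seven` is a hypothetical name used only for illustration.

```python
from random import randint

def six_sided() -> int:
    return randint(1, 6)

def roll_dice() -> int:
    return six_sided() + six_sided()

def estimate_probability_of_seven(trials: int = 10_000) -> float:
    # Hypothetical helper: count how often two dice sum to 7
    hits = sum(1 for _ in range(trials) if roll_dice() == 7)
    return hits / trials

print(estimate_probability_of_seven())  # close to 1/6, but varies from run to run
```

Nothing in this code refers to a distribution as such, so every new question needs another hand-written loop.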

### A distribution interface

- Let's define an abstraction for probability distributions. What we can do with a distribution depends on what kind of distribution we're working with.

    - If we know something about the structure of a distribution (perhaps it's a Poisson distribution where $\lambda=5$, perhaps it's an empirical distribution with set probabilities for each outcome), we could produce an exact probability mass or density function (PMF or PDF) or cumulative distribution function (CDF), calculate expectations, and do various operations efficiently.

    - What if the distribution comes from a complicated simulation? At the extreme, we might not be able to do anything except draw samples from the distribution.

- Sampling is the least common denominator. Any abstraction we start with for a probability distribution needs to cover sampling, and any abstraction that requires more than just sampling will not let us handle all the distributions we care about.

```python
from abc import ABC, abstractmethod

class Distribution(ABC):
    @abstractmethod
    def sample(self):
        pass
```

- This class defines an interface: a definition of what we require for something to qualify as a distribution. Any kind of distribution we implement in the future will be able to generate samples; when we write functions that sample distributions, they can require their inputs to inherit from `Distribution`.

- We've made `Distribution` an abstract base class (ABC), with `sample` as an abstract method. Abstract classes and abstract methods are features that Python provides to help us define interfaces for abstractions. We can define the `Distribution` class to structure the rest of our probability distribution code before we define any specific distributions.
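Because `sample` is marked abstract, Python refuses to create an instance of `Distribution`, or of any subclass that does not implement `sample`. A minimal sketch; the subclasses `Broken` and `AlwaysOne` are hypothetical examples, and the exact wording of the error message varies between Python versions:

```python
from abc import ABC, abstractmethod

class Distribution(ABC):
    @abstractmethod
    def sample(self):
        pass

class Broken(Distribution):
    pass  # forgot to implement sample, so it stays abstract

class AlwaysOne(Distribution):
    def sample(self):
        return 1

for cls in (Distribution, Broken):
    try:
        cls()
    except TypeError as e:
        # instantiating a class with unimplemented abstract methods fails
        print(f"{cls.__name__}: {e}")

print(AlwaysOne().sample())  # 1
```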

### A concrete distribution

- An interface can be approached from two sides:

    - Something that requires the interface. This will be code that uses operations specified in the interface and works with any value that satisfies those requirements.

    - Something that provides the interface. This will be some value that supports the operations specified in the interface.

- To use our `Distribution` class, we can start by providing a concrete class that implements the interface. Let's model dice.

```python
import random

class Die(Distribution):
    def __init__(self, sides):
        self.sides = sides

    def sample(self):
        return random.randint(1, self.sides)

six_sided = Die(6)

def roll_dice():
    return six_sided.sample() + six_sided.sample()
```

```python
print(six_sided)  # prints something like <__main__.Die object at 0x106897c90>
```


- The default output tells us nothing about which die this is. With a class, we can fix this: to change how the class is printed, we can override `__repr__`:

```python
class Die(Distribution):
    def __init__(self, sides):
        self.sides = sides

    def sample(self):
        return random.randint(1, self.sides)

    def __repr__(self) -> str:
        return f"Die(sides={self.sides})"
```

```python
print(Die(6))  # Die(sides=6)
```


#### Dataclasses

- Two `Die` objects with the same number of sides have the same behavior and represent the same probability distribution, but with the default version of `__eq__`, which compares object identity, two `Die` objects declared separately will never be equal:

```python
six_sided = Die(6)
six_sided == six_sided  # True

six_sided == Die(6)     # False

Die(6) == Die(6)        # False
```

- We can fix this by overriding `__eq__`, so that two `Die` objects are equal whenever they have the same number of sides:

```python
class Die(Distribution):
    def __init__(self, sides):
        self.sides = sides

    def sample(self):
        return random.randint(1, self.sides)

    def __repr__(self) -> str:
        return f"Die(sides={self.sides})"

    def __eq__(self, other):
        if isinstance(other, Die):
            return self.sides == other.sides
        return False
```

```python
Die(6) == Die(6)  # True

Die(6) == None    # False
```

- Writing `__init__`, `__repr__`, and `__eq__` by hand for every class like this is tedious and easy to get wrong. Python 3.7 introduced a feature that fixes these problems: `dataclasses`. The `dataclasses` module provides a decorator that lets us write a class that behaves like `Die` without needing to manually implement `__init__`, `__repr__`, or `__eq__`.

```python
from dataclasses import dataclass

@dataclass
class Die(Distribution):
    sides: int

    def sample(self):
        return random.randint(1, self.sides)
```

```python
Die(6) == Die(6)  # True
```
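The `@dataclass` decorator generates `__init__`, `__repr__`, and `__eq__` from the annotated fields. A quick self-contained check, re-declaring the abstract `Distribution` so the snippet runs on its own:

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass

class Distribution(ABC):
    @abstractmethod
    def sample(self):
        pass

@dataclass
class Die(Distribution):
    sides: int

    def sample(self):
        return random.randint(1, self.sides)

print(Die(6))             # Die(sides=6): generated __repr__
print(Die(6) == Die(6))   # True: generated __eq__ compares fields
print(Die(6) == Die(20))  # False
```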

#### Immutability

- Changing state can create invisible connections between seemingly separate parts of the codebase, which are hard to track mentally.

- It is better to have the language prevent us from doing the wrong thing than to rely on convention alone. Normal Python classes don't have a convenient way to stop attributes from changing, but luckily dataclasses do:

    - With `frozen=True`, attempting to change `sides` will raise an exception.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Die(Distribution):
    sides: int

    def sample(self):
        return random.randint(1, self.sides)
```

```python
d = Die(6)
# uncommenting the next line raises dataclasses.FrozenInstanceError
# d.sides = 10
```
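A sketch that triggers the exception deliberately and catches it, to show that the object really is left unchanged. For brevity this `Die` leaves out the `Distribution` base class:

```python
import dataclasses
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Die:
    sides: int

    def sample(self) -> int:
        return random.randint(1, self.sides)

d = Die(6)
try:
    d.sides = 10  # assignment to a field of a frozen dataclass
except dataclasses.FrozenInstanceError as e:
    print("cannot modify:", e)

print(d)  # Die(sides=6): unchanged
```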

- An object that we cannot change is called immutable. Instead of changing the object in place, we can return a fresh copy with the attribute changed; `dataclasses` provides a `replace` function that makes this easy:

```python
import dataclasses

d6 = Die(6)
d20 = dataclasses.replace(d6, sides=20)
d20  # Die(sides=20)
```

- `frozen=True` has an important bonus: we can use immutable objects as dictionary keys and set elements. Without `frozen=True`, we would get a `TypeError`, because with the default `eq=True` a non-frozen dataclass sets `__hash__` to `None`, which makes its instances unhashable:

```python
from dataclasses import dataclass

@dataclass
class Die(Distribution):
    sides: int

    def sample(self):
        return random.randint(1, self.sides)
```

```python
d = Die(6)
# uncommenting the next line raises TypeError: unhashable type: 'Die'
# {d: 'abc'}
```

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Die(Distribution):
    sides: int

    def sample(self):
        return random.randint(1, self.sides)
```

```python
d = Die(6)
{d: 'abc'}  # {Die(sides=6): 'abc'}
```

### Type variables

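- The `Distribution` class defines an interface for any distribution, but different distributions produce different types of outcomes from `sample`: a die produces integers, while a coin flip might produce strings such as `"heads"` and `"tails"`.

- To deal with these different types, we need type variables. Type variables are also known as "generics" because they let us write classes that work generically for any type.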

- To add annotations to the abstract `Distribution` class, we will need to define a type variable for the outcomes of the distribution, then tell Python that `Distribution` is "generic" in that type:

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

# A type variable named "A"
A = TypeVar("A")

# Distribution is "generic in A"
class Distribution(ABC, Generic[A]):
    # sampling must produce a value of type A
    @abstractmethod
    def sample(self) -> A:
        pass
```

- In this code, we defined a type variable `A` and specified that `Distribution` uses `A` by inheriting from `Generic[A]`. We can now write type annotations for distributions with specific types of outcomes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Die(Distribution[int]):
    sides: int

    def sample(self) -> int:
        return random.randint(1, self.sides)
```
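Other distributions can fix `A` to a different type. As an illustration (the class `CoinFlip` is hypothetical, not from the text), here is a fair coin whose outcomes are strings, next to the `Die` from above:

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

A = TypeVar("A")

class Distribution(ABC, Generic[A]):
    @abstractmethod
    def sample(self) -> A:
        pass

@dataclass(frozen=True)
class Die(Distribution[int]):
    sides: int

    def sample(self) -> int:
        return random.randint(1, self.sides)

@dataclass(frozen=True)
class CoinFlip(Distribution[str]):
    # Hypothetical fair coin: outcomes are "heads" or "tails"
    def sample(self) -> str:
        return random.choice(["heads", "tails"])

roll: int = Die(6).sample()      # a type checker knows this is an int
flip: str = CoinFlip().sample()  # ...and this is a str
```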

- This lets us write specialized functions that only work with certain kinds of distributions. Let's say we wanted to write a function that approximates the expected value of a distribution by sampling repeatedly and calculating the mean. This function works for distributions that have numeric outcomes (`float` or `int`) but not for other kinds of distributions. We can annotate this explicitly by using `Distribution[float]`:

```python
import statistics

def expected_value(d: Distribution[float], n: int = 100) -> float:
    return statistics.mean(d.sample() for _ in range(n))
```
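As a sanity check, a fair six-sided die has mean $(1+2+\dots+6)/6 = 3.5$, so a sampled estimate should land near that value. A self-contained sketch; the larger `n` is an arbitrary choice to reduce noise:

```python
import random
import statistics
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

A = TypeVar("A")

class Distribution(ABC, Generic[A]):
    @abstractmethod
    def sample(self) -> A:
        pass

@dataclass(frozen=True)
class Die(Distribution[int]):
    sides: int

    def sample(self) -> int:
        return random.randint(1, self.sides)

def expected_value(d: Distribution[float], n: int = 100) -> float:
    # Monte Carlo estimate: the mean of n independent samples
    return statistics.mean(d.sample() for _ in range(n))

print(expected_value(Die(6), n=10_000))  # close to 3.5, varies from run to run
```

At runtime this works as written. A static type checker may still flag passing a `Distribution[int]` where a `Distribution[float]` is expected, because generic classes are invariant in their type variables by default; declaring the variable as `TypeVar("A", covariant=True)` is one way to allow it.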

### Functionality

- One of the practical advantages of defining a general-purpose abstraction in our code is that it gives us a place to add functionality that will work for any instance of the abstraction.

- One of the most common operations for a probability distribution that we can sample is drawing $n$ samples.

- We could just write a loop every time we needed to do this.

```python
# `distribution` stands for any Distribution instance
samples = []
for _ in range(100):
    samples.append(distribution.sample())

samples
```

- We can add a method for it instead.

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence
from typing import Generic, TypeVar

# A type variable named "A"
A = TypeVar("A")

# Distribution is "generic in A"
class Distribution(ABC, Generic[A]):

    def sample_n(self, n: int) -> Sequence[A]:
        return [self.sample() for _ in range(n)]

    # sampling must produce a value of type A
    @abstractmethod
    def sample(self) -> A:
        pass

# Die is redefined so that it inherits sample_n from the new Distribution
@dataclass(frozen=True)
class Die(Distribution[int]):
    sides: int

    def sample(self) -> int:
        return random.randint(1, self.sides)
```

- The implementation here is different: it uses a list comprehension rather than a normal `for` loop. The more important distinction happens when we use the method; instead of needing a `for` loop or list comprehension each time, we can just write:

```python
distribution = Die(6)

samples = distribution.sample_n(100)
```

- This pattern of implementing general-purpose functions on our abstractions becomes a lot more useful as the functions themselves become more complicated.

- There is another advantage to defining methods like `sample_n`: some kinds of distributions might have more efficient or more accurate ways to implement the same logic. If that's the case, we would override `sample_n` to use the better implementation.

```python
import numpy as np

@dataclass
class Gaussian(Distribution[float]):
    mu: float
    sigma: float

    def sample(self) -> float:
        return np.random.normal(loc=self.mu, scale=self.sigma)

    # overrides the default: one vectorized call instead of n separate calls
    # (NumPy returns an array of length n)
    def sample_n(self, n: int) -> Sequence[float]:
        return np.random.normal(loc=self.mu, scale=self.sigma, size=n)
```


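- `numpy` is optimized for array operations: every call to `numpy.random.normal` carries a fixed overhead, but a single call can generate many samples very quickly. Drawing all $n$ samples in one call pays that overhead once instead of $n$ times, and the performance impact is significant: in the timings below, the vectorized `sample_n` is roughly 20 times faster.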

```python
import timeit

d = Gaussian(mu=0, sigma=1)
# timeit runs the callable 1,000,000 times by default; the result is total seconds
timeit.timeit(lambda: [d.sample() for _ in range(100)])  # 43.988386292010546
```

```python
timeit.timeit(lambda: d.sample_n(100))  # 2.1296967500820756
```

## Abstraction over computation

- Classes give us one way to model behavior: methods. A common analogy is that objects act as "nouns" and methods act as "verbs". Defining `sample_n` as a regular (non-abstract) method on `Distribution` gives us two things:

    1. If we implement a new type of distribution with a custom `sample` method, we get `sample_n` for free for that distribution.

    2. If we implement a new type of distribution that has a way to get $n$ samples faster than calling `sample` $n$ times, we can override the method to use the faster algorithm.

- If we made `sample_n` a standalone function, we would get 1 but not 2. If we left `sample_n` as an abstract method, we would get 2 but not 1. Having a non-abstract method on the abstract class gives us the best of both worlds.

### First-class functions

- One way to pass behavior around as a value would be to represent functions as objects with a single method. We'd be able to pass them around just like normal values and, when we needed to actually perform the action or computation, we would call that method.

- First-class functions give us a new way to abstract over computation. Methods let us talk about the same kind of behavior for different objects; first-class functions let us pass the behavior itself around as a value.
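To make the contrast concrete, here is a sketch of the "object with a single method" workaround next to a plain function passed as a value. The names `Action`, `Greet`, `run_twice_object`, `greet`, and `run_twice` are all hypothetical:

```python
from abc import ABC, abstractmethod
from typing import Callable

# Workaround: wrap the behavior in an object with a single method
class Action(ABC):
    @abstractmethod
    def run(self) -> str: ...

class Greet(Action):
    def run(self) -> str:
        return "hello"

def run_twice_object(action: Action) -> list[str]:
    return [action.run(), action.run()]

# First-class functions: the function itself is the value we pass around
def greet() -> str:
    return "hello"

def run_twice(action: Callable[[], str]) -> list[str]:
    return [action(), action()]

print(run_twice_object(Greet()))  # ['hello', 'hello']
print(run_twice(greet))           # ['hello', 'hello']
```

Both versions do the same thing, but the second needs no wrapper class: any function with the right shape, including a lambda, can be passed directly.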

- A simple example might be repeating the same action $n$ times. Without an abstraction, we might do this with a `for` loop:

```python
for _ in range(10):
    do_something()
```

- We could factor this logic into a function that took `n` and `do_something` as arguments:

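```python
from typing import Callable

def repeat(action: Callable[[], None], n: int) -> None:
    for _ in range(n):
        action()

def do_something() -> None:  # placeholder action, for illustration only
    print("doing something")

repeat(do_something, 10)
```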

- If we wanted the type of a function that took an `int` and a `str` as input and returned a `bool`, we would write `Callable[[int, str], bool]`.
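For example, a function that checks whether a string is longer than some limit has exactly that type. A short sketch; `longer_than` and `apply_check` are illustrative names:

```python
from typing import Callable

def longer_than(limit: int, text: str) -> bool:
    return len(text) > limit

def apply_check(check: Callable[[int, str], bool], limit: int, text: str) -> bool:
    # Any function taking (int, str) and returning bool can be passed here
    return check(limit, text)

print(apply_check(longer_than, 3, "dice"))   # True
print(apply_check(longer_than, 10, "dice"))  # False
```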

- Let's look at the `expected_value` function we defined earlier:

```python
def expected_value(d: Distribution[float], n: int) -> float:
    return statistics.mean(d.sample() for _ in range(n))
```

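- This only works for distributions whose outcomes are already numbers. To handle other distributions, we need a way to map each outcome to a `float`; one option is to provide that mapping as an argument to the `expected_value` function: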

```python
from typing import Callable

def expected_value(
    d: Distribution[A],
    f: Callable[[A], float],
    n: int
) -> float:
    return statistics.mean(f(d.sample()) for _ in range(n))
```

- It's the same `mean` calculation as before, except that we apply `f` to each outcome. This small change has made the function far more flexible: we can now call `expected_value` on any sort of `Distribution`, not just `Distribution[float]`. For example, for a distribution over coin flips, we could pass in a payoff function like this one:

```python
from typing import Literal

Coin = Literal["heads", "tails"]  # assumed type for coin-flip outcomes

def payoff(coin: Coin) -> float:
    return 1.0 if coin == "heads" else 0.0
```

- The key idea to remember is that functions are values that we can pass around or store just like any other object.
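Because functions are values, we can also keep them in data structures. A sketch that stores two payoff functions for coin flips in a dictionary; the `Coin` alias and the `double_or_nothing` payoff are assumptions for illustration:

```python
from typing import Callable, Literal

Coin = Literal["heads", "tails"]  # assumed outcome type

def payoff(coin: Coin) -> float:
    return 1.0 if coin == "heads" else 0.0

def double_or_nothing(coin: Coin) -> float:
    return 2.0 if coin == "heads" else 0.0

# A dictionary whose values are functions
payoffs: dict[str, Callable[[Coin], float]] = {
    "payoff": payoff,
    "double_or_nothing": double_or_nothing,
}

for name, f in payoffs.items():
    print(name, f("heads"), f("tails"))
```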

#### Lambdas

- Lambdas are function literals. We can write a `lambda` expression to get a function without giving it a name.

```python
# coin_flip is assumed to be a distribution over coin flips; n=1000 is arbitrary
expected_value(coin_flip, lambda coin: 1.0 if coin == "heads" else 0.0, n=1000)
```
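Putting the pieces together: a self-contained sketch with a hypothetical fair-coin distribution `CoinFlip`, the generic `expected_value` from above, and a lambda as the payoff. For a fair coin the estimate should come out close to 0.5:

```python
import random
import statistics
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Generic, Literal, TypeVar

A = TypeVar("A")
Coin = Literal["heads", "tails"]  # assumed outcome type

class Distribution(ABC, Generic[A]):
    @abstractmethod
    def sample(self) -> A:
        pass

@dataclass(frozen=True)
class CoinFlip(Distribution[Coin]):
    # Hypothetical fair coin
    def sample(self) -> Coin:
        options: list[Coin] = ["heads", "tails"]
        return random.choice(options)

def expected_value(d: Distribution[A], f: Callable[[A], float], n: int) -> float:
    return statistics.mean(f(d.sample()) for _ in range(n))

coin_flip = CoinFlip()
estimate = expected_value(coin_flip, lambda coin: 1.0 if coin == "heads" else 0.0, n=10_000)
print(estimate)  # close to 0.5, varies from run to run
```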

### Iterative algorithms

- Many numerical algorithms are iterative: they refine an approximation step by step. For example, we can approximate the square root of $a$ by starting with some initial guess $x_0$ and repeatedly calculating $x_{n+1}$:

$$x_{n+1}=\frac{x_n+\frac{a}{x_n}}{2}$$
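As a worked example, take $a = 2$ with the initial guess $x_0 = a/2 = 1$. Three steps already agree with $\sqrt{2} \approx 1.414214$ to within about $2 \times 10^{-6}$:

$$
\begin{aligned}
x_1 &= \frac{1 + \frac{2}{1}}{2} = 1.5 \\
x_2 &= \frac{1.5 + \frac{2}{1.5}}{2} = \frac{17}{12} \approx 1.416667 \\
x_3 &= \frac{\frac{17}{12} + \frac{24}{17}}{2} = \frac{577}{408} \approx 1.414216
\end{aligned}
$$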

```python
def sqrt(a: float) -> float:
    x = a / 2                  # initial guess
    x_n = (x + (a / x)) / 2    # first update
    while abs(x_n - x) > 0.01:
        x = x_n                # keep the latest approximation
        x_n = (x + (a / x)) / 2
    return x_n
```

- The first improvement we can make is to turn the `0.01` into an extra parameter.

```python
def sqrt(a: float, threshold: float) -> float:
    x = a / 2                  # initial guess
    x_n = (x + (a / x)) / 2    # first update
    while abs(x_n - x) > threshold:
        x = x_n                # keep the latest approximation
        x_n = (x + (a / x)) / 2
    return x_n
```

#### Iterators and generators

- The `sqrt` function mixes two concerns: computing successive approximations and deciding when to stop. We need some way to abstract over iteration that lets us separate producing values iteratively from consuming them.

- Luckily, Python provides powerful facilities for doing exactly this: iterators and generators. Iterators give us a way of consuming values and generators give us a way of producing values.

```python
for x in [3, 2, 1]: print(x)
for x in {3, 2, 1}: print(x)
for x in range(3): print(x)
```

```
3
2
1
1
2
3
0
1
2
```


- Note how iterating over the set `{3, 2, 1}` prints 1 2 3 rather than 3 2 1: sets do not preserve the order in which elements are added.

- When we iterate over a dictionary, we get its keys rather than its values, because that is the default iterator for dictionaries. To get values or key-value pairs, we need to use the `values` and `items` methods respectively.

```python
d = {'a': 1, 'b': 2, 'c': 3}
for k in d: print(k)
for v in d.values(): print(v)
for k, v in d.items(): print(k, v)
```

```
a
b
c
1
2
3
a 1
b 2
c 3
```


- Python's `list` function can collect the values of any iterable (including any iterator) into a list.

```python
range(5)        # range(0, 5)

list(range(5))  # [0, 1, 2, 3, 4]
```

- The Python standard library has a collection of general-purpose operations on iterators in the `itertools` module.

```python
# itertools.takewhile lets us stop iterating as soon as some condition stops holding
import itertools

elements = [1, 3, 2, 5, 3]
list(itertools.takewhile(lambda x: x < 5, elements))  # [1, 3, 2]
```
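Two more `itertools` functions are worth knowing here: `itertools.count` produces an infinite sequence of numbers, and `itertools.islice` takes a bounded slice of any iterator, so it can safely cut an infinite one short:

```python
import itertools

naturals = itertools.count(1)               # 1, 2, 3, ... (never ends)
first_five = itertools.islice(naturals, 5)  # stops after five elements
print(list(first_five))  # [1, 2, 3, 4, 5]

# takewhile also stops an infinite iterator, based on a condition instead of a count
evens = (x for x in itertools.count(0) if x % 2 == 0)
print(list(itertools.takewhile(lambda x: x < 10, evens)))  # [0, 2, 4, 6, 8]
```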

- How do we define our own iterators? In the most general sense, a Python iterator is any object that implements a `__next__()` method, which returns the next value or raises `StopIteration` when there are no more values. Iterators also implement `__iter__()`, returning themselves, so that they can be used directly in `for` loops.

- However, Python has a more convenient way to create an iterator: writing a generator using the `yield` keyword. `yield` acts similarly to `return`, except that instead of stopping the function altogether, it outputs the yielded value to the iterator and pauses the function until the caller asks for the next value.
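A sketch of both approaches side by side: a hypothetical `Countdown` class that implements the iterator protocol by hand, and a generator function `countdown` that produces the same values with `yield`:

```python
from collections.abc import Iterator

class Countdown:
    """Iterator written by hand: counts down from start to 1."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

def countdown(start: int) -> Iterator[int]:
    """The same sequence as a generator: each yield pauses the function."""
    while start > 0:
        yield start
        start -= 1

print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]

gen = countdown(2)
print(next(gen))  # 2: runs the body up to the first yield
print(next(gen))  # 1: resumes right after the previous yield
```

The generator needs no explicit state object or `StopIteration`: the local variable `start` is kept alive between calls, and returning from the function ends the iteration.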


- Instead of looping and stopping based on some condition, we'll write a version of `sqrt` that returns an iterator with each iteration of the algorithm as a value:

```python
from collections.abc import Iterator

def sqrt(a: float) -> Iterator[float]:
    x = a / 2  # initial guess
    while True:
        x = (x + (a / x)) / 2
        yield x
```

- With this version, we update $x$ at each iteration and then `yield` the updated value. The caller of the function gets an iterator over an infinite sequence of approximations.
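Because the iterator is infinite, calling `list` on it directly would never finish; `itertools.islice` lets us look at just the first few approximations. For $a = 2$ these are $1.5$, $17/12 \approx 1.41667$, and $577/408 \approx 1.41422$:

```python
import itertools
from collections.abc import Iterator

def sqrt(a: float) -> Iterator[float]:
    x = a / 2  # initial guess
    while True:
        x = (x + (a / x)) / 2
        yield x

# take only the first three values of the infinite iterator
first_three = list(itertools.islice(sqrt(2), 3))
print(first_three)  # approximately [1.5, 1.41667, 1.41422]
```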

- We probably still want the threshold-based convergence logic. Since we now have a first-class abstraction for iteration, we can write a general-purpose `converge` function that takes an iterator and returns a version of that same iterator that stops as soon as two consecutive values are sufficiently close.

```python
import itertools
from collections.abc import Iterator

# itertools.pairwise requires Python 3.10 or later
def converge(values: Iterator[float], threshold: float) -> Iterator[float]:
    for a, b in itertools.pairwise(values):
        yield a
        if abs(a - b) < threshold:
            break
```

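- Each function takes an iterator as an input and returns an iterator as an output. The major advantage is that these iterator $\rightarrow$ iterator operations compose: the output of one can be fed directly into another.

- For example, suppose we also want to stop after at most 10,000 iterations, even if the values have not converged yet. We don't need to write a new version of `sqrt` or even `converge` to do this; instead, we can combine `converge` with `itertools.islice`: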

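```python
n = 10000000
results = converge(sqrt(n), 0.001)
capped_results = itertools.islice(results, 10000)

# nothing has been computed yet; consuming the iterator runs the algorithm
print(list(capped_results)[-1])
```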

- This is a powerful programming style because we can write and test each operation (`sqrt`, `converge`, `islice`) in isolation and get complex behavior by combining them in the right way. If we were writing the same logic without iterators, we would need a single loop that calculated each step of `sqrt`, checked for convergence, and kept a counter to stop after 10,000 steps.