## Salmon Fishing MDP

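We model the salmon fishery as the following MDP, where $T(s,a,s^{\prime})$ is the probability of moving from state $s$ to state $s^{\prime}$ under action $a$, and $R(s,a,s^{\prime})$ is the reward received for that transition.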

| $s$    | $a$      | $s^{\prime}$ | $T(s,a,s^{\prime})$ | $R(s,a,s^{\prime})$ |
|--------|----------|--------------|---------------------|---------------------|
| Empty  | Re-Breed | Empty        | 0.05                | -4                  |
| Empty  | Re-Breed | Low          | 0.85                | -4                  |
| Empty  | Re-Breed | Medium       | 0.10                | -4                  |
| Empty  | Not Fish | Empty        | 1                   | 0                   |
| Empty  | Fish     | Empty        | 1                   | 0                   |
| Low    | Re-Breed | Medium       | 0.80                | -2.50               |
| Low    | Re-Breed | High         | 0.20                | -2.50               |
| Low    | Not Fish | Empty        | 0.05                | 0                   |
| Low    | Not Fish | Low          | 0.70                | 0                   |
| Low    | Not Fish | Medium       | 0.25                | 0                   |
| Low    | Fish     | Empty        | 0.65                | 0.60                |
| Low    | Fish     | Low          | 0.35                | 0.60                |
| Medium | Re-Breed | High         | 1                   | -1                  |
| Medium | Not Fish | Low          | 0.05                | 0                   |
| Medium | Not Fish | Medium       | 0.90                | 0                   |
| Medium | Not Fish | High         | 0.05                | 0                   |
| Medium | Fish     | Low          | 0.35                | 1.70                |
| Medium | Fish     | Medium       | 0.60                | 1.70                |
| Medium | Fish     | High         | 0.05                | 1.70                |
| High   | Re-Breed | High         | 1                   | -0.5                |
| High   | Not Fish | High         | 0.80                | 0                   |
| High   | Not Fish | Medium       | 0.20                | 0                   |
| High   | Fish     | Medium       | 0.65                | 2.5                 |
| High   | Fish     | High         | 0.35                | 2.5                 |
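
For reference, the table can also be written down directly as a nested dictionary. This is only an illustrative sketch: the name `TRANSITIONS` and the helper `check_rows_sum_to_one` are hypothetical and are not part of the `Actions` module used in this notebook. The helper checks that the outgoing probabilities of every state-action pair sum to 1, which is a cheap way to catch transcription errors.

```python
import math

# Hypothetical encoding of the table: (state, action) -> {next_state: (T, R)}.
TRANSITIONS = {
    ("empty", "re-breed"): {"empty": (0.05, -4), "low": (0.85, -4), "medium": (0.10, -4)},
    ("empty", "not fish"): {"empty": (1.0, 0)},
    ("empty", "fish"): {"empty": (1.0, 0)},
    ("low", "re-breed"): {"medium": (0.80, -2.5), "high": (0.20, -2.5)},
    ("low", "not fish"): {"empty": (0.05, 0), "low": (0.70, 0), "medium": (0.25, 0)},
    ("low", "fish"): {"empty": (0.65, 0.6), "low": (0.35, 0.6)},
    ("medium", "re-breed"): {"high": (1.0, -1)},
    ("medium", "not fish"): {"low": (0.05, 0), "medium": (0.90, 0), "high": (0.05, 0)},
    ("medium", "fish"): {"low": (0.35, 1.7), "medium": (0.60, 1.7), "high": (0.05, 1.7)},
    ("high", "re-breed"): {"high": (1.0, -0.5)},
    ("high", "not fish"): {"high": (0.80, 0), "medium": (0.20, 0)},
    ("high", "fish"): {"medium": (0.65, 2.5), "high": (0.35, 2.5)},
}


def check_rows_sum_to_one(transitions):
    """Return the (state, action) pairs whose probabilities do not sum to 1."""
    bad = []
    for key, outcomes in transitions.items():
        total = sum(prob for prob, _reward in outcomes.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            bad.append(key)
    return bad


# An empty list means every row of the table is a valid distribution.
print(check_rows_sum_to_one(TRANSITIONS))
```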



In [7]:
# Instantiate the three actions. Each action object holds the transition
# probabilities and rewards for its state-action pairs (see the table above).
from Actions import ReBreed, Fish, NotFish
reBreedAction = ReBreed()
notFishAction = NotFish()
fishAction    = Fish()
availableActions = [reBreedAction, notFishAction, fishAction]
availableStates = ["empty", "low", "medium", "high"]

### Value Iteration and policy selection

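Below we run value iteration over the four states (Empty, Low, Medium, High) and the three actions defined above. The sweep is repeated for a range of discount factors $\gamma$, and for each state we record both its value and the action that attains it.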
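
In equation form, each sweep applies the standard Bellman optimality backup to every state. The iteration index $k$ and the policy symbol $\pi$ are notation introduced here for illustration; $T$, $R$, and $\gamma$ are as defined above.

$$
V_{k+1}(s) = \max_{a} \sum_{s^{\prime}} T(s,a,s^{\prime}) \left[ R(s,a,s^{\prime}) + \gamma V_k(s^{\prime}) \right]
$$

$$
\pi(s) = \arg\max_{a} \sum_{s^{\prime}} T(s,a,s^{\prime}) \left[ R(s,a,s^{\prime}) + \gamma V(s^{\prime}) \right]
$$

Note that the loop in this notebook overwrites the value table in place, so states visited later in a sweep already see the refreshed values of earlier states. This in-place variant converges to the same fixed point as the synchronous update written above.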

In [22]:
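import numpy as np

num_iteration = 50   # number of value-iteration sweeps per discount factor
gammaToVMap = {}     # discount factor -> {state: (value, best action)}

# discount factors 0.1, 0.2, ..., 0.9 plus 0.95 and 0.99
gamma = np.arange(0.1, 1, 0.1)
gamma = np.append(gamma, [0.95, 0.99])
print(gamma)

for g in gamma:
    # V[state] = (current value estimate, action that attains it)
    V = {
        "empty" : (0, 0),
        "low"   : (0, 0),
        "medium": (0, 0),
        "high"  : (0, 0)
    }
    for i in range(num_iteration):
        # V is updated in place, so later states in a sweep already see
        # the values refreshed earlier in the same sweep
        for currState in availableStates:
            # The running maximum starts at 0, so an action is recorded only
            # if its expected value is strictly positive; otherwise the
            # state keeps (0, None).
            Vs_i = 0
            maxAction = None
            for currAction in availableActions:
                # maps each next state to its (transition probability, reward)
                sPrimeMap = currAction.getTransitionAndRewardProbabilities(currState)
                currV = 0
                for nextState, probabilities in sPrimeMap.items():
                    t, r = probabilities
                    # Bellman backup: T * (R + gamma * V(s'))
                    currV += t * (r + g * V[nextState][0])
                if currV > Vs_i:
                    Vs_i = currV
                    maxAction = currAction
            V[currState] = (Vs_i, maxAction)
    gammaToVMap[g] = V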



[0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  0.95 0.99]


In [27]:
gammaToVMap[gamma[5]]  # values and best actions for gamma = 0.6

{'empty': (0, None),
 'low': (0.8131448865916906, <Actions.NotFish at 0x1064f0610>),
 'medium': (3.1441602281546004, <Actions.Fish at 0x1064f3df0>),
 'high': (4.716737327823155, <Actions.Fish at 0x1064f3df0>)}
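
As a quick sanity check on the output above, the printed values should satisfy the Bellman equation for the reported actions at $\gamma = 0.6$ (`gamma[5]`). The sketch below plugs the printed numbers and the matching table rows back into the backup and reports the residual for each state. The helper `backup` and the variable names are illustrative and do not come from the notebook.

```python
gamma = 0.6  # gamma[5] in the array printed earlier

# Values printed by the notebook for gamma = 0.6.
V = {
    "empty": 0.0,
    "low": 0.8131448865916906,
    "medium": 3.1441602281546004,
    "high": 4.716737327823155,
}


def backup(outcomes):
    """Expected one-step return: sum over s' of T * (R + gamma * V[s'])."""
    return sum(t * (r + gamma * V[s_next]) for s_next, (t, r) in outcomes.items())


# Table rows for the actions reported in the output:
# Low -> Not Fish, Medium -> Fish, High -> Fish.
low_not_fish = {"empty": (0.05, 0), "low": (0.70, 0), "medium": (0.25, 0)}
medium_fish = {"low": (0.35, 1.7), "medium": (0.60, 1.7), "high": (0.05, 1.7)}
high_fish = {"medium": (0.65, 2.5), "high": (0.35, 2.5)}

residuals = {
    "low": backup(low_not_fish) - V["low"],
    "medium": backup(medium_fish) - V["medium"],
    "high": backup(high_fish) - V["high"],
}

# Residuals near zero mean the values have converged for these actions.
for state, residual in residuals.items():
    print(f"{state}: residual = {residual:.2e}")
```

The `'empty': (0, None)` entry is consistent with this picture: Fish and Not Fish both yield exactly 0 from the Empty state and Re-Breed yields a negative value, so no action strictly exceeds the initial maximum of 0 and no action is recorded.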

### Extending the problem

Now let's extend the problem with a finer-grained state space and the following action dynamics:

* **States** 
  * Let $\mathbf{S} = \{0, 1000, 2000, \ldots, 50000\}$ be the set of possible fish populations, with $s \in \mathbf{S}$
* **Actions**
  * Re-breeding grows the population according to the logistic growth function $P(t) = \frac{KP_0e^{rt}}{K + P_0(e^{rt}-1)}$, where $P_0$ is the current population, $K$ is the carrying capacity (50,000), $r$ is the growth rate, and $t$ is the elapsed time
  * Fishing halves the population
  * Not fishing moves the population to a value drawn from $\mathcal{N}(P_0,500)$, a normal distribution centered on the current population $P_0$
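
The three action effects listed above can be sketched as plain functions. This is a minimal illustration under several assumptions that the text does not fix: the values `r=0.5` and `t=1.0` are placeholders, the 500 in $\mathcal{N}(P_0,500)$ is treated as a standard deviation, and results are rounded to the nearest state in $\mathbf{S}$ by a hypothetical `snap_to_grid` helper.

```python
import math
import random

K = 50_000     # carrying capacity, from the text
STEP = 1_000   # spacing of the state grid {0, 1000, ..., 50000}


def logistic_growth(p0, r, t, k=K):
    """Population after time t under logistic growth starting from p0."""
    growth = math.exp(r * t)
    return k * p0 * growth / (k + p0 * (growth - 1))


def snap_to_grid(population):
    """Round half up to the nearest grid state and clamp to [0, K]."""
    snapped = STEP * math.floor(population / STEP + 0.5)
    return int(min(max(snapped, 0), K))


def re_breed(p0, r=0.5, t=1.0):
    # r and t are placeholder values chosen only for illustration.
    return snap_to_grid(logistic_growth(p0, r, t))


def fish(p0):
    # Fishing halves the population.
    return snap_to_grid(p0 / 2)


def not_fish(p0, rng=random):
    # Assumption: 500 is the standard deviation, not the variance.
    return snap_to_grid(rng.gauss(p0, 500))


print(re_breed(10_000), fish(20_000), not_fish(20_000, random.Random(0)))
```

Snapping to the grid keeps every successor inside $\mathbf{S}$, so the same tabular value iteration used earlier would still apply. Other discretizations, such as spreading probability mass over neighboring grid states, are equally plausible.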