# **3.a.i - 3.a.iii**

Q-learning-lambda has been implemented and the outputs for 2.i - 2.iii can be viewed [here](./sample_output/q_lambda/).

The lambda algorithms take a very long time to complete. To shorten the run time, you can reduce the number of iterations or the state space size in globals.py.


Unfortunately, the lambda algorithms require iterating through a large state space on every step. With the original problems, it would take nearly a minute to move one step. If you want, you can reduce the state space by changing these variables in globals.py:
```
NFLOORS = 3
START_FLOORS = [1]
START_PROB = [1]
EXIT_FLOORS = [2,3]
EXIT_PROB = [.5, .5]
FLOORS = [1, 2, 3]
```
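Before starting a long run with a reduced configuration, it can help to check that the values are consistent with each other. The sketch below is only an illustration: `check_config` is a hypothetical helper that is not part of this repository, and it assumes that `FLOORS` lists the floors `1..NFLOORS` and that each probability list pairs up with its floor list and sums to 1.

```python
import math

def check_config(nfloors, floors, start_floors, start_prob, exit_floors, exit_prob):
    """Hypothetical sanity check for the globals.py values (not in the original code)."""
    problems = []
    # Assumption: FLOORS enumerates floors 1..NFLOORS, as in the example above.
    if floors != list(range(1, nfloors + 1)):
        problems.append("FLOORS should list 1..NFLOORS")
    for name, fl, pr in (("START", start_floors, start_prob),
                         ("EXIT", exit_floors, exit_prob)):
        if len(fl) != len(pr):
            problems.append(f"{name}_FLOORS and {name}_PROB differ in length")
        if not set(fl) <= set(floors):
            problems.append(f"{name}_FLOORS contains a floor outside FLOORS")
        if not math.isclose(sum(pr), 1.0):
            problems.append(f"{name}_PROB does not sum to 1")
    return problems

# The reduced 3-floor configuration shown above.
issues = check_config(3, [1, 2, 3], [1], [1], [2, 3], [.5, .5])
print(issues)  # an empty list means no inconsistencies were found
```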


# **3.b.i - 3.b.iii**

SARSA-lambda has been implemented and the outputs for 2.i - 2.iii can be viewed [here](./sample_output/q_sarsa_lambda/)

Unfortunately, the lambda algorithms require iterating through a large state space on every step. With the original problems, it would take nearly a minute to move one step. If you want, you can reduce the state space by changing these variables in globals.py:

```
NFLOORS = 3
START_FLOORS = [1]
START_PROB = [1]
EXIT_FLOORS = [2,3]
EXIT_PROB = [.5, .5]
FLOORS = [1, 2, 3]
```


# **3.c.i - 3.c.iii**

### **The issue with Eligibility traces for this problem**
The eligibility traces absolutely tanked the learning performance. To elucidate, for $Sarsa(\lambda)$ or $Q(\lambda)$ learning, we initialize an eligibility trace table that is the same size as the Q table to keep a record of the occurrence of each state-action pair:

$|e(s,a)| = |Q(s,a)|$

Recall that, given our state space and action space, we have a Q table with a large number of state-action pairs:

$|Q\_table| = |State\_space| * |Action\_space| = 3111696 * 16 = 49,787,136$
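As a quick check of the arithmetic, the product can be recomputed from the two sizes quoted above. The variable names are illustrative only and do not come from the project code.

```python
# Sizes quoted in the text above.
n_states = 3_111_696
n_actions = 16

q_entries = n_states * n_actions   # entries in the Q table
total_entries = 2 * q_entries      # Q table plus an equally sized trace table

print(f"{q_entries:,}")      # 49,787,136
print(f"{total_entries:,}")  # 99,574,272
```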

So this means we have two tables, each of size 49,787,136. This alone isn't a problem for initialization or single-entry updates, since dictionaries in Python have fast lookups. The issue lies in the latter half of the $\lambda$ algorithms:

```
for all s
    for all a
        update Q
        update E
```

This requires us to loop through the entire set of state-action pairs, which is of size 49,787,136, on every step just to update our Q and eligibility tables. On my PC, this took about 30 seconds per step, which made learning practically impossible for my agent at this state space size.
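To make the cost concrete, here is a minimal sketch of one SARSA(λ) step with accumulating traces over dictionary tables. It is not the code from this repository: the function name, the toy 5-state, 2-action problem and the values of `alpha`, `gamma` and `lam` are all assumptions chosen for illustration. The point is that the sweep touches every entry once per step, so its cost grows with $|State\_space| * |Action\_space|$.

```python
from itertools import product

def sarsa_lambda_step(Q, E, s, a, reward, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.9):
    """One SARSA(lambda) update with accumulating traces (illustrative sketch)."""
    # TD error for the transition that was just taken.
    delta = reward + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    # Mark the visited pair as eligible.
    E[(s, a)] += 1.0
    # Full sweep: every state-action pair is updated, even those with zero trace.
    touched = 0
    for key in Q:
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam
        touched += 1
    return touched

# Toy tables: 5 states x 2 actions = 10 entries (hypothetical sizes).
states, actions = range(5), range(2)
Q = {(s, a): 0.0 for s, a in product(states, actions)}
E = {(s, a): 0.0 for s, a in product(states, actions)}

touched = sarsa_lambda_step(Q, E, s=0, a=1, reward=1.0, s_next=1, a_next=0)
print(touched)  # number of entries visited in this single step
```

In the toy case a single step visits 10 entries; with the full tables described above, the same loop would visit 49,787,136 entries per step.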

### **Possible Solutions?**
One way would be to reduce the size of the state space. I made it possible for the user to modify the problem description in the globals.py file, so they can use fewer floors for problems 2.i and 2.ii if they want to:

```
NFLOORS = 3
START_FLOORS = [1]
START_PROB = [1]
EXIT_FLOORS = [2,3]
EXIT_PROB = [.5, .5]
FLOORS = [1, 2, 3]
```
This would reduce the problem to 3 floors instead of 6, which would make the learning 37x faster. For 2.iii, you would need to modify the environment code in addition to globals.py.


You could also use helper classes to keep track of state information rather than encoding it in the state space. For example, instead of keeping track of passengers in the state space, you could track them in a class, which is just as valid as long as you define this class mathematically to be a part of your state space.

### **Algorithm Learning Curves**

To get any useful information, I chose to reduce the state space size and the number of iterations. Here are the alpha rewards graphs for 2.i for Q lambda and SARSA lambda, respectively:

<img src="./sample_output/q_lambda/avg_rewards_alpha_2i.png" width="400"/>
<img src="./sample_output/sarsa_lambda/avg_rewards_alpha_2i.png" width="400"/>


In conclusion, I wouldn't recommend using the lambda algorithms for large state spaces unless you have used methods to reduce the effect of the curse of dimensionality. Otherwise, your learning time will increase substantially.