103000096 HW2 #46

Open · wants to merge 4 commits into master
164 changes: 105 additions & 59 deletions Lab2-MDPs.ipynb

Large diffs are not rendered by default.

Binary file added imgs/initial.png
Binary file added imgs/policy iteration.png
Binary file added imgs/policy-iteration.png
Binary file added imgs/result.png
Binary file added imgs/value iteration.png
Binary file added imgs/value-iteration.png
13 changes: 11 additions & 2 deletions report.md
@@ -1,3 +1,12 @@
# Homework1 report
# Homework2 report
In this lab, I implement parts of value iteration, policy iteration, and tabular Q-learning. The following are the points I think are worth mentioning.

TA: try to elaborate on the algorithms that you implemented and any details worth mentioning.
1. In problem 1, I use three loops to update the value function: the first two loop over actions and states, and the last iterates over the list of (probability, next state, reward) tuples for the next states. In problem 2, policy iteration, evaluating the value function of a fixed policy requires only two loops, since the action at each state is already determined by the policy. Therefore, for problems with a large action space, working with the state-value function may be the better choice for speed (see the sketch after this list).

2. For this problem setting, as shown below, policy iteration converges faster than value iteration, though I'm not sure whether this holds for other problems. The figure on the left is for value iteration; the figure on the right is for policy iteration.
![alt text](imgs/value-iteration.png)
![alt text](imgs/policy-iteration.png)

3. The reward of each state and the optimal policy found by value iteration can be visualized as follows.
![alt text](imgs/initial.png)
![alt text](imgs/result.png)
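
As a reference for point 1, here is a minimal sketch of the two update schemes, assuming a Gym-style transition model where `P[s][a]` is a list of `(prob, next_state, reward)` tuples; the names (`value_iteration`, `evaluate_policy`, `nS`, `nA`) are illustrative and not necessarily the ones used in the notebook.

```python
import numpy as np

def value_iteration(P, nS, nA, gamma=0.95, n_iters=100):
    # Three loops: actions, states, and the transition tuples of each (s, a) pair.
    V = np.zeros(nS)
    Q = np.zeros((nS, nA))
    for _ in range(n_iters):
        Q = np.zeros((nS, nA))
        for a in range(nA):                       # loop 1: actions
            for s in range(nS):                   # loop 2: states
                for prob, s_next, r in P[s][a]:   # loop 3: transition tuples
                    Q[s, a] += prob * (r + gamma * V[s_next])
        V = Q.max(axis=1)                         # greedy Bellman backup
    return V, Q.argmax(axis=1)                    # value function and greedy policy

def evaluate_policy(P, nS, pi, gamma=0.95, n_iters=100):
    # Only two loops: the action at each state is fixed by the policy pi.
    V = np.zeros(nS)
    for _ in range(n_iters):
        V_new = np.zeros(nS)
        for s in range(nS):                       # loop 1: states
            for prob, s_next, r in P[s][pi[s]]:   # loop 2: transition tuples
                V_new[s] += prob * (r + gamma * V[s_next])
        V = V_new
    return V
```

With k transition tuples per state-action pair, each value-iteration sweep costs O(|S||A|k), while each policy-evaluation sweep costs only O(|S|k), which is where the speed advantage for large action spaces comes from.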