103000096 HW2 #46

Open · wants to merge 4 commits into master
164 changes: 105 additions & 59 deletions Lab2-MDPs.ipynb

Large diffs are not rendered by default.

Binary file added imgs/initial.png
Binary file added imgs/policy iteration.png
Binary file added imgs/policy-iteration.png
Binary file added imgs/result.png
Binary file added imgs/value iteration.png
Binary file added imgs/value-iteration.png
13 changes: 11 additions & 2 deletions report.md
@@ -1,3 +1,12 @@
# Homework1 report
# Homework2 report
In this lab, I implement parts of value iteration, policy iteration, and tabular Q-learning. The following are the points I think are worth mentioning.

TA: try to elaborate on the algorithms that you implemented and any details worth mentioning.
1. In problem 1, I use three loops to update the value function: the first two loop over actions and states, and the last iterates over the list of (probability, next state, reward) tuples for the next states. In problem 2, policy iteration, evaluating the value function of a fixed policy requires only two loops, since the action at each state is already determined by the policy. Therefore, for problems with a large action space, working with the state-value function may be the better choice for speed (see the sketch after this list).

2. For this problem setting, as shown below, policy iteration converges faster than value iteration, though I'm not sure whether this holds for other problems. The figure on the left is for value iteration; the figure on the right is for policy iteration.
![alt text](imgs/value-iteration.png)
![alt text](imgs/policy-iteration.png)

3. The reward of each state and the optimal policy found by value iteration can be visualized as follows.
![alt text](imgs/initial.png)
![alt text](imgs/result.png)
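
As a reference for point 1, here is a minimal sketch of the two update schemes, assuming a Gym-style transition model where `P[s][a]` is a list of `(prob, next_state, reward)` tuples; the names (`value_iteration`, `evaluate_policy`, `nS`, `nA`) are illustrative and not necessarily the ones used in the notebook.

```python
import numpy as np

def value_iteration(P, nS, nA, gamma=0.95, n_iters=100):
    # Three loops: actions, states, and the transition tuples of each (s, a) pair.
    V = np.zeros(nS)
    Q = np.zeros((nS, nA))
    for _ in range(n_iters):
        Q = np.zeros((nS, nA))
        for a in range(nA):                       # loop 1: actions
            for s in range(nS):                   # loop 2: states
                for prob, s_next, r in P[s][a]:   # loop 3: transition tuples
                    Q[s, a] += prob * (r + gamma * V[s_next])
        V = Q.max(axis=1)                         # greedy Bellman backup
    return V, Q.argmax(axis=1)                    # value function and greedy policy

def evaluate_policy(P, nS, pi, gamma=0.95, n_iters=100):
    # Only two loops: the action at each state is fixed by the policy pi.
    V = np.zeros(nS)
    for _ in range(n_iters):
        V_new = np.zeros(nS)
        for s in range(nS):                       # loop 1: states
            for prob, s_next, r in P[s][pi[s]]:   # loop 2: transition tuples
                V_new[s] += prob * (r + gamma * V[s_next])
        V = V_new
    return V
```

With k transition tuples per state-action pair, each value-iteration sweep costs O(|S||A|k), while each policy-evaluation sweep costs only O(|S|k), which is where the speed advantage for large action spaces comes from.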