homework2-MDPs 106062510 #36

Open: wants to merge 4 commits into master
1,196 changes: 1,196 additions & 0 deletions .ipynb_checkpoints/Lab2-MDPs-checkpoint.ipynb


196 changes: 139 additions & 57 deletions Lab2-MDPs.ipynb


Binary file added imgs/problem1.png
Binary file added imgs/problem1_code.png
Binary file added imgs/problem2_1_code.png
Binary file added imgs/problem2_2_code.png
Binary file added imgs/problem3_code.png
Binary file added imgs/problem3_code_2.png
Binary file added imgs/problem3_code_3.png
20 changes: 20 additions & 0 deletions report.md
@@ -1,3 +1,23 @@
# Homework2 report

TA: try to elaborate on the algorithms you implemented and any details worth mentioning.
## Problem 1: Implement Value Iteration

<p>First, I implemented value iteration as introduced in class, i.e. repeatedly applying the value update (the Bellman update, or backup) V(s) ← max over a of sum over s' of P(s, a, s') * [R(s, a, s') + gamma*V(s')] to every state.</p>

<img src='./imgs/problem1.png' width = '50%' height = '50%'>

<p>Here is my implementation:</p>

<img src='./imgs/problem1_code.png' width='50%' height = '50%'>

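
Because the implementation is only available as a screenshot, here is a minimal NumPy sketch of value iteration for reference. It is an illustration under stated assumptions, not a transcription of `problem1_code.png`: it assumes the MDP is stored as dense arrays `P[s, a, s']` (transition probabilities) and `R[s, a, s']` (rewards), and the function name `value_iteration` and the two-state toy MDP are hypothetical.

```python
import numpy as np

# Hypothetical array layout (an assumption, not the lab's MDP object):
#   P[s, a, s2] = probability of moving from s to s2 under action a
#   R[s, a, s2] = reward received on that transition
def value_iteration(P, R, gamma, n_iters=500):
    V = np.zeros(P.shape[0])
    for _ in range(n_iters):
        # Q[s, a] = sum_{s2} P[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
        Q = (P * (R + gamma * V)).sum(axis=2)
        V = Q.max(axis=1)  # Bellman backup: value of the best action in every state
    # Greedy policy with respect to the final value function
    pi = (P * (R + gamma * V)).sum(axis=2).argmax(axis=1)
    return V, pi

# Hypothetical two-state, two-action toy MDP with deterministic transitions
P = np.zeros((2, 2, 2))
R = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0                    # state 0, action 0: stay, reward 0
P[0, 1, 1] = 1.0; R[0, 1, 1] = 1.0  # state 0, action 1: go to 1, reward 1
P[1, 0, 1] = 1.0                    # state 1, action 0: stay, reward 0
P[1, 1, 0] = 1.0                    # state 1, action 1: go to 0, reward 0

V, pi = value_iteration(P, R, gamma=0.9)
print(V, pi)
```

In this toy MDP the best policy takes action 1 in both states, cycling 0 → 1 → 0, so V*(0) = 1 + 0.9·V*(1) and V*(1) = 0.9·V*(0), giving V*(0) = 1/0.19 ≈ 5.26 and V*(1) = 0.9/0.19 ≈ 4.74. The 500 backups are far more than needed to converge here.
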
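
## Problem 2: Policy Iteration

### Problem 2a: State Value Function

<p>Here we compute the exact value function V of a fixed policy pi. V satisfies V = b + gamma*P*V, where P is the state-transition matrix under pi and b is the expected immediate reward in each state (the sum of P*R over next states). V is therefore the solution of the linear system aV = b with a = I - gamma*P (I being the identity matrix). I build a and b, then call numpy.linalg.solve(a, b) to obtain V.</p>

<img src='./imgs/problem2_1_code.png' width='50%' height = '50%'>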
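
A minimal sketch of this exact policy evaluation, assuming (hypothetically) that the MDP is stored as dense arrays `P[s, a, s']` and `R[s, a, s']` and that a deterministic policy is an integer array `pi[s]`. The name `exact_policy_evaluation` and the toy MDP are illustrative, not the lab's API.

```python
import numpy as np

# Hypothetical layout (assumption): P[s, a, s2] transition probabilities,
# R[s, a, s2] rewards; pi[s] is the action a deterministic policy takes in s.
def exact_policy_evaluation(pi, P, R, gamma):
    states = np.arange(P.shape[0])
    P_pi = P[states, pi]             # (S, S) transition matrix under pi
    R_pi = R[states, pi]             # (S, S) transition rewards under pi
    a = np.eye(len(states)) - gamma * P_pi
    b = (P_pi * R_pi).sum(axis=1)    # expected immediate reward per state
    return np.linalg.solve(a, b)     # solves (I - gamma * P_pi) V = b

# Hypothetical two-state, two-action toy MDP with deterministic transitions
P = np.zeros((2, 2, 2))
R = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0                    # state 0, action 0: stay, reward 0
P[0, 1, 1] = 1.0; R[0, 1, 1] = 1.0  # state 0, action 1: go to 1, reward 1
P[1, 0, 1] = 1.0                    # state 1, action 0: stay, reward 0
P[1, 1, 0] = 1.0                    # state 1, action 1: go to 0, reward 0

pi = np.array([1, 1])
V = exact_policy_evaluation(pi, P, R, gamma=0.9)
print(V)
```

Solving the linear system once replaces the many sweeps iterative policy evaluation would need; for pi = [1, 1] it gives V(0) = 1/0.19 and V(1) = 0.9/0.19.
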
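
### Problem 2b: State-Action Value Function

<p>Once we have the state value function, we can compute the state-action value function, denoted Qpi: Qpi(s, a) = sum over s' of P(s, a, s') * [R(s, a, s') + gamma*V(s')], i.e. the expected return of taking action a in state s and following pi afterwards.</p>

<img src='./imgs/problem2_2_code.png' width='50%' height = '50%'>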
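
A sketch of computing Qpi from V, followed by the greedy policy-improvement step that policy iteration performs next. The dense `P[s, a, s']` / `R[s, a, s']` layout, the helper name `q_from_v`, and the toy MDP are assumptions for illustration. In this toy MDP the policy "always take action 0" never earns a reward, so its value function is all zeros and is passed in directly.

```python
import numpy as np

# Hypothetical layout (assumption): P[s, a, s2] transition probabilities,
# R[s, a, s2] rewards, V[s2] the value function of the current policy.
def q_from_v(V, P, R, gamma):
    # Qpi[s, a] = sum_{s2} P[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
    return (P * (R + gamma * V)).sum(axis=2)

# Hypothetical two-state, two-action toy MDP with deterministic transitions
P = np.zeros((2, 2, 2))
R = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0                    # state 0, action 0: stay, reward 0
P[0, 1, 1] = 1.0; R[0, 1, 1] = 1.0  # state 0, action 1: go to 1, reward 1
P[1, 0, 1] = 1.0                    # state 1, action 0: stay, reward 0
P[1, 1, 0] = 1.0                    # state 1, action 1: go to 0, reward 0

V = np.zeros(2)                 # value of "always take action 0" in this MDP
Q = q_from_v(V, P, R, gamma=0.9)
new_pi = Q.argmax(axis=1)       # policy improvement: act greedily w.r.t. Qpi
print(Q)
print(new_pi)
```

Alternating exact evaluation with this greedy improvement is policy iteration; here one improvement step already switches state 0 to action 1.
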

## Problem 3: Sampling-based Tabular Q-Learning

<p>If the environment is given as a black-box physics simulator, we cannot read off the whole transition model. In this problem we therefore use sampling-based tabular Q-learning, which learns Q-values only from sampled transitions.</p>

<p>First, the agent chooses an action: it either explores the environment by selecting a random action, or exploits its existing experience by taking the greedy action, i.e. the one with the highest current Q-value.</p>

<img src='./imgs/problem3_code.png' width='50%' height = '50%'>
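
This explore-or-exploit choice is commonly implemented as epsilon-greedy. The sketch below assumes a fixed exploration probability `eps` and a NumPy row of Q-values for the current state; the helper name `epsilon_greedy` is hypothetical and the report does not state how its exploration probability is set.

```python
import numpy as np

def epsilon_greedy(q_row, eps, rng):
    """Pick an action for one state from its row of Q-values (hypothetical helper).

    With probability eps: explore with a uniformly random action.
    Otherwise: exploit by taking the action with the highest Q-value.
    """
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

rng = np.random.default_rng(0)
q_row = np.array([0.1, 0.5, 0.2])
print(epsilon_greedy(q_row, eps=0.0, rng=rng))  # eps = 0: always greedy, prints 1
print({epsilon_greedy(q_row, eps=1.0, rng=rng) for _ in range(200)})  # eps = 1: random actions
```
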
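
<p>Then I implemented the Q-value update function, which moves Q(s, a) a step of size alpha (the learning rate) toward the target r + gamma*max over a' of Q(s', a').</p>

<img src='./imgs/problem3_code_2.png' width='50%' height = '50%'>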
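
A minimal sketch of this update, assuming the Q-table is a NumPy array indexed as `Q[s, a]` and updated in place. The signature of `q_learning_update`, including the `done` flag for terminal transitions, is hypothetical.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, gamma, lr):
    """One tabular Q-learning step on a NumPy table Q[s, a] (modified in place).

    Hypothetical signature; done marks a terminal transition, after which
    there is no future value to bootstrap from.
    """
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (target - Q[s, a])  # move Q[s, a] a step of size lr toward the target

Q = np.zeros((2, 2))
Q[1] = [2.0, 4.0]
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, done=False, gamma=0.9, lr=0.5)
print(Q[0, 1])  # 0 + 0.5 * (1 + 0.9 * 4 - 0), approximately 2.3
```
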

<p>Finally, we combine these pieces into the complete agent. In each iteration the agent selects an action, executes it, receives the reward and the next state from the environment, updates the Q-function, moves to the next state, and then starts the next iteration.</p>

<img src='./imgs/problem3_code_3.png' width='50%' height = '50%'>
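
Putting the steps together, here is a self-contained sketch of that loop. The environment is a hypothetical two-state toy simulator (`TwoStateEnv`, not part of the lab) that exposes only `reset()` and `step(action)`, so the agent never reads the transition model. The task is continuing (no terminal states), and `eps`, `lr`, `gamma`, and the step count are illustrative choices, not values taken from the lab.

```python
import numpy as np

class TwoStateEnv:
    """Hypothetical black-box simulator: the agent only calls reset() and step()."""
    NEXT = [[0, 1], [1, 0]]            # NEXT[s][a]: next state
    REWARD = [[0.0, 1.0], [0.0, 0.0]]  # REWARD[s][a]: reward for taking a in s

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        r = self.REWARD[self.s][a]
        self.s = self.NEXT[self.s][a]
        return self.s, r

def run_q_learning(env, n_states, n_actions, n_steps=20000,
                   gamma=0.9, lr=0.1, eps=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    s = env.reset()
    for _ in range(n_steps):
        # 1. Choose an action: explore with probability eps, otherwise act greedily
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        # 2. Act and get the reward and next state from the environment
        s_next, r = env.step(a)
        # 3. Q-learning update toward r + gamma * max_a' Q[s_next, a']
        Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])
        # 4. Move to the next state and start the next iteration
        s = s_next
    return Q

Q = run_q_learning(TwoStateEnv(), n_states=2, n_actions=2)
print(Q.round(2))
print(Q.argmax(axis=1))
```

Worked out by hand, the optimal policy of this toy MDP takes action 1 in both states (cycling 0 → 1 → 0 and collecting reward 1 every other step), with Q*(0, 1) = 1/0.19 ≈ 5.26 and Q*(1, 1) = 0.9/0.19 ≈ 4.74; with enough exploration the learned table should approach these values.
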