Exercise 10.5 #93

Open
ShawnHymel opened this issue Aug 4, 2022 · 0 comments

ShawnHymel commented Aug 4, 2022

I also found the wording of this question confusing. My best guess is that it is asking, "How would the differential TD(0) algorithm differ from tabular TD(0)?" Like you, I also came up with the update formula for the weight vector. Equation (10.10) gives us the TD error, assuming we have the average-reward estimate R_bar. From there, I think the only thing you're missing to create the differential TD(0) algorithm is the update for R_bar, which uses the TD error.
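For reference, here is how I would write those two pieces in LaTeX. The first line is the TD error as I read (10.10). The second is my guess at the R_bar update, where the step size β is an assumed name borrowed from the differential semi-gradient Sarsa box in the same chapter, not something stated in the exercise.

```latex
\begin{align*}
\delta_t &\doteq R_{t+1} - \bar{R}_t + \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t) \\
\bar{R}_{t+1} &\doteq \bar{R}_t + \beta \, \delta_t
\end{align*}
```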

In tabular TD(0), we have a single line that updates V(S). For differential TD(0), I think we need to expand that into the following three lines to update the weight vector.

IMG-0032
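Since the image may not render for everyone, here is a minimal Python sketch of what I mean by the three lines. It is my reading, not the book's solution. It assumes a linear value estimate v_hat(s, w) = w · x(s), so the gradient with respect to w is just the feature vector x(s). The names `differential_td0_step`, `run_cycle_demo`, `alpha`, and `beta` are hypothetical, and the three-state cycle is a toy example made up for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def differential_td0_step(w, r_bar, x, reward, x_next, alpha, beta):
    """One differential semi-gradient TD(0) update (assumed form).

    w       weight vector (list of floats)
    r_bar   current average-reward estimate
    x       feature vector of the current state S
    x_next  feature vector of the next state S'
    alpha   step size for the weights (assumed name)
    beta    step size for the average reward (assumed name)
    """
    # Line 1: differential TD error, as in (10.10)
    delta = reward - r_bar + dot(w, x_next) - dot(w, x)
    # Line 2: move the average-reward estimate along the TD error
    r_bar = r_bar + beta * delta
    # Line 3: semi-gradient weight update; for linear v_hat the gradient is x
    w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
    return w, r_bar, delta


def run_cycle_demo(num_steps=6000, alpha=0.1, beta=0.01):
    """Toy continuing task: deterministic cycle 0 -> 1 -> 2 -> 0.

    Leaving states 0 and 1 gives reward 0, leaving state 2 gives reward 3,
    so the true average reward per step is 1.
    """
    rewards = [0.0, 0.0, 3.0]
    # One-hot features, which makes the linear case equivalent to tabular
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    w = [0.0, 0.0, 0.0]
    r_bar = 0.0
    s = 0
    for _ in range(num_steps):
        s_next = (s + 1) % 3
        w, r_bar, _delta = differential_td0_step(
            w, r_bar, features[s], rewards[s], features[s_next], alpha, beta
        )
        s = s_next
    return w, r_bar


if __name__ == "__main__":
    w, r_bar = run_cycle_demo()
    print("average-reward estimate:", r_bar)
    print("weights:", w)
```

On this toy cycle the average-reward estimate should settle near 1, and the weights should satisfy w[1] - w[0] ≈ 1 and w[2] - w[0] ≈ 2. Only the differences are pinned down, because differential values are defined up to a constant offset.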

Let me know if you think that sounds reasonable.

Also, since you have done a lot of work to produce these solutions, you might want to see if Rich Sutton would honor the offer to provide book solutions if you email him your answers :) He said he would on his site: http://incompleteideas.net/book/solutions.html. Your answers have been invaluable as I work through the textbook, and I'd also be curious to know how close they are to the book solutions.
