Exercise 10.5 #93

ShawnHymel · 2022-08-04T19:50:51Z

I also found the wording of this question confusing. My best guess is that it means "how would the differential TD(0) algorithm be different from tabular TD(0)?" Like you, I also came up with the update formula for the weight vector. (10.10) gives us the TD error, assuming we have the average reward estimate R_bar. From there, I think the only thing you're missing to create the differential TD(0) algorithm is the update for R_bar, which also uses the TD error.

In tabular TD(0), we have a single line that updates V(S). For differential TD(0), I think we need to expand that into 3 lines: compute the TD error from (10.10), update R_bar using that error, and then update the weight vector.
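
To make those three lines concrete, here is a minimal Python sketch of how I'd write a single step. It assumes a linear value estimate v_hat(s, w) = w · x(s), which is my assumption and not something the exercise states. The function name `differential_td0_step`, the feature vectors `x` and `x_next`, and the step sizes `alpha` (for the weights) and `beta` (for R_bar) are all just names I picked for illustration.

```python
def differential_td0_step(w, r_bar, x, reward, x_next, alpha, beta):
    """One differential semi-gradient TD(0) step (sketch, linear v_hat assumed).

    w       -- weight vector (list of floats)
    r_bar   -- current average-reward estimate
    x       -- feature vector of the current state S (hypothetical name)
    x_next  -- feature vector of the next state S'
    alpha   -- step size for the weights (assumed parameter name)
    beta    -- step size for the average-reward estimate (assumed parameter name)
    """
    # Linear value estimates: v_hat(s, w) = w . x(s)
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))

    # Line 1: differential TD error, as in (10.10)
    delta = reward - r_bar + v_next - v

    # Line 2: move the average-reward estimate toward the error
    new_r_bar = r_bar + beta * delta

    # Line 3: semi-gradient weight update; for linear v_hat the gradient is x
    new_w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]

    return new_w, new_r_bar, delta


# Example with one-hot features, which reduces to the tabular case
w, r_bar, delta = differential_td0_step(
    w=[0.0, 0.0], r_bar=0.0, x=[1.0, 0.0], reward=1.0,
    x_next=[0.0, 1.0], alpha=0.5, beta=0.1,
)
```

With one-hot features the weight update only touches the entry for the current state, so this collapses back to a tabular update of V(S), just with R_bar subtracted from the reward and no discounting.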

Let me know if you think that sounds reasonable.

Also, since you have done a lot of work to produce these solutions, you might want to see if Rich Sutton would honor the offer to provide book solutions if you email him your answers :) He says he will on his site: http://incompleteideas.net/book/solutions.html. Your answers have been invaluable as I work through the textbook, and I'd also be curious to know how close you are to the book solutions.
