Exercise4.9 #81

soonjune · 2021-04-12T11:54:00Z

I think there is something wrong with the implementation. When p_h = 0.55, the optimal policy should be to stake 1 in every state. Instead, the computed policy shows a large bet around state 80, and I cannot find any reason for this behavior.

ufownl · 2021-04-17T17:04:17Z

I got the same result as @soonjune.

GeorgGroenendaal · 2022-10-11T15:23:54Z

Does anybody know why this happens? I get similar results with this implementation and with my own. Is it because it does not result in a stable solution?

StupidI · 2022-11-10T03:47:13Z

> Does anybody know why this happens? I get similar results with this implementation and with my own. Is it because it does not result in a stable solution?

Because floating-point precision is limited, comparing values for exact equality is not reliable: some values of q(s,a) are actually the same but differ by tiny rounding errors. So we should limit the precision when we compare the values of q(s,a). Just change the "np.argmax" part.
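
A minimal sketch of that idea, assuming the action values for one state are held in a NumPy array. The names `greedy_action` and `q`, the sample values, and the `decimals=5` threshold are illustrative assumptions, not taken from the repository's code.

```python
import numpy as np

def greedy_action(action_values, decimals=5):
    """Return the index of the first action whose value is maximal after rounding."""
    # Rounding collapses values that differ only by floating-point noise,
    # so np.argmax's first-occurrence tie-breaking selects the smallest such index.
    return int(np.argmax(np.round(action_values, decimals)))

# Hypothetical q(s, a) values: indices 0 and 2 are tied in exact arithmetic,
# but index 2 carries a tiny rounding error.
q = np.array([0.75, 0.40, 0.75 + 1e-12])

plain = int(np.argmax(q))   # picks index 2 because of the 1e-12 noise
rounded = greedy_action(q)  # picks index 0, the first of the tied actions

print(plain, rounded)
```

With plain `np.argmax`, the noisy entry wins and the larger index is chosen; after rounding, the tie is recognized and the first tied index is returned. The number of decimals is a judgment call: too few merges values that really differ, too many lets the noise back in.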
