Reading [2021-01-05 15:52:36-08:00, revision 2f3d6e6], sections 7.5 (value iteration) and 7.6 (asynchronous value iteration).
Maybe I misunderstand, but it seems that the yellow-highlighted sections do exactly the same job, with the rest of the two algorithms being identical.
One uses a list comprehension while the other uses a for loop, but both actually iterate through the whole state space.
In GaussSeidelValueIteration, should not backup be applied only on a subset of the state space in each iteration?
PS: A small additional detail: in the GaussSeidelValueIteration algorithm, many variables are unpacked from P::MDP (S, A, T, R, γ = P.S, P.A, P.T, P.R, P.γ), but only P.S is used. In ValueIteration and PolicyIteration, only what is needed is unpacked, which I think makes them easier to understand.
Thanks for filing this issue! You are right that we are pulling out way more than we need to from the problem structure. I'll fix this in my next commit. We want the algorithms as simple as possible, so thank you for pointing this out.
The algorithms are actually doing slightly different things. Value iteration calls backup on all of the states before updating U. Gauss-Seidel updates U in place as it sweeps through the state space (hence the need for the for loop), so later backups in the same sweep already see the updated values.
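To make the distinction concrete, here is a minimal Python sketch (not the book's Julia code) of one sweep of each variant. The three-state MDP, the names `STATES`, `NEXT_STATE`, `REWARD`, `GAMMA`, and the helper functions are all hypothetical and chosen only for illustration. The `backup` function is assumed to be the usual one-step Bellman backup.

```python
# Hypothetical toy MDP: one action, deterministic transitions 2 -> 1 -> 0 -> 0.
GAMMA = 0.5
STATES = [0, 1, 2]
ACTIONS = ["go"]
NEXT_STATE = {0: 0, 1: 0, 2: 1}
REWARD = {0: 1.0, 1: 0.0, 2: 0.0}


def transition(s, a):
    """Return {next_state: probability} for taking action a in state s."""
    return {NEXT_STATE[s]: 1.0}


def backup(U, s):
    """One-step Bellman backup of state s against the value estimates U."""
    return max(
        REWARD[s] + GAMMA * sum(p * U[sp] for sp, p in transition(s, a).items())
        for a in ACTIONS
    )


def value_iteration_sweep(U):
    """Synchronous sweep: every backup reads the OLD U; U is replaced at the end."""
    return [backup(U, s) for s in STATES]


def gauss_seidel_sweep(U):
    """In-place sweep: each backup immediately overwrites U[s]."""
    U = list(U)  # copy so the caller's list is left untouched
    for s in STATES:
        U[s] = backup(U, s)
    return U


sync = value_iteration_sweep([0.0, 0.0, 0.0])
gs = gauss_seidel_sweep([0.0, 0.0, 0.0])
print(sync)  # [1.0, 0.0, 0.0]
print(gs)    # [1.0, 0.5, 0.25]
```

Both functions visit every state once per sweep. The only difference is when U is overwritten: in this toy example, a single in-place sweep already propagates the reward at state 0 back to states 1 and 2, whereas the synchronous sweep needs further iterations to do so.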