GaussSeidelValueIteration and ValueIteration algorithms are identical? #35

chauvinSimon · 2021-01-19T11:32:30Z

Reading [2021-01-05 15:52:36-08:00, revision 2f3d6e6], sections 7.5 (value iteration) and 7.6 (asynchronous value iteration).

Maybe I misunderstand, but the yellow-highlighted sections seem to do exactly the same job, and the rest of the two algorithms is identical.
One uses a list comprehension while the other uses a for loop, but both iterate through the whole state space.

In GaussSeidelValueIteration, shouldn't backup be applied to only a subset of the state space in each iteration?

PS: A small additional detail: in the GaussSeidelValueIteration algorithm, many variables are pulled out of the P::MDP (S, A, T, R, γ = P.S, P.A, P.T, P.R, P.γ), but only P.S is used. In ValueIteration and PolicyIteration, only what is needed is pulled out, which I think makes them easier to understand.


mykelk · 2021-01-19T15:45:43Z

Thanks for filing this issue! You are right that we are pulling out way more than we need to from the problem structure. I'll fix this in my next commit. We want the algorithms as simple as possible, so thank you for pointing this out.

The algorithms are actually doing slightly different things. Value iteration calls backup on all of the states before updating U. Gauss-Seidel updates U in place as it sweeps through the space (hence the need for the for loop), so states visited later in a sweep already see the new values.
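To make the distinction concrete, here is a minimal Python sketch of one sweep of each variant on a toy two-state problem. It is not the book's Julia code: the function names, the dictionary layout for `T` and `R`, and the toy MDP itself are assumptions made up for illustration. Only the idea (batch update versus in-place update) comes from the explanation above.

```python
def backup(S, A, T, R, gamma, U, s):
    """One Bellman backup at state s using the current estimates in U."""
    return max(
        R[s][a] + gamma * sum(T[s][a][sp] * U[sp] for sp in S)
        for a in A
    )


def value_iteration_sweep(S, A, T, R, gamma, U):
    # Synchronous: every backup reads the old U, and U is replaced
    # only after all states have been backed up.
    return {s: backup(S, A, T, R, gamma, U, s) for s in S}


def gauss_seidel_sweep(S, A, T, R, gamma, U):
    # In place: each result is written to U immediately, so states
    # later in the same sweep already see the updated values.
    U = dict(U)  # copy so the caller's dictionary is left untouched
    for s in S:
        U[s] = backup(S, A, T, R, gamma, U, s)
    return U


# Hypothetical toy MDP: state 0 is absorbing with reward 1,
# state 1 moves to state 0 with reward 0. There is a single action.
S = [0, 1]
A = ["go"]
T = {0: {"go": {0: 1.0, 1: 0.0}}, 1: {"go": {0: 1.0, 1: 0.0}}}
R = {0: {"go": 1.0}, 1: {"go": 0.0}}
gamma = 0.5
U0 = {0: 0.0, 1: 0.0}

U_sync = value_iteration_sweep(S, A, T, R, gamma, U0)
U_gs = gauss_seidel_sweep(S, A, T, R, gamma, U0)
print(U_sync)  # {0: 1.0, 1: 0.0}
print(U_gs)    # {0: 1.0, 1: 0.5}
```

Both sweeps visit every state, which matches the observation in the question. They differ after one sweep because the Gauss-Seidel backup at state 1 already uses the freshly updated value of state 0. The effect depends on the sweep order: visiting state 1 before state 0 in this toy problem would give the same result as the synchronous sweep.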

mykelk closed this as completed Jan 19, 2021
