GaussSeidelValueIteration and ValueIteration algorithms are identical? #35

Closed
chauvinSimon opened this issue Jan 19, 2021 · 1 comment

@chauvinSimon

Reading [2021-01-05 15:52:36-08:00, revision 2f3d6e6], sections 7.5 (value iteration) and 7.6 (asynchronous value iteration).

[Screenshot: GaussSeidelValueIteration]

Maybe I'm misunderstanding, but the yellow-highlighted sections seem to do exactly the same job, and the rest of the two algorithms is identical.
One uses a list comprehension while the other uses a for loop,
but both iterate through the whole state space.

In GaussSeidelValueIteration, shouldn't backup be applied to only a subset of the state space in each iteration?

PS: A small detail: the GaussSeidelValueIteration algorithm unpacks many fields from P::MDP (S, A, T, R, γ = P.S, P.A, P.T, P.R, P.γ), but only P.S is used. ValueIteration and PolicyIteration unpack only what they need, which I think makes them easier to understand.

@mykelk
Contributor

mykelk commented Jan 19, 2021

Thanks for filing this issue! You are right that we are pulling out way more than we need to from the problem structure. I'll fix this in my next commit. We want the algorithms as simple as possible, so thank you for pointing this out.

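The algorithms are actually doing slightly different things. Value iteration calls backup on all of the states before updating U, so every backup in a sweep reads the old values. Gauss-Seidel updates U in place as it sweeps through the state space (hence the need for the for loop), so backups later in the same sweep already see the values updated earlier in that sweep.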
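To make the difference concrete, here is a minimal Python sketch (not the book's Julia code). The three-state chain MDP, the names `STATES`, `ACTIONS`, `GAMMA`, `T`, `R`, and the two `*_sweep` helpers are all hypothetical, chosen only so that one sweep of each variant gives visibly different results.

```python
# Hypothetical toy MDP (not from the book): a 3-state chain with one action.
STATES = [0, 1, 2]
ACTIONS = ["go"]
GAMMA = 0.5


def T(s, a, sp):
    """Deterministic transition: each state moves one step toward state 0."""
    return 1.0 if sp == max(s - 1, 0) else 0.0


def R(s, a):
    """Reward 1 for acting in state 0, 0 elsewhere."""
    return 1.0 if s == 0 else 0.0


def backup(U, s):
    """One Bellman backup at state s using the value estimates in U."""
    return max(
        R(s, a) + GAMMA * sum(T(s, a, sp) * U[sp] for sp in STATES)
        for a in ACTIONS
    )


def value_iteration_sweep(U):
    """Synchronous sweep: every backup reads the old U; U is replaced at the end."""
    return {s: backup(U, s) for s in STATES}


def gauss_seidel_sweep(U):
    """In-place sweep: each backup sees values already updated in this sweep."""
    U = dict(U)  # copy so the caller's dictionary is left untouched
    for s in STATES:
        U[s] = backup(U, s)
    return U


U0 = {s: 0.0 for s in STATES}
print(value_iteration_sweep(U0))  # {0: 1.0, 1: 0.0, 2: 0.0}
print(gauss_seidel_sweep(U0))     # {0: 1.0, 1: 0.5, 2: 0.25}
```

Both variants visit every state once per sweep. In this sketch the reward at state 0 reaches states 1 and 2 within a single Gauss-Seidel sweep, while the synchronous version needs one extra sweep per step of the chain. How much the in-place version helps depends on the sweep order.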

@mykelk mykelk closed this as completed Jan 19, 2021