
Question about the notes #3

Closed
FulChou opened this issue Oct 25, 2020 · 3 comments

Comments

@FulChou

FulChou commented Oct 25, 2020

[Image: screenshot from the notes]

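The wording here seems a bit off, though the intended meaning can roughly be inferred from the preceding text. Some words may be missing.
What I don't understand is this:
A good policy π can make most V(s) large; that part I understand. But could you explain why there exists a single π that maximizes every V(s)? I think I may get it now: if my action at every step is optimal, then I can guarantee that every V(s) is maximal. But this seems to have an obvious problem: I might temporarily give up the currently best action in order to maximize the value of some later state s. In other words, I can't blindly follow a greedy strategy. So the statement that the policy maximizes V(state) for every state confuses me a bit.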
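The question can be made concrete with a small example. The sketch below uses a hypothetical four-state deterministic MDP (the states, rewards, and γ = 0.9 are invented for illustration and are not taken from the notes). It evaluates every deterministic policy and checks whether one policy attains the highest V(s) in every state at once. In this toy case one does, and that policy turns down the immediate +1 in state A to reach the +10 in state B. This matches the standard result for finite discounted MDPs that an optimal policy exists which is optimal from every starting state simultaneously; it is greedy with respect to long-term value, not immediate reward, so sacrificing short-term reward is already built in.

```python
from itertools import product

GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical deterministic toy MDP: MDP[state][action] = (reward, next_state).
# States: 0 = A, 1 = B, 2 = C, 3 = T (absorbing terminal, zero reward).
MDP = {
    0: {0: (1.0, 3), 1: (0.0, 1)},   # A: take +1 now, or detour to B for nothing
    1: {0: (0.0, 3), 1: (10.0, 3)},  # B: a +10 reward is available here
    2: {0: (2.0, 3), 1: (0.0, 0)},   # C: take +2 now, or move to A for nothing
    3: {0: (0.0, 3), 1: (0.0, 3)},   # T: absorbing terminal state
}
STATES = list(MDP)


def evaluate(policy, tol=1e-12):
    """Iterative policy evaluation: V(s) = r(s, pi(s)) + GAMMA * V(s')."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {}
        for s in STATES:
            r, s_next = MDP[s][policy[s]]
            new_v[s] = r + GAMMA * v[s_next]
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v


# Enumerate all 2^4 deterministic policies and evaluate each one.
policies = [dict(zip(STATES, acts)) for acts in product([0, 1], repeat=len(STATES))]
values = [evaluate(p) for p in policies]

# Best achievable value in each state, taken over all policies separately.
best = {s: max(v[s] for v in values) for s in STATES}

# Policies that reach the per-state best in EVERY state at once.
dominant = [p for p, v in zip(policies, values)
            if all(abs(v[s] - best[s]) < 1e-9 for s in STATES)]

# Myopic policy: pick the action with the largest immediate reward.
greedy = {s: max(MDP[s], key=lambda a: MDP[s][a][0]) for s in STATES}

print("best V(s) per state:", best)
print("policies optimal in every state:", dominant)
print("V under immediate-reward greedy:", evaluate(greedy))
```

Comparing `evaluate(greedy)` with `best` shows the myopic choice gives V(A) = 1 and V(C) = 2, while the dominant policy (detour in A, take +10 in B, move to A from C) gets 9 and 8.1, and it is never worse in any state.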

@qiwang067
Contributor

qiwang067 commented Oct 25, 2020

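Thanks for the question. I suggest reading the value iteration part of this chapter; it walks through the process in detail and should clear up your confusion.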
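Value iteration finds such a policy without enumerating every policy. The following is a minimal sketch on a hypothetical four-state deterministic MDP with γ = 0.9 (invented for illustration, not the chapter's own code): it repeatedly applies the Bellman optimality backup V(s) ← max_a [r(s, a) + γ V(s')] until the values stop changing, then reads off a policy that is greedy with respect to the converged values.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical deterministic toy MDP: MDP[state][action] = (reward, next_state).
# States: 0 = A, 1 = B, 2 = C, 3 = T (absorbing terminal, zero reward).
MDP = {
    0: {0: (1.0, 3), 1: (0.0, 1)},   # A: take +1 now, or detour to B for nothing
    1: {0: (0.0, 3), 1: (10.0, 3)},  # B: a +10 reward is available here
    2: {0: (2.0, 3), 1: (0.0, 0)},   # C: take +2 now, or move to A for nothing
    3: {0: (0.0, 3), 1: (0.0, 3)},   # T: absorbing terminal state
}


def value_iteration(tol=1e-12):
    """Apply the Bellman optimality backup until V stops changing,
    then return V and a policy that is greedy with respect to V."""
    v = {s: 0.0 for s in MDP}
    while True:
        # V(s) <- max_a [ r(s, a) + GAMMA * V(s') ] for every state at once.
        new_v = {s: max(r + GAMMA * v[s_next] for r, s_next in MDP[s].values())
                 for s in MDP}
        done = max(abs(new_v[s] - v[s]) for s in MDP) < tol
        v = new_v
        if done:
            break
    # Greedy w.r.t. long-term value (reward + discounted next-state value),
    # not w.r.t. the immediate reward alone.
    policy = {s: max(MDP[s], key=lambda a: MDP[s][a][0] + GAMMA * v[MDP[s][a][1]])
              for s in MDP}
    return v, policy


v_star, pi_star = value_iteration()
print("V*:", v_star)
print("greedy policy w.r.t. V*:", pi_star)
```

Tracing the sweeps: after the first, V(A) = 1 because only the immediate +1 is visible; after the second, the +10 backed up from B raises V(A) to 0.9 × 10 = 9, so the detour wins. Because future payoff is already folded into V, acting greedily on V does not sacrifice any later state's value.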

@FulChou
Author

FulChou commented Oct 25, 2020

I see now, thanks a lot.

@qiwang067
Contributor

You're welcome.
