Update 强化学习.md #13

Open
wants to merge 1 commit into
base: master
3 changes: 2 additions & 1 deletion docs/强化学习.md
@@ -62,7 +62,8 @@

### What are the two key tricks in DQN?

- [ ] TODO
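- Replay buffer: experience replay. During training, minibatches are sampled at random from the buffer to update the network parameters. This breaks the correlation between consecutive samples and improves sample efficiency, since a single transition can take part in multiple parameter updates.
- Fixed Q-targets: when updating the Q-network, $q_{target}$ is computed with the parameters from before the latest update, $\theta_{i-1}$, while the current Q-value comes from the Q-network with parameters $\theta_{i}$. Holding the target parameters fixed decouples the target from the estimate being updated, which is another way of reducing correlation.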
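
The two tricks above can be sketched together in a few lines. This is a minimal illustration rather than a reference implementation: a tabular Q-table stands in for the neural network, and the names `ReplayBuffer`, `td_update`, `sync_every`, and the toy chain environment are all assumptions made for the example.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation of consecutive steps;
        # the same transition may be drawn again in later batches.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_update(q, q_target, batch, gamma=0.99, lr=0.1):
    """Move the online table q toward targets computed from the frozen q_target."""
    for s, a, r, s_next, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target[s_next])
        target = r + bootstrap  # q_target is read here but never written
        q[s][a] += lr * (target - q[s][a])


# Toy setup (hypothetical): 4 states in a ring, 2 actions, reward on wrap-around.
n_states, n_actions = 4, 2
q = [[0.0] * n_actions for _ in range(n_states)]
q_target = copy.deepcopy(q)  # frozen copy, playing the role of theta_{i-1}
buffer = ReplayBuffer(capacity=100)
sync_every = 10  # assumed sync period for the frozen copy

random.seed(0)
for step in range(1, 201):
    s = random.randrange(n_states)
    a = random.randrange(n_actions)
    s_next = (s + 1) % n_states
    done = s_next == 0
    r = 1.0 if done else 0.0
    buffer.push(s, a, r, s_next, done)

    if len(buffer) >= 8:
        td_update(q, q_target, buffer.sample(8))
    if step % sync_every == 0:
        q_target = copy.deepcopy(q)  # refresh the frozen parameters
```

With a real network, `q` and `q_target` would be two copies of the same model, and only `q` would receive gradient updates between syncs.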

### What variants of DQN exist? In which directions can DQN be improved?
