Edit: turned it into a general thread instead
- The AGZ spreadsheet mentions only one filter for the value head. In this implementation, two filters are used. Is there a reason for that? I don't think it will have a big impact, but I'm just putting it out there. (A sketch of the paper's value head follows after this list.)
- The target policies that are created during simulated games are taken from the prior probabilities p, which are calculated by the neural net. From the AGZ cheatsheet I believe that the target policies should instead be the search probabilities, which are derived from the visit counts of the moves and the temperature parameter (see the sketch below).
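For reference, the AGZ paper defines the search probabilities as pi(a) proportional to N(s, a)^(1/tau), where N is the visit count and tau the temperature. A minimal sketch of how the target policy could be computed from the visit counts (the function name and the NumPy dependency are my own, not taken from this repo):

```python
import numpy as np

def search_probabilities(visit_counts, temperature=1.0):
    """Turn MCTS visit counts into a target policy pi ~ N^(1/tau).

    visit_counts: 1-D array of N(s, a) for every legal move.
    temperature:  tau from the AGZ paper; tau -> 0 approaches argmax.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature <= 0:
        # Greedy limit: all probability mass on the most-visited move.
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0
        return probs
    scaled = counts ** (1.0 / temperature)
    return scaled / scaled.sum()

# Example: 64 simulations spread over 4 candidate moves.
print(search_probabilities([40, 16, 6, 2], temperature=1.0))
```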
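On the filter question, the value head described in the AGZ paper uses a single 1x1 convolutional filter, followed by batch norm, ReLU, a 256-unit hidden layer, and a tanh scalar output. A rough PyTorch-style sketch of that head (the default `num_channels` and `board_size` are placeholders, not values from this repo):

```python
import torch
import torch.nn as nn

class ValueHead(nn.Module):
    """Value head as described in the AGZ paper: one 1x1 conv filter,
    batch norm, ReLU, a 256-unit hidden layer, and a tanh scalar output."""

    def __init__(self, num_channels=256, board_size=19):
        super().__init__()
        self.conv = nn.Conv2d(num_channels, 1, kernel_size=1)  # single filter
        self.bn = nn.BatchNorm2d(1)
        self.fc1 = nn.Linear(board_size * board_size, 256)
        self.fc2 = nn.Linear(256, 1)

    def forward(self, x):
        out = torch.relu(self.bn(self.conv(x)))
        out = out.flatten(start_dim=1)
        out = torch.relu(self.fc1(out))
        return torch.tanh(self.fc2(out))
```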
Some notes:
- During MCTS search there are lots of zero Q-values, and patches of Q-values close to 1 often appear. (This might just be due to a bad network; see also the note after this list.)
- The MCTS batched search yields more Q-values, but the search depth is considerably lower: chosen moves are at most depth 4 from the current position, and usually 2 or 3. Running 64 simulations with batch size 1 can give chosen moves up to depth 66 from the current position, but it is of course slower. Unsure what a good balance is; it is hard to tune. (A generic sketch of the batched loop follows below.)
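On the zero Q-values: one possible (unconfirmed) contributor is that, following the AGZ paper, each edge starts with N = W = Q = 0 and Q is only updated to W / N after a visit, so every unvisited edge reports Q = 0 by construction. A tiny sketch of that bookkeeping (class and attribute names are illustrative, not this repo's):

```python
class EdgeStats:
    """Per-edge statistics as in the AGZ paper: N visits, total value W,
    and mean value Q = W / N. Unvisited edges keep Q = 0 by construction."""

    def __init__(self, prior):
        self.prior = prior  # P(s, a) from the policy head
        self.n = 0          # visit count N(s, a)
        self.w = 0.0        # total value W(s, a)
        self.q = 0.0        # mean value Q(s, a)

    def backup(self, value):
        self.n += 1
        self.w += value
        self.q = self.w / self.n
```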
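On the batch-size trade-off, here is a generic sketch of a batched simulation loop (not this repo's implementation): all leaves in one batch are selected from the same tree statistics, so they tend to be shallow siblings of each other, whereas batch size 1 updates the statistics after every simulation and lets later simulations descend deeper. The callback names are placeholders:

```python
def batched_simulations(select_leaf, evaluate_batch, backup,
                        num_simulations=64, batch_size=8):
    """Generic batched-MCTS loop (a sketch, not this repo's code).

    select_leaf():           walk the tree with current statistics, return a leaf
    evaluate_batch(leaves):  one network call for the whole batch -> values
    backup(leaf, value):     propagate the value back up the tree
    """
    done = 0
    while done < num_simulations:
        size = min(batch_size, num_simulations - done)
        # All leaves below are chosen from the *same* (stale) statistics.
        leaves = [select_leaf() for _ in range(size)]
        values = evaluate_batch(leaves)        # single forward pass
        for leaf, value in zip(leaves, values):
            backup(leaf, value)                # statistics only change here
        done += size
```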