MCTS moveToLeaf() loops for ever #14

Closed
fmicheloni opened this issue Apr 7, 2018 · 0 comments
fmicheloni commented Apr 7, 2018

Hello everybody,

I'm trying to reuse this codebase for another game in which pieces can move freely on the board. At some point during the first game, the method moveToLeaf() in the class MCTS starts looping forever.

It seems that the loop condition while not currentNode.isLeaf(): never becomes false, i.e. the search never reaches a node for which isLeaf() returns True.

Do you have any hints on how to find out why this happens?

Thanks in advance,

Fabrizio

P.S.: see my fork for full code -> https://github.com/fmicheloni/DeepReinforcementLearning


EDIT:

Here are a few logs of what's happening:

2018-04-07 13:57:44,052 INFO PLAYER TURN...-1
2018-04-07 13:57:44,052 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action with highest Q + U...192

2018-04-07 13:57:44,053 INFO PLAYER TURN...-1
2018-04-07 13:57:44,053 INFO action: 190 (1)... N = 0, P = 0.209210, nu = 0.000000, adjP = 0.209210, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,053 INFO action: 102 (4)... N = 0, P = 0.196694, nu = 0.000000, adjP = 0.196694, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 118 (6)... N = 0, P = 0.182340, nu = 0.000000, adjP = 0.182340, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 122 (3)... N = 0, P = 0.205482, nu = 0.000000, adjP = 0.205482, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 123 (4)... N = 0, P = 0.206274, nu = 0.000000, adjP = 0.206274, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action with highest Q + U...190

2018-04-07 13:57:44,054 INFO PLAYER TURN...-1
2018-04-07 13:57:44,054 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,054 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action with highest Q + U...192

2018-04-07 13:57:44,055 INFO PLAYER TURN...-1
2018-04-07 13:57:44,055 INFO action: 190 (1)... N = 0, P = 0.209210, nu = 0.000000, adjP = 0.209210, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 102 (4)... N = 0, P = 0.196694, nu = 0.000000, adjP = 0.196694, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 118 (6)... N = 0, P = 0.182340, nu = 0.000000, adjP = 0.182340, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,055 INFO action: 122 (3)... N = 0, P = 0.205482, nu = 0.000000, adjP = 0.205482, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 123 (4)... N = 0, P = 0.206274, nu = 0.000000, adjP = 0.206274, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action with highest Q + U...190

2018-04-07 13:57:44,056 INFO PLAYER TURN...-1
2018-04-07 13:57:44,056 INFO action: 192 (3)... N = 0, P = 0.235297, nu = 0.000000, adjP = 0.235297, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 211 (1)... N = 0, P = 0.257799, nu = 0.000000, adjP = 0.257799, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 122 (3)... N = 0, P = 0.252349, nu = 0.000000, adjP = 0.252349, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action: 123 (4)... N = 0, P = 0.254554, nu = 0.000000, adjP = 0.254554, W = 0.000000, Q = 0.000000, U = 0.000000, Q+U = 0.000000
2018-04-07 13:57:44,056 INFO action with highest Q + U...192

The search keeps alternating between those two actions (192 and 190) and never reaches a leaf.
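
A possible explanation, offered as an assumption and not a confirmed diagnosis: if nodes are looked up by state id and reused whenever a position recurs, then in a game where pieces can move back and forth the "tree" can contain a cycle, and a descent that always follows the best edge can bounce between two already-expanded nodes. The sketch below reproduces that situation with a hypothetical, simplified Node class (not the one from the codebase) and shows one way to detect the repeat while descending. The action numbers 192 and 190 are taken from the logs above; always picking the first edge stands in for the tie among equal Q+U values.

```python
class Node:
    """Hypothetical stand-in for an MCTS node, identified by a state id."""

    def __init__(self, state_id):
        self.id = state_id
        self.edges = []  # list of (action, child Node) pairs

    def isLeaf(self):
        # A node with no outgoing edges has not been expanded yet.
        return len(self.edges) == 0


def move_to_leaf_with_cycle_check(root):
    """Descend from root, always taking the first edge, and stop as soon
    as a state id shows up a second time on the current path.

    Returns (last node reached, list of actions taken, cycle_found).
    """
    current = root
    path = []
    seen = {root.id}
    while not current.isLeaf():
        action, child = current.edges[0]
        path.append(action)
        if child.id in seen:
            # The chosen edge leads back to a state already on this path.
            return current, path, True
        seen.add(child.id)
        current = child
    return current, path, False


# Two expanded nodes whose edges point at each other, as the logs suggest.
a = Node("A")
b = Node("B")
a.edges.append((192, b))
b.edges.append((190, a))

node, path, cycle_found = move_to_leaf_with_cycle_check(a)
print(path, cycle_found)  # prints: [192, 190] True
```

If the check fires in the real code, the next question is how the game identifies states: including a move counter in the state id, or treating a repeated position as terminal, are two common ways to keep the search acyclic, though which one fits depends on the rules of the game.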
