Tweaked UCB calculation for uniform exploration of actions in vanilla MCTS #99

BoZenKhaa · 2022-09-20T13:36:28Z

With the current UCB implementation in vanilla MCTS, the first two iterations explore the same action twice:
[from the example notebook with n_iter=2]

This is because the current implementation sets the UCB value of an action to its default q value when sn == 1 && n(sanode) == 0 (the parent node has been visited once and the action has not been tried yet). As a result, if the action taken in the first iteration returned a positive reward, it is picked again in the second iteration, because its value is higher than that of the still-unexplored actions.
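To make the effect concrete, here is a minimal sketch of the scoring rule as described above. It is reconstructed from the description, not copied from src/vanilla.jl; `ToyActionNode`, `described_old_score`, and the default q of 0.0 are assumptions made for illustration only.

```julia
# Toy stand-in for an action node; NOT the MCTS.jl type.
struct ToyActionNode
    n::Int       # visit count of the action
    q::Float64   # current value estimate
end

# Scoring rule reconstructed from the description above (an assumption,
# not the library's actual code).
function described_old_score(node::ToyActionNode, sn::Int, c::Float64)
    if sn == 1 && node.n == 0
        return node.q                      # falls back to the default q
    end
    return node.q + c * sqrt(log(sn) / node.n)
end

# State after one iteration: action 1 was tried and returned reward 1.0,
# action 2 is untried and still holds an assumed default q of 0.0.
nodes = [ToyActionNode(1, 1.0), ToyActionNode(0, 0.0)]
sn = 1
scores = [described_old_score(nd, sn, 1.0) for nd in nodes]
picked = argmax(scores)
println(scores, " -> action ", picked)
```

Under these assumptions the already-tried action scores higher than the untried one, so it is selected again. Note also that log(1)/0 evaluates to NaN in floating point, so some special handling of this case is needed either way.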

I think the usual UCB implementation first explores all available actions in some order before focusing on a specific action.

With this patch, each action appears to be tried once before any action is repeated:
[same as previous, but with the patch applied]

codecov · 2022-09-20T13:44:45Z

Codecov Report

Base: 86.12% // Head: 86.62% // Increases project coverage by +0.50% 🎉

Coverage data is based on head (95ef9e1) compared to base (5984055).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #99      +/-   ##
==========================================
+ Coverage   86.12%   86.62%   +0.50%     
==========================================
  Files          10       10              
  Lines         490      486       -4     
==========================================
- Hits          422      421       -1     
+ Misses         68       65       -3

Impacted Files	Coverage Δ
src/vanilla.jl	`90.27% <100.00%> (+1.76%)`	⬆️

BoZenKhaa · 2022-09-20T14:08:39Z

I think the conditions could be simplified further, for example:

        if  n(sanode) == 0
            return sanode # if unexplored action is found, return it
        else
            UCB = q(sanode) + c*sqrt(log(sn)/n(sanode))
        end

and the c == 0 check could be moved out of the UCB loop.

So all together:

function best_sanode_UCB(snode::StateNode, c::Float64)
    # With no exploration bonus, pick the action with the highest q value.
    if c == 0
        return argmax(q, children(snode))
    end

    best_UCB = -Inf
    best = first(children(snode))
    sn = total_n(snode)
    for sanode in children(snode)
        # If an unexplored action is found, return it right away.
        if n(sanode) == 0
            return sanode
        end

        UCB = q(sanode) + c*sqrt(log(sn)/n(sanode))
        if UCB > best_UCB
            best_UCB = UCB
            best = sanode
        end
    end
    return best
end
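
As a rough check of the intended behavior, the sketch below replays the same selection rule on plain vectors instead of MCTS.jl nodes. `select_action`, the reward values, and the running-mean update are hypothetical and made up for illustration; they are not part of the package.

```julia
# Hypothetical helper working on plain vectors instead of MCTS.jl nodes.
function select_action(qs::Vector{Float64}, ns::Vector{Int}, c::Float64)
    c == 0 && return argmax(qs)            # pure exploitation
    sn = sum(ns)                           # total visits of the parent
    best, best_ucb = 1, -Inf
    for a in eachindex(qs)
        ns[a] == 0 && return a             # untried action: take it first
        ucb = qs[a] + c * sqrt(log(sn) / ns[a])
        if ucb > best_ucb
            best, best_ucb = a, ucb
        end
    end
    return best
end

# Deterministic toy rewards, one per action (made up for illustration).
rewards = [1.0, 0.5, 0.2]
qs = zeros(3)
ns = zeros(Int, 3)
order = Int[]
for _ in 1:4
    a = select_action(qs, ns, 1.0)
    push!(order, a)
    ns[a] += 1
    qs[a] += (rewards[a] - qs[a]) / ns[a]  # running mean of rewards
end
println(order)
```

With these toy numbers the first three selections cover all three actions once, and only the fourth selection falls back to comparing UCB values.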

zsunberg

This looks good to me. Thanks!

BoZenKhaa and others added 2 commits September 20, 2022 15:11

Tweaked UCB calculation for uniform exploratin of actions

59fc79d

Merge branch 'JuliaPOMDP:master' into master

15bc94f

BoZenKhaa added 3 commits September 20, 2022 22:06

modified UCB function

e60decd

Merge branch 'master' of https://github.com/JuliaPOMDP/MCTS.jl

bd9d9b2

Merge branch 'master' of github.com:BoZenKhaa/MCTS.jl

55ebc9f

BoZenKhaa requested a review from zsunberg September 20, 2022 20:09

formatting

95ef9e1

zsunberg approved these changes Sep 25, 2022

zsunberg merged commit dd3531b into JuliaPOMDP:master Sep 25, 2022
