Asking DQN-MCTS baseline code #2

Hello authors, I am very interested in your work. I am working on a DRL-related project, and I am planning to add DQN with MCTS to it, as you did. Would you please share the code or some implementation details about the MCTS baseline in your paper? Thank you in advance!
Hi @WenyuHan-LiNa, thanks for your interest in our work! Unfortunately I am not able to share the code for our MCTS implementation, but if you just want to do a standard DQN plus MCTS, that should be fairly straightforward to set up if you have (1) a standard DQN implementation and (2) a standard MCTS implementation, both of which you should be able to find multiple examples of elsewhere online. The main thing you will need to do is to modify the MCTS code to call your neural network at each node to estimate the Q-values, and then to use the action returned by the search rather than the one corresponding to the maximal Q-value. We tried to provide a lot of details in the appendices of both https://arxiv.org/pdf/1904.03177.pdf (see Appendix E) and https://arxiv.org/pdf/1912.02807.pdf (see Appendix A and in particular Algorithm A.1). If you have any specific questions I am happy to try to clarify!
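To make that recipe concrete, here is a rough, minimal sketch (not the code from the paper; `q_network` and `env_model` are placeholder callables you would supply) of a search step that queries the Q-network at each expanded node and acts on the result of the search rather than on the argmax of the raw network Q-values:

```python
import numpy as np

def q_guided_search(root_state, q_network, env_model, num_actions,
                    num_simulations=50, c_uct=1.0, gamma=0.99):
    """Toy one-step search over the root's actions (a full implementation
    would recurse down a tree in the same way).

    q_network(state) -> np.ndarray of Q-values, one per action.
    env_model(state, action) -> (next_state, reward): a simulator the
    search can query.
    """
    visits = np.zeros(num_actions)
    value_sum = np.zeros(num_actions)

    for _ in range(num_simulations):
        # UCB1-style selection over the root actions; unvisited actions first.
        total = visits.sum() + 1.0
        exploit = value_sum / np.maximum(visits, 1)
        explore = c_uct * np.sqrt(np.log(total) / np.maximum(visits, 1))
        ucb = np.where(visits > 0, exploit + explore, np.inf)
        action = int(np.argmax(ucb))

        # Expand with the model, then bootstrap the leaf value from the
        # Q-network instead of running a rollout.
        next_state, reward = env_model(root_state, action)
        leaf_value = reward + gamma * float(np.max(q_network(next_state)))

        visits[action] += 1
        value_sum[action] += leaf_value

    search_q = value_sum / np.maximum(visits, 1)
    # Act on the outcome of the search (here, the most-visited action),
    # not on the argmax of the raw network Q-values.
    return int(np.argmax(visits)), search_q
```

In the acting loop you would call `q_guided_search` in place of taking `argmax_a Q(s, a)`, while training the Q-network itself as in standard DQN (the amortization-loss variant is discussed later in this thread).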
Hi Jessica,
Thank you for your reply. I will try to implement it myself. The information
you provided is very useful to me.
Best regards,
Wenyu Han
Hi Jessica,
Did you perform rollouts in MCTS, or just assign a value to each node based on
the Q-network? In standard MCTS, a node's value is assigned after finishing the
rollout at each iteration, which is the part that confuses me.
Best,
Wenyu Han
Hi Wenyu, we indeed just used the value from the Q-network and did not
perform rollouts during MCTS. This is similar to how it's done by other
neurally guided forms of MCTS like AlphaZero.
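In code, the difference between the two options reads roughly as follows (a minimal sketch with placeholder `env_model` / `q_network` callables, not the implementation used in the paper): standard MCTS estimates a leaf's value with a simulated rollout, whereas the rollout-free, AlphaZero-style variant takes the value directly from the network.

```python
import numpy as np

def evaluate_leaf_with_rollout(state, env_model, num_actions,
                               gamma=0.99, max_depth=20, rng=None):
    """Standard MCTS: value a leaf by simulating a random rollout."""
    rng = rng or np.random.default_rng()
    value, discount = 0.0, 1.0
    for _ in range(max_depth):
        action = int(rng.integers(num_actions))
        state, reward = env_model(state, action)
        value += discount * reward
        discount *= gamma
    return value

def evaluate_leaf_with_q_network(state, q_network):
    """Rollout-free variant: the leaf value is just the network's estimate,
    here taken as max_a Q(state, a)."""
    return float(np.max(q_network(state)))
```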
Hi Jessica,
Thank you for sharing this information. For training the DQN, do we still
follow the standard procedure (i.e., sample from the replay buffer and update
the Q-network)?
Best,
Wenyu Han
Yes, that's how we did it in the original construction paper ("Structured
Agents for Physical Construction"). However, we later found that using SAVE
(https://arxiv.org/abs/1912.02807) worked much better than pure Q-learning.
SAVE works by adding an "amortization loss" to the standard Q-learning loss.
It's probably easier to start with Q-learning, but I'd encourage you to try the
amortization loss too (it's a pretty straightforward change---you just need to
store the Q-values computed by MCTS in the replay buffer).
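To make the training-side change concrete, here is a PyTorch-style sketch of a combined objective: the usual DQN TD loss plus an amortization term that pulls the network's Q-values toward the Q-values computed by MCTS, which are stored with each transition in the replay buffer. The field names (`q_mcts`, etc.) are hypothetical, and the cross-entropy form of the amortization term is just one plausible choice; see Appendix A of the SAVE paper for the exact loss used there.

```python
import torch
import torch.nn.functional as F

def save_style_loss(q_net, target_net, batch, gamma=0.99, amortization_weight=1.0):
    """batch: tensors sampled from the replay buffer; `q_mcts` holds the
    per-action Q-values that the search computed when the action was taken."""
    s, a, r = batch["s"], batch["a"], batch["r"]
    s_next, done = batch["s_next"], batch["done"]  # done: float 0/1
    q_mcts = batch["q_mcts"]                       # shape [B, num_actions]

    # Standard Q-learning (TD) loss, as in ordinary DQN training.
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_pred, td_target)

    # Amortization loss: push the network's Q-values toward the search's
    # Q-values (here via cross-entropy between the induced softmax policies).
    log_pi_net = F.log_softmax(q_net(s), dim=1)
    pi_search = F.softmax(q_mcts, dim=1)
    amortization_loss = -(pi_search * log_pi_net).sum(dim=1).mean()

    return td_loss + amortization_weight * amortization_loss
```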
Hi Jessica,
Thank you for this advice. I will start with the standard DQN and try the
amortization loss later. Very useful suggestion.
Thanks a lot,
Wenyu Han
Hi Jessica,
Hope this message finds you well! I have a few questions:
Is it possible to use the same MCTS-with-DQN approach from the construction
paper ("Structured Agents for Physical Construction") to solve POMDP problems?
I am working on a POMDP and want an MCTS-based method as my baseline. However,
when I implemented MCTS for my problem, I found that MCTS requires a transition
model that maps one state to the next, whereas in a POMDP the agent can only
access observations. This seems to mean I cannot use MCTS for a POMDP, and
therefore cannot use MCTS with DQN either.
Do I understand correctly? Do you have any suggestions for how to use MCTS with
Q-learning to solve a POMDP?
Best,
Wenyu Han