Asking DQN-MCTS baseline code #2

Hello authors, I am very interested in your work. I am working on a DRL-related project, and I am planning to add DQN with MCTS to it, as you did. Would you please share the code or some implementation details about the MCTS baseline in your paper? Thank you in advance!
Hi @WenyuHan-LiNa, thanks for your interest in our work! Unfortunately I am not able to share the code for our MCTS implementation, but if you just want to do a standard DQN plus MCTS, that should be fairly straightforward to set up if you have (1) a standard DQN implementation and (2) a standard MCTS implementation, both of which you should be able to find multiple examples of elsewhere online. The main thing you will need to do is to modify the MCTS code to call your neural network at each node to estimate the Q-values, and then to use the action returned by the search rather than the one corresponding to the maximal Q-value. We tried to provide a lot of details in the appendices of both https://arxiv.org/pdf/1904.03177.pdf (see Appendix E) and https://arxiv.org/pdf/1912.02807.pdf (see Appendix A and in particular Algorithm A.1). If you have any specific questions I am happy to try to clarify!
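To make that recipe concrete, here is a rough, minimal sketch (not the code from the paper; `q_network` and `env_model` are placeholder callables you would supply) of a search step that queries the Q-network at each expanded node and acts on the result of the search rather than on the argmax of the raw network Q-values:

```python
import numpy as np

def q_guided_search(root_state, q_network, env_model, num_actions,
                    num_simulations=50, c_uct=1.0, gamma=0.99):
    """Toy one-step search over the root's actions (a full implementation
    would recurse down a tree in the same way).

    q_network(state) -> np.ndarray of Q-values, one per action.
    env_model(state, action) -> (next_state, reward): a simulator the
    search can query.
    """
    visits = np.zeros(num_actions)
    value_sum = np.zeros(num_actions)

    for _ in range(num_simulations):
        # UCB1-style selection over the root actions; unvisited actions first.
        total = visits.sum() + 1.0
        exploit = value_sum / np.maximum(visits, 1)
        explore = c_uct * np.sqrt(np.log(total) / np.maximum(visits, 1))
        ucb = np.where(visits > 0, exploit + explore, np.inf)
        action = int(np.argmax(ucb))

        # Expand with the model, then bootstrap the leaf value from the
        # Q-network instead of running a rollout.
        next_state, reward = env_model(root_state, action)
        leaf_value = reward + gamma * float(np.max(q_network(next_state)))

        visits[action] += 1
        value_sum[action] += leaf_value

    search_q = value_sum / np.maximum(visits, 1)
    # Act on the outcome of the search (here, the most-visited action),
    # not on the argmax of the raw network Q-values.
    return int(np.argmax(visits)), search_q
```

In the acting loop you would call `q_guided_search` in place of taking `argmax_a Q(s, a)`, while training the Q-network itself as in standard DQN (the amortization-loss variant is discussed later in this thread).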
Hi Jessica,
Thank you for your reply. I will try to implement it myself. The information
you provided is very useful to me.
Best regards,
Wenyu Han
Hi Jessica,
Did you perform rollouts in MCTS, or just assign a value to each node based on
the Q-network? In standard MCTS, a node's value is assigned after finishing the
rollout at each iteration, which is the part that confuses me.
Best,
Wenyu Han
Hi Wenyu, we indeed just used the value from the Q-network and did not
perform rollouts during MCTS. This is similar to how it's done by other
neurally guided forms of MCTS like AlphaZero.
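In code, the difference between the two options reads roughly as follows (a minimal sketch with placeholder `env_model` / `q_network` callables, not the implementation used in the paper): standard MCTS estimates a leaf's value with a simulated rollout, whereas the rollout-free, AlphaZero-style variant takes the value directly from the network.

```python
import numpy as np

def evaluate_leaf_with_rollout(state, env_model, num_actions,
                               gamma=0.99, max_depth=20, rng=None):
    """Standard MCTS: value a leaf by simulating a random rollout."""
    rng = rng or np.random.default_rng()
    value, discount = 0.0, 1.0
    for _ in range(max_depth):
        action = int(rng.integers(num_actions))
        state, reward = env_model(state, action)
        value += discount * reward
        discount *= gamma
    return value

def evaluate_leaf_with_q_network(state, q_network):
    """Rollout-free variant: the leaf value is just the network's estimate,
    here taken as max_a Q(state, a)."""
    return float(np.max(q_network(state)))
```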
Hi Jessica,
Thank you for sharing this information. For training the DQN, do we still
follow the standard procedure (i.e., sample from the replay buffer and update
the Q-network)?
Best,
Wenyu Han
Yes, that's how we did it in the original construction paper ("Structured
Agents for Physical Construction"). However, we later found that using SAVE
(https://arxiv.org/abs/1912.02807) worked much better than pure Q-learning.
SAVE works by adding an "amortization loss" to the standard Q-learning loss.
It's probably easier to start with Q-learning, but I'd encourage you to try the
amortization loss too (it's a pretty straightforward change---you just need to
store the Q-values computed by MCTS in the replay buffer).
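To make the training-side change concrete, here is a PyTorch-style sketch of a combined objective: the usual DQN TD loss plus an amortization term that pulls the network's Q-values toward the Q-values computed by MCTS, which are stored with each transition in the replay buffer. The field names (`q_mcts`, etc.) are hypothetical, and the cross-entropy form of the amortization term is just one plausible choice; see Appendix A of the SAVE paper for the exact loss used there.

```python
import torch
import torch.nn.functional as F

def save_style_loss(q_net, target_net, batch, gamma=0.99, amortization_weight=1.0):
    """batch: tensors sampled from the replay buffer; `q_mcts` holds the
    per-action Q-values that the search computed when the action was taken."""
    s, a, r = batch["s"], batch["a"], batch["r"]
    s_next, done = batch["s_next"], batch["done"]  # done: float 0/1
    q_mcts = batch["q_mcts"]                       # shape [B, num_actions]

    # Standard Q-learning (TD) loss, as in ordinary DQN training.
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_pred, td_target)

    # Amortization loss: push the network's Q-values toward the search's
    # Q-values (here via cross-entropy between the induced softmax policies).
    log_pi_net = F.log_softmax(q_net(s), dim=1)
    pi_search = F.softmax(q_mcts, dim=1)
    amortization_loss = -(pi_search * log_pi_net).sum(dim=1).mean()

    return td_loss + amortization_weight * amortization_loss
```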
Hi Jessica,
Thank you for this advice. I will start with the standard DQN and try the
amortization loss later. Very useful suggestion.
Thanks a lot,
Wenyu Han
Hi Jessica,
Hope this message finds you well! I have a few questions:
Is it possible to use the same MCTS-with-DQN approach from the construction
paper ("Structured Agents for Physical Construction") to solve POMDP problems?
I am working on a POMDP and want an MCTS-based method as my baseline. However,
when I implemented MCTS for my problem, I found that MCTS requires a transition
model that maps one state to the next, whereas in a POMDP the agent can only
access observations. This seems to mean I cannot use MCTS for a POMDP, and
therefore cannot use MCTS with DQN either.
Do I understand correctly? Do you have any suggestions for how to use MCTS with
Q-learning to solve a POMDP?
Best,
Wenyu Han