Hello, I have a question about BEAR_IS in your algos.py file.
As is well known, DDPG is effectively one-step Q-learning for continuous control, and BEAR shares the same architecture. Given that, importance sampling seems unnecessary in BEAR: in a one-step backup, the mismatch between the current policy and the behavior policy does not bias the Q-value estimate.
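To make the point concrete, here is a minimal sketch of the two targets for a single logged transition. All names and numbers are illustrative assumptions, not taken from algos.py: `pi_a` and `mu_a` stand for the current-policy and behavior-policy densities at the logged action, and the IS-weighted variant simply multiplies the one-step target by their ratio.

```python
# Toy one-step TD target vs. an importance-sampled variant (illustrative only).
r = 1.0        # reward for the logged transition (s, a, r, s')
gamma = 0.99   # discount factor
q_next = 2.0   # critic estimate Q(s', a') with a' drawn from the current policy

# Plain DDPG/BEAR-style one-step target: the bootstrap action a' already
# comes from the current policy, so no off-policy correction is needed.
td_target = r + gamma * q_next

# IS-weighted variant: reweight by pi(a|s) / mu(a|s) at the *logged* action a.
# For a one-step backup the uncorrected expectation is already unbiased,
# so the ratio mainly adds variance. Densities below are assumed values.
pi_a = 0.3     # current-policy density at the logged action (assumption)
mu_a = 0.5     # behavior-policy density at the logged action (assumption)
td_target_is = (pi_a / mu_a) * (r + gamma * q_next)

print(td_target, td_target_is)
```

This is only a sketch of the question being asked, not the actual update rule in algos.py.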
Can you explain why you wrote an importance-sampling version of BEAR in your project?