New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use faster queue for replay buffers #131
Conversation
😰 import numpy as np
import timeit
from chainerrl import replay_buffer
def rand_state():
return np.random.rand(1, 3, 50, 20).astype(np.float32)
def rand_action():
return np.random.rand(1, 40).astype(np.float32)
def rand_reward():
return np.random.rand()
def f(capacity, batch_size, steps, replay_start_size):
rbuf = replay_buffer.ReplayBuffer(capacity=capacity)
s = rand_state()
a = rand_action()
for i in range(steps):
next_s = rand_state()
next_a = rand_action()
rbuf.append(s, a, rand_reward(), next_s, next_a)
s = next_s
a = next_a
if i >= replay_start_size:
rbuf.sample(batch_size)
print(min(timeit.Timer(
'f(10000, 64, 100000, 1000)',
setup="from __main__ import f; gc.enable()").
repeat(repeat=3, number=1))) |
before: 337.00381314000697 |
Good job! But this solution seems unecessarily complex to me. How about using a ring buffer like this? https://github.com/matthiasplappert/keras-rl/blob/master/rl/memory.py#L35 I think we can assume |
|
q = deque(maxlen=10000)
for _ in range(100000):
q.append(1)
if len(q) > 1000:
random.sample(q, 64)
q = RandomAccessQueue(maxlen=10000)
for _ in range(100000):
q.append(1)
if len(q) > 1000:
q._sample(64) |
to sample distinct elements
H306
time (secs) after the debug:
Non-repetitive sampling by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except two comments. Great work!
chainerrl/misc/collections.py
Outdated
|
||
return self._queue_front.pop() | ||
|
||
def _sample(self, k): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called from ReplayBuffer, thus should be public.
Use one leading underscore only for non-public methods and instance variables.
https://www.python.org/dev/peps/pep-0008/#method-names-and-instance-variables
tests/misc_tests/test_collections.py
Outdated
cdfs_r = (np.arange(n) + 1) / n | ||
|
||
# Kolmogorov-Smirnov statistic | ||
d = max(np.amax(np.abs(cdfs - cdfs_x)) for cdfs_x in [cdfs_l, cdfs_r]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although your implementation looks correct, scipy has scipy.stats.kstest
and I think it's better to use the existing well-tested implementation.
You can get p-value by scipy.stats.kstest(xs, 'norm', args=(mean, std))[1]
.
@@ -0,0 +1,129 @@ | |||
import itertools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add future imports (at least range
is affected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Fixed. |
TODO:
This will happen to resolve #36 (see also #128)