Use faster queue for replay buffers #131

Merged
merged 20 commits into from Oct 13, 2017

Member

@toslunar toslunar commented Aug 9, 2017

TODO:

  • Add tests
  • Measure performance

This should also resolve #36 (see also #128).

@coveralls

coveralls commented Aug 9, 2017

Coverage Status

Coverage decreased (-0.3%) to 71.167% when pulling 34c80ab on toslunar:replace-deque into e095992 on chainer:master.

@coveralls

coveralls commented Aug 10, 2017

Coverage Status

Coverage increased (+0.005%) to 71.518% when pulling d7299fc on toslunar:replace-deque into e095992 on chainer:master.

@toslunar
Member Author

😰
before: 13.168848936009454
after: 16.709212244983064

import numpy as np
import timeit

from chainerrl import replay_buffer


def rand_state():
    return np.random.rand(1, 3, 50, 20).astype(np.float32)


def rand_action():
    return np.random.rand(1, 40).astype(np.float32)


def rand_reward():
    return np.random.rand()


def f(capacity, batch_size, steps, replay_start_size):
    rbuf = replay_buffer.ReplayBuffer(capacity=capacity)
    s = rand_state()
    a = rand_action()
    for i in range(steps):
        next_s = rand_state()
        next_a = rand_action()
        rbuf.append(s, a, rand_reward(), next_s, next_a)
        s = next_s
        a = next_a
        if i >= replay_start_size:
            rbuf.sample(batch_size)


print(min(timeit.Timer(
    'f(10000, 64, 100000, 1000)',
    setup="from __main__ import f; gc.enable()").
    repeat(repeat=3, number=1)))

@toslunar
Member Author

before: 337.00381314000697
after: 171.62339709801017
for f(100000, 64, 1000000, 10000).

@toslunar toslunar changed the title [WIP] Use faster queue for replay buffers Use faster queue for replay buffers Aug 10, 2017
@muupan
Member

muupan commented Aug 11, 2017

Good job! But this solution seems unnecessarily complex to me. How about using a ring buffer like this? https://github.com/matthiasplappert/keras-rl/blob/master/rl/memory.py#L35

I think we can assume maxlen is specified for replay buffers; otherwise training would eventually run out of memory.
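
For reference, the ring-buffer idea suggested here can be sketched as follows. This is a minimal illustration that assumes a fixed, positive maxlen; the class and attribute names are hypothetical and are not taken from keras-rl or ChainerRL.

```python
class RingBuffer(object):
    """Fixed-capacity FIFO buffer with O(1) append and O(1) indexing.

    Illustrative sketch only; names are assumptions, not a library API.
    """

    def __init__(self, maxlen):
        if maxlen <= 0:
            raise ValueError('maxlen must be positive')
        self.maxlen = maxlen
        self._data = [None] * maxlen  # preallocated storage
        self._start = 0               # position of the oldest element
        self._length = 0              # number of stored elements

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        if not 0 <= idx < self._length:
            raise IndexError(idx)
        return self._data[(self._start + idx) % self.maxlen]

    def append(self, value):
        if self._length < self.maxlen:
            # Still room: write just past the newest element.
            self._data[(self._start + self._length) % self.maxlen] = value
            self._length += 1
        else:
            # Full: overwrite the oldest element and advance the start.
            self._data[self._start] = value
            self._start = (self._start + 1) % self.maxlen


buf = RingBuffer(maxlen=3)
for x in range(5):
    buf.append(x)
contents = [buf[i] for i in range(len(buf))]  # oldest to newest
```

Because storage is preallocated, this design only works when an upper bound on the number of stored items is known in advance, which is the assumption stated in the comment above.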

@toslunar
Member Author

EpisodicReplayBuffer limits the number of transitions in the buffer, not the number of episodes, so it is not easy to determine the best maxlen for self.episodic_memory = RingBuffer(maxlen). Of course, a sufficiently large value exists (and it costs little memory relative to the other parts): self.episodic_memory = RingBuffer(capacity).

@toslunar
Member Author

  • collections.deque + random.sample: 8.064196354011074
import random
from collections import deque

q = deque(maxlen=10000)
for _ in range(100000):
    q.append(1)
    if len(q) > 1000:
        random.sample(q, 64)
  • RandomAccessQueue
    -- at d7299fc: 13.529830269049853
    -- at fc39b20: 10.133344302070327
    -- at b8bec91: 5.198489462025464
q = RandomAccessQueue(maxlen=10000)
for _ in range(100000):
    q.append(1)
    if len(q) > 1000:
        q._sample(64)

to sample distinct elements
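
The benchmarks above compare collections.deque with a queue that supports fast random access. One well-known way to build such a queue is to keep two Python lists, one for the oldest items (stored reversed) and one for the newest. The sketch below illustrates that technique under stated assumptions: the class name TwoListQueue and all of its attributes are hypothetical, and this is not the RandomAccessQueue implementation from this pull request.

```python
import random


class TwoListQueue(object):
    """FIFO queue with amortized O(1) append/popleft and O(1) indexing.

    Hypothetical sketch; not the RandomAccessQueue from this pull request.
    """

    def __init__(self, maxlen=None):
        self.maxlen = maxlen
        self._front = []  # oldest elements, stored in reverse order
        self._back = []   # newest elements, stored in insertion order

    def __len__(self):
        return len(self._front) + len(self._back)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        n_front = len(self._front)
        if i < n_front:
            return self._front[n_front - 1 - i]
        return self._back[i - n_front]

    def append(self, x):
        self._back.append(x)
        if self.maxlen is not None and len(self) > self.maxlen:
            self.popleft()  # drop the oldest element

    def popleft(self):
        if not self._front:
            if not self._back:
                raise IndexError('pop from an empty queue')
            # Move everything to the front list, reversed, so that the
            # oldest element ends up last and can be popped in O(1).
            self._back.reverse()
            self._front, self._back = self._back, []
        return self._front.pop()

    def sample(self, k):
        """Return k elements at distinct positions, chosen uniformly."""
        return [self[i] for i in random.sample(range(len(self)), k)]


q = TwoListQueue(maxlen=4)
for x in range(10):
    q.append(x)
items = [q[i] for i in range(len(q))]  # oldest to newest
batch = q.sample(2)
```

Indexing never walks a linked structure, so drawing a minibatch costs time proportional to the batch size rather than to the queue length. The occasional reversal in popleft touches each element at most once per pass through the queue, which is why the cost is amortized constant.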
@coveralls

coveralls commented Aug 17, 2017

Coverage Status

Coverage increased (+0.02%) to 71.536% when pulling 88c9fac on toslunar:replace-deque into e095992 on chainer:master.

@coveralls

Coverage Status

Coverage increased (+0.01%) to 71.526% when pulling 874b4ad on toslunar:replace-deque into e095992 on chainer:master.

@coveralls

coveralls commented Aug 18, 2017

Coverage Status

Coverage increased (+0.1%) to 71.624% when pulling 312c025 on toslunar:replace-deque into e095992 on chainer:master.

@coveralls

coveralls commented Aug 18, 2017

Coverage Status

Coverage increased (+0.1%) to 71.624% when pulling fd14a7b on toslunar:replace-deque into e095992 on chainer:master.

@coveralls

coveralls commented Aug 22, 2017

Coverage Status

Coverage increased (+0.1%) to 71.624% when pulling 1d9b658 on toslunar:replace-deque into e095992 on chainer:master.

@toslunar
Member Author

Times (in seconds) after debugging:

Columns: (steps, maxlen); baseline (collections.deque + random.sample); collections.deque + _sample_n_k; RandomAccessQueue + _sample_n_k
(100000, 10000) 7.3263045670464635 5.774126871023327 7.128595568938181
(300000, 30000) 26.707710441900417 21.629545310977846 20.986778931925073
(1000000, 100000) 197.8464105380699 74.75397148309276

Sampling without replacement via [q[i] for i in _sample_n_k(len(q), 64)] is faster than via random.sample(q, 64). With this faster sampling, RandomAccessQueue is faster than collections.deque once maxlen exceeds roughly 30000.
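
The body of _sample_n_k is not shown in this conversation. As a rough illustration of index-based sampling without replacement, the sketch below draws k distinct indices from range(n) and then looks the items up by position. The function name sample_n_k and its rejection-sampling strategy are assumptions made for this example, not the code from the pull request.

```python
import random
from collections import deque


def sample_n_k(n, k):
    """Return k distinct integers drawn uniformly from range(n).

    Hypothetical helper; the strategy here is an assumption.
    """
    if not 0 <= k <= n:
        raise ValueError('need 0 <= k <= n')
    if 3 * k >= n:
        # Dense case: shuffling a full index list is cheap enough.
        indices = list(range(n))
        random.shuffle(indices)
        return indices[:k]
    # Sparse case: draw with replacement and reject duplicates.
    chosen = set()
    result = []
    while len(result) < k:
        i = random.randrange(n)
        if i not in chosen:
            chosen.add(i)
            result.append(i)
    return result


q = deque(range(1000), maxlen=1000)
batch = [q[i] for i in sample_n_k(len(q), 64)]
```

When k is much smaller than n, duplicates are rare, so the loop runs only slightly more than k times. The lookup q[i] is where the container matters: indexing near the middle of a collections.deque takes time that grows with its length, while a list-backed queue answers in constant time.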

Member

@muupan muupan left a comment

LGTM except for two comments. Great work!


return self._queue_front.pop()

def _sample(self, k):
Member

@muupan muupan Sep 1, 2017

This method is called from ReplayBuffer, so it should be public.

Use one leading underscore only for non-public methods and instance variables.

https://www.python.org/dev/peps/pep-0008/#method-names-and-instance-variables

cdfs_r = (np.arange(n) + 1) / n

# Kolmogorov-Smirnov statistic
d = max(np.amax(np.abs(cdfs - cdfs_x)) for cdfs_x in [cdfs_l, cdfs_r])
Member

@muupan muupan Sep 2, 2017

Although your implementation looks correct, scipy has scipy.stats.kstest and I think it's better to use the existing well-tested implementation.

You can get the p-value with scipy.stats.kstest(xs, 'norm', args=(mean, std))[1].
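
A small, self-contained sketch of that suggestion is shown below. The sample data, the seed, and the values of mean and std are made up for illustration; only the scipy.stats.kstest call follows the comment above.

```python
import numpy as np
import scipy.stats

# Illustrative parameters and data (assumptions, not from the tests in
# this pull request).
mean, std = 2.0, 3.0
rng = np.random.RandomState(0)
xs = rng.normal(mean, std, size=1000)

# Compare the empirical distribution of xs with N(mean, std**2).
result = scipy.stats.kstest(xs, 'norm', args=(mean, std))
statistic = result[0]  # Kolmogorov-Smirnov statistic D
p_value = result[1]    # p-value of the test
```

A small p-value would suggest that xs does not follow the reference distribution, so a test would typically assert that p_value stays above a chosen threshold.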

@@ -0,0 +1,129 @@
import itertools
Member

Add future imports (at least range is affected)

Member

Fixed.

@toslunar
Member Author

toslunar commented Sep 4, 2017

Fixed.

@coveralls

coveralls commented Sep 4, 2017

Coverage Status

Coverage increased (+0.2%) to 71.665% when pulling eba0baa on toslunar:replace-deque into e095992 on chainer:master.

@muupan muupan merged commit e84ea5f into chainer:master Oct 13, 2017
@toslunar toslunar deleted the replace-deque branch October 16, 2017 01:44
@muupan muupan added this to the v0.3 milestone Nov 30, 2017
Development

Successfully merging this pull request may close these issues.

The tutorial code causes TypeError on python 3.4
4 participants