Make contiguous case for chainerx::AddAt faster #8299

Merged: 2 commits, merged Oct 28, 2019

@emcastillo (Member) commented Oct 16, 2019

Reworked it similarly to #8295.

IndexableArrays now use the 1D-specialized template whenever possible (the contiguous case).
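Why a 1D specialization is valid for contiguous arrays can be illustrated outside ChainerX. The sketch below is a NumPy-only illustration, not ChainerX's C++ `IndexableArray` code, and the helper names `strided_offset` and `contiguous_offset` are hypothetical. It checks that, for a C-contiguous array, the general N-D offset (a sum over dimensions of index times stride) always equals the row-major linear index times the item size, so the per-dimension loop can be replaced by a single multiply.

```python
import numpy as np


def strided_offset(index, strides):
    """General N-D case (hypothetical helper): byte offset = sum(i_k * stride_k)."""
    return sum(i * s for i, s in zip(index, strides))


def contiguous_offset(linear_index, itemsize):
    """1D case (hypothetical helper): a C-contiguous array needs one linear index."""
    return linear_index * itemsize


x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
assert x.flags['C_CONTIGUOUS']

# np.ndindex walks indices in row-major (C) order, matching the linear index,
# so for every element the N-D strided offset equals the 1D offset.
for linear, index in enumerate(np.ndindex(x.shape)):
    assert strided_offset(index, x.strides) == contiguous_offset(linear, x.itemsize)

# A transposed view is not C-contiguous, so the 1D shortcut does not apply.
y = x.transpose(2, 0, 1)
print(y.flags['C_CONTIGUOUS'])  # False
```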

For 40 runs of a pathological case, execution time dropped from 0.18 s to 0.02 s (about 4.5 ms to 0.5 ms per call, roughly a 9x speedup).

Note that `addat` is not currently exposed in the Python API; the benchmark below calls it as `chx.addat`.

```python
import time

import chainerx as chx
import cupy
import numpy as np


def bench(name):
    np.random.seed(42)
    x = np.zeros((20, 96, 132)).astype(np.float32)
    y = np.random.rand(20, 96, 36, 11).astype(np.float32)
    indices = np.random.randint(132, size=(36, 11)).astype(np.int32)
    a = chx.array(x, device='cuda:0')
    b = chx.array(y, device='cuda:0')
    indices = chx.array(indices, device='cuda:0')

    # Warm-up runs, excluded from the timing.
    # addat is not exposed in the public Python API (see note above).
    for i in range(10):
        out = chx.addat(a, indices, b, axis=2)

    # Synchronize so the timer covers only the 40 measured calls.
    cupy.cuda.device.Device().synchronize()
    start = time.time()
    for i in range(40):
        out = chx.addat(a, indices, b, axis=2)
    cupy.cuda.device.Device().synchronize()
    take = time.time() - start
    print(name, take)


bench('addat')
```
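For checking results against a CPU reference, a NumPy sketch of what the benchmark computes may help. It assumes that `addat(a, indices, b, axis)` returns a copy of `a` with `b` scatter-added at `indices` along `axis`, with repeated indices accumulating like `np.add.at` (the behavior a gradient of a `take`-style gather needs); the exact ChainerX semantics are an assumption here, and `addat_reference` is a hypothetical helper name.

```python
import numpy as np


def addat_reference(a, indices, b, axis):
    """Hypothetical CPU reference: scatter-add b into a copy of a along `axis`."""
    out = a.copy()
    # Index tuple of the form (:, ..., :, indices); for axis=2 this is (:, :, indices).
    index = (slice(None),) * axis + (indices,)
    # np.add.at is unbuffered: repeated indices accumulate instead of overwriting.
    np.add.at(out, index, b)
    return out


a = np.zeros((2, 5), dtype=np.float32)
indices = np.array([[0, 3], [3, 4]], dtype=np.int32)  # index 3 appears twice
b = np.ones((2, 2, 2), dtype=np.float32)  # shape = a.shape[:1] + indices.shape

out = addat_reference(a, indices, b, axis=1)
print(out[0])  # [1. 0. 0. 2. 1.]

# Buffered fancy-index assignment applies a repeated index only once.
naive = a.copy()
naive[:, indices] += b
print(naive[0])  # [1. 0. 0. 1. 1.]
```

The contrast with plain fancy-index `+=` is the point of the reference: NumPy's buffered `out[:, indices] += b` applies each repeated index once, while `np.add.at` accumulates every occurrence. In the benchmark, `indices` has 36 × 11 = 396 entries drawn from 132 values, so duplicates are guaranteed.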

@emcastillo added the ChainerX label Oct 16, 2019
@emcastillo changed the title [RFC] Make contiguous case for chainerx::AddAt faster [WIP] Make contiguous case for chainerx::AddAt faster Oct 17, 2019
@emcastillo changed the title [WIP] Make contiguous case for chainerx::AddAt faster Make contiguous case for chainerx::AddAt faster Oct 24, 2019
@emcastillo force-pushed the fast_addat branch 2 times, most recently from b78a2b4 to 0833218, October 24, 2019 00:54
@emcastillo (Member, Author) commented:

PTAL 😄

@asi1024 added the cat:performance label Oct 24, 2019

@asi1024 (Member) commented Oct 24, 2019

Jenkins and flexCI, test this please.

@chainer-ci (Member) commented:

Jenkins CI test (for commit d85cc1d, target branch master) failed with status FAILURE.

@asi1024 (Member) commented Oct 28, 2019

Jenkins and flexCI, test this please.

@chainer-ci (Member) commented:

Jenkins CI test (for commit d85cc1d, target branch master) succeeded!

@asi1024 (Member) commented Oct 28, 2019

LGTM.

@asi1024 merged commit bd5f2cc into chainer:master Oct 28, 2019
@asi1024 added this to the v7.0.0 milestone Oct 28, 2019
@emcastillo deleted the fast_addat branch October 29, 2019 01:58
shinh added a commit to shinh/chainer-compiler that referenced this pull request Oct 29, 2019
Thanks to chainer/chainer#8299
contiguous case is significantly faster.
@niboshi mentioned this pull request Nov 12, 2019
Labels: cat:performance (Performance in terms of speed or memory consumption.), ChainerX (Related to ChainerX.)
3 participants