
Implement shared_intermediates context manager #43

Merged
merged 19 commits into dgasmith:master on Aug 22, 2018

Conversation

@fritzo (Contributor, author) commented Aug 20, 2018

Resolves #9

Description

This implements a shared_intermediates context manager, within which multiple calls to contract will share intermediate computations. This is very useful e.g. in message passing algorithms. The first draft of this PR is copied directly from Pyro, where we are using sharing for probabilistic inference in graphical models.

The implementation uses a special internal backend opt_einsum.backends.shared. When a sharing context is opened

with shared_intermediates():
    contract(..., backend='foo')
    contract(..., backend='foo')

a special handle_sharing() context is activated inside the contract function, temporarily switching to the shared backend and storing the original backend foo in a global shared._CURRENT_BACKEND variable. The shared backend then performs all of the original foo operations and also memoizes them.
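As a rough illustration of that mechanism (a minimal sketch only; the names handle_sharing, _SHARING_STACK and _CURRENT_BACKEND follow the PR, but the exact signatures and bodies here are assumptions):

from contextlib import contextmanager

_SHARING_STACK = []      # caches pushed by shared_intermediates()
_CURRENT_BACKEND = []    # holds the real backend while the 'shared' backend is active

@contextmanager
def handle_sharing(backend):
    if _SHARING_STACK and backend != 'shared':
        _CURRENT_BACKEND.append(backend)  # remember the user's backend
        try:
            yield 'shared'                # route this contraction through the shared backend
        finally:
            _CURRENT_BACKEND.pop()
    else:
        yield backend                     # no sharing context is active; pass through unchanged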

One design choice is to memoize by id() rather than by value. This makes sense from a computational perspective (equality comparison is expensive), but requires a bit more care by users.
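For example (illustrative only), a value-equal copy of an array will not hit the cache, because it has a different id():

import numpy as np
from opt_einsum import contract, shared_intermediates

x = np.random.rand(4, 4)
y = x.copy()  # equal values, but a different id()

with shared_intermediates():
    contract('ab,bc->ac', x, x)  # computed and cached
    contract('ab,bc->ac', x, x)  # cache hit: the same objects by id()
    contract('ab,bc->ac', x, y)  # cache miss: y is compared by id(), not by value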

A second design choice is to expose the memoization cache to users. This makes it easy to share intermediates by passing the cache around, as is necessary when it is difficult to perform all computation in a context:

with shared_intermediates() as cache:
    contract(...)
...pass control elsewhere...
with shared_intermediates(cache):  # <-- reuse previous cache
    contract(...)
del cache  # <-- intermediates will be garbage collected

Todos

  • Cache type conversions e.g. numpy->torch->numpy so args can be compared by id.
  • fix failing tests
  • add tests that pass around the cache
  • add tests where we test the normal ordering:
    with shared_intermediates() as cache:
        a = np.random.rand(4, 4)
        b = np.random.rand(4, 4)
        contract("ab,bc->ac", a, b)
    
    assert get_cache_key("zc,az->ac", b, a) in cache
    ...
  • test against the Pyro rm-einsum-shared branch
  • test nested sharing
  • write docs
  • move _alpha_canonicalize() to parser.py
  • test caching with constant expressions

Questions

  • Could a maintainer suggest where the handle_sharing() context should be moved? I've attempted to insert it into ContractExpression.__call__(), but I'm a bit lost.

Status

  • Ready to go

"The internal error was: '%s'" % original_msg, )
err.args = msg
raise
with handle_sharing(backend) as backend:
Contributor Author (@fritzo):

This is really the only line changed; I've simply moved the following lines inside this context.

@fritzo (Contributor, author) commented Aug 20, 2018

cc @eb8680

@jcmgray (Collaborator) commented Aug 20, 2018

Nice, looking forward to testing this and putting it to use in the wild! I do have a suggestion to simplify this a bit further (sorry for suggesting all these refactors!).

Basically, instead of making it a backend, storing the real desired backend in a temporary, and using handle_sharing, the cached versions of these operations could just be turned on directly in the core _tensordot, _einsum, etc. Maybe as decorators defined (along with shared_intermediates) in a shared.py file.

e.g.

# opt_einsum/shared.py

def shared_intermediates(...):
    # and other stuff

def tensordot_cache_wrap(tensordot_fn):

    def tensordot_cached(x, y, axes, backend):
        if not _SHARING_STACK:
            return tensordot_fn(x, y, axes, backend)

        cache = _SHARING_STACK[-1]  # get the currently active cache (other parsing elided)
        key = (id(x), id(y), axes, backend)
        if key not in cache:
            cache[key] = tensordot_fn(x, y, axes, backend)

        return cache[key]

    return tensordot_cached

The advantage is that in contract.py we then just need

@tensordot_cache_wrap
def _tensordot(x, y, axes, backend):
    ...

I.e. the caching logic is really factored out, and handle_sharing and _CURRENT_BACKEND are not needed. Additionally, the calls would be explicitly routed through _einsum, _tensordot & _transpose (in future they might be modified). The decorator approach is obviously optional - just a nice way of keeping the core contract logic clean -- but the key thing is that shared_intermediates might not need to use the backend API itself.

Also, the tests seem to be failing on Travis currently, but otherwise this is looking good!


_SHARING_STACK = []
_CURRENT_BACKEND = []

Collaborator (@jcmgray):

Could these just be explicit global variables? Or does the list functionality enable nested sharing with different caches or something?

Contributor Author (@fritzo):

_SHARING_STACK indeed allows nesting / separate caches. I'll switch _CURRENT_BACKEND to an explicit global variable.
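A hypothetical usage sketch of the nesting this enables (only the fact that each context pushes its own cache onto _SHARING_STACK is taken from the PR; the rest is illustration):

import numpy as np
from opt_einsum import contract, shared_intermediates

x, y = np.random.rand(4, 4), np.random.rand(4, 4)

with shared_intermediates() as outer_cache:
    contract('ab,bc->ac', x, y)            # memoized in outer_cache
    with shared_intermediates() as inner_cache:
        contract('ab,bc->ac', x, y)        # memoized in the innermost cache (_SHARING_STACK[-1])
    assert outer_cache is not inner_cache  # each context pushes its own cache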

cache = _SHARING_STACK[-1]
cache['tensor', id(x)] = x
cache['tensor', id(y)] = y

Collaborator (@jcmgray):

Do we need to cache the tensors on their own? Are they ever retrieved?

Contributor Author (@fritzo):

This is required to prevent the input tensors from being garbage collected and their ids being reused, which would lead to an incorrect cache lookup. I'll add a comment to this effect.

Collaborator (@jcmgray):

Ah ok! I hadn't realised that about id.

Owner (@dgasmith):

Fun example:

>>> a = np.random.rand(4, 4)
>>> id(a[0])
4560101008
>>> id(a[0])
4585119616
>>> id(a[0])
4585119616
>>> id(a[0])
4585119616
>>> id(a[1])
4585119616
>>> id(a[2])
4585119616

canonical_inputs = ','.join(input_ for input_, _ in canonical)
canonical_equation = _alpha_canonicalize('{}->{}'.format(canonical_inputs, output))
canonical_operands = tuple(d for _, d in canonical)
key = 'einsum', backend, canonical_equation, tuple(map(id, canonical_operands))
Collaborator (@jcmgray):

Does this bit enable it so that einsum('ab,bc->ca', x, y) matches e.g. einsum('jk,ij->ki', y, x)? If so, nice!

Contributor Author (@fritzo):

Yes, we're accounting for a little bit of commutativity in the cache lookup. We did this in Pyro to improve our sharing.

Owner (@dgasmith):

+1 for normal ordering. I think this takes care of most edge cases. The only missing edge case I can think of is if a user takes identical views in different contexts, which would require checking the __array_interface__ attribute. Seems like a stretch to take care of in the first pass.
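To make the canonicalization discussed in this thread concrete, here is a sketch of how such a cache key can be built; the helper names follow the diff above, but this is illustrative rather than the verbatim implementation:

def _alpha_canonicalize(equation):
    """Rename index symbols in order of first appearance (sketch)."""
    rename = {}
    for symbol in equation:
        if symbol not in ',->' and symbol not in rename:
            rename[symbol] = 'abcdefghijklmnopqrstuvwxyz'[len(rename)]
    return ''.join(rename.get(s, s) for s in equation)

def einsum_cache_key(equation, operands, backend='numpy'):
    """Build a key that is invariant to operand order and index renaming (sketch)."""
    inputs, output = equation.split('->')
    # sort input terms by operand id so that commuted calls produce the same key
    canonical = sorted(zip(inputs.split(','), operands), key=lambda term: id(term[1]))
    canonical_inputs = ','.join(term for term, _ in canonical)
    canonical_equation = _alpha_canonicalize(canonical_inputs + '->' + output)
    return 'einsum', backend, canonical_equation, tuple(id(op) for _, op in canonical)

With this sketch, einsum_cache_key('ab,bc->ca', (x, y)) and einsum_cache_key('jk,ij->ki', (y, x)) produce the same key for any two arrays x and y.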

cache = {}
_SHARING_STACK.append(cache)
yield cache
_SHARING_STACK.pop()
Collaborator (@jcmgray):

Does this need to be in a try/finally block so that opt_einsum doesn't remain in 'shared' mode if an error is raised?

Contributor Author (@fritzo):

The try-finally is automatically performed by @contextlib.contextmanager

Collaborator (@jcmgray):

Are you sure about this specifically for contextmanager? The docs don't mention it, and in my tests it is needed; otherwise the post-yield code is never reached. Testing with these snippets:

from contextlib import contextmanager

things = []

@contextmanager
def TrySomething(value):
    things.append(value)
#     try:
    yield things[-1]
#     finally:
    things.pop() 

then

with TrySomething('hello'):
    raise ValueError

gives things=['hello'] without the try/finally but [] with.

Contributor Author (@fritzo):

Gosh, it looks like I'm mistaken. I'll fix and push. Thanks for catching this!
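A minimal sketch of the fix being discussed (the final signature in the PR may differ):

import contextlib

_SHARING_STACK = []

@contextlib.contextmanager
def shared_intermediates(cache=None):
    if cache is None:
        cache = {}
    _SHARING_STACK.append(cache)
    try:
        yield cache
    finally:
        # always pop, even if an error is raised inside the with-block
        _SHARING_STACK.pop()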

@dgasmith (Owner) left a review:

Overall this looks really good. A couple of minor points to look at.

Very excited to have these changes in; they will be very useful.


cache['tensor', id(d)] = d

# compute a canonical hash, modulo commutativity
inputs, output = equation.split('->')
Owner (@dgasmith):

Are we guaranteed to have a "->" at this stage? I think so, but worth double checking.

Contributor Author (@fritzo):

I believe so, but I'm not super familiar with the entire einsum syntax. At this stage we are guaranteed to have the same equation that was passed to contract(equation, ...).


def tensordot(x, y, axes=2):
backend = _CURRENT_BACKEND[0]
cache = _SHARING_STACK[-1]
Owner (@dgasmith):

Can we normal order x, y here as well using id as below? With indices explicitly labeled I think this should be ok.

Contributor Author (@fritzo):

I believe we can't normal-order x,y because their non-contracted dimensions are treated differently: x's are on the left and y's are on the right.

Collaborator (@jcmgray):

One possibility would be to check for (x, y, axes) and then (y, x, axes[::-1]), and if it's the second, just perform the transposition (which will be something like transpose(x, [2, 3, 4, 0, 1])).

Contributor Author (@fritzo):

Yeah, in this first PR I've implemented lookup modulo-commutativity but not modulo-transpose. I agree that we could do this modulo-transpose lookup here and also in the einsum part. But I'm still struggling to get tests to pass even in the current PR, so I'd be happy to limit this first PR to commutativity.

Owner (@dgasmith):

+1 to getting something in. Can we flag this and open an issue so that we do not forget about this point?

Collaborator (@jcmgray) commented Aug 20, 2018:

Yeah absolutely - it's certainly not necessary for this PR. Good to keep track of these things so they can maybe be added later however!
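For reference, a sketch of the modulo-transpose lookup discussed above (not part of this PR; numpy is used for illustration, cache stands for the active sharing cache, and axes is assumed to be a pair of index sequences):

import numpy as np

def tensordot_with_commuted_lookup(x, y, axes, cache):
    """Check (x, y, axes), then (y, x, reversed axes), transposing the result on the second hit."""
    axes_x, axes_y = axes
    key = ('tensordot', id(x), id(y), tuple(axes_x), tuple(axes_y))
    swapped = ('tensordot', id(y), id(x), tuple(axes_y), tuple(axes_x))
    if key in cache:
        return cache[key]
    if swapped in cache:
        # the cached result has y's free axes first; move x's free axes to the front
        nx_free = x.ndim - len(axes_x)
        ny_free = y.ndim - len(axes_y)
        perm = list(range(ny_free, ny_free + nx_free)) + list(range(ny_free))
        return np.transpose(cache[swapped], perm)
    cache[key] = np.tensordot(x, y, (axes_x, axes_y))
    return cache[key]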


print('-' * 40)
print('Without sharing: {} expressions'.format(num_exprs_nosharing))
print('With sharing: {} expressions'.format(num_exprs_sharing))
assert num_exprs_nosharing > num_exprs_sharing
Owner (@dgasmith):

We should consider adding additional tests where we test the normal ordering:

with shared_intermediates() as cache:
    a = np.random.rand(4, 4)
    b = np.random.rand(4, 4)
    contract("ab,bc->ac", a, b)

assert get_cache_key("zc,az->ac", b, a) in cache
...

# compute a canonical hash, modulo commutativity
inputs, output = equation.split('->')
inputs = inputs.split(',')
canonical = sorted(zip(inputs, operands), key=lambda x: id(x[1]))
Owner (@dgasmith):

It might be good to break this code out into another function. I have a feeling that tokenizing contractions will be very useful in the future.

@fritzo (Contributor, author) commented Aug 20, 2018

@jcmgray Maybe as decorators ...

I like this idea! I'll try refactoring and see if I have better luck getting tests to pass with the decorator version.

@codecov-io commented Aug 21, 2018

Codecov Report

Merging #43 into master will increase coverage by 0.22%.
The diff coverage is 98.9%.

@fritzo (Contributor, author) commented Aug 21, 2018

@jcmgray I've refactored to use decorators as you suggested, and indeed the changes are now minimally intrusive.

I've also had to decorate to_torch and to_cupy. I don't know how to wrap the other automatically-converted backends, but all backends should work with sharing if the user does manual conversion.

@jcmgray (Collaborator) commented Aug 21, 2018

Yes, thanks very much for that change; it's looking great. My notes at this point are completely optional extensions:

  • With regards to the conversion functions, I can't think of any reason that every to_{backend} function couldn't be memoized (a sketch appears at the end of this comment). At the point it would be called, tensorflow (in non-eager mode) and theano are just building the expression, so there's no direct speed advantage, but I can imagine that it only helps the compilers if they can see that some tensors are the same object. On the other hand, there is no single function to_backend(x, backend=...) at the moment, so it gets a bit messy -- could well be left for the moment.

  • Do we need to think about multi-threading scenarios? With one thread popping the cache early?

  • On a similar note, it might be nice to test the nested sharing if it's an intentional feature.

  • As I think @dgasmith pointed out, it also might be nice to just move the einsum pre-cache canonicalization into its own function in parser since it might be useful elsewhere.

But like I say, it already looks good to go from my perspective without these niche things! Docs-wise, maybe add it to the readme.md bullet point list and give it its own page in the main docs, @dgasmith ?
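A sketch of the per-conversion memoization suggested in the first bullet above (conversion_cache_wrap is a hypothetical name, and _SHARING_STACK here stands in for the module-level sharing stack):

_SHARING_STACK = []  # stands in for the module-level sharing stack

def conversion_cache_wrap(convert_fn):
    """Memoize a to_{backend}-style conversion by operand id while sharing is active (sketch)."""
    def cached_convert(x):
        if not _SHARING_STACK:
            return convert_fn(x)
        cache = _SHARING_STACK[-1]
        cache['tensor', id(x)] = x                    # keep x alive so its id is not reused
        key = ('convert', convert_fn.__name__, id(x))
        if key not in cache:
            cache[key] = convert_fn(x)
        return cache[key]
    return cached_convert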

@dgasmith (Owner) commented Aug 21, 2018

I think a separate doc page would be good, probably under the "Getting Started" heading, along with a small snippet in the current README. We should also test caching with constant expressions.

@fritzo None of these issues needs to be tackled in this particular PR, but they will need to be addressed before the next release. Please feel free to turn any of the points into issues. I should have some time this weekend to work on a point or two.

LICENSE Outdated
@@ -1,6 +1,7 @@
The MIT License (MIT)

Copyright (c) 2014 Daniel Smith
Copyright (c) 2018 Uber Technologies
Contributor Author (@fritzo):

@dgasmith my employer requires me to add a copyright line somewhere. Is it ok here, or would you like me to move it to sharing.py or somewhere else?

Owner (@dgasmith):

Yea, could we move this to sharing.py? We should probably look at changing the copyright to the "opt_einsum developers" in the future. I need to look into this angle of things a bit more.

@fritzo (Contributor, author) commented Aug 21, 2018

@jcmgray Do we need to think about multi-threading scenarios?

I really don't know, as I've never used threads in Python. What do you think?

@fritzo (Contributor, author) commented Aug 21, 2018

Ok, I think this should be ready to go when tests pass. I can open issues for remaining improvements.

@dgasmith (Owner) left a review:

LGTM overall



>>> with shared_intermediates():
...     marginals = {output: contract('{}->{}'.format(inputs, output), *factors)
...                  for output in 'abcdef'}
Owner (@dgasmith):

Might be worth showing a quick timing comparison on bigger tensors with an explicit demonstration of the contractions done.
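Along those lines, a hypothetical timing comparison (the shapes and equation are made up purely for illustration):

import time
import numpy as np
from opt_einsum import contract, shared_intermediates

inputs = 'ab,bc,cd'
factors = [np.random.rand(500, 500) for _ in range(3)]

start = time.perf_counter()
marginals = {output: contract('{}->{}'.format(inputs, output), *factors)
             for output in 'abcd'}
print('without sharing: {:.3f}s'.format(time.perf_counter() - start))

start = time.perf_counter()
with shared_intermediates():
    marginals = {output: contract('{}->{}'.format(inputs, output), *factors)
                 for output in 'abcd'}
print('with sharing:    {:.3f}s'.format(time.perf_counter() - start))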

assert num_exprs_nosharing['einsum'] > num_exprs_sharing['einsum']


def compute_cost(cache):
Owner (@dgasmith):

Can this be _compute_cost?

cache['tensor', id(tensor)] = tensor


def _memoize(key, fn, *args, **kwargs):
Owner (@dgasmith):

Can we add docstrings on the next 4 functions? Not a lot, but just something to indicate their use.

Contributor Author (@fritzo):

done.

@dgasmith (Owner) commented:
@jcmgray Can you give this a final review as well?

@jcmgray (Collaborator) left a review:

All looks good to me!

@jcmgray (Collaborator) commented Aug 21, 2018

@fritzo I really don't know, as I've never used threads in Python. What do you think?

I thought about this a bit, since it's not completely unlikely that someone at some point might parallelize some numeric code which uses opt_einsum sharing in a thread pool or something. It might be inefficient but nothing catastrophic, I think: each thread will add its own cache to the list, but only use whichever is last (so a bit randomly jumbled). Additionally, while any one thread is in a shared context, opt_einsum in all threads will cache intermediates.

Anyway I can't think of any remedy so really just noting it! I guess it's just a minor cost of the major convenience here of using module global state.

NB: if you supply the same cache to all threads, they will all add it but will also all use it, which sidesteps the inefficiency to an extent (see the sketch below).
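A sketch of that workaround, giving every worker the same cache (purely illustrative):

from concurrent.futures import ThreadPoolExecutor
import numpy as np
from opt_einsum import contract, shared_intermediates

x, y = np.random.rand(64, 64), np.random.rand(64, 64)

with shared_intermediates() as cache:
    pass  # create an (empty) cache up front

def worker(_):
    with shared_intermediates(cache):  # every thread reuses the same cache
        return contract('ab,bc->ac', x, y)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, range(4)))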

@fritzo (Contributor, author) commented Aug 22, 2018

Ok, I've addressed all review comments.

@dgasmith and @jcmgray thank you for your detailed review!

@dgasmith (Owner) commented:
Everything looks good to me. Thanks for the great PR! We will look at releasing a 2.2 soon to get these changes into production.

@dgasmith dgasmith merged commit 7a83c49 into dgasmith:master Aug 22, 2018
@jcmgray (Collaborator) commented Aug 22, 2018

This is a really cool addition, thanks @fritzo!

@fritzo (Contributor, author) commented Aug 22, 2018

@dgasmith I'd recommend using squash-and-merge for future merges to avoid leaking non-functioning commits into the commit history. That makes it much easier to git bisect, since you can restrict to merge commits that passed CI tests. I certainly didn't intend some of the intermediate commits in this PR to end up in your commit history.

@dgasmith (Owner) commented:
I usually do for larger projects, but opt_einsum is small enough that there are not a ton of commits coming in and we haven't declared a git model that we use.

Successfully merging this pull request may close these issues:

  • Re-use intermediates across various contractions (Common Subexpression Elimination)