Use broadcast_to instead of broadcast_like #288

Status: Closed · wants to merge 10 commits

Conversation

@eigenfoo (Contributor) commented Jan 30, 2021

Closes #159. WIP.

Two questions:

  1. Changing broadcast_like to broadcast_to in aesara/tensor/math_opt.py gives me a lot of BadOptimization and AssertionErrors, which I'm not knowledgeable enough to debug - can someone help me understand what's happening?
  2. I've also changed broadcast_like to broadcast_to in aesara/tensor/basic_opt.py - is this something we want to do?
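
(For context, a rough sketch of the substitution under discussion; this assumes `broadcast_to` lives in `aesara.tensor.extra_ops` alongside the `BroadcastTo` `Op`, and uses a concrete shape in place of the template-derived shape the rewrites would supply.)

```python
import numpy as np
import aesara
import aesara.tensor as at
from aesara.tensor.extra_ops import broadcast_to

x = at.vector("x")
# broadcast_like built an Alloc-based graph that fills a template-shaped
# tensor; broadcast_to instead wraps NumPy's view-based broadcasting.
y = broadcast_to(x, (3, 4))

f = aesara.function([x], y)
print(f(np.arange(4, dtype=aesara.config.floatX)))  # a (3, 4) broadcast of x
```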

@eigenfoo (Contributor, Author) commented Jan 30, 2021

Hmm - more tests are failing than I anticipated. Running pytest tests/tensor/test_math_opt.py gives me the 9 errors I was originally referring to: https://gist.github.com/eigenfoo/8c2d5e7867fc23b82c98553c5300b054

The failures don't actually call broadcast_to, but most of them involve toposort or optimize using local_pow_canonicalize, which does call broadcast_to. Does that mean broadcast_to affected the graph optimizations, which is what these failed tests are testing for? Apologies for rambling, I'm trying to understand the codebase better.

tests/tensor/test_basic.test_alloc_constant_folding (the test that failed the current test run) also calls toposort.

@brandonwillard (Member) commented Jan 30, 2021

The failures don't actually call broadcast_to, but most of them involve toposort or optimize using local_pow_canonicalize, which does call broadcast_to. Does that mean broadcast_to affected the graph optimizations, which is what these failed tests are testing for?

It looks like they're failing because those unit tests make unnecessarily strong assumptions, aren't isolated/self-contained enough, don't assert the relevant changes directly, etc. In other words, you're seeing the long-standing deficiencies of our test suite.

If you could fix those issues as you address this PR's issue, you would be doing significantly more for this project than the change requested in the issue itself.

@brandonwillard (Member) commented

Aside from strong test assumptions, there are some optimization errors that are probably worth addressing first; those may actually involve bugs.

For instance, the first optimization error implies that the type generated by the newly introduced BroadcastTo Op doesn't match the type of the graph it's replacing in the optimization.

@eigenfoo (Contributor, Author) commented Jan 30, 2021

@brandonwillard the latest commit pymc-devs/pytensor@a00ee11 gets rid of all BadOptimization errors, as far as I can tell - all that remains are AssertionErrors.

However, I suspect I'm doing something ugly by adding the dtype parameter like this - can you take a look at the new parameter? If it all looks good, I'll move on to fixing the tests: I suspect I'll need a lot of help along the way, though 😬

@brandonwillard (Member) left a review comment

Can't we simply cast the result of broadcast_to to the desired dtype (e.g. using TensorVariable.astype)?

Adding a dtype argument to the Op doesn't add much in the way of functionality, and it causes the Op to deviate from the underlying numpy.broadcast_to that it models/wraps.
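
(A minimal sketch of that suggestion, with a concrete shape and dtype standing in for the values a rewrite would supply, and assuming `broadcast_to` is importable from `aesara.tensor.extra_ops`:)

```python
import aesara.tensor as at
from aesara.tensor.extra_ops import broadcast_to

x = at.vector("x")
# Broadcast first, then cast the result, instead of threading a `dtype`
# argument through the BroadcastTo Op itself.
y = broadcast_to(x, (3, 4)).astype("float64")
```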

@eigenfoo (Contributor, Author) commented Jan 30, 2021

Sure, that actually makes it easier. I assumed we wanted to mimic the broadcast_like signature, which took a dtype parameter. Sorry if that was a bad assumption.

The previous two commits revert the dtype commit, and use TensorVariable.astype to cast dtypes.

Also, do we want to remove the broadcast_like function now? The only remaining references are in its own definition:

$ rg broadcast_like
aesara/tensor/basic_opt.py
170:def broadcast_like(value, template, fgraph, dtype=None):
182:            "broadcast_like currently requires the "

@eigenfoo (Contributor, Author) commented Feb 7, 2021

@brandonwillard I think I've fixed all tests but two. I'm confused about the remaining failures - I've attached the output of pytest test_math_opt.py here: https://gist.github.com/eigenfoo/ddb7d2aad30550c0efbf64a1841881f9

  1. test_stacktrace seems to be a real failure that I don't understand - what exactly is the "trace" that we're testing for? https://github.com/pymc-devs/aesara/blob/0344f1a844731eea980ada631fe180e09f397a51/aesara/graph/opt.py#L3188-L3189
  2. What is test_local_reduce_join testing for? I can tell that it's only these two lines that are failing the test, but I don't know if it's safe to remove them, or if this is a real failure... https://github.com/pymc-devs/aesara/blob/0344f1a844731eea980ada631fe180e09f397a51/tests/tensor/test_math_opt.py#L3472-L3473

@brandonwillard (Member) commented

  1. test_stacktrace seems to be a real failure that I don't understand - what exactly is the "trace" that we're testing for?

That's probably referring to the stack traces carried by each variable in their tag. Those traces show where each variable was created, and, during optimizations, they're carried over to newly created replacement variables. Their only real purpose is for user-level debugging.
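
(A sketch of how rewrites typically carry those traces over; the `aesara.graph.opt` location of `copy_stack_trace` is an assumption about the Aesara version this PR targets:)

```python
import aesara.tensor as at
from aesara.graph.opt import copy_stack_trace

x = at.vector("x")
old_out = at.exp(x)        # stand-in for the variable a rewrite replaces
new_out = at.exp(x) * 1.0  # stand-in for the rewrite's replacement

# Each variable records where it was created in `tag.trace`; rewrites are
# expected to copy that over so debugging still points at the user's code.
copy_stack_trace(old_out, new_out)
print(len(new_out.tag.trace))
```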

@brandonwillard (Member) commented

2. What is test_local_reduce_join testing for? I can tell that it's only these two lines that are failing the test, but I don't know if it's safe to remove them, or if this is a real failure...

Those look like brittle test conditions. Instead of coming up with a direct check for something specific and relevant in the transformed graph, many tests simply assert the number of nodes in the expected output.

In almost all cases, these kinds of asserts are simply bad, so don't take them too seriously. If you changed an optimization, an Op, and/or one of an Op's helper functions, and any of those things are used in one of these tests, then a change in the number of nodes could be easily justified.
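
(For example, a check like the following targets the property the rewrite is actually supposed to guarantee rather than a node count; the `pow(x, 1)` case mirrors the `local_pow_canonicalize` rewrite mentioned earlier in the thread:)

```python
import aesara
import aesara.tensor as at
from aesara.scalar.basic import Pow
from aesara.tensor.elemwise import Elemwise

x = at.vector("x")
f = aesara.function([x], x ** 1)
topo = f.maker.fgraph.toposort()

# Brittle: `assert len(topo) == 1` breaks whenever an unrelated rewrite adds
# or removes a node. Asserting the specific property is far more robust:
assert not any(
    isinstance(n.op, Elemwise) and isinstance(n.op.scalar_op, Pow) for n in topo
)
```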

@brandonwillard (Member) commented

I'll try running these tests locally within the next few days and see if I can spot any genuine issues (i.e. ones that aren't due to brittle/overly-restrictive tests).

@eigenfoo (Contributor, Author) commented

Thanks for helping, @brandonwillard! Sorry for being so slow-moving on this PR: something has popped up in the real world (it's not a lack of interest in finishing this work).

I've loosened the test conditions on test_local_reduce_join so that it passes now, and I've removed test_stacktrace entirely. I'm unsure why test_stacktrace fails whereas a similar test, test_local_sum_prod_mul_by_scalar_stack_trace, succeeds. I'd be happy to put the test back if I could get a pointer on what's gone wrong.

I'll wait for the test suite to finish, but I'd also appreciate a quick triage of whether there are other real test failures - otherwise I'll assume that everything that fails is a flaky test.

@brandonwillard (Member) commented

Sorry for being so slow-moving on this PR

I only commented so that you knew I hadn't forgotten about this PR; there's absolutely no rush, though!

@eigenfoo (Contributor, Author) commented

@brandonwillard test_basic/test_stack_hessian is failing - this seems like a real failure, but I'm unsure what's going wrong. Could you give some pointers?

https://github.com/pymc-devs/aesara/pull/288/checks?check_run_id=1884844321

@brandonwillard (Member) commented Feb 17, 2021

test_basic/test_stack_hessian is failing - this seems like a real failure

Yes, that does look like a real issue. My first impression was that it had to do with in-place operations.

The new broadcast_to uses NumPy's underlying view-based broadcasting via the BroadcastTo Op, and its view_map is supposed to indicate the view relationship between the Op's first input, a, and its output. Given that the first value is 2, as expected, and all the remaining values are effectively 0, it seemed like an optimization might've been ignoring this view information and "in-placing" things that it shouldn't.

Unfortunately, the evaluated graph doesn't contain any BroadcastTo Ops, so, if the problem is related to BroadcastTo-generated views, it's manifesting in an indirect way.

Also, with aesara.config.compute_test_value turned on, the test values for the Ha and Hb graphs are incorrect, and setting aesara.config.optimizer_verbose = True shows some validation failures during optimization.
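
(For reference, the view relationship is declared via the Op's `view_map`. Below is a toy sketch of that convention, not BroadcastTo's actual definition, assuming the `aesara.graph` module layout of this Aesara version:)

```python
import aesara.tensor as at
from aesara.graph.basic import Apply
from aesara.graph.op import Op
from aesara.tensor.type import TensorType


class ViewFirstRow(Op):
    """Toy Op returning a view of its input's first row (illustration only)."""

    # Output 0 aliases input 0, so destructive (in-place) rewrites must not
    # write into the shared memory. Per the description above, BroadcastTo
    # declares the same relationship between its output and its input `a`.
    view_map = {0: [0]}

    def make_node(self, x):
        x = at.as_tensor_variable(x)
        out = TensorType(x.dtype, (False,))()
        return Apply(self, [x], [out])

    def perform(self, node, inputs, output_storage):
        (x,) = inputs
        output_storage[0][0] = x[0]  # a NumPy view, not a copy
```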

@brandonwillard (Member) commented Feb 17, 2021

OK, the issue does appear to involve bad in-placing. Those FunctionGraph validation failures are specifically caused by aesara.tensor.basic_opt.local_inplace_setsubtensor producing the following exception:

InconsistencyError('Attempting to destroy indestructible variables: [TensorConstant{(1,) of 0.0}]')

Actually, those aren't the exact problem. We probably need to find out how the BroadcastTo Ops are being replaced.

@brandonwillard (Member) commented

All right, I believe that the source of this issue is the change from broadcast_like to broadcast_to in aesara.tensor.basic_opt.local_fill_to_alloc. That's the rewrite that introduces the BroadcastTo Op in the first place.

In this case, a fill really shouldn't be replaced with a broadcasted view of the fill value; however, we might want a rewrite that turns unmodified (in-place) allocs and/or fills into BroadcastTos.

@brandonwillard (Member) commented

Now that I think about it, we could probably use a simple optimization that removes useless BroadcastTos.
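
(A sketch of what such a rewrite could look like; the name, the `(fgraph, node)` local-optimizer signature, and the omitted registration step are assumptions, not the eventual implementation:)

```python
from aesara.graph.opt import local_optimizer
from aesara.tensor.extra_ops import BroadcastTo


@local_optimizer([BroadcastTo])
def local_useless_broadcast_to(fgraph, node):
    # broadcast_to(x, shape) whose output type is no more general than x's
    # own type is a no-op; replace it with x directly.
    # (A real version would also be registered, e.g. in the canonicalize pass.)
    x = node.inputs[0]
    if node.outputs[0].type == x.type:
        return [x]
    return None
```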

@brandonwillard (Member) commented Feb 18, 2021

Looks like there are a few more brittle tests (e.g. TestCrossEntropyCategorical1Hot.test_get_rid_of_advanced_indexing_version_of_xent and TestMinMax.test_optimization_max do the same len(topo) == x nonsense).

@brandonwillard (Member) commented

Also, some of these optimizations might be having trouble because BroadcastTo isn't being lifted.

In the case of TestMinMax.test_optimization_min, it looks like the optimizations being tested aren't applied because they're looking for neg(max(neg(...))), but now the graph is neg(max(broadcast_to(neg(...)))). If we lift the BroadcastTo Ops, it should fix that.
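
(The kind of lift meant here, sketched for unary Elemwise Ops only; the `op(a, *shape)` call convention for BroadcastTo and the `(fgraph, node)` signature are assumptions about the current implementation:)

```python
from aesara.graph.opt import local_optimizer
from aesara.tensor.elemwise import Elemwise
from aesara.tensor.extra_ops import BroadcastTo


@local_optimizer([Elemwise])
def local_elemwise_broadcast_to_lift(fgraph, node):
    # Rewrite elemwise(broadcast_to(x, shape)) -> broadcast_to(elemwise(x), shape)
    # so that patterns such as neg(max(neg(...))) become visible to the
    # existing rewrites again.
    if len(node.inputs) != 1:
        return None
    (bcast,) = node.inputs
    if not (bcast.owner and isinstance(bcast.owner.op, BroadcastTo)):
        return None
    x, *shape = bcast.owner.inputs
    return [bcast.owner.op(node.op(x), *shape)]
```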

@brandonwillard changed the title from "Use broadcast_to intead of broadcast_like" to "Use broadcast_to instead of broadcast_like" on Aug 1, 2021
@ricardoV94 force-pushed the broadcast_to branch 2 times, most recently from 3307a5d to 4211281, on May 11, 2022
@ricardoV94 (Contributor) commented

Just rebased this PR

Commit messages from the rebased branch:

- A lot of these tests test for strict equality of the toposorted graph. We should only test that the toposorted graph _contains_ the expected nodes/Ops.
- This change allows `get_scalar_constant_value` to "dig through" `BroadcastTo` `Op`s.
- This change removes an assertion on `len(topo)` and loosens the strict node type requirement of the last node in `topo`.
@rlouf closed this on Oct 14, 2022
@rlouf deleted the broadcast_to branch on Oct 14, 2022