Fuzz fix for MemoryPacking on trampled data #3222

kripken · 2020-10-10T21:43:16Z

I believe wasm originally did not allow overlapping segments, that is, cases
where one memory segment tramples the data written by a previous one. The
spec later changed to allow it. Binaryen seems to have assumed
the original rule, and never checked for trampling.

If there is a chance of trampling, we cannot optimize out zeros - the zero
may have an effect if it tramples data from a previous segment. This does
not occur in practice in LLVM output, which is why this wasn't a problem
so far, I think.

An existing testcase hit this issue, so I split it up.
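
To make the trampling hazard concrete, here is a minimal sketch that models active segments being applied in order to zero-initialized memory. It is not Binaryen code: `Segment`, `applySegments`, and the `skipZeros` flag are hypothetical names made up for illustration, and `skipZeros` stands in for an optimization that drops zero bytes on the assumption that memory is already zero there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an active data segment: bytes written at an offset.
struct Segment {
  uint32_t offset;
  std::vector<uint8_t> data;
};

// Applies the segments in order to a zero-initialized memory of the given
// size. With skipZeros set, zero bytes are not written, which models a
// naive "optimize out zeros" transformation.
std::vector<uint8_t> applySegments(const std::vector<Segment>& segments,
                                   std::size_t memorySize,
                                   bool skipZeros) {
  std::vector<uint8_t> memory(memorySize, 0);
  for (const auto& seg : segments) {
    assert(seg.offset + seg.data.size() <= memorySize);
    for (std::size_t i = 0; i < seg.data.size(); i++) {
      if (skipZeros && seg.data[i] == 0) {
        continue; // the zero is dropped instead of being written
      }
      memory[seg.offset + i] = seg.data[i];
    }
  }
  return memory;
}
```

If a first segment writes `1, 2, 3` at offset 0 and a second segment writes a single `0` at offset 1, the faithful result is `1, 0, 3`, while the zero-skipping version leaves `1, 2, 3`. When the segments do not overlap, both versions agree.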

TerrorJack · 2020-10-12T08:28:38Z

I just verified this fixes #3190 as well.

kripken · 2020-10-12T14:04:52Z

Great, thanks for checking @TerrorJack !

tlively

I guess the part of this that optimizes passive segments makes a similar assumption that they won't overlap with each other at runtime. We've also talked before about how this whole pass assumes that imported memories are not already scribbled on. What if we turn this pass off by default, document the assumptions it makes, then either rename it unsafe-memory-packing or add a new flag enabling unsafe memory optimizations (which we would enable in Emscripten)?

Actually, now that I think about it more, I'm surprised this hasn't caused problems for anyone's pthread builds, since this optimization will remove the initialization for any zero-initialized TLS variables, which are placed into memory allocated by malloc that might contain junk data. Yikes! One "fix" we could make would be to only optimize passive segments that are dropped, since those can reasonably be assumed to take part only in one-time initialization when we assume the memory is zeroed. In contrast, the TLS passive segment is not dropped because it needs to be used arbitrarily many times.

tlively · 2020-10-12T14:29:33Z

src/passes/MemoryPacking.cpp

+  // able to optimize, but must still check for the trampling problem mentioned
+  // earlier.
+  // TODO: optimize in the trampling case
+  std::unordered_set<Address> writtenTo;


Using a hash set of individual bytes sounds like a lot of work and memory for modules with significant amounts of data. Would it be worth storing ranges and doing a binary search on them to make this O(n log n) in the number of segments rather than O(n) in the number of bytes?

That might be more efficient, yeah. More complex though. I can do some testing to see how big the overhead is first.

I'm not sure how to do a binary search on them, as they are spans? Instead I wrote a binary space partitioning approach that should handle this in logarithmic time in the number of segments, so should be no risk.

But maybe there's a better way?

cc @aardappel who has a lot of experience with BSPs (with a few more dimensions to them...)

Since we bail out whenever we find an overlap, we have an invariant that our set of previously checked segments has no overlaps. For each new segment, insert it into a list of segments sorted by start address and verify that it doesn't overlap with its predecessor or successor. If it doesn't, we know it can't possibly overlap with any other segments, either, because doing so would require those segments to overlap with either the predecessor or successor.
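
The sorted-list idea above can be sketched roughly as follows. This is an illustrative sketch, not the code that landed in the PR: `Span`, `ByLeft`, `DisjointSpanSet`, and `addAndCheckOverlap` are names assumed here, spans are taken to be half-open `[left, right)`, and C++17 is assumed for the structured binding. One choice made only in this sketch is that an overlapping span is removed again so the set stays disjoint.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>

// Hypothetical half-open address range [left, right).
struct Span {
  uint64_t left;
  uint64_t right;
  bool overlaps(const Span& other) const {
    return left < other.right && other.left < right;
  }
};

// Orders spans by start address only.
struct ByLeft {
  bool operator()(const Span& a, const Span& b) const {
    return a.left < b.left;
  }
};

// Keeps a set of pairwise disjoint spans sorted by start address.
class DisjointSpanSet {
  std::set<Span, ByLeft> spans;

public:
  // Tries to add a span. Returns true if it overlaps an existing span, in
  // which case the set is left unchanged. Because the stored spans never
  // overlap each other, checking the two neighbors is sufficient.
  bool addAndCheckOverlap(Span span) {
    assert(span.left < span.right);
    auto [iter, inserted] = spans.insert(span);
    if (!inserted) {
      // Another non-empty span starts at the same address.
      return true;
    }
    if (iter != spans.begin() && std::prev(iter)->overlaps(span)) {
      spans.erase(iter);
      return true;
    }
    auto next = std::next(iter);
    if (next != spans.end() && next->overlaps(span)) {
      spans.erase(iter);
      return true;
    }
    return false;
  }
};
```

Each insertion costs O(log n) in the number of spans, so checking all segments is O(n log n) in the number of segments and independent of how many bytes they contain.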

This would be a great interview question....

Interesting... Yes, that would work nicely.

Thinking about the TODO for actually optimizing the overlapping case (which I think we should do, but I don't want to do it in this PR), I think your idea can work there too - new segments would erase or split old ones. That does make it more complicated, but probably still less complicated than the BSP.
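
For the "new segments erase or split old ones" idea, one possible shape is sketched below. It is purely a sketch of the TODO and not something this PR implements: `LastWriterMap`, `write`, and `writerAt` are hypothetical names, ranges are half-open `[start, end)`, and each piece records the index of the segment that last wrote it.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical structure tracking which segment last wrote each address.
class LastWriterMap {
  struct Piece {
    uint32_t end; // exclusive end of the piece
    int writer;   // index of the segment that wrote it
  };
  // Disjoint pieces keyed by their start address.
  std::map<uint32_t, Piece> pieces;

public:
  // Records that `writer` wrote [start, end), trimming, splitting, or
  // erasing any older pieces that it tramples.
  void write(uint32_t start, uint32_t end, int writer) {
    assert(start < end);
    auto it = pieces.lower_bound(start);
    if (it != pieces.begin()) {
      auto prev = std::prev(it);
      if (prev->second.end > start) {
        // An older piece begins before `start` and reaches into the new
        // range: keep its left part, and its right part if it sticks out.
        Piece old = prev->second;
        prev->second.end = start;
        if (old.end > end) {
          pieces[end] = Piece{old.end, old.writer};
        }
      }
    }
    // Remove older pieces that begin inside the new range, keeping any
    // tail that extends past `end`.
    it = pieces.lower_bound(start);
    while (it != pieces.end() && it->first < end) {
      Piece old = it->second;
      it = pieces.erase(it);
      if (old.end > end) {
        pieces[end] = Piece{old.end, old.writer};
        break;
      }
    }
    pieces[start] = Piece{end, writer};
  }

  // Returns the last writer of `addr`, or -1 if nothing wrote it.
  int writerAt(uint32_t addr) const {
    auto it = pieces.upper_bound(addr);
    if (it == pieces.begin()) {
      return -1;
    }
    --it;
    return addr < it->second.end ? it->second.writer : -1;
  }
};
```

With such a map, a zero byte could be dropped only where the trampled region is known to hold zero already, which is the additional analysis that the TODO leaves open.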

Pushed an update with your approach @tlively

Yes, the 3D BSPs also work by splitting data to keep the tree non-overlapping, so I bet that makes the algorithm a ton simpler/faster here too :P

kripken · 2020-10-12T15:38:02Z

It does sound like we need a flag for unsafe memory optimizations here, good point. That seems necessary for the pthreads case.

Another option is to just not optimize when the memory is imported, which would handle the scribbling issue, as usually the memory won't be imported for efficiency anyhow (but that won't help the pthreads case).

tlively

Nice! This looks good to me, but it would be good to add TODOs about the follow-up work that we plan to do and need to do.

tlively · 2020-10-15T00:25:53Z

src/support/space.h

+    if (iter != spans.begin()) {
+      auto before = iter;
+      before--;
+      if (before != spans.end() && before->checkOverlap(span)) {
+        return true;


Could this be simplified to something like this? In particular I don't think you need to check before != spans.end().

Suggested change

-    if (iter != spans.begin()) {
-      auto before = iter;
-      before--;
-      if (before != spans.end() && before->checkOverlap(span)) {
-        return true;
+    if (iter != spans.begin() && std::prev(iter)->checkOverlap(span)) {
+      return true;

Good point, that is simpler, updated.

kripken added 2 commits October 10, 2020 14:38

Fuzz for for MemoryPacking on trampled data

0cfa746

fix

f20d5fe

kripken requested a review from tlively October 10, 2020 21:43

tlively reviewed Oct 12, 2020

tlively mentioned this pull request Oct 12, 2020

Fuzz bug in MemoryPacking #3225

Closed

kripken added 12 commits October 13, 2020 15:57

Merge remote-tracking branch 'origin/master' into fuzz5

54618b4

work

0d7eab9

more

398d27c

more

369be76

more

d26d1d1

fix

e6d2c42

fix

39b1a10

fix

8db2752

more

e9ba423

fix

14b9dcc

Merge remote-tracking branch 'origin/master' into fuzz5

c7e83cd

rewrite

ab66fc1

tlively approved these changes Oct 15, 2020

simpler

3ee3e50

tlively approved these changes Oct 15, 2020

Merge remote-tracking branch 'origin/master' into fuzz5

60d1c81

kripken merged commit c2e6cb0 into master Oct 15, 2020

kripken deleted the fuzz5 branch October 15, 2020 17:03

kripken mentioned this pull request Oct 15, 2020

Optimize trampled data in MemoryPacking #3244

Open

aheejin mentioned this pull request Oct 23, 2020

Fuzzer: Add an option to fuzz with initial wasm contents #3276

Merged
