@kripken
Member

@kripken kripken commented Oct 10, 2020

I believe originally wasm did not allow overlapping segments, that is, where
one memory segment tramples the data from a previous one. But then the
spec changed its mind and we allowed it. Binaryen seems to have assumed
the original case, and not checked for trampling.

If there is a chance of trampling, we cannot optimize out zeros - the zero
may have an effect if it tramples data from a previous segment. This does
not occur in practice in LLVM output, which is why this wasn't a problem
so far, I think.

An existing testcase hit this issue, so I split it up.
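
To illustrate the trampling problem described above, here is a minimal sketch. It is not Binaryen code: `Segment` and `applySegments` are hypothetical names made up for this example. It models active segments being copied into zero-initialized memory in order, with an optional naive "skip the zeros" mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an active data segment: `data` is copied to
// memory starting at `offset`.
struct Segment {
  std::size_t offset;
  std::vector<std::uint8_t> data;
};

// Applies the segments in order to zero-initialized memory. When `skipZeros`
// is true, zero bytes are never written, which models optimizing them out.
std::vector<std::uint8_t> applySegments(const std::vector<Segment>& segments,
                                        std::size_t memorySize,
                                        bool skipZeros) {
  std::vector<std::uint8_t> memory(memorySize, 0);
  for (const auto& segment : segments) {
    for (std::size_t i = 0; i < segment.data.size(); i++) {
      if (skipZeros && segment.data[i] == 0) {
        continue;
      }
      // at() throws if a segment would write out of bounds.
      memory.at(segment.offset + i) = segment.data[i];
    }
  }
  return memory;
}
```

With segments `{0, {1, 2, 3, 4}}` followed by `{2, {0, 9}}`, the faithful result has a 0 at address 2, while the zero-skipping result keeps the 3 written by the first segment. When the segments do not overlap, both modes produce the same memory.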

@kripken kripken requested a review from tlively October 10, 2020 21:43
@TerrorJack
Contributor

I just verified this fixes #3190 as well.

@kripken
Member Author

kripken commented Oct 12, 2020

Great, thanks for checking @TerrorJack !

Member

@tlively tlively left a comment

I guess the part of this that optimizes passive segments makes a similar assumption that they won't overlap with each other at runtime. We've also talked before about how this whole pass assumes that imported memories are not already scribbled on. What if we turn this pass off by default, document the assumptions it makes, then either rename it unsafe-memory-packing or add a new flag enabling unsafe memory optimizations (which we would enable in Emscripten)?

Actually, now that I think about it more, I'm surprised this hasn't caused problems for anyone's pthread builds, since this optimization will remove the initialization for any zero-initialized TLS variables, which are placed into memory allocated by malloc that might contain junk data. Yikes! One "fix" we could make would be to only optimize passive segments that are dropped, since those can reasonably be assumed to take part only in one-time initialization when we assume the memory is zeroed. In contrast, the TLS passive segment is not dropped because it needs to be used arbitrarily many times.

// able to optimize, but must still check for the trampling problem mentioned
// earlier.
// TODO: optimize in the trampling case
std::unordered_set<Address> writtenTo;
Member

Using a hash set of individual bytes sounds like a lot of work and memory for modules with significant amounts of data. Would it be worth storing ranges and doing a binary search on them to make this O(n log n) in the number of segments rather than O(n) in the number of bytes?

Member Author

That might be more efficient, yeah. More complex though. I can do some testing to see how big the overhead is first.

Member Author

I'm not sure how to do a binary search on them, as they are spans? Instead I wrote a binary space partitioning approach that should handle this in logarithmic time in the number of segments, so should be no risk.

But maybe there's a better way?

cc @aardappel who has a lot of experience with BSPs (with a few more dimensions to them...)

Member

@tlively tlively Oct 14, 2020

Since we bail out whenever we find an overlap, we have an invariant that our set of previously checked segments has no overlaps. For each new segment, insert it into a list of segments sorted by start address and verify that it doesn't overlap with its predecessor or successor. If it doesn't, we know it can't possibly overlap with any other segments, either, because doing so would require those segments to overlap with either the predecessor or successor.

This would be a great interview question....
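
The neighbor-check idea above can be sketched roughly as follows. This is an illustration under assumptions, not the code that landed in the PR: the names `Span`, `DisjointSpans`, and `addAndCheckOverlap` are hypothetical, spans are taken to be half-open `[start, end)` ranges, and a `std::set` ordered by start address stands in for the sorted list.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>

// Hypothetical half-open byte range [start, end) covered by one segment.
struct Span {
  std::uint64_t start;
  std::uint64_t end;

  // Order spans by start address only.
  bool operator<(const Span& other) const { return start < other.start; }

  // Two half-open ranges overlap if each begins before the other ends.
  bool checkOverlap(const Span& other) const {
    return start < other.end && other.start < end;
  }
};

// Holds spans that are pairwise disjoint, sorted by start address.
struct DisjointSpans {
  std::set<Span> spans;

  // Returns true if `span` overlaps a stored span. Otherwise stores it and
  // returns false, so the stored spans always stay disjoint.
  bool addAndCheckOverlap(Span span) {
    if (span.start == span.end) {
      return false; // an empty span touches no bytes
    }
    // First stored span starting at or after span.start: the successor.
    auto iter = spans.lower_bound(span);
    if (iter != spans.end() && iter->checkOverlap(span)) {
      return true;
    }
    // The stored span just before it, if any: the predecessor.
    if (iter != spans.begin() && std::prev(iter)->checkOverlap(span)) {
      return true;
    }
    spans.insert(iter, span);
    return false;
  }
};
```

Because the stored spans never overlap each other, checking only the two neighbors is enough, and each insertion costs O(log n) in the number of segments.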

Member Author

Interesting... Yes, that would work nicely.

Thinking about the TODO for actually optimizing the overlapping case (which I think we should do, but I don't want to do it in this PR), I think your idea can work there too - new segments would erase or split old ones. That does make it more complicated, but probably still less complicated than the BSP.
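
As a rough sketch of what "erase or split old ones" could look like (purely illustrative, not part of this PR): the hypothetical `Layout` below maps each start address to a `Piece` recording where that piece ends and which segment wrote it, and the made-up helper `overwrite` lets a newer segment take over its range from older ones.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical record: a piece ends at `end` (exclusive) and was last
// written by the segment with index `segment`.
struct Piece {
  std::uint64_t end;
  int segment;
};

// Start address -> piece. Pieces are kept pairwise disjoint.
using Layout = std::map<std::uint64_t, Piece>;

// Records that `segment` writes [start, end), trimming, splitting, or
// erasing whatever older pieces it tramples.
void overwrite(Layout& layout, std::uint64_t start, std::uint64_t end,
               int segment) {
  if (start >= end) {
    return; // nothing is written
  }
  auto iter = layout.lower_bound(start);
  if (iter != layout.begin()) {
    auto before = std::prev(iter);
    if (before->second.end > start) {
      // An older piece straddles `start`: keep its left part, and keep its
      // right part too if it reaches past `end` (a split).
      Piece old = before->second;
      before->second.end = start;
      if (old.end > end) {
        layout[end] = {old.end, old.segment};
      }
    }
  }
  // Older pieces that begin inside [start, end) are erased; a tail that
  // sticks out past `end` survives as a shorter piece.
  while (iter != layout.end() && iter->first < end) {
    Piece old = iter->second;
    iter = layout.erase(iter);
    if (old.end > end) {
      layout[end] = {old.end, old.segment};
    }
  }
  layout[start] = {end, segment};
}
```

For example, writing segment 0 to `[0, 10)` and then segment 1 to `[4, 6)` leaves three pieces: `[0, 4)` and `[6, 10)` owned by segment 0, and `[4, 6)` owned by segment 1.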

Member Author

Pushed an update with your approach @tlively

Contributor

Yes, the 3D BSPs also work by splitting data to keep the tree non-overlapping, so I bet that makes the algorithm a ton simpler/faster here too :P

@kripken
Member Author

kripken commented Oct 12, 2020

It does sound like we need a flag for unsafe memory optimizations here, good point. That seems necessary for the pthreads case.

Another option is to just not optimize when the memory is imported, which would handle the scribbling issue, as usually the memory won't be imported for efficiency anyhow (but that won't help the pthreads case).

Member

@tlively tlively left a comment

Nice! This looks good to me, but it would be good to add TODOs about the follow-up work we plan and need to do.

Comment on lines 57 to 61
if (iter != spans.begin()) {
auto before = iter;
before--;
if (before != spans.end() && before->checkOverlap(span)) {
return true;
Member

Could this be simplified to something like this? In particular I don't think you need to check before != spans.end().

Suggested change
- if (iter != spans.begin()) {
- auto before = iter;
- before--;
- if (before != spans.end() && before->checkOverlap(span)) {
- return true;
+ if (iter != spans.begin() && std::prev(iter)->checkOverlap(span)) {
+ return true;

Member Author

Good point, that is simpler, updated.
