-
Notifications
You must be signed in to change notification settings - Fork 828
Fuzz fix for MemoryPacking on trampled data #3222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I just verified this fixes #3190 as well. |
|
Great, thanks for checking @TerrorJack ! |
tlively
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the part of this that optimizes passive segments makes a similar assumption that they won't overlap with each other at runtime. We've also talked before about how this whole pass assumes that imported memories are not already scribbled on. What if we turn this pass off by default, document the assumptions it makes, then either rename it unsafe-memory-packing or add a new flag enabling unsafe memory optimizations (which we would enable in Emscripten).
Actually, now that I think about it more, I'm surprised this hasn't caused problems for anyone's pthread builds, since this optimization will remove the initialization for any zero-initialized TLS variables, which are placed into memory allocated by malloc that might contain junk data. Yikes! One "fix" we could make would be to only optimize passive segments that are dropped, since those can reasonably be assumed to take part only in one-time initialization when we assume the memory is zeroed. In contrast, the TLS passive segment is not dropped because it needs to be used arbitrarily many times.
src/passes/MemoryPacking.cpp
Outdated
| // able to optimize, but must still check for the trampling problem mentioned | ||
| // earlier. | ||
| // TODO: optimize in the trampling case | ||
| std::unordered_set<Address> writtenTo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a hash set of individual bytes sounds like a lot of work and memory for modules with significant amounts of data. Would it be worth storing ranges and doing a binary search on them to make this O(n log n) in the number of segments rather than O(n) in the number of bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be more efficient, yeah. More complex though. I can do some testing to see how big the overhead is first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to do a binary search on them, as they are spans? Instead I wrote a binary space partitioning approach that should handle this in logarithmic time in the number of segments, so should be no risk.
But maybe there's a better way?
cc @aardappel who has a lot of experience with BSPs (with a few more dimensions to them...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we bail out whenever we find an overlap, we have an invariant that our set of previously checked segments has no overlaps. For each new segment, insert it into a list of segments sorted by start address and verify that it doesn't overlap with its predecessor or successor. If it doesn't, we know it can't possibly overlap with any other segments, either, because doing so would require those segments to overlap with either the predecessor or successor.
This would be a great interview question....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting... Yes, that would work nicely.
Thinking about the TODO for actually optimizing the overlapping case (which I think we should do, but I don't want to do it in this PR), I think your idea can work there too - new segments would erase or split old ones. That does make it more complicated, but probably still less complicated than the BSP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed an update with your approach @tlively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the 3D BSPs also work by splitting data to keep the tree non-overlapping, so I bet that makes the algorithm a ton simpler/faster here too :P
|
It does sound like we need a flag for unsafe memory optimizations here, good point. That seems necessary for the pthreads case. Another option is to just not optimize when the memory is imported, which would handle the scribbling issue, as usually the memory won't be imported for efficiency anyhow (but that won't help the pthreads case). |
tlively
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This looks good to me, but it would be good to add TODOs about the follow up work we plan to and need to do.
src/support/space.h
Outdated
| if (iter != spans.begin()) { | ||
| auto before = iter; | ||
| before--; | ||
| if (before != spans.end() && before->checkOverlap(span)) { | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified to somthing like this? In particular I don't think you need to check before != spans.end().
| if (iter != spans.begin()) { | |
| auto before = iter; | |
| before--; | |
| if (before != spans.end() && before->checkOverlap(span)) { | |
| return true; | |
| if (iter != spans.begin() && std::prev(iter)->checkOverlao(span)) { | |
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that is simpler, updated.
I believe originally wasm did not allow overlapping segments, that is, where
one memory segment tramples the data from a previous one. But then the
spec changed its mind and we allowed it. Binaryen seems to have assumed
the original case, and not checked for trampling.
If there is a chance of trampling, we cannot optimize out zeros - the zero
may have an effect if it tramples data from a previous segment. This does
not occur in practice in LLVM output, which is why this wasn't a problem
so far, I think.
An existing testcase hit this issue, so I split it up.