Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/7.0] [Mono] Race in init_method when using LLVM AOT. #93006

Merged
merged 4 commits into from
Oct 6, 2023

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Oct 4, 2023

Backport of #75088 to release/7.0

Fixes #81211

/cc @lambdageek @lateralusX

Customer Impact

Customers targeting Apple platforms using LLVM AOT codegen (the default) in highly concurrent settings (such as firing off multiple simultaneous async HTTP requests) may experience unexpected behavior such as InvalidCastExceptions, NullReferenceExceptions or crashes.

Testing

Manual testing

Risk

Low. This code has been running on .NET 8 main for over a year in CI, as well as on some other non-mobile platforms

vseanreesermsft and others added 3 commits October 3, 2023 18:35
When using LLVM AOT codegen, init_method updates two GOT slots.
These slots are initialized as part of init_method,
but there is a race between initialization of the two slots. Current
implementation can have two threads running init_method for the same
method, but as soon as:

[got_slots [pindex]] = addr

store is visible, it will trigger other threads to return back from
init_method, but since that could happen before the corresponding
LLVM AOT const slot is set, second thread will return to method
calling init_method, load the LLVM aot const, and crash when
trying to use it (since its still NULL).

This crash is very rare but have been identified on x86/x64 CPU's,
when one thread is either preempted between updating regular GOT slot
and LLVM GOT slot or store into LLVM GOT slot gets delayed in
store buffer. I have also been able to emulate the scenario in debugger,
triggering the issue and crashing in the method loading from LLVM aot
const slot.

Fix change order of updates and make sure the update of LLVM aot const
slot happens before memory_barrier, since:

got [got_slots [pindex]] = addr;

have release semantics in relation to addr and update of LLVM aot const
slot. Fix also add acquire/release semantics for ji->type in init_method
since it is used to guard if a thread ignores a patch or not and it
should not be re-ordered with previous stores, since it can cause
similar race conditions with updated slots.
@lambdageek lambdageek added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Oct 4, 2023
@lambdageek
Copy link
Member

Approved by tactics in email

@lambdageek lambdageek changed the base branch from release/7.0 to release/7.0-staging October 5, 2023 13:56
@lambdageek
Copy link
Member

Ah crud, I forgot that the backports should be aiming at the release/7.0-staging branch. Updated the PR base.

Copy link
Member

@lateralusX lateralusX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@lambdageek
Copy link
Member

Failures don't seem related

@lambdageek lambdageek merged commit 7ee35b9 into release/7.0-staging Oct 6, 2023
124 of 129 checks passed
@lewing lewing deleted the backport/pr-75088-to-release/7.0 branch October 8, 2023 01:26
@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Codegen-AOT-mono Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants