Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

races in ARM lockless data structure reads #2502

Open
derekbruening opened this issue Jul 5, 2017 · 4 comments
Open

races in ARM lockless data structure reads #2502

derekbruening opened this issue Jul 5, 2017 · 4 comments

Comments

@derekbruening
Copy link
Contributor

In DR we have some data structures we read without holding a lock, relying on the hardware write visibility. The design and code was created with x86 in mind and we did not do a thorough enough re-evaluation for ARM and AArch64. For ARM's memory model we need to add barriers in multiple places to ensure that writes are visible in other threads in the order we require.

Suspect data structures include:

  • The indirect branch lookup (IBL) table: we rely on writing the tag being seen by other threads before the write of the start pc.
  • flushtime_global's store in increment_global_flushtime()
  • likely more...the code needs an audit
derekbruening added a commit that referenced this issue Jul 13, 2018
Fixes two races with shared ibt tables:

+ Adding a new table entry must write the start_pc before the tag.
  This is accomplished with a new ENTRY_SET_TO_ENTRY hashtablex.h
  optional specifier.  For ARM #2502 a new MEMORY_STORE_BARRIER macro
  is added.

+ Resizing a table must not clear the tags in the old table to avoid
  losing the tag on the target_delete ibl path.

Adds a test api.ibl-stress which uses the DR IR to synthetically
construct thousands of basic blocks with indirect branches betweent
them.

To make the test work, relaxes several is-on-stack checks to support
pre-building basic blocks (#2463) from generated code or other
locations not known prior to starting the application.

Issue: #3098, #2502, #2463

Fixes #3098
derekbruening added a commit that referenced this issue Jul 13, 2018
Fixes two races with shared ibt tables:

+ Adding a new table entry must write the start_pc before the tag.
  This is accomplished with a new ENTRY_SET_TO_ENTRY hashtablex.h
  optional specifier.  For ARM #2502 a new MEMORY_STORE_BARRIER macro
  is added.

+ Resizing a table must not clear the tags in the old table to avoid
  losing the tag on the target_delete ibl path.

Adds a test api.ibl-stress which uses the DR IR to synthetically
construct thousands of basic blocks with indirect branches betweent
them.

To make the test work, relaxes several is-on-stack checks to support
pre-building basic blocks (#2463) from generated code or other
locations not known prior to starting the application.

Issue: #3098, #2502, #2463

Fixes #3098
derekbruening added a commit that referenced this issue Mar 23, 2020
Adds 32-bit and 64-bit load and store atomics to core DR and exports
them to the client API.

Adds missing barriers to the existing store atomic macros on both ARM
and AArch64.

Adds a sanity test that at least exercises the functions.  Adds the
test sequence to client.thread and client.tls (only client.tls is
currently enabled on AArchXX).

Uses dr_atomic_load* in drwrap in place of a previously overkill
dr_atomic_add*.

These will also help with #2502.

The next step is to add gencode versions of these, either to the core
or drx.

Issue: #4215, #2502
derekbruening added a commit that referenced this issue Mar 23, 2020
Adds 32-bit and 64-bit load and store atomics to core DR and exports
them to the client API.

Adds missing barriers to the existing store atomic macros on both ARM
and AArch64.

Adds a sanity test that at least exercises the functions.  Adds the
test sequence to client.thread and client.tls (only client.tls is
currently enabled on AArchXX).

Uses dr_atomic_load* in drwrap in place of a previously overkill
dr_atomic_add*.

These will also help with #2502.

The next step is to add gencode versions of these, either to the core
or drx.

Issue: #4215, #2502
derekbruening added a commit that referenced this issue Apr 21, 2020
For update_lookuptable_tls: Adds a store-release to the lookuptable
store and a load-acquire to the IBL generated code for both ARM and
AArch64, as there is no lighter-weight way to ensure the mask and
table stores are seen in the proper order.

For flushtime_global: made all readers and the increment use
store-release semantics.

For safely_nullify_tables: leaving as weak stores since timing and
ordering does not matter there.

Issue: #2502
derekbruening added a commit that referenced this issue Apr 21, 2020
For update_lookuptable_tls: Adds a store-release to the lookuptable
store and a load-acquire to the IBL generated code for both ARM and
AArch64, as there is no lighter-weight way to ensure the mask and
table stores are seen in the proper order.

For flushtime_global: made all readers and the increment use
store-release semantics.

For safely_nullify_tables: leaving as weak stores since timing and
ordering does not matter there.

Issue: #2502
@derekbruening
Copy link
Contributor Author

I'm adding a related action item here:

  • Review the "DMB ISH" barriers in arch_exports.h and downgrade them to stores-only or loads-only as necessary. I based these on g++ 9.2.1's std::atomic generated code and it uses "DMB ISH" for everything which apparently is overkill.

@algrant-arm
Copy link
Contributor

Probably best to leave anything that's generating explicit barriers as DMB ISH. On Armv8 (including all AArch64) DMB ISHLD should be sufficient as an acquire barrier, but this might not be true on pre-v8 systems. And on v8 it's best to use LDAR anyway.

@derekbruening
Copy link
Contributor Author

There are more variables that may need stronger stores and loads:

  • exiting_thread_count

@derekbruening
Copy link
Contributor Author

I caught and diagnosed a plain-DR hang here: #4928 (comment)
Pasting from there:

Looking through our lock routines I see a number of other
reads that should be load-acquires. Some are tricky to fix b/c
they're in utils.h which can't include arch_exports.h: we'll have
to split the headers or sthg.

derekbruening added a commit that referenced this issue Feb 17, 2022
Adds a missing load-acquire in a loop in ATOMIC_MAX that causes hangs
in RSTATS_ADD_PEAK.

Tested on the client.annotation-concurrency test on AArch64 where
without this fix it hangs under release-build plain DR every ~100
runs.  With the fix, there is no hang in 20K runs.

Issue: #2502, #4928
derekbruening added a commit that referenced this issue Feb 17, 2022
Adds a missing load-acquire in a loop in ATOMIC_MAX that causes hangs
in RSTATS_ADD_PEAK.

Tested on the client.annotation-concurrency test on AArch64 where
without this fix it hangs under release-build plain DR every ~100
runs.  With the fix, there is no hang in 20K runs for either plain
DR or drcachesim -offline.

Issue: #2502, #4928, #5366 
Fixes: #5366
derekbruening added a commit that referenced this issue Feb 17, 2022
Adds load-acquire semantics to a number of bare loads (i.e., not
inside a mutex) in DR's various lock routines.

This includes a key missing load-acquire in the spinlock loop in
d_r_mutex_lock_app() which might help explain some performance issues
such as #4279.

Since some of these are in the utils.h header, and we have ordering
dependencies between utils.h and arch_exports.h, this required
splitting the atomic defines out of arch/arch_exports.h into a new
file arch/atomic_exports.h.  Since those atomic defines use ASSERT
which references dynamo_options, this further required splitting the
option struct out of options.h into options_struct.h.

Issue: #2502, #4928, #4279
derekbruening added a commit that referenced this issue Feb 18, 2022
Adds load-acquire semantics to a number of bare loads (i.e., not
inside a mutex) in DR's various lock routines.

This includes a key missing load-acquire in the spinlock loop in
d_r_mutex_lock_app() which might help explain some performance issues
such as #4279.

Since some of these are in the utils.h header, and we have ordering
dependencies between utils.h and arch_exports.h, this required
splitting the atomic defines out of arch/arch_exports.h into a new
file arch/atomic_exports.h.  Since those atomic defines use ASSERT
which references dynamo_options, this further required splitting the
option struct out of options.h into options_struct.h.

Issue: #2502, #4928, #4279
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants