Make API globals thread safe using atomics #222

Merged
merged 1 commit into from Nov 25, 2021
adamreichold
Member

While the GIL is held when the API pointer is updated, the update can still race with other threads that check the current value of the API pointer without holding the GIL, so the pointer should be accessed using atomics.

The loads and stores are performed using acquire-release semantics as we want to dereference the pointer and hence any stores to the referenced memory need to be visible to us.
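To make the acquire-release pairing concrete, here is a minimal sketch rather than the code from this PR. `ApiSlot`, `publish` and `current` are hypothetical names, and the table is modelled as a plain array of `*const c_void` entries.

```rust
use std::ffi::c_void;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical stand-in for the API global (not the type used in the crate).
pub struct ApiSlot {
    table: AtomicPtr<*const c_void>,
}

impl ApiSlot {
    pub const fn new() -> Self {
        Self { table: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Publish the table pointer. The release store orders all earlier writes
    /// to the table before the pointer becomes visible to other threads.
    pub fn publish(&self, table: *const *const c_void) {
        self.table.store(table as *mut *const c_void, Ordering::Release);
    }

    /// Read the pointer. The acquire load pairs with the release store, so a
    /// non-null result can be dereferenced and the table contents are visible.
    pub fn current(&self) -> *const *const c_void {
        self.table.load(Ordering::Acquire) as *const *const c_void
    }
}
```

With `Ordering::Relaxed` on both sides only the pointer value itself would be synchronized, not the memory it points to.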

The get function should also be unsafe as the offset it uses cannot be verified which might create an invalid pointer invoking undefined behaviour as per the contract of pointer::offset.
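As an illustration of why the function has to be `unsafe`, here is a small sketch under assumptions. `entry` is a hypothetical helper, not the `get` function from the crate. The bounds requirement of `pointer::offset` cannot be checked inside the function, so it is passed on to the caller as a safety contract.

```rust
use std::ffi::c_void;

/// Hypothetical helper that returns the address of entry `offset` in a table.
///
/// # Safety
///
/// `table` must point into an allocation that contains at least `offset + 1`
/// entries. Otherwise `pointer::offset` is undefined behaviour, even if the
/// resulting pointer is never dereferenced.
pub unsafe fn entry(table: *const *const c_void, offset: isize) -> *const *const c_void {
    // SAFETY: the caller guarantees that `offset` stays inside the table.
    unsafe { table.offset(offset) }
}
```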

Finally, the initialization code is moved into a separate cold function to improve code locality for the fast path.
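The fast-path/slow-path split could look roughly like the sketch below. `LazyTable` and the static `TABLE` are made up for illustration. In the real code the slow path takes the GIL and imports the capsule, which is replaced here by a static array so that the example only needs `std`.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical table standing in for the imported API capsule.
static TABLE: [u32; 3] = [1, 2, 3];

pub struct LazyTable {
    ptr: AtomicPtr<u32>,
}

impl LazyTable {
    pub const fn new() -> Self {
        Self { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Fast path: a single acquire load and a null check.
    #[inline]
    pub fn get(&self) -> *const u32 {
        let current = self.ptr.load(Ordering::Acquire);
        if current.is_null() {
            self.init()
        } else {
            current
        }
    }

    /// Slow path, kept out of line so it does not bloat the callers of `get`.
    #[cold]
    #[inline(never)]
    fn init(&self) -> *const u32 {
        // Every caller stores the same address, so redundant runs are harmless.
        let table = TABLE.as_ptr() as *mut u32;
        self.ptr.store(table, Ordering::Release);
        table
    }
}
```

`#[cold]` is only a hint to the optimizer, so the effect on code layout is not guaranteed.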

I suspect that even on strongly ordered architectures like x86-64 this might have some performance impact via inhibiting compiler optimizations but I also do not see how the current Cell-based implementation can actually be Sync?

Member

davidhewitt left a comment

👍 good spot - one further quick thought by me...

src/npyffi/array.rs (resolved)
src/npyffi/array.rs (outdated, resolved)
Python::with_gil(|py| {
let mut api = self.api.load(Ordering::Relaxed) as *const *const c_void;
if api.is_null() {
api = get_numpy_api(py, MOD_NAME, CAPSULE_NAME);
Member

As a potential gotcha, can get_numpy_api lead to a temporary release of the GIL? That would potentially enable multiple threads to run this initialization.

Member Author

As CPython's import implementation can be hooked, I think one cannot prevent this from happening in general. But I also think that multiple threads performing the initialization is only an issue of efficiency.

If a hook is releasing the GIL for whatever reason, it needs to be reacquired and all threads will only progress back here with the GIL held and at most store the same capsule pointer redundantly. (Doing the double-checking here on my part was only motivated by efficiency, i.e. we already have to take the lock so why not use this to avoid redundant initialization as we are already on the slow path.)

(If multiple threads importing the same module yields a different capsule and hence API pointer, I think all bets are off and we would need external synchronization like using std::sync::Once.)
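For reference, external synchronization with std::sync::Once could be sketched as follows. `init_once` and the `RUNS` counter are hypothetical and only there to show that the closure runs a single time. How this would interact with the GIL is not covered by this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
// Hypothetical counter that records how often the initializer ran.
static RUNS: AtomicUsize = AtomicUsize::new(0);

/// Runs the initializer at most once, no matter how many threads call this,
/// and returns the number of times it has run so far.
pub fn init_once() -> usize {
    INIT.call_once(|| {
        RUNS.fetch_add(1, Ordering::Relaxed);
    });
    // `call_once` returns only after the initializer has completed, and its
    // effects are visible to every caller.
    RUNS.load(Ordering::Relaxed)
}
```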

Member Author

> (If multiple threads importing the same module yields a different capsule and hence API pointer, I think all bets are off and we would need external synchronization like using std::sync::Once.)

Well, we could compare_exchange the pointer instead of storing it, only update it if it is still NULL, and otherwise discard our just-initialized value in favour of the "old" one returned by compare_exchange.

But having the global at all seems weird if we are expecting that get_numpy_api returns different capsules when called from different threads or at different times.
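A rough sketch of the compare_exchange variant, with the hypothetical helper `store_if_null` and `u32` as a placeholder pointee type:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Stores `fresh` only if the slot is still null and returns the pointer that
/// ends up in the slot, i.e. either `fresh` or the value another thread won with.
pub fn store_if_null(slot: &AtomicPtr<u32>, fresh: *mut u32) -> *mut u32 {
    match slot.compare_exchange(
        ptr::null_mut(),
        fresh,
        Ordering::AcqRel,  // on success: publish `fresh`
        Ordering::Acquire, // on failure: see what the winner published
    ) {
        Ok(_) => fresh,
        Err(existing) => existing,
    }
}
```

The caller would continue with the returned pointer and drop or ignore its own value if it lost the race.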

Member

Agreed, I think most likely this code is fine as-is thanks to the global nature. In GILOnceCell I chose to drop any surplus values produced by other threads if a race occurred. This was kind of necessary because of the API contract of it being write-once.
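The "drop surplus values" behaviour can be illustrated with std::sync::OnceLock as a stand-in. This is not GILOnceCell itself, and `set_or_discard` is a hypothetical helper: the first value wins and later candidates are simply dropped.

```rust
use std::sync::OnceLock;

/// Tries to fill the write-once cell. If it is already filled, the candidate
/// is dropped and the stored value is returned instead.
pub fn set_or_discard(cell: &OnceLock<String>, candidate: String) -> &str {
    // `set` hands the candidate back in `Err` when the cell was already filled.
    if let Err(surplus) = cell.set(candidate) {
        drop(surplus);
    }
    cell.get().expect("cell was filled by this call or an earlier one")
}
```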

Member

davidhewitt left a comment

Looks reasonable to me, thanks!

davidhewitt merged commit 6d6084f into PyO3:main on Nov 25, 2021
adamreichold deleted the sync-api-globals branch on November 25, 2021 21:57