Skip to content

Conversation

@Natsoulas
Copy link
Contributor

Description

The changes implement thread safety for the SPICE interface to address GitHub issue #220, which reported deadlocks and data corruption when running parallel Monte Carlo simulations.

This solution uses a mutex and reference counting approach to protect SPICE kernel loading and unloading operations (inspired by Juan's suggestion comment on the issue):

  1. Added a static mutex kernel_manipulation_mutex in the SpiceInterface class
  2. Added a static reference counter kernel_reference_counter to track kernel usage
  3. Modified loadSpiceKernel() to only load kernels once and use reference counting
  4. Modified unloadSpiceKernel() to only unload kernels when no longer needed
  5. Protected clearKeeper() with the mutex
  6. Updated the destructor to properly unload kernels

This ensures that when multiple simulations run in parallel, they won't encounter race conditions or deadlocks when accessing SPICE kernels.

Verification

The changes were verified with a dedicated thread safety unit test:

  • Added test_spiceThreadSafety.py unit test in the SPICE interface's _UnitTest directory
  • The test creates multiple SpiceInterface instances in parallel and forces them to simultaneously load/unload kernels
  • It verifies no deadlocks occur and no kernel files are corrupted
  • No existing functionality was broken - all other SPICE-related tests continue to pass

The test specifically reproduces the conditions described in issue #220.

Documentation

Added code comments explaining the thread safety mechanism in:

  • spiceInterface.h for the static mutex and reference counter
  • spiceInterface.cpp for the mutex implementation and kernel reference counting logic

No existing documentation was invalidated.

Future work

  1. Monitor for the release of NAIF's thread-safe SPICE implementation, which may provide a better solution in the future (Juan's advice from a comment in the issue 220 page).

@Natsoulas Natsoulas requested a review from schaubh May 7, 2025 05:36
@Natsoulas Natsoulas self-assigned this May 7, 2025
@Natsoulas Natsoulas requested a review from a team as a code owner May 7, 2025 05:36
@Natsoulas Natsoulas added bug Something isn't working enhancement New feature or request labels May 7, 2025
@Natsoulas Natsoulas linked an issue May 7, 2025 that may be closed by this pull request
@Natsoulas Natsoulas moved this to 🏗 In progress in Basilisk May 7, 2025
@Natsoulas Natsoulas force-pushed the bug/220-spice-makes-bsk-sims-not-thread-safe branch from 4a1af4d to aeed048 Compare May 7, 2025 06:10
@Natsoulas Natsoulas force-pushed the bug/220-spice-makes-bsk-sims-not-thread-safe branch 3 times, most recently from 8f6bf30 to a9ef12b Compare May 10, 2025 18:18
@Natsoulas Natsoulas moved this from 🏗 In progress to 👀 In review in Basilisk May 10, 2025
Copy link
Contributor

@schaubh schaubh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! All tests passed on my system. Just some minor comments to address

@schaubh schaubh requested a review from juan-g-bonilla May 13, 2025 17:12
@schaubh
Copy link
Contributor

schaubh commented May 13, 2025

Nice work, looks like we are there. @juan-g-bonilla , as you commented on the original issue with the suggestion to use a mutex, would you have time to take a look at this code?

Copy link
Contributor

@juan-g-bonilla juan-g-bonilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done @Natsoulas ! Many of my comments are actually encouraging you to fix existing issues or even the original fix I proposed so we can get this class to its highest quality possible.

Thanks for the tag @schaubh .

@Natsoulas Natsoulas force-pushed the bug/220-spice-makes-bsk-sims-not-thread-safe branch from a9ef12b to 6a063cf Compare May 20, 2025 05:37
@Natsoulas
Copy link
Contributor Author

Natsoulas commented May 20, 2025

@juan-g-bonilla
My latest implementation doesn't look exactly like ur suggestions, but I was able to use a majority of your suggested quality improvements like using unordered map and string concatenation etc. Thank you!

@Natsoulas Natsoulas force-pushed the bug/220-spice-makes-bsk-sims-not-thread-safe branch 2 times, most recently from 13283c0 to 70d1575 Compare May 20, 2025 06:33
Copy link
Contributor

@schaubh schaubh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick comments to address.

Copy link
Contributor

@juan-g-bonilla juan-g-bonilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor suggestions + a warning about an edge cases when unloading happens before loading. Would be great to address those, but otherwise if you address Dr Shaub's comments LGTM.

@Natsoulas Natsoulas force-pushed the bug/220-spice-makes-bsk-sims-not-thread-safe branch from 70d1575 to ae2c037 Compare May 22, 2025 22:45
@Natsoulas Natsoulas requested a review from schaubh May 23, 2025 00:03
Copy link
Contributor

@schaubh schaubh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to go. Thanks for addressing this long-standing issue.

@Natsoulas Natsoulas merged commit 0140f18 into develop May 23, 2025
11 checks passed
@Natsoulas Natsoulas deleted the bug/220-spice-makes-bsk-sims-not-thread-safe branch May 23, 2025 00:42
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Basilisk May 23, 2025
@Mark2000
Copy link
Contributor

This PR is leading to errors in some of my unit tests in BSK-RL, BSK_ERROR: Attempting to unload kernel that was never loaded for every spice file (https://github.com/AVSLab/bsk_rl/actions/runs/15291893777/job/43012764022?pr=268). Specifically, I had a "safe" way of trying to unload spice

    def __del__(self) -> None:
        """Unload spice kernels when object is deleted."""
        try:
            self.gravFactory.unloadSpiceKernels()
        except AttributeError:
            pass

but I believe that is no longer working. I'm not sure if it's newly trying to unload files that weren't explicitly loaded, or if it always tried that but the error is being passed in a way that's no longer caught by my code. The error is triggered when this object (https://github.com/AVSLab/bsk_rl/blob/f70bd259049f2bcdc40c76eb258736968bf98cf2/src/bsk_rl/utils/orbital.py#L294) is created then deleted.

@schaubh
Copy link
Contributor

schaubh commented May 28, 2025

@Natsoulas , please see Mark's recent issue with your branch. Could you please take a look at this?

@Natsoulas
Copy link
Contributor Author

I'll take a look asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Spice makes BSK simulations not thread safe

5 participants