-
Notifications
You must be signed in to change notification settings - Fork 91
Bug/220 spice makes bsk sims not thread safe #994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug/220 spice makes bsk sims not thread safe #994
Conversation
4a1af4d to
aeed048
Compare
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
8f6bf30 to
a9ef12b
Compare
schaubh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work! All tests passed on my system. Just some minor comments to address
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
|
Nice work, looks like we are there. @juan-g-bonilla , as you commented on the original issue with the suggestion to use a mutex, would you have time to take a look at this code? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done @Natsoulas ! Many of my comments are actually encouraging you to fix existing issues or even the original fix I proposed so we can get this class to its highest quality possible.
Thanks for the tag @schaubh .
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Show resolved
Hide resolved
a9ef12b to
6a063cf
Compare
|
@juan-g-bonilla |
13283c0 to
70d1575
Compare
schaubh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick comments to address.
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Show resolved
Hide resolved
juan-g-bonilla
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor suggestions + a warning about an edge cases when unloading happens before loading. Would be great to address those, but otherwise if you address Dr Shaub's comments LGTM.
70d1575 to
ae2c037
Compare
schaubh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to go. Thanks for addressing this long-standing issue.
|
This PR is leading to errors in some of my unit tests in BSK-RL, but I believe that is no longer working. I'm not sure if it's newly trying to unload files that weren't explicitly loaded, or if it always tried that but the error is being passed in a way that's no longer caught by my code. The error is triggered when this object (https://github.com/AVSLab/bsk_rl/blob/f70bd259049f2bcdc40c76eb258736968bf98cf2/src/bsk_rl/utils/orbital.py#L294) is created then deleted. |
|
@Natsoulas , please see Mark's recent issue with your branch. Could you please take a look at this? |
|
I'll take a look asap. |
Description
The changes implement thread safety for the SPICE interface to address GitHub issue #220, which reported deadlocks and data corruption when running parallel Monte Carlo simulations.
This solution uses a mutex and reference counting approach to protect SPICE kernel loading and unloading operations (inspired by Juan's suggestion comment on the issue):
kernel_manipulation_mutexin the SpiceInterface classkernel_reference_counterto track kernel usageloadSpiceKernel()to only load kernels once and use reference countingunloadSpiceKernel()to only unload kernels when no longer neededclearKeeper()with the mutexThis ensures that when multiple simulations run in parallel, they won't encounter race conditions or deadlocks when accessing SPICE kernels.
Verification
The changes were verified with a dedicated thread safety unit test:
test_spiceThreadSafety.pyunit test in the SPICE interface's_UnitTestdirectoryThe test specifically reproduces the conditions described in issue #220.
Documentation
Added code comments explaining the thread safety mechanism in:
spiceInterface.hfor the static mutex and reference counterspiceInterface.cppfor the mutex implementation and kernel reference counting logicNo existing documentation was invalidated.
Future work