Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi thread/device support #418

Merged
merged 1 commit into from
Nov 18, 2020
Merged

Multi thread/device support #418

merged 1 commit into from
Nov 18, 2020

Conversation

Robadob
Copy link
Member

@Robadob Robadob commented Nov 6, 2020

ToDo

  • Make EnvironmentManager thread-safe
  • Make Curve thread-safe
  • Encode CudaSimulation instance id, into Curve hashes.
  • Make EnvironmentManager multi-device aware
  • Make Curve multi-device aware
  • Make CUDASimulation multi-device aware
    • It contains a runtime check which purges device data if it is detected corrupted, changing device currently breaks this.
  • Add nvcc arg --default-stream per-thread to CMake
  • Fix RTC
    • Does the dynamic header need changing (or will it for ensembles)
    • I think Pete said jitify complains about compiling in anything other than the main thread, (RTC compile works, doesn't print any warning it's just noticeably slower)
  • Write a bunch of test cases to ensure model integrity isn't harmed during multi device/thread CUDASimulation execution.
    • multi-threaded
      • Agent set/getVariable
      • MessageIn getVariable
      • MessageOut setVariable
      • AgentOut setVariable
      • Environment getProperty
      • AgentFunctionCondition Agent getProperty
    • multi-device
      • Agent set/getVariable
      • MessageIn getVariable
      • MessageOut setVariable
      • AgentOut setVariable
      • Environment getProperty
      • AgentFunctionCondition Agent getProperty
    • RTC multi-device
      • Agent set/getVariable
      • MessageIn getVariable
      • MessageOut setVariable
      • AgentOut setVariable
      • Environment getProperty
      • AgentFunctionCondition Agent getProperty
  • Regular tests work
  • RTC tests work
  • Reset tests_dev cmake

ChangeLog

  • Make Curve singleton thread-safe and unique per-device
  • Make EnvironmentManager singleton thread-safe and unique per-device
  • Add mutex locks around agent function execution, to prevent environment data getting changed by defragment during before execution).
  • Add nvcc arg --default-stream per-thread to CMake (and then comment out as it would require far more testing)
  • Fix double inclusion of common.cmake if configuring with an example as root.
  • Modify initialisation of CUDASimulation so that EnvironmentManager is not used before device has been selected.
  • Add tests for multi-threading and multi-device CUDASimulation execution.
  • Adds a reduced set of multi-thread and multi-device tests for RTC.
  • Fix a bug, where rtc_offsets were reset to 0 when EnvironmentManager:defragment() was called.

Known Issues

  • Curve collision handling (Curve: Collision handling #356) is often triggered if running new tests for multi-device mode. The state of this is dependent on the order of test execution and the number of devices within the machine. This is due to the CUDASimulation instance id incrementing each time, and this mutating the hashes used. The test does not collide in isolation, but atleast 1 test from the series always seems to collide if ran as a set. Might be trouble with hashes bunching, as there should only be a max of around 24/1024 hashes in use at any time during these tests.
  • nsys timeline for one of the multi-device test cases, appears to show the context creation for tertiary devices waiting for CUDA to become idle on already initialised devices. Seems important to initialise CUDA on all devices before launching sims. Calling cudaFree(nullptr) on every device at the start of the test appears to work,
  • The new compile arg --default-stream per-thread removes some implicit device syncs that would otherwise be expected. Very important to sync on either side of agent functions.

Relevant to #245
Closes #395 (if done properly)

@Robadob
Copy link
Member Author

Robadob commented Nov 11, 2020

Current issue is IO tests failing because the Singletons have already been init, when loading frm file. As such the setup env vars loaded from file code is not executed.

@Robadob
Copy link
Member Author

Robadob commented Nov 11, 2020

Full test suite now passes on Linux.

Will add the multi-thread, multi-device tests tomorrow. Then test RTC.

@Robadob
Copy link
Member Author

Robadob commented Nov 12, 2020

One side-effect/by-product of multi-device is an issue we have when a thread exits in not device 0, and a new CUDASim is then created in the same thread, without having applyConfig() called, to change device back to device 0 (the default value found in config).

@Robadob Robadob force-pushed the multi_thread_device branch 3 times, most recently from 4e55957 to 3804cdd Compare November 13, 2020 19:38
@Robadob
Copy link
Member Author

Robadob commented Nov 13, 2020

Didn't make much progress today.

  • Tweak environment property hashing, this seems to have fixed collisions I was having.
  • Modified CUDASimulation destructor to do everything on the right device/properly.
  • Some other minor changes
    • Moved DeviceExceptionBuffer shared memory copy, at start of agentfunction/agentfunctioncondition, so it's only performed by 1st thread of block.
    • etc.

Spent most the day chasing DeviceException test failures. Had some confusion with Mav, which had different errors. The failures still (only?) occur on my laptop and I'm yet to find a reasonable excuse for why they disappear as soon as I make a change. Also affects RTCDeviceException. Should really debug them with tests_dev to have a much faster build time.

@Robadob
Copy link
Member Author

Robadob commented Nov 16, 2020

A bunch of message tests are failing/crashing out in release model (on both Windows/Linux), need to investigate.

@Robadob
Copy link
Member Author

Robadob commented Nov 18, 2020

Initial status of RTC tests, first test passed (message), other 3 failed. The first does not use any environment variables, so this is likely the issue.
Of the test setup with the most detailed failure output, curiously only 1 of 10 tests per device passed. This might hint towards the problem.
Each test also takes several minutes to run, dependent on the number of threads. (and for non-rtc more threads == more chance of hitting a test failure).

Make Curve singleton thread-safe and unique per-device
Make EnvironmentManager singleton thread-safe and unique per-device
Add mutex locks around agent function execution, to prevent environment data getting changed by defragment during before execution).
Add nvcc arg --default-stream per-thread to CMake (and then comment it out as it would require far more testing)
Fix double inclusion of common.cmake if configuring with an example as root.
Modify initialisation of CUDASimulation so that EnvironmentManager is not used before device has been selected.
Add tests for multi-threading and multi-device CUDASimulation execution.
Adds a reduced set of multi-thread and multi-device tests for RTC.
Fix a bug, where rtc_offsets were reset to 0 when EnvironmentManager:defragment() was called.
Required for #245
Closes #395
@Robadob Robadob marked this pull request as ready for review November 18, 2020 13:05
@mondus mondus merged commit c4f67c4 into master Nov 18, 2020
@mondus mondus deleted the multi_thread_device branch November 18, 2020 16:10
@Robadob Robadob mentioned this pull request Nov 19, 2020
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestCUDASimulation.AllDeviceIdValues causes test failures on multi-gpu systems
2 participants