RCCL support #93

Merged
jrmadsen merged 26 commits into ROCm:main from rccl-support on Jul 25, 2022
@jrmadsen (Collaborator) commented Jul 18, 2022

  • adds support for RCCL similar to MPI
    • perfetto args and ret
    • timemory rccl_comm_data component
  • includes cpack tweak
    • handle rocprofiler-dev DEB dependency
  • minor tweak to omnitrace exe to prevent printf functions from being instrumented (instrumenting them causes a deadlock)

@jrmadsen added the following labels on Jul 18, 2022:
  • enhancement (New feature or request)
  • libomnitrace (Involves omnitrace library)
  • sampling (Statistical sampling via interrupts)
  • cmake (Modifies the CMake build system)
  • cpack (Modifies the CPack packaging system)
  • configuration (Changes/involves configuration options)
  • rccl (ROCm Communication Collectives Library)
@jrmadsen force-pushed the rccl-support branch 5 times, most recently from b02b4e9 to 3885920, on July 21, 2022 16:16
@jrmadsen changed the title from "[WIP] RCCL support" to "[WIP] RCCL support + Improved ROCm-SMI Error Handling" on Jul 21, 2022
@jrmadsen force-pushed the rccl-support branch 5 times, most recently from c542e65 to e71c467, on July 25, 2022 06:54
@jrmadsen changed the title from "[WIP] RCCL support + Improved ROCm-SMI Error Handling" to "RCCL support" on Jul 25, 2022
@jrmadsen added the testing label (Extends/improves/modifies testing) on Jul 25, 2022
@jrmadsen marked this pull request as ready for review on July 25, 2022 06:57
@jrmadsen removed the sampling label (Statistical sampling via interrupts) on Jul 25, 2022
@jrmadsen force-pushed the rccl-support branch 2 times, most recently from b4d8619 to d148416, on July 25, 2022 10:04
- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile
jrmadsen and others added 17 commits July 25, 2022 08:08
- disable ompt
- enable building testing
- remove source /.../setup-env.sh, replace with $GITHUB_ENV
- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled
- based on ROCm version, we need to include either <rccl/rccl.h> or <rccl.h>
- updated tests to use configuration files
- many tests generate a configuration file
- tests now have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes
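
The commit note above about needing either <rccl/rccl.h> or <rccl.h> depending on the ROCm version can be illustrated with a small sketch. This is not the code from this PR, which keys the choice on the ROCm version. It instead uses the standard C++17 `__has_include` preprocessor check as a stand-in technique. The macro `EXAMPLE_RCCL_HEADER_KIND` and the function `example_rccl_header_name` are hypothetical names introduced only for illustration.

```cpp
// Sketch only: select whichever RCCL header layout the installed ROCm provides.
// EXAMPLE_RCCL_HEADER_KIND is a hypothetical macro, not part of omnitrace.
//   2 -> <rccl/rccl.h> was found and included
//   1 -> <rccl.h> was found and included
//   0 -> neither header is visible to the compiler
#ifdef __has_include
#    if __has_include(<rccl/rccl.h>)
#        include <rccl/rccl.h>
#        define EXAMPLE_RCCL_HEADER_KIND 2
#    elif __has_include(<rccl.h>)
#        include <rccl.h>
#        define EXAMPLE_RCCL_HEADER_KIND 1
#    endif
#endif

#ifndef EXAMPLE_RCCL_HEADER_KIND
#    define EXAMPLE_RCCL_HEADER_KIND 0
#endif

// Hypothetical helper: reports which header path was selected at compile time.
constexpr const char*
example_rccl_header_name()
{
    return EXAMPLE_RCCL_HEADER_KIND == 2   ? "rccl/rccl.h"
           : EXAMPLE_RCCL_HEADER_KIND == 1 ? "rccl.h"
                                           : "none";
}
```

A build that already knows the ROCm version (for example from its CMake configuration) could set an equivalent macro directly instead of probing with `__has_include`; the probe is simply the most portable way to show the idea without assuming a particular version cutoff.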
@jrmadsen added the omnitrace-instrument label (involves the omnitrace-instrument executable, i.e. the binary instrumenter) on Jul 25, 2022
@jrmadsen merged commit 45be039 into ROCm:main on Jul 25, 2022
@jrmadsen deleted the rccl-support branch on July 25, 2022 17:16
Labels
  • cmake (Modifies the CMake build system)
  • configuration (Changes/involves configuration options)
  • cpack (Modifies the CPack packaging system)
  • enhancement (New feature or request)
  • libomnitrace (Involves omnitrace library)
  • omnitrace-instrument (Involves the omnitrace-instrument executable (binary instrumenter))
  • rccl (ROCm Communication Collectives Library)
  • testing (Extends/improves/modifies testing)