-
-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The new LCI parcelport for HPX #5715
Conversation
Can one of the admins verify this patch? |
add to whitelist |
@JiakunYan Thanks for working on this! This is a major step forward in terms of supporting modern networking infrastructures. We just majorly restructured the HPX repository. It has been split into two parts (this repository and https://github.com/STEllAR-GROUP/hpx-local). Please give me a day or two to rebase your PR onto current master (that will involve force-pushing to your branch). While doing so, I will also apply some reformatting changes (cmake-format and clang-format based) to make our CI checks happy. |
@JiakunYan I have created a PR against your base branch here: uiuc-hpc#1. This might require some manual intervention. I think the best would be if you replaced your base branch with https://github.com/hkaiser/hpx/tree/hpx-lci, that contains all of your changes. |
@hkaiser Thanks! I have done it. Continue working on debugging |
That worked for me. Do you still have issues with this setting? |
BTW, I simply disabled PGKCONFIG if LCI is enabled. I think that's acceptable as we know it's a cmake issue and there might not be a proper workaround. |
There are some errors when building tests for HPX. For the old version, I got the following error:
For the new version, I got the following error:
The scripts I was running can be found here: https://github.com/JiakunYan/hpx-scripts/tree/main/rostam/test
|
@JiakunYan this error is unrelated and a problem caused by our recent refactoring. The pgkconfig problems the CI shows seem to be unrelated to your PR as well. We will take care of this. Could you please have a look at the cmake-format and clang-format issues (see: here and here)? |
It seems I don't have access to the two links you mentioned
|
Here are the files: clang-format problem: diff --git a/libs/full/lci_base/src/lci_environment.cpp b/libs/full/lci_base/src/lci_environment.cpp
index c148729..5545908 100644
--- a/libs/full/lci_base/src/lci_environment.cpp
+++ b/libs/full/lci_base/src/lci_environment.cpp
@@ -7,8 +7,8 @@
#include <hpx/config.hpp>
-#include <hpx/modules/logging.hpp>
#include <hpx/modules/lci_base.hpp>
+#include <hpx/modules/logging.hpp>
#include <hpx/modules/runtime_configuration.hpp>
#include <hpx/modules/util.hpp> cmake-format problem: diff --git a/cmake/HPX_AddTest.cmake b/cmake/HPX_AddTest.cmake
index 0232b6f..38cb626 100644
--- a/cmake/HPX_AddTest.cmake
+++ b/cmake/HPX_AddTest.cmake
@@ -159,12 +159,12 @@ function(add_hpx_test category name)
if(_add_test)
set(_full_name "${category}.distributed.lci.${name}")
add_test(NAME "${_full_name}" COMMAND ${cmd} "-p" "lci" "-r" "mpi"
- ${args}
- )
+ ${args}
+ )
set_tests_properties("${_full_name}" PROPERTIES RUN_SERIAL TRUE)
if(${name}_TIMEOUT)
set_tests_properties(
- "${_full_name}" PROPERTIES TIMEOUT ${${name}_TIMEOUT}
+ "${_full_name}" PROPERTIES TIMEOUT ${${name}_TIMEOUT}
)
endif()
endif()
diff --git a/cmake/HPX_SetupHPXLocal.cmake b/cmake/HPX_SetupHPXLocal.cmake
index cb34b8e..d8e06a6 100644
--- a/cmake/HPX_SetupHPXLocal.cmake
+++ b/cmake/HPX_SetupHPXLocal.cmake
@@ -188,7 +188,10 @@ elseif(NOT TARGET HPX::hpx_local AND NOT HPX_FIND_PACKAGE)
if(MSVC AND NOT DEFINED HPXLocal_WITH_BINARY_DIR)
set(HPXLocal_WITH_BINARY_DIR
"${PROJECT_BINARY_DIR}"
- CACHE STRING "Binary directory for local should be the same as for full library" FORCE
+ CACHE
+ STRING
+ "Binary directory for local should be the same as for full library"
+ FORCE
)
endif()
diff --git a/plugins/parcelport/lci/CMakeLists.txt b/plugins/parcelport/lci/CMakeLists.txt
index a6af5e2..6b305e5 100644
--- a/plugins/parcelport/lci/CMakeLists.txt
+++ b/plugins/parcelport/lci/CMakeLists.txt
@@ -25,8 +25,14 @@ if(HPX_WITH_NETWORKING AND HPX_WITH_PARCELPORT_LCI)
"${PROJECT_SOURCE_DIR}/hpx/plugins/parcelport/lci/sender.hpp"
"${PROJECT_SOURCE_DIR}/hpx/plugins/parcelport/lci/sender_connection.hpp"
"${PROJECT_SOURCE_DIR}/hpx/plugins/parcelport/lci/tag_provider.hpp"
- DEPENDENCIES hpx_actions hpx_command_line_handling hpx_config hpx_lci_base
- hpx_dependencies_boost Mpi::mpi LCI::LCI
+ DEPENDENCIES
+ hpx_actions
+ hpx_command_line_handling
+ hpx_config
+ hpx_lci_base
+ hpx_dependencies_boost
+ Mpi::mpi
+ LCI::LCI
INCLUDE_DIRS ${PROJECT_SOURCE_DIR}
FOLDER "Core/Plugins/Parcelport/LCI"
) |
@msimberg the problems related to pkgconfig seem to be a leftover from the repository split. Would you mind having a look? |
@JiakunYan I would like to get back to this as soon as possible. Is there anything I can help to move this forward? |
@hkaiser Apologize! For the last two weeks, I was busy with other things (holiday, of course, and some qual exam related stuff) and was slow on this project. I shall get back in one week. You mentioned that some tests were broken on Rostam due to recent refactoring. Have those tests been fixed now? It doesn't feel good to do major code development without being able to test its correctness. |
@JiakunYan I didn't mean to put any pressure on you, rather was trying to help. The rostam changes should be in place very soon, so next week is perfect. Thanks! |
@hkaiser Thank you! Once the fix is in place, it would be great if you could help me rebase the current branch onto the master branch. |
@JiakunYan: Please see uiuc-hpc#2 for the rebase. |
The error is unrelated to the split. It's triggered by this line not having Line 44 in 5f0645b
|
It is intentional. Without removing CXX, |
Possibly
if you're not already explicitly asking for the |
@msimberg Thanks! I will try them. |
Is the fix in place now? I am still seeing some errors on Rostam when building the tests.
Is this a leftover bug of the split? |
I tested it. If I set |
All right, thanks for checking. Unfortunately I don't have an workarounds in mind. We know that the pkgconfig file generation is problematic (and has been for a while). @hkaiser I'll leave this up to you. I suppose you can always force |
That's exactly what I have done already. At least I had done it at some point... (edit: it's here: https://github.com/STEllAR-GROUP/hpx/pull/5715/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR1165) |
@JiakunYan Thanks for the fix. However, the error message is still there. I added
to my cmake to make sure the issues is resolved. However, it is still there. |
I think the cmake variable you want to define should be |
Should I use a different server? |
No. Are you using the newest commit of this PR "4d9cc1c4dfa2df622a4bc290cbd75c00d7af57b0"? The error is still there? |
Yes, if I do that the bootstrap does work and the lci parcel port is found and the code is running. |
Yes, if I do that the bootstrap does work and the lci parcel port is found, and the code is running. |
I checked out the pull request this morning, I assume that is the correct commit or? |
Ok, the code was running for 2 minutes and I get the following error
|
1 similar comment
Ok, the code was running for 2 minutes and I get the following error
|
I suggest you check the output of |
This can be a more complicated error. I don't have access to Perlmutter so I cannot debug it myself. Could you please try compiling HPX with the Debug mode and also adding We don't have a Cray machine at hand so we haven't tested LCI on Cray network for a long time. It is likely that the ofi server (which is designed for Cray network) has some bugs. If you merely want to test the LCI parcelport, it might be better to start with some machines with infiniband and use the ibv server. It is more robust than the ofi server. |
I am working to get you access to a Cray machine. |
I did some recent scaling study on Perlmutter using MPI + Libfabric and I wanted to compare these results with LCI + Libfabric. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@hkaiser Just did a rebase. Let's see how the tests go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
The remaining errors are unrelated. Thanks a lot! |
@gonidelis Please include a sentence about this new parcelport in the release notes and add Jiakun to the acknowledgements section. |
@hkaiser Will do! |
About
Add a new parcelport (LCI) for HPX. (Still a work in progress)
Background
The Lightweight Communication Interface (LCI) is designed to be a low-level communication library used by high-level libraries and frameworks. It aims to support irregular and asynchronous applications such as graph analysis, sparse linear algebra, and task-based runtime on modern parallel architectures. Major features include (a) support for more communication primitives such as two-sided send/recv and one-sided (dynamic or direct) remote put (b) better multi-threaded performance (c) explicit user control of communication resource (d) flexible signaling mechanisms such as synchronizer, completion queue, and active message handler. It is an ongoing research project (https://github.com/uiuc-hpc/LC).
Checklist
Completed
Known Issues