Infer the ReFrame partition name based on the node_type listed in the job's cfg #124
Merged
casparvl
commented
Nov 12, 2025
Comment on lines +186 to +203
    # Check if the partition specified by RFM_SYSTEM is in the config file
    # Redirect to /dev/null because we don't want to print an ERROR, we want to try a fallback
    reframe --show-config | grep -v "could not find a configuration entry for the requested system/partition combination" > /dev/null
    if [[ $? -eq 1 ]]; then
        # There was a match by grep, so we failed to find the system/partition combination
        # Try the previous approach for backwards compatibility
        # This fallback can be scrapped once all bots have adopted the new naming convention
        # (i.e. using the node_type name from app.cfg) for ReFrame partitions
        # Get the correct partition name
        echo "Falling back to old naming scheme for REFRAME_PARTITION_NAME."
        echo "This naming scheme is deprecated, please update your partition names in the ReFrame config file."
        REFRAME_PARTITION_NAME=${EESSI_SOFTWARE_SUBDIR//\//_}
        if [ ! -z "$EESSI_ACCELERATOR_TARGET_OVERRIDE" ]; then
            REFRAME_PARTITION_NAME=${REFRAME_PARTITION_NAME}_${EESSI_ACCELERATOR_TARGET_OVERRIDE//\//_}
        fi
        echo "Constructed partition name based on EESSI_SOFTWARE_SUBDIR and EESSI_ACCELERATOR_TARGET: ${REFRAME_PARTITION_NAME}"
        export RFM_SYSTEM="BotBuildTests:${REFRAME_PARTITION_NAME}"
    fi
Contributor
Author
Note that this is all to keep backwards compatibility.
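To make the fallback concrete, here is a minimal sketch of what the deprecated naming scheme produces. It is an illustration, not the code from this PR: the helper name `legacy_partition_name` is hypothetical, the input values are assumed examples, and it uses `tr` instead of the bash-only `${VAR//\//_}` expansion from the diff so that it also runs under a plain POSIX shell. The result should be the same: every `/` becomes `_`.

```shell
# Hypothetical helper mirroring the deprecated naming scheme:
# slashes in the software subdir (and accelerator target, if any) become underscores.
legacy_partition_name() {
    local subdir="$1"
    local accel="${2:-}"
    local name
    name=$(printf '%s' "$subdir" | tr '/' '_')
    if [ -n "$accel" ]; then
        # Append the accelerator target, also with slashes replaced
        name="${name}_$(printf '%s' "$accel" | tr '/' '_')"
    fi
    printf '%s\n' "$name"
}

# Example inputs (assumed values, chosen to match the partition names in the PR description)
legacy_partition_name "x86_64/intel/skylake_avx512"
legacy_partition_name "aarch64/neoverse_v1" "accel/nvidia/cc90"
```

The second call shows why cross compiling is a problem for this scheme: the constructed name encodes the accelerator target being built for, not the hardware of the node that runs the tests.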
laraPPr
approved these changes
Dec 1, 2025
Contributor
laraPPr
left a comment
lgtm
Currently, we construct a ReFrame partition name based on the software subdir that is being used. E.g. the constructed partition name would be `x86_64_intel_skylake_avx512` on an Intel Skylake node. However, this leads to failures in the test step when cross compiling, because ReFrame will look for a partition `aarch64_neoverse_v1_accel_nvidia_cc90` in the ReFrame config, but that doesn't exist. We could of course define 5 identical partitions `aarch64_neoverse_v1_accel_nvidia_ccXX` (with XX=70, 80, 90, 100, 120), but there's no point: they all represent the same physical node, with the same physical properties (namely: it has no GPU!).

Rather than this duplication, we should really have one ReFrame partition that corresponds to the actual hardware in the node, and make sure that whenever that node is used for building, this partition config is used. With the change of `arch_target_map` to `node_type_map` on the bot side, I silently introduced a property in the `cfg/job.cfg` file that allows us to do this: it stores the `node_type`, i.e. one of the keys in the `node_type_map`. Using this means one has to use the same names for partitions in the `app.cfg` and the ReFrame config file, which I think is intuitive.

The way in which I implemented this is that I retrieve the `node_type` value in `bot/test.sh`, then pass it as an argument to `test_suite.sh`. I also made sure that it still works with the old configuration.

Proof that this works:
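Separately from that proof, the retrieval step described in the previous paragraph can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: the helper `cfg_get_value`, the section name `architecture`, the example value `cpu_zen2`, and the INI-style layout of `cfg/job.cfg` are all hypothetical, and a throwaway temporary file stands in for the real job config. Only the `BotBuildTests` system name and the `RFM_SYSTEM` variable come from the diff.

```shell
# Hypothetical helper: print the value of KEY in [SECTION] of an INI-style file.
cfg_get_value() {
    local file="$1"
    local section="$2"
    local key="$3"
    awk -F '=' -v section="[$section]" -v key="$key" '
        # Section header: remember whether it is the section we want
        /^[ \t]*\[/ { gsub(/[ \t]/, "", $0); in_section = ($0 == section); next }
        # key = value line inside the wanted section
        in_section && NF >= 2 {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$file"
}

# Demo: a temporary file standing in for cfg/job.cfg (assumed layout)
demo_cfg=$(mktemp)
printf '[architecture]\nnode_type = cpu_zen2\n' > "$demo_cfg"

node_type=$(cfg_get_value "$demo_cfg" architecture node_type)
if [ -n "$node_type" ]; then
    # Use the node type directly as the ReFrame partition name
    RFM_SYSTEM="BotBuildTests:${node_type}"
    export RFM_SYSTEM
fi
echo "$RFM_SYSTEM"
rm -f "$demo_cfg"
```

If `node_type` comes back empty (for example with an older job config), nothing is exported here, which is the situation where the fallback to the old naming scheme would apply.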