Merged
16 changes: 16 additions & 0 deletions bot/build.sh
@@ -171,6 +171,22 @@ else
fi
echo "bot/build.sh: EESSI_ACCELERATOR_TARGET_OVERRIDE='${EESSI_ACCELERATOR_TARGET_OVERRIDE}'"

# Log the full lscpu and os-release info:
lscpu > _bot_job${SLURM_JOB_ID}.lscpu
cat /etc/os-release > _bot_job${SLURM_JOB_ID}.os

# Also: fetch CPU flags into an array, so that we can implement a hard check against a reference
lscpu_flags_line=$(lscpu | grep "Flags:")
Contributor

I would grab the full lscpu output, in a separate file?

Other fields (like Model name) and additional info such as the host OS (/etc/os-release and /etc/redhat-release) would also be relevant to grab.
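
To make the suggestion concrete, here is a minimal sketch of such a dump. It is only an illustration: `log_host_info` is a hypothetical helper name, and the `_bot_job<id>.<suffix>` file naming is an assumption based on the pattern mentioned in this thread. It skips anything that is not available on the host (for instance, /etc/redhat-release only exists on RHEL-like systems).

```shell
#!/usr/bin/env bash
# Sketch: dump host information into per-job files in the current directory.
# log_host_info is a hypothetical helper, not part of bot/build.sh.
log_host_info() {
    local job_id="${1:-unknown}"
    local prefix="_bot_job${job_id}"
    local f

    # Full lscpu output (Model name, Flags, ...), only if lscpu is available
    if command -v lscpu > /dev/null 2>&1; then
        lscpu > "${prefix}.lscpu"
    fi

    # Host OS details; each file is only included if it is readable
    for f in /etc/os-release /etc/redhat-release; do
        if [[ -r "$f" ]]; then
            echo "### $f"
            cat "$f"
        fi
    done > "${prefix}.os"
}
```

In a Slurm job this could be called as `log_host_info "${SLURM_JOB_ID}"`.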

Contributor

Agree. First step could just be to write output of lscpu to _bot_job{job_id}.lscpu

Contributor Author

Well, we could definitely do both - I'd be happy if a file just gets dumped in the workdir of the bot like Thomas proposes. But to compare the flags against a reference, you definitely want the list of flags extracted separately, without any context. And, as stated in the comment below this code, you can quite easily compare (sorted) bash arrays to spot any difference.
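
As a rough illustration of that comparison, the sketch below sorts both arrays and diffs them, so the order in which flags are reported does not matter. `flags_diff` is a hypothetical helper, and the flag values are made up for the example; no real reference list exists yet in this thread.

```shell
#!/usr/bin/env bash
# Sketch: compare two lists of CPU flags irrespective of their order.
# flags_diff is a hypothetical helper; it takes the NAMES of two bash arrays.
flags_diff() {
    local -n _found=$1   # nameref, requires bash >= 4.3
    local -n _ref=$2
    # diff exits with 0 if the sorted lists are identical, 1 if they differ
    diff <(printf '%s\n' "${_found[@]}" | sort) \
         <(printf '%s\n' "${_ref[@]}" | sort)
}

# Made-up example values, not a real reference
lscpu_flags=(sse2 avx fpu avx2)
lscpu_flags_ref=(fpu sse2 avx avx2 avx512f)

if diff_result=$(flags_diff lscpu_flags lscpu_flags_ref); then
    echo "flags match reference"
else
    echo "flags differ from reference:"
    echo "$diff_result"
fi
```

Note that this is an exact-match check: a build host that has extra flags on top of the reference would also be reported as different.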

Contributor Author

@casparvl Nov 12, 2025

In other words: I think there are two goals here:

  1. Better logging, in which case we want to include as much info as possible (full lscpu and os-release output)
  2. Runtime checking of the supported Flags, and producing a hard abort if they are not as expected

I've implemented the 2nd, you want the first, I propose we do both ;-)

Contributor

I don't see the hard abort. It's anticipated, yes, but not there yet, right?

Let's try to move quickly. Log the output of lscpu + os-release into a file or two, keep grabbing the flags and print them to the job output. By putting this into production quickly, we can already start gathering information.

Contributor Author

> I don't see the hard abort. It's anticipated yes, but not yet there, right.

You're right, it's not. This was just preparation: it only logs the Flags and doesn't do anything with them yet.

Contributor Author

@casparvl Nov 12, 2025

Added, see above.

FYI: I would like to keep what I have in terms of the bash array. Once we have a reference, it'll be a one-line change (loading the reference) plus uncommenting the code below to implement the hard fail. No need to reinvent that later...
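
For illustration only, loading a reference could look roughly like the sketch below. Where and in what format a reference would be stored has not been decided in this thread, so `load_flags_ref` and the file layout (a plain text file with whitespace-separated flags) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: load a reference flag list from a text file into a bash array.
# load_flags_ref and the whitespace-separated file layout are assumptions.
load_flags_ref() {
    local ref_file=$1
    if [[ ! -r "$ref_file" ]]; then
        echo "reference file not readable: $ref_file" >&2
        return 1
    fi
    lscpu_flags_ref=()
    # -d '' makes read consume the whole file; -a splits it on whitespace.
    # read returns non-zero at end of file, hence the '|| true'.
    read -r -d '' -a lscpu_flags_ref < "$ref_file" || true
}
```

After a call such as `load_flags_ref /path/to/reference.flags` (a placeholder path), the `lscpu_flags_ref` array would be available for the comparison in the commented-out code.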

# strip leading "Flags:" and spaces, and put result in a bash array
if [[ $lscpu_flags_line =~ Flags:\ (.*) ]]; then lscpu_flags=(${BASH_REMATCH[1]}); fi
# for now, just print
echo "bot/build.sh: CPU flags=${lscpu_flags[*]}"
# TODO: an actual comparison with a reference bash array, e.g. through
# diff_result=$(diff <(printf "%s\n" "${lscpu_flags[@]}" | sort) <(printf "%s\n" "${lscpu_flags_ref[@]}" | sort))
# if [[ -n "$diff_result" ]]; then
#     echo "bot/build.sh: ERROR: difference between reported lscpu flags and reference for this ($EESSI_SOFTWARE_SUBDIR_OVERRIDE) CPU architecture. This could mean an incorrect build host was used to build for this target." >&2
#     exit 1
# fi

# get EESSI_OS_TYPE from .architecture.os_type in ${JOB_CFG_FILE} (default: linux)
EESSI_OS_TYPE=$(cfg_get_value "architecture" "os_type")
export EESSI_OS_TYPE=${EESSI_OS_TYPE:-linux}