If job requires rhel8 container, use slc7 python libraries #11077

Merged (1 commit, Apr 12, 2022)

Conversation

@amaltaro (Contributor) commented Apr 9, 2022

Fixes #11051
Complement to #11060 and #11062

Status

ready

Description

We do not have a COMP build for EL8, so for jobs requesting rhel8, use our slc7_* python libraries.

We might also consider using the python and future libraries from one of the EL8 CMSSW releases, though it is still unclear whether that would work out of the box.

UPDATE: these changes triggered a discussion with the SI team, and on April 11, 2022 they updated the container used for REQUIRED_OS=any from rhel6 to rhel7. We need to check whether the current production agents behave properly with it.

Is it backward compatible (if not, which systems does it affect)?

YES

Related PRs

Complement to #11060 and #11062

External dependencies / deployment changes

None

@amaltaro (Contributor, Author) commented Apr 9, 2022

Testing it with vocms0192 and amaltaro_TC_EL8_Agent202_Val_220409_033647_4276.
FYI @smuzaffar

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12995/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor, Author) commented Apr 9, 2022

This doesn't work. Stage-out fails with this error:

stderr: xrdcp: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.5/3.5.60-2/el8-x86_64/usr/lib64/libXrdCl.so.3)
xrdfs: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by xrdfs)
xrdfs: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.5/3.5.60-2/el8-x86_64/usr/lib64/libXrdCl.so.3)
xrdfs: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by xrdfs)
xrdfs: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.5/3.5.60-2/el8-x86_64/usr/lib64/libXrdCl.so.3)
xrdfs: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by xrdfs)
xrdfs: /cvmfs/cms.cern.ch/COMP/slc7_amd64_gcc630/external/gcc/6.3.0/lib64/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /cvmfs/oasis.opensciencegrid.org/mis/osg-wn-client/3.5/3.5.60-2/el8-x86_64/usr/lib64/libXrdCl.so.3)

I guess we will have to do the same trick as we did for ppc64le: create symlinks to COMP python3 and py3-future from one of the EL8 CMSSW releases.
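For illustration only, a minimal sketch of the kind of symlink meant here (the links that were actually created are listed further down in this thread; the hashed target path is taken from that later listing):

# hypothetical example: expose an EL8 CMSSW python3 build under the COMP area on cvmfs
ln -s /cvmfs/cms.cern.ch/cc8_amd64_gcc9/external/python3/3.8.2-555576dae8d03b340ea3079ddb8abb0d \
      /cvmfs/cms.cern.ch/COMP/cc8_amd64_gcc9/external/python3/3.8.2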

@smuzaffar

Yes, we need COMP python for the EL8 CMSSW release. I can create the symlinks; let me know which paths WMCore would expect under /cvmfs/cms.cern.ch/COMP.

@smuzaffar

By the way, starting with rhel8, can we simplify the logic here?

WMCore/etc/submit_py3.sh, lines 141 to 149 in 844ccaf:

if [ "$REQUIRED_OS" = "rhel7" ] || [ "$REQUIRED_OS" = "rhel8" ]
then
WMA_SCRAM_ARCH=slc7_${THIS_ARCH}_gcc630
elif [ "$THIS_ARCH" = "amd64" ]
then
WMA_SCRAM_ARCH=slc6_${THIS_ARCH}_gcc700
else
WMA_SCRAM_ARCH=slc6_${THIS_ARCH}_gcc493
fi
e.g.

if [ "$REQUIRED_OS" = "rhel7" ]
then
    WMA_SCRAM_ARCH=slc7_${THIS_ARCH}_gcc630
elif [ "$REQUIRED_OS" = "rhel6" ]
then
    if [ "$THIS_ARCH" = "amd64" ]
    then
        WMA_SCRAM_ARCH=slc6_${THIS_ARCH}_gcc700
    else
        WMA_SCRAM_ARCH=slc6_${THIS_ARCH}_gcc493
    fi
else
    WMA_SCRAM_ARCH=${REQUIRED_OS}_${THIS_ARCH}
fi

and we create rhelX_(amd64|aarch64|ppc64le) symlinks for rhel8 and above. In the near future we will have to test rhel9 too :-)

@amaltaro (Contributor, Author) commented Apr 9, 2022

@smuzaffar your suggestion would be the ideal and most maintainable approach, in my opinion. However, I don't feel comfortable blindly using CMSSW-based builds to run the production job wrapper, because there has been no validation at all of running the WMCore runtime code with those libraries (python, future, and other packages that I see in cvmfs: gcc, zlib, bz2lib, etc.).

@amaltaro (Contributor, Author) commented Apr 9, 2022

Shahzad, I guess we can start with a path like:

/cvmfs/cms.cern.ch/COMP/rhel8_amd64_gcc630/

and maybe the same for the other 2 architectures (ppc64le and aarch64).

We only need the python3 and py3-future packages.

@smuzaffar commented Apr 10, 2022

> @smuzaffar your suggestion would be the ideal and most maintainable approach, in my opinion. However, I don't feel comfortable blindly using CMSSW-based builds to run the production job wrapper, because there has been no validation at all of running the WMCore runtime code with those libraries (python, future, and other packages that I see in cvmfs: gcc, zlib, bz2lib, etc.).

I just wanted the code to support future archs. /cvmfs/cms.cern.ch/COMP/${REQUIRED_OS}_$(uname -m) could be a symlink pointing to the actual /cvmfs/cms.cern.ch/COMP/<os>_<arch>_gcc<version>. Whenever COMP is ready to build/provide its own python3/future packages for these archs, we can just install those under /cvmfs/cms.cern.ch/COMP/<comp_os>_<arch>_<comp_gcc> and update the symlink to point to them. If COMP does not support an arch yet, we can point the symlink to the CMS-based python3/future (just like we do for ppc64le, and now for rhel8).

For example, for the existing WMA_SCRAM_ARCH values, we could already create:

  • rhel6_x86_64 -> slc6_amd64_gcc700
  • rhel6_ppc64le -> slc6_ppc64le_gcc493 (by the way, we do not have slc6_ppc64le releases, so no idea why we created this; maybe it was during the initial ppc64le tests by Tommaso, when submit.sh did not yet have proper support for it. I think this can be deleted now)
  • rhel7_(x86_64|ppc64le) -> slc7_(amd64|ppc64le)_gcc630

and for new archs, e.g. rhel8, rhel9 and aarch64:

  • ${REQUIRED_OS}_$(uname -m) -> pointing to either CMS-based or COMP-based python3/future builds

Once these symlinks are available, all you need to use is:

WMA_SCRAM_ARCH="${REQUIRED_OS}_$(uname -m)"
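For illustration, a sketch of how the wrapper logic could look once these symlinks exist (variable names follow the submit_py3.sh snippet quoted earlier; this is not the exact change made in this PR, and WMA_COMP_BASE is a hypothetical name):

# sketch: derive the COMP area directly from the job's OS request and the node architecture
THIS_ARCH=$(uname -m)                          # e.g. x86_64, aarch64, ppc64le
WMA_SCRAM_ARCH="${REQUIRED_OS}_${THIS_ARCH}"   # e.g. rhel8_x86_64
# the python3/py3-future installs would then be looked up under this area
WMA_COMP_BASE="/cvmfs/cms.cern.ch/COMP/${WMA_SCRAM_ARCH}"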

@smuzaffar commented Apr 10, 2022

By the way, we are also in the process of setting up Jenkins jobs to test the system via CRAB jobs. This will allow us to automatically test new archs (e.g. rhel9) and find issues during our regular Integration Build process.

@smuzaffar

@amaltaro, the new archs are now available under /cvmfs/cms.cern.ch/COMP. The paths are:

cc8_aarch64_gcc9/external/py3-future/0.18.2
cc8_aarch64_gcc9/external/python3/3.8.2 -> /cvmfs/cms.cern.ch/cc8_aarch64_gcc9/external/python3/3.8.2
cc8_amd64_gcc9/external/py3-future/0.18.2
cc8_amd64_gcc9/external/python3/3.8.2 -> /cvmfs/cms.cern.ch/cc8_amd64_gcc9/external/python3/3.8.2-555576dae8d03b340ea3079ddb8abb0d
cc8_ppc64le_gcc9/external/py3-future/0.18.2
cc8_ppc64le_gcc9/external/python3/3.8.2 -> /cvmfs/cms.cern.ch/cc8_ppc64le_gcc9/external/python3/3.8.2
cs9_amd64_gcc11/external/py3-future/0.18.2
cs9_amd64_gcc11/external/python3/3.9.6 -> /cvmfs/cms.cern.ch/cs9_amd64_gcc11/external/python3/3.9.6-2c93591d796451daeaf67dc8130bfa35
slc7_aarch64_gcc820/external/py3-future/0.18.2 -> ../py2-future/0.18.2
slc7_aarch64_gcc820/external/python3/3.8.2 -> /cvmfs/cms.cern.ch/slc7_aarch64_gcc820/external/python3/3.8.2
slc7_ppc64le_gcc630/external/py3-future/0.18.2
slc7_ppc64le_gcc630/external/python3/3.8.2 -> /cvmfs/cms.cern.ch/slc7_ppc64le_gcc820/external/python3/3.8.2
slc7_ppc64le_gcc820/external/py3-future/0.18.2
slc7_ppc64le_gcc820/external/python3/3.8.2 -> /cvmfs/cms.cern.ch/slc7_ppc64le_gcc820/external/python3/3.8.2

I have also created short symlinks, so if you prefer, please use these in the submit script:

rhel5_x86_64 -> slc5_amd64_gcc434
rhel6_x86_64 -> slc6_amd64_gcc700
rhel7_aarch64 -> slc7_aarch64_gcc820
rhel7_ppc64le -> slc7_ppc64le_gcc820
rhel7_x86_64 -> slc7_amd64_gcc630
rhel8_aarch64 -> cc8_aarch64_gcc9
rhel8_ppc64le -> cc8_ppc64le_gcc9
rhel8_x86_64 -> cc8_amd64_gcc9
rhel9_x86_64 -> cs9_amd64_gcc11
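As an example of how the two levels of symlinks compose (paths taken from the listings above), a rhel8 x86_64 job would resolve its python3 install roughly as:

/cvmfs/cms.cern.ch/COMP/rhel8_x86_64/external/python3/3.8.2
  -> /cvmfs/cms.cern.ch/COMP/cc8_amd64_gcc9/external/python3/3.8.2
  -> /cvmfs/cms.cern.ch/cc8_amd64_gcc9/external/python3/3.8.2-555576dae8d03b340ea3079ddb8abb0d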

@amaltaro (Contributor, Author)

@smuzaffar thank you very much, Shahzad!

> I have also created short symlinks, so if you prefer, please use these in the submit script.

I am giving this a try in the last commit that I just pushed to this branch. Let's see how it goes with the grid tests.

Inline review comment on the diff:

then
WMA_SCRAM_ARCH=slc6_${THIS_ARCH}_gcc700
WMA_SCRAM_ARCH=slc7_${THIS_ARCH}

@smuzaffar commented Apr 11, 2022:

@amaltaro, maybe rhel7 instead of slc7 here?

@amaltaro (Contributor, Author):

Good catch! Thanks!

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12999/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13000/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor, Author)

Tests are still running, but I can already see a few hundred jobs completing successfully.
@smuzaffar I couldn't explicitly ask for your review here, but feel free to leave any other concerns/questions/thumbs up that you might have.

@smuzaffar

looks good to me.

@vkuznet (Contributor) left a comment:

I made one comment on how to improve it; it is up to you to decide whether you want to make that change. From the point of view of the required changes everything is fine, and I'll approve it.

Inline review comment on the diff:

then
WMA_SCRAM_ARCH=slc6_${THIS_ARCH}_gcc700
WMA_SCRAM_ARCH=rhel7_${THIS_ARCH}

@vkuznet (Contributor):

I would rather define a separate default-flavor variable, e.g. DEFAULT_OS=rhel7, somewhere at the top of the script and then use it here. This makes it easier to maintain and avoids changing defaults somewhere down in the code. I usually prefer to have common definitions at the top of the script and then a line saying DO NOT EDIT below this line, to make a clear separation between definitions and the logic itself.
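A short sketch of the suggested pattern, assuming the fallback branch quoted above is where the default gets applied (the variable name DEFAULT_OS comes from the review comment; placement and surrounding logic are only indicative):

# common definitions at the top of the script
DEFAULT_OS=rhel7
# --- DO NOT EDIT below this line ---

# ... later, in the architecture selection logic, use the default instead of a hard-coded OS
WMA_SCRAM_ARCH=${DEFAULT_OS}_${THIS_ARCH}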

@amaltaro (Contributor, Author):

That's a good suggestion, thanks Valentin.

@todor-ivanov (Contributor) left a comment:

looks good to me

@amaltaro (Contributor, Author)

With these changes in, the Cleanup jobs now fail because, as I just noticed, they do not set REQUIRED_OS to any but to an empty string. To be followed up in #11081.

Thanks for the reviews, everyone!
@smuzaffar these changes with the simplified rhelX_arch schema seem to be working fine, which is great!
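Not part of this PR, but to illustrate the Cleanup issue mentioned above: a possible defensive default in the wrapper could look like the hypothetical guard below (the actual follow-up is tracked in #11081/#11083):

# hypothetical guard: treat an unset or empty REQUIRED_OS the same as "any"
if [ -z "$REQUIRED_OS" ]
then
    REQUIRED_OS="any"
fi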

Commit message:
Trying the new map of OS_ARCH
• replace slc7 by rhel7
• apply Valentin's suggestion

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13006/artifact/artifacts/PullRequestReport.html

@amaltaro (Contributor, Author)

Having the agent patched with #11083 does the trick, and there are successful Cleanup jobs now. I think it's safe to get this merged and packaged in a new release; however, I am trying to understand some random errors for:

  • merge jobs at T2_CH_CERN_HLT (only for the EL8 workflow, and it might not be related only to merge jobs)
  • cleanup jobs at T1_US_FNAL, still using the rhel6 container.

Discussion is ongoing over the SI mattermost channel.

@amaltaro (Contributor, Author)

Other than the two transient issues mentioned above, this looks good to go and will be backported to the agent branch.

Successfully merging this pull request may close these issues:
  • Add support to RHEL8 workflows/jobs