Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support different architectures during job submission and runtime #10853

Merged
merged 2 commits into from Oct 12, 2021

Conversation

amaltaro
Copy link
Contributor

@amaltaro amaltaro commented Oct 5, 2021

Superseeds #10673
Fixes #10840

Status

not-tested

Description

Summary of changes is:

  • support different CPU architectures at the job wrapper (actually, CVMFS is only setup for ppc64le and x86_64. Architecture aarch64 will only be supported in the future)
    • this means, CVMFS scripts will be sourced accordingly
  • added a map of ScramArch to architecture in the BasePlugin
    • note that there is no default architecture, in case a job does not specify ScramArch (only known case is Cleanup, which will then allow it to run in any architecture)
    • note that the result must be a single architecture, so some precedence logic goes embedded
  • overwrite the Requirements expression string according to the architecture that we want to run that job.

UPDATE: The SI team discussed this with the HTCondor developers, and indeed TARGET.Arch in the requirements is the way to use such resources.

Is it backward compatible (if not, which system it affects?)

maybe

Related PRs

None

External dependencies / deployment changes

None

@cmsdmwmbot
Copy link

Jenkins results:

  • Python2 Unit tests: succeeded
    • 1 tests added
  • Python3 Unit tests: succeeded
    • 1 tests added
    • 2 changes in unstable tests
  • Python2 Pylint check: succeeded
    • 6 warnings
    • 35 comments to review
  • Python3 Pylint check: failed
    • 8 warnings and errors that must be fixed
    • 6 warnings
    • 43 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 26 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12485/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

amaltaro commented Oct 8, 2021

This is completed untested, so I will hold off merging this until we can have some tests running on top of 1.5.4.patch1 (to be released soon).

@cmsdmwmbot
Copy link

Jenkins results:

  • Python2 Unit tests: succeeded
    • 1 tests added
  • Python3 Unit tests: succeeded
    • 1 tests added
  • Python2 Pylint check: succeeded
    • 6 warnings
    • 35 comments to review
  • Python3 Pylint check: failed
    • 8 warnings and errors that must be fixed
    • 6 warnings
    • 43 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 26 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12511/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor Author

@tommasoboccali @spigad Hi Tom, Daniele, this patch seems to be working fine (in terms of, it does not break standard functionality).
Please let me know if you have anything else to add here, otherwise we should be merging it tonight or tomorrow morning and a new WMAgent release will be made with it.

Regarding support to multiple architectures, I believe it still requires further discussion and I'd rather look at that issue in the future (to be addressed with this ticket: #10674)

@spigad
Copy link
Member

spigad commented Oct 11, 2021

Hi @amaltaro just spoke with @tommasoboccali we agreed to go ahead so far we've not things to add.

And about multi-arch I agree we need further discussions ..
thanks Daniele

Copy link
Contributor

@todor-ivanov todor-ivanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amaltaro. There is just a single question inline. You may want to take a look at it.

elif "aarch64" in requiredArchs:
return "aarch64"
else: # should never get here!
return defaultArch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is a behavior by design, but let me still ask. What is the difference between enforcing the default value from this line, and not enforcing it on line 170?
https://github.com/dmwm/WMCore/pull/10853/files#diff-5885ed172eb7ae63636d70ecf9a1c15766b05c33565a8e251f325a34e2539ad9R170

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines (191/192) are probably not needed. As the comment says, I can't see a case where we would reach that code.
IF something really manages to reach this, then using the "standard" grid architecture looked the most reasonable option to me (instead of allowing such jobs to run at any architecture).

@amaltaro
Copy link
Contributor Author

Thanks for the review, Todor. Merging and soon will get it backported to 1.5.4_wmagent branch as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support to Power PC 64 workflows and jobs
4 participants