Copy DQM ROOT file to EOS and register entry in DQMUpload executor #11015

Open. Wants to merge 3 commits into base: master


Contributor

@khurtado khurtado commented Feb 28, 2022

Fixes #10287

Status

New changes being tested

Description

This implements the new DQM Gui method described in #10287

  • Copy DQM root file to EOS
  • Register entry to new DQMGUI: Note this does not upload any files. It registers one root file per call, right after the file has been uploaded to EOS.
  • Delete files uploaded to EOS if registration fails
  • Use value of forceRunNumber for multiRun, to replicate the current behavior of the old visDQMGUI
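
The copy, register and clean-up sequence in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `copyToEOS`, `registerFile` and `deleteFromEOS` are hypothetical callables standing in for the real stage-out and HTTP logic.

```python
def uploadAndRegister(localPath, eosPath, copyToEOS, registerFile, deleteFromEOS):
    """
    Copy one ROOT file to EOS, then register it in the new DQM GUI.
    If registration fails, remove the copy again and re-raise.
    All three callables are hypothetical stand-ins for the real logic.
    """
    copyToEOS(localPath, eosPath)
    try:
        # One registration call per ROOT file, right after the copy
        registerFile(eosPath)
    except Exception:
        # Registration failed: do not leave an unregistered file on EOS
        deleteFromEOS(eosPath)
        raise
    return eosPath
```

Passing the three actions in as callables keeps the ordering logic testable without touching EOS or the DQM GUI.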

Is it backward compatible (if not, which system it affects?)

YES

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12830/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 24 new failures
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12836/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 24 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12837/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12838/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12840/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12842/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 11 tests no longer failing
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12857/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12894/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12901/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12910/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12912/artifact/artifacts/PullRequestReport.html

@khurtado khurtado force-pushed the dqmupload1 branch 2 times, most recently from b8e5661 to 2e3d149 Compare March 21, 2022 20:40
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12913/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12914/artifact/artifacts/PullRequestReport.html

- Copy DQM root files to EOS
- Register files to new DQMGUI via http post method
- If registration fails, delete EOS files upload
@khurtado
Contributor Author

khurtado commented Mar 22, 2022

@amaltaro I think this is *almost* ready for review. What is still pending before the code review is getting a new permanent EOS area, managed by WMCore/production, rather than the unmerged area. Who should we contact for that?

@khurtado khurtado force-pushed the dqmupload1 branch 2 times, most recently from ed3384c to 49b785b Compare March 25, 2022 20:24
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 3 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12953/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
  • Python3 Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 3 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12954/artifact/artifacts/PullRequestReport.html

@khurtado khurtado force-pushed the dqmupload1 branch 2 times, most recently from 4039951 to 7754191 Compare March 25, 2022 21:10
- Fix indentation
- Assume single register URL
- Retry registration
- Use self.job directly
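
The "Retry registration" item above could look something like the sketch below. It is only an assumption about the shape of the retry loop: `retry` is a hypothetical helper, and the default of 3 attempts simply mirrors the `self.retryCount = 3` value quoted in the review.

```python
import time


def retry(func, retryCount=3, delay=0.0):
    """
    Call func() up to retryCount times and return its first successful
    result. If every attempt raises, re-raise the last exception.
    Hypothetical helper, not the code in this PR.
    """
    if retryCount < 1:
        raise ValueError("retryCount must be at least 1")
    lastError = None
    for attempt in range(1, retryCount + 1):
        try:
            return func()
        except Exception as exc:
            lastError = exc
            if attempt < retryCount:
                # Wait before the next attempt (no wait after the last one)
                time.sleep(delay)
    raise lastError
```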
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 3 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12955/artifact/artifacts/PullRequestReport.html

@khurtado
Contributor Author

@amaltaro I think I addressed most of the comments and left comments on the ones I didn't

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 3 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12956/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 8 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 11 warnings and errors that must be fixed
    • 3 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12957/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

Kenyi, sorry for the delay reviewing this. Next time, please refresh the review request through GH such that we can properly filter it.

While reviewing it, it just occurred to me that T0 also uses this code. So we need to verify whether T0 would need to use this same mechanism; or whether we need to work on workflow configuration that could enable/disable this new DQM feature. Can you please follow this up with the DQM team?

self.retryCount = 3
self.registerLFNBase = '/store/unmerged/DQMGUI'
self.registerEOSPrefix = '/eos/cms'
self.registerURL = 'https://cmsweb-testbed.cern.ch/dqm/offline-test-new/api/v1/register'
Contributor

@amaltaro amaltaro Mar 31, 2022

I think so, and they are likely workflow dependent.

What I am trying to say is whether we could use the spec DQMUploadUrl url to decide which DQM Register url to use? Otherwise, I fear that we might have to create yet another StdSpecs parameter for all our workflow specs. A third option would be to hard code it, but I would only be okay with that if we are unlikely to change this url in the coming couple of years.

"""
# If lfn start with '/', make it relative to it
lfnFile = os.path.relpath(analysisFile.lfn, start='/')
lfn = os.path.join(self.registerLFNBase, lfnFile)
Contributor

and this is a surprise to me as well!

@khurtado khurtado requested a review from amaltaro April 1, 2022 19:42
Contributor

@amaltaro amaltaro left a comment

@khurtado Kenyi, here goes a fresh review of the current changes. In addition to the comments made along the code, I'd like to mention:

  1. Please update the initial PR description.
  2. I think it would be extremely useful to have a short summary of what DQMUpload module does at the top of the module, including all the steps it takes (since it no longer does a simple upload of the root file).
  3. We also need to clarify whether T0 workflows will have to follow the same new design (have you heard anything from the DQM team?)
  4. Did we agree on a final map of urls? If so, it needs to be implemented as well.

src/python/WMCore/WMSpec/Steps/Executors/DQMUpload.py (outdated)
self.retryCount = 3
self.registerLFNBase = '/store/unmerged/DQMGUI'
self.registerEOSPrefix = '/eos/cms'
self.registerURL = 'https://cmsweb-testbed.cern.ch/dqm/offline-test-new/api/v1/register'
Contributor

Should we create a method to get the register url (from the map that we discussed in the GH issue)?
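
A lookup method of that kind might be sketched as below. The map contents here are assumptions for illustration only: the key is a placeholder rather than a real upload URL, the agreed map lives in the GH issue, and `getRegisterUrl` is a hypothetical name. Only the register URL value is taken from the snippet above.

```python
# Hypothetical upload URL -> register URL map. The key is a placeholder;
# pairing it with the testbed register URL is an assumption, not the agreed map.
REGISTER_URL_MAP = {
    "https://upload.example.invalid/dqm/dev":
        "https://cmsweb-testbed.cern.ch/dqm/offline-test-new/api/v1/register",
}


def getRegisterUrl(uploadUrl, urlMap=REGISTER_URL_MAP):
    """
    Return the register URL matching a workflow's upload URL.
    Raises ValueError for an upload URL that is not in the map, so a
    misconfigured workflow fails loudly instead of registering elsewhere.
    """
    key = uploadUrl.rstrip("/")
    try:
        return urlMap[key]
    except KeyError:
        raise ValueError("No register URL known for upload URL: %s" % uploadUrl) from None
```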

self.httpPost(os.path.join(stepLocation,
os.path.basename(analysisFile.fileName)))
# Upload to EOS and register (new method)
self.uploadToEOSAndRegister(step, stepLocation, analysisFile)
Contributor

In terms of actions, perhaps we could have two different methods here:

  1. uploading the root file to EOS
  2. registering the root file metadata in the new DQM Gui

What do you think?

"""
# If lfn start with '/', make it relative to it
lfnFile = os.path.relpath(analysisFile.lfn, start='/')
lfn = os.path.join(self.registerLFNBase, lfnFile)
Contributor

Kenyi, coming back to this, what format is analysisFile.lfn supposed to have? Can you please give me an example (or maybe two, one for TaskChain and one for StepChain)?

This relpath has a behaviour that could possibly cause problems, see:

In [8]: os.path.relpath('test', start='/')
Out[8]: 'Users/amaltaro/Pycharm/cmsdist/test'

In [9]: os.path.relpath('/test', start='/')
Out[9]: 'test'
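
One way to avoid the dependency on the current working directory shown in the session above is to strip the leading slash explicitly rather than calling `os.path.relpath`. The sketch below is a possible alternative, not the code in this PR; `lfnUnderBase` is a hypothetical name, and `posixpath` is used on the assumption that LFNs always use forward slashes.

```python
import posixpath


def lfnUnderBase(base, lfn):
    """
    Join an LFN under a base directory without consulting the current
    working directory. Leading slashes are dropped so that posixpath.join
    does not discard the base.
    """
    relative = lfn.lstrip("/")
    return posixpath.join(base, relative)
```

With this approach `'test'` and `'/test'` both end up under the base directory, whatever directory the job runs in.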


def _register(self, registerURL, args):
"""
POST request to register URL
Contributor

Maybe we could have something like def _getHttpsOpener with the initial common code? If it becomes too ugly, then we keep these as 2 different methods.
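
A shared helper along those lines could be sketched as below. This assumes the common code is the construction of a `urllib` opener with an SSL context; how the module actually handles certificates is not shown in this thread, so the `certFile` and `keyFile` parameters are hypothetical.

```python
import ssl
import urllib.request


def getHttpsOpener(certFile=None, keyFile=None):
    """
    Build an HTTPS opener that both the upload and the register calls
    could reuse. certFile/keyFile are hypothetical parameters for an
    optional client certificate.
    """
    context = ssl.create_default_context()
    if certFile:
        # Present a client certificate (for example a grid proxy) if given
        context.load_cert_chain(certfile=certFile, keyfile=keyFile)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler)
```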

src/python/WMCore/WMSpec/Steps/Executors/DQMUpload.py (outdated)
@khurtado khurtado requested a review from amaltaro May 4, 2022 22:20
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 15 warnings and errors that must be fixed
    • 3 warnings
    • 56 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 12 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13135/artifact/artifacts/PullRequestReport.html

@khurtado
Contributor Author

khurtado commented May 4, 2022

Hi @amaltaro , to answer your questions:

  1. Do you mean the first commit, or this PR GH description?
  2. Agreed, Done!
  3. Yes, it's the same machinery for T0, answer 1 in: The new DQM GUI file management #10287 (comment)
  4. Done

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 15 warnings and errors that must be fixed
    • 3 warnings
    • 56 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 12 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13136/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

amaltaro commented May 4, 2022

  1. Do you mean the first commit, or this PR GH description?

The initial PR description at the very top, here: #11015 (comment)
However, having a second look at it, it looks good to me. Perhaps you could expand on the granularity used for the file registration in the new DQM Gui? Is it one HTTP call for each run? For each root file?

@khurtado
Contributor Author

khurtado commented May 5, 2022

  1. Do you mean the first commit, or this PR GH description?

The initial PR description at the very top, here: #11015 (comment) However, having a second look at it, it looks good to me. Perhaps you could expand on the granularity used for the file registration in the new DQM Gui? Is it one HTTP call for each run? For each root file?

Sounds good! I just extended that bullet a little bit.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
  • Python3 Pylint check: failed
    • 13 warnings and errors that must be fixed
    • 3 warnings
    • 56 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 12 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13153/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

Kenyi, other than the minor exception suggestion, it looks good to me.

src/python/WMCore/WMExceptions.py (outdated)
- Use upload.URL -> registerURL mapping
- Address PR comments
@khurtado
Contributor Author

khurtado commented May 6, 2022

Kenyi, other than the minor exception suggestion, it looks good to me.

@amaltaro Thanks! I just made that last change.
Note we still need to have a successful test with the new changes. It is currently failing while registering the file, but it looks like it's due to a cmsweb-related issue with the new mapping rather than this code. See my last comment below:

#10287 (comment)

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 13 warnings and errors that must be fixed
    • 3 warnings
    • 56 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 12 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13165/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Can one of the admins verify this patch?

Successfully merging this pull request may close these issues.

The new DQM GUI file management
3 participants