Correctly pass dbsUrl to the locationsFromMSPileup method #11906

Merged

Conversation

Contributor

@todor-ivanov todor-ivanov commented Feb 20, 2024

Fixes #11903

Status

READY

Description

We were wrongly passing the whole dbs object to the locationsFromMSPileup method in WMCore.WorkQueue.DataLocationMapper and then trying to parse it as a string to find out whether it was a testbed instance. The result of that check is used later as a marker for the MSPileup instance we are currently configured against.

With this fix we now refer to the dbs.dbsURL attribute in this parsing procedure, which should fix the issue. Whether using the dbsURL is the best way to determine the MSPileup instance remains an open question.
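To illustrate the kind of mix-up described above, here is a minimal sketch. It is not the actual WMCore code: `FakeDBSReader` and `is_testbed` are hypothetical names, the substring check is an assumed stand-in for the real parsing logic, and the exact error raised in WMCore may differ. It only shows why string parsing needs the URL attribute rather than the object that carries it.

```python
class FakeDBSReader:
    """Hypothetical stand-in for the dbs object; only dbsURL matters here."""

    def __init__(self, url):
        self.dbsURL = url


def is_testbed(dbs_url):
    """Illustrative check (assumption): treat 'cmsweb-testbed' hosts as testbed."""
    return "cmsweb-testbed" in dbs_url


dbs = FakeDBSReader("https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader")

# Before the fix (sketch): the whole object reaches the string-parsing code.
# A plain object does not support the `in` operator, so this raises TypeError.
try:
    is_testbed(dbs)
    object_check_failed = False
except TypeError:
    object_check_failed = True

# After the fix (sketch): only the URL string is passed on.
url_check = is_testbed(dbs.dbsURL)
```

In this sketch the string-based check succeeds only when it receives `dbs.dbsURL`, which is the change this PR makes at the call site.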

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

External dependencies / deployment changes

None

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 6 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14890/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the bugfix_GWQFailsToFetchDataMSPilup_fix-11903 branch from aaa02b3 to 2748054 on February 20, 2024 08:46
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 19 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 6 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14891/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

Since it was a short fix, I tested it by directly patching the testbed workqueue and restarting the threads. The output is in [1]. It clearly fixes the issue, and I also expect the relevant workflows to be able to advance their status. Here is one to check later: tivanov_TaskChain_PUMCRecyc_Feb2024_Val_v2_240215_150924_2146

FYI @amaltaro, @anpicci

[1]

2024-02-20 08:37:59,153:INFO:CherryPyPeriodicTask:Setting CherryPy periodic task with the following config:
locationUpdateTask.reqMgrConfig = {'reqmgr2_endpoint': 'https://cmsweb-testbed.cern.ch/reqmgr2', 'central_logdb_url': 'https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb', 'log_reporter': 'global_workqueue'}
locationUpdateTask.queueParams = {'CouchUrl': 'https://cmsweb-testbed.cern.ch/couchdb', 'DbName': 'workqueue', 'InboxDbName': 'workqueue_inbox', 'WMStatsCouchUrl': 'https://cmsweb-testbed.cern.ch/couchdb/wmstats', 'QueueURL': 'https://cmsweb-testbed.cern.ch/couchdb/workqueue', 'ReqMgrServiceURL': 'https://cmsweb-testbed.cern.ch/reqmgr2', 'RequestDBURL': 'https://cmsweb-testbed.cern.ch/couchdb/reqmgr_workload_cache', 'central_logdb_url': 'https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb', 'log_reporter': 'global_workqueue', 'rucioAccount': 'wmcore_transferor', 'rucioAccountPU': 'wmcore_pileup', 'rucioAuthUrl': 'https://cms-rucio-auth.cern.ch', 'rucioUrl': 'http://cms-rucio.cern.ch'}
locationUpdateTask.central_logdb_url = 'https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb'
locationUpdateTask.locationUpdateDuration = 21600
locationUpdateTask.log_reporter = 'global_workqueue'
locationUpdateTask.object = 'WMCore.GlobalWorkQueue.CherryPyThreads.LocationUpdateTask.LocationUpdateTask'
locationUpdateTask.log_file = '/data/srv/logs/workqueue/locationUpdateTask-workqueue-665c48ff9-mpz7k-20240220.log'

2024-02-20 08:37:59,301:INFO:LogDB:<LogDB(url=https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb, identifier=global_workqueue, agent=1)>
2024-02-20 08:37:59,309:INFO:Rucio:WMCore Rucio initialization parameters: {'account': 'wmcore_transferor', 'rucio_host': 'http://cms-rucio.cern.ch', 'auth_host': 'https://cms-rucio-auth.cern.ch', 'ca_cert': None, 'auth_type': None, 'creds': None, 'timeout': 600, 'user_agent': 'wmcore-client'}
2024-02-20 08:37:59,322:INFO:Rucio:Rucio client initialization parameters: {'host': 'http://cms-rucio.cern.ch', 'auth_host': 'https://cms-rucio-auth.cern.ch', 'auth_type': 'x509', 'account': 'wmcore_transferor', 'user_agent': 'wmcore-client/1.29.16', 'ca_cert': '/etc/grid-security/certificates/', 'creds': {'client_cert': '/data/srv/current/auth/workqueue/dmwm-service-cert.pem', 'client_key': '/data/srv/current/auth/workqueue/dmwm-service-key.pem'}, 'timeout': 600, 'request_retries': 3}
2024-02-20 08:37:59,335:INFO:LogDB:<LogDB(url=https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb, identifier=global_workqueue, agent=1)>
2024-02-20 08:37:59,509:INFO:WorkQueue:Executing data location update...
2024-02-20 08:37:59,857:INFO:DataLocationMapper:Fetching location from Rucio for account: wmcore_transferor
2024-02-20 08:37:59,975:INFO:Rucio:Container: /QCDB-4Jets_HT-40to100_TuneCP5_13p6TeV_madgraphMLM-pythia8/Run3Summer23MiniAODv4-130X_mcRun3_2023_realistic_v14-v2/MINIAODSIM with container-based location at: {'T1_US_FNAL_Disk'}
2024-02-20 08:38:00,993:INFO:Rucio:Container: /QCDB-4Jets_HT-40to100_TuneCP5_13p6TeV_madgraphMLM-pythia8/Run3Summer23MiniAODv4-130X_mcRun3_2023_realistic_v14-v2/MINIAODSIM with block-based location at: {'T1_RU_JINR_Disk', 'T1_US_FNAL_Disk'}, and final location: ['T1_RU_JINR_Disk', 'T1_US_FNAL_Disk']
2024-02-20 08:38:01,157:INFO:DataLocationMapper:Fetching location from Rucio for account: wmcore_transferor
2024-02-20 08:38:01,772:INFO:DataLocationMapper:Found 15 unique input data to update location
2024-02-20 08:38:02,799:INFO:DataLocationMapper:Updating 0 elements for Input location update
2024-02-20 08:38:02,986:INFO:DataLocationMapper:Fetching location from Rucio for account: wmcore_transferor
2024-02-20 08:38:03,877:INFO:DataLocationMapper:Found 21 unique parent data to update location
2024-02-20 08:38:05,975:INFO:DataLocationMapper:Updating 0 elements for Parent location update
2024-02-20 08:38:06,115:INFO:DataLocationMapper:Fetching locations from MSPileup for 1
2024-02-20 08:38:06,258:INFO:DataLocationMapper:locationsFromPileup - name: /Neutrino_E-10_gun/Run3Summer21PrePremix-Summer22_124X_mcRun3_2022_realistic_v11-v2/PREMIX, currentRSEs: ['T1_US_FNAL_Disk', 'T2_CH_CERN'], containerFraction: 1.0
2024-02-20 08:38:06,307:INFO:DataLocationMapper:Fetching locations from MSPileup for 4
2024-02-20 08:38:06,390:INFO:DataLocationMapper:locationsFromPileup - name: /Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX, currentRSEs: ['T1_US_FNAL_Disk', 'T2_CH_CERN'], containerFraction: 1.0
2024-02-20 08:38:06,491:INFO:DataLocationMapper:locationsFromPileup - name: /RelValMinBias_14TeV/CMSSW_10_6_1-106X_mcRun3_2021_realistic_v1_rsb-v1/GEN-SIM, currentRSEs: ['T1_US_FNAL_Disk', 'T2_CH_CERN'], containerFraction: 1.0
2024-02-20 08:38:06,588:INFO:DataLocationMapper:locationsFromPileup - name: /RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM, currentRSEs: ['T2_CH_CERN'], containerFraction: 1.0
2024-02-20 08:38:06,763:INFO:DataLocationMapper:locationsFromPileup - name: /RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM, currentRSEs: ['T2_CH_CERN'], containerFraction: 1.0
2024-02-20 08:38:06,800:INFO:DataLocationMapper:Found 5 unique pileup data to update location
2024-02-20 08:38:07,067:INFO:DataLocationMapper:/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX, setting location to: ['T1_US_FNAL_Disk', 'T2_CH_CERN']
2024-02-20 08:38:07,068:INFO:DataLocationMapper:/Neutrino_E-10_gun/RunIISummer20ULPrePremix-UL16_106X_mcRun2_asymptotic_v13-v1/PREMIX, setting location to: ['T1_US_FNAL_Disk', 'T2_CH_CERN']
2024-02-20 08:38:07,182:INFO:DataLocationMapper:/RelValMinBias_14TeV/CMSSW_10_6_1-106X_mcRun3_2021_realistic_v1_rsb-v1/GEN-SIM, setting location to: ['T1_US_FNAL_Disk', 'T2_CH_CERN']
2024-02-20 08:38:07,298:INFO:DataLocationMapper:/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM, setting location to: ['T2_CH_CERN']
2024-02-20 08:38:07,312:INFO:DataLocationMapper:/RelValMinBias_14TeV/CMSSW_11_2_0_pre8-112X_mcRun3_2024_realistic_v10_forTrk-v1/GEN-SIM, setting location to: ['T2_CH_CERN']
2024-02-20 08:38:07,442:INFO:DataLocationMapper:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM, setting location to: ['T2_CH_CERN']
2024-02-20 08:38:07,498:INFO:DataLocationMapper:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM, setting location to: ['T2_CH_CERN']
2024-02-20 08:38:07,536:INFO:DataLocationMapper:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM, setting location to: ['T2_CH_CERN']
2024-02-20 08:38:07,573:INFO:DataLocationMapper:Updating 6 elements for Pileup location update
2024-02-20 08:38:08,096:INFO:LocationUpdateTask:LocationUpdateTask executed in 8.587 secs and updated 6 non-unique elements

Contributor

@amaltaro amaltaro left a comment

Thanks Todor, it looks good to me.

@amaltaro amaltaro merged commit ed0bd54 into dmwm:master Feb 20, 2024
3 of 4 checks passed
Development

Successfully merging this pull request may close these issues.

GlobalWorkQueue: DataLocationMapper fails to get data from MSPileup
3 participants