Releases: dmwm/WMCore

WMCore 2.3.3 production WMAgent release

01 May 18:36

This WMAgent release brings a major change to stage-in and stage-out: the storage JSON format has been adopted and the XML format is now deprecated.
In addition, it ships a fully functional WorkflowUpdater component, which continuously updates workflow sandboxes with up-to-date pileup information (a by-product of the partial pileup feature). As part of this pileup feature, the agent no longer resolves pileup data location via Rucio; it now fetches this information solely from the MSPileup service.
Furthermore, the Lexicon file has been updated to support datatiers with digits (e.g. L1SCOUT). This release also adds support for EL9 workflows and automatic detection of the container OS at runtime.
Lastly, it provides a few important bug fixes and enhancements.
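
To illustrate the datatier change, the sketch below contrasts a letters-only pattern with one that also accepts digits. Both patterns and the function name are assumptions made for illustration; they are not the expressions defined in the WMCore Lexicon.

```python
import re

# Hypothetical patterns, not the ones shipped in WMCore's Lexicon.
LETTERS_ONLY_TIER = re.compile(r"^[A-Z\-_]+$")
ALPHANUMERIC_TIER = re.compile(r"^[A-Z][A-Z0-9\-_]*$")


def is_valid_tier(tier, pattern=ALPHANUMERIC_TIER):
    """Return True if the data tier name matches the given pattern."""
    return bool(pattern.match(tier))


# Print each name with the verdict of the old-style and new-style pattern.
for name in ("AODSIM", "GEN-SIM-RAW", "L1SCOUT"):
    print(name, is_valid_tier(name, LETTERS_ONLY_TIER), is_valid_tier(name))
```

Under these assumed patterns, only the alphanumeric one accepts L1SCOUT, while both accept tiers made of letters and dashes.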

Release date: 30 April 2024.
Changes since release: 2.3.1.

WMAgent

Software stack

Features and/or feature changes

  • Created MSPileupUtils module and modified GQ CherryPy thread to use MSPileup instead of Rucio (Dennis Lee) #11870
  • New function to get dataset locations from MSPileup and its implementation in StartPolicyInterface #11620 (Andrea Piccinelli) #11879
  • add support for rhel9. Fix #11985 (Stefano Belforte) #11897
  • Add pileup availability logic in WorkflowUpdater component (Valentin Kuznetsov) #11884
  • Abandon storage.xml in favor of storage.json for stage-in and stage-out (nhduongvn) #11869 #11917
  • Initial implementation of DBSConcurrency module (Valentin Kuznetsov) #11913
  • Allow digits in data tier names (German Giraldo) #11930
  • Fix regex for Spec generation with alphanumerical data tiers (German Giraldo) #11951
  • Add new Merge Repack special cases to AccountantWorker (German Giraldo) #11952
  • Allow numbers in DataTier fields in all regular expressions (Todor Ivanov) #11962
  • [Container] Addition of the sites with HPC resources for the WMAgent docker deployment (Andrea Piccinelli) #11966

Bug Fixes

  • Don't cross check location when data is reported without location (Alan Malta Rodrigues) #11878
  • Discover the singularity container OS at runtime. (Todor Ivanov) #11896
  • Decode cmsRun stdout (German Giraldo) #11933
  • Fix print function in WorkflowUpdater (Alan Malta Rodrigues) #11971
  • Add DBS3_READER_URL and WorkflowUpdater.dbsUrl (Valentin Kuznetsov) #11972
  • Add check for creating updateBlockInfo return dictionary (Todor Ivanov) #11975

Enhancements

  • Increase JobSubmitter queue size from 50k to 100k (Alan Malta Rodrigues) #11889

WMCore 2.3.2 production central services release

30 Apr 03:10

This central services release provides full functionality for partial pileup, including the relevant changes on the agent side with the WorkflowUpdater component. We have also refactored the Rucio RSE expression for Tape output data placement, which now uses the dm_weight attribute instead of the legacy ddm_quota. In addition, datatiers with digits are now supported across the system (e.g. L1SCOUT).

Release date: 30 April 2024.
Changes since release: 2.3.1.

Central services

Software stack

Features and/or feature changes

  • Use MSPileupUtils getPileupDocs in MSTransferor, WorkflowUpdaterPoller (Dennis Lee) #11910
  • Provide concurrent implementation for REST jobdetail API (Alan Malta Rodrigues) #11885
  • Revisit logic of transition records, put code into stand-alone function (Valentin Kuznetsov) #11921
  • Update Tape RSE attribute from ddm_quota to dm_weight in MSOutput (Alan Malta Rodrigues) #11940

Bug Fixes

  • Fix interval for WMStats DataCacheUpdate CherryPy thread (Alan Malta Rodrigues) #11924
  • add BossAir.Plugins.BasePlugin modules to crabtaskworker deps (Thanayut Seethongchuen) #11926
  • Use POST method for getting pileup documents in MSTransferor (Alan Malta Rodrigues) #11936
  • Adjust scopes use in MSPileup (Valentin Kuznetsov) #11938
  • Fix issue with customDID (Valentin Kuznetsov) #11948

Enhancements

  • Added MSPileupUtils getPileupDocs mock and emulator (Dennis Lee) #11905
  • Add code to insert transition record; code re-factoring (Valentin Kuznetsov) #11947

WMAgent

Features and/or feature changes

  • Add pileup availability logic in WorkflowUpdater component (Valentin Kuznetsov) #11884
  • Abandon storage.xml in favor of storage.json for stage-in and stage-out (nhduongvn) #11869 #11917
  • Initial implementation of DBSConcurrency module (Valentin Kuznetsov) #11913
  • Allow digits in data tier names (German Giraldo) #11930
  • Fix regex for Spec generation with alphanumerical data tiers (German Giraldo) #11951
  • Add new Merge Repack special cases to AccountantWorker (German Giraldo) #11952
  • Allow numbers in DataTier fields in all regular expressions (Todor Ivanov) #11962
  • [Container] Addition of the sites with HPC resources for the WMAgent docker deployment (Andrea Piccinelli) #11966

Bug Fixes

  • Decode cmsRun stdout (German Giraldo) #11933
  • Fix print function in WorkflowUpdater (Alan Malta Rodrigues) #11971
  • Add DBS3_READER_URL and WorkflowUpdater.dbsUrl (Valentin Kuznetsov) #11972
  • Add check for creating updateBlockInfo return dictionary (Todor Ivanov) #11975

Enhancements

WMCore 2.3.1 production central services release

21 Feb 13:44

This release brings in full functionality for partial pileup data placement. Note, however, that it requires further development and the deployment of a new WMAgent release before it can be adopted in operations.
We have also refactored pileup data location across WM services, now relying solely on MSPileup. In addition, DQMHarvest workflows will now have a full container input data placement, followed by the relevant changes in the WorkQueue Dataset start policy.
On the WMAgent side, the system now supports workflows requesting the EL9 operating system (and its variations). The default ScramArch has been removed for Cleanup jobs, which now auto-discover the OS+Arch and bootstrap the code accordingly.
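
As a rough picture of OS and architecture auto-discovery, the sketch below parses os-release style text and combines it with the machine architecture reported by Python. The function names, the sample text, and the "el9" label convention are assumptions for illustration only; the mechanism actually used by WMCore may differ.

```python
import platform


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def os_label(info):
    """Build a short OS label such as 'el9' (hypothetical naming scheme)."""
    major = info.get("VERSION_ID", "").split(".")[0]
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if ids & {"rhel", "centos"} and major:
        return "el" + major
    return info.get("ID", "unknown") + major


# Made-up sample; on a real node the text would come from /etc/os-release.
SAMPLE = '''NAME="AlmaLinux"
VERSION_ID="9.3"
ID="almalinux"
ID_LIKE="rhel centos fedora"
'''

label = os_label(parse_os_release(SAMPLE))
print(label, platform.machine())
```

Keeping the parsing separate from the labelling makes it easy to test against sample text without touching the host system.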

Release date: 21 Feb 2024.
Changes since release: 2.2.6.

Central services

Software stack

Features and/or feature changes

  • Implement partialPileupTask task logic (Valentin Kuznetsov) #11807
  • Python (standard library) implementation of update pileup object script (Valentin Kuznetsov) #11872
  • Created MSPileupUtils module and modified GQ CherryPy thread to use MSPileup instead of Rucio (Dennis Lee) #11870
  • New function to get dataset locations from MSPileup and its implementation in StartPolicyInterface #11620 (Andrea Piccinelli) #11879
  • Make container level data placement for DQMHarvest; update Start.Policy.Dataset to container too (Alan Malta Rodrigues) #11894

Bug Fixes

  • Don't cross check location when data is reported without location (Alan Malta Rodrigues) #11878
  • Parse MSPileup filter as string if only listing one (Dennis Lee) #11886
  • Correctly pass dbsUrl to the locationsFromMSPileup method (Todor Ivanov) #11906
  • Fix variable name in log record; complement to #11879 (Alan Malta Rodrigues) #11908

Enhancements

  • providing a checklist for code review (Alan Malta Rodrigues)

WMAgent

Features and/or feature changes

  • Created MSPileupUtils module and modified GQ CherryPy thread to use MSPileup instead of Rucio (Dennis Lee) #11870
  • New function to get dataset locations from MSPileup and its implementation in StartPolicyInterface #11620 (Andrea Piccinelli) #11879
  • add support for rhel9. Fix #11985 (Stefano Belforte) #11897

Bug Fixes

  • Discover the singularity container OS at runtime. (Todor Ivanov) #11896

Enhancements

  • Increase JobSubmitter queue size from 50k to 100k (Alan Malta Rodrigues) #11889

WMAgent 2.3.0 WMAgent production release

18 Jan 03:01

This version brings in an initial implementation of a new agent component, called WorkflowUpdater, which will be used to continuously update the pileup location and the workflow sandbox.
It also changes the behavior of the site disallowed list, which will now be enforced across all tasks of a workflow. Concerning the job runtime, PSet tweaks have been made more verbose and we now dump some basic information about the environment and cmsRun steps, to be used in the future for job customization.
This release has some other important bug fixes and overall enhancements of the agent.

Release date: 17 January 2024.
Changes since release: 2.2.5.

WMAgent

Software stack

Features and/or feature changes

  • Inherit siteLists from upper level task while creating WMBS subscriptions (Todor Trendafilov Ivanov) #11724
  • Initial implementation for WorkflowUpdater component (Alan Malta Rodrigues) #11795 #11859
  • Give priority to older workflows when fetching from database for JobSubmitter (German Giraldo) #11804
  • Add logging and decode output of pre-scripts (German Giraldo) #11803
  • Add a generic script for deploying wmagent inside a virtual environment. (Todor Ivanov) #11624
  • Add runtime information json. (Kenyi Hurtado) #11812
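
Decoding the output of a child process, as the pre-script item above describes, can be sketched as follows. This is a generic illustration using the standard library, with a hypothetical helper name; it is not the code added in the referenced pull request.

```python
import subprocess
import sys


def run_and_decode(cmd):
    """Run a command and return (exit code, output decoded as text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, check=False)
    # Replace undecodable bytes instead of raising, so logging never fails.
    text = proc.stdout.decode("utf-8", errors="replace")
    return proc.returncode, text


# Stand-in for a pre-script: a tiny Python child process.
code, out = run_and_decode([sys.executable, "-c", "print('pre-script ok')"])
print(code, out.strip())
```

Using errors="replace" is a design choice that favours readable logs over strict validation of the child's output encoding.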

Bug Fixes

  • Fix logic for updating task-level site thresholds (Alan Malta Rodrigues) #11776
  • Fix ChangeState logic for limiting number of docs to commit in bulk (Alan Malta Rodrigues) #11786
  • Do not fail workflows due to empty location in global workqueue (Valentin Kuznetsov) #11810 #11862
  • Deal with invalid blocks in WMBS for Dataset start policy (Alan Malta Rodrigues) #11838

Enhancements

  • Bump deploy-wmagent script to version 2.2.5; insert T3_US_Ookami (Alan Malta Rodrigues) #11766
  • Fix for 11239: Replaced instances of logging.warn to logging.warning (Dennis Lee) #11785
  • Make DBS3Upload slightly more verbose (Alan Malta Rodrigues) #11768
  • Fixed Inappropriate Logical Expression (fabihatasneem) #11808
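
For readers unfamiliar with the logging.warn replacement listed above, a minimal sketch of the preferred call is shown here. The logger name and the function are hypothetical and only serve the example.

```python
import logging

logger = logging.getLogger("wmagent.demo")  # hypothetical logger name


def report_low_space(free_gb):
    """Emit a warning using the supported Logger.warning method."""
    # Logger.warn was a deprecated alias of Logger.warning.
    logger.warning("Low disk space: %d GB free", free_gb)


logging.basicConfig(level=logging.INFO)
report_low_space(3)
```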

WMCore 2.2.6 production central services release

16 Jan 11:39

Although not yet available to end users, initial changes for supporting partial pileup have been integrated into this release. The StepChain parentage CherryPy thread has been refactored to support partial block resolution and to be more resilient. MSUnmerged received some performance-related improvements. Cleanup of unneeded rules (MSRuleCleaner) no longer depends on the status of the StepChain parentage resolution. Global WorkQueue will no longer fail workflows that have an incomplete input data placement and/or whose data has not arrived at the final destination; instead, it relies on a continuous data location daemon. Lastly, this release brings in many WMAgent and central services improvements.

Release date: 16 Jan 2024.
Changes since release: 2.2.4.

Central services

Software stack

  • Replace svn by git in GitHub actions (Alan Malta Rodrigues) #11858

Features and/or feature changes

  • Refactor StepChainParentage thread to resolve by workflow (Alan Malta Rodrigues) #11694
  • Deal with partial block in the parentage fix (Alan Malta Rodrigues) #11757 #11779
  • Remove unused Unified configuration and code; add schema and validation (Alan Malta Rodrigues) #11770
  • MSUnmerged: Try to remove the base directory first and avoid recursive operations (#11781) (Todor Ivanov) #11781
  • MSUnmerged: Handle gfal exceptions while listing baseDirEntry && Avoid extra stat operations during recursion. (#11794) (Todor Ivanov) #11794
  • MSPileup: Add support for customName in MSPileup (Valentin Kuznetsov) #11765
  • MSPileup: Adjust to use customName along with pileupName (Valentin Kuznetsov) #11769
  • MSPileup: Introduce transition attribute in MSPileup record (Valentin Kuznetsov) #11802
  • MSRuleCleaner: Moved ParentageResolved check from dispatch to archive (Dennis Lee) #11805
  • MSTransferor: Ensure MSTransferor does not request more copies than RSEs available (Alan Malta Rodrigues) #11844

Bug Fixes

  • Do not fail workflows due to empty location in global workqueue (Valentin Kuznetsov) #11810 #11862

Enhancements

  • Update test json templates with MINIAODSIM; fix DQMHarvest ; GPU StepChain; ReReco 2022C; etc (Alan Malta Rodrigues) #11238
  • Fix for 11239: Replaced instances of logging.warn to logging.warning (Dennis Lee) #11785
  • Add MSPileup and refactor MongoDB in the WM Schematic (Alan Malta Rodrigues) #11789
  • New test campaigns added to parseUnifiedCampaigns (Alan Malta Rodrigues) #11806
  • Added aborted-completed to the state transition diagram (Alan Malta Rodrigues) #11809

WMAgent

Features and/or feature changes

  • Inherit siteLists from upper level task while creating WMBS subscriptions (Todor Trendafilov Ivanov) #11724
  • Initial implementation for WorkflowUpdater component (Alan Malta Rodrigues) #11795 #11859
  • Give priority to older workflows when fetching from database for JobSubmitter (German Giraldo) #11804
  • Add logging and decode output of pre-scripts (German Giraldo) #11803
  • Add a generic script for deploying wmagent inside a virtual environment. (Todor Ivanov) #11624
  • Add runtime information json. (Kenyi Hurtado) #11812

Bug Fixes

  • Fix logic for updating task-level site thresholds (Alan Malta Rodrigues) #11776
  • Fix ChangeState logic for limiting number of docs to commit in bulk (Alan Malta Rodrigues) #11786
  • Do not fail workflows due to empty location in global workqueue (Valentin Kuznetsov) #11810 #11862
  • Deal with invalid blocks in WMBS for Dataset start policy (Alan Malta Rodrigues) #11838

Enhancements

  • Bump deploy-wmagent script to version 2.2.5; insert T3_US_Ookami (Alan Malta Rodrigues) #11766
  • Fix for 11239: Replaced instances of logging.warn to logging.warning (Dennis Lee) #11785
  • Make DBS3Upload slightly more verbose (Alan Malta Rodrigues) #11768
  • Fixed Inappropriate Logical Expression (fabihatasneem) #11808

WMAgent 2.2.5 WMAgent production release

13 Oct 18:24

Most of the changes in this cycle have been integrated into WMAgent, starting with grid jobs, which now carry two new job classads:

  • CMS_extendedJobType: which is used to characterize the physics task type of the job (StepChain could have a comma separated list).
  • CMS_CampaignName: which now propagates the request high level description all the way to the job (StepChain could have a comma separated list).

Regarding WMAgent job monitoring, the agent now propagates all of the CMSSW FJR performance metrics to MonIT (through the WMArchive index).
In addition, two new sections with performance metrics are uploaded to WMArchive:

  • WMTiming: it contains timing information captured by the job wrapper, thus relative to the whole grid job.
  • WMCMSSWSubprocess: it contains timing information related to a given cmsRun step executed by the job.

Release date: 12 October 2023.
Changes since release: 2.2.3.1.

WMAgent

Software stack

Features and/or feature changes

  • Add characterization and propagation of task type based on cmsDriver step arguments (Kenyi Hurtado Anampa) #11680
  • Add campaign name attribute to base WMTask object and propagate it to the job level as a classad (Kenyi Hurtado Anampa) #11710 #11760
  • Add campaign names support for stepchain workflows (Kenyi Hurtado Anampa) #11738
  • Add CMSSW metrics to FJR (Valentin Kuznetsov) #11663
  • Add CMSSW performance metrics to WMArchive document (Valentin Kuznetsov) #11696
  • Adds WMCMSSWSubprocess metrics to FJR document (Valentin Kuznetsov) #11716
  • Add WMCMSSWSubprocess and WMTiming metrics to WMArchive document (Valentin Kuznetsov) #11692
  • Provide in the FWJR the CPU and wallclock time for CMSSW subprocesses (Valentin Kuznetsov) #11665
  • Provide timestamps metrics about WM job (Valentin Kuznetsov) #11656 #11726
  • Change default rucio pileup account (khurtado) #11673

Bug Fixes

  • Parse /proc//smaps_rollup if present && Reduce string concatenation operations (Todor Ivanov) #11676
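
As an illustration of what parsing a process's smaps_rollup file involves, the sketch below extracts the "Name: value kB" lines from sample text. The sample content and the function name are made up for the example, and this is not the WMCore implementation.

```python
def parse_smaps_rollup(text):
    """Return a dict of memory metrics (in kB) from smaps_rollup style text."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        # Metric lines look like "Pss:   2048 kB"; the header line is skipped.
        if (len(parts) == 3 and parts[0].endswith(":")
                and parts[1].isdigit() and parts[2] == "kB"):
            metrics[parts[0][:-1]] = int(parts[1])
    return metrics


# Made-up sample in the layout the Linux kernel uses for smaps_rollup.
SAMPLE_ROLLUP = """55d0c0a00000-7ffd1b5fe000 ---p 00000000 00:00 0    [rollup]
Rss:                4096 kB
Pss:                2048 kB
Swap:                  0 kB
"""

memory = parse_smaps_rollup(SAMPLE_ROLLUP)
print(memory)
```

Reading the pre-aggregated rollup avoids summing every mapping individually, which is the usual motivation for preferring it when it is present.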

Enhancements

  • Fix use of input function in unregister-wmstats script (Alan Malta Rodrigues) #11688

WMCore 2.2.4 production central services release

04 Oct 08:26

This release supports two new workload attributes, physics task type and campaign name, so that jobs can be properly characterized on the worker nodes via condor job classads.
In addition, new performance metrics, called WMTiming and WMCMSSWSubprocess, have been added to the job report file and propagated all the way to WMArchive, alongside all of the CMSSW performance metrics, which are now fetched from the Framework Job Report and published to WMArchive as well.
It also includes a few bug fixes and the usual code enhancements.

Release date: 4 Oct 2023.
Changes since release: 2.2.2.

Central services

Software stack

  • Update requirements for HTCondor 10.2.3 (Alan Malta Rodrigues) #11691

Features and/or feature changes

  • Add characterization and propagation of task type based on cmsDriver step arguments (Kenyi Hurtado Anampa) #11680
  • Add campaign name attribute to base WMTask object and propagate it to the job level as a classad (Kenyi Hurtado Anampa) #11710 #11760
  • Add campaign names support for stepchain workflows (Kenyi Hurtado Anampa) #11738
  • Adopt wmcore_pileup Rucio account in workqueue (Alan Malta Rodrigues) #11670

Bug Fixes

  • Update McM client (Geovanny Gonzalez-Rodriguez) #11672

Enhancements

  • Retry svn checkout in the CD pipeline up to 5 times (Alan Malta Rodrigues) #11752
  • Changes in src/Utils and test/Utils_t to remove py2 compatibilities (anpicci) #11618

WMAgent

Features and/or feature changes

  • Add CMSSW metrics to FJR (Valentin Kuznetsov) #11663
  • Add CMSSW performance metrics to WMArchive document (Valentin Kuznetsov) #11696
  • Add additional variables to WMAgent.secrets needed for the docker container initialization process. (Todor Ivanov) #11717
  • Adds WMCMSSWSubprocess metrics to FJR document (Valentin Kuznetsov) #11716
  • Add WMCMSSWSubprocess and WMTiming metrics to WMArchive document (Valentin Kuznetsov) #11692
  • Provide in the FWJR the CPU and wallclock time for CMSSW subprocesses (Valentin Kuznetsov) #11665
  • Provide timestamps metrics about WM job (Valentin Kuznetsov) #11656 #11726
  • Change default rucio pileup account (khurtado) #11673

Bug Fixes

  • Parse /proc//smaps_rollup if present && Reduce string concatenation operations (Todor Ivanov) #11676

Enhancements

  • Fix use of input function in unregister-wmstats script (Alan Malta Rodrigues) #11688

WMAgent 2.2.3.1 WMAgent production release

15 Jul 13:08

Release providing a few important new features and feature changes, plus overall enhancements to make the agent more resilient.

Release date: 14 July 2023.
Changes since release: 2.2.0.4.

WMAgent

Software stack

  • Upgrade pip-based HTCondor from 8.9.7 to 10.2.3 (Alan Malta Rodrigues) cmsdist#8582

Features and/or feature changes

  • Add new exception/error code for jobs removed by condor for unknown reasons. (khurtado) #11649
  • Explicitly enabled verbose and abort on failure for GFAL2 plugin (Alan Malta Rodrigues) #11636
  • Make ResourceControlUpdater to continuously update PNNs in the database (Alan Malta Rodrigues) #11599
  • Replace imp by importlib (Valentin Kuznetsov) #11530

Bug Fixes

  • Gracefully parse cpu performance metrics in SummaryDB (Alan Malta Rodrigues) #11590
  • Use previous exitStatus code instead of default 99108 (Valentin Kuznetsov) #11581
  • Fix manage path in the restartComponent script (Alan Malta Rodrigues) #11572
  • Add new CMSCouch exception for Request Entity Too Large (Alan Malta Rodrigues) #11502

Enhancements

WMCore 2.2.2 production central services release

11 Jul 17:15

This release mostly provides enhancements and some important bug fixes, such as a potential resolution for the GFAL2 stage-out issues, a correction of the workflow status transition for growing workflows, and the ability to automatically insert new RSEs (PNNs) into the agent database.
In addition, we have taken one step closer to a WMAgent containerization solution, which also involved some refactoring of our CD pipeline workflow.
Lastly, this is a base release for the upcoming WMAgent upgrade cycle.

Release date: 11 July 2023.
Changes since release: 2.2.1.

Central services

Software stack

  • Upgrade pip-based HTCondor from 8.9.7 to 10.2.3 (Alan Malta Rodrigues) cmsdist#8582
  • Changes to install WMAgent from PyPi; provided install and run.sh scripts (Todor Ivanov) CMSKubernetes#1393
  • Create a wmagent-base Dockerfile separating OS from application dependencies (Alan Malta Rodrigues) CMSKubernetes#1394
  • Update all WM Pypi Dockerfiles to take TAG as build argument (Alan Malta Rodrigues) CMSKubernetes#1397

Features and/or feature changes

  • Continuously update PNNs with ResourceControlUpdater (Alan Malta Rodrigues) #11599
  • New CouchDB view for WorkQueue OpenForNewData requests; add constraint to status transition (Alan Malta Rodrigues) #11611
  • CI/CD: Substitute curl with svn in GH actions workflow (Todor Ivanov) #11639
  • CI/CD: Add Docker context path to docker/build-push-action@v1 (Todor Ivanov) #11642
  • CI/CD: Update GH action for docker image to use a build argument (Alan Malta Rodrigues) #11638 #11652
  • CI/CD: Fix GH action to build/push docker image (Alan Malta Rodrigues) #11651
  • CI/CD: Refactor docker build/push workflow action (Alan Malta Rodrigues) #11653

Bug Fixes

  • Change GH api to build release notes; plus refactoring (Alan Malta Rodrigues) #11645 #11650

Enhancements

WMAgent

Features and/or feature changes

  • Explicitly enabled verbose and abort on failure for GFAL2 plugin (Alan Malta Rodrigues) #11636
  • Add new exception/error code for jobs removed by condor for unknown reasons. (khurtado) #11649

Bug Fixes

Enhancements

WMCore 2.2.1 production central services release

31 May 13:48

This release adds support for GPU StepChain workflows, assuming that all steps have the same GPU requirements. It brings in many bug fixes and improvements to both central services and WMAgent. Note that the imp Python library has also been fully replaced by the standard importlib. Lastly, further changes to the Docker images have been made such that all central services run a Prometheus exporter and publish service metrics to MonIT.
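
For context on the imp replacement, a common importlib pattern for loading a module from a file path is sketched below. The helper name and the demo module are hypothetical; the actual call sites changed in WMCore are not shown here.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Load a Python source file as a module using importlib."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load %s from %s" % (name, path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a throwaway module to disk and load it by path.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo_plugin.py")
    with open(demo_path, "w") as handle:
        handle.write("ANSWER = 42\n")
    demo = load_module_from_path("demo_plugin", demo_path)

print(demo.ANSWER)
```

This covers the use case that imp.load_source used to serve; the imp module itself was removed in Python 3.12.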

Release date: 31 May 2023.
Changes since release: 2.2.0.2.

Central services

Software stack

  • Update dmwm-base image to pypi-20230525, fixing process_exporter

Features and/or feature changes

  • Add ReqMgr2 validateRunlist (Valentin Kuznetsov) #11535
  • Replace imp by importlib (Valentin Kuznetsov) #11530
  • Add GPU support to the StepChain spec (Alan Malta Rodrigues) #11588

Bug Fixes

  • MSPileup: Remove postToAMQ (post_to_amq) and fix typo in doc_type_amq (Valentin Kuznetsov) #11573
  • Fix racing condition in unit test with CouchDB (Valentin Kuznetsov) #11540
  • Fix broken module path and module name during import (#11587) (Todor Ivanov) #11587
  • MSPileup: detect active containers with no wmcore_transferor rules (Alan Malta Rodrigues) #11579

Enhancements

  • MSPileup consider Neutrino PDs as premix (Alan Malta Rodrigues) #11543

WMAgent

Features and/or feature changes

  • Replace imp by importlib (Valentin Kuznetsov) #11530

Bug Fixes

  • Fix broken module path and module name during import (#11587) (Todor Ivanov) #11587
  • Fix manage path in the restartComponent script (Alan Malta Rodrigues) #11572
  • Add new CMSCouch exception for Request Entity Too Large (Alan Malta Rodrigues) #11502
  • Use previous exitStatus code instead of default 99108 (Valentin Kuznetsov) #11581
  • PyPi: Fix missing static dependencies for wmagent package (Todor Ivanov) #11586
  • PyPi: Fix missing sublevel areas of packages in the wm-database module (#11592) (Todor Ivanov) #11592
  • Gracefully parse cpu performance metrics in SummaryDB (Alan Malta Rodrigues) #11590

Enhancements