Split Deletable Blocks checks in two stages. #11127


todor-ivanov
Contributor

@todor-ivanov todor-ivanov commented May 4, 2022

Fixes #11042

Status

ready

Description

With this PR we split the deletableBlocks check in RucioInjector into two steps. First, we fetch all the blocks in Closed state as usual, but we no longer require the whole workflow information to have been cleaned from WMBS. In the second step, we fetch a separate list of workflows which are suitable for deletion, but which will eventually fall under the conditions governed by the archiveDelayHours configuration parameter. We then check which of the blocks found in the first step have actually been produced by an already 'deletable' workflow. Once the intersection of those two lists is made, the final set of blocks to be deleted is passed to the rest of the code as before.
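The two-stage selection described above boils down to an intersection of two independently computed lists. The Python sketch below is a minimal illustration, not the code from this PR: the row shape mirrors the columns returned by the CompletedBlocks query, while the function name `select_deletable_blocks` and the rule for blocks fed by several workflows are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed row shape, mirroring the columns of the CompletedBlocks query:
# (blockname, pnn, path, site, workflow_name)
BlockRow = Tuple[str, str, str, str, str]


def select_deletable_blocks(closed_blocks: Iterable[BlockRow],
                            deletable_workflows: Iterable[str]) -> List[str]:
    """Intersect stage 1 (closed blocks) with stage 2 (deletable workflows).

    Returns the sorted names of blocks whose producing workflows are all
    in the deletable set.
    """
    deletable: Set[str] = set(deletable_workflows)
    producers: Dict[str, Set[str]] = defaultdict(set)
    for blockname, _pnn, _path, _site, workflow in closed_blocks:
        producers[blockname].add(workflow)
    # Conservative assumption: a block fed by several workflows is kept
    # only if every one of them is already deletable.
    return sorted(name for name, wfs in producers.items() if wfs <= deletable)


if __name__ == "__main__":
    rows = [
        ("/Cosmics/RAW#a", "T0_CH_CERN_Disk", "/Cosmics/RAW", "T0_CH_CERN_Disk", "Repack_X"),
        ("/StreamExpressCosmics/DQMIO#b", "T0_CH_CERN_Disk",
         "/StreamExpressCosmics/DQMIO", "T0_CH_CERN_Disk", "Express_Y"),
    ]
    # Repack_X is still held back, so only the Express block survives.
    print(select_deletable_blocks(rows, ["Express_Y"]))  # ['/StreamExpressCosmics/DQMIO#b']
```

Keying on the block name rather than on each (block, workflow) row means a block is never released while one of its producers is still held back; whether the PR applies exactly this rule is not stated here.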

During this check we also apply an additional requirement to each block: before adding it for deletion, we check whether its lifetime exceeds a configurable value. This value is set in the agent configuration through:
config.RucioInjector.blockDeletionDelayHours
and the clock measuring the block lifetime starts at the blockCreateTime. It would have been better to start it at workflow completion instead, but there is no cheap way to fetch that information from RucioInjector, or from the agent as a whole for that matter.
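The lifetime requirement amounts to a simple age comparison. Below is a minimal Python sketch, assuming `blockCreateTime` is a Unix timestamp in seconds and reading "exceeds" as a strict comparison; the function name `block_old_enough` and the injectable `now` argument are hypothetical.

```python
import time
from typing import Optional


def block_old_enough(block_create_time: float, delay_hours: float,
                     now: Optional[float] = None) -> bool:
    """Return True once a block's lifetime exceeds delay_hours.

    block_create_time: assumed to be a Unix timestamp in seconds.
    delay_hours: stands in for config.RucioInjector.blockDeletionDelayHours.
    now: injectable clock for testing; defaults to the current time.
    """
    if now is None:
        now = time.time()
    lifetime_seconds = now - block_create_time
    # Strict comparison: a block exactly delay_hours old is not yet eligible.
    return lifetime_seconds > delay_hours * 3600


if __name__ == "__main__":
    created = 1_651_600_000.0  # arbitrary example timestamp
    print(block_old_enough(created, 24, now=created + 23 * 3600))  # False
    print(block_old_enough(created, 24, now=created + 25 * 3600))  # True
```

Passing `now` explicitly keeps the check deterministic in tests; in normal operation the current time would be used.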

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

None

The PR does introduce a new configuration variable:
config.RucioInjector.blockDeletionDelayHours
However, it is provided through the agent configuration, so no service_config PR has been created for it.

External dependencies / deployment changes

None

@todor-ivanov todor-ivanov requested a review from amaltaro May 4, 2022 10:57
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
  • Python3 Pylint check: failed
    • 18 warnings and errors that must be fixed
    • 6 warnings
    • 13 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13127/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

todor-ivanov commented May 4, 2022

Here is the output of the CompletedBlocks query [1], which I believe is the expected output.
This is for a replay which is in the following state:

  • Repack released && Completed but not archived -> its blocks are Completed && NOT Deleted
  • PromptReco released but NOT Completed -> no output blocks yet

As for the output of the DeletableWorkflows query, I have split the query into pieces:
[2] - All the completed workflows from this replay: only Repack is visible, since PromptReco is not yet completed.
[3] - All the completed workflows whose child workflows are NOT completed: again only Repack is visible, because it has a PromptReco workflow associated with it.
[4] - The full query: no workflows satisfy the full intersection.
[5] - The full query, but without the constraint on child workflow completion.
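The query pieces [2] to [4] can be read as set operations: take the completed workflows [2], remove those whose files still have children in filesets with unfinished subscriptions [3], and what remains is deletable [4]. The Python sketch below mirrors that join logic on toy in-memory data; the `Subscription` tuple, the function name and the toy file IDs are illustrative assumptions, not WMBS APIs. The toy example reproduces the replay state described above: Repack is held back while PromptReco is unfinished.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Subscription(NamedTuple):
    workflow: str   # wmbs_workflow.name of the owning workflow
    fileset: int    # wmbs_subscription.fileset
    finished: bool  # wmbs_subscription.finished


def deletable_workflows(subs: Iterable[Subscription],
                        fileset_files: Dict[int, Set[int]],
                        file_parents: Iterable[Tuple[int, int]]) -> Set[str]:
    """Mirror pieces [2]-[4] of the DeletableWorkflows query as set logic."""
    sub_list = list(subs)

    # [2] Completed workflows: no subscription with finished = 0.
    unfinished = {s.workflow for s in sub_list if not s.finished}
    completed = {s.workflow for s in sub_list} - unfinished

    # wmbs_file_parent as a parent -> children map.
    children: Dict[int, Set[int]] = {}
    for parent, child in file_parents:
        children.setdefault(parent, set()).add(child)

    # Filesets still consumed by at least one unfinished subscription.
    open_filesets = {s.fileset for s in sub_list if not s.finished}

    # [3] Workflows owning a file whose child file sits in an open fileset.
    blocked: Set[str] = set()
    for s in sub_list:
        for parent in fileset_files.get(s.fileset, set()):
            for child in children.get(parent, set()):
                if any(child in fileset_files.get(fs, set()) for fs in open_filesets):
                    blocked.add(s.workflow)

    # [4] Deletable = completed and not blocked by unfinished children.
    return completed - blocked


if __name__ == "__main__":
    files = {1: {10}, 2: {20}, 3: {30}}
    parents = [(10, 20)]  # toy data: file 10 (Repack) is the parent of file 20 (PromptReco)
    subs = [Subscription("Repack", 1, True),
            Subscription("PromptReco", 2, False),
            Subscription("Express", 3, True)]
    print(deletable_workflows(subs, files, parents))  # {'Express'}
    subs[1] = Subscription("PromptReco", 2, True)
    print(sorted(deletable_workflows(subs, files, parents)))  # ['Express', 'PromptReco', 'Repack']
```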

FYI @germanfgv @amaltaro @hufnagel @drkovalskyi

[1]

SELECT dbsbuffer_block.blockname,
       dbsbuffer_location.pnn,
       dbsbuffer_dataset.path,
       dbsbuffer_dataset_subscription.site,
       dbsbuffer_workflow.name
FROM dbsbuffer_dataset_subscription
INNER JOIN dbsbuffer_dataset ON
  dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_block ON
  dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_file ON
  dbsbuffer_file.block_id = dbsbuffer_block.id
INNER JOIN dbsbuffer_workflow ON
  dbsbuffer_workflow.id = dbsbuffer_file.workflow
INNER JOIN dbsbuffer_location ON
  dbsbuffer_location.id = dbsbuffer_block.location
WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
AND dbsbuffer_dataset_subscription.subscribed = 1
AND dbsbuffer_block.status = 'Closed'
AND dbsbuffer_block.deleted = 0
GROUP BY dbsbuffer_block.blockname,
         dbsbuffer_location.pnn,
         dbsbuffer_dataset.path,
         dbsbuffer_dataset_subscription.site,
         dbsbuffer_workflow.name
HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed);

BLOCKNAME								     PNN	     PATH				      SITE	      NAME
---------------------------------------------------------------------------- --------------- ---------------------------------------- --------------- ------------------------------------------------------------------------------------------
/Cosmics/Tier0_REPLAY_2022-v425/RAW#559fa049-e3aa-4ba0-8ce0-333da56e5537     T0_CH_CERN_Disk /Cosmics/Tier0_REPLAY_2022-v425/RAW      T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/HLTPhysics/Tier0_REPLAY_2022-v425/RAW#c143e216-03b9-4a22-b1f1-45a136e819b8  T0_CH_CERN_Disk /HLTPhysics/Tier0_REPLAY_2022-v425/RAW   T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/MinimumBias/Tier0_REPLAY_2022-v425/RAW#5c56ac08-5792-48f5-b6e0-a12addd0f0cb T0_CH_CERN_Disk /MinimumBias/Tier0_REPLAY_2022-v425/RAW  T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/HcalNZS/Tier0_REPLAY_2022-v425/RAW#56efc611-056d-46bb-ae38-ed1311e58d2a     T0_CH_CERN_Disk /HcalNZS/Tier0_REPLAY_2022-v425/RAW      T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/NoBPTX/Tier0_REPLAY_2022-v425/RAW#a7961d75-df91-4fd4-a2ac-248c980819fe      T0_CH_CERN_Disk /NoBPTX/Tier0_REPLAY_2022-v425/RAW       T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
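Assuming `dbsbuffer_workflow.completed` holds 0 or 1 (as the HAVING clause implies), `HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed)` keeps a group only when every joined file row in it comes from a completed workflow entry. A small Python emulation of that GROUP BY/HAVING clause follows, on hypothetical rows with the pnn, path and site grouping columns dropped for brevity; `completed_groups` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Toy join row: ((blockname, workflow_name), completed), one row per file.
# The pnn/path/site grouping columns are omitted here for brevity.
GroupKey = Tuple[str, str]
Row = Tuple[GroupKey, int]


def completed_groups(rows: Iterable[Row]) -> List[GroupKey]:
    """Emulate GROUP BY ... HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed)."""
    count: Dict[GroupKey, int] = defaultdict(int)
    total: Dict[GroupKey, int] = defaultdict(int)
    for key, completed in rows:
        count[key] += 1            # COUNT(*)
        total[key] += completed    # SUM(completed), with completed in {0, 1}
    # Equality holds only if every joined row in the group has completed = 1.
    return [key for key in count if count[key] == total[key]]


if __name__ == "__main__":
    rows = [
        (("/A/RAW#1", "Repack_X"), 1),
        (("/A/RAW#1", "Repack_X"), 1),
        (("/B/RAW#2", "Repack_X"), 1),
        (("/B/RAW#2", "Repack_X"), 0),
    ]
    print(completed_groups(rows))  # [('/A/RAW#1', 'Repack_X')]
```

The same condition could be written as `all(flag == 1 ...)` per group; the counting form is kept to stay close to the SQL.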

[2]

SELECT DISTINCT name FROM wmbs_workflow
WHERE name NOT IN (SELECT DISTINCT ww.name FROM wmbs_workflow ww
                   INNER JOIN wmbs_subscription ws ON
                              ws.workflow = ww.id
                   WHERE ws.finished = 0);

NAME
--------------------------------------------------------------------------------
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114

[3]

SELECT DISTINCT ww.name FROM wmbs_workflow ww
           INNER JOIN wmbs_subscription ws ON
                      ws.workflow = ww.id
           INNER JOIN wmbs_fileset wfs ON
                      wfs.id = ws.fileset
           INNER JOIN wmbs_fileset_files wfsf ON
                      wfsf.fileset = wfs.id
           INNER JOIN wmbs_file_parent wfp ON
                      wfp.parent = wfsf.fileid
           INNER JOIN wmbs_fileset_files child_fileset ON
                      child_fileset.fileid = wfp.child
           INNER JOIN wmbs_subscription child_subscription ON
                      child_subscription.fileset = child_fileset.fileset
           WHERE child_subscription.finished = 0;


NAME
--------------------------------------------------------------------------------
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114

[4]

SELECT DISTINCT wmbs_workflow.name,
                wmbs_workflow.spec,
                wmbs_workflow.id AS workflow_id,
                wmbs_subscription.id AS sub_id
FROM wmbs_subscription
INNER JOIN wmbs_workflow ON
           wmbs_workflow.id = wmbs_subscription.workflow
INNER JOIN (SELECT name FROM wmbs_workflow
            WHERE name NOT IN (SELECT DISTINCT ww.name FROM wmbs_workflow ww
                               INNER JOIN wmbs_subscription ws ON
                                          ws.workflow = ww.id
                               WHERE ws.finished = 0)) complete_workflow ON
           complete_workflow.name = wmbs_workflow.name
WHERE wmbs_workflow.name NOT IN (
           SELECT DISTINCT ww.name FROM wmbs_workflow ww
           INNER JOIN wmbs_subscription ws ON
                      ws.workflow = ww.id
           INNER JOIN wmbs_fileset wfs ON
                      wfs.id = ws.fileset
           INNER JOIN wmbs_fileset_files wfsf ON
                      wfsf.fileset = wfs.id
           INNER JOIN wmbs_file_parent wfp ON
                      wfp.parent = wfsf.fileid
           INNER JOIN wmbs_fileset_files child_fileset ON
                      child_fileset.fileid = wfp.child
           INNER JOIN wmbs_subscription child_subscription ON
                      child_subscription.fileset = child_fileset.fileset
           WHERE child_subscription.finished = 0);

no rows selected

[5]

SELECT DISTINCT wmbs_workflow.name, wmbs_workflow.spec,
           wmbs_workflow.id AS workflow_id, wmbs_subscription.id AS sub_id
    FROM wmbs_subscription
        INNER JOIN wmbs_workflow ON
            wmbs_workflow.id = wmbs_subscription.workflow
        INNER JOIN (SELECT name FROM wmbs_workflow
                    WHERE name NOT IN (
                        SELECT DISTINCT ww.name FROM wmbs_workflow ww
                              INNER JOIN wmbs_subscription ws
                                 ON ws.workflow = ww.id
                        WHERE ws.finished = 0)) complete_workflow ON
            complete_workflow.name = wmbs_workflow.name;

NAME										 SPEC											    WORKFLOW_ID     SUB_ID
-------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------ ----------- ----------
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     45 	45
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     48 	48
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     46 	46
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     47 	47
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     50 	50
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     56 	56
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     58 	58
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     42 	42
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     51 	51
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     54 	54
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     55 	55
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     43 	43
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     49 	49
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     52 	52
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     53 	53
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     57 	57
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     44 	44

17 rows selected.

@todor-ivanov todor-ivanov force-pushed the feature_T0_DisentangleBlockDelition_fix-11042 branch from 929ce18 to 8cc2ac6 May 4, 2022 13:36
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 15 warnings and errors that must be fixed
    • 6 warnings
    • 13 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13128/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

todor-ivanov commented May 4, 2022

And here is the new state:

  • Express released && Completed
  • Repack released && Completed but not archived -> Its blocks are in Completed && NOT Deleted
  • PromptReco released but NOT Completed -> no output blocks yet

Full set of completed blocks:

SELECT dbsbuffer_block.blockname,
       dbsbuffer_location.pnn,
       dbsbuffer_dataset.path,
       dbsbuffer_dataset_subscription.site,
       dbsbuffer_workflow.name
FROM dbsbuffer_dataset_subscription
INNER JOIN dbsbuffer_dataset ON
  dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_block ON
  dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_file ON
  dbsbuffer_file.block_id = dbsbuffer_block.id
INNER JOIN dbsbuffer_workflow ON
  dbsbuffer_workflow.id = dbsbuffer_file.workflow
INNER JOIN dbsbuffer_location ON
  dbsbuffer_location.id = dbsbuffer_block.location
WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
AND dbsbuffer_dataset_subscription.subscribed = 1
AND dbsbuffer_block.status = 'Closed'
AND dbsbuffer_block.deleted = 0
GROUP BY dbsbuffer_block.blockname,
         dbsbuffer_location.pnn,
         dbsbuffer_dataset.path,
         dbsbuffer_dataset_subscription.site,
         dbsbuffer_workflow.name
HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed);

BLOCKNAME								     PNN	     PATH				      SITE	      NAME
---------------------------------------------------------------------------- --------------- ---------------------------------------- --------------- --------------------------------------------------------------------------------
/Cosmics/Tier0_REPLAY_2022-v425/RAW#559fa049-e3aa-4ba0-8ce0-333da56e5537     T0_CH_CERN_Disk /Cosmics/Tier0_REPLAY_2022-v425/RAW      T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/HLTPhysics/Tier0_REPLAY_2022-v425/RAW#c143e216-03b9-4a22-b1f1-45a136e819b8  T0_CH_CERN_Disk /HLTPhysics/Tier0_REPLAY_2022-v425/RAW   T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/StreamExpressCosmics/Tier0_REPLAY_2022-TkAlCosmics0T-Express-v425/ALCARECO# T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
/StreamExpressCosmics/Tier0_REPLAY_2022-SiStripCalZeroBias-Express-v425/ALCA T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
/StreamExpressCosmics/Tier0_REPLAY_2022-PromptCalibProdSiStrip-Express-v425/ T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
/MinimumBias/Tier0_REPLAY_2022-v425/RAW#5c56ac08-5792-48f5-b6e0-a12addd0f0cb T0_CH_CERN_Disk /MinimumBias/Tier0_REPLAY_2022-v425/RAW  T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/HcalNZS/Tier0_REPLAY_2022-v425/RAW#56efc611-056d-46bb-ae38-ed1311e58d2a     T0_CH_CERN_Disk /HcalNZS/Tier0_REPLAY_2022-v425/RAW      T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/NoBPTX/Tier0_REPLAY_2022-v425/RAW#a7961d75-df91-4fd4-a2ac-248c980819fe      T0_CH_CERN_Disk /NoBPTX/Tier0_REPLAY_2022-v425/RAW       T0_CH_CERN_Disk Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
/StreamExpressCosmics/Tier0_REPLAY_2022-SiPixelCalZeroBias-Express-v425/ALCA T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
/StreamExpressCosmics/Tier0_REPLAY_2022-SiStripPCLHistos-Express-v425/ALCARE T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
/ExpressCosmics/Tier0_REPLAY_2022-Express-v425/FEVT#21800b3c-a4b9-41e6-bcbf- T0_CH_CERN_Disk /ExpressCosmics/Tier0_REPLAY_2022-Expres T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
/StreamExpressCosmics/Tier0_REPLAY_2022-Express-v425/DQMIO#decad3e6-9576-4fd T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T0_CH_CERN_Disk Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220

12 rows selected.

Fully completed workflows with dependent workflows NOT completed:

SELECT DISTINCT ww.name FROM wmbs_workflow ww
           INNER JOIN wmbs_subscription ws ON
                      ws.workflow = ww.id
           INNER JOIN wmbs_fileset wfs ON
                      wfs.id = ws.fileset
           INNER JOIN wmbs_fileset_files wfsf ON
                      wfsf.fileset = wfs.id
           INNER JOIN wmbs_file_parent wfp ON
                      wfp.parent = wfsf.fileid
           INNER JOIN wmbs_fileset_files child_fileset ON
                      child_fileset.fileid = wfp.child
           INNER JOIN wmbs_subscription child_subscription ON
                      child_subscription.fileset = child_fileset.fileset
           WHERE child_subscription.finished = 0;

NAME
--------------------------------------------------------------------------------
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503

As expected, both Express and Repack are present.

Full set of truly Deletable Workflows:

SELECT DISTINCT wmbs_workflow.name,
                wmbs_workflow.spec,
                wmbs_workflow.id AS workflow_id,
                wmbs_subscription.id AS sub_id
FROM wmbs_subscription
INNER JOIN wmbs_workflow ON
           wmbs_workflow.id = wmbs_subscription.workflow
INNER JOIN (SELECT name FROM wmbs_workflow
            WHERE name NOT IN (SELECT DISTINCT ww.name FROM wmbs_workflow ww
                               INNER JOIN wmbs_subscription ws ON
                                          ws.workflow = ww.id
                               WHERE ws.finished = 0)) complete_workflow ON
           complete_workflow.name = wmbs_workflow.name
WHERE wmbs_workflow.name NOT IN (
           SELECT DISTINCT ww.name FROM wmbs_workflow ww
           INNER JOIN wmbs_subscription ws ON
                      ws.workflow = ww.id
           INNER JOIN wmbs_fileset wfs ON
                      wfs.id = ws.fileset
           INNER JOIN wmbs_fileset_files wfsf ON
                      wfsf.fileset = wfs.id
           INNER JOIN wmbs_file_parent wfp ON
                      wfp.parent = wfsf.fileid
           INNER JOIN wmbs_fileset_files child_fileset ON
                      child_fileset.fileid = wfp.child
           INNER JOIN wmbs_subscription child_subscription ON
                      child_subscription.fileset = child_fileset.fileset
           WHERE child_subscription.finished = 0);

NAME										 SPEC											    WORKFLOW_ID     SUB_ID
-------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------ ----------- ----------
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     22 	22
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     29 	29
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     30 	30
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     31 	31
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     33 	33
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     23 	23
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     26 	26
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     28 	28
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     32 	32
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     35 	35
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     20 	20
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     24 	24
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     34 	34
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     36 	36
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     19 	19
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     25 	25
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     21 	21
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     27 	27

18 rows selected.

Only Express is present. Repack is held back, as its output is still an input to PromptReco.

FYI @amaltaro @germanfgv

@todor-ivanov todor-ivanov force-pushed the feature_T0_DisentangleBlockDelition_fix-11042 branch from 8cc2ac6 to 933fb2d May 4, 2022 15:36
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 6 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 3 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13131/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

todor-ivanov commented May 4, 2022

We now have the following replay status:

  • Express released && Completed
  • Repack released && Completed && archived -> Its blocks are in Completed && Deleted
  • PromptReco released && Completed -> blocks deleted

Unfortunately, we found that the Repack workflow was archived much earlier than we expected, because we were running with these two patches in place:
#11122
#11127

One targets early archival and late CouchCleanup, while the other targets late archival and early block-level deletions.
And indeed in the logs we find:

2022-05-04 11:36:43,432:139787802531584:DEBUG:CleanCouchPoller:Setting T0 workflow: Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 to status: normal-archived at central CouchDB.Local CouchDB data will be cleaned after 100 hours.

This affected only the visibility of Repack in wmstats. We will rerun the tests once more to be 100% sure of the end result, but the essential part, the early block deletions, did follow the expected behavior. Here are the states in WMBS:

Full set of completed workflows:

SELECT DISTINCT name FROM wmbs_workflow
WHERE name NOT IN (SELECT DISTINCT ww.name FROM wmbs_workflow ww
                   INNER JOIN wmbs_subscription ws ON
                              ws.workflow = ww.id
                   WHERE ws.finished = 0);

NAME
--------------------------------------------------------------------------------
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503

PromptReco is now Completed, so its parent workflow (Repack) no longer appears among the workflows with unfinished child workflows:

SELECT DISTINCT ww.name FROM wmbs_workflow ww
           INNER JOIN wmbs_subscription ws ON
                      ws.workflow = ww.id
           INNER JOIN wmbs_fileset wfs ON
                      wfs.id = ws.fileset
           INNER JOIN wmbs_fileset_files wfsf ON
                      wfsf.fileset = wfs.id
           INNER JOIN wmbs_file_parent wfp ON
                      wfp.parent = wfsf.fileid
           INNER JOIN wmbs_fileset_files child_fileset ON
                      child_fileset.fileid = wfp.child
           INNER JOIN wmbs_subscription child_subscription ON
                      child_subscription.fileset = child_fileset.fileset
           WHERE child_subscription.finished = 0;

NAME
--------------------------------------------------------------------------------
PromptReco_Run350944_HLTPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_111
PromptReco_Run350944_NoBPTX_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118
PromptReco_Run350944_MinimumBias_Tier0_REPLAY_2022_ID220503111413_v425_220503_11
PromptReco_Run350944_HcalNZS_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118

Workflows producing the Completed blocks, all ready for deletion:

SELECT DISTINCT wmbs_workflow.name,
                wmbs_workflow.spec,
                wmbs_workflow.id AS workflow_id,
                wmbs_subscription.id AS sub_id
FROM wmbs_subscription
INNER JOIN wmbs_workflow ON
           wmbs_workflow.id = wmbs_subscription.workflow
INNER JOIN (SELECT name FROM wmbs_workflow
            WHERE name NOT IN (SELECT DISTINCT ww.name FROM wmbs_workflow ww
                               INNER JOIN wmbs_subscription ws ON
                                          ws.workflow = ww.id
                               WHERE ws.finished = 0)) complete_workflow ON
           complete_workflow.name = wmbs_workflow.name
WHERE wmbs_workflow.name NOT IN (
           SELECT DISTINCT ww.name FROM wmbs_workflow ww
           INNER JOIN wmbs_subscription ws ON
                      ws.workflow = ww.id
           INNER JOIN wmbs_fileset wfs ON
                      wfs.id = ws.fileset
           INNER JOIN wmbs_fileset_files wfsf ON
                      wfsf.fileset = wfs.id
           INNER JOIN wmbs_file_parent wfp ON
                      wfp.parent = wfsf.fileid
           INNER JOIN wmbs_fileset_files child_fileset ON
                      child_fileset.fileid = wfp.child
           INNER JOIN wmbs_subscription child_subscription ON
                      child_subscription.fileset = child_fileset.fileset
           WHERE child_subscription.finished = 0);

NAME										 SPEC											    WORKFLOW_ID     SUB_ID
-------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------ ----------- ----------
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     45 	45
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     48 	48
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     61 	61
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     69 	69
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      3 	 3
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      5 	 5
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      9 	 9
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     10 	10
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     15 	15
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     22 	22
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     93 	93
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     60 	60
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     88 	88
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     29 	29
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     30 	30
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     31 	31
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     33 	33
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     46 	46
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     47 	47
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     50 	50
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     56 	56
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     58 	58
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     76 	76
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     77 	77
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     84 	84
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     85 	85
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      8 	 8
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     23 	23
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     26 	26
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     28 	28
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     32 	32
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     35 	35
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     42 	42
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     51 	51
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     64 	64
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     70 	70
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     82 	82
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     83 	83
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     90 	90
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      2 	 2
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     20 	20
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     24 	24
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     92 	92
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     34 	34
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     36 	36
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     54 	54
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     55 	55
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     62 	62
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     68 	68
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     73 	73
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     80 	80
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     81 	81
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     86 	86
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     87 	87
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      1 	 1
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      7 	 7
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     13 	13
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     14 	14
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     19 	19
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     25 	25
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     43 	43
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     49 	49
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     52 	52
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     53 	53
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     57 	57
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     63 	63
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     66 	66
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     67 	67
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     72 	72
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     74 	74
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     79 	79
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     89 	89
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      4 	 4
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     17 	17
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     21 	21
Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID220503111413_v425_220 /data/tier0/admin/Specs/Express_Run350944_StreamExpressCosmics_Tier0_REPLAY_2022_ID2205031	     27 	27
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     91 	91
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     95 	95
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     59 	59
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     65 	65
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     71 	71
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	      6 	 6
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     12 	12
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     16 	16
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     94 	94
Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114 /data/tier0/admin/Specs/Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v4	     44 	44
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     75 	75
PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1118	 /data/tier0/admin/Specs/PromptReco_Run350944_Cosmics_Tier0_REPLAY_2022_ID220503111413_v425	     78 	78
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     11 	11
Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID220503111413_v425_220503 /data/tier0/admin/Specs/Express_Run350944_StreamCalibration_Tier0_REPLAY_2022_ID2205031114	     18 	18

90 rows selected.

And the final list of deletable blocks, taken from the RucioInjector logs:

2022-05-04 17:46:04,425:140503937758976:DEBUG:RucioInjectorPoller:Final deletable blocks dict: {'/Cosmics/Tier0_REPLAY_2022-v425/RAW#559fa049-e3aa-4ba0-8ce0-333da56e5537': {'dataset': '/Cosmics/Tier0_REPLAY_2022-v425/RAW',
                                                                              'location': 'T0_CH_CERN_Disk',
                                                                              'sites': {'T0_CH_CERN_Disk'},
                                                                              'workflowName': 'Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114'},
 '/HLTPhysics/Tier0_REPLAY_2022-v425/RAW#c143e216-03b9-4a22-b1f1-45a136e819b8': {'dataset': '/HLTPhysics/Tier0_REPLAY_2022-v425/RAW',
                                                                                 'location': 'T0_CH_CERN_Disk',
                                                                                 'sites': {'T0_CH_CERN_Disk'},
                                                                                 'workflowName': 'Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114'},
 '/HcalNZS/Tier0_REPLAY_2022-v425/RAW#56efc611-056d-46bb-ae38-ed1311e58d2a': {'dataset': '/HcalNZS/Tier0_REPLAY_2022-v425/RAW',
                                                                              'location': 'T0_CH_CERN_Disk',
                                                                              'sites': {'T0_CH_CERN_Disk'},
                                                                              'workflowName': 'Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114'},
 '/MinimumBias/Tier0_REPLAY_2022-v425/RAW#5c56ac08-5792-48f5-b6e0-a12addd0f0cb': {'dataset': '/MinimumBias/Tier0_REPLAY_2022-v425/RAW',
                                                                                  'location': 'T0_CH_CERN_Disk',
                                                                                  'sites': {'T0_CH_CERN_Disk'},
                                                                                  'workflowName': 'Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114'},
 '/NoBPTX/Tier0_REPLAY_2022-v425/RAW#a7961d75-df91-4fd4-a2ac-248c980819fe': {'dataset': '/NoBPTX/Tier0_REPLAY_2022-v425/RAW',
                                                                             'location': 'T0_CH_CERN_Disk',
                                                                             'sites': {'T0_CH_CERN_Disk'},
                                                                             'workflowName': 'Repack_Run350944_StreamPhysics_Tier0_REPLAY_2022_ID220503111413_v425_220503_1114'}}
2022-05-04 17:46:04,425:140503937758976:INFO:RucioInjectorPoller:Handeling 5 candidate blocks
2022-05-04 17:46:04,454:140503937758976:DEBUG:connectionpool:http://cms-rucio.cern.ch:80 "POST /replicas/datasets_bulk HTTP/1.1" 200 None
2022-05-04 17:46:04,456:140503937758976:DEBUG:RucioInjectorPoller:BlockName: /Cosmics/Tier0_REPLAY_2022-v425/RAW#559fa049-e3aa-4ba0-8ce0-333da56e5537
2022-05-04 17:46:04,456:140503937758976:DEBUG:RucioInjectorPoller:Needed: {'T0_CH_CERN_Disk'} / Available: {'T0_CH_CERN_Disk'}
2022-05-04 17:46:04,487:140503937758976:DEBUG:connectionpool:http://cms-rucio.cern.ch:80 "POST /replicas/datasets_bulk HTTP/1.1" 200 None
2022-05-04 17:46:04,487:140503937758976:DEBUG:RucioInjectorPoller:BlockName: /HLTPhysics/Tier0_REPLAY_2022-v425/RAW#c143e216-03b9-4a22-b1f1-45a136e819b8
...
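For reference, the intersection step from the PR description (closed blocks on one side, workflows that are already deletable on the other) could look roughly like the sketch below. It is not the RucioInjector code: the helper name `intersectDeletable` and the input shapes are hypothetical, modelled only on the fields visible in the log above (`dataset`, `location`, `sites`, `workflowName`), and the block and workflow names are placeholders.

```python
def intersectDeletable(closedBlocks, deletableWorkflows):
    """
    Keep only the closed blocks produced by a workflow that is already deletable.

    :param closedBlocks: dict {blockName: {'dataset': str, 'location': str,
                         'sites': set, 'workflowName': str}}
    :param deletableWorkflows: iterable with the names of deletable workflows
    :return: a new dict with the same structure, restricted to deletable blocks
    """
    deletable = set(deletableWorkflows)
    return {blockName: info for blockName, info in closedBlocks.items()
            if info['workflowName'] in deletable}


# Hypothetical input: two closed blocks, only one from a deletable workflow
closedBlocks = {
    '/Example/Run-v1/RAW#0001': {'dataset': '/Example/Run-v1/RAW',
                                 'location': 'T0_CH_CERN_Disk',
                                 'sites': {'T0_CH_CERN_Disk'},
                                 'workflowName': 'Repack_example'},
    '/Example/Run-v1/AOD#0002': {'dataset': '/Example/Run-v1/AOD',
                                 'location': 'T0_CH_CERN_Disk',
                                 'sites': {'T0_CH_CERN_Disk'},
                                 'workflowName': 'PromptReco_example'},
}
finalBlocks = intersectDeletable(closedBlocks, ['Repack_example'])
print(sorted(finalBlocks))  # ['/Example/Run-v1/RAW#0001']
```

The output keeps the same per-block structure as the input, so whatever consumes the final dict does not need to change.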

@amaltaro @germanfgv

@amaltaro amaltaro left a comment

@todor-ivanov some of this review was done live on Slack, but please clean up the pylint and pep8/pycodestyle issues in the new modules. You will also find a few comments along the lines. Thanks

@germanfgv germanfgv left a comment

@todor-ivanov As you may remember, we need a separate config parameter to control how long we keep the block on disk. I think this should be a straightforward change. Let me know if you have any issues with this.
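According to the PR description, this parameter ended up as `config.RucioInjector.blockDeletionDelayHours`, with the block lifetime measured from `blockCreateTime`. A minimal sketch of such a check follows, assuming `blockCreateTime` is a Unix timestamp in seconds; the helper name `isOldEnough` is hypothetical and not taken from the PR.

```python
import time


def isOldEnough(blockCreateTime, blockDeletionDelayHours, now=None):
    """
    Return True if the block has lived longer than the configured delay.

    :param blockCreateTime: block creation time, as a Unix timestamp (seconds)
    :param blockDeletionDelayHours: minimum block lifetime, in hours
    :param now: optional current time (seconds), mainly for testing
    """
    now = time.time() if now is None else now
    return (now - blockCreateTime) > blockDeletionDelayHours * 3600


# A block created exactly 24h ago is not yet old enough for a 24h delay
print(isOldEnough(0, 24, now=24 * 3600))      # False
print(isOldEnough(0, 24, now=24 * 3600 + 1))  # True
```

The strict comparison mirrors the "lifetime is bigger than a certain configurable value" wording in the description.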

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 16 warnings and errors that must be fixed
    • 6 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 3 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13215/artifact/artifacts/PullRequestReport.html

@todor-ivanov

Hi @amaltaro, I requested your review again, but it was a bit early: I am still working on the pylint fixes, and the last configuration parameter @germanfgv mentioned is still missing. Still, please take a quick look and check whether the changes I made to the DAO parsing method look sound.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13217/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 6 warnings
    • 26 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13216/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 19 warnings and errors that must be fixed
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13223/artifact/artifacts/PullRequestReport.html

@todor-ivanov force-pushed the feature_T0_DisentangleBlockDelition_fix-11042 branch from b5fbac0 to bc00367 on May 17, 2022 14:13
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 17 warnings and errors that must be fixed
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13224/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13227/artifact/artifacts/PullRequestReport.html

@todor-ivanov

@amaltaro please go ahead and proceed with your review.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 7 tests added
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13230/artifact/artifacts/PullRequestReport.html


@amaltaro amaltaro left a comment

This is looking good, Todor. However, I have many comments and questions that might need further follow-up. Please also have a look at the usual Jenkins report; there are a few minor things you should take into account for the new modules.

AND dbsbuffer_dataset_subscription.subscribed = 1
AND dbsbuffer_block.status = 'Closed'
AND dbsbuffer_block.deleted = 0
GROUP BY dbsbuffer_block.blockname,

Do we need to group the output for faster post-processing? Or is it just a live visualization enhancement?

@todor-ivanov todor-ivanov May 19, 2022

No visualization is involved at all.
The GROUP BY clause collapses rows with identical values in all grouping columns into a single record, so no duplicate records are returned. It is also needed for the HAVING clause below, which is evaluated per group.

By construction, I would say we will never have a duplicate row for a given blockname + pnn. I am not an SQL expert though and I could be wrong. But if I am right, this would make this query faster and cleaner.

No. Removing the GROUP BY statement would make that query dangerous!
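One likely reason, assuming the DAO joins down to `dbsbuffer_file` as in the query shown later in this thread: that join produces one row per file, so without grouping the same blockname + pnn pair is returned once for every file in the block. The sketch below reproduces this with Python's built-in `sqlite3` and a heavily simplified, hypothetical subset of the tables (the production schema has more columns and runs on Oracle).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# Hypothetical, simplified schema: only the columns needed for this demo
cur.executescript("""
CREATE TABLE dbsbuffer_block (id INTEGER PRIMARY KEY, blockname TEXT);
CREATE TABLE dbsbuffer_workflow (id INTEGER PRIMARY KEY, name TEXT, completed INTEGER);
CREATE TABLE dbsbuffer_file (id INTEGER PRIMARY KEY, block_id INTEGER, workflow INTEGER);
INSERT INTO dbsbuffer_block VALUES (1, '/Example/Run-v1/RAW#0001');
INSERT INTO dbsbuffer_workflow VALUES (1, 'Repack_example', 1);
-- three files in the same block, all from the same workflow
INSERT INTO dbsbuffer_file VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1);
""")

joins = """
FROM dbsbuffer_block
INNER JOIN dbsbuffer_file ON dbsbuffer_file.block_id = dbsbuffer_block.id
INNER JOIN dbsbuffer_workflow ON dbsbuffer_workflow.id = dbsbuffer_file.workflow
"""

# Without GROUP BY: one row per file, i.e. the block is returned three times
flat = cur.execute("SELECT dbsbuffer_block.blockname" + joins).fetchall()

# With GROUP BY: one row per block, with the aggregates computed per block
grouped = cur.execute(
    "SELECT dbsbuffer_block.blockname, COUNT(*), SUM(dbsbuffer_workflow.completed)"
    + joins + "GROUP BY dbsbuffer_block.blockname").fetchall()

print(len(flat))  # 3
print(grouped)    # [('/Example/Run-v1/RAW#0001', 3, 3)]
```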

dbsbuffer_dataset_subscription.site,
dbsbuffer_workflow.name,
dbsbuffer_block.create_time
HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed)

Can you please educate me on why we need this check as well? If I understand it properly, it says: the amount of dbsbuffer blocks matching all these constraints must be equal to the amount of completed workflows in dbsbuffer. Is that correct?
And a completed workflow in dbsbuffer comes from the fact that all work units (wmbs records) have been processed in that workflow, right?

the amount of dbsbuffer blocks matching all these constraints must be equal to the amount of completed workflows in dbsbuffer. Is that correct?

Yes

And a completed workflow in dbsbuffer comes from the fact that all work units (wmbs records) have been processed in that workflow, right?

Again, Yes.

My understanding is that this basically ensures the returned records include only blocks related to workflows marked as completed in the dbsbuffer_workflow table. I did not change this part; it comes from the old DAO. So, just to be 100% sure we are not messing things up here, I'd like to hear @hufnagel's opinion as well.
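To make the semantics concrete: assuming `dbsbuffer_workflow.completed` only holds 0 or 1, the HAVING clause is evaluated separately for each group (each blockname, pnn, path, site and workflow name combination), and `COUNT(*) = SUM(completed)` holds only when every joined row in that group has `completed = 1`. A small pure-Python equivalent, with a hypothetical helper name and made-up rows:

```python
from collections import defaultdict


def fullyCompletedGroups(rows):
    """
    Mimic 'GROUP BY key HAVING COUNT(*) = SUM(completed)'.

    :param rows: iterable of (groupKey, completed) tuples, completed in {0, 1}
    :return: sorted list of the group keys whose rows are all completed
    """
    counts = defaultdict(int)
    sums = defaultdict(int)
    for key, completed in rows:
        counts[key] += 1        # COUNT(*)
        sums[key] += completed  # SUM(completed)
    return sorted(key for key in counts if counts[key] == sums[key])


rows = [('blockA', 1), ('blockA', 1), ('blockB', 1), ('blockB', 0)]
print(fullyCompletedGroups(rows))  # ['blockA']
```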

@amaltaro amaltaro May 19, 2022

Yes, I think you are right! But given that we already join dbsbuffer_workflow, I wonder why we don't simply add an extra AND clause to the WHERE block with the constraint dbsbuffer_workflow.completed = 1 (or whatever value it's supposed to have).
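The two forms only differ if a single group can mix completed and non-completed rows, for example (an assumption about the schema) when several `dbsbuffer_workflow` entries, such as one per task, share the same workflow name. In that case a WHERE filter silently drops the incomplete rows and still returns the block, while the HAVING check rejects the whole group. A minimal `sqlite3` illustration with a single flattened, hypothetical table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# Hypothetical flattened join result: one block whose files come from two
# dbsbuffer_workflow rows with the same name, only one of them completed
cur.executescript("""
CREATE TABLE block_files (blockname TEXT, name TEXT, completed INTEGER);
INSERT INTO block_files VALUES
    ('/Example/Run-v1/RAW#0001', 'Repack_example', 1),
    ('/Example/Run-v1/RAW#0001', 'Repack_example', 0);
""")

# Filtering rows first: the incomplete row is dropped, the block survives
viaWhere = cur.execute("""
    SELECT blockname FROM block_files
    WHERE completed = 1
    GROUP BY blockname, name""").fetchall()

# Checking the whole group: one incomplete row rejects the block
viaHaving = cur.execute("""
    SELECT blockname FROM block_files
    GROUP BY blockname, name
    HAVING COUNT(*) = SUM(completed)""").fetchall()

print(viaWhere)   # [('/Example/Run-v1/RAW#0001',)]
print(viaHaving)  # []
```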

Thanks for going through this once again, @amaltaro.

Let's talk with examples here.
Here is what this query currently returns for a completely finished replay.

SQL> SELECT
       count(*),
       SUM(dbsbuffer_workflow.completed),
       dbsbuffer_block.blockname,
       dbsbuffer_location.pnn,
       dbsbuffer_dataset.path,
       dbsbuffer_dataset_subscription.site,
       dbsbuffer_workflow.name
FROM dbsbuffer_dataset_subscription
INNER JOIN dbsbuffer_dataset ON
  dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_block ON
  dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_file ON
  dbsbuffer_file.block_id = dbsbuffer_block.id
INNER JOIN dbsbuffer_workflow ON
  dbsbuffer_workflow.id = dbsbuffer_file.workflow
INNER JOIN dbsbuffer_location ON
  dbsbuffer_location.id = dbsbuffer_block.location
WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
AND dbsbuffer_dataset_subscription.subscribed = 1
AND dbsbuffer_block.status = 'Closed'
AND dbsbuffer_block.deleted = 0
GROUP BY dbsbuffer_block.blockname,
         dbsbuffer_location.pnn,
         dbsbuffer_dataset.path,
         dbsbuffer_dataset_subscription.site,
         dbsbuffer_workflow.name
HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed);

  COUNT(*) SUM(DBSBUFFER_WORKFLOW.COMPLETED) BLOCKNAME									      PNN	      PATH				       SITE	       NAME
---------- --------------------------------- -------------------------------------------------------------------------------- --------------- ---------------------------------------- --------------- -------------------------------
	 1				   1 /TestEnablesEcalHcal/Tier0_REPLAY_2022-Express-v425/RAW#c0ee5da6-3616-419b-81ba- T0_CH_CERN_Disk /TestEnablesEcalHcal/Tier0_REPLAY_2022-E T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-SiStripPCLHistos-Express-v425/ALCARECO#e T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /MinimumBias/Tier0_REPLAY_2022-v425/RAW#961b6dcb-7514-4adc-b26c-372792c977fc     T0_CH_CERN_Disk /MinimumBias/Tier0_REPLAY_2022-v425/RAW  T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /L1Accept/Tier0_REPLAY_2022-v425/RAW#575557fc-e92c-45c6-9103-c93572d3538b	      T0_CH_CERN_Disk /L1Accept/Tier0_REPLAY_2022-v425/RAW     T2_CH_CERN      Repack_Run351572_StreamNanoDST_
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-SiPixelCalZeroBias-Express-v425/ALCARECO T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /HcalNZS/Tier0_REPLAY_2022-v425/RAW#32396414-57a4-4188-a6f6-75a208bd4c8f	      T0_CH_CERN_Disk /HcalNZS/Tier0_REPLAY_2022-v425/RAW      T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /NoBPTX/Tier0_REPLAY_2022-v425/RAW#0bbb5f07-c8c2-4725-8251-79c89354e272	      T0_CH_CERN_Disk /NoBPTX/Tier0_REPLAY_2022-v425/RAW       T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /StreamCalibration/Tier0_REPLAY_2022-PromptCalibProdEcalPedestals-Express-v425/A T0_CH_CERN_Disk /StreamCalibration/Tier0_REPLAY_2022-Pro T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-PromptCalibProdSiStrip-Express-v425/ALCA T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /HLTPhysics/Tier0_REPLAY_2022-v425/RAW#0f75ea20-d421-4946-b366-63655c5bb3a3      T0_CH_CERN_Disk /HLTPhysics/Tier0_REPLAY_2022-v425/RAW   T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /StreamCalibration/Tier0_REPLAY_2022-Express-v425/DQMIO#6082423a-08ad-44b3-8f9e- T0_CH_CERN_Disk /StreamCalibration/Tier0_REPLAY_2022-Exp T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-SiStripCalZeroBias-Express-v425/ALCARECO T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 4				   4 /StreamExpressCosmics/Tier0_REPLAY_2022-Express-v425/DQMIO#bfdf5c4d-003a-41a1-bd T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /StreamCalibration/Tier0_REPLAY_2022-EcalTestPulsesRaw-Express-v425/ALCARECO#184 T0_CH_CERN_Disk /StreamCalibration/Tier0_REPLAY_2022-Eca T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-TkAlCosmics0T-Express-v425/ALCARECO#880f T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 5				   5 /ExpressCosmics/Tier0_REPLAY_2022-Express-v425/FEVT#4a678f2c-3945-41cf-b06c-faef T0_CH_CERN_Disk /ExpressCosmics/Tier0_REPLAY_2022-Expres T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /StreamExpressCosmics/Tier0_REPLAY_2022-Express-v425/DQMIO#cec89541-78f7-44a4-aa T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /Cosmics/Tier0_REPLAY_2022-v425/RAW#9d334e74-6687-4670-8f38-f95c720f8287	      T0_CH_CERN_Disk /Cosmics/Tier0_REPLAY_2022-v425/RAW      T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /RPCMonitor/Tier0_REPLAY_2022-v425/RAW#c0abae4e-71a4-4b43-8043-e5e5ee1d6a90      T0_CH_CERN_Disk /RPCMonitor/Tier0_REPLAY_2022-v425/RAW   T2_CH_CERN      Repack_Run351572_StreamRPCMON_T

19 rows selected.

And if I understand your suggestion correctly, the change should be something like:

SELECT
       count(*),
       SUM(dbsbuffer_workflow.completed),
       dbsbuffer_block.blockname,
       dbsbuffer_location.pnn,
       dbsbuffer_dataset.path,
       dbsbuffer_dataset_subscription.site,
       dbsbuffer_workflow.name
FROM dbsbuffer_dataset_subscription
INNER JOIN dbsbuffer_dataset ON
  dbsbuffer_dataset.id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_block ON
  dbsbuffer_block.dataset_id = dbsbuffer_dataset_subscription.dataset_id
INNER JOIN dbsbuffer_file ON
  dbsbuffer_file.block_id = dbsbuffer_block.id
INNER JOIN dbsbuffer_workflow ON
  dbsbuffer_workflow.id = dbsbuffer_file.workflow
INNER JOIN dbsbuffer_location ON
  dbsbuffer_location.id = dbsbuffer_block.location
WHERE dbsbuffer_dataset_subscription.delete_blocks = 1
  AND dbsbuffer_dataset_subscription.subscribed = 1
  AND dbsbuffer_block.status = 'Closed'
  AND dbsbuffer_block.deleted = 0
  AND dbsbuffer_workflow.completed = 1
GROUP BY dbsbuffer_block.blockname,
         dbsbuffer_location.pnn,
         dbsbuffer_dataset.path,
         dbsbuffer_dataset_subscription.site,
         dbsbuffer_workflow.name;

  COUNT(*) SUM(DBSBUFFER_WORKFLOW.COMPLETED) BLOCKNAME									      PNN	      PATH				       SITE	       NAME
---------- --------------------------------- -------------------------------------------------------------------------------- --------------- ---------------------------------------- --------------- -------------------------------
	 1				   1 /TestEnablesEcalHcal/Tier0_REPLAY_2022-Express-v425/RAW#c0ee5da6-3616-419b-81ba- T0_CH_CERN_Disk /TestEnablesEcalHcal/Tier0_REPLAY_2022-E T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-SiStripPCLHistos-Express-v425/ALCARECO#e T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /MinimumBias/Tier0_REPLAY_2022-v425/RAW#961b6dcb-7514-4adc-b26c-372792c977fc     T0_CH_CERN_Disk /MinimumBias/Tier0_REPLAY_2022-v425/RAW  T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /L1Accept/Tier0_REPLAY_2022-v425/RAW#575557fc-e92c-45c6-9103-c93572d3538b	      T0_CH_CERN_Disk /L1Accept/Tier0_REPLAY_2022-v425/RAW     T2_CH_CERN      Repack_Run351572_StreamNanoDST_
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-SiPixelCalZeroBias-Express-v425/ALCARECO T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /HcalNZS/Tier0_REPLAY_2022-v425/RAW#32396414-57a4-4188-a6f6-75a208bd4c8f	      T0_CH_CERN_Disk /HcalNZS/Tier0_REPLAY_2022-v425/RAW      T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /NoBPTX/Tier0_REPLAY_2022-v425/RAW#0bbb5f07-c8c2-4725-8251-79c89354e272	      T0_CH_CERN_Disk /NoBPTX/Tier0_REPLAY_2022-v425/RAW       T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /StreamCalibration/Tier0_REPLAY_2022-PromptCalibProdEcalPedestals-Express-v425/A T0_CH_CERN_Disk /StreamCalibration/Tier0_REPLAY_2022-Pro T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-PromptCalibProdSiStrip-Express-v425/ALCA T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /HLTPhysics/Tier0_REPLAY_2022-v425/RAW#0f75ea20-d421-4946-b366-63655c5bb3a3      T0_CH_CERN_Disk /HLTPhysics/Tier0_REPLAY_2022-v425/RAW   T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /StreamCalibration/Tier0_REPLAY_2022-Express-v425/DQMIO#6082423a-08ad-44b3-8f9e- T0_CH_CERN_Disk /StreamCalibration/Tier0_REPLAY_2022-Exp T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-SiStripCalZeroBias-Express-v425/ALCARECO T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 4				   4 /StreamExpressCosmics/Tier0_REPLAY_2022-Express-v425/DQMIO#bfdf5c4d-003a-41a1-bd T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /StreamCalibration/Tier0_REPLAY_2022-EcalTestPulsesRaw-Express-v425/ALCARECO#184 T0_CH_CERN_Disk /StreamCalibration/Tier0_REPLAY_2022-Eca T2_CH_CERN      Express_Run351572_StreamCalibra
	 5				   5 /StreamExpressCosmics/Tier0_REPLAY_2022-TkAlCosmics0T-Express-v425/ALCARECO#880f T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 5				   5 /ExpressCosmics/Tier0_REPLAY_2022-Express-v425/FEVT#4a678f2c-3945-41cf-b06c-faef T0_CH_CERN_Disk /ExpressCosmics/Tier0_REPLAY_2022-Expres T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /StreamExpressCosmics/Tier0_REPLAY_2022-Express-v425/DQMIO#cec89541-78f7-44a4-aa T0_CH_CERN_Disk /StreamExpressCosmics/Tier0_REPLAY_2022- T2_CH_CERN      Express_Run351572_StreamExpress
	 1				   1 /Cosmics/Tier0_REPLAY_2022-v425/RAW#9d334e74-6687-4670-8f38-f95c720f8287	      T0_CH_CERN_Disk /Cosmics/Tier0_REPLAY_2022-v425/RAW      T2_CH_CERN      Repack_Run351572_StreamPhysics_
	 1				   1 /RPCMonitor/Tier0_REPLAY_2022-v425/RAW#c0abae4e-71a4-4b43-8043-e5e5ee1d6a90      T0_CH_CERN_Disk /RPCMonitor/Tier0_REPLAY_2022-v425/RAW   T2_CH_CERN      Repack_Run351572_StreamRPCMON_T

19 rows selected.

The output for a fully completed replay does indeed look equivalent. It still needs to be tested on a running replay with PromptReco paused, though.

To me, the difference between these two queries is the following:

  • in the former, the completed=1 requirement is applied after aggregation, to each of the groups produced by the GROUP BY clause (via HAVING COUNT(*) = SUM(completed));
  • while in the latter, the condition is applied in the WHERE clause, i.e. to individual rows before grouping and aggregation.

This could matter for a running workflow.
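The WHERE-versus-HAVING distinction above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. It uses a toy, hypothetical files table (not the real dbsbuffer schema) and groups by block only, to keep the difference visible. Since a fully completed group passes both filters and a fully incomplete one fails both, the two forms can only disagree on a group that mixes completed and not-completed rows: HAVING drops that group entirely, while WHERE still returns it, counting only its completed rows.

```python
import sqlite3

# Toy, hypothetical data: one row per file, with the block it belongs to and
# whether the workflow that produced it is completed (1) or not (0).
ROWS = [
    ("blockA", 1), ("blockA", 1),  # all files from completed workflows
    ("blockB", 1), ("blockB", 0),  # mixed: one file from a running workflow
    ("blockC", 0),                 # nothing completed yet
]

# Former form: filter whole groups AFTER aggregation. A group survives only
# if every one of its rows has completed = 1.
HAVING_QUERY = """
    SELECT blockname, COUNT(*), SUM(completed) FROM files
    GROUP BY blockname
    HAVING COUNT(*) = SUM(completed)
    ORDER BY blockname
"""

# Latter form: filter individual rows BEFORE grouping. Incomplete rows are
# discarded, but the remaining rows of a mixed group are still returned.
WHERE_QUERY = """
    SELECT blockname, COUNT(*), SUM(completed) FROM files
    WHERE completed = 1
    GROUP BY blockname
    ORDER BY blockname
"""


def run(query):
    """Load ROWS into a fresh in-memory database and return the query results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE files (blockname TEXT, completed INTEGER)")
        conn.executemany("INSERT INTO files VALUES (?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


having_rows = run(HAVING_QUERY)
where_rows = run(WHERE_QUERY)
print("HAVING:", having_rows)  # only blockA
print("WHERE: ", where_rows)   # blockA and blockB
```

Whether such mixed groups can actually occur with the real GROUP BY (which also includes dbsbuffer_workflow.name) is what testing against a running replay would show.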


"""

from __future__ import division
Contributor

same comment here

Contributor Author

done

Contributor

I guess you missed this (please remove both future imports).

Contributor Author

ok

Contributor

another ping

@todor-ivanov
Contributor Author

Thanks @amaltaro for your review. I think I have addressed all your comments. Please take another look.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13240/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro left a comment

Todor, please remove the future imports (see the inline comments).
I also have a few further comments/questions, so that I properly understand it.


"""

from __future__ import print_function
Contributor

This should have been deleted as well. In short, we no longer need to import anything from __future__ or future.


class GetCompletedBlocks(DBFormatter):
"""
Retrieves a list of blocks that are closed but NOT sure yet if they are deletedable:
Contributor

Typo here (and a few misspellings in the lines below).

AND dbsbuffer_dataset_subscription.subscribed = 1
AND dbsbuffer_block.status = 'Closed'
AND dbsbuffer_block.deleted = 0
GROUP BY dbsbuffer_block.blockname,
Contributor

By construction, I would say we will never have a duplicate row for a given blockname + pnn. I am not an SQL expert, though, and I could be wrong. But if I am right, this would make the query faster and cleaner.
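One way to verify that assumption against real data before trimming the GROUP BY is to group exactly as the reviewed query does and then count how many groups share each (blockname, pnn) pair; any count above 1 means the shorter GROUP BY would merge rows that are currently distinct. Below is a minimal sketch with Python's sqlite3 on a hypothetical, flattened table standing in for the joined dbsbuffer rows. The block names and workflows are invented for illustration, and the second block is deliberately given files from two workflows so the check has something to flag.

```python
import sqlite3

# Hypothetical flattened rows: (blockname, pnn, path, site, workflow).
ROWS = [
    ("/A/B/RAW#1", "T0_CH_CERN_Disk", "/A/B/RAW", "T2_CH_CERN", "Repack_1"),
    ("/A/B/RAW#1", "T0_CH_CERN_Disk", "/A/B/RAW", "T2_CH_CERN", "Repack_1"),
    ("/C/D/RAW#2", "T0_CH_CERN_Disk", "/C/D/RAW", "T2_CH_CERN", "Repack_2"),
    ("/C/D/RAW#2", "T0_CH_CERN_Disk", "/C/D/RAW", "T2_CH_CERN", "Express_2"),
]

# Inner query: the same five-column GROUP BY as the reviewed SQL.
# Outer query: report (blockname, pnn) pairs that span more than one group.
CHECK = """
    SELECT blockname, pnn, COUNT(*) AS n_groups
    FROM (
        SELECT blockname, pnn FROM joined
        GROUP BY blockname, pnn, path, site, workflow
    ) AS g
    GROUP BY blockname, pnn
    HAVING COUNT(*) > 1
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (blockname, pnn, path, site, workflow)")
conn.executemany("INSERT INTO joined VALUES (?, ?, ?, ?, ?)", ROWS)
duplicates = conn.execute(CHECK).fetchall()
conn.close()

# An empty list would support the "never a duplicate" assumption.
print(duplicates)
```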

dbsbuffer_dataset_subscription.site,
dbsbuffer_workflow.name,
dbsbuffer_block.create_time
HAVING COUNT(*) = SUM(dbsbuffer_workflow.completed)
Contributor

@amaltaro May 19, 2022

Yes, I think you are right! But given that we already join dbsbuffer_workflow, I wonder why we don't add an extra AND clause to the WHERE block with the constraint dbsbuffer_workflow.completed=1 (or whatever value it's supposed to have).

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13241/artifact/artifacts/PullRequestReport.html

@todor-ivanov force-pushed the feature_T0_DisentangleBlockDelition_fix-11042 branch from 0d29758 to 4ef1e42 May 20, 2022 13:56
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13242/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

Thanks @amaltaro. Other than the general change to the SQL query structure (which I think needs further discussion), I followed your comments. Please take another look.

Contributor

@amaltaro left a comment

I left another comment or two inline for your consideration. It also looks like you haven't looked at the pycodestyle report; please fix this:

src/python/WMComponent/RucioInjector/Database/MySQL/GetCompletedBlocks.py
    Line 88, E225 missing whitespace around operator

Regarding the SQL query: it has been tested, and we do not know how much of an improvement the change would buy us. So let's leave that open for a possible future discussion (use of the GROUP BY and HAVING clauses).

Once you have made further changes, if you want to get this merged, please:

  • squash your commits accordingly
  • remove those labels.

@@ -82,6 +82,8 @@ def __init__(self, config):

self.useDsetReplicaDeep = getattr(config.RucioInjector, "useDsetReplicaDeep", False)
self.delBlockSlicesize = getattr(config.RucioInjector, "delBlockSlicesize", 100)
self.blockDeletionDelayHours = getattr(config.RucioInjector, "blockDeletionDelayHours", 0)
Contributor

Just a note: you don't need to update the code, but given that you don't use this variable (blockDeletionDelayHours) anywhere in the class, you could simply drop the self (i.e. not make it an instance attribute).
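For illustration, here is a minimal sketch of how that value is meant to be used, following the PR description: a block becomes eligible for deletion only once its lifetime, measured from its creation time, exceeds blockDeletionDelayHours. It reads the setting into a plain local variable, in the spirit of the note above. The filterOldBlocks helper, the blockName/blockCreateTime dictionary keys and the SimpleNamespace stand-in for the agent config are hypothetical, not WMCore's actual code.

```python
import time
from types import SimpleNamespace


def filterOldBlocks(blocks, config, now=None):
    """Hypothetical helper: keep blocks whose lifetime, counted from their
    creation time, is longer than blockDeletionDelayHours."""
    # Same lookup as in the diff (default 0 hours). A plain local variable
    # is enough because nothing outside this function needs the value.
    blockDeletionDelayHours = getattr(config.RucioInjector, "blockDeletionDelayHours", 0)
    now = time.time() if now is None else now
    minLifetimeSecs = blockDeletionDelayHours * 3600
    return [blk for blk in blocks if now - blk["blockCreateTime"] > minLifetimeSecs]


# Usage with a stand-in for the agent configuration object.
config = SimpleNamespace(RucioInjector=SimpleNamespace(blockDeletionDelayHours=24))
t0 = 1_000_000  # arbitrary epoch seconds
blocks = [
    {"blockName": "/A/B/RAW#old", "blockCreateTime": t0},
    {"blockName": "/A/B/RAW#new", "blockCreateTime": t0 + 23 * 3600},
]
now = t0 + 25 * 3600  # the first block is 25 h old, the second only 2 h
kept = filterOldBlocks(blocks, config, now=now)
print([blk["blockName"] for blk in kept])  # ['/A/B/RAW#old']
```

With the setting absent from the config, the default of 0 hours lets every block created before now through, matching the getattr default in the diff.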

Fix log message

Typo

Change DAO parsing method.

Include files left behind from previous commit.

Fix docstring

Add blockDeletionDelayHours.

Typo

Pylint changes

Bugfix - missed plural in variable name mapping

Review comments

Review comments 2

Review comments 2
@todor-ivanov
Contributor Author

Hi @amaltaro, the changes are finalized and the commits squashed. Please go ahead and merge at your convenience.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13243/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 6 warnings
    • 25 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/13244/artifact/artifacts/PullRequestReport.html

@amaltaro merged commit 43824d7 into dmwm:master May 21, 2022
Development

Successfully merging this pull request may close these issues.

T0: Disentangle Blocklevel rules preservation from workflow archival time
4 participants