Handle gfal exceptions while listing baseDirEntry && Avoid extra stat operations during recursion #11794


todor-ivanov
Contributor

@todor-ivanov todor-ivanov commented Nov 14, 2023

Fixes #11793
Fixes #11701

Status

READY

Description

As a natural consequence of how the operations are reordered for this bug fix, we also improve the MSUnmerged service in at least three respects:

  • Handle the File Not Found exception explicitly during the baseDirEntry listing instead of relying on the stat operation to catch a missing file.
  • Avoid extra stat operations during recursive calls when we are already sure the entry point is a directory.
  • Try to remove the whole nonempty directory as early as possible. This complements the attempt to minimize recursive operations introduced with this PR: Try to remove the base directory first and avoid recursive operations #11781
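The first two points can be illustrated with a minimal sketch. It is not the code from this PR: `FakeContext`, `FakeStorageError` and `purgeTree` are hypothetical stand-ins (an in-memory storage and an errno-carrying exception) so that the example runs without gfal. It only assumes that a missing entry surfaces as an `ENOENT`-like error code at listing time.

```python
import errno


class FakeStorageError(Exception):
    """Hypothetical stand-in for a storage exception carrying an errno-like code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class FakeContext:
    """Hypothetical in-memory storage; counts stat calls so savings are visible."""

    def __init__(self, dirs, files):
        self.dirs = set(dirs)
        self.files = set(files)
        self.statCalls = 0

    def _children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.dirs | self.files
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def stat(self, path):
        self.statCalls += 1
        if path in self.dirs:
            return "dir"
        if path in self.files:
            return "file"
        raise FakeStorageError(errno.ENOENT, "No such entry: %s" % path)

    def listdir(self, path):
        if path not in self.dirs:
            raise FakeStorageError(errno.ENOENT, "No such directory: %s" % path)
        return self._children(path)

    def unlink(self, path):
        self.files.remove(path)

    def rmdir(self, path):
        if self._children(path):
            raise FakeStorageError(errno.ENOTEMPTY, "Not empty: %s" % path)
        self.dirs.remove(path)


def purgeTree(ctx, path, isDirEntry=False):
    """Remove path and everything below it; return True once it is gone."""
    if not isDirEntry:
        # Only stat when the caller does not already know the entry type.
        try:
            if ctx.stat(path) == "file":
                ctx.unlink(path)
                return True
        except FakeStorageError as exc:
            if exc.code == errno.ENOENT:
                return True
            raise
    try:
        children = ctx.listdir(path)
    except FakeStorageError as exc:
        # A missing directory is handled here, at listing time.
        if exc.code == errno.ENOENT:
            return True
        raise
    for name in children:
        child = path + "/" + name
        if ctx.stat(child) == "dir":
            # The type is already known, so the recursive call skips its own stat.
            purgeTree(ctx, child, isDirEntry=True)
        else:
            ctx.unlink(child)
    ctx.rmdir(path)
    return True
```

In this sketch each child is stat'ed exactly once in the parent's loop, and a directory that has already disappeared is treated as successfully removed rather than as a failure.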

In addition:

  • Fix the missing timestamps in the REST interface for all RSE documents in MongoDB (this is a follow-up fix for a minor mistake I made in the previous fix related to RSE timestamps, for which I decided not to create a separate issue).

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

This PR also contributes to the fix of issue #11701, which was partially fixed by #11781.

External dependencies / deployment changes

None

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 3 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 warnings
    • 35 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 3 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14619/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

Hi @vkuznet, thanks for this review. I have answered your comments inline. If you think anything else is worth attention please let me know.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 warnings
    • 35 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14626/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

todor-ivanov commented Nov 15, 2023

Hi @amaltaro @vkuznet with my latest commit I actually found a way to reduce recursive operations during the RSE cleaning process even more.

With the changes suggested here: #1178 we delete the whole branch of empty directories under a specific entry in the site's namespace. For some sites, however, we can use WebDAV to try to remove the whole branch even before emptying it of its contents. This was previously impossible with SRMv2, which is what drove us into those recursive operations.

My latest commit already does this, and for the sites that allow the operation it is an extreme speedup. This one is a real game changer (again, only for the sites where it works).

I tested it with T2_IT_Legnaro: https://cmsweb.cern.ch/ms-unmerged/data/info?rse=T2_IT_Legnaro&detail=True and the results were really good. The runtime was reduced to just a few minutes.

It also reduces the logs [1], because we no longer dump the deletion result of every file in the directory when we succeed in removing it from the top. Unfortunately, nothing changes for those that still enter the recursive operations.
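The ordering described above can be sketched roughly as follows. This is only an illustration, not the actual `cleanRSE()` code: `cleanDir` and the three callables passed to it (`rmDir`, `deleteFiles`, `purgeTree`) are hypothetical names, and the default slice size of 100 is taken from the log output in this thread.

```python
def cleanDir(rmDir, deleteFiles, purgeTree, dirPfn, filePfns, sliceSize=100):
    """
    Try the cheapest removal first and fall back step by step.
    Returns the name of the step that removed the directory, or None.
    All three callables are hypothetical and return True on success.
    """
    # Step 1: try to remove the whole nonempty directory in one call.
    if rmDir(dirPfn):
        return "early"

    # Step 2: delete the known files in slices, then retry the directory.
    for start in range(0, len(filePfns), sliceSize):
        deleteFiles(filePfns[start:start + sliceSize])
    if rmDir(dirPfn):
        return "afterFiles"

    # Step 3: last resort, recursively purge whatever is left.
    if purgeTree(dirPfn):
        return "purge"
    return None
```

When step 1 succeeds, no per-file call is made at all, which is where both the runtime and the log volume are saved.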

[1]
Initial Run:

In [1]: msConfig['enableRealMode'] = True

In [2]: %page rse

In [3]: rse = msUnmerged.cleanRSE(rse)


2023-11-15 13:09:13,199:INFO:MSUnmerged:cleanRSE(): Trying to remove nonempty directory: /store/unmerged/Run3Summer22NanoAODv12/ZZto2L2Q_TuneCP5_13p6TeV_powheg-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_v5_ext1-v2
2023-11-15 13:09:13,400:WARNING:MSUnmerged:cleanRSE(): MISSING directory: /store/unmerged/Run3Summer22NanoAODv12/ZZto2L2Q_TuneCP5_13p6TeV_powheg-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_v5_ext1-v2
2023-11-15 13:09:13,400:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro  Success deleting directory: /store/unmerged/Run3Summer22NanoAODv12/ZZto2L2Q_TuneCP5_13p6TeV_powheg-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_v5_ext1-v2
2023-11-15 13:09:13,411:DEBUG:connectionpool:_make_request(): http://cms-rucio.cern.ch:80 "GET /rses/T2_IT_Legnaro/lfns2pfns?operation=delete&lfn=cms%3A%2Fstore%2Funmerged%2FRun3Summer22EENanoAODv12%2FWtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8%2FNANOAODSIM%2F130X_mcRun3_2022_realistic_postEE_v6-v2 HTTP/1.1" 200 351
2023-11-15 13:09:13,416:DEBUG:MSUnmerged:cleanRSE(): 
RSE: T2_IT_Legnaro 
DELETING: /store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2.
PFN list with: 3 entries: 
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2520001/15859d01-cf7c-48f2-95c4-cab8fdf4457b.root
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2520001/9d7121c8-87c4-4ac3-ac55-a25b9b78dca8.root
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2/2520001/aaf1b528-054c-4fb9-a799-35c4bb7a2598.root

2023-11-15 13:09:13,417:INFO:MSUnmerged:cleanRSE(): Trying to remove nonempty directory: /store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2
2023-11-15 13:09:13,618:WARNING:MSUnmerged:cleanRSE(): MISSING directory: /store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2
2023-11-15 13:09:13,618:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro  Success deleting directory: /store/unmerged/Run3Summer22EENanoAODv12/WtoLNu-4Jets_TuneCP5_13p6TeV_madgraphMLM-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2
2023-11-15 13:09:13,650:DEBUG:connectionpool:_make_request(): http://cms-rucio.cern.ch:80 "GET /rses/T2_IT_Legnaro/lfns2pfns?operation=delete&lfn=cms%3A%2Fstore%2Funmerged%2FRun2023B%2FParkingDoubleMuonLowMass4%2FMINIAOD%2F22Sep2023-v1 HTTP/1.1" 200 213
2023-11-15 13:09:13,655:DEBUG:MSUnmerged:cleanRSE(): 
RSE: T2_IT_Legnaro 
DELETING: /store/unmerged/Run2023B/ParkingDoubleMuonLowMass4/MINIAOD/22Sep2023-v1.
PFN list with: 10 entries: 
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run2023B/ParkingDoubleMuonLowMass4/MINIAOD/22Sep2023-v1/70000/1acfce59-f8a8-413b-82b6-c3309419ad37.root
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run2023B/ParkingDoubleMuonLowMass4/MINIAOD/22Sep2023-v1/70000/2d9aa9c0-768c-47d0-8e83-6eebb06fe85c.root
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run2023B/ParkingDoubleMuonLowMass4/MINIAOD/22Sep2023-v1/70000/48d049c2-23ad-4faf-b3d9-4658ef26da5c.root
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run2023B/ParkingDoubleMuonLowMass4/MINIAOD/22Sep2023-v1/70000/594bb846-0aa8-4ba4-b110-0020fc27beef.root
    ...
2023-11-15 13:09:18,429:INFO:MSUnmerged:cleanRSE(): Trying to remove nonempty directory: /store/unmerged/Run2023D/Muon0/NANOAOD/22Sep2023_v2-v1
2023-11-15 13:09:18,629:WARNING:MSUnmerged:cleanRSE(): MISSING directory: /store/unmerged/Run2023D/Muon0/NANOAOD/22Sep2023_v2-v1
2023-11-15 13:09:18,629:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro  Success deleting directory: /store/unmerged/Run2023D/Muon0/NANOAOD/22Sep2023_v2-v1
2023-11-15 13:09:18,641:DEBUG:connectionpool:_make_request(): http://cms-rucio.cern.ch:80 "GET /rses/T2_IT_Legnaro/lfns2pfns?operation=delete&lfn=cms%3A%2Fstore%2Funmerged%2FRun3Summer23DRPremix%2FQCD-4Jets_HT-200to400_TuneCP5_13p6TeV_madgraphMLM-pythia8%2FAODSIM%2F130X_mcRun3_2023_realistic_v14-v2 HTTP/1.1" 200 341
2023-11-15 13:09:18,646:DEBUG:MSUnmerged:cleanRSE(): 
RSE: T2_IT_Legnaro 
DELETING: /store/unmerged/Run3Summer23DRPremix/QCD-4Jets_HT-200to400_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/130X_mcRun3_2023_realistic_v14-v2.
PFN list with: 1 entries: 
    davs://t2-xrdcms.lnl.infn.it:2880/pnfs/lnl.infn.it/data/cms/store/unmerged/Run3Summer23DRPremix/QCD-4Jets_HT-200to400_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/130X_mcRun3_2023_realistic_v14-v2/25610003/d2ff9af0-06fb-4a96-badd-82347f04e72c.root

2023-11-15 13:09:18,646:INFO:MSUnmerged:cleanRSE(): Trying to remove nonempty directory: /store/unmerged/Run3Summer23DRPremix/QCD-4Jets_HT-200to400_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/130X_mcRun3_2023_realistic_v14-v2
2023-11-15 13:09:18,845:WARNING:MSUnmerged:cleanRSE(): MISSING directory: /store/unmerged/Run3Summer23DRPremix/QCD-4Jets_HT-200to400_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/130X_mcRun3_2023_realistic_v14-v2
2023-11-15 13:09:18,845:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro  Success deleting directory: /store/unmerged/Run3Summer23DRPremix/QCD-4Jets_HT-200to400_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/130X_mcRun3_2023_realistic_v14-v2

Uploading the RSE contents to MongoDB:

In [6]: rse = msUnmerged.uploadRSEToMongoDB(rse)
2023-11-15 13:11:45,596:INFO:MSUnmerged:uploadRSEToMongoDB(): RSE: T2_IT_Legnaro Writing rse data to MongoDB.

Second attempt to clean it:

In [7]: rse = msUnmerged.cleanRSE(rse)

2023-11-15 13:14:39,553:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run3Summer23BPixDRPremix/TTto4Q_MT-171p5_TuneCP5_13p6TeV_powheg-pythia8/AODSIM/130X_mcRun3_2023_realistic_postBPix_v2-v3 already successfully deleted.
2023-11-15 13:14:39,554:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run3Summer22EEDRPremix/Zto2Nu-4Jets_HT-400to800_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/124X_mcRun3_2022_realistic_postEE_v1-v2 already successfully deleted.
2023-11-15 13:14:39,554:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run3Summer22MiniAODv4/TTto4Q_TuneCP5Down_13p6TeV_powheg-pythia8/MINIAODSIM/130X_mcRun3_2022_realistic_v5-v2 already successfully deleted.
2023-11-15 13:14:39,554:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run3Summer23DRPremix/DYto2TautoMuTauh_M-50_TuneCP5_13p6TeV_madgraphMLM-pythia8/AODSIM/130X_mcRun3_2023_realistic_v14-v1 already successfully deleted.
2023-11-15 13:14:39,554:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run3Summer22EENanoAODv12/TWminusto2L2Nu_MT-171p5_TuneCP5_13p6TeV_powheg-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v2 already successfully deleted.
2023-11-15 13:14:39,554:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run2022G/ParkingDoubleMuonLowMass4/MINIAOD/22Sep2023-v1 already successfully deleted.
2023-11-15 13:14:39,554:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run2023C/ParkingDoubleMuonLowMass3/MINIAOD/22Sep2023_v1-v1 already successfully deleted.
2023-11-15 13:14:39,555:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run2022F/EGamma/NANOAOD/22Sep2023-v1 already successfully deleted.
2023-11-15 13:14:39,555:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run2022E/MinimumBias/NANOAOD/22Sep2023-v1 already successfully deleted.
2023-11-15 13:14:39,555:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/Run3Summer22NanoAODv12/TTto4Q_TuneCP5Down_13p6TeV_powheg-pythia8/NANOAODSIM/130X_mcRun3_2022_realistic_v5-v2 already successfully deleted.
2023-11-15 13:14:39,555:INFO:MSUnmerged:cleanRSE(): RSE: T2_IT_Legnaro, dir: /store/unmerged/RunIIAutumn18DRPremix/HiggsToLLTo4b_mH_600_mX_60_ctau_0p1_TuneCP5_13TeV_pythia8/AODSIM/102X_upgrade2018_realistic_v15-v2 already successfully deleted.
...

Contributor

@amaltaro amaltaro left a comment

Todor, please find a couple of enhancement suggestions along the code.

src/python/WMCore/MicroService/MSUnmerged/MSUnmerged.py (outdated)
src/python/WMCore/MicroService/MSUnmerged/MSUnmerged.py (outdated)
@todor-ivanov
Contributor Author

todor-ivanov commented Nov 16, 2023

And while working on this bugfix I can now log the full sequence of events that occur when a random error from gfal + WebDAV is hit (the same error as explained here: #11658 (comment)): [1]
This is in contrast to what it should look like when we do not have those random errors [2].

In [1] we enter a long sequence:

  • First we follow a linear sequence of file deletions in slices of several hundred at a time, following the list of files for the current directory, constructed from the information provided by RucioConMon.
  • We then eventually manage to remove the cleaned directory, or
  • if we are unlucky enough to hit the same SSL error again, we enter the recursive iterations to purge the already empty space under the current directory (exactly the case here).
  • Since those random errors seem to depend on the number of remote operations executed, the bigger the recursion, the bigger the chance of hitting one of them again and, in the end, failing to remove the directory (again, exactly the case here).

Maybe we should consider gfal retries. What do you think, @amaltaro @vkuznet?
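To make the retry idea concrete, here is a minimal sketch of a retry wrapper with exponential backoff. Everything in it is an assumption for discussion: `withRetries` and `TransientStorageError` are hypothetical names standing in for whatever exception the gfal bindings raise, and which error codes are safe to retry (for example the 112 seen in the logs in this thread) would still need to be confirmed.

```python
import time


class TransientStorageError(Exception):
    """Hypothetical stand-in for a storage exception carrying an error code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def withRetries(operation, retryCodes, maxAttempts=3, delay=1.0, sleep=time.sleep):
    """
    Call operation() and retry it when it fails with one of retryCodes.
    The wait doubles after every failed attempt: delay, 2*delay, 4*delay, ...
    Any other error, or the last failed attempt, is re-raised to the caller.
    """
    for attempt in range(1, maxAttempts + 1):
        try:
            return operation()
        except TransientStorageError as exc:
            if exc.code not in retryCodes or attempt == maxAttempts:
                raise
            sleep(delay * 2 ** (attempt - 1))
```

A call such as `withRetries(lambda: removeDir(pfn), retryCodes={112})`, with `removeDir` being a hypothetical single removal call, would wrap only that one remote operation, so a random failure would not immediately push the whole directory into the recursive path.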

[1]

2023-11-16 14:35:01,927:DEBUG:MSUnmerged:cleanRSE(): 
RSE: T2_ES_CIEMAT 
DELETING: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2.
PFN list with: 14 entries: 
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/2540170/D139366F-E8FC-D34C-8BB9-1AA950B7B99D.root
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/2540251/05F9FC1B-D89F-7E42-BF58-A4E557732404.root
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/2540255/1C7483FE-5137-2A45-AD88-2D378833081E.root
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2/30108/5C5A8405-C699-204F-B025-99AAFB13F784.root
    ...

2023-11-16 14:35:01,927:INFO:MSUnmerged:cleanRSE(): Trying to remove nonempty directory: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2

2023-11-16 14:35:32,554:ERROR:MSUnmerged:_rmDir(): FAILED to remove directory: davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2: gfalException: DavPosix::rmdir  (Neon): Could not read status line: SSL error: unexpected eof while reading, gfalErrorCode: 112
2023-11-16 14:35:32,555:INFO:MSUnmerged:cleanRSE(): Trying to clean the contents of nonempty directory: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2 in slices of: 100 files
2023-11-16 14:35:38,745:DEBUG:MSUnmerged:cleanRSE(): RSE: T2_ES_CIEMAT, Dir: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2, delResult: [None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]
2023-11-16 14:35:38,746:INFO:MSUnmerged:cleanRSE(): RSE: T2_ES_CIEMAT, Dir: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2, filesDeletedSuccess: 14

2023-11-16 14:36:09,086:ERROR:MSUnmerged:_rmDir(): FAILED to remove directory: davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2: gfalException: DavPosix::rmdir  (Neon): Could not read status line: SSL error: unexpected eof while reading, gfalErrorCode: 112
2023-11-16 14:36:09,087:INFO:MSUnmerged:cleanRSE(): Trying to recursively purge directory: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2:

2023-11-16 14:36:39,386:ERROR:MSUnmerged:_purgeTree(): FAILED to list dirEntry: davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2: gfalException: (Neon): Could not read status line: SSL error: unexpected eof while reading, gfalErrorCode: 112
2023-11-16 14:36:39,387:ERROR:MSUnmerged:cleanRSE(): RSE: T2_ES_CIEMAT Failed to purge directory: /store/unmerged/RunIISummer20UL16RECOAPV/G1Jet_LHEGpT-50To100_PhotonInBarrel_Mjj-300_TuneCP5_13TeV-amcatnlo-pythia8/AODSIM/106X_mcRun2_asymptotic_preVFP_v8-v2

[2]

2023-11-16 14:36:39,396:DEBUG:MSUnmerged:cleanRSE(): 
RSE: T2_ES_CIEMAT 
DELETING: /store/unmerged/RunIISummer16DR80Premix/NMSSM_XToYHTo4B_MX-3500_MY-650_TuneCP5_13TeV-madgraph-pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1.
PFN list with: 3 entries: 
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer16DR80Premix/NMSSM_XToYHTo4B_MX-3500_MY-650_TuneCP5_13TeV-madgraph-pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/80000/263FA435-BF80-ED11-9727-0CC47A5FC619.root
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer16DR80Premix/NMSSM_XToYHTo4B_MX-3500_MY-650_TuneCP5_13TeV-madgraph-pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/80000/805C24FB-B680-ED11-9C2C-0CC47A5FC619.root
    davs://srm.ciemat.es:2880/pnfs/ciemat.es/data/cms/prod/store/unmerged/RunIISummer16DR80Premix/NMSSM_XToYHTo4B_MX-3500_MY-650_TuneCP5_13TeV-madgraph-pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/80000/94320CD6-AF80-ED11-80F2-0CC47A5FC619.root

2023-11-16 14:36:39,396:INFO:MSUnmerged:cleanRSE(): Trying to remove nonempty directory: /store/unmerged/RunIISummer16DR80Premix/NMSSM_XToYHTo4B_MX-3500_MY-650_TuneCP5_13TeV-madgraph-pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1
2023-11-16 14:36:42,340:INFO:MSUnmerged:cleanRSE(): RSE: T2_ES_CIEMAT  Success deleting directory: /store/unmerged/RunIISummer16DR80Premix/NMSSM_XToYHTo4B_MX-3500_MY-650_TuneCP5_13TeV-madgraph-pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 3 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 warnings
    • 36 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14633/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 warnings
    • 36 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14635/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

@amaltaro @vkuznet, please take a look at the final version of this code. Otherwise I'll keep diving into details, and I'd rather merge this one and deploy it in production ASAP.

@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 1 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 warnings
    • 36 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14636/artifact/artifacts/PullRequestReport.html

Contributor

@amaltaro amaltaro left a comment

Todor, I know you have been testing this code in your development environment.

So I would say go ahead and merge it whenever you feel comfortable with the changes. Before deploying in production, we need to backport it to the correct branch and cut a new patch release, basically what I explained here:
#11781 (review)

@todor-ivanov
Contributor Author

Thanks @amaltaro I'll do it tomorrow morning

@todor-ivanov
Contributor Author

Hi @amaltaro, I am squashing and merging this PR at this stage. There are a few more improvements for which I already have a prototype in my dev environment, like handling some more gfalError codes and implementing retries in some specific use cases. For those we may need a bit of feedback from the gfal developers, so I'll create a separate WMCore issue for them and proceed with what I already have here, which is well tested.

FYI: @vkuznet

… operations during recursion.

Fix rewriting timestamps for all documents in the REST interface.

Trying to remove nonEmpty directories as early as possible.

Adding _rmDir auxiliary method

Fix broken log message && Add gfal error code to the error log messages

Avoid duplicate attempts to enter an already missing directory.

Update filesDeletedFail counters.
@todor-ivanov todor-ivanov force-pushed the bugfix_MSUnmerged_UnhandledGfalException_fix-11793 branch from 034d744 to e59ac60 Compare November 17, 2023 14:42
@cmsdmwmbot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 tests no longer failing
    • 1 changes in unstable tests
  • Python3 Pylint check: succeeded
    • 7 warnings
    • 36 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 4 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14639/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov merged commit f6b0260 into dmwm:master Nov 20, 2023
2 of 4 checks passed