Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete file lists in slices. #10792

Merged

Conversation

todor-ivanov
Copy link
Contributor

Fixes #10713

Status

not-tested

Description

One of the possible reasons for having the Cherripy thread restarting in the MSUnmerged service is a buffer overflow of gfal2 for directories with long lists of files to be deleted. In order to prevent this to happen, with the current change we suggest to slice the list of files in chunks of 100, before we send it to gfal2.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

The msconfig parameter is about to be set with a separate config and linked here.

External dependencies / deployment changes

None

@cmsdmwmbot
Copy link

Jenkins results:

  • Python2 Unit tests: succeeded
  • Python3 Unit tests: succeeded
  • Python2 Pylint check: succeeded
    • 7 warnings
    • 10 comments to review
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 7 warnings
    • 16 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12424/artifact/artifacts/PullRequestReport.html

delResult = ctx.unlink(pfnList)

for pfnSlice in list(grouper(pfnList, self.msConfig["filesToDeleteSliceSize"])):
delResult.append(ctx.unlink(pfnSlice))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todor, please educate me on the behaviour of this ctx.unlink() call. I understand it will return the value None if the file was successfully deleted, right?
How about the response in case of a failure?
And does the response list maintain the same order of input files provided in the list?

I'm asking these questions to see if there is a better way to account for successful deletions (from L315).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amaltaro , It does return None for any successful deletion, but I do not remember what was the value if otherwise. And I for sure cannot guarantee the order of deletions. So one thing is for sure, from the last time I was checking - L315 is the one and only condition to assure that all of the deletions from the bulk list of pfns have been executed successfully.

Sum deletedSuccess in the slices.
@todor-ivanov todor-ivanov force-pushed the bugfix_MSUnmerged_BuffOverflow_fix-10713 branch from 803c09c to 041bea8 Compare September 8, 2021 09:53
@cmsdmwmbot
Copy link

Jenkins results:

  • Python2 Unit tests: succeeded
  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python2 Pylint check: succeeded
    • 7 warnings
    • 10 comments to review
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 7 warnings
    • 16 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12428/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python2 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python2 Pylint check: succeeded
    • 7 warnings
    • 10 comments to review
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 7 warnings
    • 16 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/12429/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt changes, Todor. Code is looking pretty good to me!

@amaltaro amaltaro merged commit 22b37ab into dmwm:master Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MS] MSUnmerged has been restarting in the t2t3 replica
3 participants