Duplicate file descriptor closure in DQMFileSaverPB fix #35881

Merged

@pmandrik pmandrik commented Oct 28, 2021

PR description:

From an email by Srecko Morovic:
"We have been seeing rare issues with writing output files in HLT with symptoms of the file sometimes being closed prematurely [1]. We saw similar issues in recent DAQ3 tests that used to happen when DQM File Saver is included (but was not investigated in detail at that time).
On inspecting the code, it turns out that in this module there is a protocol buffer file stream that is closed, and after that also the file descriptor [2] gets closed. In PB documentation [3] it's stated that the file is already closed by closing the PB stream, so the other close should not be necessary.
This could cause the race condition we're seeing, e.g. when in a multi-threaded setup some other thread opens the same file descriptor ID between two close calls. Maybe it also is the cause for other problems we have, such as frontier issues we see occasionally, which could be caused by socket fd close called prematurely.
In my private tests I see the empty output file problem disappearing when I remove line #326. Open fd count per process remains constant (so there is no fd leak introduced by this).
Conclusion from this is that line 326 should be removed. Please have a look and cross-check yourself.

[1] http://cmsonline.cern.ch/cms-elog/1127051
[2] https://github.com/cms-sw/cmssw/blob/master/DQMServices/FileIO/plugins/DQMFileSaverPB.cc#L326
[3] https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.zero_copy_stream_impl#FileOutputStream.Close.details
"

PR validation:

Backport tested at P5 DQM playback.
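
For reference, a minimal sketch of the failure mode described in the mail quoted above: once a descriptor has been closed, its number can be handed out again, and a second close on the stale number then closes someone else's file. This is not the DQMFileSaverPB code. It uses plain POSIX calls only; the first `close()` stands in for the protobuf stream's `Close()`, and the second `open()` stands in for another thread opening a file between the two close calls. The names `DoubleCloseResult` and `demonstrate_double_close`, and the file paths passed to it, are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Outcome of the experiment (hypothetical helper type, not part of CMSSW).
struct DoubleCloseResult {
  bool fd_reused;      // the second open() returned the same descriptor number
  bool victim_closed;  // the stale close() closed the unrelated second file
};

// Hypothetical demo, not the DQMFileSaverPB code: plain POSIX calls stand in
// for the protobuf stream and for the "other thread".
DoubleCloseResult demonstrate_double_close(const char* path_a, const char* path_b) {
  assert(path_a != nullptr && path_b != nullptr);
  DoubleCloseResult result{false, false};

  int fd_a = ::open(path_a, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd_a < 0)
    return result;

  // First close: stands in for the stream's Close(), which already closes the fd.
  ::close(fd_a);

  // Stands in for another thread opening a file between the two close calls.
  // open() returns the lowest unused descriptor, so the number is typically reused.
  int fd_b = ::open(path_b, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd_b < 0) {
    ::unlink(path_a);
    return result;
  }
  result.fd_reused = (fd_b == fd_a);

  // Second, redundant close on the stale number: the kind of call this PR removes.
  ::close(fd_a);

  // If the number was reused, the unrelated file is now closed and writes fail.
  errno = 0;
  const ssize_t written = ::write(fd_b, "x", 1);
  result.victim_closed = (written < 0 && errno == EBADF);
  if (!result.victim_closed)
    ::close(fd_b);

  // Remove the scratch files created by the demo.
  ::unlink(path_a);
  ::unlink(path_b);
  return result;
}
```

Called from a single-threaded program with two scratch paths, both flags should come out true. In a multi-threaded job the reuse depends on another thread opening a descriptor in the short window between the two close calls, which would be consistent with the problem being rare.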

@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35881/26279

  • This PR adds an extra 16KB to repository

@cmsbuild

A new Pull Request was created by @pmandrik for master.

It involves the following packages:

  • DQMServices/FileIO (dqm)

@emanueleusai, @ahmad3213, @cmsbuild, @jfernan2, @pmandrik, @pbo0, @rvenditti can you please review it and eventually sign? Thanks.
@barvic this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@jfernan2

please test

@cmsbuild

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20021/summary.html
COMMIT: 43cc06b
CMSSW: CMSSW_12_1_X_2021-10-27-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35881/20021/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20021/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20021/git-merge-result

RelVals-INPUT

  • 138.3_RunMinimumBias2021Splash+RunMinimumBias2021Splash+RECODR3Splash+HARVESTDR3/step2_RunMinimumBias2021Splash+RunMinimumBias2021Splash+RECODR3Splash+HARVESTDR3.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901412
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jfernan2

please test

@cmsbuild

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20031/summary.html
COMMIT: 43cc06b
CMSSW: CMSSW_12_1_X_2021-10-27-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35881/20031/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20031/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20031/git-merge-result

RelVals-INPUT

  • 138.3_RunMinimumBias2021Splash+RunMinimumBias2021Splash+RECODR3Splash+HARVESTDR3/step2_RunMinimumBias2021Splash+RunMinimumBias2021Splash+RECODR3Splash+HARVESTDR3.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901418
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jfernan2

please test

@cmsbuild

-1

Failed Tests: RelVals-INPUT
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20041/summary.html
COMMIT: 43cc06b
CMSSW: CMSSW_12_1_X_2021-10-28-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35881/20041/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

  • 138.3_RunMinimumBias2021Splash+RunMinimumBias2021Splash+RECODR3Splash+HARVESTDR3/step2_RunMinimumBias2021Splash+RunMinimumBias2021Splash+RECODR3Splash+HARVESTDR3.log

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901440
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901412
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jfernan2

jfernan2 commented Nov 2, 2021

please test

@cmsbuild

cmsbuild commented Nov 2, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-9d6b00/20188/summary.html
COMMIT: 43cc06b
CMSSW: CMSSW_12_2_X_2021-11-02-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/35881/20188/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 2901890
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2901862
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 41 files compared)
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jfernan2

jfernan2 commented Nov 3, 2021

+1

@cmsbuild

cmsbuild commented Nov 3, 2021

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta

perrotta commented Nov 3, 2021

+1

@cmsbuild cmsbuild merged commit fc5836f into cms-sw:master Nov 3, 2021