For async calls always store XrdCl::FileSystem with the response-handler. #35455
Conversation
A new Pull Request was created by @osschar (Matevž Tadel) for CMSSW_12_1_DEVEL_X. It involves the following packages:
@makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please review it and eventually sign? Thanks. cms-bot commands are listed here
test parameters:
@cmsbuild, please test
Thanks @osschar!
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7b09b3/19209/summary.html Comparison Summary
+1 Base branch is already DEVEL.
This pull request is fully signed and it will be integrated in one of the next CMSSW_12_1_DEVEL_X IBs (tests are also fine) once validation in the development release cycle CMSSW_12_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1 |
This has been integrated in DEVEL IBs and we now have two DEVEL IBs without any failures. Thanks a lot @osschar for these changes
Yay, good news :) Forgot to mention before, yesterday I also went over the dead-lock-like stack traces Dan pointed to previously ... and I realized it was the same thing: an XrdCl response handler came back upon a released FileSystem object, and what was supposed to be its internal mutex was in a presumed locked state ... and this blocked the destruction of XrdCl at the end of the job.

I also asked Michal about the stack-allocated FileSystem object vs. xroot-5.3, and he says this should never have worked, even with 4.12 — the FS object needs to stay alive until the async handler completes. Now, since this (apparently) never gave us trouble, I believe the calls in question were really implemented as synchronous calls in 4.12.

Once this is merged in master (thank you!) I will also add an increased source-reselection timeout after the server response is "max-number-of-redirections-reached", as it is unlikely any further open requests could result in new sources.

I am somewhat tempted to investigate whether xrootd-5.3 could be backported to some older releases ... in particular those that will be used for analysis, as the changes introduced here allow for a more error-tolerant configuration of XCache clusters in view of errors encountered on individual cache servers. [ Alternatively, if storage.xml is (or can be made) CMSSW release-dependent, one could operate two cache redirectors with different settings for releases that have 5.3 and those that still have 4.12. I'll check how this works with the computing folks and what releases are realistically expected to be used for physics in the near future (I suspect I know what the answer to this will be, sigh :) ). ]
This is a continuation of #34700.
XrdCl's internal response handlers (called before the user-supplied handler) access the URL string that is part of the FileSystem object. As the FileSystem object was created on the stack, it got destroyed while async requests were still pending.
This PR moves the FileSystem object into our response handlers so its lifetime matches that of the response. All response handlers auto-destruct as needed.