For async calls always store XrdCl::FileSystem with the response-handler. #35455
Conversation
A new Pull Request was created by @osschar (Matevž Tadel) for CMSSW_12_1_DEVEL_X. It involves the following packages:
@makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please review it and eventually sign? Thanks. cms-bot commands are listed here
test parameters:
@cmsbuild, please test
Thanks @osschar!
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7b09b3/19209/summary.html Comparison Summary
+1 Base branch is already DEVEL.
This pull request is fully signed and it will be integrated in one of the next CMSSW_12_1_DEVEL_X IBs (tests are also fine) once validation in the development release cycle CMSSW_12_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1 |
This has been integrated in DEVEL IBs and we now have two DEVEL IBs without any failures. Thanks a lot @osschar for these changes
Yay, good news :) Forgot to mention before, yesterday I also went over the dead-lock-like stack traces Dan pointed to previously ... and I realized it was the same thing: an XrdCl response handler came back upon a released FileSystem object, and what was supposed to be its internal mutex was in a presumed locked state ... and this blocked the destruction of XrdCl at the end of the job.

I also asked Michal about the stack-allocated FileSystem object vs. xroot-5.3, and he says this should never have worked, even with 4.12 — the FS object needs to stay alive until the async handler completes. Now, since this (apparently) never gave us trouble, I believe the calls in question were really implemented as synchronous calls in 4.12.

Once this is merged in master (thank you!) I will also add an increased source-reselection timeout after the server response is "max-number-of-redirections-reached", as it is unlikely any further open requests could result in new sources.

I am somewhat tempted to investigate whether xrootd-5.3 could be backported to some older releases ... in particular those that will be used for analysis, as the changes introduced here allow for a more error-tolerant configuration of XCache clusters in view of errors encountered on individual cache servers. [ Alternatively, if storage.xml is (or can be made) CMSSW release-dependent, one could operate two cache redirectors with different settings for releases that have 5.3 and those that still have 4.12. I'll check how this works with the computing folks and what releases are realistically expected to be used for physics in the near future (I suspect I know what the answer to this will be, sigh :) ). ]
This is a continuation of #34700.
XrdCl's internal response handlers (called before the user-supplied handler) access the URL string that is part of the FileSystem object. As the FileSystem object was created on the stack, it got destroyed while async requests were still pending.
This PR moves the FileSystem object into our response handlers so its lifetime matches that of the response. All response handlers auto-destruct as needed.