Timeout Xrootd requests at the CMSSW-level #18440

bbockelm · 2017-04-22T03:44:24Z

We currently rely on Xrootd's timeout mechanism -- issues with the current release (xrootd 4.5.0) have shown that this isn't 100% reliable.

We don't have our own mechanism because the Xrootd callbacks aren't cancellable -- they might fire at any time after we've given up, even after we've destroyed the corresponding object.

There were two ideas bounced around:

1. When we time out, purposely leak the RequestManager object. The (eventual) Xrootd callback will then access the zombie object rather than freed memory. Since the timeout/failure will generate a CMSSW exception for a read failure, the job is likely going to fail anyway, so there is no harm in leaking.
2. Instead of passing a pointer to the RequestManager object, create a new callback-specific object on the heap. The callback-specific object would hold a weak_ptr to the original RequestManager. We proceed with the callback only if the weak_ptr can be materialized to a strong pointer.
   - The downside is that we'll have to do this for every read request, meaning we introduce more heap allocations per read request. Possibly expensive.


cmsbuild · 2017-04-22T03:44:37Z

A new Issue was created by @bbockelm Brian Bockelman.

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

davidlange6 · 2017-04-24T11:31:35Z

assign core

cmsbuild · 2017-04-24T11:31:51Z

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

Dr15Jones · 2017-04-24T13:10:01Z

@bbockelm I'm confused. If the RequestManager can be leaked, then it is on the heap. If it is on the heap, then it could be held by a std::shared_ptr<RequestManager>. If it is held by a std::shared_ptr<>, then we can create as many std::weak_ptr<> as we want and hand those to the callback-specific object.

What am I missing?

bbockelm · 2017-04-24T14:22:44Z

@Dr15Jones - yup, that's almost exactly the second option outlined. Currently, there is no callback-specific object passed around (instead, the callback is given a pointer to the long-lived RequestManager), meaning we'd have to create a callback-specific object on the heap for each IO request to hold the std::weak_ptr<>.

I think this is acceptable since there are other places that allocate per-request, so we're already paying the cost of per-IO heap allocations.

Dr15Jones · 2017-04-24T14:25:37Z

@bbockelm I'm still confused. If the callback is given a pointer to the 'long-lived' RequestManager, I don't see why it couldn't instead just be given a std::weak_ptr<> to the same RequestManager.

smuzaffar · 2023-11-14T11:04:54Z

is this still an open issue?

cmsbuild added the pending-assignment label Apr 22, 2017

cmsbuild added core-pending pending-signatures and removed pending-assignment labels Apr 24, 2017
