Timeout Xrootd requests at the CMSSW-level #18440

Open
bbockelm opened this issue Apr 22, 2017 · 7 comments

Comments

@bbockelm
Contributor

We currently rely on Xrootd's timeout mechanism -- issues with the current release (xrootd 4.5.0) have shown that this isn't 100% reliable.

We don't have our own mechanism because the Xrootd callbacks aren't cancellable -- they might fire at any time after we've given up, even after we've destroyed the corresponding object.

There were two ideas bounced around:

  • When we time out, purposely leak the RequestManager object. The (eventual) Xrootd callback will access the zombie object. Since the timeout/failure will generate a CMSSW exception for a read failure, the job is likely going to fail anyway -- no harm in leaking.
  • Instead of passing a pointer to the RequestManager object, create a new callback-specific object on the heap. The callback-specific object would hold a weak_ptr to the original RequestManager. We proceed with the callback only if the weak_ptr can be materialized to a strong pointer.
    • The downside is that we'll have to do this for every read request, meaning we introduce more heap allocations per read request. Possibly expensive.
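
For illustration, the second option could look roughly like the sketch below. This is not the actual CMSSW or Xrootd code: `CallbackContext`, `makeContext`, and `onResponse` are hypothetical names, the `RequestManager` shown is a simplified stand-in, and the callback signature is an assumption made only to keep the example self-contained.

```cpp
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the long-lived object (not the real CMSSW class).
class RequestManager {
public:
  void handleResponse(const std::string& data) { received_.push_back(data); }
  const std::vector<std::string>& received() const { return received_; }

private:
  std::vector<std::string> received_;
};

// Hypothetical per-request object, allocated on the heap for every read.
// It holds only a weak reference, so it never keeps the manager alive.
struct CallbackContext {
  std::weak_ptr<RequestManager> manager;
};

// Hypothetical helper: builds the context handed to the I/O layer
// in place of a raw RequestManager pointer.
CallbackContext* makeContext(const std::shared_ptr<RequestManager>& mgr) {
  return new CallbackContext{mgr};
}

// Hypothetical callback entry point. It may run long after we timed out.
// Returns true if the response was delivered, false if it was dropped.
bool onResponse(CallbackContext* raw, const std::string& data) {
  // The callback owns the context and frees it on every path.
  std::unique_ptr<CallbackContext> ctx(raw);
  if (auto mgr = ctx->manager.lock()) {
    // The strong pointer keeps the manager alive for the duration of the call.
    mgr->handleResponse(data);
    return true;
  }
  // The manager was destroyed after a timeout: ignore the late response.
  return false;
}
```

The extra cost is the one `new CallbackContext` per read request mentioned above; in exchange, a callback that fires after the manager has been destroyed touches only its own context.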
@cmsbuild
Contributor

cmsbuild commented Apr 22, 2017

A new Issue was created by @bbockelm Brian Bockelman.

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@davidlange6
Contributor

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones
Contributor

@bbockelm I'm confused. If the RequestManager can be leaked, then it is on the heap. If it is on the heap, then it could be held by a std::shared_ptr<RequestManager>. If it is held by a std::shared_ptr<>, then we can create as many std::weak_ptr<> as we want and hand those to the callback-specific object.

What am I missing?

@bbockelm
Contributor Author

@Dr15Jones - yup, that's almost exactly the second option outlined. Currently, there is no callback-specific object passed around (instead, the callback is given a pointer to the long-lived RequestManager), meaning we'd have to create a callback-specific object on the heap for each IO request to hold the std::weak_ptr<>.

I think this is acceptable since there are other places that allocate per-request, so we're already paying the cost of per-IO heap allocations.

@Dr15Jones
Contributor

@bbockelm I'm still confused. If the callback is given a pointer to the 'long-lived' RequestManager, I don't see why it couldn't instead just be given a std::weak_ptr<> to the same RequestManager.

@smuzaffar
Contributor

Is this still an open issue?
