Kinect blocks on events (osx) #229

Closed
fernLightning opened this Issue May 10, 2011 · 5 comments

Projects

None yet

2 participants

@fernLightning

Occurs on shutdown, freenect_process_events(), freenect_stop_depth(), freenect_stop_video(), or on shutdown if had iso errors during init....

Current solution so far (for osx), modified usb_libusb10.c

  • replaced all instances of "libusb_handle_events(..)", with "struct timeval tv = {.tv_sec=1, .tv_usec=0}; libusb_handle_events_timeout(..)" to make things timeout faster.
    • made fnusb_start_iso() do a "strm->dead_xfers++;" when it "Failed to submit isochronous transfer" - to ensure shutdown waits on the correct number of events.
Member
zarvox commented May 14, 2011

Yes, we do need to add some proper handling of transfers that come back with some libusb error. Rather than accumulate a pile of workarounds, I'd like to rework the transfers to correctly handle the differences between failed, cancelled, and active transfers, at which point I'd expect this to all work properly.

And you're not alone: issue #227

Still locks up on device disconnect. The only solution I've found is to provide means to escape the while loop after a timeout. I'm unsure of the side effects of this, but it is working reliably for me.

int fnusb_stop_iso(fnusb_dev _dev, fnusb_isoc_stream *strm) {
...
struct timeval start;
gettimeofday(&start, NULL);
while (strm->dead_xfers < strm->num_xfers) {
struct timeval tv = {.tv_sec=1, .tv_usec=0}; // dont block forever, wait at most 1sec
libusb_handle_events_timeout(ctx->usb.ctx, &tv);
gettimeofday(&tv, NULL);
long msecs = (tv.tv_sec - start.tv_sec)_1000;
msecs += (tv.tv_usec - start.tv_usec)/1000;
if(msecs > 4000) {
FN_WARNING("Stopping - giving up after %ldms\n", msecs);
break;
}
}
...
}

Member
zarvox commented Sep 27, 2011

Hmm, so we cancel the transfers, but they never come back as dead? At first glance, that sounds like a libusb bug - we should be getting completed transfers with LIBUSB_TRANSFER_CANCELLED or LIBUSB_TRANSFER_NO_DEVICE.

I'll try to reproduce this - you say this happens to the demos on OSX when you unplug the Kinect?

It sometimes happens on disconnect, not always.
It happens 1 out of 10 times here in the office with one kinect, but seems to happen far more often on site (where we have three kinects on the one machine).

I've not tried the Demos recently.

@zarvox zarvox added a commit that referenced this issue Nov 21, 2011
@zarvox zarvox usb_libusb10.c: Keep track of dead transfers more correctly.
We failed to catch if the resubmission of a transfer failed in the transfer's
isochronous callback.  This is bad because this is a very likely failure mode
if we accidentally unplug the Kinect - the last set of transfers complete
successfully, but the device is no longer there when we resubmit them.

It's okay for us to not know which particular transfers are dead, since
libusb_cancel_transfer() will handle transfers that don't exist by returning
LIBUSB_ERROR_NOT_FOUND if the transfer isn't pending.

This is related to #229, but I'm not sure if this is sufficient to fix it.
libusb on OSX doesn't always return what I expect it to.

Signed-off-by: Drew Fisher <drew.m.fisher@gmail.com>
75a3eb2
@zarvox zarvox added a commit that closed this issue Nov 21, 2011
@zarvox zarvox Behave sanely when the Kinect is unplugged.
libusb_handle_events is thread-safe, but not reentrant.  Which means that we
can't call fnusb_stop_iso which calls libusb_handle_events from inside the
isochronous callback function.

Instead, we add a member to fnusb_dev that tracks if the device is permanently
dead, and set it to true when we get a LIBUSB_TRANSFER_NO_DEVICE (or on resubmission,
LIBUSB_ERROR_NO_DEVICE).

Then, in freenect_process_events, after calling libusb_handle_events
internally, we check to see if any of the devices in the context have this flag
set.  If they do, then we should return some nonzero error value, which is
currently -1.

In the future, we intend to provide callbacks so that clients can receive
explicit notifications when a device has disappeared, so the code can robustly
handle that situation.  Future work.

Fixes #229 (I think)

Signed-off-by: Drew Fisher <drew.m.fisher@gmail.com>
3f7e959
@zarvox zarvox closed this in 3f7e959 Nov 21, 2011
Member
zarvox commented Nov 21, 2011

The latest updates in unstable plus the one that adds freenect_process_events_timeout() should hopefully clear up the issues you noted.

In addition, we now return -1 from freenect_process_events if any of the underlying devices in the specified context have disappeared. The library knows which device vanished, but we're still trying to figure out a reasonably clean way to pass that up to the client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment