Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating device causes mir-kiosk to crash #60

Closed
daleknauss opened this issue Jun 25, 2021 · 14 comments · Fixed by canonical/mir#2094
Closed

Updating device causes mir-kiosk to crash #60

daleknauss opened this issue Jun 25, 2021 · 14 comments · Fixed by canonical/mir#2094
Assignees

Comments

@daleknauss
Copy link

Hello,

As part of our system, we apply firmware updates to a number of attached devices. Since around the time of the 2.4.0-snap124 snap release, we have started to see mir-kiosk crash when applying an update. Here are the relevant logs:

Jun 25 00:17:26 FMTRMRD29000003 mir-kiosk.daemon[5160]: ERROR: /build/mir-iJR6Nn/mir-2.4.0/src/platform/udev/udev_wrapper.cpp(60): Throw in function {anonymous}::DeviceImpl::DeviceImpl(udev_device*)
Jun 25 00:17:26 FMTRMRD29000003 mir-kiosk.daemon[5160]: Dynamic exception type: boost::wrapexcept<std::runtime_error>
Jun 25 00:17:26 FMTRMRD29000003 mir-kiosk.daemon[5160]: std::exception::what: Udev device does not exist

Any ideas?

@RAOF
Copy link

RAOF commented Jun 25, 2021

Hm. That looks like Mir querying details of a udev device that has gone away. But I can't think of any relevant code that has changed. Perhaps unrelated changes have changed the timing and exposed a race?

@RAOF RAOF self-assigned this Jun 25, 2021
@daleknauss
Copy link
Author

@RAOF as part of the update, the device would go away and reappear so I think your assessment is correct. Please let me know if there are any other details I can provide that might help with this issue.

@AlanGriffiths
Copy link
Contributor

I concur with @RAOF: there are no recent changes in this code, so I suspect the change is unrelated to the 2.4 release.

The root cause of the crash would appear to be that the exception shown is not being handled. Unfortunately, the above log snippet doesn't indicate the context in which the "Udev device does not exist". Is there anything that might be interesting in the log immediately before or after this?

@daleknauss
Copy link
Author

Here are the rest of the logs I could pull out starting from when the update happens. Those unbind and bind failures are fairly common even before this issue.

A few other thoughts: I've reverted to a snap version that was heavily tested and working fine before the latest mir-kisok snap was released. However, I haven't yet done the work to check if one of our Canonical maintained snaps has been updated in the same window. It's possible that a change to another snap might be the root cause. I will be exploring that further.

Jun 23 00:48:52 FMTRMRD2A000045 kernel: uvcvideo: Found UVC 1.10 device (3301:1000)
Jun 23 00:48:52 FMTRMRD2A000045 kernel: input: Device: Device as /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0/input/input20
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper bind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper bind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6688]: Process '/usr/lib/snapd/snap-device-helper bind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 kernel: usb 1-6: USB disconnect, device number 3
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper bind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper bind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6686]: Process '/usr/lib/snapd/snap-device-helper bind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-logind[1902]: Failed to open /dev/input/event8: No such device
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6732]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6732]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6732]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0 0:0' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-1 /devices/pci0000:00/0000:00:14.0/usb1/1-6 189:2' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-2 /devices/pci0000:00/0000:00:14.0/usb1/1-6 189:2' failed with exit code 1.
Jun 23 00:48:52 FMTRMRD2A000045 systemd-udevd[6730]: Process '/usr/lib/snapd/snap-device-helper unbind snap-3 /devices/pci0000:00/0000:00:14.0/usb1/1-6 189:2' failed with exit code 1.
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: [2021-06-23 00:48:53.170991] < - debug - > mirserver: Handling Terminated from pid=2132
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: ERROR: /build/mir-iJR6Nn/mir-2.4.0/src/platform/udev/udev_wrapper.cpp(60): Throw in function {anonymous}::DeviceImpl::DeviceImpl(udev_device*)
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: Dynamic exception type: boost::wrapexcept<std::runtime_error>
Jun 23 00:48:53 FMTRMRD2A000045 mir-kiosk.daemon[2132]: std::exception::what: Udev device does not exist
Jun 23 00:48:53 FMTRMRD2A000045 systemd[1]: snap.mir-kiosk.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 23 00:48:53 FMTRMRD2A000045 systemd[1]: snap.mir-kiosk.daemon.service: Failed with result 'exit-code'.
Jun 23 00:48:54 FMTRMRD2A000045 kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.

@AlanGriffiths
Copy link
Contributor

@daleknauss @RAOF came up with a likely fix that I've incorporated into a new "2.4.1~rc2-snap124" build on the beta channel.

If you have got an environment where you can reproduce the problem as it happens for you, please give this version a try.

It's not yet available for all architectures, but it should be there soon. I'm reaching EOD so letting you know now.

@daleknauss
Copy link
Author

@AlanGriffiths I'm still seeing that crash on 2.4.1~rc2-snap124 (7239):

2021-06-29T18:50:05Z mir-kiosk.daemon[7811]: [2021-06-29 18:50:05.313501] < -warning- > evdev-input: Attempted to remove unmanaged fd 43
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: [2021-06-29 18:50:09.377665] < - debug - > mirserver: Handling Terminated from pid=7811
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: ERROR: /build/mir-ES5QiO/mir-2.4.1~rc2-g1b8f979fcb/src/platform/udev/udev_wrapper.cpp(60): Throw in function {anonymous}::DeviceImpl::DeviceImpl(udev_device*)
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: Dynamic exception type: boost::wrapexcept<std::runtime_error>
2021-06-29T18:50:09Z mir-kiosk.daemon[7811]: std::exception::what: Udev device does not exist
2021-06-29T18:50:09Z systemd[1]: snap.mir-kiosk.daemon.service: Main process exited, code=exited, status=1/FAILURE

@AlanGriffiths
Copy link
Contributor

@AlanGriffiths I'm still seeing that crash on 2.4.1~rc2-snap124

Thanks. I'll take another look armed with that additional information. (I'm pretty sure we fixed a problem, a shame it wasn't the problem you're seeing.)

@AlanGriffiths
Copy link
Contributor

Oops: Overzealous automated close!

@AlanGriffiths AlanGriffiths reopened this Jun 30, 2021
@AlanGriffiths
Copy link
Contributor

@daleknauss there's now an ~rc3 release with an additional fix.

I'm still failing to reproduce the actual problem you're seeing. Any hints you might have on how to reproduce would be welcome.

@daleknauss
Copy link
Author

@AlanGriffiths 2.4.1~rc3-snap124 appears to have resolved the issue. Thank you for resolving this so quickly.

Roughly how long should we expect these fixes to make it onto stable?

@AlanGriffiths
Copy link
Contributor

@AlanGriffiths 2.4.1~rc3-snap124 appears to have resolved the issue.

Thanks @daleknauss, I need to complete some tests and then I'll tag for release. The time to stable then depends on schedules I don't control, but should be of the order of two weeks.

Meanwhile, I'd like to confirm our understanding of what went wrong. You should be getting some "warning" messages in the log (instead of a crash) with the text "Failed to handle UDev". Can you confirm that and share the log for a few lines before and after?

@AlanGriffiths
Copy link
Contributor

@daleknauss one further thought: were you aware of the "Call for testing"?

@AlanGriffiths
Copy link
Contributor

@daleknauss FYI I have reproduced. As a result, I found the log warning message leaves something to be desired (canonical/mir#2095), but fixing that can wait for the next full release.

@daleknauss
Copy link
Author

Thanks Alan. Definitely aware of calls for testing but don't always catch them when they're happening. Thank you. I'll try to see if I can get any more details on our end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants