Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SysFsDriver keeps gpio line in-use when application crashes #1992

Closed
HakanL opened this issue Dec 5, 2022 · 23 comments
Closed

SysFsDriver keeps gpio line in-use when application crashes #1992

HakanL opened this issue Dec 5, 2022 · 23 comments
Labels
bug Something isn't working Priority:2 Work that is important, but not critical for the release tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly

Comments

@HakanL
Copy link
Contributor

HakanL commented Dec 5, 2022

Describe the bug

When using the SysFsDriver and the application crashes then the Gpio line is left in-use and can't be used again with the libgpiod driver.

Steps to reproduce

  1. Run gpioinfo 0 to verify that line 17 is unused.
  2. Download and run this demo app: https://github.com/HakanL/IotGpioDriverError which crashes (throws an exception).
  3. Run gpioinfo 0 again to see that line 17 says now that it's in-use by sysfs.

If I try to run the demo app again then that works, but if I switch to use libgpiod instead then I'll get error code 16 (resource in use / EBUSY).

Expected behavior

Any gpio resource allocated by the SysFsDriver should be released, even if the application crashes.

Actual behavior

Used gpio lines are kept in-use.

Versions used

  • dotnet 7.0
  • System.Device.Gpio: 2.2.0
  • Iot.Device.Bindings: 2.2.0

Tracking external issue: dotnet/runtime#79155

@HakanL HakanL added the bug Something isn't working label Dec 5, 2022
@ghost ghost added the untriaged label Dec 5, 2022
@pgrawehr
Copy link
Contributor

pgrawehr commented Dec 5, 2022

Thanks for this observation. I don't know whether there's an easy fix for this problem, since it feels awkward having to use the sysfs driver to properly initialize libgpiod after such a crash.

@raffaeler
Copy link
Contributor

@HakanL Could you please post the stack trace of the fault when running on libgpiod? I don't have a RPi here now.

@HakanL
Copy link
Contributor Author

HakanL commented Dec 5, 2022

@raffaeler Sure, here's the stack trace (but IMHO the issue is in the SysFsDriver, the Gpio line is marked as in-use even when the process is completely gone.

GPIO Controller Driver = RaspberryPi3Driver
System.IO.IOException: Error while requesting event listener for pin 17, error code: 16
   at System.Device.Gpio.Drivers.LibGpiodDriverEventHandler.SubscribeForEvent(SafeLineHandle pinHandle)
   at System.Device.Gpio.Drivers.LibGpiodDriverEventHandler..ctor(Int32 pinNumber, SafeLineHandle safeLineHandle)
   at System.Device.Gpio.Drivers.LibGpiodDriver.PopulateEventHandler(Int32 pinNumber)
   at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
   at System.Device.Gpio.Drivers.LibGpiodDriver.AddCallbackForPinValueChangedEvent(Int32 pinNumber, PinEventTypes eventTypes, PinChangeEventHandler callback)
   at System.Device.Gpio.Drivers.RaspberryPi3LinuxDriver.AddCallbackForPinValueChangedEvent(Int32 pinNumber, PinEventTypes eventTypes, PinChangeEventHandler callback)
   at System.Device.Gpio.Drivers.RaspberryPi3Driver.AddCallbackForPinValueChangedEvent(Int32 pinNumber, PinEventTypes eventTypes, PinChangeEventHandler callback)
   at System.Device.Gpio.GpioController.RegisterCallbackForPinValueChangedEvent(Int32 pinNumber, PinEventTypes eventTypes, PinChangeEventHandler callback)
   at Iot.Device.Button.GpioButton..ctor(Int32 buttonPin, TimeSpan doublePress, TimeSpan holding, GpioController gpio, Boolean shouldDispose, PinMode pinMode, TimeSpan debounceTime)
   at Iot.Device.Button.GpioButton..ctor(Int32 buttonPin, GpioController gpio, Boolean shouldDispose, PinMode pinMode, TimeSpan debounceTime)
   at Program.<Main>$(String[] args) in C:\Projects\Other\IotButton1\Program.cs:line 20

This is what the output of gpioinfo shows:
line 17: "GPIO17" "sysfs" input active-high [used]

@HakanL
Copy link
Contributor Author

HakanL commented Dec 5, 2022

Thanks for this observation. I don't know whether there's an easy fix for this problem, since it feels awkward having to use the sysfs driver to properly initialize libgpiod after such a crash.

Understood, it may not be possible to release the resource in the scenario (in the SysFsDriver). This is the first time I've seen an application on Linux keeping resources after it's killed, which is the reason why I wasn't expecting this behavior (and waited a long time to reboot). I also tested to run the application again with the SysFsDriver, and exiting it properly (after it had crashed before), to see if it would release the line. The application runs fine (using the gpio line), but it doesn't release the line when exiting properly. It is possible of course that it's an issue with the SysFs "system", but that is outside my expertise.

@raffaeler
Copy link
Contributor

Sure, here's the stack trace (but IMHO the issue is in the SysFsDriver, the Gpio line is marked as in-use even when the process is completely gone.

Yes, I got this, but I wanted to understand where exactly the problem happens.

I guess that the SysFsDriver should subscribe some process-termination hook to close the native resource, but I am not familiar with that code.

@krwq krwq added tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly Priority:1 Work that is critical for the release, but we could probably ship without and removed tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly untriaged labels Dec 8, 2022
@krwq
Copy link
Member

krwq commented Mar 2, 2023

[Triage] @HakanL can you share minimal repro? We're having hard time telling if your scenario is by design or not

@krwq krwq added Needs: Author Feedback We are waiting for author to react to feedback (action required) Priority:3 Work that is nice to have and removed Priority:1 Work that is critical for the release, but we could probably ship without labels Mar 2, 2023
@HakanL
Copy link
Contributor Author

HakanL commented Mar 2, 2023

@krwq I have a link to a repo above, it's very simple and just shows that if there's an exception (any exception, it's not related to the actual exception I'm throwing, that's just to show the issue) then the gpio line is captured even after the process terminates. In other words I can't re-run the application until I reboot.

@ghost ghost removed the Needs: Author Feedback We are waiting for author to react to feedback (action required) label Mar 2, 2023
@joperezr
Copy link
Member

joperezr commented Mar 2, 2023

Oh, got it, thanks for the repro, we somehow missed that originally. What we were more curious about was whether or not you had wrapped the GpioController inside a using statement (as we would have expected that Dispose would cleanup resources), but looks like you are doing that so we should take another look to see what is going on and why is Dispose not cleaning up those files.

@raffaeler
Copy link
Contributor

@HakanL thanks for the repro that was impeccable. I made some tests and the code inside GpioController and SysFsDriver is fine.
This issue is related to this other known problem on .NET on Unix machines.

More specifically, this comment explains the reasons why it has not been already fixed.

We will discuss again this issue in the Triage to understand whether there is some other possible solution to resolve this scenario.

@HakanL
Copy link
Contributor Author

HakanL commented Mar 3, 2023

@raffaeler Thanks for looking into it. But why would it then be different for the GpioControllerLibGpiodDriver, is it using some other mechanism to release the resources?

@raffaeler
Copy link
Contributor

I am not sure to understand your question.
The SysFsDriver relies on the linux file system to open and manage the pin. Using files you manage the pins. If the app does not release the pin, files are still on the file system and the pin is not considered closed even if the process closes.
The GpioController.Dispose calls the driver.Dispose which in your case is SysFsDriver.
The problem here is that GpioController.Dispose is not called, even if the using at the beginning is meant to call it. This problem is (.NET on linux)-specific as it is well-explained in the link I posted.

@HakanL
Copy link
Contributor Author

HakanL commented Mar 3, 2023

Sorry, used the wrong name. I meant why is it different for the LibGpiodDriver driver? It doesn't seem to have the same issue.

@raffaeler
Copy link
Contributor

The LibGpioDriver is far more efficient as it uses a library (APIs) instead of creating/reading/writing files on the sys/class/gpio directory.
This implies that nothing is left on the file system after the process crashes.

@HakanL
Copy link
Contributor Author

HakanL commented Mar 3, 2023

Ah, then I understand, thank you.

@raffaeler
Copy link
Contributor

BTW, generally speaking, you may always want to use libgpio (apt-get install libgpiod-dev) for perf reasons.

@HakanL
Copy link
Contributor Author

HakanL commented Mar 3, 2023

Makes sense, I can't remember why I used the SysFsDriver, it may have just been an attempt to see the difference. But I spent a long time trying to figure out why I would get the resource busy error (always expecting an issue with your own code first of course), so I figured I should report it as it was inconsistent between the drivers.

@raffaeler
Copy link
Contributor

The reason may be very simple: the libgpiod driver is not installed by default on Raspbian. So you had to apt-get it by yourself.
Also, in future versions of the iot library it will be straightforward to see what driver has been automatically selected when you don't specify one.

@HakanL
Copy link
Contributor Author

HakanL commented Mar 3, 2023

Ahh, I think you're right, I used the default constructor and it only worked once (because I had a crash). That's when I started down the path of drivers. Several months ago, hard to remember ;)

@krwq krwq added untriaged and removed Priority:3 Work that is nice to have labels Mar 3, 2023
@krwq
Copy link
Member

krwq commented Mar 3, 2023

we need to re-triage this with new info

@pgrawehr
Copy link
Contributor

pgrawehr commented Mar 5, 2023

I think the problem here is very simple:

  • When an application using the sysfs driver crashes, it leaves open any pins.
  • The libgpiod driver cannot open a pin that has been opened by sysfs.

I think there is no perfect solution to point 1, since there is no way to clean up a file if the application crashes or worse, is killed.

The problem does not happen if either:

  • Always the same driver is used (the sysfs driver recovers correctly after a crash)
  • The application terminates correctly and disposes it's resources correctly

@raffaeler
Copy link
Contributor

@pgrawehr I made some tests and the problem has a different nature.
The exception should cause the dispose method to be called. This does not happen on Linux as it should (see the link I posted). It's not exactly a .NET bug, rather a difficulty for the .NET runtime team to find a way to call it while maintaining useful info in the crash dump.

So yes, the pin is left open but this is a problem peculiar to our library as the SysFsDriver, for its nature, uses files that are preserved after the process get killed.
Since this happens because of how .NET currently work on unix systems, I believe we should provide a workaround to overcome this problem, at least when the app is using our drivers.

But again, I agree with @krwq that we should talk about this in the next triage, now that the we know the exact cause of the issue.

@krwq krwq added tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly Priority:2 Work that is important, but not critical for the release labels Mar 9, 2023
@ghost ghost removed the untriaged label Mar 9, 2023
@Ellerbach
Copy link
Member

[Triage] Regarding this thread, there is nothing really that can be done on the .NET IoT side. The only recommendation is to use try/catch along your application and making sure it won't crash.
So, closing this issue. If anyone has a genius idea on how to resolve this issue, feel free to add here and better propose a PR!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2023
@raffaeler
Copy link
Contributor

Few seconds ago, it was committed a change in .NET 9 repo that adopts a new exception handling mechanism (on by default) that solves this issue.
See: dotnet/runtime#79155 (comment)

Of course, none of us had the opportunity to test this in the context of this thread.
HTH

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working Priority:2 Work that is important, but not critical for the release tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Projects
None yet
Development

No branches or pull requests

6 participants