Fix rare segfault on wait_for_update #84

LinusCDE · 2021-12-21T03:55:22Z

Found this, when using this implementation for plato:

The cause was that sem_timedwait was called on sem_open's result SEM_FAILED. That call failed with errno == 22 ("Invalid value"). Seems that the rust code called the code too fast and and probably made more likely by more frequent wait()'s in plato due to the ui system. Plato uses a bus system for almost everything, so afaik, calls should never be happen in parallel.

The implemenation was 1:1 like the reference implementation. While this didn't ever happen on the official rm2fb client, the cause might still be interesting to investigate and could possibly hint to a problem in the original impl. (cc @raisjn)

Added a way to do several attempts with delays which should provide a good enough workaround. Also turned later occuring segfault into a proper panic, should 3 attempts/10ms delay be not enough for some app.

raisjn · 2021-12-21T04:14:18Z

interesting find! it's a bit surprising that it happens here. rm2fb should have NULL ptr handling added to make sure it doesn't also segfault, i wonder what's happening. EINVAL for sem_open() indicates something weird is going on.

fenollp · 2021-12-21T06:42:46Z

LGTM!
Just maybe use a u8 instead of usize for the counter...

raisjn · 2021-12-23T04:45:21Z

i'm still curious about the cause here - can you add print statements to help diagnose / see what the requested shm path is? if it isn't happening in rm2fb (we would see segfaults in plato if it was, right?), it is likely better to figure out and fix the underlying cause.

EINVAL is given back when the name is invalid in shm_open()

LinusCDE · 2021-12-23T13:42:49Z

Reverted the attempt stuff and added a lot of print debugging (commit LinusCDE@ae72c35).

Built current plato (master branch) against this lib: dist.tar.gz. You can just extract this on device and run the .plato.sh.

If you wish to see the exact sysv messages sent, you can add this spy lib between: libswtfb_sysv_spy.so.xz (Source)
Is also a libchook, you can add with LD_PRELOAD=/home/root/libsysv_spy.so. You can also add both this and the rm2fb client like this: LD_PRELOAD="/home/root/libsysv_spy.so /opt/lib/librm2fb_client.so.1" (used this originally for fixing bugs when implementing the client).

This really seems to be something that is really timing sensitive. Because running the binary with print statements works perfectly fine. You can see for yourself by replacing the "plato" binary with this one. The only change is that I just commented out every new println! statement.

It is also probably noteworthy to mention, that println! on rust by default locks a mutex. So it will probably cause a slightly longer delay than std::cout or printf because of the extra synchronization.

The O_CREAT option needs the mode parameter specified. The server code did do this, but not the client side, resulting in an error only if the client actually got to creating the semaphore first.

LinusCDE · 2021-12-23T17:31:30Z

The new commit should fix the issue for good. Seems that the O_CREAT needs the mode for creating a shared memory file if it doesn't exist. If the code waited a bit longer after the sysv message, the update would be complete and the server side would have created the semaphore correctly. The client didn't specify these arguments and if it actually needed to create it (because executed first), the code would error, possibly on missing/wrong arguments for creating the file.

Not sure why the c++ client/wrapper didn't cause this. My only guess would be that the wrapper uses Entware's libc (currently 2.27) which defaults to some correct values if they are not specified. Since we usually use a generic compiler for rust on arm, the libc crate is most likely the system one (currently 2.31) which might just behave differently when the arguments.

Specifying 0o644 as on the server side fixes the issue. @raisjn this might also be a nice upstream change. Why not mandatory this is probably better to make it explicit to prevent problems when compiling with/for different glibc/pthread versions. Also, do you know the use of the 4th argument (0). I doesn't seem to affect anything so I left it out for now.

Edit: Mixed up which libc is which version. This seems to be more like a, we removed this default behaviour for clarification. So updating it for the c++ client is definitely recommended as it might be a case of future proofing, should entware's libc update.

LinusCDE · 2021-12-23T17:55:18Z

Had some weird behaviour. Not sure if really needed, added the 0 argument just to be sure.

See canselcik/libremarkable#84

Fix rare segfault on wait_for_update

6f7dd14

Apply fenollp's suggestion to use u8 instead of usize

7ef3a35

fenollp approved these changes Dec 22, 2021

View reviewed changes

Actually fix the reason for the segfault

46267fc

The O_CREAT option needs the mode parameter specified. The server code did do this, but not the client side, resulting in an error only if the client actually got to creating the semaphore first.

LinusCDE added 2 commits December 23, 2021 18:53

Small syntax fix

eff63d5

Add "0" argument to sem_open

bf4c236

Re-add custom panic

b84f5c3

LinusCDE added a commit to LinusCDE/remarkable2-framebuffer that referenced this pull request Dec 23, 2021

Update sem_open in client

c86e35f

See canselcik/libremarkable#84

LinusCDE mentioned this pull request Dec 23, 2021

Update sem_open in client ddvk/remarkable2-framebuffer#85

Merged

raisjn pushed a commit to ddvk/remarkable2-framebuffer that referenced this pull request Dec 23, 2021

[client] Update sem_open in client to supply mode 0644 (#85)

8366f70

See canselcik/libremarkable#84

fenollp merged commit 22cea77 into canselcik:master Jan 6, 2022

LinusCDE added this to the 0.6.0 milestone Jan 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix rare segfault on wait_for_update #84

Fix rare segfault on wait_for_update #84

LinusCDE commented Dec 21, 2021

raisjn commented Dec 21, 2021

fenollp commented Dec 21, 2021

raisjn commented Dec 23, 2021

LinusCDE commented Dec 23, 2021

LinusCDE commented Dec 23, 2021 •

edited

LinusCDE commented Dec 23, 2021

Fix rare segfault on wait_for_update #84

Fix rare segfault on wait_for_update #84

Conversation

LinusCDE commented Dec 21, 2021

raisjn commented Dec 21, 2021

fenollp commented Dec 21, 2021

raisjn commented Dec 23, 2021

LinusCDE commented Dec 23, 2021

LinusCDE commented Dec 23, 2021 • edited

LinusCDE commented Dec 23, 2021

LinusCDE commented Dec 23, 2021 •

edited