
Random segfault Nvidia Shield Tube (32-bit) during playback #368

Closed
cbovy opened this issue Jul 6, 2021 · 18 comments

@cbovy

cbovy commented Jul 6, 2021

  • Platform:
    Nvidia Shield
    Version 8.2.3

  • MythTV version:
    20210703-arm-v31.0 fixes/31
    Qt version 5.14.1

What steps will reproduce the bug?

It is hard to reproduce. It happens randomly while watching a recording or Live TV.

How often does it reproduce? Is there a required condition?

Not known at the moment.

What is the expected behaviour?

The expected behaviour is to continue showing the recording.

What do you see instead?

A full crash of the MythFrontend app on the Shield.

Additional information

The Myth Backend version is:
2:31.0+fixes.202106102123.0680b37c68~ubuntu20.10.1

Full backtrace at: https://pastebin.com/NRaYswn9

@bennettpeter
Member

What I can see is that it is failing in the socket code. Are you doing anything unusual network-wise? Are you using a network remote to control playback? Is it possible there is a network failure somewhere?

This is in code that is not specific to Android, so it is possible that the same failure could occur in a Linux frontend.

It looks like you are using the 32-bit build. The original Shield is able to run the 64-bit build. Both builds should work, but you could try the 64-bit build to see if that helps.

@cbovy
Author

cbovy commented Jul 6, 2021

I'm running MythTV 32-bit because this is the 2019 'Tube' variant, which I believe is 32-bit only.

The Shield is connected via Ethernet through a switch to the backend (1 Gbps, full duplex).
No, no network remote control. Other apps (like Netflix) are not having issues.

I only have Nvidia Shields as frontends. I could try a frontend on my Ubuntu laptop and see if I can reproduce it.

Is there any way to debug the socket communication on the frontend and backend side, and compare?
Any ideas on how to prevent the segfault? I'm happy to try things out; my build environment is ready for a rebuild.

@ddp526

ddp526 commented Sep 7, 2021

Hi, I know 'me too' posts are not welcome, but I wanted to add that I have very similar issues on a Sony Android TV (the 32-bit build is the only compatible one). I have random, infrequent crashes, and they have been happening for a long time (a year or more, across every build I have tried), but only recently did I get motivated enough to look into what might be going on. I think these are the pertinent log lines (via adb logcat) from two separate recent crashes:

09-05 22:24:52.261 27431 27524 E mfe     : signalhandling.cpp:291:handleSignal  Received Segmentation fault: Code 1, PID 732264568, UID 0, Value 0x00000000
09-01 23:47:24.489  9585  9618 E mfe     : signalhandling.cpp:291:handleSignal  Received Segmentation fault: Code 1, PID 1072179320, UID 0, Value 0x00000000

Sorry, I don't have full traces, so I can't be sure this segfault follows the same code path. FWIW:
mythfrontend - mythfrontend-20210522-arm-v32-Pre-2763-g034eb86a3f (old, I know; I can try an update if you think it would help)
mythbackend - 2:32.0master.202109032033.8899ca5fd6ubuntu20.04.1
Connected over Ethernet, and no remote control via the network.

The crashes happen mid-playback, with no interaction via the remote or otherwise at the time of the crash. The frontend logs show a checkWifi state (Wi-Fi is disabled for me) as the last entry before each crash, but I'm not convinced this is related: these entries appear in my logs every 5 seconds, so I think it is simply the last item that happened to be logged.

HTH

@cbovy
Author

cbovy commented Feb 22, 2022

This issue is still bugging me. I've done some further investigation and have also added a recent backtrace.
I'm using the 32-bit version of MythFrontend, as the Nvidia Shield 2019 Tube only supports 32-bit applications.
The segfault happens with the version from @bennettpeter and also with my own compiled version. The attached backtrace was made with my own compiled version.

Shield running on: 9.0.0 and also 9.0.1 (the issue was also present on earlier versions)
Version Shield: mythfrontend-20220220-arm-v32-Pre-3554-gb2a21798d6
Version Backend: 2:32.0+fixes.202202180054.0d9d21abaa~ubuntu21.10.1
The Shield is normally connected over Ethernet, on 1000Mbps link.

The segfault only happens when the Shield is connected over the Ethernet link. When connected via wireless, no segfault has happened so far.
(The Ethernet connection, including the same cables, has been tested with another frontend (a notebook running 2:32.0+fixes.202202212348.dfc8d074d8~ubuntu21.10.1) and no errors occurred, so I think I can rule out the cabling.)

Some questions I have open:

  • Is there a 'buggy' Ethernet driver and/or network stack in the Shield 2019 Tube version?
  • Are there any known issues in Qt (5.14.1?) related to 32-bit builds?
  • Is upgrading Qt to a higher version a potential solution?
  • Is there a way to prevent the segfault in Mythfrontend?
  • Is there anything else I can try using gdb?
  • @ddp526 can you confirm the above behavior?

Full backtrace: gdb.txt

@ddp526

ddp526 commented Feb 25, 2022

Will try, but it may take some time: the only Shield TV that can use wireless is used infrequently. However, it crashed last night, which prompted me to switch it to wireless. If it lasts a week or two like this, that may well confirm your theory. BTW, I get these crashes on my Shield (Tube version) and on a Sony TV as well.

@ddp526

ddp526 commented Mar 1, 2022

My crashes continued on wireless (I have had two since the last post), so it seems there are either two issues, or it's not the wired network code.

HTH

@cbovy
Author

cbovy commented Mar 3, 2022

Thanks @ddp526
I can also confirm that I have had crashes using wireless only, although less often.
So the 32-bit build in combination with Qt could be the issue.
Any idea? I'm open for any suggestion to test.

Regards, Charles

@cbovy
Author

cbovy commented Mar 4, 2022

I've recompiled with Qt 5.15.3, but unfortunately the same crashes still happen.

@cbovy
Author

cbovy commented Mar 6, 2022

I've been testing with a Shield Tube (32-bit) and a Shield Pro (64-bit). The Shield Pro runs perfectly fine, without any issues, while the Shield Tube has the crashes.
I'll try to compile with the latest SDK and NDK and see if that makes a difference. The latest Qt (5.15.3) does not fix the issue.

@bennettpeter
Member

Look at your backtrace for something like this near the start:

Thread 21 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.

Search the trace for the string "Thread 21" (substitute the actual number if it is not 21).

Thread 21 (Thread 25542.25682):
#0 0x91f9e374 in MythSocket::qt_static_metacall (_o=0xa5855850, _c=QMetaObject::InvokeMetaMethod, _id=12, _a=0xa6f37a98) at moc/moc_mythsocket.cpp:159
_t = 0xa5855850
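The search described above can also be scripted instead of done by hand. This is a minimal sketch, assuming the backtrace was saved as plain text (for example the attached gdb.txt) in the line format shown in the excerpt above; `findCrashFrame` and `mocLineNumber` are hypothetical helper names, not part of MythTV or gdb, and the regular expressions only cover the line shapes quoted here.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: find the thread that received SIGSEGV and return
// frame #0 from that thread's section of a saved gdb backtrace.
std::optional<std::string> findCrashFrame(const std::string& trace)
{
    // Matches e.g.: Thread 21 "MythSocketThrea" received signal SIGSEGV, ...
    static const std::regex signalLine(R"(Thread (\d+) "[^"]*" received signal SIGSEGV)");
    std::smatch m;
    if (!std::regex_search(trace, m, signalLine))
        return std::nullopt;

    // Section header for the same thread, e.g.: Thread 21 (Thread 25542.25682):
    const std::string header = "Thread " + m[1].str() + " (";
    const auto pos = trace.find(header);
    if (pos == std::string::npos)
        return std::nullopt;

    // The first line starting with "#0" after the header is the faulting frame.
    std::istringstream section(trace.substr(pos));
    std::string line;
    while (std::getline(section, line))
    {
        if (line.rfind("#0", 0) == 0)
            return line;
    }
    return std::nullopt;
}

// Hypothetical helper: extract the moc_mythsocket.cpp line number from a frame.
std::optional<int> mocLineNumber(const std::string& frame)
{
    static const std::regex lineRef(R"(moc_mythsocket\.cpp:(\d+))");
    std::smatch m;
    if (!std::regex_search(frame, m, lineRef))
        return std::nullopt;
    return std::stoi(m[1].str());
}
```

Passing the saved trace text to `findCrashFrame`, and its result to `mocLineNumber`, gives the line to look up in moc_mythsocket.cpp (159 for the frame quoted above).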

Search the Android build directories where you built the same version that crashed, and look for moc_mythsocket.cpp. It should be in android/build/mythtv/libs/libmythbase/moc (for 32-bit).
See what line 159 of moc_mythsocket.cpp contains (substitute the actual number from above).
On one build of mine it is case 12: _t->ReadReal, and on another it is case 13: _t->ResetReal.
This will tell us what function it is trying to call when it fails. We can then add some logging to try and determine why it fails at that point.
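For context on why the fault is reported in `qt_static_metacall` rather than inside `ReadReal`: a moc-generated dispatcher receives the slot arguments as an array of `void*`, where `_a[0]` is reserved for the return value and each `_a[i]` points to argument i. The sketch below is a simplified, Qt-free model of the `case 12` line; `FakeSocket` and `staticMetacall` are illustrative names, not the real generated code. Every argument is dereferenced in the dispatcher before the slot is entered, so an invalid pointer in `_a[1]` faults in the dispatcher itself.

```cpp
#include <cassert>
#include <chrono>

// Illustrative stand-in for MythSocket; only the slot's signature matters here.
struct FakeSocket
{
    char* lastData = nullptr;
    int lastSize = 0;
    std::chrono::milliseconds lastWait{0};

    void ReadReal(char* data, int size, std::chrono::milliseconds max_wait, int* ret)
    {
        lastData = data;
        lastSize = size;
        lastWait = max_wait;
        *ret = size; // pretend every requested byte was read
    }
};

// Simplified model of a moc-generated static metacall: the slot index selects
// a case, and each argument is recovered by casting _a[i] to a pointer to the
// argument's type and dereferencing it.
void staticMetacall(FakeSocket* _t, int _id, void** _a)
{
    switch (_id)
    {
        case 12:
            // *reinterpret_cast<char**>(_a[1]) reads the caller's char* variable.
            // If _a[1] itself is invalid, this read faults here, before
            // ReadReal is ever entered.
            _t->ReadReal(*reinterpret_cast<char**>(_a[1]),
                         *reinterpret_cast<int*>(_a[2]),
                         *reinterpret_cast<std::chrono::milliseconds*>(_a[3]),
                         *reinterpret_cast<int**>(_a[4]));
            break;
        default:
            break;
    }
}
```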

cbovy changed the title from "Random segfault Nvidia Shield during playback" to "Random segfault Nvidia Shield Tube (32-bit) during playback" on Mar 6, 2022
@cbovy
Author

cbovy commented Mar 6, 2022

Thanks @bennettpeter for looking into this.
It is indeed _id=12, which calls the following function according to moc_mythsocket.cpp:

case 12: _t->ReadReal((*reinterpret_cast< char*(*)>(_a[1])),(*reinterpret_cast< int(*)>(_a[2])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[3])),(*reinterpret_cast< int*(*)>(_a[4]))); break;

Happy to recompile and test again.

@bennettpeter
Member

It is using Qt slots to call MythSocket::ReadReal; however, it is failing in the caller before it can actually make the call. The parameters all seem to have values, but something must be corrupted. Possibly the MythSocket has been destroyed between QMetaObject::invokeMethod in MythSocket::Read and _t->ReadReal in MythSocket::qt_static_metacall.
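The hand-off between `invokeMethod` and `qt_static_metacall` can be modelled in plain C++. This is a simplified sketch, not Qt's implementation, and it assumes the read is issued as a blocking cross-thread call (the behaviour `Qt::BlockingQueuedConnection` provides): the calling thread packs pointers to its own local variables into a `void*` array, hands the array to the socket thread, and sleeps until the slot has run. `BlockingInvoker`, `invoke`, and `run` are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model of a blocking cross-thread slot call (not Qt's code).
// The caller passes an array of pointers to its own stack variables and
// blocks until the worker thread has run the slot on them.
class BlockingInvoker
{
  public:
    using Slot = std::function<void(void**)>;

    explicit BlockingInvoker(Slot slot)
        : m_slot(std::move(slot)), m_worker([this] { run(); }) {}

    ~BlockingInvoker()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        m_worker.join();
    }

    // Call from a thread other than the worker. Returns only after the slot
    // has finished, so the caller's locals referenced by args stay alive.
    void invoke(void** args)
    {
        std::lock_guard<std::mutex> oneCallAtATime(m_callMutex);
        std::unique_lock<std::mutex> lock(m_mutex);
        m_pending = args;
        m_done = false;
        m_cv.notify_all();
        m_cv.wait(lock, [this] { return m_done; });
    }

  private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            m_cv.wait(lock, [this] { return m_stop || m_pending != nullptr; });
            if (m_pending != nullptr)
            {
                void** args = m_pending;
                m_pending = nullptr;
                m_slot(args); // the slot dereferences pointers into the caller's stack
                m_done = true;
                m_cv.notify_all();
            }
            else
            {
                return; // m_stop was set and nothing is pending
            }
        }
    }

    Slot m_slot;
    std::mutex m_callMutex;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    void** m_pending = nullptr;
    bool m_done = false;
    bool m_stop = false;
    std::thread m_worker; // declared last so it starts after the members above exist
};
```

Because `invoke` does not return until the slot has finished, the variables that `args` points to stay alive for the whole call. In this model, a bad entry in `args` would mean that memory was modified between packing and dispatch.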

I suggest turning on logging for sockets and network, to see if the socket is destroyed just before the crash, or to see if anything else useful can be found from the log.

To turn on logging:
Start mythfrontend on the Shield. Go into the frontend settings and turn on network remote control (Setup > General > Remote Control > Enable Network Remote Control).

Shut down and restart mythfrontend.

On Linux, use telnet (substitute the Shield's IP address):
telnet <shield-ip> 6546
set verbose socket,network
exit

After the crash, access the log from Linux using:
adb logcat |& tee tmp/android.log

This will show the last few minutes of log. You need to get to it soon, because only a few minutes' worth of log is stored. Alternatively, you can start capturing the log before the crash and keep capturing until after it, but that may slow things down or give you a lot of unnecessary data, especially if you cannot tell when the crash will happen.

@cbovy
Author

cbovy commented Mar 7, 2022

I've run the session. Please find the details below and attached.

Thread 22 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 19916.20062]
0x91a3b374 in MythSocket::qt_static_metacall (_o=0x95ce5cf0, _c=QMetaObject::InvokeMetaMethod, _id=12, _a=0xa607da98) at moc/moc_mythsocket.cpp:159
159 case 12: _t->ReadReal((*reinterpret_cast< char*(*)>(_a[1])),(*reinterpret_cast< int(*)>(_a[2])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[3])),(*reinterpret_cast< int*(*)>(_a[4]))); break;
(gdb) print _o
$1 = (QObject *) 0x95ce5cf0
(gdb) print _c
$2 = QMetaObject::InvokeMetaMethod
(gdb) print _id
$3 = 12
(gdb) print _a
$4 = (void **) 0xa607da98
(gdb) print _a[1]
$5 = (void *) 0x2607e0d4
(gdb) print _a[2]
$6 = (void *) 0xa607e0d0
(gdb) print _a[3]
$7 = (void *) 0xa607e0d8
(gdb) print _a[4]
$8 = (void *) 0xa607e0a8

android4.log

Please let me know if additional logs are required (I included a snippet of the logs, but more is available).

@cbovy
Author

cbovy commented Mar 7, 2022

Not sure if it helps, but the memory at _a[1] seems to be inaccessible at the time of the segfault.
During regular playback I can do the same, and then _a[1] gives output.

Thread 65 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 30400.31606]
0x91a05374 in MythSocket::qt_static_metacall (_o=0x956eea50, _c=QMetaObject::InvokeMetaMethod, _id=12, _a=0xa5bbca98) at moc/moc_mythsocket.cpp:159
159 case 12: _t->ReadReal((*reinterpret_cast< char*(*)>(_a[1])),(*reinterpret_cast< int(*)>(_a[2])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[3])),(*reinterpret_cast< int*(*)>(_a[4]))); break;
(gdb) x _a[1]
0x25bbd0d4: Cannot access memory at address 0x25bbd0d4
(gdb) x _a[2]
0xa5bbd0d0: 0 '\000'
(gdb) x _a[3]
0xa5bbd0d8: 30 '\036'
(gdb) x _a[4]
0xa5bbd0a8: -52 '\314'
(gdb) x _a
0xa5bbca98: 0 '\000'
(gdb) p _a
$1 = (void **) 0xa5bbca98

@bennettpeter
Member

The log you supplied seems to be from Live TV. It will be easier to debug if this happens during playback of a recording. Does it happen on recordings, or only on Live TV? If it happens on recordings, please get a log from a failure while playing back a complete recording (i.e. one that finished recording before you started watching it). Or does it only happen when playing a recording that is still in progress?

bennettpeter self-assigned this on Mar 8, 2022
@cbovy
Author

cbovy commented Mar 8, 2022

Apologies; I'll make a backtrace and log from a finished recording and update the ticket.
The issue occurs on both Recordings and Live TV, but doesn't happen on Videos.

@cbovy
Author

cbovy commented Mar 9, 2022

I reproduced the crash again, but this time it crashes in ReadStringListReal, still in the socket code. The log is below as well.

Thread 23 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 23549.23944]
0x91a45294 in MythSocket::qt_static_metacall (_o=0xa7e11f50,
_c=QMetaObject::InvokeMetaMethod, _id=7, _a=0xacc12aa0)
at moc/moc_mythsocket.cpp:154
154 case 7: _t->ReadStringListReal((*reinterpret_cast< QStringList*(*)>(_a[1])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[2])),(*reinterpret_cast< bool*(*)>(_a[3]))); break;
(gdb) x _a[1]
0x2cc130c8: Cannot access memory at address 0x2cc130c8
(gdb) x _a[2]
0xacc130d8: 0x00001b58
(gdb) x _a[3]
0xacc130b4: 0xacc130d7

android4.log

@bennettpeter
Member

Looking at your debug output, it looks like the first hex digit of _a[1] has been overwritten. It happens in both cases, ReadReal and ReadStringListReal. In one case _a[1] begins with 0x2cc1, while all the other addresses begin with 0xacc1. The other cases have the same problem: in each case, the first digit of the address in _a[1] has been changed from 0xa to 0x2.
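The observation can be checked with a little bit arithmetic. Using the `_a[1]` and `_a[2]` values from the three gdb sessions in this thread, the sketch XORs the two pointers and keeps only the top hex digit: in every case exactly bit 31 (`0x80000000`) differs, since 0xa is binary 1010 and 0x2 is 0010. Setting bit 31 again would put `_a[1]` 4 to 16 bytes from `_a[2]`, next to the other arguments, which is consistent with it originally pointing at a local variable in the same frame. This only characterises the corruption, not its cause; `highNibbleDiff`, `withBit31`, and `distance` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bits that differ between two 32-bit addresses, restricted to the top hex digit.
constexpr std::uint32_t highNibbleDiff(std::uint32_t a, std::uint32_t b)
{
    return (a ^ b) & 0xF0000000u;
}

// Where _a[1] would point if bit 31 had not been cleared.
constexpr std::uint32_t withBit31(std::uint32_t addr)
{
    return addr | 0x80000000u;
}

// Absolute distance in bytes between two addresses.
constexpr std::uint32_t distance(std::uint32_t a, std::uint32_t b)
{
    return a > b ? a - b : b - a;
}

// _a[1] and _a[2] as printed by gdb in the three crashes reported in this thread.
struct ArgPair
{
    std::uint32_t a1;
    std::uint32_t a2;
};

constexpr ArgPair kCrashes[] = {
    {0x2607e0d4u, 0xa607e0d0u}, // ReadReal
    {0x25bbd0d4u, 0xa5bbd0d0u}, // ReadReal
    {0x2cc130c8u, 0xacc130d8u}, // ReadStringListReal
};

// The compiler confirms that only bit 31 differs in the top digit each time.
static_assert(highNibbleDiff(kCrashes[0].a1, kCrashes[0].a2) == 0x80000000u, "bit 31");
static_assert(highNibbleDiff(kCrashes[1].a1, kCrashes[1].a2) == 0x80000000u, "bit 31");
static_assert(highNibbleDiff(kCrashes[2].a1, kCrashes[2].a2) == 0x80000000u, "bit 31");
```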

This is happening between the call to QMetaObject::invokeMethod in mythsocket.cpp and the call from Qt to MythSocket::qt_static_metacall in moc_mythsocket.cpp. It seems to be something going wrong in Qt slot processing.


3 participants