Copter: spi_timeout on 4.0rc2 #12773

andyp1per · 2019-11-05T09:11:57Z

I have a user who is reporting the error:
“prearm: Internal Error 0x4000”
This is spi_fail which is triggered by an spi timeout. The code says this can never happen but I know that tridge just changed the timeout value. This user also had telemetry issues.

He was using 4.0rc2 with my FFT PR so it's possible that the PR is the culprit, but I thought that @tridge and @rmackay9 woudl want to know just in case it's something serious

Bug report

Issue details

Please describe the problem

Version
Copter 4.0rc2 + PR11886

Platform
[ ] All
[ ] AntennaTracker
[ X] Copter
[ ] Plane
[ ] Rover
[ ] Submarine

Airframe type
Quad

Hardware type
OmnibusNano v6

Logs
2019-11-04 16-57-37CRASH.log

tridge · 2019-11-05T10:47:02Z

I don't see that pre-arm error in that log?

1wick · 2019-11-05T13:40:30Z

Hello,
It was me.. I did not have prearm logging enabled.
I put 4.0rc2 on it after the trouble, and it seems ok.
I will put Andy's firmware back on it later when I have some time to see if I get the same results. If so, I'll be sure to enable prearm logging and send in a log.
I got a couple internal error 0x8000 and a bunch of 0x4000's. Telemetry was buggy.. Not all info sent. Downloading parameters took a long time.

Also, there was no "crash".. I thought I had lost RC control of the copter while it was conveniently drifting back towards my yard, and I killed the motors when it was over my yard. But, what was actually happening is I lost telemetry, and a GCS Failsafe was triggered. I didn't see that on my OSD, and had no idea what was going on, so basically manually crashed the copter to avoid what I though was a flyaway. But, turns out it was just trying to fly itself home....
However, I think the buggy telemetry is what triggered the GCS failsafe.. My RC/Crossfire logs show I had a great signal the entire time, and I wasn't far from myself, so I don't think it was a radio issue.

andyp1per · 2019-11-05T15:48:15Z

@tridge FFT was disabled so no thread there. Internal errors also include 0x8000!
2019-11-04 16-49-34CRASH.tlog.zip
2019-11-04 16-57-37CRASH.bin.zip

1wick · 2019-11-05T16:15:50Z

The 0x8000 error happened after I calibrated level. But then I rebooted, and got the 0x4000 errors.
Then I rebooted again, and was able to arm.
FFT was disabled.. I was going to test ESC based filter. (Which is working great in regular 4.0rc2)

FYI, if you look at the tlog, RSSI is odd because I use crossfire.. It has "mode2" which goes up to 200.. Then mode1 is 0-100, so that's why it's over 100 sometimes.
And the telemetry signal is never over about 20, which is also a crossfire thing.. That's basically full signal.

andyp1per · 2019-11-05T16:42:33Z

It would be useful to know if this is consistent with my changes in which case the onus is on me to find the culprit :)

1wick · 2019-11-05T17:04:08Z

I just had a couple flights, and got several GCS failsafes on both. This is on the official 4.0rc2.
I have been using crossfire for a couple years and have never gotten a GCS failsafe. And while it happens, my reception is fine. I don't know exactly what triggers the failsafe, but something changed.
No 0x#### errors.
I'll load your firmware later and see if I get them w/ it again.

andyp1per · 2019-11-05T17:16:31Z

rc2 includes the failsafe refactoring work. I have a feeling this made GCS failsafe more of a thing but haven't looked too closely

1wick · 2019-11-05T17:27:18Z

Oh, yeah. First thing he mentioned in the rc2 post.
OK, good to know.

andyp1per · 2019-11-05T18:42:40Z

I ran this on my KakuteF7Mini - absolutely no issues at all. FFT tracked the ESC's nicely, no errors. No internal errors, so I conclude that this is hardware related. Log attached.
19-11-05_18-35-38_2.bin.zip

andyp1per · 2019-11-05T19:37:40Z

@1wick The OmnibusNano V6 has a specific telemetery UART like the Kakute but on the Kakute we disable DMA because of BLHeli issues - I wonder if this is related, would you be ok trying a build that disables this on the Nano?

1wick · 2019-11-05T21:25:55Z

I want to put the 'problem' firmware back on it, to see if the issue is repeatable.
Then sure, I'd be happy to try the nodma version.
Busy day though.. Not sure I'll get to it for a day or two.

andyp1per · 2019-11-06T20:45:25Z

I tried this on an Omnibus F4 Pro - no issues at all.

Pedals2Paddles · 2019-11-06T23:59:28Z

I have nothing useful to add to this but wanted to say yay for the new GCS Failsafe working!

It will trigger when the GCS heartbeat is lost for > 5000ms. So likely a symptom of your telemetry issues.

andyp1per · 2019-11-10T17:23:35Z

@tridge it seems this is repeatable on vanilla 4.0rc2, see #12800

Turning off DMA makes no difference, so something else going on here.

tridge · 2019-11-11T21:07:59Z

thanks, I've ordered a nano v6 to test with
meanwhile, can someone please test with:
define HAL_SPI_CHECK_CLOCK_FREQ 1
in hwdef.dat?
I suspect we have a bad DMA channel that we need to exclude

tridge · 2019-11-11T21:08:23Z

with HAL_SPI_CHECK_CLOCK_FREQ set I'd like to see startup msgs on USB

andyp1per · 2019-11-11T22:31:42Z

../../libraries/AP_HAL_ChibiOS/SPIDevice.cpp: In static member function 'static void ChibiOS::SPIDevice::test_clock_freq()':
../../libraries/AP_HAL_ChibiOS/SPIDevice.cpp:429:61: error: format '%u' expects argument of type 'unsigned int', but argument 3 has type 'uint32_t {aka long unsigned int}' [-Werror=format=]
hal.console->printf("Waiting %u\n", AP_HAL::millis());

Oops!

1wick · 2019-11-11T22:58:40Z

with HAL_SPI_CHECK_CLOCK_FREQ set I'd like to see startup msgs on USB

Tridge, What do I use to get the startup messages? Will mavproxy get a log of what you need?

andyp1per · 2019-11-12T09:08:16Z

I've fixed the compilation error in the PR and removed the NODMA.

1wick · 2019-11-12T22:40:37Z

@tridge OK, Andy compiled a version w/ the requested changes.
Mission planner is the same. If I calibrate level, I get 0x8000 errors.

I wasn't sure how to get the USB startup messages you mentioned.. But I did connect the FC to mavproxy and did a simple accel calibration, and it acted odd.. Didn't print a 0x8000 error, but the FC kept resetting.
It's a new FC with nothing attached, so the throttle, compass, and battery warnings are expected.

STABILIZE> accelcalsimple
STABILIZE> Simple accel calAPM: Initialising APM
**
PASSED
Got MAVLink msg: COMMAND_ACK {command : 241, result : 0}
Calibrated
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete
APM: PreArm: Throttle below Failsafe
APM: PreArm: Compass not healthy
APM: PreArm: Battery 1 low voltage failsafe
APM: PreArm: Throttle below Failsafe
APM: PreArm: EKF attitude is bad
APM: PreArm: Compass not healthy
APM: PreArm: EKF2 still initialising
APM: PreArm: Battery 1 low voltage failsafe
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete
APM: PreArm: Throttle below Failsafe
APM: PreArm: DCM Roll/Pitch inconsistent by 26 deg
APM: PreArm: EKF attitude is bad
APM: PreArm: Compass not healthyge failsafe
APM: PreArm: EKF2 still initialising
APM: PreArm: Battery 1 low voltage failsafe
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete
APM: EKF2 IMU0 forced reset
APM: PreArm: Throttle below Failsafe
APM: PreArm: DCM Roll/Pitch inconsistent by 25 deg
APM: PreArm: EKF attitude is bad
APM: PreArm: Compass not healthy
APM: PreArm: EKF2 still initialising
APM: PreArm: Battery 1 low voltage failsafe
APM: EKF2 IMU0 tilt alignment complete
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete
APM: PreArm: Throttle below Failsafe
APM: PreArm: DCM Roll/Pitch inconsistent by 24 deg
APM: PreArm: Compass not healthy
APM: PreArm: Battery 1 low voltage failsafe
no link
link 1 down
link 1 OK
heartbeat OK
APM: EKF2 IMU0 forced reset
APM: EKF2 IMU0 tilt alignment complete

If anyone can give me any clues about how to see the other info you want, I'll get it.

1wick · 2019-11-13T19:51:07Z

I want to mention.. I had an issue today that I thought might have been just on Andy's firmware w/ the FFT, but it just did it with regular 4.0rc2.

I often plug my copter in as I'm charging its battery.. Then the FC warms up and it gets GPS lock, etc.
Well, w/ 4.0, after the copter is on for a while its telemetry link is bad..
If I plug it in and connect, it works fine.
If I plug it in for 10 minutes, then try connecting, the parameter download is very slow, I get repeated "Already got [parameter]" messages, and when it finally does connect, a lot of the telemetry info in MP doesn't show up.. This time, voltage and the HUD were working. But none of the gps or altitude or speed, etc, parameters were updating.
It's ok again after a reboot.

-I use crossfire for telemetry which does things a bit differently, so it might be related to crossfire. But, it worked fine in 3.6 and 3.7(from about September).

andyp1per · 2019-11-13T23:47:36Z

I see a similar thing with logging. If I leave it powered but sitting there for too long I quite often get bad logging errors that only go away when I reboot.

andyp1per · 2019-11-17T21:08:23Z

@tridge I was not able to get any debug output on an Omnibus F4 Pro with this setting.

I ran:

mavproxy.py --console --master /dev/ttyS14

and did not see the debug, but saw other hal.console->printf() messages. I tried adding a 5s delay, but still no luck. Any other way of getting the console output?

Huibean · 2019-12-02T01:36:17Z

I also getting this error in Copter 4.0

1wick · 2019-12-02T02:18:19Z

It's 100% repeatable for me..
Calibrate level, get the errors.

But, it's fine after a hard reboot. So, not affecting flight or anything. When I initially had the error, I had other issues too which were unrelated..

Huibean · 2019-12-02T08:59:40Z

this is USB output for SPI clock freq test

Waiting for USB
Waiting 1002
Waiting 2002
Waiting 3002
CLOCKS=
1:1080000002:540000003:540000004:1080000005:1080000006:108000000
SPI[1] clock=1685596
SPI[4] FAIL 0x20001110 0x20001518

andyp1per · 2019-12-02T11:56:01Z

SPI4 is not defined in the OmnibusNanov6 hwdef.dat - maybe that is the problem?

Huibean · 2019-12-02T12:02:17Z

@andyp1per suppose not, this is a loop with defined spi_devices, I have defined SPI4 in my case

andyp1per · 2019-12-02T12:09:49Z

@Huibean please can you try with the vanilla hwdef.dat? A customzied version will not really tell us anything!

Huibean · 2019-12-03T02:52:05Z

thanks @rmackay9 , I am trying to find out why the timeout happened

Huibean · 2019-12-03T04:52:11Z

@tridge @rmackay9 finally figure out why the error happen
I have print every osalSysLock and osalSysUnlock in i2c and SPI loop, when SPI timeout happens it shows SPI transform exec in i2c's locking process, so that's why it takes so long to finish and make timeout

i2c osalSysLock
i2c osalSysUnlock
�3	�Barometer 1 calibration complete��i2c osalSysLock
i2c osalSysUnlock
SPI osalSysLock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
SPI osalSysUnlock
SPI timeout!

andyp1per · 2019-12-03T10:08:37Z

@Huibean on the devcall last night we discussed this - it's clear its very specific to the STM32 chip version, hwdef.dat and hardware - it's very easy to mess this up. For the OmnibusNanov6 we think these are all defined correctly and it might be an error in the STM32 datasheet. For your case it's not so clear. Please can you post all of these?

andyp1per · 2019-12-03T21:46:37Z

@1wick we really need the debug log to find out which SPI device is causing the issue. Are you running on windows/linux/macos?

1wick · 2019-12-03T22:05:25Z

Windows and Linux. I usually use Windows w/ my flight stuff because that's what's on my laptop.. But my desktop is linux.
What do I need to do?
(And like I said before, since this issue only occurs for me during a level calibration, it's not really a big deal.. But if you wan to fix it, I'll definitely help get you some data)

andyp1per · 2019-12-03T22:10:47Z

@1wick so @Huibean seems to think if you do it on Linux you can cat the serial port at startup and see the debug. I am able to see really early debug through a combination of mavproxy and trying to connect just as the board has booted, but cat might be better because it will probably just block waiting for output. cat /dev/ttyusb or something (not sure what the usb port would be called on linux).

1wick · 2019-12-03T22:26:43Z

Well, getting closer..
Found the USB port, cat. I get output, but not sure it's anything useful.

-This is on my spare FC w/ nothing attached.

Waiting for USB
Waiting 1003
Waiting 2004
Waiting 3005
CLOCKS=
1:84000000 2:42000000 3:42000000 4:84000000 5:84000000 6:84000000
SPI[1] clock=1311139
SPI[2] clock=1310929
SPI[3] clock=1311139
Storage: Using flash pages 1 and 2

Init ArduCopter V4.0.0-dev (5f202a47)

Free RAM: 109536
FW Ver: 120

load_all took 16986us
0 0 0 AP_OSD: osd font is up to date.
Started MAX7456 OSD
�Initialising APM
�SPI Flash 0x00EF4018 found pages=65536 erase=64k
AP_Logger_Block: buffer size=16384
�3��AK09916: Reset Failed
No Compass backends available
Compass initialisation failed
�
MPU: temp reset IMU[0] -1012 -66
*�3�Initialising APM�s�oi�T]r

�u��3��f�3�EKF2 IMU0 tilt alignment complete
�� o):��3/��3:��3E�'p�No��e�
(�3O�EKF2 IMU0 forced reset
$[�3`��4�do{�
��3k�,��3v�"
��3��BH�3��

andyp1per · 2019-12-04T08:52:04Z

@1wick Ok so that's weird - there is no SPItimeout, where is the internal error? I don't see it. It's important to get the logging for a situation where the error occurs.

1wick · 2019-12-04T13:57:57Z

The error only occurs after a level calibration, and is fine after a reboot..
So, I don't think there will be a timeout during booting?

And I already uploaded the mavpoxy logs and MP screen shot of the error after doing a calibration, then the FC acting strange afterwords.

After a calibration, then a hot reboot (telling the FC to restart from w/in MP) I do get more errors.. Maybe I can try to do that from MavProxy on Linux while watching the USB port..

1wick · 2019-12-04T14:24:46Z

Nope, not possible.. As soon as I connect w/ APM Planner or MavProxy, the CAT session ends.

The error goes..
Calibrate level, get 0x8000. Hot reboot, I get 0x4000. Cold reboot, it's fine.

1wick · 2019-12-04T16:54:56Z

I've been trying to catch it, but can't.. Now, more often than not, the FC totally locks up after the hot reboot, and I can't connect to see the output.
The newly updated MP displays "Main Thread Stuck" after doing the calibration.. which is I guess what 0x8000 means. Then I tell it to Preflight_reboot_shutdown, and it locks up.

1wick · 2019-12-05T01:49:12Z

Andy,
FYI, I figured out the console in APM Planner gets the startup messages.. If you open the console, then plug in the board and immediately connect in the console, it displays the "waiting for USB" and subsequent messages.

andyp1per · 2019-12-05T12:14:31Z

The error goes..
Calibrate level, get 0x8000. Hot reboot, I get 0x4000. Cold reboot, it's fine.

@tridge I'm not sure if you are following this but the omnibus nano issue appears more involved - there is no SPI error at startup, only after a level calibration

1wick · 2019-12-05T13:41:28Z

I was on an old version of 3.7 before upgrading to 4.0, and 3.7 from last year did not have this issue.
I never really ran 3.6.. But, I could put the official 3.6 on it to see if it has the same issue. And I can install some other previous firmwares, if they are on the firmware site. I'll do some more testing later.
I have pretty frequent telemetry issues, and occasional bootup lockups w/ 4.0, too, which I don't recall having w/ 3.7. I did catch a boot lockup yesterday while I was monitoring the USB port, and the last line it printed was about preparing the logging system.

-and like I said, don't worry about it just for me, but I'm happy to help if it's something you want to figure out. I don't think many people are using this board, and I'm about to build a new copter w/ the nano kakute f7

andyp1per · 2019-12-05T14:56:51Z

If you can build now then git bisect on the problem would probably be really useful!

giacomo892 · 2020-06-15T08:15:03Z

I had an occurrence of this on 14e01e2 running on a F405-STD board.
When I landed I wasn't able to rearm and looks like it is connected with a failsafe here too.

14/06/2020 09:07:26 : PreArm: Internal errors 0x4000 l:211 ,spi_fail
14/06/2020 09:07:26 : PreArm: Radio failsafe on
14/06/2020 09:07:26 : Radio Failsafe - Disarming
14/06/2020 09:05:50 : Mission: 6 WP
14/06/2020 09:05:50 : Mission: 5 ChangeSpeed
14/06/2020 09:05:50 : Reached command #4
14/06/2020 09:05:44 : Mission: 4 WP
14/06/2020 09:05:44 : Reached command #3

I'm "pretty" sure I didn't turn my radio off as wanted to take off again.

Already send the log to @tridge

shellixyz · 2020-08-12T10:22:03Z

I just updated one of my quads to master and got this issue the first time I tried to fly with it. Only had it once. I just power cycled the board and it was gone. I had never got this issue with the previous revision I was using. The board is a Matek F405-CTR.

Old revision: cefd991
New revision: e137698

peterbarker · 2020-10-05T00:29:38Z

Does anybody have a repeatable test case? @shellixyz has given us endpoints for good-and-bad, but a repeatable setup is required for a bisect...

@1wick are you in a position to bisect given the known-good/known-bad commits? Sounds like you've got a reproducible case.

vierfuffzig · 2020-10-05T05:45:05Z

i‘ve seen the int spi error just recently on a friend‘s omnibusnanov6 running plane master from ~3 weeks ago. it just happened once after boot, didn‘t reoccur for multiple flight hours through the day. minimal setup, only GPS and RC. i myself run two omnibus nanos on current plane master too and haven‘t seen it happen yet.

shellixyz · 2020-10-05T15:04:16Z

Seems sporadic, I can't reproduce the issue reliably

Huibean · 2020-10-06T08:00:10Z

this happen frequently after I active the optical flow pixart driver

Huibean · 2020-10-08T03:34:04Z

in my case, expect_delay_start value is always 0 and trigger the spi timeout

ardupilot/libraries/AP_HAL_ChibiOS/Scheduler.cpp

Line 341 in e471b23

if (expect_delay_start != 0) {

giacomo892 · 2020-11-12T13:54:48Z

Just had it on latest master on Plane. I enabled TERRAIN for the very first time ever, and got it on the first time I tried to arm the plane. Rebooted and all good.
Again not able to reproduce.

IamPete1 · 2022-08-05T22:19:11Z

Any updates to this issue? There don't seem to be any new reports, can we close?

shellixyz · 2022-08-06T02:15:13Z

It can probably be closed. I haven't got the issue since my comment

andyp1per added the BUG label Nov 5, 2019

giacomo892 mentioned this issue Sep 28, 2020

Copter: Internal errors 0x4000 I:211 spi_fail #15452

Open

peterbarker added the InternalError label Nov 17, 2020

IamPete1 added the ClosePending label Aug 5, 2022

IamPete1 closed this as completed Aug 6, 2022

IamPete1 removed the ClosePending label Aug 6, 2022

Copter: spi_timeout on 4.0rc2 #12773

Copter: spi_timeout on 4.0rc2 #12773

Comments

andyp1per commented Nov 5, 2019 • edited Loading

Bug report

tridge commented Nov 5, 2019

1wick commented Nov 5, 2019 • edited Loading

andyp1per commented Nov 5, 2019

1wick commented Nov 5, 2019

andyp1per commented Nov 5, 2019

1wick commented Nov 5, 2019

andyp1per commented Nov 5, 2019

1wick commented Nov 5, 2019

andyp1per commented Nov 5, 2019

andyp1per commented Nov 5, 2019

1wick commented Nov 5, 2019

andyp1per commented Nov 6, 2019

Pedals2Paddles commented Nov 6, 2019

andyp1per commented Nov 10, 2019 • edited Loading

tridge commented Nov 11, 2019

tridge commented Nov 11, 2019

andyp1per commented Nov 11, 2019

1wick commented Nov 11, 2019 • edited Loading

andyp1per commented Nov 12, 2019

1wick commented Nov 12, 2019

1wick commented Nov 13, 2019 • edited Loading

andyp1per commented Nov 13, 2019

andyp1per commented Nov 17, 2019

Huibean commented Dec 2, 2019

1wick commented Dec 2, 2019 • edited Loading

Huibean commented Dec 2, 2019

andyp1per commented Dec 2, 2019

Huibean commented Dec 2, 2019

andyp1per commented Dec 2, 2019

Huibean commented Dec 3, 2019

Huibean commented Dec 3, 2019

andyp1per commented Dec 3, 2019

andyp1per commented Dec 3, 2019

1wick commented Dec 3, 2019

andyp1per commented Dec 3, 2019

1wick commented Dec 3, 2019 • edited Loading

Free RAM: 109536 FW Ver: 120

andyp1per commented Dec 4, 2019

1wick commented Dec 4, 2019 • edited Loading

1wick commented Dec 4, 2019 • edited Loading

1wick commented Dec 4, 2019

1wick commented Dec 5, 2019

andyp1per commented Dec 5, 2019

1wick commented Dec 5, 2019 • edited Loading

andyp1per commented Dec 5, 2019

giacomo892 commented Jun 15, 2020

shellixyz commented Aug 12, 2020 • edited Loading

peterbarker commented Oct 5, 2020

vierfuffzig commented Oct 5, 2020 • edited Loading

shellixyz commented Oct 5, 2020

Huibean commented Oct 6, 2020 • edited Loading

Huibean commented Oct 8, 2020

giacomo892 commented Nov 12, 2020

IamPete1 commented Aug 5, 2022

shellixyz commented Aug 6, 2022

andyp1per commented Nov 5, 2019 •

edited

Loading

1wick commented Nov 5, 2019 •

edited

Loading

andyp1per commented Nov 10, 2019 •

edited

Loading

1wick commented Nov 11, 2019 •

edited

Loading

1wick commented Nov 13, 2019 •

edited

Loading

1wick commented Dec 2, 2019 •

edited

Loading

1wick commented Dec 3, 2019 •

edited

Loading

Free RAM: 109536
FW Ver: 120

1wick commented Dec 4, 2019 •

edited

Loading

1wick commented Dec 4, 2019 •

edited

Loading

1wick commented Dec 5, 2019 •

edited

Loading

shellixyz commented Aug 12, 2020 •

edited

Loading

vierfuffzig commented Oct 5, 2020 •

edited

Loading

Huibean commented Oct 6, 2020 •

edited

Loading