Copter: spi_timeout on 4.0rc2 #12773
Comments
I don't see that pre-arm error in that log? |
Hello, Also, there was no "crash".. I thought I had lost RC control of the copter while it was conveniently drifting back towards my yard, and I killed the motors when it was over my yard. But, what was actually happening is I lost telemetry, and a GCS Failsafe was triggered. I didn't see that on my OSD, and had no idea what was going on, so basically manually crashed the copter to avoid what I thought was a flyaway. But, turns out it was just trying to fly itself home.... |
@tridge FFT was disabled so no thread there. Internal errors also include 0x8000! |
The 0x8000 error happened after I calibrated level. But then I rebooted, and got the 0x4000 errors. FYI, if you look at the tlog, RSSI is odd because I use crossfire.. It has "mode2" which goes up to 200.. Then mode1 is 0-100, so that's why it's over 100 sometimes. |
It would be useful to know if this is consistent with my changes in which case the onus is on me to find the culprit :) |
I just had a couple flights, and got several GCS failsafes on both. This is on the official 4.0rc2. |
rc2 includes the failsafe refactoring work. I have a feeling this made GCS failsafe more of a thing but haven't looked too closely |
Oh, yeah. First thing he mentioned in the rc2 post. |
I ran this on my KakuteF7Mini - absolutely no issues at all. FFT tracked the ESC's nicely, no errors. No internal errors, so I conclude that this is hardware related. Log attached. |
@1wick The OmnibusNano V6 has a specific telemetry UART like the Kakute, but on the Kakute we disable DMA because of BLHeli issues - I wonder if this is related. Would you be ok trying a build that disables this on the Nano? |
I want to put the 'problem' firmware back on it, to see if the issue is repeatable. |
I tried this on an Omnibus F4 Pro - no issues at all. |
I have nothing useful to add to this but wanted to say yay for the new GCS Failsafe working! It will trigger when the GCS heartbeat is lost for > 5000ms. So likely a symptom of your telemetry issues. |
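As a rough illustration of the rule described in the comment above (failsafe once the GCS heartbeat has been missing for more than 5000 ms), here is a minimal Python sketch. It is not the ArduPilot implementation; the function name and the millisecond timestamps are assumptions made for the example, and only the 5000 ms threshold comes from the thread.

```python
# Hypothetical sketch of a heartbeat-timeout check; not ArduPilot code.
GCS_TIMEOUT_MS = 5000  # threshold quoted in the comment above


def gcs_failsafe_should_trigger(now_ms: int, last_heartbeat_ms: int,
                                timeout_ms: int = GCS_TIMEOUT_MS) -> bool:
    """Return True once the last heartbeat is older than timeout_ms."""
    return (now_ms - last_heartbeat_ms) > timeout_ms


if __name__ == "__main__":
    # Heartbeat last seen at t=1000 ms; check at two later times.
    print(gcs_failsafe_should_trigger(6000, 1000))  # exactly 5000 ms: no trigger
    print(gcs_failsafe_should_trigger(6001, 1000))  # more than 5000 ms: trigger
```

With a flaky telemetry link, any gap in heartbeats longer than the threshold would trip this check, which fits the suggestion that the failsafes are a symptom of the telemetry problem.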
thanks, I've ordered a nano v6 to test with |
with HAL_SPI_CHECK_CLOCK_FREQ set I'd like to see startup msgs on USB |
../../libraries/AP_HAL_ChibiOS/SPIDevice.cpp: In static member function 'static void ChibiOS::SPIDevice::test_clock_freq()': Oops! |
Tridge, What do I use to get the startup messages? Will mavproxy get a log of what you need? |
I've fixed the compilation error in the PR and removed the NODMA. |
@tridge OK, Andy compiled a version w/ the requested changes. I wasn't sure how to get the USB startup messages you mentioned.. But I did connect the FC to mavproxy and did a simple accel calibration, and it acted odd.. Didn't print a 0x8000 error, but the FC kept resetting.
STABILIZE> accelcalsimple
If anyone can give me any clues about how to see the other info you want, I'll get it. |
I want to mention.. I had an issue today that I thought might have been just on Andy's firmware w/ the FFT, but it just did it with regular 4.0rc2. I often plug my copter in as I'm charging its battery.. Then the FC warms up and it gets GPS lock, etc. -I use crossfire for telemetry which does things a bit differently, so it might be related to crossfire. But, it worked fine in 3.6 and 3.7(from about September). |
I see a similar thing with logging. If I leave it powered but sitting there for too long I quite often get bad logging errors that only go away when I reboot. |
@tridge I was not able to get any debug output on an Omnibus F4 Pro with this setting. I ran: mavproxy.py --console --master /dev/ttyS14 and did not see the debug, but saw other hal.console->printf() messages. I tried adding a 5s delay, but still no luck. Any other way of getting the console output? |
I am also getting this error in Copter 4.0 |
It's 100% repeatable for me.. But, it's fine after a hard reboot. So, not affecting flight or anything. When I initially had the error, I had other issues too which were unrelated.. |
this is USB output for SPI clock freq test:
Waiting for USB
Waiting 1002
Waiting 2002
Waiting 3002
CLOCKS=
1:108000000 2:54000000 3:54000000 4:108000000 5:108000000 6:108000000
SPI[1] clock=1685596
SPI[4] FAIL 0x20001110 0x20001518 |
SPI4 is not defined in the OmnibusNanov6 hwdef.dat - maybe that is the problem? |
@andyp1per suppose not, this is a loop with defined spi_devices, I have defined SPI4 in my case |
@Huibean please can you try with the vanilla hwdef.dat? A customized version will not really tell us anything! |
thanks @rmackay9 , I am trying to find out why the timeout happened |
@tridge @rmackay9 finally figured out why the error happens:
i2c osalSysLock
i2c osalSysUnlock
Barometer 1 calibration complete
i2c osalSysLock
i2c osalSysUnlock
SPI osalSysLock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
SPI osalSysUnlock
SPI timeout! |
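One way to summarize a trace like the one above is to count how many other lock/unlock events are printed between an `SPI osalSysLock` line and the matching `SPI osalSysUnlock` line. The Python sketch below does only that counting; the function name is made up for the example, and it does not attempt to explain why the timeout fires.

```python
# Hypothetical helper for summarizing the debug trace; not part of ArduPilot.
def events_inside_spi_lock(trace_lines):
    """For each SPI lock/unlock pair, count the other lock events in between."""
    counts = []
    inside = False
    current = 0
    for raw in trace_lines:
        line = raw.strip()
        if line == "SPI osalSysLock":
            inside, current = True, 0
        elif line == "SPI osalSysUnlock":
            if inside:
                counts.append(current)
            inside = False
        elif inside and line.endswith(("osalSysLock", "osalSysUnlock")):
            current += 1
    return counts


# The SPI section of the trace posted above.
TRACE = """SPI osalSysLock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
i2c osalSysLock
i2c osalSysUnlock
SPI osalSysUnlock
SPI timeout!""".splitlines()

if __name__ == "__main__":
    print(events_inside_spi_lock(TRACE))  # -> [10]
```

For the posted trace this reports ten i2c events (five lock/unlock pairs) between the SPI lock and unlock prints, immediately before the timeout message.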
@Huibean on the devcall last night we discussed this - it's clear it's very specific to the STM32 chip version, hwdef.dat and hardware - it's very easy to mess this up. For the OmnibusNanov6 we think these are all defined correctly and it might be an error in the STM32 datasheet. For your case it's not so clear. Please can you post all of these? |
@1wick we really need the debug log to find out which SPI device is causing the issue. Are you running on windows/linux/macos? |
Windows and Linux. I usually use Windows w/ my flight stuff because that's what's on my laptop.. But my desktop is linux. |
@1wick so @Huibean seems to think if you do it on Linux you can cat the serial port at startup and see the debug. I am able to see really early debug through a combination of mavproxy and trying to connect just as the board has booted, but cat might be better because it will probably just block waiting for output. cat /dev/ttyusb or something (not sure what the usb port would be called on linux). |
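The "cat the port at startup" idea can also be scripted so the read starts as soon as the device node appears. The following is a minimal standard-library sketch, not a tested capture tool: the device path in the usage comment is a guess (the thread itself is unsure what the port is called), and it assumes the node is readable by the current user the moment it shows up.

```python
import os
import time


def capture_boot_output(device_path, wait_s=30.0, poll_s=0.05, max_bytes=4096):
    """Wait for device_path to appear, then read up to max_bytes from it.

    Like cat, the read blocks until data arrives; it stops at end of file,
    or once max_bytes have been collected.
    """
    deadline = time.monotonic() + wait_s
    while not os.path.exists(device_path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"{device_path} did not appear within {wait_s}s")
        time.sleep(poll_s)

    chunks = []
    total = 0
    with open(device_path, "rb", buffering=0) as port:
        while total < max_bytes:
            chunk = port.read(max_bytes - total)
            if not chunk:  # end of file, or the device went away
                break
            chunks.append(chunk)
            total += len(chunk)
    return b"".join(chunks).decode("ascii", errors="replace")


# Example (hypothetical device name; check dmesg for the real one):
# print(capture_boot_output("/dev/ttyACM0"))
```

Start the script first, then plug in or reboot the board. Only one program can usefully hold the port at a time, so a GCS connecting afterwards would interfere with the capture.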
Well, getting closer.. -This is on my spare FC w/ nothing attached.
Waiting for USB
Init ArduCopter V4.0.0-dev (5f202a47)
Free RAM: 109536
|
@1wick Ok so that's weird - there is no SPItimeout, where is the internal error? I don't see it. It's important to get the logging for a situation where the error occurs. |
The error only occurs after a level calibration, and is fine after a reboot.. And I already uploaded the mavproxy logs and MP screen shot of the error after doing a calibration, then the FC acting strange afterwards. After a calibration, then a hot reboot (telling the FC to restart from w/in MP) I do get more errors.. Maybe I can try to do that from MavProxy on Linux while watching the USB port.. |
Nope, not possible.. As soon as I connect w/ APM Planner or MavProxy, the CAT session ends. The error goes.. |
I've been trying to catch it, but can't.. Now, more often than not, the FC totally locks up after the hot reboot, and I can't connect to see the output. |
Andy, |
@tridge I'm not sure if you are following this but the omnibus nano issue appears more involved - there is no SPI error at startup, only after a level calibration |
I was on an old version of 3.7 before upgrading to 4.0, and 3.7 from last year did not have this issue. -and like I said, don't worry about it just for me, but I'm happy to help if it's something you want to figure out. I don't think many people are using this board, and I'm about to build a new copter w/ the nano kakute f7 |
If you can build now then git bisect on the problem would probably be really useful! |
I had an occurrence of this on 14e01e2 running on an F405-STD board. 14/06/2020 09:07:26 : PreArm: Internal errors 0x4000 l:211 ,spi_fail I'm "pretty" sure I didn't turn my radio off as I wanted to take off again. Already sent the log to @tridge |
I just updated one of my quads to master and got this issue the first time I tried to fly with it. Only had it once. I just power cycled the board and it was gone. I had never got this issue with the previous revision I was using. The board is a Matek F405-CTR. |
Does anybody have a repeatable test case? @shellixyz has given us endpoints for good-and-bad, but a repeatable setup is required for a bisect... @1wick are you in a position to bisect given the known-good/known-bad commits? Sounds like you've got a reproducible case. |
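To show why a repeatable test matters, here is a small Python sketch of the binary search that a bisect performs. It is not `git bisect` itself; the commit labels and the `is_bad` callback are placeholders, and it assumes a linear history in which every commit after the first bad one is also bad.

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit.

    commits is ordered oldest to newest, with commits[0] known good and
    commits[-1] known bad. is_bad(commit) must give the same answer every
    time it is asked about the same commit.
    """
    if len(commits) < 2:
        raise ValueError("need at least a known-good and a known-bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]            # placeholder commit names
    broken_from = 6                                    # pretend c6 introduced the bug
    culprit = first_bad_commit(history, lambda c: int(c[1:]) >= broken_from)
    print(culprit)  # -> c6
```

Each step halves the range, so ten commits need only three tests here. If the failure is sporadic, a bad commit can be reported as good and the search ends on the wrong commit, which is why a reproducible setup is being asked for.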
i've seen the int spi error just recently on a friend's omnibusnanov6 running plane master from ~3 weeks ago. it just happened once after boot, didn't reoccur for multiple flight hours through the day. minimal setup, only GPS and RC. i myself run two omnibus nanos on current plane master too and haven't seen it happen yet. |
Seems sporadic, I can't reproduce the issue reliably |
in my case, the expect_delay_start value is always 0 and triggers the spi timeout
|
Just had it on latest master on Plane. I enabled TERRAIN for the very first time ever, and got it on the first time I tried to arm the plane. Rebooted and all good. |
Any updates to this issue? There don't seem to be any new reports, can we close? |
It can probably be closed. I haven't got the issue since my comment |
I have a user who is reporting the error:
“prearm: Internal Error 0x4000”
This is spi_fail which is triggered by an spi timeout. The code says this can never happen but I know that tridge just changed the timeout value. This user also had telemetry issues.
He was using 4.0rc2 with my FFT PR so it's possible that the PR is the culprit, but I thought that @tridge and @rmackay9 would want to know just in case it's something serious
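The "Internal Error" value is a bitmask, so a reported value can be split into its individual bits. The Python sketch below does that; the only name it knows is `spi_fail` for 0x4000, because that is the only mapping stated in this thread. Any other bit (for example the 0x8000 seen in other reports here) is shown as unknown rather than guessed. The function name is made up for the example.

```python
# Only mapping confirmed in this thread; other bits are deliberately left unnamed.
KNOWN_BITS = {0x4000: "spi_fail"}


def decode_internal_errors(mask):
    """Split an internal-error bitmask into one label per set bit."""
    names = []
    bit = 0
    while mask >> bit:
        flag = 1 << bit
        if mask & flag:
            names.append(KNOWN_BITS.get(flag, f"unknown(0x{flag:04x})"))
        bit += 1
    return names


if __name__ == "__main__":
    print(decode_internal_errors(0x4000))  # -> ['spi_fail']
    print(decode_internal_errors(0xC000))  # -> ['spi_fail', 'unknown(0x8000)']
```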
Bug report
Issue details
Please describe the problem
Version
Copter 4.0rc2 + PR11886
Platform
[ ] All
[ ] AntennaTracker
[ X] Copter
[ ] Plane
[ ] Rover
[ ] Submarine
Airframe type
Quad
Hardware type
OmnibusNano v6
Logs
2019-11-04 16-57-37CRASH.log