Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AP_Logger: make block logger conform to logging expectations #14299

Merged
merged 2 commits into from
Sep 5, 2020

Conversation

andyp1per
Copy link
Collaborator

@andyp1per andyp1per commented May 5, 2020

This PR fixes many, many bugs in the block logging code

  • make block logger conform to mavlink expectations of numbering
  • add support for log time to block logger
  • support disarm rotation
  • refactor rotation into backed. Don't start logs when erasing.
  • correct log start logic
  • separate read and write points so that requesting log information does not corrupt the current log
  • when starting a new log stop logging first
  • clear the write buffer when starting a new log
  • insert utc time when requesting info for the current log
  • stop logging and request formats again when starting a new log
  • cope with erase happening while we are logging
  • keep pushing out startup messages even when format messages are done.
  • don't log to the gcs in the io thread
  • don't start new logs in the io thread
  • don't validate logs while erasing
  • flush logs when stopping logging
  • calculate size correctly by accounting for page headers
  • calculate size correctly when the log is wrapped
  • don't return data when asked for more data than in the log
  • optimize locking and use separate semaphore to mediate ring buffer access
  • stop logging when the current log has filled up the chip - otherwise the log file is useless
  • added stats support
  • reset dropped count when a new log is created

The change requires a logging format change for the block logger so will automatically erase existing logs. We should put this in the release notes FYI @rmackay9 @tridge

@peterbarker
Copy link
Contributor

@andyp1per honestly, I have no idea why there was a mapping in the first place - and if there's no reason to have that mapping I'd be quite happy to see it go away. Your patches here show it's less code (i.e. fewer bugs) without the mapping.

@peterbarker
Copy link
Contributor

-rw-r--r-- 1 pbarker pbarker 819200 May  6 09:31 00000002.BIN
-rw-r--r-- 1 pbarker pbarker 417792 May  6 09:31 00000004.BIN
-rw-r--r-- 1 pbarker pbarker      3 May  6 09:31 LASTLOG.TXT

Old:
Log 1 numLogs 1 lastLog 1 size 819200 Wed May 6 09:31:20 2020
New:
Log 4 numLogs 1 lastLog 4 size 417792 Wed May 6 09:31:49 2020

... so we're definitely doing gaps differently. Not worse, just different.

Copy link
Contributor

@peterbarker peterbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks QGC.

QGC appears to be making the assumption that logs are indexed from zero. I'm not saying QGC is correct in this assumption, but in MAVProxy I'm seeing:

Log 1  numLogs 1 lastLog 1 size 0 
Log 1  numLogs 1 lastLog 1 size 0 
Log 1  numLogs 1 lastLog 1 size 0 
Log 1  numLogs 1 lastLog 1 size 0 
Log 1  numLogs 1 lastLog 1 size 0 
Log 1  numLogs 1 lastLog 1 size 0 

And a log list in MAVProxy gives me this:

MANUAL> Requesting log list
Log 4  numLogs 1 lastLog 4 size 417792 Wed May  6 09:31:49 2020

for:

-rw-r--r-- 1 pbarker pbarker 819200 May  6 09:31 00000002.BIN
-rw-r--r-- 1 pbarker pbarker 417792 May  6 09:31 00000004.BIN
-rw-r--r-- 1 pbarker pbarker      3 May  6 09:31 LASTLOG.TXT
pbarker@bluebottle:~/rc/ardupilot((HEAD detached at andyp1per/pr-flash-fix3))$ 

This shows two things:

  • we need to have that mapping or we'll break QGC (I haven't tested MissionPlanner).
  • we have a bug where requesting a range of logs which is out-of-range returns a non-zero number of logs

@DonLakeFlyer for visibility

@andyp1per
Copy link
Collaborator Author

andyp1per commented May 6, 2020

Dang. Currently this means that QGC will not work with FRAM-based boards because they almost always return logs indexed from a higher number. And they are always contiguous.

@DonLakeFlyer
Copy link
Contributor

Hmm, I"m not super familiar with the log download code. I don't quite get what the problem is. According to mavlink spec LOG_REQUEST_LIST.start it says first log id is 0. Is that where the problem is. If so, then how to you request the list of logs if you don't know the first id?

@andyp1per
Copy link
Collaborator Author

@DonLakeFlyer ah interesting! It also says that the last available should be tagged as 0xFFFF so I'm interested as to how you handle these since they seem like synthetic ids. Do you assume ids are contiguous?

@DonLakeFlyer
Copy link
Contributor

DonLakeFlyer commented May 6, 2020

Do you assume ids are contiguous?

Let me see if I can get Gus to help out here. I don't really know that code. Also if all of this is also caused by the mavlink spec notbeing clear on thing like (contiguous or not) that should get cleaned up as well.

@dogmaphobic
Copy link

It's been a looooong time since I last looked into that code. Yes, from what I remember, the assumption is that "0" is the first log file and it goes from there with an increment of 1 (i.e. first log file (0), second log file (1), third (2), etc.) From that perspective, it doesn't make sense to have gaps. If you have 4 log files, they will be 0, 1, 2 and 3.

@andyp1per andyp1per changed the title AP_Logger: the log number is the log number AP_Logger: make block logger conform to mavlink expectations May 6, 2020
@andyp1per
Copy link
Collaborator Author

@peterbarker I knew doing it this way would be harder 😄 - anyway have fixed and tested the block logger to do the same as the file logger. Works on my KakuteF7 Mini. Any other bugs around numbering are not mine as the file logger is now as it was before. I'm not sure how easy it will be to cope with gaps anyway - the block logger does not have this problem, the numbers are always contiguous.

@andyp1per andyp1per requested a review from peterbarker May 6, 2020 21:05
@peterbarker
Copy link
Contributor

peterbarker commented May 7, 2020 via email

@peterbarker
Copy link
Contributor

peterbarker commented May 7, 2020 via email

@DonLakeFlyer
Copy link
Contributor

Unless you have a sequential ID system how do you implement an item based protocol over a lossy link in a sane manner?

@peterbarker
Copy link
Contributor

@DonLakeFlyer I think the LOG_ENTRY message contains sufficient information to accomplish it - each message contains the number of logs present on the system, so you can just count up the number of LOG_ENTRY responses you receive. If you don't get all of them with in femptoseconds, you must restart the LOG_LIST_REQUEST. Presumably you can specify the last log ID in your new request as the lowest log number which has a contiguous sequence up to the last_log_num present in each of the LOG_ENTRY packets... MAVProxy's use of a hash would allow for this to work (but ArduPilot won't ever send gappy log lists; a quick squiz at PX4 Firmware shows that it might, however).

And, of course, you have to assume that nobody is playing games with you - nobody adding or removing files....

@andyp1per
Copy link
Collaborator Author

@rmackay9 FYI, this bug exists in 4.0.3 also

@andyp1per andyp1per force-pushed the pr-flash-fix3 branch 3 times, most recently from ff03025 to aab6720 Compare June 27, 2020 21:15
@andyp1per andyp1per changed the title AP_Logger: make block logger conform to mavlink expectations AP_Logger: make block logger conform to logging expectations Jun 27, 2020
@andyp1per
Copy link
Collaborator Author

@peterbarker could do with another review now

@andyp1per andyp1per force-pushed the pr-flash-fix3 branch 2 times, most recently from eacfe2b to d92b44b Compare June 29, 2020 20:47
@andyp1per
Copy link
Collaborator Author

The stats support allows me to see that we really only can get a max of 125kb/s on the dataflash chips. I tried an experiment with quad page program, but that seems to trigger other problems so I gave up.

@peterbarker
Copy link
Contributor

peterbarker commented Jul 20, 2020 via email

@andyp1per
Copy link
Collaborator Author

@peterbarker sorry 125kB/s, 1000kb/s

@andyp1per andyp1per force-pushed the pr-flash-fix3 branch 2 times, most recently from bf3230a to cc7dcde Compare July 27, 2020 20:44
libraries/AP_Logger/AP_Logger.cpp Outdated Show resolved Hide resolved
@vierfuffzig
Copy link
Contributor

@andyp1per just testet on omnibusnanov6 with excellent results. thanks a lot again!

…ring

add support for log time to block logger
refactor rotation into backed. Don't start logs when erasing
correct log start logic
separate read and write points so that requesting log information does not corrupt the current log
when starting a new log stop logging first
clear the write buffer when starting a new log
insert utc time when requesting info for the current log
stop logging and request formats again when starting a new log
cope with erase happening while we are logging
keep pushing out startup messages even when format messages are done
don't log to the gcs in the io thread
don't start new logs in the io thread
don't validate logs while erasing
flush logs when stopping logging
account for page header when calculating logs sizes
don't return data when asked for more data than in the log
optimize locking and use separate semaphore to mediate ring buffer access
stop logging when the chip is full and send a notification
calculate logs sizes correctly even when they wrap
read log data correctly even when it wraps
add stats support to block logger
reset dropped when starting a new log
fail logging when the chip is full
refactor critical bufferspace checks
increase messagewriter budget to 250us and to 300us for FMT
libraries/AP_Logger/AP_Logger_Backend.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Backend.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
if (log_write_started || df_Read_PageAdr != page) {
StartRead(page);

// Sanity check we haven't been asked for an offset beyond the end of the log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add a flow-of-control internal error for this sort of thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but internal errors never clear and these can be transient. So I dunno....

libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
libraries/AP_Logger/AP_Logger_Block.cpp Show resolved Hide resolved
@peterbarker peterbarker merged commit 6ebc762 into ArduPilot:master Sep 5, 2020
@peterbarker
Copy link
Contributor

Merged, thanks!

Appreciate your patience on this one.

Thanks also to all the testers of this PR!

I added some diagnostics for the disarm case I flagged - it's 0.2ms, quite acceptable.

@andyp1per would be nice to get a follow-up PR for the minor stuff I've tagged here.

@andyp1per andyp1per deleted the pr-flash-fix3 branch September 5, 2020 08:00
andyp1per added a commit to andyp1per/ardupilot that referenced this pull request Sep 5, 2020
peterbarker pushed a commit that referenced this pull request Sep 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants