Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self documenting of log data #5864

Closed
magicrub opened this issue Mar 14, 2017 · 18 comments
Closed

Self documenting of log data #5864

magicrub opened this issue Mar 14, 2017 · 18 comments

Comments

@magicrub
Copy link
Contributor

We need a self maintaining mechanism for keeping the log data documented. Something like the params scheme where both the wiki and a GCS is up to date by developer comments.

@Georacer
Copy link
Contributor

Are you referring to the MAVLink dialect or the Dataflash logs?

@magicrub
Copy link
Contributor Author

I was referring to DataFlash, specifically stuff like this which are sprinkled throughout the codebase:
https://github.com/ArduPilot/ardupilot/blob/master/libraries/DataFlash/LogStructure.h#L780

@magicrub magicrub added the Docs label Mar 14, 2017
@Georacer
Copy link
Contributor

For background and reference, as @OXINARF told me here that Dataflash, logs aren't exactly stable.
So, yes, it makes sense to have dynamically generated documentation, to avoid double work; one for coding and one for documenting.

Comparing with the parameter documentation, each parameter is declared in the code along with its description and default values. That means that the author must provide that description, it's not parsed automagically from the code, which is reasonable.

For the same reason, it makes sense that each DF log message type should be an object with accompanying information. Tthe author of each DF log message should document it with supplemental information, at minimum with a description and units.

It's not clear to me right now how the parameter description code is parsed to generate the documentation, but I know where to find that information (I think).

What is important IMO, because old logs from previous versions are often re-visited, is to be able to generate documentation from older versions. This could be problematic for previous versions, but from now on it shouldn't be a problem. The missing old information perhaps could be added by hand somehow.

@hunt0r
Copy link

hunt0r commented Mar 14, 2017

A small comment: In addition to the documentation for usability (units, log message field explanations, etc. I've actually started a document to this effect for my own learning) we could begin with dev-level documentation of anything that's not-open-to-change about the logs.

For example: At present the first FMT message defines the "FMT" message type, including it's length. But there's a "catch-22": How would a log-processor, starting "from scratch" know the length of the first FMT message, before it's read the contents of the message? (which may require knowing the read-length)
Idea 1) The FMT length is fixed, and never going to change. (I think it's 89 now.) So the log reader can always read the first message of a log at that length, which allows it to learn other lengths as it goes.
Idea 2) The first log message will always be a FMT message, defining the FMT message type. Thus a log reader finds the first length byte (currently 2-byte header + 1-byte msg type + 1-byte new-msg type implies the 5th byte is the first length in the log)

Also, the comments in the code suggest the header is subject to change and/or elimination... is this still intended? Or should the header be fixed as something that's not subject to change?

@Georacer
Copy link
Contributor

@hunt0r Actually, I don't think the message specification itself should be an issue for this specific ticket.

For comparison, check how new parameters are declared here, specifically in step 3.

Notice how (even though it is discouraged), all of the code defining and specifying any parameter can (and often does) change.
However, this change never needs to be transcribed manually, because there is automatic code which generates documentation based on the parameter metadata which exists embedded in the code.

Getting back to the case of DF messages, if each type was given metadata in a similar case, then automatic code generation could produce all of the format specification, without needing to actually parse the FMT messasges of the log.
Of course, extra human readable information should be also added, as discussed before (description, units etc).

@magicrub
Copy link
Contributor Author

The self describing bit of the DF should stay the way it is, that ensures it is interpreted and displayed correctly in a gui. However, the description text should not go into the log because it will add 10k of text every time it starts a new log. That withdraw be nice, but unnecessary. The GUI can just get it from the internet like it does with params.

@OXINARF
Copy link
Member

OXINARF commented Mar 15, 2017

There was already an issue for this (#5460). @magicrub please search first next time.
There is a proposal there from Don that is interesting - basically generate log code and documentation from an XML file, like MAVLink. A lot of work would be needed to implement that though.

@Georacer What you are proposing isn't very feasible because it would imply searching the entire codebase for log documentation. Parameters are different - we look at each vehicle's Parameters.cpp file and from there we get all the information needed, including additional files to be parsed.

@peterbarker
Copy link
Contributor

As opposed to MAVLink - which requires an external document to understand, and is a rather formal specification, dataflash log files by design intent are expected to be self-describing. This was my intent with the units branch - to extend the extent to which they are self-describing!

https://github.com/ArduPilot/ardupilot/compare/master...peterbarker:units?expand=1

That patch (apart from adding sanity checking) adds units and multipliers to the metadata the log carries. This information is really useful to log analysis tools as they can e.g. convert radians to degrees where appropriate. For that reason I think we should push ahead with that units patch.

It would also be wonderful to get textual descriptions of the fields into the logs. Sadly, from a practical standpoint, that's not possible on stm32-class flight controllers; we can't carry all of the extra text for that (as 2magicrub points out).

We could pre-process the code like we do for parameters - but pre-processing sucks. Or we could conditionally-compile and include the textual descriptions for platforms that can afford the extra text (Linux!) A compiled tool could then be run to generate an XML document to be used when the log does not include the extra text. The XML document would obviously suffer from the same versioning problem we have for parameter files (but we'll solve it at the same time we get around to solving the parameter one).

@OXINARF It's interesting you bring up Don's efforts there. We recently moved further away from that sort of system when we got Log_Write() - because setting all of the data structures up was too much like hard work for devs who wanted their DF message right now!

@Georacer
Copy link
Contributor

@OXINARF I might just be dense, but I don't see why parsing the codebase for DF msg definitions wouldn't be very feasible.
Yes, I understand that parameters are more localized in the codebase and can be parsed quasi-independently of the rest of the project, but how worse could the situation with DF msgs be?

Would it be a deal breaker if a tool needed, say, 5 minutes to parse the entire codebase? Or am I completely deluded?

@OXINARF
Copy link
Member

OXINARF commented Mar 16, 2017

dataflash log files by design intent are expected to be self-describing

I always thought of it being a self-described binary format, not necessarily having a description of the information. As you said, that looks impossible to do with a PX4 board. I don't oppose having units and multipliers, but I think that without further information it is quite useless - you won't graph a variable if you don't know what it is and units aren't going to change that.

It's interesting you bring up Don's efforts there. We recently moved further away from that sort of system when we got Log_Write() - because setting all of the data structures up was too much like hard work for devs who wanted their DF message right now!

And you know I love Log_Write! But I think it wouldn't be needed if you just wrote the message format in, for example, a XML document and then you would know you had a method with a specific name format ready to be called. At least for me, I love Log_Write because I don't have to write code in several places just to have write a log message - but two places would be acceptable.

Would it be a deal breaker if a tool needed, say, 5 minutes to parse the entire codebase? Or am I completely deluded?

If we want to enforce it, then it needs to run on Travis so that the build fails if the documentation is incorrect or absent. And 5 minutes per job in Travis is indeed a lot of time.

@Georacer
Copy link
Contributor

I don't oppose having units and multipliers, but I think that without further information it is quite useless - you won't graph a variable if you don't know what it is and units aren't going to change that.

+1

@amilcarlucas
Copy link
Contributor

I like the units PR. it is at least a step forward. It will however break my code, that converts DF logs into DCM ( https://www.etas.com/download-center-files/products_ASCET_Software_Products/TechNote_DCM_File_Formats.pdf ) but hey... that's worth it.
BTW @magicrub and @peterbarker the last two commits on ArduPilot/mavlink#39 are also doing the same stuff on Mavlink. There it is easier because it is centralized XML

@jaxxzer
Copy link
Member

jaxxzer commented May 30, 2018

In the past year or so.. Has there been any further effort, discussion or decisions on how to proceed with this issue?

@amilcarlucas
Copy link
Contributor

I think the easiest way to get this done is to add doxygen documentation blocks above each Log-Write function call.
Doxygen would then bundle all those scattered documentation blocks into a single .html webpage.

What do you guys think about it ? Does Doxygen already run at out server?

@peterbarker
Copy link
Contributor

I'm playing with documenting AP_Logger messages in a similar way to the way we document parameters in this branch:

#11765

@amilcarlucas whatever we do has to also produce a machine-readable document - probably XML. Do you think Doxygen would be a reasonable alternative given that requirement?

@lvale
Copy link
Member

lvale commented Jul 10, 2019

@amilcarlucas
Copy link
Contributor

yes, doxygen is a good generator.

@peterbarker
Copy link
Contributor

We now have this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants