Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for timesync from onboard companion computer #1278

Closed
wants to merge 7 commits into from

Conversation

mhkabir
Copy link
Member

@mhkabir mhkabir commented Aug 15, 2014

This is a part of building in support for timing and sync of messages coming from an onboard companion computer. While this could be done more elaborately, I think this should suffice.
I will add more support to keep track of computer's clock to align time, so that it is useful for logging, stamped vision messages, etc. Would it be better to add a uORB topic for time and then add another module for maintaining time all over the system, tracking clock of companion computer and adding offsets, etc.?

sdlog2 I think needs to be modified to use system time, instead of time directly from vehicle_gps_position topic.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 15, 2014

@LorenzMeier PR for mavros for this is on the way. How often should this message be sent? Just at system boot or over a period?

@LorenzMeier
Copy link
Member

Thanks! The clock you set is the one responsible for files and alike stuff. It would be unsafe to modify the main system clock though - because you can end up with negative quantities, which is a major safety concern.

So what we would need to do is a bit more elaborate: Set a telemetry time offset in the mavlink app (for all mavlink apps, so needs to be a static class member) and apply the offset for all messages that require Unix time. For messages in time_boot_ms we would need to store the offset on the mavros side.

Please also consider the case of rebooting either system (autopilot, onboard computer) and the required update. You might also want to look into time sync protocols & phase delay estimation / compensation.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 15, 2014

Great :) Will do the implementation as soon as possible. I'll look more into time sync protocols. Seems interesting : http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1200555&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1200555

@LorenzMeier
Copy link
Member

Ah yes, and there is GPS time and time zones, of course 8). @vooon was interested in getting the time for the onboard computer from the autopilot, because a lot of them don't have RTC standby batteries and the file system freaks out if it doesn't get a proper time.

So ideally you would set the time on the Linux computer based on GPS time + a timezone parameter and send telemetry time where the unix timestamp is asked for as unix time synchronised to GPS time. This would make thinks reasonably robust and accurate.

@LorenzMeier
Copy link
Member

On time sync: Keep it simple if possible. You also might want to post on the mailing list if somebody has done this before and has a recommendation for a simple protocol.

@LorenzMeier
Copy link
Member

@markusachtelik Would you have some time sync recommendations for us here? We did this all before in our old system, but I'd like to do it "properly" here and I'm wondering if you did it nicely on your autopilot bridge to Asctec systems.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 15, 2014

We also have to consider that the Linux computer could have time synced to NTP servers, which could be used instead of GPS time, incase GPS is unavailable. My in-progress system connects (with the Rocket M5 pair) up to a GCS which has full internet capablities, so this could be useful.

@vooon
Copy link
Contributor

vooon commented Aug 15, 2014

@mhkabir mavros alredy can send SYSTEM_TIME messages https://github.com/vooon/mavros/blob/master/mavros/src/plugins/sys_status.cpp#L355 .

I implement it for px4flow, but anybody can use it :).

@vooon
Copy link
Contributor

vooon commented Aug 15, 2014

And maybe time sync on OBC can be easier (because ntpd alredy has synchronization algos).

@LorenzMeier
Copy link
Member

@mhkabir Reboot is easy to detect: Each system keeps track of the boot time of the other party. If the boot time decreases, it must have rebooted.

@LorenzMeier
Copy link
Member

@mhkabir The current code does not actually sync the telemetry-related clocks, nor does it take care of various corner cases. The current functionality is limited to setting the file system clock correctly, so that log files have the right date. How did your research on sync end up? Do you have further questions you'd like to discuss?

@mhkabir
Copy link
Member Author

mhkabir commented Aug 19, 2014

@LorenzMeier I explored the code a bit. On synchronizing time, I think what @vooon proposes would be best. We use the time on the computer (set from GPS or Internet Time) as ntpd has synchronization algorithms built in (bit slow to converge though). So the computer has correct UNIX time.

The PX4 receives this time and uses it for non-trivial tasks like the filesystem and logging. The PX4 and computer both independently keep track of time_boot_ms of the other system. Timestamps in messages are time_boot_ms of the system it originated from. The other system simply adds or subtracts the time offset as necessary to get correct time synced to its own boot time.

Now the nitty gritty part of this is actually tracking the time and offsets on the PX4 side. I am not very sure what would the best way be for this to be implemented. What do you think?

@LorenzMeier
Copy link
Member

@mhkabir This sounds perfectly reasonable. Tracking the time offset could be done in the MAVLink app and could then publish the messages via uORB received by ROS with the offset applied. You can initially just set it once and then augment it using a sync algorithm.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 20, 2014

We could implement a simple time-request system for robust implementation. Using SYSTEM_TIME and something like SYSTEM_TIME_REQUEST would work well enough. The offset can be updated in the message callback on either side.

t0 is the client's timestamp of the request packet transmission,
t1 is the server's timestamp of the request packet reception,
t2 is the server's timestamp of the response packet transmission and
t3 is the client's timestamp of the response packet reception.

The offset is given by
((t1 - t0) + (t2 - t3 )) /2

That should work fairly well, I think.

@markusachtelik
Copy link

hi, sorry for the late comments:

this is how we did it:
https://github.com/ethz-asl/asctec_mav_framework/blob/master/asctec_hl_firmware/sdk.c#L641
https://github.com/ethz-asl/asctec_mav_framework/blob/master/asctec_hl_interface/src/hl_interface.cpp#L403

essentially like suggested by @mhkabir above. t1=t2 in our case and we smooth the offset quite a bit. Then, we correct the timestamp every few cycles by a small dt, making sure we do not have a negative dt between two control cycles. Setting this up was quite tricky: to much correction could make the time oscillate, and too few correction was not able to deal with the drifting clock skew of the microcontroller when we went outdoors in the cold ... also the time hard sync in l650 could be reconsidered.

all in all, it did it's job so far, but we're also thinking about how to improve it.

the other option would be having a counter on the microcontroller, poll it from the pc and then compute a function, mapping counter to pc time, as e.g. ticsync does it: http://www.robots.ox.ac.uk/~mobile/wikisite/pmwiki/pmwiki.php?n=Software.TICSync

@mhkabir
Copy link
Member Author

mhkabir commented Aug 20, 2014

@LorenzMeier. Smoothing the offset and the hard sync are points to consider. Any improvements you could suggest here?

@markusachtelik How bad has the clock skew been in your experience due to temperature changes?

I'm going ahead and pushing the simple sync to my branch after testing.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 20, 2014

@LorenzMeier I also realized something else. Why track the time offset on the PX4 side at all? We can almost do everything on the mavros side. Simply subtracting offset on mavros side should do it... ?( normally since computer boot time > PX4 boot time)

Complicated sync if required can be messed around with there as we have a large overhead.

@LorenzMeier
Copy link
Member

Because the quantities estimated on ROS need to be in sync - if you modify the time stamps in mavros you would need to hold off sending time stamped messages until time is synced. It might have further implications such as data association.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 20, 2014

    mavlink_system_time_t t;
    mavlink_msg_system_time_decode(msg, &t);

    dt = (hrt_absolute_time() - t.time_boot_ms) - time_offset ;

    if(companion_reboot){
        #ifndef CONFIG_RTC
        //Since we lack a hardware RTC, set the system time clock based on companion computer UNIX time (from GPS time or NTP servers)
        timespec ts;
        ts.tv_usec = t.time_unix_usec;
        clock_settime(CLOCK_REALTIME, &ts);
    #endif
    time_offset = hrt_absolute_time() - t.time_boot_ms;
    companion_reboot = false;
    }

    if(dt > 10e6) // 1s offset skew = reboot -> CHEKC THIS
    {
    warnx("Companion computer reboot");
    companion_reboot = true;
    }
    else
    {
    time_offset = (time_offset + (hrt_absolute_time() - t.time_boot_ms)/2; //add more?
    }

How does this look? @LorenzMeier

@LorenzMeier
Copy link
Member

@mhkabir The logic for the reboot check doesn't look correct it should be checked first, not only applied in the next run. Note that its technically not correct that Pixhawk has no RTC - it even has a standby battery. Its merely a question of enabling it. So the system time should only be set if its currently set to a date smaller than e.g. 1/1/2011.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 22, 2014

@LorenzMeier . So moving the reboot check to the top should get it applied in the same run. Or is something else wrong. I'm afraid I'm a bit confused here.

As far as I'm aware and at least on my Pixhawk, the standby battery space is unpopulated. I think this should be changed in the newer production batches to take advantage of the RTC capabilities? Also, is RTC enabled in NuttX?

What should be choose as the date? 1/1/2011 seems good as any...?

@mhkabir
Copy link
Member Author

mhkabir commented Aug 22, 2014

@LorenzMeier Added the stuff in. Please check.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 23, 2014

@LorenzMeier I think I've satisfactorily dealt with most of the corner cases.

  1. We set onboard epoch time from GPS or companion computer if it is <1/1/2011 (gps PR needed)
  2. If companion reboots, it might have invalid epoch time. So, send sync packet with proper epoch time if we have it.

A bit of mavros implementation is needed to close the loop and get this running.

@@ -159,6 +160,10 @@ class MavlinkReceiver
float _hil_local_alt0;
struct map_projection_reference_s _hil_local_proj_ref;

static uint64_t time_offset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably not be static, but per MAVLink instance. You can have two computers / one computer, one GCS talking to the system and we want to keep track of each their offset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix pushed

@mhkabir
Copy link
Member Author

mhkabir commented Aug 23, 2014

@LorenzMeier Okay now?

// make sure the FTP server is started
(void)MavlinkFTP::getServer();
companion_reboot = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put this into the initialiser list above the braces.

@mhkabir
Copy link
Member Author

mhkabir commented Aug 25, 2014

@LorenzMeier I think that should wrap it up, just need to do some modifications on mavros side to handle corner cases and we should be done.
Then its just testing :)

@vooon
Copy link
Contributor

vooon commented Aug 25, 2014

@mhkabir Note: I was forced to release 0.7.1 today (mavlink update).

@mhkabir
Copy link
Member Author

mhkabir commented Aug 26, 2014

@vooon OK. I will send you an PR when I'm done.

@mhkabir mhkabir mentioned this pull request Sep 4, 2014
@mhkabir
Copy link
Member Author

mhkabir commented Oct 8, 2014

@vooon, can you please post these code...?

@vooon
Copy link
Contributor

vooon commented Oct 8, 2014

@mhkabir Posted in #1380.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 8, 2014

@vooon @LorenzMeier I rebased off Vladimir's updates.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 8, 2014

@LorenzMeier Lets just get this clear, with the current state of mavros+ this PR, we get microsecond resolution on PX4 side and millisecond resolution in ROS.
ROS cannot get more resolution(for now) as ALL PX4 messages contain time_boot in ms only.

This is the best we can use as a base now , I think.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 8, 2014

@vooon The PX4 locks up within seconds on initiating sync. Probably a overflow or something in the code.
@LorenzMeier How can we debug this? Even though a very elaborate procedure is not needed, I'd still like to learn the proper way to do it.

@vooon
Copy link
Contributor

vooon commented Oct 8, 2014

Just built this branch.

Also i added diagnostic to sys_time, dt changes in range -1..1 ms, while mean -0.07±0.01 ms.

Results of ntpd_driver shm_driver:

$ ntpq -p      
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+31.131.249.26   89.175.22.41     2 u 1835 1024  372   63.313    6.946  15.561
+ns1.ooonet.ru   89.109.251.23    2 u  921 1024  377   63.773   -4.334   9.622
*mail.sonur.ru   .PPS.            1 u  850 1024  373   59.631   -3.017  11.839
+95.213.132.254  89.175.22.41     2 u  837 1024  373   47.226   -2.153 106.715
+golem.canonical 193.79.237.14    2 u   44 1024  377  101.706   -3.955  31.572
 SHM(2)          .ROS.           16 l    9   16  177    0.000  364.202   3.408
$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+31.131.249.26   89.175.22.41     2 u 1865 1024  372   63.313    6.946  15.561
+ns1.ooonet.ru   89.109.251.23    2 u  951 1024  377   63.773   -4.334   9.622
*mail.sonur.ru   .PPS.            1 u  880 1024  373   59.631   -3.017  11.839
+95.213.132.254  89.175.22.41     2 u  862 1024  373   47.226   -2.153 106.715
+golem.canonical 193.79.237.14    2 u   69 1024  377  101.706   -3.955  31.572
 SHM(2)          .ROS.           16 l    2   16  377    0.000  360.557   2.315

@vooon
Copy link
Contributor

vooon commented Oct 8, 2014

Oh, i get error: fcu don't respond to param and mission protocol. Only imu data (at 20 Hz).
mavconn shows fast io, but not from SYSTEM_TIME.

In PR #1380 i don't see that error.

@vooon
Copy link
Contributor

vooon commented Oct 8, 2014

Ohh, it's my fault. I forgot to set proper target id's.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 9, 2014

@LorenzMeier We have everything 'working' as of now. Now the concern is the continuous negative dt.
The dt mean is -0.133 ms or so mostly. This causes the offset to slowly decrease over time. I'm not sure why this happens, but it could be due to some form of NTP sync on the odroid, which is skewing the system clock. I'm not sure if this would affect ROS wall time though.

Ideas?

@mhkabir
Copy link
Member Author

mhkabir commented Oct 9, 2014

This is probably the best we can do without latency compensation for the transfer. We should implement what I wanted to do initially, in a 2-way sync packet.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 11, 2014

@LorenzMeier We need your inputs on this. We have basic sync working, albeit with some observations :

  1. The mean drift we get, changes with packet rates. E.g with 10Hz packets, we get a mean dt of around -0.07 ms
  2. The dt remains constantly negative, however little.
  3. Latency on the USB link affects our timing probably leading to bad sync.
  4. We might need to augment this with better sync algorithms instead of simply averaging the offset, perhaps?

@LorenzMeier
Copy link
Member

@mhkabir I think we need to test this with more boards and see if we get some variance in the dt. If we do its probably ok as is, if its a constant offset we need to check why this is induced. @vooon Any chance you could run a test on this?

@vooon
Copy link
Contributor

vooon commented Oct 30, 2014

@LorenzMeier last time when i tested mavros i got similar dt, but x10 to @mhkabir 's values (because i run it at 1 Hz).
I keep getting negative mean dt.

@vooon
Copy link
Contributor

vooon commented Oct 30, 2014

Also probably my dt averaging algo is bad.

@mhkabir
Copy link
Member Author

mhkabir commented Oct 31, 2014

I get a negative mean dt too. It's a basic problem I think.

We don't compensate for link latency, which all time sync algos do. This
causes jitter according to the latency on the link, which keeps changing.
This is what I've observed in testing.
On Oct 31, 2014 12:24 AM, "Lorenz Meier" notifications@github.com wrote:

@mhkabir https://github.com/mhkabir I think we need to test this with
more boards and see if we get some variance in the dt. If we do its
probably ok as is, if its a constant offset we need to check why this is
induced. @vooon https://github.com/vooon Any chance you could run a
test on this?


Reply to this email directly or view it on GitHub
#1278 (comment).

@LorenzMeier
Copy link
Member

@mhkabir If you look further up, we discussed the potential need for a proper sync algorithm. Have you considered the tips from @markusachtelik here?
#1278 (comment)

@mhkabir
Copy link
Member Author

mhkabir commented Nov 2, 2014

Yes, as I mentioned earlier, we need a proper sync message with two
uint64_t fields for that.

On Sun, Nov 2, 2014 at 3:34 PM, Lorenz Meier notifications@github.com
wrote:

@mhkabir https://github.com/mhkabir If you look further up, we
discussed the potential need for a proper sync algorithm. Have you
considered the tips from @markusachtelik
https://github.com/markusachtelik here?
#1278 (comment)
#1278 (comment)


Reply to this email directly or view it on GitHub
#1278 (comment).

@LorenzMeier
Copy link
Member

Could you propose a format? It would be trivial to add.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 2, 2014

Here you go :

Client | | Server
tc1 |--------| ts1
tc2 |--------| ts2

ts1=ts2; tc2 is measured on the arrival of the packet

MAVLINK_MSG_ID_TIMESYNC :
int64_t tc1
int64_t ts1

@LorenzMeier
Copy link
Member

@mhkabir If you merge master in this branch, you will have the new message at your disposal. Curios to see what it improves.

@mhkabir
Copy link
Member Author

mhkabir commented Nov 9, 2014

@LorenzMeier Thanks. I'll need to rework the code to use the new message.
Not much free time this week, but I'll see when I can squeeze in some time.

@mhkabir
Copy link
Member Author

mhkabir commented Dec 13, 2014

I'm closing this as it has rotted way too much . Open new one with new interface this week.

@mhkabir mhkabir closed this Dec 13, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants