
UAVCAN performance audit #6829

Merged - 7 commits merged into master on Mar 18, 2017

Conversation

pavel-kirienko
Member

I will be logging my efforts to improve the UAVCAN driver, especially in the sense of CPU load, in this pull request. Please do not merge it yet.

I spent considerable time on fruitless attempts to squeeze the driver into the FMUv2 build by switching optimization flags and removing some not-so-critical components. Unfortunately, during my experiments the system tended to either overflow ROM, fail in a confusing way during boot, or simply refuse to perform basic functions such as connecting to QGC. At some point I gave up (with much regret) and replaced FMUv2 with FMUv4.

According to reports from @LorenzMeier, a system consisting of a Zubax GNSS, a Sapog ESC, and an FMUv1 creates a 12% CPU load in the UAVCAN driver. I am observing a slightly different figure of 7.6% with my FMUv4 setup; perhaps the difference is explained by the higher clock rate of the newer board.

Looking at the CAN bus traffic, I noticed that the ESC command messages carry six values, whereas my setup requires only four. It turned out that the default CAN quadrotor mixer is configured to generate 6 outputs rather than 4, the extra 2 serving some obscure purpose I couldn't understand. To validate this, I defined a new mixer and made the CAN quadrotor vehicle config refer to it. As expected, this change reduced the number of values per command message from 6 to 4, which allowed the UAVCAN stack to send only one frame per message. This change dropped the CPU usage from 7.6% to 5.7%.

@LorenzMeier @AndreasAntener I strongly advise reviewing your VTOL mixer files to make sure that no unnecessary outputs are being generated; they slow things down.

I also added a check in the UAVCAN ESC output bridge that removes trailing zero setpoint values from the output. This should somewhat improve performance even when buggy mixing is afoot.

My next step is to let the system run for several hours under a sampling profiler in order to find bottlenecks. Has this ever been done before?

…e that are always zero. This allows the UAVCAN stack to always transfer only the minimum number of output values, avoiding redundant zeroes and the associated increase in bus load and CPU time
@pavel-kirienko
Member Author

Looking at preliminary profiling results (profiling is still in progress) I noticed that the system performs a lot of 64-bit floating point arithmetic computations. I understand that in some cases it is intentional and perfectly valid, since the resolution of 32-bit float is really limited. However, certain cases look like a smoking gun to me. Have a look at this instance:

screenshot_20170316_025130

Observe that a call stack rooted in the class Vector<float> (notice that the template is instantiated with the float type argument) ends up in the 64-bit square root computation routine. I checked the source and found this:

    Type norm() const {
        const Vector &a(*this);
        return Type(sqrt(a.dot(a)));
    }

It looks like the author of this code intended to use std::sqrt(), since the version from the C++ standard library is overloaded for float and double, unlike the version available from the C standard library. The apparent mistake causes the compiler to convert the float to double, compute the square root using emulated floating point, and then convert the result back to float. Note that the M4 core has hardware support for single-precision square root, so the difference in performance could be drastic.

One might argue that if such an error took place, the compiler would trigger the carefully enabled warning -Wdouble-promotion. Correct me if I'm wrong, but as I understand the documentation, this flag will only report implicit promotion to double, but not implicit conversion. The example above features an implicit conversion rather than promotion, so no warning should be expected. Note that -Wconversion, were it enabled, probably would not have helped here either, although for different reasons.

There are dozens of similar suspicious instances where 64-bit soft FP routines are invoked (not only sqrt but also add, asin, mul, etc.) from where they probably shouldn't be. Have a look at this preliminary interactive SVG to get an idea: pmpn-flamegraph.tar.gz (tomorrow I will upload a higher resolution file - sampling is still in progress and it is slow).

Paging @davids5.

@davids5
Member

davids5 commented Mar 16, 2017

@jgoppert

@pavel-kirienko has done some really good sleuthing herein. Can we look at avoiding the conversions and emulated FP overhead?

@jgoppert
Member

Good stuff. Happy to make changes to matrix. Can you submit a PR if you get it pinned down?

@AndreasAntener
Member

@pavel-kirienko I've changed the startup on our side so the CAN mixer only ever receives the multirotor component. So we should not have any overhead there. I'll check with @eyeam3 how many values the bus transports in our case.

@eyeam3
Contributor

eyeam3 commented Mar 16, 2017

We have an octocopter setup, so the bus transports eight values.

@LorenzMeier
Member

@pavel-kirienko Could you please fix the routines in matrix and send a PR?

@pavel-kirienko
Member Author

@LorenzMeier Already doing so.

@pavel-kirienko
Member Author

Here's the high resolution output: pmpn-flamegraph.tar.gz

Here are some of the most common endpoint functions:

 17.6%    os_start
  6.7%    up_idle
  4.0%    spi_readword
  2.3%    memset
  2.2%    __aeabi_dadd
  1.9%    exception_common
  1.9%    __muldf3
  1.8%    <unknown symbol>
  1.6%    __ieee754_sqrtf
  1.6%    __ieee754_sqrt

In this list, the routines prefixed with a double underscore are software floating point implementations: __aeabi_dadd and __muldf3 perform 64-bit arithmetic, while __ieee754_sqrtf is the single-precision square root.

@LorenzMeier
Member

Way too much memset going on there. We might need to revisit our defensive programming strategies.

@pavel-kirienko
Member Author

Does anybody else find the 4% in spi_readword alarming? I checked the sources and found that the driver does a polling wait from an interrupt handler. I also made sure that the bus is clocked at the proper rate of 11 MHz (it is).

Could DMA, or at least IRQ-driven logic, be an option?

@pavel-kirienko
Member Author

A simple test shows that the system, indeed, spends about 5% of its run time doing nothing in the SPI polling loop (notice the duty cycle estimate in the bottom right corner):

ds1z_quickprint2

The high level is held while the SPI driver is waiting for completion of the current transfer.

@pavel-kirienko pavel-kirienko changed the title UAVCAN performance audit General performance audit Mar 16, 2017
@LorenzMeier
Member

LorenzMeier commented Mar 16, 2017

That's a design decision originally made for a single sensor and lower-rate reads. The reason this is not an immediate concern from a controls perspective is that if you're not running at 100% load, there is nothing you can do before that SPI update: you are waiting for a total system state update, and unless you have new accel and gyro data there is no actionable information.

No sensor data means no controller output, and with no controller output to execute on there is nothing high-priority to do. For the same reason, UART has RX DMA, so that this prioritisation does not lead to communication drops.

However, work is under way to replace this with SPI DMA in the longer term. That will free up CPU time for housekeeping tasks, but it will not improve flight performance on current setups, because the actionable information does not become available any earlier. The only way SPI DMA can add value on future-generation systems is by allowing higher oversampling of the data from new high-rate accels and gyros.

Please focus on the CPU load related aspects. These are the ones that delay executing actionable information.

@LorenzMeier LorenzMeier changed the title General performance audit UAVCAN performance audit Mar 16, 2017
@pavel-kirienko
Member Author

For the record: the best performance is achieved with -O2, which is already the default. -O3 adds about 8K of code with no noticeable performance gain; -Os shrinks the code by 2K and adds about 1% of CPU load.

@pavel-kirienko
Member Author

Fixing the Matrix library brought CPU usage down by about 10 percentage points (idle time rose from 28.4% to 38.6%).

Before:

Processes: 26 total, 4 running, 22 sleeping
CPU usage: 69.66% tasks, 1.96% sched, 28.39% idle
DMA Memory: 5120 total, 1024 used 1536 peak
Uptime: 1115.418s total, 319.128s idle

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE
   0 Idle Task                  319128 28.385   660/  748   0 (  0)  READY
   1 hpwork                      25149  2.244   784/ 1780 192 (192)  w:sig
   2 lpwork                       1067  0.072   644/ 1780  50 ( 50)  w:sig
   3 init                        10962  0.000  2096/ 2484 100 (100)  w:sem
 338 top                         29590  2.679  1264/ 1684 100 (100)  RUN
 100 gps                          3020  0.434   936/ 1372 220 (220)  w:sem
 103 dataman                       139  0.000   720/ 1180  90 ( 90)  w:sem
 142 sensors                     51157  4.561  1344/ 1980 250 (250)  w:sem
 144 commander                   25550  2.244  2832/ 3548 140 (140)  w:sig
 156 mavlink_if0                 93264  8.472  1688/ 2380 100 (100)  w:sig
 157 mavlink_rcv_if0                79  0.000   952/ 2100 175 (175)  w:sem
 166 mavlink_if1                 21091  1.882  1632/ 2420 100 (100)  w:sig
 168 mavlink_rcv_if1                75  0.000   952/ 2100 175 (175)  w:sem
 179 mavlink_if2                 15795  1.375  1632/ 2388 100 (100)  w:sig
 180 mavlink_rcv_if2                75  0.000   952/ 2100 175 (175)  w:sem
 201 uavcan                      92405  8.037  1848/ 2380 240 (240)  w:sem
 203 uavcan fw srv               14724  1.230  3064/ 5996 120 (120)  w:sem
 204 commander_low_prio             39  0.000   664/ 2996  50 ( 50)  w:sem
 216 frsky_telemetry                12  0.000   552/ 1020 200 (200)  w:sem
 237 mavlink_if3                 46224  4.199  1632/ 2388 100 (100)  w:sig
 239 mavlink_rcv_if3              2153  0.000   952/ 2100 175 (175)  w:sem
 247 sdlog2                        697  0.000  1696/ 3316 177 (177)  w:sig
 282 ekf2                       267381 24.185  5072/ 5780 250 (250)  w:sem
 287 mc_att_control              47537  4.272  1280/ 1676 250 (250)  w:sem
 298 mc_pos_control              40143  3.620  1016/ 1876 250 (250)  w:sem
 309 navigator                    2218  0.144  1040/ 1572 105 (105)  w:sem

After:

Processes: 26 total, 4 running, 22 sleeping
CPU usage: 59.24% tasks, 2.18% sched, 38.57% idle
DMA Memory: 5120 total, 1024 used 1536 peak
Uptime: 21.391s total, 9.537s idle

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE
   0 Idle Task                    9536 38.573   660/  748   0 (  0)  READY
   1 hpwork                        463  2.401   888/ 1780 192 (192)  w:sig
   2 lpwork                         17  0.072   644/ 1780  50 ( 50)  w:sig
   3 init                        14553  0.000  1996/ 2484 100 (100)  w:sem
 100 gps                            57  0.072   936/ 1372 220 (220)  w:sig
 103 dataman                         2  0.000   720/ 1180  90 ( 90)  w:sem
 142 sensors                       900  4.730  1344/ 1980 250 (250)  w:sem
 144 commander                     389  2.256  2840/ 3548 140 (140)  w:sig
 156 mavlink_if0                  1319  8.005  1664/ 2380 100 (100)  w:sig
 157 mavlink_rcv_if0                 1  0.000   952/ 2100 175 (175)  w:sem
 166 mavlink_if1                   334  1.819  1608/ 2420 100 (100)  w:sig
 168 mavlink_rcv_if1                 1  0.000   952/ 2100 175 (175)  w:sem
 179 mavlink_if2                   259  1.382  1608/ 2388 100 (100)  READY
 180 mavlink_rcv_if2                 1  0.000   952/ 2100 175 (175)  w:sem
 201 uavcan                       1327  8.078  1568/ 2380 240 (240)  w:sem
 203 uavcan fw srv                 413  1.382  3040/ 5996 120 (120)  w:sem
 215 frsky_telemetry                 0  0.000   552/ 1020 200 (200)  w:sem
 319 commander_low_prio              0  0.000   728/ 2996  50 ( 50)  w:sem
 234 mavlink_if3                   931  4.439  1600/ 2388 100 (100)  w:sig
 236 mavlink_rcv_if3                 1  0.000   952/ 2100 175 (175)  w:sem
 245 sdlog2                         11  0.000  1696/ 3316 177 (177)  w:sig
 278 ekf2                         2145 14.119  5072/ 5780 250 (250)  w:sem
 281 mc_att_control                594  3.930  1280/ 1676 250 (250)  w:sem
 286 mc_pos_control                359  3.420   712/ 1876 250 (250)  w:sem
 301 navigator                       7  0.000   960/ 1572 105 (105)  w:sem
 325 top                           445  3.129  1240/ 1684 100 (100)  RUN

@pavel-kirienko
Member Author

I am going to profile this again.

@pavel-kirienko
Member Author

N.B.: This is the second time a serious issue has been discovered by way of profiling. The first was two years ago, when we removed profiling instrumentation calls that were used for stack checking, which dropped CPU usage by 40%: #1660. I suspect the project could really benefit from adopting the practice of regular profiling.

@LorenzMeier
Member

The conclusion that stack checking was a performance issue is incorrect. We removed an early stage safety net because the system was stable - it was a design intent to trade CPU load for safety. We might very well re-add it on the F7.

In the case of Matrix lib it's a whole different story as it should not have promoted to double.

I agree on the benefit of regular profiling. Could you create a page in the PX4 dev guide for it?

@LorenzMeier
Member

And thanks for the optimization. We should bring Hans' CI back online and compare task CPU loads; any difference in regular load should flag or reject the PR.

@pavel-kirienko
Member Author

> The conclusion that stack checking was a performance issue is incorrect.

It was an issue in the sense that the actual performance hit (40%) was way above the expected penalty (20%). Profiling helped to discover that.

> I agree on the benefit of regular profiling. Could you create a page in the PX4 dev guide for it?

Yes, but I will have to push back on it now a bit.

@pavel-kirienko
Member Author

@davids5 The compiler seems to be underutilizing the FPU due to incorrect build configuration.

I noticed that the fixed firmware still spends about 2% of the run time in __ieee754_sqrtf. This is suspicious because a single-precision square root can be computed with a single instruction, VSQRT.F32. Here's an example taken from a different project that also runs on an ARM Cortex-M4F, where the GCC build is configured correctly:

           unfiltered_phi_ = std::sqrt(square((Uq - Iq * Rs) / w) + square(Ud / w + Iq * Lq));
 8024a70:	edd4 5a08 	vldr	s11, [r4, #32]
 8024a74:	ed97 7a00 	vldr	s14, [r7]
 8024a78:	ed94 6a0a 	vldr	s12, [r4, #40]	; 0x28
 8024a7c:	ee85 0a27 	vdiv.f32	s0, s10, s15
 8024a80:	eea6 7ae5 	vfms.f32	s14, s13, s11
 8024a84:	eea6 0a86 	vfma.f32	s0, s13, s12
 8024a88:	eec7 6a27 	vdiv.f32	s13, s14, s15
 8024a8c:	ee20 0a00 	vmul.f32	s0, s0, s0
 8024a90:	eea6 0aa6 	vfma.f32	s0, s13, s13
 8024a94:	eef1 7ac0 	vsqrt.f32	s15, s0 ; <-- Computing square root here
 8024a98:	eef4 7a67 	vcmp.f32	s15, s15
 8024a9c:	eef1 fa10 	vmrs	APSR_nzcv, fpscr
 8024aa0:	f040 80b8 	bne.w	8024c14 <_ZN3foc8motor_id15FluxLinkageTask15onNextPWMPeriodERKN5Eigen6MatrixIfLi2ELi1ELi0ELi2ELi1EEERKNS3_IfLi3ELi1ELi0ELi3ELi1EEEf+0x414>
 8024aa4:	f504 5304 	add.w	r3, r4, #8448	; 0x2100
 8024aa8:	edc3 7a01 	vstr	s15, [r3, #4]

I have disassembled the FMUv4 binary and found out that the compiler emits exactly zero vsqrt.f32 instructions, resorting to this instead for square root computation:

080fc84c <__ieee754_sqrtf>:
 80fc84c:	ee10 3a10 	vmov	r3, s0
 80fc850:	f023 4200 	bic.w	r2, r3, #2147483648	; 0x80000000
 80fc854:	f1b2 4fff 	cmp.w	r2, #2139095040	; 0x7f800000
 80fc858:	b470      	push	{r4, r5, r6}
 80fc85a:	d230      	bcs.n	80fc8be <__ieee754_sqrtf+0x72>
 80fc85c:	b36a      	cbz	r2, 80fc8ba <__ieee754_sqrtf+0x6e>
 80fc85e:	2b00      	cmp	r3, #0
 80fc860:	db3d      	blt.n	80fc8de <__ieee754_sqrtf+0x92>
 80fc862:	f5b2 0f00 	cmp.w	r2, #8388608	; 0x800000
 80fc866:	ea4f 50e3 	mov.w	r0, r3, asr #23
 80fc86a:	d32c      	bcc.n	80fc8c6 <__ieee754_sqrtf+0x7a>
 80fc86c:	f1a0 027f 	sub.w	r2, r0, #127	; 0x7f
 80fc870:	f3c3 0316 	ubfx	r3, r3, #0, #23
 80fc874:	07d1      	lsls	r1, r2, #31
 80fc876:	f443 0300 	orr.w	r3, r3, #8388608	; 0x800000
 80fc87a:	bf48      	it	mi
 80fc87c:	005b      	lslmi	r3, r3, #1
 80fc87e:	2400      	movs	r4, #0
 80fc880:	1056      	asrs	r6, r2, #1
 80fc882:	005b      	lsls	r3, r3, #1
 80fc884:	4625      	mov	r5, r4
 80fc886:	2119      	movs	r1, #25
 80fc888:	f04f 7280 	mov.w	r2, #16777216	; 0x1000000
 80fc88c:	18a8      	adds	r0, r5, r2
 80fc88e:	4298      	cmp	r0, r3
 80fc890:	dc02      	bgt.n	80fc898 <__ieee754_sqrtf+0x4c>
 80fc892:	1a1b      	subs	r3, r3, r0
 80fc894:	1885      	adds	r5, r0, r2
 80fc896:	4414      	add	r4, r2
 80fc898:	3901      	subs	r1, #1
 80fc89a:	ea4f 0343 	mov.w	r3, r3, lsl #1
 80fc89e:	ea4f 0252 	mov.w	r2, r2, lsr #1
 80fc8a2:	d1f3      	bne.n	80fc88c <__ieee754_sqrtf+0x40>
 80fc8a4:	b113      	cbz	r3, 80fc8ac <__ieee754_sqrtf+0x60>
 80fc8a6:	f004 0301 	and.w	r3, r4, #1
 80fc8aa:	441c      	add	r4, r3
 80fc8ac:	1064      	asrs	r4, r4, #1
 80fc8ae:	f104 547c 	add.w	r4, r4, #1056964608	; 0x3f000000
 80fc8b2:	eb04 53c6 	add.w	r3, r4, r6, lsl #23
 80fc8b6:	ee00 3a10 	vmov	s0, r3
 80fc8ba:	bc70      	pop	{r4, r5, r6}
 80fc8bc:	4770      	bx	lr
 80fc8be:	eea0 0a00 	vfma.f32	s0, s0, s0
 80fc8c2:	bc70      	pop	{r4, r5, r6}
 80fc8c4:	4770      	bx	lr
 80fc8c6:	f413 0200 	ands.w	r2, r3, #8388608	; 0x800000
 80fc8ca:	d001      	beq.n	80fc8d0 <__ieee754_sqrtf+0x84>
 80fc8cc:	e00c      	b.n	80fc8e8 <__ieee754_sqrtf+0x9c>
 80fc8ce:	460a      	mov	r2, r1
 80fc8d0:	005b      	lsls	r3, r3, #1
 80fc8d2:	021c      	lsls	r4, r3, #8
 80fc8d4:	f102 0101 	add.w	r1, r2, #1
 80fc8d8:	d5f9      	bpl.n	80fc8ce <__ieee754_sqrtf+0x82>
 80fc8da:	1a80      	subs	r0, r0, r2
 80fc8dc:	e7c6      	b.n	80fc86c <__ieee754_sqrtf+0x20>
 80fc8de:	ee70 7a40 	vsub.f32	s15, s0, s0
 80fc8e2:	ee87 0aa7 	vdiv.f32	s0, s15, s15
 80fc8e6:	e7e8      	b.n	80fc8ba <__ieee754_sqrtf+0x6e>
 80fc8e8:	f04f 32ff 	mov.w	r2, #4294967295
 80fc8ec:	e7f5      	b.n	80fc8da <__ieee754_sqrtf+0x8e>
 80fc8ee:	bf00      	nop

Should I move this into a separate issue?

@pavel-kirienko
Member Author

I will sum up my findings here, beyond what has already been said above.

Poll is very slow

About half of the run time of the UAVCAN driver is spent in the poll() system call. This issue doesn't hit most other poll-based modules as hard because they typically run at lower update rates. Besides UAVCAN, optimizing poll() will also benefit other components, especially sensors, attitude control, position control, and all other modules running at high update rates.

The poll system call must be optimized; I will report this as a separate issue.

screenshot_20170317_160104

Performance counters are slow

Many performance counters in the system are not worth the overhead they introduce. I removed them from the UAVCAN driver, which lowered the CPU usage by 0.7%. I would advise either reviewing their implementation or simply removing them from all high-frequency modules. My estimate is that this change alone would drop the CPU usage by up to 5%.

MAVLink

About 1% of the total run time is spent in this mostly useless check in MavlinkStream::update(const hrt_abstime):

	if (dt > (interval - (_mavlink->get_main_loop_delay() / 10) * 4)) {

The performance impact, visible here on the right, is significant:

screenshot_20170317_172720

I wouldn't venture to fix it myself, so I propose to register this as a separate issue.

FPU is misbehaving

See the previous post. I propose to move this issue into a separate ticket, too.

UAVCAN stack

The UAVCAN stack itself does not have any detectable performance issues.

@pavel-kirienko
Member Author

New report; note that it was collected before perfcounters were removed from the UAVCAN module: pmpn-flamegraph.tar.gz

@jgoppert
Member

Just talked with @davids5 about this. @pavel-kirienko, can you update matrix in this PR so we can check that it passes and make sure the build size looks good?

@pavel-kirienko
Member Author

@jgoppert it is already updated here

@dagar
Member

dagar commented Mar 17, 2017

@pavel-kirienko we could look at enabling this clang-tidy check in the builds.
https://clang.llvm.org/extra/clang-tidy/checks/performance-type-promotion-in-math-fn.html

@pavel-kirienko
Member Author

@dagar That would be useful.

@jgoppert
Member

I don't see a significant change in build size, so I will merge the changes to matrix.

build size of master

Building sizes
   text	   data	    bss	    dec	    hex	filename
 891596	   5228	  15208	 912032	  deaa0	build_aerofc-v1_default/src/firmware/nuttx/firmware_nuttx
1317136	   5912	  20412	1343460	 147fe4	build_auav-x21_default/src/firmware/nuttx/firmware_nuttx
 787668	   3816	   9116	 800600	  c3758	build_crazyflie_default/src/firmware/nuttx/firmware_nuttx
1216616	   5844	  20636	1243096	 12f7d8	build_mindpx-v2_default/src/firmware/nuttx/firmware_nuttx
 997252	   4952	  10084	1012288	  f7240	build_px4fmu-v1_default/src/firmware/nuttx/firmware_nuttx
 967696	   5884	  18340	 991920	  f22b0	build_px4fmu-v2_default/src/firmware/nuttx/firmware_nuttx
 907220	   5884	  18340	 931444	  e3674	build_px4fmu-v2_lpe/src/firmware/nuttx/firmware_nuttx
1442236	   6140	  20724	1469100	 166aac	build_px4fmu-v3_default/src/firmware/nuttx/firmware_nuttx
1350776	   5944	  23820	1380540	 1510bc	build_px4fmu-v4_default/src/firmware/nuttx/firmware_nuttx
 774912	   5332	  15568	 795812	  c24a4	build_tap-v1_default/src/firmware/nuttx/firmware_nuttx

build size of this branch:

Building sizes
   text	   data	    bss	    dec	    hex	filename
 891356	   5228	  15208	 911792	  de9b0	build_aerofc-v1_default/src/firmware/nuttx/firmware_nuttx
1316616	   5912	  20412	1342940	 147ddc	build_auav-x21_default/src/firmware/nuttx/firmware_nuttx
 787316	   3816	   9116	 800248	  c35f8	build_crazyflie_default/src/firmware/nuttx/firmware_nuttx
1216088	   5844	  20636	1242568	 12f5c8	build_mindpx-v2_default/src/firmware/nuttx/firmware_nuttx
 996756	   4952	  10084	1011792	  f7050	build_px4fmu-v1_default/src/firmware/nuttx/firmware_nuttx
 968832	   5884	  18340	 993056	  f2720	build_px4fmu-v2_default/src/firmware/nuttx/firmware_nuttx
 907276	   5884	  18340	 931500	  e36ac	build_px4fmu-v2_lpe/src/firmware/nuttx/firmware_nuttx
1441580	   6140	  20724	1468444	 16681c	build_px4fmu-v3_default/src/firmware/nuttx/firmware_nuttx
1350128	   5944	  23820	1379892	 150e34	build_px4fmu-v4_default/src/firmware/nuttx/firmware_nuttx
 774896	   5332	  15568	 795796	  c2494	build_tap-v1_default/src/firmware/nuttx/firmware_nuttx

@davids5
Member

davids5 commented Mar 17, 2017

@jgoppert - Thank you - at least it went in the right direction.

@pavel-kirienko
Member Author

This is ready for merge.

@davids5 davids5 merged commit 21e04c9 into master Mar 18, 2017
davids5 pushed a commit that referenced this pull request Mar 18, 2017
@davids5
Member

davids5 commented Mar 18, 2017

@pavel-kirienko

This is ready for merge.

Apparently not yet.

Please have a look at the submodule commit. See Revert "UAVCAN performance audit" #6846

davids5 pushed a commit that referenced this pull request Mar 18, 2017
* UAVCAN ESC output: removing ESC output channels from published message that are always zero. This allows the UAVCAN stack to always transfer only the minimum number of output values, avoiding redundant zeroes and the associated increase in bus load and CPU time

* Added a separate mixer file for CAN quadrotor

* Sampling profiler improvements

* PMSP: Output more endpoints

* Matrix update

* libc usage workaround

* Removed UAVCAN perfcounters

* Matrix submodule update
@jgoppert
Member

This probably happened because I merged the PR on matrix and deleted the old branch; he just needs to update the matrix submodule again.

@pavel-kirienko
Member Author

Yes, it's fixed already.


7 participants