EKF: replacement of autocode with sympy generated output #868
Conversation
Very cool. I am super happy that this is finally happening. I appreciate that you paid attention to const correctness and used const references to grab the state's values.
I would like to raise a discussion about the size of the observation Jacobian container. It is not handled consistently throughout this PR: at some points it is a container of size 24 with a lot of entries being 0, at others it is a container of size << 24 containing only the non-zero elements. I am not sure how to trade off efficiency and readability here. Maybe @jkflying can help us.
Anyway, it would be great if we could decide on a single approach, as that will help us simplify the sparse computation of KHP into a single function.
Additionally, would it be possible to swap in the autogenerated code in multiple steps/PRs? I think this would make it much simpler to review, and to rebase on any change in master.
Write comparison test programs for the rest of the observation methods.
SITL and/or flight testing that exercises all observation methods. I can do the MC and FW testing, which will exercise most, but am unsure how to exercise the external vision/VIO fusion methods.
I am happy to help here. Can you be a bit more specific about the nature of these comparison test programs? Do you want to compare state outputs similar to the "change output predictor", or write unit tests or general SITL tests?
EKF/airspeed_fusion.cpp
Outdated
    if (update_wind_only) {
        // If we are getting aiding from other sources, then don't allow the airspeed measurements to affect the non-windspeed states
        for (unsigned row = 0; row <= 21; row++) {
As Kfusion is a matrix::Vector, it will already be initialized with zeros. No need to set it to zero again.
Will fix
    for (uint8_t i = 0; i < _k_num_states; i++) { H_TAS[i] = 0.0f; }
    // Intermediate variables
    const float HK0 = vn - vwn;
The naming HK* for the intermediate variables could be a bit confusing, as there are also variables with the names KHP and KH.
I don't see how it makes it more confusing. They are clearly intermediate variables.
There is indeed a comment saying that these are intermediate variables. Nevertheless, if someone sees HK0 they might think it is the 0th element of the result of an operation between two vector/matrix quantities H and K, and that is also how I perceived it at first. The former t0, t1, ... notation was, at least for me, clearer at first glance. I understand that it is probably not as safe when copying generated code output. Maybe there is a good option in between, something like t_hk_0 or tmp_hk_0.
EKF/covariance.cpp
Outdated
    SPP[8] = 2*q0*q3 + 2*q1*q2;
    SPP[9] = 2*q0*q2 + 2*q1*q3;
    SPP[10] = SF[16];
    const float PS0 = powf(q1, 2);
Should this powf be replaced with sq?
The compiler should optimise out powf(x, 2), and doing a find-and-replace on it has risks.
    SPP[9] = 2*q0*q2 + 2*q1*q3;
    SPP[10] = SF[16];
    const float PS0 = powf(q1, 2);
    const float PS1 = 0.25F*daxVar;
In most situations we used a lowercase f for specifying the type of the literal. Is there a specific reason why a capital F is used here? It would be nice to have consistency.
Yes, but that's what the Python code generator outputs.
    const float PS45 = P(0,13) - P(1,13)*PS11 + P(10,13)*PS6 + P(11,13)*PS7 + P(12,13)*PS9 - P(2,13)*PS12 - P(3,13)*PS13;
    const float PS46 = PS37 + PS38;
    const float PS47 = P(0,15) - P(1,15)*PS11 + P(10,15)*PS6 + P(11,15)*PS7 + P(12,15)*PS9 - P(2,15)*PS12 - P(3,15)*PS13;
    const float PS48 = 2*PS47;
Would it make a difference using 2.f instead of 2?
The difference testing says no.
Cool. I guess the test checks the numerical output. What about computational speed? I guess the compiler is forcing a float multiplication.
    const float HK29 = -HK12*P(1,23) + HK12*P(1,5) - HK13*P(1,6) + HK14*P(1,1) + HK15*P(0,1) - HK16*P(1,2) + HK17*P(1,3) - HK7*P(1,22) + HK7*P(1,4);
    const float HK30 = -HK12*P(2,23) + HK12*P(2,5) - HK13*P(2,6) + HK14*P(1,2) + HK15*P(0,2) - HK16*P(2,2) + HK17*P(2,3) - HK7*P(2,22) + HK7*P(2,4);
    const float HK31 = -HK12*P(3,23) + HK12*P(3,5) - HK13*P(3,6) + HK14*P(1,3) + HK15*P(0,3) - HK16*P(2,3) + HK17*P(3,3) - HK7*P(3,22) + HK7*P(3,4);
    float HK32;
No need to declare the variable this early; it can be declared const on initialization.
I don't see how that is possible. It needs to be declared in the larger scope and only updated if the inverse is numerically OK.
    const float HK29 = 4/powf(HK17, 2);
    const float HK30 = -HK16*P(0,2) - HK24*P(1,2) - HK25*P(2,2) + HK26*P(2,3);
    const float HK31 = -HK16*P(0,3) - HK24*P(1,3) - HK25*P(2,3) + HK26*P(3,3);
    float HK32;
No need to declare the variable this early; it can be declared const on initialization.
I don't see how that is possible. It needs to be declared in the larger scope and only updated if the inverse is numerically OK.
As we directly return inside the if statement, we could declare the variable afterwards:

    float HK32;
    if (bla) {
        HK32 = ...;
        status_ = A;
    } else {
        // Do stuff not related to HK32
        return;
    }

could be

    if (!bla) {
        // Do stuff not related to HK32
        return;
    }
    const float HK32 = ...;
    status_ = A;
        }
    } else {
        for (uint8_t i = 0; i < 16; i++) {
            Kfusion(i) = 0.0f;
No need to set 0 to 0. Kfusion is initialized to 0.
Will fix
    Vector24f gain;
    // calculate observation Jacobians and Kalman gains
    if (obs_index == 0) {
        // Observation Jacobians - axis 0
Very nice, I really like that we can now compute K and H after computing the innovation variance of both axes. This is how it has been done everywhere else.
EKF/sideslip_fusion.cpp
Outdated
        }
    } else {
        for (int i = 16; i <= 21; i++) {
            Kfusion(i) = 0.0f;
No need to set 0 to 0.
Will fix
The flash savings here are looking great. What this likely means is that we'll be able to comfortably afford to increase the optimization level. For reference, the entire estimator (PX4 ekf2 frontend + ecl/EKF backend) is about 140 KB of flash on px4_fmu-v5.
Force-pushed from dba97cc to a93349c.
I've added test programs to compare the before and after autocode and rebased.
@kamilritz I did some simple test programs for the covariance prediction and observation Jacobians and Kalman gains that compare them before and after. Differences are within what is expected for single precision calculations.
This is a pretty big PR, and I'm not able to pull out the patterns in the time I have. However, it sounds like the 'gather-scatter' pattern would work well here, allowing you to work with small dense matrices rather than large sparse matrices. How this works is you create a permutation matrix which selects which rows/cols you keep (and can reorder them too, if you want). So rather than having a large matrix Q (dimensions mxm), we have a smaller D (nxn) with the non-zero values, and a permutation P (mxn) with a single 1 per column, such that PDP' = Q. We would have to add support for permutation matrix multiplications to Matrix, which would enable this kind of design work. To be efficient, and because we don't have the same compiler type optimization …
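As a minimal sketch of this gather-scatter pattern in C++ (the state indices, sizes and function names here are illustrative assumptions, not code from this PR):

    #include <cstddef>

    // Illustrative sizes: 24 states, of which only 4 take part in an update.
    constexpr size_t M = 24;
    constexpr size_t N = 4;

    // Index list playing the role of the permutation matrix P: column j of P
    // has its single 1 at row idx[j], so D = P' * Q * P is just a gather of
    // the selected rows/columns, and Q = P * D * P' is the matching scatter.
    constexpr size_t idx[N] = {0, 1, 4, 22};

    // Gather: D(i,j) = Q(idx[i], idx[j])
    void gather(const float Q[M][M], float D[N][N])
    {
        for (size_t i = 0; i < N; i++) {
            for (size_t j = 0; j < N; j++) {
                D[i][j] = Q[idx[i]][idx[j]];
            }
        }
    }

    // Scatter: Q(idx[i], idx[j]) = D(i,j), writing back only touched entries
    void scatter(const float D[N][N], float Q[M][M])
    {
        for (size_t i = 0; i < N; i++) {
            for (size_t j = 0; j < N; j++) {
                Q[idx[i]][idx[j]] = D[i][j];
            }
        }
    }

All the dense linear algebra then runs on the small D, and only the selected entries of the big Q are read or written.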
@jkflying That is a cool concept. I am not sure we need to go that far. The observation Jacobian H is almost the only consistently sparse vector/matrix quantity. I think it would be enough to have a sparse KHP computation that takes advantage of knowing which are the non-zero elements of H. The question is should we store H as a …
Yes, my usage of this pattern was in a very different situation, with matrices of 5000+ x 5000+, but with lots of sparsity, that could be reduced down to a ~100x100 dense matrix with a permutation matrix. Particularly on larger processors (and GPUs) there is a lot of efficiency gained (~20x) by having your computations operate on adjacent elements instead of being spread over a matrix. I'm not sure how much benefit it will have here, considering the small sizes of our vectors and that we don't have lots of tiered caches for memory (i.e. L1/L2/L3/RAM/swap). I don't see the (mathematical) need to initialize the other elements of the matrix to 0, since we can ignore them if we know which indices we need. The initialization is just because it is in a Vector, right? You could also make a 'reverse index' which maps indices in the sparse matrix to the dense matrix. Then the autogenerated code could use the 24-based indexing, which gets translated to the dense indexing. But this might be going too far, and with the small amount of computation we're doing I don't think it would be worth the benefit.
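For the narrower case raised above, a sketch of a sparse KHP computation that exploits the known non-zero entries of H (the index list and dimensions are illustrative assumptions, not code from this PR):

    #include <cstddef>

    constexpr size_t kNumStates = 24;

    // Illustrative: suppose only these entries of H are non-zero for a given
    // observation type (e.g. an airspeed-like observation touching velocity
    // and wind states).
    constexpr size_t h_idx[] = {4, 5, 6, 22, 23};
    constexpr size_t n_h = sizeof(h_idx) / sizeof(h_idx[0]);

    // KHP = K * (H * P): since H is a sparse row vector, H * P only needs the
    // rows of P matching the non-zero entries of H, and the outer product
    // with the gain column K then fills in KHP.
    void computeKHP(const float K[kNumStates], const float H[kNumStates],
                    const float P[kNumStates][kNumStates],
                    float KHP[kNumStates][kNumStates])
    {
        float HP[kNumStates]; // row vector H * P

        for (size_t j = 0; j < kNumStates; j++) {
            HP[j] = 0.0f;

            for (size_t k = 0; k < n_h; k++) {
                HP[j] += H[h_idx[k]] * P[h_idx[k]][j];
            }
        }

        for (size_t i = 0; i < kNumStates; i++) {
            for (size_t j = 0; j < kNumStates; j++) {
                KHP[i][j] = K[i] * HP[j];
            }
        }
    }

This keeps the inner loop at the number of non-zero Jacobian entries instead of 24, without changing how H is stored.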
I've added a script file to run the test programs.
Also update output formatting
Add shell script to run tests. Update tests with Pass/Fail indicator.
Force-pushed from 757de17 to 7d74568.
With respect to the use of the powf(x, y) function: although the compiler appears to be optimising it to x*x, this leaves the code at the discretion of the compiler, which is not ideal. At yesterday's estimation call we discussed the options. One suggestion was to define an inlined ecl::powf(x, y) function to be used for integer values of y. I have implemented that in 0a4f5cc. The other option is to modify the python script to do a find-and-replace of all powf(x, 2) instances with sq(x). The total number of instances of powf(x, y) is not large, so an editor-guided manual replacement is possible, but not ideal. @dagar what is your view on relying on the compiler optimisations for this function? Can compilers be relied on to interpret powf(x, 2) as x*x? I don't have access to the ISO/IEC 9899:2011 document.
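The implementation referenced above lives in 0a4f5cc; as a minimal sketch of the idea, assuming the goal is an explicit integer-exponent power that rejects non-integer exponents at compile time (names and details here are illustrative):

    namespace ecl {

    // Illustrative sketch: explicit multiply chain for a non-negative integer
    // exponent, so powf(x, 2) is x*x by construction rather than relying on
    // the compiler to optimise the libm call.
    inline float powf(float x, int exponent)
    {
        float result = 1.0f;

        for (int i = 0; i < exponent; i++) {
            result *= x;
        }

        return result;
    }

    // Deleted overload: a call with a floating-point exponent such as
    // powf(x, -1.0f / 2.0f) becomes a compile error instead of silently
    // converting the exponent to int.
    float powf(float x, float exponent) = delete;

    } // namespace ecl

The deleted overload is what would make a blanket find-and-replace safe: any instance with a non-integer exponent fails to compile and can be fixed by hand.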
GCC at least has been able to optimize powf(x, 2) into x*x for a long time. Also, maybe I missed it, but the generated files don't seem to use ecl::powf yet; you could run

    sed 's/\(ecl::\)\{0,1\}powf/ecl::powf/' generated/*.cpp

to make it happen with all files automatically.
Is the concern performance or correctness?
@dagar The number of usages is not large, so the concern is firstly correctness and secondly performance. Edit: What I don't want happening is a future change in compiler producing a slightly different numerical result.
Also improve protection against badly conditioned calculation
@jkflying There is one place in airspeed fusion where the code generator used powf(x, -1.0F/2.0F), but that can be reworked by hand. As you point out, ecl::powf will generate a compiler error for instances where non-integer exponents are used, so your proposal for a global find-and-replace should work OK and we can handle the odd exception manually.
The feedback to date is that the overall approach looks OK, so I will now break this down into a number of smaller PRs, starting with the covariance prediction and a separate PR for each fusion type.
@priseborough Regarding the observation Jacobian H, it would be great if all of them could be of type …
Doesn't work in all instances because sometimes we only need a few entries and optimise accordingly.
@kamilritz There are sparse matrix techniques that could be applied, but we would need to provide an occupancy mask. This would end up trading memory use for code commonality. Let's have the discussion when the PR for the first observation method goes in. If you are able to look at the options for doing this efficiently, I will in the meantime prepare the covariance prediction PR so we don't have to make a decision right now.
It's not an unreasonable concern, although in this particular case I really don't think it's going to be an issue. https://godbolt.org/z/7d1q87 - with any optimization at all (-O1) you can see powf(x, 2) reduced to a single multiply. In general for this kind of thing it's another reason why I want to make sure we're continuously updating PX4/Firmware with each PX4/ecl change. On the PX4/Firmware side we can fairly easily monitor changes in execution time, memory usage, and flash usage over time across a range of boards. We lose a lot of detail when each PX4/ecl bump is large.
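A minimal pair of functions one could paste into Compiler Explorer to check this (illustrative test functions, not from the PR); with GCC at -O1 both should compile to the same single multiply:

    #include <cmath>

    // powf with a constant integer exponent of 2 ...
    float square_via_powf(float x)
    {
        return powf(x, 2.0f);
    }

    // ... should produce the same code as the explicit multiply.
    float square_via_mul(float x)
    {
        return x * x;
    }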
This is a work in progress. Derivations and conversions to C++ code are complete; however, test benches and SITL/flight testing require further work.
This is built on proof of concept work by @RomanBapst - see https://github.com/RomanBapst/ecl/tree/ekf_python
Test comparison programs have been written that compare the output from the sympy generated code to the legacy code in master. The largest fractional difference, defined as (new_value - old_value)/old_value, is found and printed. For example, the covariance prediction gives:
max_diff_fraction = 2.074566e-06 , old = -1.490614e+17 , new = -1.490610e+17 , location index = 4,7
Comparing the observation Jacobians for the various yaw fusion methods gives:
321 option A max_diff_fraction = 1.322706e-07 , old = 9.012533e-01 , new = 9.012532e-01 , location index = 3
321 option B max_diff_fraction = 3.555763e-07 , old = 1.005770e+00 , new = 1.005769e+00 , location index = 2
312 option A max_diff_fraction = 1.696653e-07 , old = -2.810457e+00 , new = -2.810457e+00 , location index = 2
312 option B max_diff_fraction = 5.844245e-07 , old = -4.589488e-01 , new = -4.589485e-01 , location index = 1
Differences so far have been in the last significant figure.
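A sketch of the comparison loop such a test program might use (array names, sizes and the output format are illustrative assumptions):

    #include <cmath>
    #include <cstdio>

    constexpr int N = 24;

    // Find and print the largest fractional difference
    // (new_value - old_value) / old_value between two covariance matrices.
    void compareMatrices(const float P_old[N][N], const float P_new[N][N])
    {
        float max_diff_fraction = 0.0f;
        int max_row = 0;
        int max_col = 0;

        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                if (fabsf(P_old[i][j]) > 0.0f) {
                    const float frac = fabsf((P_new[i][j] - P_old[i][j]) / P_old[i][j]);

                    if (frac > max_diff_fraction) {
                        max_diff_fraction = frac;
                        max_row = i;
                        max_col = j;
                    }
                }
            }
        }

        printf("max_diff_fraction = %e , old = %e , new = %e , location index = %d,%d\n",
               (double)max_diff_fraction,
               (double)P_old[max_row][max_col], (double)P_new[max_row][max_col],
               max_row, max_col);
    }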
Testing
Comparison test programs for the covariance prediction and observation methods. See https://github.com/PX4/ecl/blob/pr-ekfSymPyDerivationConversion/EKF/python/ekf_derivation/run_test_programs.sh
SITL testing that exercises all observation methods. See the following:
SITL gazebo test https://review.px4.io/plot_app?log=a08ff241-6840-456b-b509-491235c4267d
SITL gazebo_iris_opt_flow test https://logs.px4.io/plot_app?log=9c29150e-ab23-4245-869c-bb2a978cebc6
SITL gazebo plane test https://logs.px4.io/plot_app?log=a568008f-92f8-48c8-93ed-694b7a94dd19
TODO: SITL testing for MC body frame specific force fusion and GPS yaw fusion.