Refactor/easing library #947

Closed · mlampert wants to merge 2 commits

Conversation

mlampert (Contributor)

I ran into some performance issues on an 8MHz STM32F3 (Cortex-M4) when playing with the easing library. I refactored the library a bit and think the performance improvements are worth the somewhat reduced readability.

Some performance measurements on an STM32F3 Cortex-M4 running at
8MHz showed performance issues in pwm_test.c. Making sure
all constants are single-precision float values allows the compiler
to use the hardware FPU for some of the calculations.
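
As an illustration of the promotion issue (a minimal sketch, not code
from the PR; the function names are made up): a double literal in an
otherwise-float expression drags the computation into double precision,
which the STM32F3's single-precision FPU cannot execute in hardware.

    /* '1.0' is a double literal, so the subtraction happens in
     * (software-emulated) double precision. */
    float ease_slow(float step, float max_steps)
    {
        return 1.0 - step / max_steps;
    }

    /* '1.0f' keeps the whole expression in single precision, so the
     * hardware FPU can do the work. */
    float ease_fast(float step, float max_steps)
    {
        return 1.0f - step / max_steps;
    }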

A quick and dirty measurement of the easing function execution times
shows the improvement for some of the functions:

          |       original       | eliminate double promo |
----------+----------+-----------+-------------+----------+
          |   min    |     max   |      min    |     max  |
----------+----------+-----------+-------------+----------+
sine      |   330    |     480   |      330    |     480  |
bounce    |    50    |     130   |       11    |      15  |
circular  |   340    |     370   |      220    |     250  |
quadratic |   160    |     250   |       14    |      75  |
cubic     |  1230    |    1320   |     1120    |    1240  |
quartic   |  1230    |    1320   |     1120    |    1240  |
quintic   |  1230    |    1320   |     1120    |    1240  |
----------+----------+-----------+-------------+----------+

All times are in microseconds.
Looking at the execution times of the different easing functions,
the slow ones are those using pow(). Unfortunately there is no
float equivalent. Manually unrolling the ones with a fixed exponent
is tedious, but it seems worthwhile.
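
As a sketch of what that unrolling looks like (hypothetical helper
names, not the PR's actual function layout):

    #include <math.h>

    /* pow() takes and returns double, so the argument is promoted and
     * the full double-precision libm routine runs. */
    static inline float quintic_with_pow(float max_val, float ratio)
    {
        return max_val * pow(ratio, 5);
    }

    /* With a fixed exponent the call unrolls into three float
     * multiplies: r2 * r2 * ratio == ratio^5. */
    static inline float quintic_unrolled(float max_val, float ratio)
    {
        float r2 = ratio * ratio;
        return max_val * (r2 * r2 * ratio);
    }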

Some rough execution time measurements on an STM32F3 Cortex-M4 with
hardware FPU (again using pwm_test.c):

          |  original   | float expr  |    no pow
----------+------+------+------+------+------+------
          |  min |  max |  min |  max |  min |  max
----------+------+------+------+------+------+------
sine      |  330 |  480 |  330 |  480 |  330 |  480
bounce    |   50 |  130 |   11 |   15 |   11 |   15
circular  |  340 |  370 |  220 |  250 |  150 |  190
quadratic |  160 |  250 |   14 |   75 |    9 |   12
cubic     | 1230 | 1320 | 1120 | 1240 |   11 |   11
quartic   | 1230 | 1320 | 1120 | 1240 |   11 |   11
quintic   | 1230 | 1320 | 1120 | 1240 |   11 |   11
----------+------+------+------+------+------+------

All times are in microseconds.
mlaz (Contributor) commented Mar 22, 2018

Looks good to me. Have you tested these for mathematical correctness?

mlaz self-assigned this Mar 22, 2018
mlampert (Contributor, Author)

The math is identical to the original implementation. Is there anything specific you're looking for?

@@ -43,12 +43,12 @@ static inline float exponential_out(float step, float max_steps, float max_val)
 {
         return (step == max_steps) ?
                 max_val :
-                max_val - pow(max_val, 1.0 - (float)step/max_steps);
+                max_val - pow(max_val, 1.0f - (float)step/max_steps);
Member:
Since you are moving everything to float you might as well get rid of those (float) casts. This one would be the same as max_val - pow(max_val, 1.0f - step/max_steps). This happens across much of the refactoring.

mlampert (Contributor, Author):
I'm not sure why the original implementation had that cast there; the value was promoted to a double anyway. I presume that early on, step and max_steps were integer values and the cast is a remnant of refactoring. I can certainly clean those up.

mlaz (Contributor):
The indentations were inherited from a previous implementation I had while validating this mathematically.


-        return max_val + max_val / 2 * pow(ratio -2, 5);
+        ratio -= 2;
Member:
Weird indentation

mlampert (Contributor, Author):
The weird indentation comes from the original source code having tab characters in some places, and GitHub renders them with tabstop=8. I didn't clean up the tabs because that would introduce a lot of noise into the code review, but I can certainly do that.


-        return max_val / 2 * (pow(ratio - 2, 3) + 2);
+        ratio -= 2;
Member:
Weird indentation

 }

 /* Cubic */
 static inline float cubic_in(float step, float max_steps, float max_val)
 {
         float ratio = step / max_steps;

-        return max_val * pow(ratio, 3);
+        return max_val * (ratio * ratio * ratio);
utzig (Member) commented Mar 29, 2018:

Whenever you have a pow or sqrt in the code, your variables are automatically promoted to double and the operation is done at a higher precision than what is done now by simply multiplying floats. I am not sure how much this would impact precision, but I assume this is why @mlaz asked the previous question about accuracy.

mlampert (Contributor, Author) commented Mar 30, 2018:

Understood - but I cannot answer that question without a reference. The interface does not suggest that there is double precision involved; it's all based on float, so as a user that's as much as I can expect. In order to answer the question of whether this refactoring falsifies the results, we would need a definition of what level of deviation is acceptable.

In addition, none of the computations uses accumulation, which is where floats really start losing precision. All values are monotonically increasing or decreasing and the result is converted to a float. So if the initial value and the result fit into a float, then every intermediate value will also fit.
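
As a standalone illustration of the accumulation point (not taken from
the library):

    #include <stdio.h>

    int main(void)
    {
        /* Repeatedly accumulating a value that is not exactly
         * representable lets the rounding error grow with every add. */
        float acc = 0.0f;
        for (int i = 0; i < 1000000; i++)
            acc += 0.1f;
        printf("accumulated: %f (expected 100000.0)\n", acc);

        /* The easing functions instead compute each result directly
         * from step/max_steps, so no error carries over between calls. */
        printf("direct:      %f\n", 1000000 * 0.1f);
        return 0;
    }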

mlaz (Contributor) commented Apr 2, 2018:

I believe that at some point not using the pow() functions will in fact change the results; we just need to make sure this won't break the functions. The method I used to validate these mathematically is to look at the function here and here, then implement it in Wolfram (or some other equivalent tool), run the lib's code to produce some values, and compare them to what I get in Wolfram.
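
One possible harness for that kind of check, comparing the refactored
cubic_in from the diff above against a double-precision reference (a
sketch; the step range and the deviation metric are arbitrary choices):

    #include <math.h>
    #include <stdio.h>

    /* Refactored version, as in the diff above. */
    static inline float cubic_in(float step, float max_steps, float max_val)
    {
        float ratio = step / max_steps;
        return max_val * (ratio * ratio * ratio);
    }

    /* Reference: the same formula evaluated entirely in double. */
    static double cubic_in_ref(double step, double max_steps, double max_val)
    {
        double ratio = step / max_steps;
        return max_val * pow(ratio, 3);
    }

    int main(void)
    {
        const float max_steps = 255.0f, max_val = 65535.0f;
        double worst = 0.0;

        for (float step = 0.0f; step <= max_steps; step += 1.0f) {
            double err = fabs(cubic_in(step, max_steps, max_val) -
                              cubic_in_ref(step, max_steps, max_val));
            if (err > worst)
                worst = err;
        }
        printf("worst absolute deviation: %g\n", worst);
        return 0;
    }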

mlampert (Contributor, Author)

A general comment on the "precision" question.

I love the library and have used it all the time since the PR was first posted. Yet, as is, it is IMHO not very practical because its resource requirements are too high. The reason I started looking into this was that the console output started to have pauses (in the middle of printing a word) when I used the library for a single LED.

Having this library is insanely cool but I cannot afford 0.5-1ms execution times in a timer interrupt callback.

Maybe I'm missing a use case here - is there one that requires double precision?

mlaz (Contributor) commented Apr 2, 2018

@mlampert as you can see, this lib is far from being optimized.
My approach is to have an implementation which is accurate first;
then let's talk about performance. Take your LED case, for instance: right now you may implement an easing_i_func_t which, instead of calling one of the implemented functions, just uses a lookup table (see the sketch after the references below).
This is one approach; the other way we may work this out is to implement an alternate, architecture-level optimized math lib (which is not something one is able to do overnight, see an example of this in 1.) and use it in our functions.

Some references here:

  1. https://web.archive.org/web/20161219143458/http://http.developer.nvidia.com:80/Cg/index_stdlib.html
  2. http://people.math.sfu.ca/~cbm/aands/abramowitz_and_stegun.pdf
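
A rough sketch of the lookup-table idea mentioned above (the table
size, the init-time fill, and the integer signature are assumptions;
the library's actual easing_i_func_t interface may differ):

    #include <stdint.h>

    #define EASE_LUT_SIZE 64

    static uint16_t ease_lut[EASE_LUT_SIZE];

    /* Fill the table once at startup, outside any interrupt context,
     * using the (slow) float math; here a cubic ease-in. */
    void ease_lut_init(uint16_t max_val)
    {
        for (int i = 0; i < EASE_LUT_SIZE; i++) {
            float ratio = (float)i / (EASE_LUT_SIZE - 1);
            ease_lut[i] = (uint16_t)((float)max_val * ratio * ratio * ratio);
        }
    }

    /* The timer interrupt then needs only integer math. */
    uint16_t cubic_in_lut(uint16_t step, uint16_t max_steps)
    {
        uint32_t idx = (uint32_t)step * (EASE_LUT_SIZE - 1) / max_steps;
        return ease_lut[idx];
    }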

sdalu (Contributor) commented Apr 2, 2018

A bit off topic, but I'm wondering if it would be possible to combine a lookup table (generated by the easing function) with the EasyDMA feature of the nRF52840? (Perhaps ST also has the same kind of feature?)

mlampert (Contributor, Author) commented Apr 3, 2018

Can't claim I understand the rationale, but it sounds like we don't want to move ahead with this change. I'll close the PR and we can create a new one once we know where we're going with this.

mlampert closed this Apr 3, 2018
jacobrosenthal (Member)
FWIW I'd rather have performance over accuracy. Easing is generally used for animation and is not mission critical. But if we want both options, separate functions or a separate dependency seem viable.

mlaz (Contributor) commented Apr 3, 2018

@jacobrosenthal I think we may agree here that in this particular case accuracy means having growth ratios near to what we have now. I never opposed having these changes merged. We just need to check that the behaviour is consistent.
