More efficient and unlimited implementation of fluid_ct2hz() #569

derselbst · 2019-10-18T19:37:36Z

While working on an unlimited implementation of fluid_ct2hz() as discussed in #568, I'm now providing a less branchy and therefore more instruction-cache-friendly version, ~~which has no upper limit (like the old one had)~~. Also, it significantly reduces the number of floating-point comparisons, which is esp. beneficial for non-FPU systems.

Using Intel's VTune Amplifier gives the following results.

Function / Call Stack	CPU Time	Instructions Retired	Microarchitecture Usage	Comment
fluid_ct2hz_real	0.072s	312,800,000	56.0%	old, legacy version
fluid_ct2hz_real	0.062s	217,600,000	65.4%	version of this PR without `ldexp()`
fluid_ct2hz_real	0.032s	163,200,000	52.0%	version of this PR with `ldexp()`
ldexp	0.067s	268,600,000	70.4%	`scalbn()` and `ldexp()` turn out to be less efficient than arithmetic operators

Just added a test_ct2hz unit test to master to make sure this change doesn't cause trouble.

Are lookup tables even contemporary? Yes, they are!

Function / Call Stack	CPU Time	Instructions Retired	Microarchitecture Usage	Comment
fluid_ct2hz_real	0.048s	204,000,000	52.1%	no lookup table, only pow()
pow	0.658s	2,692,800,000	45.8%

This reverts commit ef9e081. "Although ldexp, scalbn and scalbln are specified to perform the operation efficiently, on many implementations they are less efficient than multiplication or division by a power of two using arithmetic operators." Which was proven by using VTune.

jjceresa · 2019-10-18T23:15:53Z

Thanks for this implementation. Just some details:

if(FLUID_UNLIKELY(res.quot > (int)(sizeof(mult)*8)))
should be
if(FLUID_UNLIKELY(res.quot >= (int)(sizeof(mult)8)))

I wonder if removing else branch will enhance performance ?
// else
{
mult = 1u << (unsigned int)res.quot;
return mult * fluid_ct2hz_tab[(res.rem)];
}

derselbst · 2019-10-19T08:40:59Z

Oh, you're right, thanks.

And no, removing the else keyword does not affect performance, as it results in the same assembly being generated.

jjceresa · 2019-10-19T14:26:46Z

Something now sound wrong.
When listening a midifile played on new-ct2hz branch, high frequency are much more attenuated than playing the on master branch !.

jjceresa · 2019-10-19T14:50:12Z

Something now sound wrong....

Sorry, please ignore my previous comment. All sound fine. (I had compiled a wrong gen_conv.c)

mawe42

Looks good! I've tried to measure the performance gain with profiling on my target platform (ARM Coretex A7 - 1GHz) but could not notice any speed improvements with the built-in profiling feature. The range of "max estimated voices" between different profiling runs spans 146 - 150, both on master and this branch. Would be interesting to see what this does to less capable platforms.

I'm not so sure about the fallback to on-the-fly calculation though. From a sound perspective, I don't think we gain anything from calculating those extremely high frequencies. But they could still happen if a Soundfont designer adds a huge amount into the (mod|vib)-lfo-to-frequency generator. Now I know that doing so would not make any sense, but neither does calculating those frequencies to exact values using lots of CPU cycles. So I would vote to choose a high cutoff and return a fixed high frequency instead.

mawe42 · 2019-10-20T13:20:18Z

The range of "max estimated voices" between different profiling runs spans 146 - 150, both on master and this branch.

Sorry, need to revise this (I worked in the wrong directory and accidentally tested an older master checkout in both cases). Now with the proper branches, these are the real profiling outputs:

estimated maxVoices

master: 147 - 150
new-ct2hz: 143 - 145

Below are some more details. I chose the slowest run on master and the fastest run on new-ct2hz. Not sure if there are any other performance improvements on current master compared to new-ct2hz, but at least on my platform, profiling suggests that it is a slight performance loss rather than an improvement.

Maybe relevant: I'm compiling with -ffast-math -O3 and -Denable-floats=1.

master

# LD_LIBRARY_PATH=. ./fluidsynth -a alsa /data/sounds/french.sf2 -R0 -C0
FluidSynth runtime version 2.0.7
Copyright (C) 2000-2019 Peter Hanappe and others.
Distributed under the LGPL license.
SoundFont(R) is a registered trademark of E-mu Systems, Inc.

fluidsynth: warning: No preset found on channel 9 [bank=128 prog=0]
Type 'help' for help topics.

> prof_set_print 1
> prof_set_notes 120 0 1
> prof_start 2 1000
Generating 120 notes, fluidsynth: warning: Instrument not found on channel 0 [bank=0 prog=1], substituted [bank=0 prog=0]
fluidsynth: warning: Instrument not found on channel 1 [bank=0 prog=1], substituted [bank=0 prog=0]
generated voices:120
Number of measures(n_prof):2, duration of one mesure(dur):1000ms

Profiling time(mm:ss): Total=0:2  Remainder=0:2, press <ENTER> to cancel
 ------------------------------------------------------------------------------
 Duration(microsecond) and cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond)
 ------------------------------------------------------------------------------
 Code under profiling       |Voices|       Duration (microsecond)   |  Load(%)
                            |   nbr|       min|       avg|       max|
 ---------------------------|------|--------------------------------|----------
 synth_write_* ------------>|   120|   1150.00|   1190.80|   1313.00|  82.053
 synth_one_block ---------->|   120|   1141.00|   1180.82|   1302.00|  81.366
 synth_one_block:clear ---->|   120|      1.00|      2.62|     27.00|   0.181
 synth_one_block:one voice->|     1|      8.00|      9.77|    266.00|   0.674
 synth_one_block:all voices>|   120|   1137.00|   1175.10|   1298.00|  80.972
 synth_one_block:reverb --->| no profiling available
 synth_one_block:chorus --->| no profiling available
 voice:note --------------->| no profiling available
 voice:release ------------>| no profiling available
 ------------------------------------------------------------------------------
 Cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond) and maximum voices
 ------------------------------------------------------------------------------
 nVoices| total(%)|voices(%)| reverb(%)|chorus(%)| voice(%)|estimated maxVoices
 -------|---------|---------|----------|---------|---------|-------------------
     120|   82.053|   82.053|     0.000|    0.000|    0.675|              148

Profiling time(mm:ss): Total=0:2  Remainder=0:1, press <ENTER> to cancel
 ------------------------------------------------------------------------------
 Duration(microsecond) and cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond)
 ------------------------------------------------------------------------------
 Code under profiling       |Voices|       Duration (microsecond)   |  Load(%)
                            |   nbr|       min|       avg|       max|
 ---------------------------|------|--------------------------------|----------
 synth_write_* ------------>|   120|   1150.00|   1194.50|   1444.00|  82.308
 synth_one_block ---------->|   120|   1141.00|   1184.47|   1432.00|  81.617
 synth_one_block:clear ---->|   120|      1.00|      2.70|     30.00|   0.186
 synth_one_block:one voice->|     1|      8.00|      9.80|    266.00|   0.675
 synth_one_block:all voices>|   120|   1137.00|   1178.72|   1421.00|  81.221
 synth_one_block:reverb --->| no profiling available
 synth_one_block:chorus --->| no profiling available
 voice:note --------------->| no profiling available
 voice:release ------------>| no profiling available
 ------------------------------------------------------------------------------
 Cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond) and maximum voices
 ------------------------------------------------------------------------------
 nVoices| total(%)|voices(%)| reverb(%)|chorus(%)| voice(%)|estimated maxVoices
 -------|---------|---------|----------|---------|---------|-------------------
     120|   82.308|   82.308|     0.000|    0.000|    0.677|              147
Stopping 120 voices...voices stopped.
> fluidsynth: panic: An error occurred while reading from stdin.
fluid_profiling_print
fluidsynth: Estimated times: min/avg/max (micro seconds)
fluidsynth: synth_write_* ------------>: 1150.000/1194.497/1444.000
fluidsynth: synth_one_block ---------->: 1141.000/1184.466/1432.000
fluidsynth: synth_one_block:clear ---->: 1.000/2.703/30.000
fluidsynth: synth_one_block:one voice->: 8.000/9.803/266.000
fluidsynth: synth_one_block:all voices>: 1137.000/1178.718/1421.000

new-ct2hz

# LD_LIBRARY_PATH=. ./fluidsynth -a alsa /data/sounds/french.sf2 -R0 -C0
FluidSynth runtime version 2.0.7
Copyright (C) 2000-2019 Peter Hanappe and others.
Distributed under the LGPL license.
SoundFont(R) is a registered trademark of E-mu Systems, Inc.

fluidsynth: warning: No preset found on channel 9 [bank=128 prog=0]
Type 'help' for help topics.

> prof_set_print 1
> prof_set_notes 120 0 1
> prof_start 2 1000
Generating 120 notes, fluidsynth: warning: Instrument not found on channel 0 [bank=0 prog=1], substituted [bank=0 prog=0]
fluidsynth: warning: Instrument not found on channel 1 [bank=0 prog=1], substituted [bank=0 prog=0]
generated voices:120
Number of measures(n_prof):2, duration of one mesure(dur):1000ms

Profiling time(mm:ss): Total=0:2  Remainder=0:2, press <ENTER> to cancel
 ------------------------------------------------------------------------------
 Duration(microsecond) and cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond)
 ------------------------------------------------------------------------------
 Code under profiling       |Voices|       Duration (microsecond)   |  Load(%)
                            |   nbr|       min|       avg|       max|
 ---------------------------|------|--------------------------------|----------
 synth_write_* ------------>|   120|   1177.00|   1212.27|   1413.00|  83.533
 synth_one_block ---------->|   120|   1168.00|   1202.39|   1401.00|  82.852
 synth_one_block:clear ---->|   120|      1.00|      2.68|     26.00|   0.185
 synth_one_block:one voice->|     1|      8.00|      9.95|     63.00|   0.685
 synth_one_block:all voices>|   120|   1163.00|   1196.20|   1391.00|  82.425
 synth_one_block:reverb --->| no profiling available
 synth_one_block:chorus --->| no profiling available
 voice:note --------------->| no profiling available
 voice:release ------------>| no profiling available
 ------------------------------------------------------------------------------
 Cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond) and maximum voices
 ------------------------------------------------------------------------------
 nVoices| total(%)|voices(%)| reverb(%)|chorus(%)| voice(%)|estimated maxVoices
 -------|---------|---------|----------|---------|---------|-------------------
     120|   83.533|   83.533|     0.000|    0.000|    0.687|              145

Profiling time(mm:ss): Total=0:2  Remainder=0:1, press <ENTER> to cancel
 ------------------------------------------------------------------------------
 Duration(microsecond) and cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond)
 ------------------------------------------------------------------------------
 Code under profiling       |Voices|       Duration (microsecond)   |  Load(%)
                            |   nbr|       min|       avg|       max|
 ---------------------------|------|--------------------------------|----------
 synth_write_* ------------>|   120|   1175.00|   1218.27|   1485.00|  83.946
 synth_one_block ---------->|   120|   1166.00|   1208.31|   1470.00|  83.260
 synth_one_block:clear ---->|   120|      1.00|      2.81|     31.00|   0.194
 synth_one_block:one voice->|     1|      8.00|     10.00|    609.00|   0.689
 synth_one_block:all voices>|   120|   1161.00|   1201.98|   1464.00|  82.824
 synth_one_block:reverb --->| no profiling available
 synth_one_block:chorus --->| no profiling available
 voice:note --------------->| no profiling available
 voice:release ------------>| no profiling available
 ------------------------------------------------------------------------------
 Cpu loads(%) (sr: 44100 Hz, sp: 22.68 microsecond) and maximum voices
 ------------------------------------------------------------------------------
 nVoices| total(%)|voices(%)| reverb(%)|chorus(%)| voice(%)|estimated maxVoices
 -------|---------|---------|----------|---------|---------|-------------------
     120|   83.946|   83.946|     0.000|    0.000|    0.690|              144
Stopping 120 voices...voices stopped.
> fluidsynth: panic: An error occurred while reading from stdin.
fluid_profiling_print
fluidsynth: Estimated times: min/avg/max (micro seconds)
fluidsynth: synth_write_* ------------>: 1175.000/1218.270/1485.000
fluidsynth: synth_one_block ---------->: 1166.000/1208.309/1470.000
fluidsynth: synth_one_block:clear ---->: 1.000/2.812/31.000
fluidsynth: synth_one_block:one voice->: 8.000/9.999/609.000
fluidsynth: synth_one_block:all voices>: 1161.000/1201.984/1464.000

mawe42 · 2019-10-20T13:52:45Z

The culprit seems to be the div() stdlib call. If I replace that with icents / 1200 and icents % 1200, then performance of new-ct2hz is at least equal, sometimes a little higher than master.

derselbst · 2019-10-20T16:26:27Z

Thanks for testing on ARM.

ARM Coretex A7

That CPU has a proper FPU and dynamic branch prediction. Doesn't surprise me that your test don't show a real difference. @carlo-bramini for instance has a Cortex M4, maybe things look different for him.

The culprit seems to be the div() stdlib call.

Oh, you are right.

Function / Call Stack	CPU Time	Instructions Retired	Microarchitecture Usage
div	0.044s	265,200,000	69.5%

Quite disappointing that stdlib calls like div() and ldexp() perform so badly. I'll change it to unsigned integer div and mod then.

I'm not so sure about the fallback to on-the-fly calculation though. From a sound perspective, I don't think we gain anything from calculating those extremely high frequencies.

Thinking about it again, I agree. With this new implementation we have a maximum of 38100 absolute cents, which is 29,527,900,160 Hz . Nothing will ever reach this. Inaudible anyway, so yeah, use an upper limit... Now the question is where to check that upper limit?

if(FLUID_UNLIKELY(cents < 0 || cents > 38100)) - FPU comparison :(
res.quot = min(res.quot, 32/*bits*/) - conditional assignment :(
Don't check anything and just let the shift safely wrap around - :) ?

turned out to be expensive

mawe42 · 2019-10-20T17:16:08Z

Don't check anything and just let the shift safely wrap around - :) ?

Yes, why not? It's not any worse than the current 1.0Hz fallback, and due to the higher cents cutoff actually a little better.

carlo-bramini · 2019-10-20T17:49:27Z

Excuse me, since icents is an int type, why don't you simply write something like:

quo = icents / 1200;
rem = icents % 1200;

Since the divisor is a constant value known at compile time, the compiler simplifies the calculation of the divisor and the reminder by just multiplying for its reciprocal.

derselbst · 2019-10-22T09:48:37Z

Should be fine now. Unless smb. objects, I would merge this.

mawe42 · 2019-10-22T09:58:05Z

👍

jjceresa · 2019-10-22T11:31:24Z

Please, as fac has same type then mul what could be the difference between fac and (fac & (sizeof(mult)*8u - 1u)) ?

derselbst · 2019-10-22T11:43:01Z

If fac becomes 32 or more, the result of the shift would be zero. The & is just a replacement for the quick modulo operation % 32.

jjceresa · 2019-10-22T12:23:12Z

Oh yes, ofc.

mawe42 · 2019-10-22T12:28:55Z

Maybe it would be good to extend the comment above that line to explain this?

derselbst · 2019-10-22T12:37:56Z

Done.

jjceresa · 2019-10-22T12:45:18Z

I forgot that (fac & (sizeof(mult)*8u - 1u)) is the fast equivalent of fac % 12. I'm getting old :).

More efficient and unbound implementation of fluid_ct2hz()

2b585f0

derselbst added the enhancement label Oct 18, 2019

derselbst requested review from mawe42 and jjceresa October 18, 2019 19:37

derselbst mentioned this pull request Oct 18, 2019

Fix incorrect usage of fluid_ct2hz() #568

Merged

derselbst added 3 commits October 18, 2019 22:06

use ldexp()

ef9e081

prefer FLUID_POW to pow()

28968fd

derselbst added 2 commits October 19, 2019 10:37

update regression test

13e7ca6

fix broken comparison

8bbbf30

derselbst mentioned this pull request Oct 20, 2019

Optimizing fluid_synth_write functions #571

Closed

mawe42 reviewed Oct 20, 2019

View reviewed changes

remove call to div()

190ee28

turned out to be expensive

derselbst added 2 commits October 21, 2019 10:31

make sure left shift wraps around

3c1d3f8

fix unit test

b038870

derselbst merged commit fa7354a into master Oct 22, 2019

derselbst deleted the new-ct2hz branch October 22, 2019 11:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More efficient and unlimited implementation of fluid_ct2hz() #569

More efficient and unlimited implementation of fluid_ct2hz() #569

derselbst commented Oct 18, 2019 •

edited

jjceresa commented Oct 18, 2019

derselbst commented Oct 19, 2019 •

edited

jjceresa commented Oct 19, 2019

jjceresa commented Oct 19, 2019

mawe42 left a comment

mawe42 commented Oct 20, 2019

mawe42 commented Oct 20, 2019

derselbst commented Oct 20, 2019

mawe42 commented Oct 20, 2019

carlo-bramini commented Oct 20, 2019

derselbst commented Oct 22, 2019

mawe42 commented Oct 22, 2019

jjceresa commented Oct 22, 2019

derselbst commented Oct 22, 2019

jjceresa commented Oct 22, 2019

mawe42 commented Oct 22, 2019

derselbst commented Oct 22, 2019

jjceresa commented Oct 22, 2019

More efficient and unlimited implementation of fluid_ct2hz() #569

More efficient and unlimited implementation of fluid_ct2hz() #569

Conversation

derselbst commented Oct 18, 2019 • edited

jjceresa commented Oct 18, 2019

derselbst commented Oct 19, 2019 • edited

jjceresa commented Oct 19, 2019

jjceresa commented Oct 19, 2019

mawe42 left a comment

Choose a reason for hiding this comment

mawe42 commented Oct 20, 2019

master

new-ct2hz

mawe42 commented Oct 20, 2019

derselbst commented Oct 20, 2019

mawe42 commented Oct 20, 2019

carlo-bramini commented Oct 20, 2019

derselbst commented Oct 22, 2019

mawe42 commented Oct 22, 2019

jjceresa commented Oct 22, 2019

derselbst commented Oct 22, 2019

jjceresa commented Oct 22, 2019

mawe42 commented Oct 22, 2019

derselbst commented Oct 22, 2019

jjceresa commented Oct 22, 2019

derselbst commented Oct 18, 2019 •

edited

derselbst commented Oct 19, 2019 •

edited