Speedup rendering functions #573

carlo-bramini · 2019-10-19T17:32:03Z

See issue #571

derselbst

Looks promising. I will add a unit test on master, similar to the one for fluid_synth_process(), to be sure that nothing else breaks. Meanwhile, see below.

derselbst · 2019-10-19T18:44:32Z

src/synth/fluid_synth.c


-    synth->cur = l;
+    synth->cur = cur;

    time = fluid_utime() - time;
    cpu_load = 0.5 * (fluid_atomic_float_get(&synth->cpu_load) + time * synth->sample_rate / len / 10000.0);


When control reaches here, len will be zero, causing a division by zero. Recompile with cmake -Denable-trap-on-fpe=1 to get a SIGFPE.

Argh, I did not notice that len is used in the cpu_load computation. I will fix it.
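The crash discussed above comes from dividing by a length that the render loop has already counted down to zero. A minimal sketch of one way to avoid it, assuming the caller keeps a copy of the original frame count: estimate_cpu_load and its parameters are hypothetical names for illustration, not FluidSynth API, and this is not necessarily how the fix commit does it.

```c
/* Hypothetical helper: applies the same smoothing formula as the snippet
 * above, but takes the frame count that was saved before the render loop
 * started consuming it. */
static double estimate_cpu_load(double prev_load, double elapsed_usec,
                                double sample_rate, int frames)
{
    if (frames <= 0)
    {
        /* Nothing was rendered: keep the previous estimate instead of
         * dividing by zero. */
        return prev_load;
    }

    return 0.5 * (prev_load + elapsed_usec * sample_rate / frames / 10000.0);
}
```

Passing the saved count rather than the loop counter keeps the formula unchanged for every non-empty call.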

Immediately after fixing the SIGFPE error, I noticed that the 'i' and 'j' variables could be deleted safely. loff and roff are added to left_out and right_out, so the lincr and rincr can be added directly to the pointers.
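The idea of dropping the 'i' and 'j' indices can be sketched as follows: once the offset has been added to the output pointer, each sample is stored through the pointer and the pointer is advanced by the stride. The function name write_strided and the use of double for the input samples are assumptions made for this example only.

```c
/* Copies n samples from in[] into a strided output buffer.
 * `out` must already point at the first slot (base + offset);
 * `incr` is the distance between consecutive output slots. */
static void write_strided(float *out, int incr, const double *in, int n)
{
    const double *end = in + n;

    while (in < end)
    {
        *out = (float) *in++;
        out += incr;   /* bump the pointer instead of computing out[j] */
    }
}
```

With a stride of 2 and an offset of 0 or 1, the same routine would fill the left or right slots of an interleaved stereo buffer.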

jjceresa · 2019-10-20T18:07:12Z

Thanks for the speed enhancement. However, this new implementation doesn't behave as before.
When fluid_synth_write_xxx() is called with len equal to 0, the function will still call fluid_synth_render_blocks() and write into the caller-supplied buffer here:

*left_out = (float) left_in[n];
*right_out = (float) right_in[n];

This could lead to a memory access violation. To fix that, the outer loop do {...} while(size)
should be while(size) {...}.
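The difference between the two loop forms for a zero length can be sketched as below. This is an illustration only: BLOCK and both function names are hypothetical, and the counter stands in for the call to fluid_synth_render_blocks() and the writes that follow it.

```c
#define BLOCK 64  /* hypothetical block size, for illustration only */

/* Test at the top: the body is skipped entirely when len is 0. */
static int renders_with_while(int len)
{
    int renders = 0;
    int size = len;

    while (size > 0)
    {
        int n = size < BLOCK ? size : BLOCK;
        renders++;      /* stands in for rendering and writing one chunk */
        size -= n;
    }

    return renders;
}

/* Test at the bottom: the body runs once even when len is 0. */
static int renders_with_do_while(int len)
{
    int renders = 0;
    int size = len;

    do
    {
        int n = size < BLOCK ? size : BLOCK;
        renders++;
        size -= n;
    }
    while (size > 0);

    return renders;
}
```

For any positive length both forms do the same work; they differ only in the empty case, which is exactly the case where the caller's buffer may not be writable.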

Why did you use reverse indexing? Shouldn't classic indexing be faster, and easier to read? Like this:
'''
/* advance the input pointers to the current position, then copy n samples */
for (i = 0, left_in += cur, right_in += cur; i < n; i++)
{
    *left_out = (float) left_in[i];
    *right_out = (float) right_in[i];

    /* step to the next output slot of each channel */
    left_out += lincr;
    right_out += rincr;
}
'''

carlo-bramini · 2019-10-20T19:49:06Z

I fixed the case when len is zero, thank you.

Shouldn't classic indexing be faster

No, I don't think so. The readability of the code is quite subjective, but in my opinion the condition (++n < 0) is typically evaluated faster than (++i < n).
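For readers unfamiliar with the pattern, reverse indexing of the (++n < 0) kind can be sketched like this. The thread does not show the full loop, so the function below is a guess at its general shape with a hypothetical name; whether it is actually faster than (++i < n) depends on the compiler and target, and many compilers emit equivalent code for both.

```c
/* Copies n samples using a negative index that counts up to zero, so the
 * loop condition compares against the constant 0 rather than against n.
 * Requires n > 0: the do-while body always executes at least once. */
static void copy_reverse_index(float *out, int incr, const double *in, int n)
{
    in += n;    /* point one past the last input sample */
    n = -n;     /* the index now runs from -n up to -1 */

    do
    {
        *out = (float) in[n];
        out += incr;
    }
    while (++n < 0);
}
```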

derselbst · 2019-10-20T20:49:06Z

Shouldn't classic indexing be faster

No, I don't think so.

Reversed indexing is fine for me.

condition (++n < 0) is typically evaluated faster than (++i < n).

I wouldn't nit-pick too much on those issues. After all, we are only talking about integers, and most of the time will be spent accessing left_in and right_in and loading the data into the cache. That said, what you could try is traversing left_in and right_in in separate loops, to avoid potential cache conflicts (https://stackoverflow.com/q/8547778), and declaring left_out, right_out, left_in and right_in as FLUID_RESTRICT.
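The two suggestions can be sketched side by side. This example uses the C99 restrict keyword directly, on the assumption that FLUID_RESTRICT maps to a qualifier of that kind; the function names and the double input type are hypothetical. Which variant is faster has to be measured on the target platform.

```c
/* Variant 1: one fused loop. restrict promises the compiler that the four
 * buffers do not overlap, so loads and stores may be reordered freely. */
static void write_fused(float *restrict lout, float *restrict rout,
                        const double *restrict lin,
                        const double *restrict rin, int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        lout[i] = (float) lin[i];
        rout[i] = (float) rin[i];
    }
}

/* Variant 2: two separate passes, each touching only one input stream and
 * one output stream at a time. */
static void write_split(float *restrict lout, float *restrict rout,
                        const double *restrict lin,
                        const double *restrict rin, int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        lout[i] = (float) lin[i];
    }

    for (i = 0; i < n; i++)
    {
        rout[i] = (float) rin[i];
    }
}
```

Both variants produce identical output; only the memory access pattern differs.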

jjceresa · 2019-10-20T21:18:09Z

That said, what you could try is traversing left_in and right_in in separate loops, to avoid potential cache conflicts.

1) Interesting idea!

Also, to simplify a bit, we could replace do {...} while(size) with while(size) {...}. This would allow us to remove if (FLUID_LIKELY((size = len) > 0))

carlo-bramini · 2019-10-21T12:13:57Z

Also, to simplify a bit, we could replace do {...} while(size) with while(size) {...}. This would allow us to remove if (FLUID_LIKELY((size = len) > 0))

Actually, I did that intentionally instead of putting a while() at the top of the loop. It is true that the compiler could unroll the loop, but in my opinion it is worth helping it on this piece of code.
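The structure being defended here, a single up-front test followed by a bottom-tested loop, can be sketched on a toy example. The function name and the summing body are placeholders chosen for illustration; compilers frequently perform this transformation on a plain while loop themselves, so any gain from writing it by hand is not guaranteed.

```c
/* Sums len integers. The length is checked once before entering the loop;
 * inside, only the bottom test (--len) is evaluated per iteration. */
static int sum_guarded(const int *v, int len)
{
    int total = 0;

    if (len > 0)
    {
        do
        {
            total += *v++;
        }
        while (--len);
    }

    return total;
}
```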

carlo-bramini · 2019-10-22T11:43:52Z

what you could try is traversing left_in and right_in in separate loops, to avoid potential cache conflicts

It is a good idea, although this seems to be platform dependent.
Actually, my platform performs better with a single loop than with two split loops.

jjceresa · 2019-10-22T12:32:52Z

On my platforms, the improvements were very interesting.
The improvement comes from eliminating the execution cost of the test at every cycle.

I'm curious about the improvements you got. Could you please report these results?
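One simple way to produce comparable numbers is a small timing harness around the loop under test. The sketch below uses the standard clock() function; convert and time_convert are hypothetical names, and the kernel is a stand-in rather than the actual FluidSynth code. An optimizing compiler may shorten or remove repeated identical passes, so results should be checked against the generated code before being trusted.

```c
#include <time.h>

/* Hypothetical kernel under test: converts n doubles to floats. */
static void convert(float *out, const double *in, int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        out[i] = (float) in[i];
    }
}

/* Returns the CPU time, in seconds, spent calling convert() `repeats` times. */
static double time_convert(float *out, const double *in, int n, int repeats)
{
    clock_t start = clock();
    int r;

    for (r = 0; r < repeats; r++)
    {
        convert(out, in, n);
    }

    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Running each candidate loop through the same harness, with the same buffer sizes and repeat count, gives figures that can be posted and compared across platforms.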

derselbst · 2019-10-23T11:32:55Z

Actually, my platform performs better with a single loop than with two split loops.

Ok.

On my platforms, the improvements were very interesting.
The improvement comes from eliminating the execution cost of the test at every cycle.

I'm curious about the improvements you got. Could you please report these results?

Me too.

Unit test passed, will merge it now, because it's convenient for me. Thanks.

Speedup rendering functions

248cb94

See issue #571

derselbst reviewed Oct 19, 2019

carlo-bramini added 2 commits October 20, 2019 14:07

Fix SIGFPE introduced by previous change

fbac561

Optimize more the critical cycle.

9384eb9

Immediately after fixing the SIGFPE error, I noticed that the 'i' and 'j' variables could be deleted safely. loff and roff are added to left_out and right_out, so the lincr and rincr can be added directly to the pointers.

Test if len > 0

8b1e8ac

Merge branch 'master' into master

f303f8e

derselbst changed the base branch from master to speedup-write October 23, 2019 11:30

derselbst merged commit 8aef093 into FluidSynth:speedup-write Oct 23, 2019
