Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup rendering functions #573

Merged
merged 5 commits into from
Oct 23, 2019
Merged

Speedup rendering functions #573

merged 5 commits into from
Oct 23, 2019

Conversation

carlo-bramini
Copy link
Contributor

See issue #571

Copy link
Member

@derselbst derselbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising, I will add a unit test on master, similar as the one for fluid_synth_process(), to be sure that nothing else breaks. Meanwhile, see below.


synth->cur = l;
synth->cur = cur;

time = fluid_utime() - time;
cpu_load = 0.5 * (fluid_atomic_float_get(&synth->cpu_load) + time * synth->sample_rate / len / 10000.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When control reaches here, len will be zero, causing a DIV by zero. Recompile with cmake -Denable-trap-on-fpe=1 to get a SIGFPE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I did not notice the use of len into cpu_load, I will fix it.

Immediately after fixing the SIGFPE error, I noticed that the 'i' and 'j' variables could be deleted safely.
loff and roff are added to left_out and right_out, so the lincr and rincr can be added directly to the pointers.
@jjceresa
Copy link
Collaborator

Thanks for speed enhancement, however this new implementation doesn't behave as before.
When fluid_synth_write_xxx() is called with len to 0, the function will still call fluid_synth_render_blocks() and write in caller supplied buffer here:

*left_out = (float) left_in[n];
*right_out = (float) right_in[n];

This could lead with possible memory violation access. To fix that the outer loop do; {..} while(size)
should be while(size) {...}.


Please, why did you use a reverse indexing ?. Shouldn't a classic indexing be more faster ? and easier to read, like this:
'''
for (i=0, left_in += cur, right_in += cur; i < n; i++)
{
*left_out = (float) left_in[i];
*right_out = (float) right_in[i];

left_out += lincr;
right_out += rincr;

}
'''

@carlo-bramini
Copy link
Contributor Author

I fixed the case when len is zero, thank you.

Shouldn't a classic indexing be more faster

No, I don't think so. The readability of the code is quite subjective, but in my opinion the condition (++n < 0) is typically evaluated faster than (++i < n).

@derselbst
Copy link
Member

Shouldn't a classic indexing be more faster

No, I don't think so.

Reversed indexing is fine for me.

condition (++n < 0) is typically evaluated faster than (++i < n).

I wouldn't nit-pick too much on those issues. After all, we are only talking about integers, and most of the time will be spent when accessing left_in and right_in and loading the data into the cache. That said, what you could try is traversing left_in and right_in in separate loops, to avoid potential cache conflicts (https://stackoverflow.com/q/8547778). And declaring left_out, right_out, left_in and right_in as FLUID_RESTRICT.

@jjceresa
Copy link
Collaborator

That said, what you could try is traversing left_in and right_in in separate loops, to avoid potential cache conflicts.

1)Interesting idea !.

  1. Also to simplify a bit we could replace do; {..} while(size) by while(size) {...} this will allow to remove if (FLUID_LIKELY((size = len) > 0))

@carlo-bramini
Copy link
Contributor Author

carlo-bramini commented Oct 21, 2019

  1. Also to simplify a bit we could replace do; {..} while(size) by while(size) {...} this will allow to remove if (FLUID_LIKELY((size = len) > 0))

Actually, I did that intentionally instead of using a while() on top of the loop, it is true that the compiler could unroll the loop, but in my opinion it would be worth to help on this piece of code.

@carlo-bramini
Copy link
Contributor Author

what you could try is traversing left_in and right_in in separate loops, to avoid potential cache conflicts

It is a good idea, although this seems to be platform dependant.
Actually, my platform performs better with a single loop rather than two splitted loops.

@jjceresa
Copy link
Collaborator

On my plaftorms, the improvements were very interesting.
The improvement comes out by eliminating the execution costs of the test at every cycle.

I'm curious of the improvements you got. Would you please report these result ?

@derselbst derselbst changed the base branch from master to speedup-write October 23, 2019 11:30
@derselbst
Copy link
Member

Actually, my platform performs better with a single loop rather than two splitted loops.

Ok.

On my plaftorms, the improvements were very interesting.
The improvement comes out by eliminating the execution costs of the test at every cycle.

I'm curious of the improvements you got. Would you please report these result ?

Me too.

Unit test passed, will merge it now, because it's convenient for me. Thanks.

@derselbst derselbst merged commit 8aef093 into FluidSynth:speedup-write Oct 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants