
24 bit sample support #301

Merged
merged 19 commits on Dec 14, 2017

Conversation

derselbst (Member)

This adds support for soundfonts with 24 bit samples. It has been implemented by adding a separate pointer data24 to fluid_sample_t that points to the lower byte counterpart of each sample. This pointer can be NULL if a soundfont only has 16 bit samples. Implementing it this way ensures two important aspects:

  1. No use case related to the public fluid_sample_t struct is broken, and
  2. memory consumption stays the same when dealing with the usual 16 bit SF2s.

Putting the 16 bit samples together with a potentially present 8 bit counterpart to create a 24 bit sample is done as late as possible, specifically in the interpolation routines in rvoice_dsp. But even if there is no 8 bit counterpart, we now always use 24 bit samples (in fact int32_t) for mixing in rvoice_dsp (in that case the lower byte is assumed to be zero, see fluid_rvoice_get_sample()).
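
To illustrate the idea, here is a rough sketch of the struct layout (not the literal header; everything except data and data24 is omitted):

    /* Sketch only: the existing 16 bit pointer stays as-is, the new optional
       pointer holds the lower byte of each sample, or NULL for 16 bit SF2s. */
    struct _fluid_sample_t
    {
        /* ... existing fields (name, start, end, loop points, ...) ... */
        short *data;    /* upper 16 bits of each sample */
        char  *data24;  /* optional lower 8 bits, NULL if the soundfont has no sm24 data */
    };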

This leads to the interesting question: Is there even any gain in precision when using 24 bit samples?

The synthesizing pipeline looks as follows:

  1. Read integer sample data from the soundfont
  2. Access those integer samples in rvoice_dsp during interpolation and create a 32 bit integer sample from them (with a value range of [-2^23 ; 2^23 - 1])
  3. Implicitly convert these 32 bit integer samples to floating point samples upon interpolating
  4. Do the interpolation calculation
  5. Scale the floating point samples by 1 / INT24_MAX to bring them into the normalized range [-1.0 ; 1.0)

The only potential precision loss due to 32 bit integers that I can see could occur at step 3. However, even with single precision floating point we can accurately represent integer values in the range [-2^24 ; 2^24] when promoting them to float. The 32 bit integer samples are guaranteed to be in that range, so there should be no rounding errors that weren't there before.
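
A tiny standalone check of that claim (not part of the PR, just an illustration):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int32_t in_range  = (1 << 24);      /* 2^24 = 16777216, exactly representable as float */
        int32_t out_range = (1 << 24) + 1;  /* first integer a float can no longer hold exactly */

        /* prints "16777216.0 16777216.0": the second value was rounded */
        printf("%.1f %.1f\n", (float)in_range, (float)out_range);
        return 0;
    }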

I tested the synthesized output with the OmegaGMGS2 soundfont. As expected, I could not hear any difference. However, the resulting 24 bit waveform is slightly different from the one synthesized with fluidsynth 1.1.8. Also, as far as I can tell, there doesn't seem to be any measurable drop in performance.

Any feedback welcome.

@derselbst derselbst added this to the 2.0 milestone Dec 9, 2017
@derselbst derselbst self-assigned this Dec 9, 2017
@jjceresa (Collaborator) commented Dec 9, 2017

I tested the synthesized output with the OmegaGMGS2 soundfont. As expected, I could not hear any difference.

I suppose the test was done using an audio card with a 24 bit DAC?
Even in this case, it is difficult to hear a difference between 16 bits and 24 bits. 16 bit samples provide 96 dB of dynamic range, which is already enormous for our ears.
The only thing I see is that with 24 bits the quantization noise is 48 dB lower than with 16 bits. Quantization noise produces an unwanted harmonic distortion that is sometimes audible (because it is periodic). Dithering is often used to cancel this harmonic distortion. Currently, dithering is used for 16 bits. Maybe with 24 bits dithering will not be necessary anymore?

However the resulting 24 bit waveform is slightly different from the one synthesized with fluidsynth 1.1.8.

What do you mean?

Also as far as I can tell there doesnt seem to be any measurable drop in performance.

Do you mean, lost in computation time ?


I suppose the test was done using an audio card with a 24 bit DAC?

Good point, I can't even tell right now.

Even in this case, it is difficult to hear a difference between 16 bits and 24 bits.

Yes, that's why I was expecting none; otherwise it would probably have meant that I had introduced some strange bug.

Maybe with 24 bits dithering will not be necessary anymore?

Not sure. But since most soundfonts are only 16 bit, dithering will still be useful.

What do you mean?

See the attachment. Comparing the two files in e.g. Audacity you can see a slight difference for some samples, so at least it seems to have some effect.
sm24_test.zip

Do you mean, lost in computation time?

I don't think "lost" is the correct word in this context, but yes, I was referring to computation time, which stayed the same.

@jjceresa (Collaborator) commented Dec 11, 2017

Comparing the two files you can see a slight difference for some samples, so at least it seems to have some effect.

Using Sonic Visualiser the two files look similar.
Of course, I suppose the two files were built using the file rendering functionality of FluidSynth.

In the two files, the absolute maximum level is around 0.2 (normalized to 1.0f). For example, let us compare the same sample number near this level (i.e. sample number 4581140 at time 1:43:880).
The result is:

  • Test16:left 0.1873 (-7.274 dB) -- Test24: left 0.1865 (-7.292 dB) difference: 0.18 cB!
  • Test16:right 0.1532 (-8.147 dB) -- Test24: right 0.1508 (-8.216 dB) difference: 0.69 cB!

The average difference is 0.45 cB (i.e. 0.6 %), so this is an inaudible difference.


I was referring to computation time which stayed the same.

Please, how was the time measurement done, and which interpolation method was used?

@mawe42 (Member) commented Dec 11, 2017

If you load both files in Audacity, invert one of them and mix them together, you get the difference. It does look like the changes are minuscule and not audible.

@mawe42 (Member) commented Dec 11, 2017

Also, as far as I can tell, there doesn't seem to be any measurable drop in performance.

I think it would be great to add more instrumentation (like jjc's profiling patch) to objectively measure performance. And maybe keep a wiki page updated with the numbers for different platforms for each release. That way we can be sure that there are no performance regressions.

@derselbst (Member, Author)

I just noticed that the demo files I provided used the default s16 for the "audio.file.format" setting, so that waveform analysis was quite meaningless, I guess.

And maybe keep a wiki page updated with the numbers for different platforms for each release.

Even if we manage to set up a fully automated test suite, I don't think gathering that performance information across multiple platforms is something we can keep up in the long term.

Anyway, here is the demo program I used:
benchmark.txt
Compiled with gcc benchmark.c -lfluidsynth -o ben, libfluidsynth.so built with RelWithDebInfo.
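
For reference, the gist of such a benchmark could look roughly like the sketch below (the attached benchmark.txt is authoritative; this version simply renders the MIDI file as fast as possible through fluid_synth_write_float and prints the elapsed CPU time):

    /* build: gcc benchmark.c -lfluidsynth -o ben */
    #include <stdio.h>
    #include <time.h>
    #include <fluidsynth.h>

    int main(int argc, char *argv[])
    {
        float left[1024], right[1024];
        fluid_settings_t *settings = new_fluid_settings();
        fluid_synth_t *synth = new_fluid_synth(settings);
        fluid_player_t *player = new_fluid_player(synth);
        clock_t start;

        if (argc < 3)
        {
            fprintf(stderr, "usage: ben soundfont.sf2 file.mid\n");
            return 1;
        }

        fluid_synth_sfload(synth, argv[1], 1);
        /* e.g. 4th order interpolation; change this to measure the other methods */
        fluid_synth_set_interp_method(synth, -1, FLUID_INTERP_4THORDER);

        fluid_player_add(player, argv[2]);
        fluid_player_play(player);

        start = clock();
        /* pull audio out of the synth as fast as possible until the song ends */
        while (fluid_player_get_status(player) == FLUID_PLAYER_PLAYING)
        {
            fluid_synth_write_float(synth, 1024, left, 0, 1, right, 0, 1);
        }
        printf("%f ms\n", (clock() - start) * 1000.0 / CLOCKS_PER_SEC);

        delete_fluid_player(player);
        delete_fluid_synth(synth);
        delete_fluid_settings(settings);
        return 0;
    }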

And these are my test results, depending on the interp. method used:

Test with OmegaGMGS2.sf2

./ben OmegaGMGS2/OmegaGMGS2.sf2 OmegaGMGS2/Demo\ Midi\ Files/Video\ Game/Zelda/DARKWRL2.MID

fluidsynth-master (without sm24)

interp = 0 : 3983.905100 ms
interp = 1 : 4734.673700 ms
interp = 4 : 5635.398300 ms
interp = 7 : 8316.970300 ms

fluidsynth-1.1.8

interp = 0 : 4091.934600 ms
interp = 1 : 4866.325400 ms
interp = 4 : 5733.814400 ms
interp = 7 : 8409.062700 ms

fluidsynth-sm24 (this PR)

interp = 0 : 3987.057500 ms
interp = 1 : 4380.970100 ms
interp = 4 : 5094.222800 ms
interp = 7 : 6235.854200 ms

Test with FluidR3_GM.sf2

./ben /usr/share/sounds/sf2/FluidR3_GM.sf2 OmegaGMGS2/Demo\ Midi\ Files/Video\ Game/Zelda/DARKWRL2.MID

fluidsynth-master (without sm24)

interp = 0 : 4220.572200 ms
interp = 1 : 5006.859700 ms
interp = 4 : 5940.045100 ms
interp = 7 : 8673.536500 ms

fluidsynth-1.1.8

interp = 0 : 4422.360800 ms
interp = 1 : 5208.928800 ms
interp = 4 : 6130.983500 ms
interp = 7 : 8904.197700 ms

fluidsynth-sm24 (this PR)

interp = 0 : 4201.633800 ms
interp = 1 : 4664.054000 ms
interp = 4 : 5289.579800 ms
interp = 7 : 6197.322800 ms

I'm not quite sure why sm24 is even faster now; I didn't experience that two days ago...

@jjceresa (Collaborator)

If you load both files in audacity, invert one of them and mix them together you get the difference.

Good point. I used another available tool because I don't have Audacity installed yet.

In fact, the differences are mainly in small amplitude samples (for 16 bits the smallest level is 1/32768.0f against 1/8388608.0f for 24 bits, so the worst case difference could be -48 dB).
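
(For reference, that figure follows from the ratio of the two step sizes: (1/32768) / (1/8388608) = 1/256, and 20·log10(1/256) ≈ -48.2 dB.)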


I think it would be great to add more instrumentation like the profiling patch...

As an example, I have used this patch to compare different reverb CPU loads. It was possible to see a notable difference between code implemented inline and the same code implemented with macros.

Another example: using a Raspberry Pi 2, it was possible to see that when playing 50 voices the result was:

 ------ cpu loads(%)(sample rate:44100 Hz) and estimated maximum voices ------
  nVoices |total(%) |voices(%) |reverb(%) |chorus(%)  | voice(%)|estimated maxVoices
        50|    73.44|     38.25|     27.90|      7.29 |     0.73|    137

On this hardware the reverb CPU load is high, 27.90% (probably due to the lack of a floating point unit?).
When used properly, the profiling patch gives precise data. I don't regret the time it took to build it.

@jjceresa (Collaborator) commented Dec 12, 2017

To get a behaviour similar to this sm24 PR, I have implemented fluid_rvoice_get_sample() and fluid_voice_calculate_gain_amplitude().

These are interesting test results obtained with the profiling commands (interpolation method 4 only).

  • Test 1: without the sm24 PR behaviour.
  • Test 2: with fluid_rvoice_get_sample() and fluid_voice_calculate_gain_amplitude().
  • Test 3: like Test 2, but with the functions implemented inline.

The measurement conditions:

  • interpolation method 4 only.
  • 1 CPU core.
  • chorus, reverb: off
  • number of voices generated: 250
  • Duration of the measurement: 1 second.

Definitions:

  • cpu loads (%) are the sample processing time (in percent) relative to the current sample period.
  • nVoices the number of generated voices (here 250).
  • voices(%) the cpu load of all voices (here 250).
  • reverb(%) the cpu load of reverb only.
  • chorus(%) the cpu load of chorus only.
  • voice(%): the cpu load of only one voice.
  • estimated maxVoices gives the maximum number of voices the current hardware is capable of (at 100% cpu load).

In those tests, because we are only interested in the impact on interpolation, only the cpu load of one voice (i.e. voice(%)) is relevant.
For example, a voice(%) value of 0.22 means that processing one voice sample takes 0.22 % of the sample period.
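To put that in absolute terms (assuming the 44100 Hz sample rate shown in the tables): one sample period is 1/44100 s ≈ 22.7 µs, so 0.22 % of it is roughly 50 ns per voice and sample.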

Test 1: without the sm24 PR behaviour

------ cpu loads(%)(sample rate:44100 Hz) and estimated maximum voices ------
  nVoices |total(%) |voices(%) |reverb(%) |chorus(%)  | voice(%)|estimated maxVoices
       250|    40.80|     40.80|      0.00|      0.00 |     0.16|    620

The duration of one voice sample is 0.16% of the sample period.

Test 2: with fluid_rvoice_get_sample() and fluid_voice_calculate_gain_amplitude()

------ cpu loads(%)(sample rate:44100 Hz) and estimated maximum voices ------
  nVoices |total(%) |voices(%) |reverb(%) |chorus(%)  | voice(%)|estimated maxVoices
       250|    55.23|     40.80|      0.00|      0.00 |     0.22|    458

The duration of one voice sample is 0.22% of the sample period. This corresponds to an increase of 35 % compared to Test 1.

Test 3: with fluid_rvoice_get_sample() and fluid_voice_calculate_gain_amplitude() now implemented inline

------ cpu loads(%)(sample rate:44100 Hz) and estimated maximum voices ------
  nVoices |total(%) |voices(%) |reverb(%) |chorus(%)  | voice(%)|estimated maxVoices
       250|    40.92|     40.80|      0.00|      0.00 |     0.16|    620

The duration of one voice sample is 0.16% of the sample period. This is the same result as Test 1.

So, it is worth implementing those functions inline:

  • static inline fluid_real_t fluid_voice_calculate_gain_amplitude().
  • static inline int32_t fluid_rvoice_get_sample().
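
For illustration, a rough sketch of what such inline helpers could look like (hypothetical bodies; the actual code in the PR may differ in names and details, and fluid_real_t is FluidSynth's internal float type):

    /* Sketch: combine the 16 bit sample with its optional lower byte counterpart
       into a 24 bit value; dsp_lsb is NULL for plain 16 bit soundfonts. */
    static inline int32_t
    fluid_rvoice_get_sample(const short *dsp_msb, const unsigned char *dsp_lsb, unsigned int i)
    {
        int32_t msb = dsp_msb[i];
        int32_t lsb = (dsp_lsb != NULL) ? dsp_lsb[i] : 0;
        return msb * 256 + lsb;   /* value range: [-2^23 ; 2^23 - 1] */
    }

    /* Sketch: fold the 1 / INT24_MAX normalization from step 5 of the pipeline
       into the gain, so the mixed output ends up in roughly [-1.0 ; 1.0). */
    static inline fluid_real_t
    fluid_voice_calculate_gain_amplitude(fluid_real_t gain)
    {
        return gain / 8388607.0f;   /* INT24_MAX = 2^23 - 1 */
    }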

@derselbst (Member, Author)

So, it worth to implement those functions inline.

I don't experience any performance boost when marking those functions inline explicitly. What platform are you on? Which compiler and version did you use?

@mawe42 (Member) commented Dec 12, 2017

I don't experience any performance boost when marking those functions inline explicitly.

That "inline" is only a suggestion to the compiler anyway. The function might be inlined even when not marked as inline explicitly, or the compiler might ignore the suggestion. I think there's a -Winline switch for gcc where you get a warning if a function was suggested for inlining but wasn't inlined.

@derselbst (Member, Author) commented Dec 12, 2017

That "inline" is only a suggestion to the compiler anyway.

Yes, and when compiled with RelWithDebInfo, a modern compiler should be smart enough to inline it implicitly. I'm still curious about jjc's compiler version and platform.

@jjceresa (Collaborator)

What platform are you on? What compiler + version did you use?

The OS is Win XP. The CPU is an AMD.
The compiler is Visual C++ 6.0.

I chose the MinSizeRel configuration rather than the Debug configuration, because in Debug inlining is not in effect.
In MinSizeRel the optimisation is minimum size, but inlining is in effect as well.
The equivalent command line options are /O1 (for "minimum size") and /Ob1 (for "expand only functions marked as inline").

with RelWithDebInfo a nowadays compiler should be smart enough to inline it implicitly.

Maybe inline code is not appropriate for Debug?

@derselbst (Member, Author)

Maybe inline code is not appropriate for Debug?

RelWithDebInfo is a release build that enables compiler optimizations and should inline appropriately.

I think the problem is your compiler. Marking functions as inline really helped compilers from the good old 90s, and Visual C++ 6.0 was released in '98. Although I think explicitly marking functions as inline is outdated, in this case it seems to be useful:

  • fluid_voice_calculate_gain_amplitude()
  • fluid_rvoice_get_sample()
@derselbst (Member, Author)

@mawe42 Are you all right with this?

@mawe42 (Member) left a review comment:

Yes, looks good!
