What scale is applied when reading audio files #226

Closed
santigijon opened this issue May 11, 2018 · 8 comments
Comments

@santigijon

santigijon commented May 11, 2018

Hey,

I would like to know what sort of normalization/scaling is applied to the sound files that are read using the SoundFile library, since they are all supposed to be within the (-1,1) amplitude range.

And I would also like to know why some audio samples are allowed to exceed this +1 limit?

@bastibe
Owner

bastibe commented May 12, 2018

There are certain floating point file formats that may contain values outside the (-1,1) range. We do not apply any scaling to them, but just return what is in the file.
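To illustrate the point about floating point formats, here is a minimal sketch using only the Python standard library, not SoundFile itself. It packs a few made-up sample values as 32-bit IEEE floats (the encoding a float WAV file uses for its samples), unpacks them unchanged, and lists the ones outside (-1,1). The helper name `out_of_range` and the sample values are hypothetical.

```python
import struct


def out_of_range(samples, limit=1.0):
    """Return (index, value) pairs for samples whose magnitude exceeds limit."""
    return [(i, s) for i, s in enumerate(samples) if abs(s) > limit]


# Hypothetical sample values; two of them lie outside (-1, 1).
original = [0.25, -0.5, 1.5, -1.25]

# Store them as little-endian 32-bit floats and read them back.
# Nothing in this encoding restricts values to (-1, 1).
packed = struct.pack("<4f", *original)
decoded = list(struct.unpack("<4f", packed))

exceeding = out_of_range(decoded)
print(exceeding)
```

The same check can be applied to any array of float samples after reading a file, to see whether the file itself carries values beyond full scale.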

Only integer formats such as PCM16 are scaled to (-1,1), with a possible rounding error of 2^-15 (in the case of PCM16). Consequently, these formats cannot exceed (-1,1).
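The integer scaling can be sketched as follows. This is an illustration with the standard library's `wave` module rather than SoundFile's actual code path, and it assumes the conversion divides each 16-bit sample by 2^15 (32768). Under that assumption the most negative sample maps to exactly -1, while the most positive one maps to 1 - 2^-15, which would account for the 2^-15 discrepancy. The name `pcm16_to_float` is hypothetical.

```python
import io
import struct
import wave

PCM16_SCALE = 2 ** 15  # 32768; assumed divisor for int16 -> float conversion


def pcm16_to_float(samples):
    """Scale signed 16-bit integer samples to floats in [-1, 1)."""
    return [s / PCM16_SCALE for s in samples]


# Write the extreme and near-zero int16 values to an in-memory PCM16 WAV file.
raw = [-32768, -1, 0, 1, 32767]
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 2 bytes per sample = PCM16
    w.setframerate(44100)
    w.writeframes(struct.pack("<%dh" % len(raw), *raw))

# Read the integers back and scale them to floats.
buf.seek(0)
with wave.open(buf, "rb") as r:
    frames = r.readframes(r.getnframes())
ints = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
floats = pcm16_to_float(ints)
print(floats)
```

Because every int16 value lies between -32768 and 32767, no scaled value can leave the range, whatever the file contains.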

Can you upload a sample file, so we can have a look at it?

@santigijon
Author

santigijon commented May 12, 2018

Sure, I will attach two audio files.
sample audio files.zip

And regarding the first question, how is this (-1,1) range established? Does it depend on the voltage range of the ADC of the soundcard of my computer or something else?
Sorry if I say something that sounds strange, I am still a newbie in this area of analog-to-digital audio conversion.

@bastibe
Owner

bastibe commented May 14, 2018

There is no analog-to-digital conversion (ADC) happening in SoundFile. SoundFile merely reads audio files that are already stored on disk.

(ADC happens in your sound card, when you record audio. Your sound card has a reference gain that converts voltages to numbers. However, the ADC has a limited dynamic range (the range between the quietest and the loudest sound that can be converted without error). This is usually set up such that the loudest convertible sound maps to -1 or 1, and everything above that will clip, i.e. will not be recorded correctly and will be represented as -1 or 1.)
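The clipping behaviour described in the parentheses can be sketched with an idealized converter. This is a simplified model, not how any particular sound card works: it ignores quantization and noise, and the function name `adc_sample` and the parameter `full_scale_volts` (the input voltage that maps to a digital value of 1) are assumptions made for illustration.

```python
def adc_sample(voltage, full_scale_volts):
    """Idealized ADC: normalize by the full-scale voltage, then clip to [-1, 1]."""
    normalized = voltage / full_scale_volts
    return max(-1.0, min(1.0, normalized))


# With a hypothetical full scale of 1 V, inputs within range are preserved...
print(adc_sample(0.5, 1.0))
# ...while anything beyond full scale is recorded as -1 or 1.
print(adc_sample(2.0, 1.0))
print(adc_sample(-3.0, 1.0))
```

Once a sample has clipped, the original voltage cannot be recovered from the recorded value.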

@santigijon
Author

Great answers, thanks.

So then those values exceeding the (-1,1) range in the audio samples that I attached are due to some floating point file formats that do not strictly map the maximum voltage to (-1,1)?

And a last question regarding what you explained in the parentheses. It is then impossible (unless more information is provided) to compare the loudness of two audio files recorded from different devices, right? Since every different microphone will have a different voltage gain range and therefore a different scale.

@bastibe
Owner

bastibe commented May 14, 2018

So then those values exceeding the (-1,1) range in the audio samples that I attached are due to some floating point file formats that do not strictly map the maximum voltage to (-1,1)?

Exactly.

It is then impossible (unless more information is provided) to compare the loudness of two audio files recorded from different devices, right? Since every different microphone will have a different voltage gain range and therefore a different scale.

Yes, it is impossible to compare the voltages and loudnesses of the original recording. However, you can of course compare the loudness of the file if it were played. This is still subject to your current settings on your sound card, but as long as you don't change your sound card gain, the loudness of two files is comparable.
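One way to compare two files as they would be played is to compute each file's RMS level relative to digital full scale (dBFS). The sketch below assumes the samples are already available as floats, and the helper name `rms_dbfs` is hypothetical. RMS level is only a rough proxy: it is not a perceptual loudness measure.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


# Two hypothetical signals: one at full scale, one at half amplitude.
loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.5, -0.5, 0.5, -0.5]

print(rms_dbfs(loud))
print(rms_dbfs(quiet))
```

Halving the amplitude lowers the level by about 6 dB, and that difference holds at any fixed playback gain.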

@santigijon
Author

santigijon commented May 14, 2018

Thanks a lot bastibe, you have been extremely helpful, thanks.

One very last question regarding the last thing you said about playing the sounds and comparing them.

If I play two sound files on my computer and one (L) sounds louder than the other (NL), this does not mean that sound "L" was in fact louder than "NL" when it was recorded. The device ("DNL") that recorded sound "NL" might have had a greater voltage range than "DL", so a digital amplitude value of 0.5 in sound "NL" would mean a higher voltage, and thus a higher sound pressure, than a digital 0.5 in sound "L" recorded with "DL".
Right?

L - loud (sound)
NL - not loud (sound)
DL - device loud (device with which the loud sound was recorded)
DNL - device not loud (device with which the not loud sound was recorded)

@bastibe
Owner

bastibe commented May 15, 2018

Exactly. Comparisons like this are only possible if you calibrate your microphones (which is not particularly hard, but easy to get wrong).
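As a rough sketch of what such a calibration could look like: record a reference tone of known sound pressure with each device, derive a factor that converts digital values to pascals, and apply it to later recordings. Everything here is an assumption for illustration: the function names, the 1 Pa RMS reference (roughly 94 dB SPL, a common calibrator level), and the made-up sample values. It also assumes the gain of each device stays fixed between the calibration and the actual recording.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals for dB SPL


def rms(samples):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibration_factor(calibrator_samples, calibrator_pressure_pa=1.0):
    """Pascals per digital unit, from a recording of a known reference tone."""
    return calibrator_pressure_pa / rms(calibrator_samples)


def level_db_spl(samples, factor):
    """Sound pressure level of a recording, given its device's factor."""
    return 20 * math.log10(rms(samples) * factor / P_REF)


# Hypothetical devices: the same 1 Pa tone reads 0.5 on A but only 0.1 on B.
factor_a = calibration_factor([0.5, -0.5, 0.5, -0.5])
factor_b = calibration_factor([0.1, -0.1, 0.1, -0.1])

# Two recordings with identical digital amplitude...
recording = [0.05, -0.05, 0.05, -0.05]
level_a = level_db_spl(recording, factor_a)
level_b = level_db_spl(recording, factor_b)

# ...correspond to different sound pressure levels at the microphone.
print(level_a, level_b)
```

In this example the recording made with device B was the louder one at the microphone, even though both files would play back equally loud.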

@bastibe bastibe closed this as completed May 22, 2018
@santigijon
Author

Sorry, I forgot to reply to the last comment @bastibe, thanks again for the help, much appreciated.
