Support interface normalized to [-1,1) range? #20
PySoundFile uses libsndfile, which by default normalizes PCM data to the range [-1, 1) if floating point values are requested, just as you like it (see note 1). The normalization is on by default and can be switched off, but this is not (yet) supported in PySoundFile. You can already use the current release; if you request |
@mgeier: thanks for the quick and detailed reply. When I looked more closely at libsndfile, I see that it does what I need for reads but not writes to PCM WAV files. This FAQ explains what libsndfile does: I need to have my function that writes PCM WAV files bit-exact with the data in my float64 Numpy array, and I plan to handle in Python/Numpy the two's-complement saturation and quantization appropriate to the number of bits in the target PCM WAV file (before calling the WAV write function). For example, when writing a 16-bit PCM WAV file, I need my WAV write function to multiply my [-1, 1) range data by 32768, not 32767 as PySoundFile/libsndfile does today. I see two choices to meet my needs:
The libsndfile FAQ says "Some people would say that this is a severe short-coming of libsndfile." I would say that it is a shortcoming not to offer users the option of a bit-exact conversion, especially for users like myself who optimize dither for sound quality in compact discs. However, PySoundFile can offer more flexibility than libsndfile. In fact, PySoundFile could also provide options for two's-complement saturation and quantization, or for raising an exception if these operations would otherwise change the data. This would allow PySoundFile to be used by both novices and experienced DSP engineers. As a DSP engineer, I would find these new features simple to add in Python (but not in C++). I bring these suggestions up because I think they would make PySoundFile and Python more useful to the community.
@Whistler7: OK, I understand your motivation now ... I don't think, however, that PySoundFile should do anything to the actual values of the audio data. It should support (nearly) all parameters to control libsndfile and it should provide the data via NumPy arrays in a meaningful and flexible way. If you want to change how normalization is done (or add an additional flag), you should ask Erik de Castro Lopo, the original author of libsndfile. Or, which will probably be more successful, you should disable the normalization and do it properly in your own code. Disabling the normalization is not yet supported in PySoundFile, but at least I created an issue for it: #23
I see two conventional number representations for PCM WAV files in science and engineering: integer and fractional. In other words, these number representations are two different mental concepts on how to interpret the same binary fixed-point data in the file. The mental concepts determine the mapping of file values into Python (floating-point) values. Since PCM WAV files use two's complement binary representation, the positive and negative full-scale ranges are non-symmetrical. For 16-bit WAV files, the range is For reading from PCM WAV files, libsndfile follows the above conventions when returning a floating-point array (fractional when normalization is enabled, integer when normalization is disabled). PySoundFile doesn't need to support disabled normalization because the integer number representation is accessible by returning an integer array. Everybody is happy with reads. For writing to a PCM WAV file from a floating-point array, libsndfile has created a unique and unconventional number representation when normalization is enabled. With 16-bit WAV files for example, libsndfile writes values that are scaled by 32767/32768 compared to the conventional fractional number representation. This creates a number of problems. Writing a -1 value does not produce the expected negative full-scale value in the PCM WAV file. When a PCM WAV file is read, unchanged and written to another PCM WAV file of the same bit depth, the two files have different values. There is effectively a hidden quantization distortion inside of libsndfile. This behavior is unexpected by users and troubling to some (especially me for high-fidelity audio). I believe it is fair to say that libsndfile is making a conceptual abstraction mistake in this case. Here is where I believe that Erik de Castro Lopo makes the mistake. 
In his above FAQ entry, he says "Converting the other way, the only way to ensure that floating point values in the range Since Erik de Castro Lopo hasn't released an update since 2011, and since his above FAQ says that he knows about this issue and doesn't consider it large enough to do anything about, I don't have much hope that he would be willing to offer an option of "normalizing" using the conventional fractional number representation. Also, I want to start doing WAV I/O in about a month, so I don't have time to wait for Erik de Castro Lopo to do an update and for it to propagate up through PySoundFile. Instead, I propose that a goal for PySoundFile be to provide a clean and conventional abstraction (see my response in issue #24). Specifically, I propose that when writing a PCM WAV file from a floating-point array, PySoundFile clip and scale the array as I have shown above, and then use libsndfile's integer interface (which behaves conventionally). If the PySoundFile team decides to continue to propagate this conceptual abstraction mistake (for backwards-compatibility reasons), then my secondary proposal is for PySoundFile to offer an optional flag argument for support of the conventional fractional number representation as I have shown above. I would also propose, then, that this be the default, since libsndfile is unconventional and surprising. If the PySoundFile team decides to reject all of my proposals, then I see no benefit to disabling the normalization for floating-point writes to PCM files (issue #23). I would need to create a Python wrapper function that provides the conventional behavior for 16, 24 and 32-bit PCM WAV writes, and it would call PySoundFile with an integer array. Disabling the normalization would degrade PySoundFile's clean abstraction and add unnecessary complexity.
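A minimal sketch of the clip-and-scale step proposed in the comment above, assuming the input is a float64 NumPy array in the fractional [-1, 1) convention. The helper name `float_to_pcm` is hypothetical and is not part of PySoundFile or libsndfile; the resulting integer array is what would be handed to an integer write path.

```python
import numpy as np

def float_to_pcm(x, bits=16):
    """Hypothetical helper: power-of-two scaling with two's-complement saturation."""
    scale = 2.0 ** (bits - 1)                     # 32768 for 16-bit PCM
    y = np.rint(np.asarray(x, dtype=np.float64) * scale)
    y = np.clip(y, -scale, scale - 1)             # saturate at the asymmetric limits
    # Assumption: 16-bit data goes into int16, wider depths into an int32 container.
    return y.astype(np.int16 if bits == 16 else np.int32)

samples = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
print(float_to_pcm(samples))      # +1.0 saturates, -1.0 maps to negative full scale
```

With this scaling, any value that came from a 16-bit file divided by 32768 converts back to the same integer, because both steps are exact powers of two.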
You raise a very interesting issue. I don't see much conceptual difference between multiplication with 0x7FFF and multiplication with 0x8000 as long as it is done consistently for reading and writing. Can you point to some resource that makes one approach "better" than the other? After all, both approaches are wrong in some way: One approach maps -1.0 perfectly to negative full scale, but not +1.0. The other maps +1.0 perfectly to positive full scale, but not -1.0 (or rather: -1.0+eps). |
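To make the trade-off in the comment above concrete, here is a small sketch that scales the two endpoints -1.0 and +1.0 by each factor and checks whether the result fits in a signed 16-bit integer. The function name `scaled_endpoints` is made up for illustration.

```python
import numpy as np

INT16_MIN, INT16_MAX = -32768, 32767

def scaled_endpoints(scale):
    """Scale -1.0 and +1.0 and report which results fit in 16-bit two's complement."""
    raw = np.rint(np.array([-1.0, 1.0]) * scale).astype(np.int64)
    fits = (raw >= INT16_MIN) & (raw <= INT16_MAX)
    return raw.tolist(), fits.tolist()

for scale in (0x7FFF, 0x8000):
    print(hex(scale), scaled_endpoints(scale))
```

With 0x7FFF both endpoints fit but the code -32768 is never produced; with 0x8000 the value -1.0 lands exactly on negative full scale while +1.0 overflows by one step and has to be clipped.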
Let's look again at the weights of a 16-bit number in the two formats (as we were taught in Computer Engineering courses at the university): They are related by a factor of libsndfile using the wrong conversion factor for PCM writes is like legislating a new value for pi because some people find it inconvenient: Restating from my above post, some of the problems caused by libsndfile's mistake:
PySoundFile has the opportunity to correct the abstraction or at least provide the option for the correct abstraction to Python users.
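The read-then-write problem described in this thread can be simulated without any audio library. The sketch below assumes, as stated in the comments above, that a normalized read divides 16-bit codes by 32768 and a normalized write multiplies by 32767 and rounds; it does not call libsndfile itself.

```python
import numpy as np

codes = np.arange(-32768, 32768, dtype=np.int64)   # every 16-bit PCM code

as_float = codes / 32768.0                          # assumed read scaling
rewritten = np.rint(as_float * 32767.0).astype(np.int64)   # assumed write scaling
pow2_rewritten = np.rint(as_float * 32768.0).astype(np.int64)

changed = codes != rewritten
print("codes altered by the 32767 round trip:", int(changed.sum()))
print("codes altered by the 32768 round trip:", int((codes != pow2_rewritten).sum()))
```

Under these assumptions the larger-magnitude codes move one step toward zero when the write factor is 32767, while the power-of-two round trip returns every code unchanged.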
Frankly, I don't see how this would affect many people. If you do actually need bit-correctness, I think you should read the integer values instead of the floating point values. Don't get me wrong, I do see your point. This behavior changes the amplitude by 0.00025 dB. But this is so far beyond the hearing threshold that I don't think it matters. Again, can you point to some resources that clearly defines one approach "better" than the other? Or, maybe, can you point me to some problem for which either of these behaviors would be clearly wrong? |
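The size of the level change quoted above can be checked directly: it is the ratio of the two scale factors expressed in decibels. This is a plain arithmetic sketch using only the two factors mentioned in the thread.

```python
import math

# Gain difference between scaling by 32768 and scaling by 32767, in dB.
gain_db = 20.0 * math.log10(32768.0 / 32767.0)
print(f"{gain_db:.6f} dB")
```

The result is on the order of 0.0003 dB, consistent with the figure given in the comment.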
I agree with @bastibe that both approaches are wrong. I think there is no correct solution in the general case, but of course situations can be conceived where either of the three is "correct" or "incorrect". Those are the problems as I see them:
Is it really? If I'm not mistaken, Matlab's What about other software?
Is it really? What happens if you write -1.0-2**-15 (or any even smaller number for that matter)?
True. But nobody does that anyway.
I see that it is annoying in @Whistler7's case, but it doesn't matter in many other cases. Often signals are generated somehow using I think PySoundFile should by default do what libsndfile does. After all, that's what people expect from a libsndfile wrapper. @Whistler7: what API would you suggest for your approach? Regarding #23, I still think it's useful for you to disable normalization in libsndfile because if you do |
Here is documentation on MATLAB's wavwrite() function: Notice the non-symmetrical data range when writing to (2's complement) PCM. I just confirmed, for floating-point input and writing to 16-bit, that it provides a warning for clipping +1, and no warning/clipping for -1 input. I have been using MATLAB for many years, and I am in the process of transitioning to Python for scientific work. I suspect that there are many others like me who would at least like the option for PySoundFile to behave like MATLAB's wavwrite() concerning conversion of floating-point to PCM. Regarding the API, I suggest an optional boolean argument for SoundFile() when creating the wave object instance. Maybe call the argument 'pow2_scaling' or 'pow2_write_scaling', which indicates that the scaling from floating-point to integer/PCM is a power of 2. This would be 32768 ( By the way, does PySoundFile clip and raise a warning for out-of-range write input data (like MATLAB does)? |
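As a rough illustration of the behavior described above (a warning when +1 is clipped, none for -1), here is a sketch of a conversion helper that warns on clipping. The name `to_pcm16_with_warning` is hypothetical; it is neither MATLAB's nor PySoundFile's implementation.

```python
import warnings
import numpy as np

def to_pcm16_with_warning(x):
    """Hypothetical helper: scale by 32768, saturate, and warn if anything was clipped."""
    scaled = np.rint(np.asarray(x, dtype=np.float64) * 32768.0)
    clipped = np.clip(scaled, -32768, 32767)
    if np.any(clipped != scaled):
        warnings.warn("data clipped during conversion to 16-bit PCM")
    return clipped.astype(np.int16)

print(to_pcm16_with_warning(np.array([-1.0, 0.25])))   # in range: no warning
```

An optional flag such as the suggested 'pow2_scaling' could select this kind of conversion before the data reaches the integer write path; that wiring is not shown here.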
At the moment, PySoundFile leaves this solely to libsndfile. libsndfile has a flag for enabling/disabling normalization, but the documentation is somewhat unclear on how this behaves. I don't think any warnings are raised either. If anything, we could add this to the
The topic was just brought up in the There are some interesting arguments plus a link to a blog post: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html |
There is a new thread on the PortAudio mailing list: http://music.columbia.edu/pipermail/portaudio/2015-January/016501.html. Those links are mentioned in the responses: |
I'm moving @Whistler7's comment from #109 here:
I closed #20 because as far as I can tell, #104 changed PySoundFile's behavior (incidentally) exactly the way you asked for. Is this not what you wanted? As you can see from the tests (and from my comments in #104), we're now using the scaling factor 32768 instead of 32767.
If your code leads to different results, can you please share it? If you think this issue (#20) isn't solved yet, feel free to re-open it. |
I would like to interface Numpy arrays to 16, 24, 32 and 64-bit WAV formats, and PySoundFile looks like a good match for my needs. I saw the discussion in issue #17, and I plan to wait until the next version that avoids truncating 64-bit data to 32-bit before I install PySoundFile.
I plan to do all my signal processing with dtype=float64 and a range of -1 to +1. For 32-bit and 64-bit floating-point WAV files, this is the natural range. For the PCM WAV formats, scaling will be necessary to achieve this in the interface to Numpy arrays. I found another Python WAV interface package that supports this style:
https://github.com/gesellkammer/sndfileio
However, it does not support the 64-bit WAV format or Python 3.
My suggestion for PySoundFile is to support the [-1,1) range for audio data. I am not clear if PySoundFile already does this when using float64 Numpy arrays with PCM WAV files. If not, my suggestion is to add this as an option. Maybe using an optional 'normalized' argument. This would avoid the need for me to provide wrapper functions around PySoundFile's read and write functions in order to do the scaling that depends on the number of bits in the PCM format.
The scaling could be implemented in NumPy with the ldexp() function. I have not programmed in C++, but I think I saw that the ldexp() function exists in C++ also, which would be ideal for speed.
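A short sketch of the ldexp() idea, assuming the bit depth of the PCM format is known. `np.ldexp(x, n)` computes `x * 2**n`, so the scaling is an exact power of two; the two function names are hypothetical.

```python
import numpy as np

def pcm_to_float(ints, bits):
    """Map signed PCM integers to the fractional [-1, 1) range."""
    return np.ldexp(np.asarray(ints, dtype=np.float64), -(bits - 1))

def float_to_scaled(x, bits):
    """Scale fractional values up to PCM units (rounding and clipping not included)."""
    return np.ldexp(np.asarray(x, dtype=np.float64), bits - 1)

print(pcm_to_float(np.array([-32768, 16384]), 16))
print(float_to_scaled(np.array([0.5]), 24))
```

Rounding and saturation would still be needed before casting the scaled values to an integer type.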