Support interface normalized to [-1,1) range? #20

Closed
Whistler7 opened this issue Mar 13, 2014 · 13 comments

@Whistler7

I would like to interface Numpy arrays to 16, 24, 32 and 64-bit WAV formats, and PySoundFile looks like a good match for my needs. I saw the discussion in issue #17, and I plan to wait until the next version that avoids truncating 64-bit data to 32-bit before I install PySoundFile.

I plan to do all my signal processing with dtype=float64 and a range of -1 to +1. For 32-bit and 64-bit floating-point WAV files, this is the natural range. For the PCM WAV formats, scaling will be necessary to achieve this in the interface to Numpy arrays. I found another Python WAV interface package that supports this style:
https://github.com/gesellkammer/sndfileio
However, it does not support the 64-bit WAV format or Python 3.

My suggestion for PySoundFile is to support the [-1,1) range for audio data. I am not clear if PySoundFile already does this when using float64 Numpy arrays with PCM WAV files. If not, my suggestion is to add this as an option. Maybe using an optional 'normalized' argument. This would avoid the need for me to provide wrapper functions around PySoundFile's read and write functions in order to do the scaling that depends on the number of bits in the PCM format.

The scaling could be implemented in NumPy with the ldexp() function. I have not programmed in C++, but I think I saw that the ldexp() function exists in C++ as well, which would be ideal for speed.
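As a minimal sketch of the ldexp() idea for the read direction: the helper name `pcm_to_float` and its `bits` parameter are illustrative assumptions, not part of PySoundFile or libsndfile.

```python
import numpy as np

def pcm_to_float(samples, bits):
    """Map signed PCM integers with `bits` bits to float64 in [-1, 1)."""
    # ldexp(x, -k) computes x * 2**-k; only the exponent changes,
    # so the scaling itself introduces no rounding error.
    return np.ldexp(np.asarray(samples, dtype=np.float64), -(bits - 1))

pcm16 = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
as_float = pcm_to_float(pcm16, 16)
print(as_float[0], as_float[-1])  # -1.0 0.999969482421875
```

The same helper covers 24-bit and 32-bit PCM by passing a different `bits` value.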

@mgeier
Contributor

mgeier commented Mar 14, 2014

PySoundFile uses libsndfile, which by default normalizes PCM data to the range [-1, 1) if floating point values are requested, just as you like it (see note 1).

The normalization is on by default and can be switched off, but this is not (yet) supported in PySoundFile.

You can already use the current release; if you request np.float64 in the read() method, you should get exactly what you want.
The result is only truncated if you use the default format np.float32 (which will probably be changed to np.float64 in a future release).

@Whistler7
Author

@mgeier: thanks for the quick and detailed reply.

When I looked more closely at libsndfile, I see that it does what I need for reads but not writes to PCM WAV files. This FAQ explains what libsndfile does:
http://www.mega-nerd.com/libsndfile/FAQ.html#Q010

I need to have my function that writes PCM WAV files bit-exact with the data in my float64 Numpy array, and I plan to handle in Python/Numpy the two's-complement saturation and quantization appropriate to the number of bits in the target PCM WAV file (before calling the WAV write function). For example, when writing a 16-bit PCM WAV file, I need my WAV write function to multiply my [-1, 1) range data by 32768, not 32767 as PySoundFile/libsndfile does today.

I see two choices to meet my needs:

  1. I could write a wrapper function that converts my [-1, 1) range floating-point data to integer (using the appropriate power-of-2 scaler) before calling the PySoundFile WAV write function.
  2. PySoundFile could offer an optional flag argument for bit-exact conversion from floating-point to integer.
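The first choice could be sketched roughly as follows. The function name `float_to_pcm` is hypothetical, and round-to-nearest is an assumption about the desired quantization. How the resulting integers must be packed for a particular writer (for example, whether 24-bit values need to be left-justified in a 32-bit word) is not addressed here.

```python
import numpy as np

def float_to_pcm(x, bits=16):
    """Convert float data in [-1, 1) to signed integers using a 2**(bits-1) scale."""
    full_scale = 2 ** (bits - 1)
    # Power-of-2 scaling followed by rounding to the nearest integer.
    scaled = np.rint(np.ldexp(np.asarray(x, dtype=np.float64), bits - 1))
    # Saturate to the two's-complement range before casting.
    clipped = np.clip(scaled, -full_scale, full_scale - 1)
    dtype = np.int16 if bits <= 16 else np.int32
    return clipped.astype(dtype)

print(float_to_pcm([-1.0, 0.0, 0.5, 1.0]).tolist())  # [-32768, 0, 16384, 32767]
```

The integer array returned here would then be handed to the WAV write function instead of the floating-point data.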

The libsndfile FAQ says "Some people would say that this is a severe short-coming of libsndfile." I would say that it is a shortcoming not to offer users the option of a bit-exact conversion, especially for users like myself who optimize dither for sound quality in compact discs. However, PySoundFile can offer more flexibility than libsndfile. In fact, PySoundFile could also provide options for two's-complement saturation and quantization, or for raising an exception if these operations would otherwise change the data. This would allow PySoundFile to be used by both novices and experienced DSP engineers.
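The "raise an exception" option might look something like the sketch below. `exact_pcm` is a hypothetical name; nothing like it exists in PySoundFile, and the choice of ValueError is an assumption.

```python
import numpy as np

def exact_pcm(x, bits=16):
    """Return integer PCM values, raising if the conversion would alter the data."""
    full_scale = 2 ** (bits - 1)
    scaled = np.ldexp(np.asarray(x, dtype=np.float64), bits - 1)
    # Out-of-range values would need saturation.
    if np.any(scaled < -full_scale) or np.any(scaled > full_scale - 1):
        raise ValueError("saturation would be required")
    # Non-integer values would need quantization.
    if np.any(scaled != np.rint(scaled)):
        raise ValueError("quantization would be required")
    return scaled.astype(np.int64)

print(exact_pcm([-1.0, 0.5]).tolist())  # [-32768, 16384]
```

Input such as `[1.0]` or `[0.1]` would raise here, because the first needs saturation and the second needs quantization at 16 bits.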

As a DSP engineer, these new features would be simple for me to add in Python (but not in C++). The reason I bring these suggestions up is because I think they would make PySoundFile and Python more useful to the community.

@mgeier
Contributor

mgeier commented Mar 18, 2014

@Whistler7: OK, I understand your motivation now ...

I don't think, however, that PySoundFile should do anything to the actual values of the audio data. It should support (nearly) all parameters to control libsndfile and it should provide the data via NumPy arrays in a meaningful and flexible way.

If you want to change how normalization is done (or add an additional flag), you should ask Erik de Castro Lopo, the original author of libsndfile.

Or, which will probably be more successful, you should disable the normalization and do it properly in your own code.

Disabling the normalization is not yet supported in PySoundFile, but at least I created an issue for it: #23

@Whistler7
Author

I see two conventional number representations for PCM WAV files in science and engineering: integer and fractional. In other words, these number representations are two different mental concepts on how to interpret the same binary fixed-point data in the file. The mental concepts determine the mapping of file values into Python (floating-point) values. Since PCM WAV files use two's complement binary representation, the positive and negative full-scale ranges are non-symmetrical. For 16-bit WAV files, the range is [-32768, 32767] for integer, and [-1.0, 1.0 - (2**-15)] for fractional. In detail, the weights of the integer bits are (-2**15, 2**14, 2**13, ..., 2**0), and the weights of the fractional bits are (-2**0, 2**-1, 2**-2, ..., 2**-15).
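To make the two interpretations concrete, here is a small sketch that evaluates a 16-bit pattern with each set of weights. The helper names `weights` and `value` are illustrative only.

```python
def weights(bits=16, fractional=False):
    """Bit weights, MSB first, for a two's-complement word."""
    w = [-(2 ** (bits - 1))] + [2 ** k for k in range(bits - 2, -1, -1)]
    if fractional:
        # The fractional weights are the integer weights divided by 2**(bits-1).
        w = [v / 2 ** (bits - 1) for v in w]
    return w

def value(word, bits=16, fractional=False):
    """Interpret the unsigned bit pattern `word` using the weights above."""
    bit_list = [(word >> k) & 1 for k in range(bits - 1, -1, -1)]
    return sum(b * w for b, w in zip(bit_list, weights(bits, fractional)))

print(value(0x8000))                   # -32768
print(value(0x8000, fractional=True))  # -1.0
print(value(0x7FFF))                   # 32767
print(value(0x7FFF, fractional=True))  # 0.999969482421875
```

The same bits give both results; only the weights assigned to them differ.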

For reading from PCM WAV files, libsndfile follows the above conventions when returning a floating-point array (fractional when normalization is enabled, integer when normalization is disabled). PySoundFile doesn't need to support disabled normalization because the integer number representation is accessible by returning an integer array. Everybody is happy with reads.

For writing to a PCM WAV file from a floating-point array, libsndfile has created a unique and unconventional number representation when normalization is enabled. With 16-bit WAV files for example, libsndfile writes values that are scaled by 32767/32768 compared to the conventional fractional number representation. This creates a number of problems. Writing a -1 value does not produce the expected negative full-scale value in the PCM WAV file. When a PCM WAV file is read and then written, unchanged, to another PCM WAV file of the same bit depth, the two files have different values. There is effectively a hidden quantization distortion inside of libsndfile. This behavior is unexpected by users and troubling to some (especially me for high-fidelity audio). I believe it is fair to say that libsndfile is making a conceptual abstraction mistake in this case.
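The round-trip effect can be simulated without any file I/O. This sketch only models the two scale factors named in the FAQ (divide by 32768 on read, multiply by 32767 on write); round-to-nearest on write is an assumption, and libsndfile itself is not called.

```python
import numpy as np

# Every possible 16-bit sample value.
original = np.arange(-32768, 32768, dtype=np.int64)

# Modeled read path: divide by 32768.
as_float = original / 32768.0

# Modeled write path: multiply by 32767, then round to nearest.
rewritten = np.rint(as_float * 32767.0).astype(np.int64)

changed = np.count_nonzero(rewritten != original)
print(rewritten[0], rewritten[-1])      # -32767 32766
print(changed, "of", original.size, "values differ after the round trip")
```

Under these assumptions, small sample values survive unchanged, while values in the upper half of the magnitude range come back one step closer to zero.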

Here is where I believe that Erik de Castro Lopo makes the mistake. In his above FAQ entry, he says "Converting the other way, the only way to ensure that floating point values in the range [-1.0, 1.0] are within the valid range allowed by a 16 bit short is to multiply by 0x7FFF." As a professional DSP engineer, I suggest that the more conventional approach is to saturate/limit/clip the floating-point array to the valid range of [-1.0, 1.0 - (2**-15)] for two's complement and then multiply by 0x8000. In Python/NumPy, this can be implemented as np.clip(in_array, -1.0, 1.0 - (2**-15)). Then the number representation for PCM WAV writes would match that of reads, and everyone would be happy.

Since Erik de Castro Lopo hasn't released an update since 2011, and since his above FAQ says that he knows about this issue and doesn't consider it large enough to do anything about, I don't have much hope that he would be willing to offer an option of "normalizing" using the conventional fractional number representation. Also, I want to start doing WAV I/O in about a month, so I don't have time to wait for Erik de Castro Lopo to do an update and propagate this up through PySoundFile.

Instead, I propose that a goal for PySoundFile is to provide a clean and conventional abstraction (see my response in issue #24). Specifically, I propose that when writing a PCM WAV file from a floating-point array, PySoundFile clips and scales the array as I have shown above, and then uses libsndfile's integer interface (which behaves conventionally).

If the PySoundFile team decides to continue to propagate this conceptual abstraction mistake (for backwards-compatibility reasons), then my secondary proposal is for PySoundFile to offer an optional flag argument for support of the conventional fractional number representation as I have shown above. I would also propose, then, that this is the default since libsndfile is unconventional and surprising.

If the PySoundFile team decides to reject all of my proposals, then I see no benefit to disabling the normalization for floating-point writes to PCM files (issue #23). I would need to create a Python wrapper function that provides the conventional behavior for 16, 24 and 32-bit PCM WAV writes, and it would call PySoundFile with an integer array. Disabling the normalization would degrade PySoundFile's clean abstraction and add unnecessary complexity.

@bastibe
Owner

bastibe commented Mar 20, 2014

You raise a very interesting issue.

I don't see much conceptual difference between multiplication with 0x7FFF and multiplication with 0x8000 as long as it is done consistently for reading and writing. Can you point to some resource that makes one approach "better" than the other?

After all, both approaches are wrong in some way: One approach maps -1.0 perfectly to negative full scale, but not +1.0. The other maps +1.0 perfectly to positive full scale, but not -1.0 (or rather: -1.0+eps).

@Whistler7
Author

Let's look again at the weights of a 16-bit number in the two formats (as we were taught in Computer Engineering courses at the university):
Integer: (-2**15, 2**14, 2**13, ..., 2**0)
Fractional: (-2**0, 2**-1, 2**-2, ..., 2**-15)

They are related by a factor of 2**15 = 32768. Thus, 32768 is the correct conversion factor. Note in libsndfile's FAQ that it uses 32768 for PCM reads (correct) and 32767 for PCM writes (incorrect). Thus, libsndfile is not even self-consistent!

libsndfile using the wrong conversion factor for PCM writes is like legislating a new value for pi because some people find it inconvenient:
https://en.wikipedia.org/wiki/Indiana_Pi_Bill

Restating from my above post, some of the problems caused by libsndfile's mistake:

  • It is unconventional/unexpected/surprising
  • Not possible to write the negative full-scale value in PCM WAV files (-32768 for 16 bit)
  • A PCM WAV file read into Python and written back with the same bit depth (via libsndfile) does not produce an identical file
  • When data in the range of [-1.0, 1.0) has been quantized to 16 bits, libsndfile's 32767 conversion factor causes additional quantization distortion to occur.

PySoundFile has the opportunity to correct the abstraction or at least provide the option for the correct abstraction to Python users.

@bastibe
Owner

bastibe commented Mar 20, 2014

Frankly, I don't see how this would affect many people. If you do actually need bit-correctness, I think you should read the integer values instead of the floating point values.

Don't get me wrong, I do see your point. This behavior changes the amplitude by 0.00025 dB. But this is so far beyond the hearing threshold that I don't think it matters. Again, can you point to a resource that clearly defines one approach as "better" than the other? Or, maybe, can you point me to some problem for which either of these behaviors would be clearly wrong?
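For reference, the level change implied by a 32767/32768 scale factor can be computed directly. This is plain arithmetic and does not depend on any audio library.

```python
import math

# Gain in decibels of scaling a signal by 32767/32768.
gain_db = 20 * math.log10(32767 / 32768)
print(f"{gain_db:.6f} dB")
```

The result is a few ten-thousandths of a decibel, on the order of the figure quoted above.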

@mgeier
Contributor

mgeier commented Mar 21, 2014

I agree with @bastibe that both approaches are wrong.
There is even a third approach which is also wrong: using 32767 for both reading and writing.

I think there is no correct solution in the general case, but of course situations can be conceived where any of the three is "correct" or "incorrect".

Those are the problems as I see them:
@Whistler7's approach clips +1.0 on write, libsndfile's approach scales between read and write and the "third approach" creates a floating point value less than -1.0 on read.
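The three approaches can be compared side by side with a small model. The helpers `to_int16` and `to_float` are hypothetical and only mimic the scale factors under discussion, with round-to-nearest and saturation assumed on write.

```python
def to_int16(x, scale):
    """Scale, round and saturate one float sample to the int16 range."""
    return max(-32768, min(32767, round(x * scale)))

def to_float(n, scale):
    """Map one int16 sample to float using the given read scale."""
    return n / scale

approaches = {
    "32768 both ways": (32768, 32768),
    "32768 read, 32767 write": (32768, 32767),
    "32767 both ways": (32767, 32767),
}
for name, (read_scale, write_scale) in approaches.items():
    print(name,
          "| write +1.0 ->", to_int16(1.0, write_scale),
          "| read -32768 ->", to_float(-32768, read_scale),
          "| round trip of 20000 ->",
          to_int16(to_float(20000, read_scale), write_scale))
```

In this model, only the first approach has to saturate +1.0, only the second changes a value on a round trip, and only the third reads a value below -1.0.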

  • It is unconventional/unexpected/surprising

Is it really?

If I'm not mistaken, Matlab's wavwrite() uses @Whistler7's approach but issues a warning if +1.0 is clipped. I think Octave doesn't print the warning (but I'm not sure if it even uses the same approach as Matlab).

What about other software?

  • Not possible to write the negative full-scale value in PCM WAV files (-32768 for 16 bit)

Is it really?

What happens if you write -1.0-2**-15 (or any even smaller number for that matter)?
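A quick model of that question, assuming the write path multiplies by 32767, rounds to nearest and saturates at the int16 limits. Whether libsndfile actually saturates or wraps for values further out of range depends on its clipping setting, which is not modeled here; `write_sample` is a hypothetical helper.

```python
def write_sample(x, scale=32767):
    """Model of a float -> int16 write: scale, round, saturate."""
    return max(-32768, min(32767, round(x * scale)))

print(write_sample(-1.0))              # -32767
print(write_sample(-1.0 - 2 ** -15))   # -32768
```

Under these assumptions the negative full-scale code is reachable, but only from an input slightly below -1.0.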

  • A PCM WAV file read into Python and written back with the same bit depth (via libsndfile) does not produce an identical file

True. But nobody does that anyway.

  • When data in the range of [-1.0, 1.0) has been quantized to 16 bits, libsndfile's 32767 conversion factor causes additional quantization distortion to occur.

I see that it is annoying in @Whistler7's case, but it doesn't matter in many other cases.

Often signals are generated somehow using float32/float64 and then normalized by dividing by the absolute maximum before writing them to PCM files. In this case libsndfile does exactly the right thing by not clipping +1.0.
The worst case would probably be if someone creates a full-scale rectangular signal. Clipping +1.0 would introduce a DC offset and scaling wouldn't.
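The full-scale rectangular case can be checked numerically. This sketch models both write conventions with NumPy only; the signal and variable names are made up for illustration.

```python
import numpy as np

# A full-scale square wave alternating between +1.0 and -1.0.
square = np.tile([1.0, -1.0], 4)

# Power-of-2 scaling with saturation: +1.0 is clipped to 32767.
clipped = np.clip(np.rint(square * 32768), -32768, 32767)

# Scaling by 32767: both extremes fit without clipping.
scaled = np.rint(square * 32767)

print(clipped.mean())  # -0.5
print(scaled.mean())   # 0.0
```

The clipped version ends up with a mean of half an LSB below zero, which is the DC offset mentioned above.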

I think PySoundFile should by default do what libsndfile does. After all, that's what people expect from a libsndfile wrapper.
However, if there is a meaningful way to do it, we can try to offer the other option(s) as well.

@Whistler7: what API would you suggest for your approach?

Regarding #23, I still think it's useful for you to disable normalization in libsndfile because if you do in_array.astype(np.int16), as you suggested, NumPy allocates additional memory and makes a copy of the whole signal. If I'm not mistaken, libsndfile would do the conversion on-the-fly, thereby saving memory and CPU cycles.

@Whistler7
Author

Here is documentation on MATLAB's wavwrite() function:
http://www.mathworks.com/help/matlab/ref/wavwrite.html

Notice the non-symmetrical data range when writing to (2's complement) PCM. I just confirmed, for floating-point input and writing to 16-bit, that it provides a warning for clipping +1, and no warning/clipping for -1 input.

I have been using MATLAB for many years, and I am in the process of transitioning to Python for scientific work. I suspect that there are many others like me who would at least like the option for PySoundFile to behave like MATLAB's wavwrite() concerning conversion of floating-point to PCM.

Regarding the API, I suggest an optional boolean argument for SoundFile() when creating the wave object instance. Maybe call the argument 'pow2_scaling' or 'pow2_write_scaling', which indicates that the scaling from floating-point to integer/PCM is a power of 2. This would be 32768 (2**15) for 16-bit and 2**23 for 24-bit. If backwards compatibility is desired, the default for the argument would be False, indicating to use a scale factor of 32767 for 16-bit (and so forth).
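How such a flag would select the scale factor can be sketched as below. `write_scale` is a hypothetical standalone helper; `pow2_scaling` is only the argument name proposed in this comment, not an existing SoundFile() parameter.

```python
def write_scale(bits, pow2_scaling=False):
    """Scale factor for float -> PCM conversion at the given bit depth."""
    full_scale = 2 ** (bits - 1)
    # Power-of-2 scaling uses full_scale; the backwards-compatible
    # default uses full_scale - 1 (32767 for 16-bit).
    return full_scale if pow2_scaling else full_scale - 1

print(write_scale(16))                     # 32767
print(write_scale(16, pow2_scaling=True))  # 32768
print(write_scale(24, pow2_scaling=True))  # 8388608
```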

By the way, does PySoundFile clip and raise a warning for out-of-range write input data (like MATLAB does)?

@bastibe
Owner

bastibe commented Mar 27, 2014

At the moment, PySoundFile leaves this solely to libsndfile. libsndfile has a flag for enabling/disabling normalization, but the documentation is somewhat unclear on how this behaves. I don't think any warnings are raised either.

If anything, we could add this to the write/read methods.

@mgeier
Contributor

mgeier commented May 12, 2014

The topic was just brought up in the PortAudio mailing list: http://music.columbia.edu/pipermail/portaudio/2014-May/016071.html

There are some interesting arguments plus a link to a blog post: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html

@mgeier mgeier modified the milestone: 0.7 Oct 26, 2014
@bastibe bastibe removed this from the 0.6.x milestone Nov 24, 2014
@mgeier
Contributor

mgeier commented Jan 7, 2015

@mgeier
Contributor

mgeier commented Mar 2, 2015

I'm moving @Whistler7's comment from #109 here:

I saw that issue #20 was closed yesterday without an option being added for users who need to interpret two's complement fixed-point WAV files in the same way that hardware manufacturers do. Erik de Castro Lopo's 32767 scale factor for 16-bit WAV files produces audible distortion, so it is usable [note from @mgeier: I guess you mean "unusable"] for me.

I closed #20 because as far as I can tell, #104 changed PySoundFile's behavior (incidentally) exactly the way you asked for.
You should have a look at 7724a09 (which is now in the master branch), where I added a test that shows the exact numbers that are involved (you can also find them in the file tests/test_pysoundfile.py in the function test_clipping_float_to_int()).

Is this not what you wanted?

As you can see from the tests (and from my comments in #104), we're now using the scaling factor 32768 instead of 32767.

Therefore, I wrote my own code to translate between 64-bit float and fixed-point WAV files.

If your code leads to different results, can you please share it?
This way we might be able to see what you actually want.

If you think this issue (#20) isn't solved yet, feel free to re-open it.
