Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing integer values through the non-numpy interface results in a wav of all 0s #405

Open
mcclure opened this issue Sep 24, 2023 · 13 comments

Comments

@mcclure
Copy link
Contributor

mcclure commented Sep 24, 2023

Consider this simple use of soundfile:

import soundfile as sf
SAMPLE_RATE = 48000
SHRT_MAX = 32767
with sf.SoundFile("testfile2.wav", mode = 'w', samplerate=SAMPLE_RATE, channels=2, subtype='PCM_16') as outfile:
    frames = []
    for i in range(SAMPLE_RATE):
        frames.append([i%SHRT_MAX, SHRT_MAX-(i%SHRT_MAX)])
    outfile.write(frames)

I expect this to create a wav containing a very slow sawtooth wave (it will not be audible but the waveform will be visible in an editor such as audacity) with a DC offset. Instead, it creates a wav in which all values are zero.

If I pass the values in as numpy rather than as python lists, it works and creates a usable, correct wav file:

import numpy as np
import soundfile as sf
SAMPLE_RATE = 48000
SHRT_MAX = 32767
with sf.SoundFile("testfile2.wav", mode = 'w', samplerate=SAMPLE_RATE, channels=2, subtype='PCM_16') as outfile:
    frames = []
    for i in range(SAMPLE_RATE):
        frames.append([i%SHRT_MAX, SHRT_MAX-(i%SHRT_MAX)])
    array = np.array(frames, dtype=np.int16)
    outfile.write(array)
    print(array)

It is possible that when using non-numpy input, Soundfile is expecting the values in some format (floats?) other than ints. If so, this is not documented. I assume PCM_16 values because I have specified PCM_16 subtype.

I have a non-numpy app in which I adopted python-soundfile because it had the ability to append to wavs. I understand python-soundfile is mostly intended for use with numpy, but it does advertise a non-numpy interface. Using without numpy, I encountered several inconvences, but this was the biggest problem because I could not resolve it by consulting the documentation. I ultimately had to just add numpy to my program.

@bastibe
Copy link
Owner

bastibe commented Sep 30, 2023

In general, soundfile expects normalized floating point data between -1 and 1. In integer formats this is clearly not possible, so it scales between the appropriate minimum and maximum.

This is documented in soundfile.read, but not mentioned as clearly in soundfile.write. Would you like to contribute an improved documentation as a pull request?

@mcclure
Copy link
Contributor Author

mcclure commented Sep 30, 2023

I am open to contributing a PR, but if I am going to write docs I need to be sure I understand the behavior.

When you say "scales between the appropriate minimum and maximum", you mean it scales from the range -1…1 to the minimum and maximum values of the output sound file's type?

This description surprises me because in my failed test above I was giving values outside the range -1…1 and they were being snapped to 0 rather than scaled or clipped to the -1…1 range. Is there special case handling for values outside -1…1?

@bastibe
Copy link
Owner

bastibe commented Sep 30, 2023

Integer values scale between the minimum and maximum allowable integer value. That's -2^15 ... (2^15 - 1) for int16, and -2^31 ... (2^31 - 1) for int32. Ignore the asymmetry between positive and negative values; IIRC, the scaling factor is actually always symmetrically 2^15 or 2^31, with the odd 2^15 clipped to 2^15-1.

Thus integer-1 scales 1/2^15, which is very close to zero.

But this only applies to numpy arrays with dtype int16 or int32. Python lists are converted to a float numpy array, which must be between -1 and 1.

@mgeier
Copy link
Contributor

mgeier commented Sep 30, 2023

What @bastibe mentioned above is mostly correct, but Python lists are not necessarily converted to float arrays.

The code uses np.asarray(), which converts (nested) Python lists to NumPy arrays. If only integers are used, the dtype is chosen to be int64 by default (and we don't specify a dtype explicitly):

>>> import numpy as np
>>> np.asarray([32767]).dtype
dtype('int64')

This means that in your example, you are essentially writing 16-bit data to the least significant bytes of a 64-bit array.
Later, this 64-bit array is scaled down to PCM_16, losing all those insignificant bytes and leaving you with only zeros.

For comparison, in your NumPy example, you are writing your data to a 16-bit array, which is then not scaled down, leading to your expected result.

Using without numpy, I encountered several inconvences

I think the misunderstanding is that you think you are using it without NumPy, but as soon as you use the .write() method, you are indirectly using NumPy, even if you don't import NumPy in your own code.

I think this has been clarified already in #407: If you don't want to use NumPy, you should use the .buffer_*() methods.

@mgeier
Copy link
Contributor

mgeier commented Sep 30, 2023

When you say "scales between the appropriate minimum and maximum", you mean it scales from the range -1…1 to the minimum and maximum values of the output sound file's type?

To get a clearer idea about the scaling and clipping behavior, you can have a look at the tests:

def test_write_float_data_to_pcm_file(file_inmemory):
float_to_clipped_int16 = [
(-1.0 - 2**-15, -2**15 ),
(-1.0 , -2**15 ),
(-1.0 + 2**-15, -2**15 + 1),
( 0.0 , 0 ),
( 1.0 - 2**-14, 2**15 - 2),
( 1.0 - 2**-15, 2**15 - 1),
( 1.0 , 2**15 - 1),
]
written, expected = zip(*float_to_clipped_int16)
sf.write(file_inmemory, written, 44100, format='WAV', subtype='PCM_16')
file_inmemory.seek(0)
read, fs = sf.read(file_inmemory, dtype='int16')
assert np.all(read == expected)
assert fs == 44100

It also happens in the other direction, but that is a very exotic use case:

def test_read_int_data_from_float_file(file_inmemory):
"""This is a very uncommon use case."""
unnormalized_float_to_clipped_int16 = [
(-2.0**15 - 1 , -2**15),
(-2.0**15 , -2**15),
(-2.0**15 + 1 , -2**15 + 1),
(-1.0 , -1),
(-0.51 , -1),
(-0.5 , 0),
( 0.0 , 0),
( 0.5 , 0),
( 0.51 , 1),
( 1.0 , 1),
( 2.0**15 - 2 , 2**15 - 2),
( 2.0**15 - 1 , 2**15 - 1),
( 2.0**15 , 2**15 - 1),
]
file_data, expected = zip(*unnormalized_float_to_clipped_int16)
sf.write(file_inmemory, file_data, 44100, format='WAV', subtype='FLOAT')
file_inmemory.seek(0)
read, fs = sf.read(file_inmemory, always_2d=False, dtype='int16')
assert np.all(read == expected)
assert fs == 44100

@bastibe
Copy link
Owner

bastibe commented Sep 30, 2023

Thank you for the clarification, @mgeier. I wasn't aware that numpy.ndarray converts an all-int array to int64. That's a surprising gotcha for our use case, actually.

So integer lists must be treated as int64, and scale between -2^63 and 2^63. That's kind of annoying.

@mcclure
Copy link
Contributor Author

mcclure commented Sep 30, 2023

So integer lists must be treated as int64, and scale between -2^63 and 2^63. That's kind of annoying.

Hm, yeah, from a user perspective this is an unusual enough range that if it cannot be changed to something less surprising, it might actually be better for it to error on integer lists.

@bastibe
Copy link
Owner

bastibe commented Oct 1, 2023

I suppose we could work around this issue by forcing any non-numpy-array iterables to float64 in soundfile.py:1019. This would technically be a backwards-incompatible change, though, which I am somewhat weary to make.

Also, this is the first time I have heard of this issue, so I'd assume it is a rare occurrence. Nevertheless, it should be documented at the very least.

@mcclure
Copy link
Contributor Author

mcclure commented Oct 1, 2023

I am going to attempt to send you one or more docs PRs today.

mcclure added a commit to mcclure/python-soundfile that referenced this issue Oct 2, 2023
* Clarifies range of python integer input to write(), addressing issue bastibe#405.
* Clarifies, which came up in issue bastibe#407.
* Clarifies how to build the docs, which confused me while preparing this PR
  (Simply installing Sphinx is not enough)
mcclure added a commit to mcclure/python-soundfile that referenced this issue Oct 2, 2023
* Clarifies range of python integer input to write(), addressing issue bastibe#405.
* Clarifies endianness of buffer_write input, which came up in issue bastibe#407.
* Clarifies how to build the docs, which confused me while preparing this PR
  (Simply installing Sphinx is not enough)
@mcclure
Copy link
Contributor Author

mcclure commented Oct 4, 2023

I need clarification on something before I continue with #410.

The code uses np.asarray(), which converts (nested) Python lists to NumPy arrays. If only integers are used, the dtype is chosen to be int64 by default (and we don't specify a dtype explicitly)

This is not exactly what I see in testing.

Consider this sample program:

import soundfile
s = soundfile.SoundFile("outtest.wav", mode = 'w', samplerate=48000, channels=1, subtype='PCM_16')
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)-1])
print(s)

Result: Saves a wav file with 4 samples, first 2 are 0, third is minimum intensity, final is maximum intensity

import soundfile
s = soundfile.SoundFile("outtest.wav", mode = 'w', samplerate=48000, channels=1, subtype='PCM_16')
s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)])
print(s)

Result:

Traceback (most recent call last):
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1328, in _check_dtype
    return _ffi_types[dtype]
           ~~~~~~~~~~^^^^^^^
KeyError: 'int64'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Andi\work\g\other\python-soundfile\64TEST.py", line 3, in <module>
    s.write([0,1,(1<<16)-1,-(1<<31),(1<<31)])
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1020, in write
    written = self._array_io('write', data, len(data))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1342, in _array_io
    ctype = self._check_dtype(array.dtype.name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Andi\work\g\other\python-soundfile\soundfile.py", line 1330, in _check_dtype
    raise ValueError("dtype must be one of {0!r} and not {1!r}".format(
ValueError: dtype must be one of ['float32', 'float64', 'int16', 'int32'] and not 'int64'

In other words, from this and other tests I conclude the behavior for python int inputs is:

  • If all values are in the range of 32-bit INT_MIN to INT_MAX inclusive, the values are scaled FROM int32s TO the output format
  • If at least one value is outside the 32-bit signed integer range, there is an error

This behavior is non-ideal, but it seems to at least be consistent (I did not test on non-win10 systems).

I will change my documentation in #410 to match this behavior, but before I do, does this sound right? Am I missing anything?

@mgeier
Copy link
Contributor

mgeier commented Oct 5, 2023

What does this return?

>>> import numpy as np
>>> np.asarray([1]).dtype

@mcclure
Copy link
Contributor Author

mcclure commented Oct 6, 2023

Okay.
I have some very bad news.
In my Windows 10 venv:

(VENV) C:\Users\Andi\work\g\other\python-soundfile>python
Python 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.asarray([1]).dtype
dtype('int32')
>>> np.version.version
'1.26.0'

In my WSL (linux VM) venv on the same machine:

(WSLVENV) Andi@Alita:/mnt/c/Users/Andi/work/g/other/python-soundfile$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.asarray([1]).dtype
dtype('int64')
>>> np.version.version
'1.24.4'

I can't explain this. What does it do on your machine?

If the scale is potentially different (32 or 64 bit) depending on system/configuration, this suggests to me the int form simply should not be used and the documentation should say something like "if the input is a python int array the behavior is undefined".

Alternately, before @bastibe said he didn't want to change the behavior saying

This would technically be a backwards-incompatible change, though, which I am somewhat weary to make.

But, if it was actually broken before (per the tests I pasted above, even if np manages to assign the array an int64, python-soundfile rejects it) maybe you don't need to worry about backward compatibility. If it never worked it seems like logically there must not be existing code to break?

@bastibe
Copy link
Owner

bastibe commented Oct 6, 2023

Oh, this is indeed terrible news. Thank you very much for the analysis!

So essentially, there is no consistent option for handling lists of integers across platforms. We could force any non-numpy type to float, but this would break lists of integers entirely. I don't really see a good way forward beyond documenting that "any non-numpy data will be converted to numpy using np.asarray", and noting that using non-numpy integer data is a very bad idea indeed.

In fact, we should consider issuing a warning whenever np.asarray returns an integer type (and it wasn't already a numpy array beforehand). What are your thoughts on that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants