# Reading and Writing Audio Files with wave


The `wave` module is part of the Python standard library.

Documentation:

* http://docs.python.org/2/library/wave.html (Python 2.x)
* http://docs.python.org/3/library/wave.html (Python 3.x)

Audio data is handled with the Python type `str` (Python 2.x) or `bytes` (Python 3.x).

Advantages:

* part of the standard library, no further dependencies
* 24-bit files can be used (but manual conversion is necessary)
* partial reading is possible
* works with both Python 2 and 3

Disadvantages:

* 32-bit float not supported
* WAVEX doesn't work
* manual de-interleaving and conversion is necessary

## Reading

Reading a 16-bit WAV file into a NumPy array is not hard, but it requires a few lines of code:

In [None]:
import wave
import numpy as np
import utility

with wave.open('data/test_wav_pcm16.wav') as w:
    channels = w.getnchannels()
    assert w.getsampwidth() == 2
    data = w.readframes(w.getnframes())

sig = np.frombuffer(data, dtype='<i2').reshape(-1, channels)

normalized = utility.pcm2float(sig, np.float32)

But not so fast! Let's do that step-by-step and put a few explanations in between.

First, let's load matplotlib and NumPy:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

The most important module we need to load is the built-in `wave` module.

In [None]:
import wave

Now we open a 16-bit WAV file, show some information about it and read its contents.

Note: starting from Python 3.4, [wave.open()](https://docs.python.org/3/library/wave.html#wave.open)
returns a context manager which can be used in a ["with" statement](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement).
For Python < 3.4, you should wrap the `open()` call with [contextlib.closing()](https://docs.python.org/3/library/contextlib.html#contextlib.closing).
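
For older interpreters, the wrapping mentioned in the note might look like the following sketch. It uses an in-memory `io.BytesIO` buffer instead of the file in `data/` (an assumption made only to keep the example self-contained); the sampling rate and the ten silent frames are arbitrary.

```python
import contextlib
import io
import wave

# Build a tiny mono 16-bit WAV file in memory (illustrative data only).
buffer = io.BytesIO()
with contextlib.closing(wave.open(buffer, 'wb')) as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b'\x00\x00' * 10)  # ten silent frames

# Read it back; closing() calls w.close() when the block is left.
buffer.seek(0)
with contextlib.closing(wave.open(buffer, 'rb')) as w:
    frames = w.getnframes()
    data = w.readframes(frames)

print(frames, len(data))  # 10 20
```

The same pattern also works on newer Python versions, so it is a safe choice if the code has to run on both.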

In [None]:
with wave.open('data/test_wav_pcm16.wav') as w:
    framerate = w.getframerate()
    frames = w.getnframes()
    channels = w.getnchannels()
    width = w.getsampwidth()
    print('sampling rate:', framerate, 'Hz')
    print('length:', frames, 'samples')
    print('channels:', channels)
    print('sample width:', width, 'bytes')
    
    data = w.readframes(frames)

We see that the *sample width* is 2 bytes, which we expected for a 16-bit file.

If the audio file has 7 channels, is 15 samples long and uses 2 bytes (16 bit) per sample, the buffer has a size of $7 \times 15 \times 2 = 210$ bytes.

In [None]:
len(data)
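
The size calculation above can be checked without the test file. The following sketch writes a silent 7-channel, 15-frame, 16-bit file into an in-memory buffer (the buffer and the sampling rate are assumptions for illustration) and reads it back:

```python
import io
import wave

channels, frames, width = 7, 15, 2  # the numbers from the text above

buffer = io.BytesIO()
with wave.open(buffer, 'wb') as w:
    w.setnchannels(channels)
    w.setsampwidth(width)
    w.setframerate(44100)  # arbitrary sampling rate
    w.writeframes(bytes(channels * frames * width))  # all zeros = silence

buffer.seek(0)
with wave.open(buffer, 'rb') as w:
    nframes = w.getnframes()
    data = w.readframes(nframes)

print(nframes, len(data))  # 15 210
```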

Audio data is stored in "strings of bytes".
In Python 2 this type is called `str`, in Python 3 it's called `bytes` (which makes much more sense, doesn't it?).

In [None]:
type(data)

The buffer data looks like this (which isn't really helpful because it looks like garbage):

In [None]:
data

We could convert the bytes by means of the [`struct` module](https://docs.python.org/3/library/struct.html), but it's easier with [the `frombuffer()` function from NumPy](http://docs.scipy.org/doc/numpy/reference/generated/numpy.frombuffer.html).
We have to specify (similar to how it's done in the `struct` module) how the bytes should be interpreted to yield the desired numbers.
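
To compare the two approaches, here is a small sketch that decodes the same made-up bytes once with `struct.unpack()` and once with `np.frombuffer()`. The byte values are invented for illustration and don't come from the test file.

```python
import struct
import numpy as np

# Three made-up 16-bit samples in little-endian byte order
data = bytes([0x01, 0x00, 0xff, 0xff, 0x00, 0x80])

count = len(data) // 2
# '<' = little-endian, 'h' = signed 16-bit integer
with_struct = struct.unpack('<{}h'.format(count), data)
# '<i2' = little-endian signed integer with 2 bytes
with_numpy = np.frombuffer(data, dtype='<i2')

print(with_struct)          # (1, -1, -32768)
print(with_numpy.tolist())  # [1, -1, -32768]
```

With `struct` the number of values has to be part of the format string, whereas NumPy derives it from the buffer size.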

If we interpreted them as single (unsigned) bytes, it would look like this (still not very useful):

In [None]:
np.frombuffer(data, dtype='B')

To be able to do something useful with the data, we have to take pairs of bytes and convert them to 16-bit integers. We have to take into account that data in WAV files is stored in *little-endian* format (the least significant byte comes first, followed by the most significant byte), which is specified with `'<'` in the format string.
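
The effect of the byte order can be seen on a few made-up bytes (not taken from the test file). This sketch interprets the same four bytes once as little-endian and once as big-endian 16-bit integers:

```python
import numpy as np

raw = b'\x01\x00\x00\x01'  # two made-up 16-bit values

little = np.frombuffer(raw, dtype='<i2')  # least significant byte first
big = np.frombuffer(raw, dtype='>i2')     # most significant byte first

print(little.tolist())  # [1, 256]
print(big.tolist())     # [256, 1]
```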

Let's also reshape the array to get a column for each channel:

In [None]:
sig = np.frombuffer(data, dtype='<i2').reshape(-1, channels)
sig

Let's see what this looks like:

In [None]:
plt.plot(sig);
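
To make the effect of `reshape(-1, channels)` more concrete, here is a small sketch with made-up stereo samples. In the byte buffer the channels are interleaved frame by frame; after reshaping, each column holds one channel.

```python
import numpy as np

channels = 2
# Interleaved stereo frames: L0, R0, L1, R1, L2, R2 (made-up values)
interleaved = np.array([10, -10, 20, -20, 30, -30], dtype='<i2')
data = interleaved.tobytes()  # stand-in for the result of readframes()

sig = np.frombuffer(data, dtype='<i2').reshape(-1, channels)
left, right = sig[:, 0], sig[:, 1]

print(sig.shape)       # (3, 2)
print(left.tolist())   # [10, 20, 30]
print(right.tolist())  # [-10, -20, -30]
```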

Note that neither `frombuffer()` nor `reshape()` made a copy of the data. We are still using the buffer of bytes we got from `readframes()`. By using the `.base` attribute of the array we get the result of `frombuffer()` (before `reshape()`) and by using `.base` a second time, we get a reference to the original buffer object:

In [None]:
sig.base.base is data

With the `flags` attribute we get a few details about the buffer. `C_CONTIGUOUS` means that the rows are contiguous (in row-major format, like it's used in C). We also see that `sig` doesn't "own" the data (it's rather "borrowed" from the `data` object) and that it's not writable (because the `bytes` object returned by `readframes()` above happens to be read-only):

In [None]:
sig.flags
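
If a writable array is needed, making a copy is one option. The following sketch uses a few zero bytes as a stand-in for the result of `readframes()`; it shows that writing to the borrowed buffer fails while the copy can be changed freely.

```python
import numpy as np

data = bytes(8)  # stand-in for the bytes returned by readframes()
sig = np.frombuffer(data, dtype='<i2')

try:
    sig[0] = 1
except ValueError as e:
    print('cannot write:', e)

writable = sig.copy()  # the copy has its own, writable memory
writable[0] = 1

print(writable.flags.owndata, writable.flags.writeable)  # True True
print(data == bytes(8))  # True (the original buffer is untouched)
```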

We've already got the correct values, but if we want to do some signal processing with the data, it's normally easier to convert the numbers to floating point format and to normalize them to a range from -1 to 1.

To do that, I wrote a little helper function called `utility.pcm2float()`, located in the file [utility.py](https://github.com/mgeier/python-audio/blob/master/audio-files/utility.py). Let's load it and apply it to our signal:

In [None]:
import utility

normalized = utility.pcm2float(sig, 'float32')

plt.plot(normalized);
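
For readers who don't have `utility.py` at hand, the basic idea of such a conversion can be sketched as follows. The function `int_to_float()` is a hypothetical stand-in written for this example; it is not the actual `utility.pcm2float()` and may differ from it in details.

```python
import numpy as np


def int_to_float(sig, dtype='float32'):
    """Scale signed integer PCM samples to the range [-1, 1).

    Hypothetical stand-in, not the actual utility.pcm2float().
    """
    sig = np.asarray(sig)
    if sig.dtype.kind != 'i':
        raise TypeError('sig must be an array of signed integers')
    # Largest magnitude of the integer type, e.g. 32768 for int16
    abs_max = 2 ** (np.iinfo(sig.dtype).bits - 1)
    return sig.astype(dtype) / abs_max


pcm = np.array([-32768, 0, 16384, 32767], dtype='int16')
result = int_to_float(pcm)
print(result.dtype)     # float32
print(result.tolist())  # [-1.0, 0.0, 0.5, 0.999969482421875]
```

Dividing by the magnitude of the most negative value means that -1 is reached exactly, while the largest positive sample ends up slightly below 1.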

Because we change the data type from int16 to float32, a copy of the array is created, which now is writable and "owns" its data:

In [None]:
normalized.flags

24-bit files can be opened with `wave.open()` as well, but the conversion is a little more complicated.

In [None]:
with wave.open('data/test_wav_pcm24.wav') as w:
    framerate = w.getframerate()
    frames = w.getnframes()
    channels = w.getnchannels()
    width = w.getsampwidth()
    print('sampling rate:', framerate, 'Hz')
    print('length:', frames, 'samples')
    print('channels:', channels)
    print('sample width:', width, 'bytes')
    
    data = w.readframes(frames)

Note that the sample width is 3 bytes (because we loaded a 24-bit file). Sadly, there is no 3-byte integer type in NumPy. Therefore, we have to pad each 3-byte number with an additional byte and then convert it to a 4-byte integer.
We can add this byte (filled with zero-bits) either as the most significant or as the least significant byte. This doesn't change the precision of the data, we just have to remember which one it was when we do calculations with the stored values.
If we add the zero-byte as LSB, the resulting values span the full range of a 4-byte integer, therefore we can use the `utility.pcm2float()` function from above.
If we added the zero-byte as MSB, the range would be limited to that of a 3-byte integer, negative values would need extra care because their sign bit would no longer be in the top position, and we would have to write a new function for normalization.

In [None]:
assert width == 3

temp = bytearray()

for i in range(0, len(data), 3):
    temp.append(0)
    temp.extend(data[i:i+3])

# Using += instead of .extend() may be faster
# (see https://youtu.be/z9Hmys8ojno?t=35m50s).
# But starting with an empty bytearray and
# extending it on each iteration might be slow, anyway.
# See further below for how to reserve all necessary memory in the beginning.

four_bytes = np.frombuffer(temp, dtype='B').reshape(-1, 4)
four_bytes

Now we have a bytearray where each group of 4 bytes represents an integer (in little-endian order).

Next, let's convert it to actual integers and reshape the channels into columns.

In [None]:
sig = np.frombuffer(temp, dtype='<i4').reshape(-1, channels)
sig
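
To see what the position of the padding byte does to a single value, here is a small sketch with one made-up 24-bit sample (the value -2, chosen only for illustration):

```python
import numpy as np

# One made-up 24-bit sample with the value -2, little-endian byte order
sample = (-2).to_bytes(3, byteorder='little', signed=True)  # b'\xfe\xff\xff'

# Zero-byte as LSB: the value is scaled by 256, the sign is preserved
as_lsb = int(np.frombuffer(b'\x00' + sample, dtype='<i4')[0])
# Zero-byte as MSB: the sign bit is no longer on top, the value is misread
as_msb = int(np.frombuffer(sample + b'\x00', dtype='<i4')[0])

print(as_lsb)       # -512
print(as_lsb >> 8)  # -2
print(as_msb)       # 16777214
```

Shifting the LSB-padded value right by 8 bits recovers the original 24-bit number including its sign, which is one reason why padding at the LSB is the more convenient choice here.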

Let's put this into a function, just in case we find ourselves needing to load 24-bit WAV files some time in the future.

In [None]:
def pcm24to32_bytearray(data, channels=1):
    if len(data) % 3 != 0:
        raise ValueError('Size of data must be a multiple of 3 bytes')

    size = len(data) // 3
    
    # reserve memory (initialized with null bytes):
    temp = bytearray(size * 4)

    for i in range(size):
        newidx = i * 4 + 1
        oldidx = i * 3
        temp[newidx:newidx+3] = data[oldidx:oldidx+3]

    return np.frombuffer(temp, dtype='<i4').reshape(-1, channels)

This looks a bit clumsy (*TODO:* timing measurements!). Let's try it another way:

In [None]:
def pcm24to32_view(data, channels=1):
    if len(data) % 3 != 0:
        raise ValueError('Size of data must be a multiple of 3 bytes')

    temp = np.zeros((len(data) // 3, 4), dtype='B')
    temp[:, 1:] = np.frombuffer(data, dtype='B').reshape(-1, 3)
    return temp.view('<i4').reshape(-1, channels)

This is what we're doing here:
* create a zero-initialized array of bytes with 4 columns
* write the data to the last 3 columns (leaving the first column filled with zeros)
* interpret each row (4 bytes) as one integer
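
The three steps can be followed on a tiny example. The six bytes below are made up and represent two 24-bit samples; the hexadecimal output makes it easy to see where each byte ends up.

```python
import numpy as np

data = bytes([1, 2, 3, 4, 5, 6])  # two made-up 24-bit samples

# Step 1: zero-initialized array of bytes with 4 columns
temp = np.zeros((len(data) // 3, 4), dtype='B')
# Step 2: write the data to the last 3 columns
temp[:, 1:] = np.frombuffer(data, dtype='B').reshape(-1, 3)
print(temp.tolist())  # [[0, 1, 2, 3], [0, 4, 5, 6]]

# Step 3: interpret each row (4 bytes) as one little-endian integer
values = temp.view('<i4')
print(values.shape)  # (2, 1)
print([hex(v) for v in values.ravel().tolist()])  # ['0x3020100', '0x6050400']
```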

I think this is better than `pcm24to32_bytearray()`, but it still has a little flaw: the returned array doesn't own its data.

In [None]:
sig = pcm24to32_view(data, channels)
sig.flags.owndata

This is because a *view* is returned. The original array (with its 4 columns) has exactly the same values as the array `four_bytes` which we were using before.

In [None]:
np.all(sig.base == four_bytes)

So instead of returning a view, let's turn this around and create the correct array first and use a view internally to copy the data.
Additionally, instead of using the `reshape()` method (which would also return a view), we directly set the shape.

In [None]:
def pcm24to32(data, channels=1):
    if len(data) % 3 != 0:
        raise ValueError('Size of data must be a multiple of 3 bytes')
    
    out = np.zeros(len(data) // 3, dtype='<i4')
    out.shape = -1, channels
    temp = out.view('B').reshape(-1, 4)
    temp[:, 1:] = np.frombuffer(data, dtype='B').reshape(-1, 3)
    return out

I think this one is better.

Now the returned array owns its data:

In [None]:
sig = pcm24to32(data, channels)
sig.flags.owndata

I added this implementation (plus a little tweak) to [utility.py](utility.py).
Let's have a look at the documentation:

In [None]:
help(utility.pcm24to32)

As you can see, I added the function argument `normalize`, which lets you select whether the zero-byte is added as LSB or MSB.
The default behavior is unchanged, though.

In [None]:
sig = utility.pcm24to32(data, channels)

BTW, the result of all three implementations is of course the same:

In [None]:
np.all(sig == pcm24to32_view(data, channels)), np.all(sig == pcm24to32_bytearray(data, channels))
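
The timing measurements mentioned as a TODO further above could be approached with the `timeit` module. The following is only a sketch: `pad_loop()` and `pad_numpy()` are hypothetical, simplified re-implementations of the loop-based and the NumPy-based approach, the test data is synthetic, and the actual numbers depend on the machine, so none are shown here.

```python
import timeit
import numpy as np


def pad_loop(data):
    """Pad with a preallocated bytearray and a Python loop."""
    size = len(data) // 3
    temp = bytearray(size * 4)
    for i in range(size):
        temp[i * 4 + 1:i * 4 + 4] = data[i * 3:i * 3 + 3]
    return np.frombuffer(temp, dtype='<i4')


def pad_numpy(data):
    """Pad by assigning into a byte view of the output array."""
    out = np.zeros(len(data) // 3, dtype='<i4')
    byte_view = out.view('B').reshape(-1, 4)
    byte_view[:, 1:] = np.frombuffer(data, dtype='B').reshape(-1, 3)
    return out


# Synthetic test data: 10000 samples of 3 bytes each (arbitrary size)
data = bytes(i % 256 for i in range(3 * 10000))

# Both approaches have to produce the same values
same = np.array_equal(pad_loop(data), pad_numpy(data))
print(same)  # True

for func in (pad_loop, pad_numpy):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print('{}: {:.4f} s for 20 runs'.format(func.__name__, seconds))
```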

Finally, we convert the 32-bit integer values to 32-bit floating point values and normalize them to a range from -1 to 1 (see above).

In [None]:
normalized = utility.pcm2float(sig, 'float32')
plt.plot(normalized);

WAVEX doesn't work:

In [None]:
import traceback
try:
    w = wave.open('data/test_wavex_pcm16.wav')
except Exception:
    traceback.print_exc()
else:
    print('It works (unexpectedly)!')
    w.close()

Let's try 32-bit float:

In [None]:
try:
    w = wave.open('data/test_wav_float32.wav')
except Exception:
    traceback.print_exc()
else:
    print('It works (unexpectedly)!')
    w.close()

## Writing

TODO

Another way (without NumPy): http://soledadpenades.com/2009/10/29/fastest-way-to-generate-wav-files-in-python-using-the-wave-module/

Another way for 24-bit WAV files (with NumPy): https://github.com/WarrenWeckesser/wavio
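
Until this section is filled in, here is a minimal sketch of how writing a 16-bit file with the `wave` module and NumPy might look. It assumes a little-endian platform, the test signal (a 440 Hz sine) and all variable names are made up for this example, and an in-memory buffer stands in for a file name.

```python
import io
import wave
import numpy as np

framerate = 44100
t = np.arange(framerate // 10) / framerate  # 0.1 seconds
mono = 0.5 * np.sin(2 * np.pi * 440 * t)    # 440 Hz sine, amplitude 0.5
sig = np.column_stack([mono, mono])         # two identical channels

# Scale to the 16-bit range and convert to little-endian integers
pcm = (sig * 32767).astype('<i2')

buffer = io.BytesIO()  # in-memory stand-in for a file name
with wave.open(buffer, 'wb') as w:
    w.setnchannels(sig.shape[1])
    w.setsampwidth(2)
    w.setframerate(framerate)
    # C-ordered rows are frames, so tobytes() yields interleaved samples
    w.writeframes(pcm.tobytes())

# Read the data back to check the round trip
buffer.seek(0)
with wave.open(buffer, 'rb') as w:
    raw = w.readframes(w.getnframes())
    readback = np.frombuffer(raw, dtype='<i2').reshape(-1, w.getnchannels())

print(readback.shape)                 # (4410, 2)
print(np.array_equal(readback, pcm))  # True
```

Values outside the range from -1 to 1 would overflow in the integer conversion, so real code should clip or scale the signal first.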

## Version Info

In [None]:
import sys, IPython
print('Versions: NumPy = {}; IPython = {}'.format(np.__version__, IPython.__version__))

print('Python interpreter:')
print(sys.version)

<p xmlns:dct="http://purl.org/dc/terms/">
  <a rel="license"
     href="http://creativecommons.org/publicdomain/zero/1.0/">
    <img src="http://i.creativecommons.org/p/zero/1.0/88x31.png" style="border-style: none;" alt="CC0" />
  </a>
  <br />
  To the extent possible under law,
  <span rel="dct:publisher" resource="[_:publisher]">the person who associated CC0</span>
  with this work has waived all copyright and related or neighboring
  rights to this work.
</p>