Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2-dimensional indexing on SoundFile objects? #15

Closed
mgeier opened this issue Feb 25, 2014 · 19 comments
Closed

2-dimensional indexing on SoundFile objects? #15

mgeier opened this issue Feb 25, 2014 · 19 comments

Comments

@mgeier
Copy link
Contributor

mgeier commented Feb 25, 2014

As mentioned in #12, it would be nice to select certain channels by indexing.

I wanted to create a pull request, but I have still a few questions which make me think maybe it's not such a good idea after all.

Should data which was read before be somehow cached?

If a channel is requested from an interleaved file, all channels have to be read anyway, so if another channel is requested afterwards it would make sense to re-use the data.
OTOH, if only one of many channels is ever needed, the memory for all other channels will not be freed up.

Should indexing be possible when writing files?
In this case the affected frames would have to be read first, only to overwrite a part of them afterwards.
But I guess this doesn't make sense in any case ...

All-in-all, maybe the current read() and write() methods are enough, and indexing isn't needed at all?

Maybe instead, to be able to be as succinct as possible, there should be some convenience functions à la Octave's wavread() (to avoid explicit creation of the SoundFile object)?
Because then, indexing could be used like this:

from pysoundfile import readfile
wave = readfile('existing_file.wav')[:, 1]

And if someone doesn't want to read the whole file, they'll have to fall back to read():

from pysoundfile import SoundFile
with SoundFile('existing_file.wav') as f:
    f.seek_absolute(22050)
    wave = f.read(44100)[:, 1]

I guess this wouldn't be much worse than:

from pysoundfile import SoundFile
wave = SoundFile('existing_file.wav')[22050:66150, 1]

(Although latter may have the __del__() problem discussed in #13)

@bastibe
Copy link
Owner

bastibe commented Feb 26, 2014

You are correct, that insofar as reading the file is concerned, two dimensional indexing is no advantage to just reading all the channels.

However, I think it is much more convenient and much easier to understand to write

wave = SoundFile('file.wav')[22050:66150,1]

instead of

wave = SoundFile('file.wav')[22050:66150][:,1]

In ac983e5 , I implemented two dimensional indexing simply as syntactic sugar that converts the first expression into the second. The whole numpy array is still read, and the second index is only applied after the whole data was read. Indeed, libsndfile does not provide a way to access single channels anyway, so this is probably the only possible solution.

@bastibe
Copy link
Owner

bastibe commented Feb 26, 2014

As for wavread-like functionality, I don't see much difference between

s = wavread('file.wav')

and

s = SoundFile('file.wav')[:]

or

s = SoundFile('file.wav').read()

Do you think wavread-like functionality is really necessary?

@mgeier
Copy link
Contributor Author

mgeier commented Feb 28, 2014

You are correct, that insofar as reading the file is concerned, two dimensional indexing is no advantage to just reading all the channels.

However, I think it is much more convenient and much easier to understand to write

wave = SoundFile('file.wav')[22050:66150,1]

instead of

wave = SoundFile('file.wav')[22050:66150][:,1]

True. But what about this:

import PySoundFile as sf
wave = sf.read('file.wav')[22050:66150,1]

In this hypothetical case, read() is a function that returns the
whole file as NumPy array.
It would internally open the file, read from it and close the file.
Or what about this:

wave = sf.read('file.wav', start=22050, stop=66150)[:, 1]

This wouldn't be the most common use, though, the most common would be:

wave = sf.read('file.wav')

... which in my opinion easier to grasp then

wave = SoundFile('file.wav')[:]

or

wave = SoundFile('file.wav').read()

And, just to be clear, I don't suggest to abandon the class
SoundFile entirely (just the indexing feature).
In more special cases of course the SoundFile object can be used
directly, ideally within a context manager.

In ac983e5 , I implemented two dimensional indexing simply as syntactic sugar that converts the first expression into the second. The whole numpy array is still read, and the second index is only applied after the whole data was read. Indeed, libsndfile does not provide a way to access single channels anyway, so this is probably the only possible solution.

I guess this is better than before, because only allowing
one-dimensional indexing seemed odd.

But not caching the data also seems strange, especially if someone
wants to read 2 (or more) channels separately.
Each time a full np.ndarray with all channels will be created and a
view for one channel will be returned.
If I'm not mistaken, the whole array will be kept in memory (via the
wave.base reference) until the returned view is destroyed.
That would mean, for example, if I load 2 separate channels from a
7-channel file, 14 channels would be kept im memory, right?

So probably if a subset of channels is selected they should be copied
to a new (and smaller) array?
Which again would incur a performance overhead and additional memory
allocation which also would be very bad.

All in all, regardless of the chosen syntax, I think it is impossible
to get that right, so selecting individual channels should be left to
the user.

@bastibe
Copy link
Owner

bastibe commented Feb 28, 2014

I think I finally get your point. Just to make sure: You are saying that indexing a SoundFile object just like a Numpy array is unintuitive, and doing the same on a Numpy array would be better.

I can agree with that. However, I think indexing the SoundFile object is extremely convenient and one of the core features of PySoundFile. Also, as I said, there is no way to read single channels in portaudio, so we'll have to live with the inefficiencies this affords us.

That said, we could easily allocate a new array instead of returning the view. Did you really run into that problem though? Because I have quite a few friends who work with audio stuff and this is the least of their worries.

@bastibe
Copy link
Owner

bastibe commented Feb 28, 2014

Also, I think there is some confusion regarding the lifetime of SoundFile objects:

wave = SoundFile('file.wav')[:]
wave = SoundFile('file.wav').read()

are equivalent, and both SoundFile objects will deallocate right on that very line. They will not stay in memory beyond that and the file will not be kept open. I see no reason at all to change this to

wave = wavread()

If you really want wavread, it's literally just lambda x: SoundFile(x).read().

@mgeier
Copy link
Contributor Author

mgeier commented Mar 2, 2014

In order to be able to give examples for my answer here I tried the two-dimensional feature and promptly found a bug. In line 445 instead of

if second_frame:

you should write

if second_frame is not None:

Otherwise, whenever you request the first channel (second_frame == 0), you'll get the whole array.

This is not a biggie, bugs happen all the time, but it illustrates my point: You are implementing stuff that's already implemented in NumPy!

I deliberately didn't open a new issue for that because I think the best fix is to remove the code completely.

About indexing being intuitive:
I do think indexing would be intuitive, but only if it is two-dimensional and only if it's done right.
And, as I mentioned above, I believe two-dimensional indexing cannot be done right (because of caching and copying issues which cause CPU and/or memory overhead).
Indexing on a NumPy array wouldn't be more intuitive, but it would be much easier for people to get what they expect. If they use indexing on a NumPy array, they know what they get (a view, a copy, ...), correctly indexing a SoundFile object would be something they have to learn.

About indexing being convenient:
It doesn't matter if it is convenient or if it is a core feature.
As long as it cannot be implemented right (which it cannot, as I stated above), it shouldn't be implemented.

About returning new memory:
This would only in a few cases save memory, in others it would waste memory, in all cases it would waste CPU.
I didn't run into problems regarding memory usage, but I'm convinced users can handle their memory requirements better if they are able to use the ndarrays they are used to instead of indexing into a SoundFile object which they might not be very familiar with.

It is impossible with libsndfile to read single channels therefore PySoundFile shouldn't claim to do it either.

Regarding lifetime:

Of course the SoundFile objects in your example don't persist after the line they're on, but the returned array does.
And if it is a view, it still needs the whole original array (which was called np_data in read()) and therefore the original CFFI buffer stays in memory (and can be accessed with wave.base in your example).
If you read two channels from the same SoundFile object (but indexing two times separately), each time a new CFFI buffer (holding all channels) will be created.

Example (with my strange test file, 15 frames, 7 channels):

from pysoundfile import SoundFile
with SoundFile('data/test_wav_pcm16.wav') as f:
    sig1 = f[:, 2]
    sig2 = f[:, 5]
sig1.size, sig2.size, sig1.base.size, sig2.base.size

>>> (15, 15, 105, 105)

The two buffers hold the same data (the whole file contents) but at separate memory locations:

all(sig1.base == sig2.base)
>>> True
sig1.base is sig2.base
>>> False

If PySoundFile would never return single channels (which I strongly suggest), this can still happen but then It's the user's fault.

Regarding the necessity of a wavread-like functionality:

Yes! I really want (even need) such a thing!

I'm trying to convert people from using Matlab to using Python+NumPy+... and every obstacle will make it harder for them and therefore for me.
It might seem like a small thing but it is hard to explain why it is necessary to append [:] or .read() to the constructor and why there isn't a simpler way to do that (using a function).
Apart from that, reading and writing audio files is also very common in interactive sessions, and saving a few keystrokes (especially brackets and stuff that cannot be auto-completed) can make a big difference.

Of course it would be easy for me to implement it locally, but regardless how trivial it may be, I'll have to do this for each project, for each example I show someone, each time I quickly want to try something, ...
It would make my life (and that of my colleagues) easier if such convenience functions were included in PySoundFile.

Don't get me wrong, I definitely don't want a Matlab-clone, I just want a clear and easy high-level interface (in addition to a fully-fledged low-level interface).

In closing:
I will implement said convenience functions anyway and you can decide after I show them to you if you like them or not.

@bastibe
Copy link
Owner

bastibe commented Mar 3, 2014

Thank you very much for pointing out that bug! Quite a nasty one, that. I fondly remember working with Lua, where only false and nil evaluated to False. That makes so much more sense...

@bastibe
Copy link
Owner

bastibe commented Mar 3, 2014

Regarding wavread, I concede your point. It should not be called wavread, though, since it can read so much more than wavs. How about soundread, audioread, or just simply read?

from pysoundfile import read as wavread  
wavread('file.wav')

import pysoundfile as sf
sf.read('file.wav')

Looks good to me.

@bastibe
Copy link
Owner

bastibe commented Mar 3, 2014

Regarding indexing:

I understand everything you are saying, but I really want that feature. As you rightly point out, there are many issues with indexing in general, and two-dimensional indexing in particular, but that feature is just incredibly powerful for my particular use cases. It is implemented as a simple wrapper over read, seek and indexing the Numpy result, and those are the very steps I would have to take manually every time, otherwise.

In the end, I think this comes down to the same point you were making about wavread: The functionality is trivial and could be easily implemented locally. But it saves a few key strokes and it makes things easier to explain. And just like wavread is crucial for you, so is indexing for me. I think PySoundFile can accomodate both use cases, and make both of us happy.

@marcecj
Copy link

marcecj commented Mar 3, 2014

Responding as per your Email, Basti, I think I agree on all points here: a trivial read function is quite frankly practical, and if it's so simple why not include it? But the indexing is also quite practical, if, as it turns out, somewhat problematic. However, if I wanted to index separate non-neighbouring channels I would instinctively just get the entire numpy array and index from that, as opposed to the example given by mgeier. Put differently: I don't think I would use two-dimensional indexing on a SoundFile object to do more than [a:b, c:d] style indexing. Anything fancier requires a proper numpy array anyway, e.g., the appropriately termed "fancy indexing" (although, since you pass second_frame to the numpy array, I believe you could do something like SoundFile(...)[(a, b, c), :] to select separate channels).

@mgeier
Copy link
Contributor Author

mgeier commented Mar 3, 2014

I guess both features (SoundFile indexing and a high-level read() function) don't interfere with each other, so it shouldn't be a problem for both to coexist.

@bastibe: just out of curiosity ... can you give an example where indexing into a SoundFile is more powerful/easy/efficient/beautiful/... than, say, indexing into a NumPy array after it has been returned from a (still hypothetical) read() function?

@bastibe
Copy link
Owner

bastibe commented Mar 3, 2014

@mgeier Basically, any time I want to access only part of a file:

with SoundFile as s:
    data = s[44100:88200]

This will only read the second second of the file, without reading the second before or anything after.

You could also write it like this:

with SoundFile as s:
    s.seek(44100)
    data = s.read(44100)

But I think the first example is much cleaner and easier to understand. I had one application that was churning through some 3000 audio files on 8 simultaneus threads. This application was wicked fast and used next to no memory, mainly because PySoundFile allowed us to only load a tiny part of a file at a time and never having to do big allocations. Also, the algorithm could only ever process 1024 samples at a time anyway, so loading more than that would have been a waste.

On a different note, read, seek and write work in relation to an invisible current read pointer. Consecutiive calls to read seek and write will have different effects depending on that invisible pointer. I think that is very confusing. You can't just pass the SoundFile object around and expect random functions to read and write with any kind of confidence.

Indexing on the other hand is a deterministic operation that will always do the same thing. You can pass the SoundFile object into some random function and have it play back the file, while simultaneusly doing some signal processing with it on the main thread. This works even for very large files that you wouldn't want to load in their entirity.

On the other hand, there is a whole class of algorithms that just wants to step through a file one block after another. For that, read, write and seek works perfectly: I think both APIs have their merit and use cases. I have used both techniques in different contexts to great effect.

@mgeier
Copy link
Contributor Author

mgeier commented Mar 6, 2014

@bastibe: thanks for the examples

To make my counter-examples a bit less hypothetical, I implemented the suggested features in #18.

If you only want to read a certain part of a file, I would suggest the high-level function read():

import pysoundfile as sf
myarray = sf.read('myfile.wav', start=44100, stop=88200)

or

myarray = sf.read('myfile.wav', frames=44100, start=44100)

I think that's clearer (more explicit) and more flexible (either frames or stop can be specified) than your indexing example.

Admittedly, this (not explicitly closing the file on read shouldn't be a problem):

myarray = sf.open('myfile.wav')[44100:88200]

... is a little less typing than

myarray = sf.read('myfile.wav', start=44100, stop=88200)

... but if you want to avoid some typing, you can use positional parameters:

myarray = sf.read('myfile.wav',44100,44100)

... which is even one character less than the indexing solution, although I wouldn't recommend that outside of an interactive session because it will be confusing what the different numbers actually mean.

As for your multi-threading example: I suspect that it is totally not safe to access the same SoundFile object from different threads (even for reading) exactly for the reason of the read/write pointer.
It's true that you internally re-set the pointer after the indexing operation, but the whole operation is totally not atomic. Correct me if I'm wrong, I don't know much about Python's threading mechanism.
If I'm not mistaken, your example is unsafe and therefore your point is moot.

If you are only reading, it should be perfectly safe to open the same file twice (into two different SoundFile objects), which I would suggest in your multi-threading scenario.
If one of the threads is writing, much more care has to be taken anyway.

@bastibe
Copy link
Owner

bastibe commented Mar 6, 2014

Oh my, I finally get what you have been saying all along. All this time, you were talking about the read function, and I was thinking you were talking about the read method. See, I didn't get why you would want to introduce the keyword arguments for the method. For the function, they make a lot of sense! You are right, I am wrong. Sorry about that.

@mgeier
Copy link
Contributor Author

mgeier commented Mar 6, 2014

Yes, and my examples open and close the file each time. If you want to read the file in 1024-frame-chunks, you can just use the read() method repeatedly (s.read(1024)) as you mentioned.

You still have to convince me that there is a situation where you want to jump around in an open file where read() and seek_*() is not enough and you really need indexing.

@bastibe
Copy link
Owner

bastibe commented Mar 7, 2014

When doing spectral processing, I frequently want to read consecutive, overlapping blocks of data. So the first block might be [0:256], the next [128:384], then [256:512] and so on. You can implement this using read and seek like this:

while seek(0) < len(f)-256:
    data = f.read(256)
    f.seek(-128)

I think it is much cleaner to do something like

for idx in range(0, len(f)-256, 128):
    data = f[idx:idx+256]

@mgeier
Copy link
Contributor Author

mgeier commented Mar 8, 2014

Reading overlapping sections is indeed an interesting example!

I agree that you second example is better than the first, but what about this:

for idx in range(0, len(f) - 256, 128):
    f.seek_absolute(idx)
    data = f.read(256)

Sure, this is one more line, but I think it would be more explicit, less obscure and therefore easier to read and maintain.

@bastibe
Copy link
Owner

bastibe commented Mar 10, 2014

I disagree. I think indexing is much easier to understand than seek and read. Sorry.

@mgeier
Copy link
Contributor Author

mgeier commented Mar 10, 2014

Fair enough.

I'm closing this issue, the actual changes were done by @bastibe in ac983e5.

@mgeier mgeier closed this as completed Mar 10, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants