Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a generator for optionally overlapping blocks #35

Merged
merged 6 commits into from
Aug 3, 2014
Merged

Conversation

mgeier
Copy link
Contributor

@mgeier mgeier commented Jun 18, 2014

for block sf.blocks('filename'):
    # do something

The function signature should look something like this:

def blocks('filename', block_len=1024, overlap=0.0, fill_value=None, out=None):
  • overlap is a float of fractional block_len. overlap=0.5 would advance the read pointer by 512 in each iteration.
  • fill_value specifies the behavior in the last iteration: Either a full block_len is returned (padded with fill_value), or a shorter array is returned if fill_value=None.

@mgeier
Copy link
Contributor

mgeier commented Apr 25, 2014

I already implemented this in the branch https://github.com/bastibe/PySoundFile/tree/blocks.
I wanted to wait with a pull request until #34 is resolved and merged, because the block-functionality depends on it.

@mgeier
Copy link
Contributor

mgeier commented Apr 25, 2014

I would suggest to use overlap in frames, not as a fraction of the block size.

I think both are valid ways to implement the feature, but the former is much easier to implement because we don't actually have to know the block size (so we don't have to examine out).

This is a matter of personal taste and previous experience, but I also think block_len=1024, overlap=512 is easier to grasp at first sight than block_len=1024, overlap=0.5.

@bastibe
Copy link
Owner Author

bastibe commented Apr 25, 2014

I actually think that a fractional overlap is nicer to work with. But this is not particularly important, I'm fine with either solution.

I do want to point out though that both pyplot.specgram and pyplot.psd use noverlap instead of overlap. That is probably a good idea as it avoids the confusion around these terms.

@mgeier
Copy link
Contributor

mgeier commented Apr 29, 2014

I don't really like prefixing n, but that's of course a matter of taste.

We would have to use it in other places, too: nchannels, nframes, nsections, which I also think would be very ugly.

I also suspect that it's un-Pythonic and it's actually a mistake of the pyplot-people to use it. They probably use it because it's called like that in Matlab(TM), but that should be in no way a reason for us to use it.

@bastibe
Copy link
Owner Author

bastibe commented Apr 30, 2014

n means that it is an integer. In this case, there is reason to think that it could be a fraction. n makes that clear.

@mgeier
Copy link
Contributor

mgeier commented May 17, 2014

n means that it is an integer.

That reeks like Hungarian Notation, which nobody uses anymore since decades [citation needed].

In this case, there is reason to think that it could be a fraction. n makes that clear.

Yes, but it's also hideous and un-Pythonic.

If you prefer, we can make it a fraction.
We could also allow both and decide which one it is based on the type.
But all that would need more code.
And there is always the rounding issue which may lead to sneaky off-by-one errors.
I think it's better to leave the conversion from float to int to the user.

Initially, I wanted to call it hopsize (or stepsize), which has no such ambiguity, but sadly, this cannot be used without knowing the actual block size.

Actually, now that we dropped channels_first, it's not so much implementation effort anymore (we can just use len(out)) and we could re-consider using hopsize.

Would you prefer that?

@bastibe
Copy link
Owner Author

bastibe commented May 19, 2014

Let's go with both fraction and integer for overlap, then. That is the most versatile approach, and it won't add too much code. I think overlap is the canonical name, so we should stick with that.

Regarding len(out), we should probably do the same thing as in read, and ignore block_len if out is given.

@mgeier
Copy link
Contributor

mgeier commented May 22, 2014

Let's go with both fraction and integer for overlap, then. That is the most versatile approach, and it won't add too much code.

OK, implementation-wise this wouldn't be a problem.

But don't you fear people could get confused by this?

I think overlap is the canonical name, so we should stick with that.

What about hopsize?

We could also implement both (but not to be used together), but I think this would also be confusing ...

Regarding len(out), we should probably do the same thing as in read, and ignore block_len if out is given.

Sure, sounds reasonable.

@bastibe
Copy link
Owner Author

bastibe commented May 25, 2014

I think overlap is the canonical name. It is the name I see most often. I've never seen hopsize anywhere. Also, "hopping" is more or less synonymous to "jumping", which is not a fitting metaphor, as we are not "jumping over" anything really. I think overlap is a good name.

But don't you fear people could get confused by this?

Frankly, no. I don't see any failure state that could be triggered by misinterpreting overlap. The only reason I can think of against this is that "There should be one-- and preferably only one --obvious way to do it.". But then again, "[...] practicality beats purity.".

Regarding noverlap and your invocation of Hungarian Notation, I want to point you to Joel on Software, who shows how Apps Hungarian Notation can be used for good. I think noverlap would be such a good use, as it makes a semantic distinction between overlap-as-fraction and overlap-as-number-of-frames. It is not a simple type annotation, as Systems Hungarian would have it.

Thus, I am still in favor of noverlap as a number of frames of overlap, as this is the only un-ambiguous solution I can come up with. If you still oppose this, we can compromize on overlap as both a fraction and a number of samples. This will at least always work, even though it is less clean.

Any thoughts?

@mgeier
Copy link
Contributor

mgeier commented Jun 4, 2014

"Hop size" is quite common, especially when doing windowed signal analysis. A web search for "stft hop size" shows plenty of examples.
I can't say which one is used more often, I think both are valid and reasonable names.
And I think "jumping" is quite a good metaphor, as far as metaphors go. We jump from one block to the next.
The point is I'm not inventing this stuff, the term is already widely in use.

I do like the sound and look of overlap, it's just very ambiguous:

  1. overlap in frames
  2. overlapping frames divided by block size
  3. overlap as fraction, but specifying the denominator, e.g. overlap=4 means a quarter of the block size is overlapping

I've seen all 3 out there. I'd prefer the first interpretation.
In addition to these 3 options, one could also read it as total overlap on both sides of one block. One could say that with a block size of 1024 and a hop size of 512 the frames fully overlap (meaning overlap == 1.0) because there is no part of the signal where there is no overlap.
To be honest, i've never seen that, but it would be a possible interpretation.

I've read Joel's blog entry on Hungarian Notation years ago, it didn't convince me then, it doesn't convince me today (although it's interesting to know the history!).
Everybody agrees that Systems Hungarian is utter crap. Apps Hungarian may have some limited use cases in other languages but I'd say it should never be used in a Python program.
Also, it only really works when using lowerCamelCase, which is a no-go in Python.

Do you know a case of something named n* in the Python standard lib or any other widely used Python library?

I only know ndims in NumPy, but this is not Hungarian, this is just an abbreviation for "Number of DIMensionS".
I don't think this is a particularly beautiful choice, but I can accept it.
noverlap doesn't fall into this category, because it is not an abbreviation for "Number of OVERLAPs".
Another bad example which is regularly used is nfft for the FFT size, in which case I'd use the much more Pythonic fftsize.

My opinion in short form: I strongly oppose noverlap, I think overlap is OK, but I'm not sure about the fraction stuff and I think hopsize is also OK.

@bastibe
Copy link
Owner Author

bastibe commented Jun 6, 2014

Frankly, I have never seen hop size before. But since you have, it's a fine name as well.

However, Numpy is using overlap, so I think we should go with that name instead of hopsize, if only for familiarity to Numpy users. Also, I simply like it better.

As for noverlap, you are probably right. overlapsize would then be the right name, if a bit wordy. Actually, I would be fine with that. Or we just use overlap and say in the docs that it is in frames. Users should be able to figure that one out. We can even add an error check for it. Since there should be only one way to do things, let's stick with that and simply disallow fractions.

Would you be fine with that?

@mgeier
Copy link
Contributor

mgeier commented Jun 10, 2014

Sure, I'm fine with an integer overlap called overlap, this is what I used in my initial implementation: e306f58, this also leads to a very simple implementation.

I actually grep'd the NumPy sources and didn't find any occurence of "overlap" as (part of) a function argument.
What function are you referring to?

I also checked in SciPy, and there I found only one: http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.signal.welch.html

This uses noverlap, but I think this function is specialized enough not to influence our decision.

@bastibe
Copy link
Owner Author

bastibe commented Jun 10, 2014

Yes, I was thinking of scipy.signal.welch.

Ok, are we decided on the function signature then?

blocks(file, block_len=1024, overlap=512, fill_value=None, out=None)

Do you agree with the default values?

@mgeier
Copy link
Contributor

mgeier commented Jun 11, 2014

I don't like the name block_len.
I think we should try to avoid abbreviations wherever it's reasonable. Also, I think that underscores should only be used if necessary to improve readability (as I understand PEP 8).
Therefore, blocklength would seem better to me.
But even better, and my favorite, is blocksize.

I think fill_value should keep the underscore because it's easier to read then fillvalue (although I'm not sure about that). Apart from that, fill_value is heavily used in numpy.ma.MaskedArray.

I don't like setting a default block size. There is no reason why it should be exactly 1024 or any other fixed number. Even if there were a meaningful default value (which there isn't), it would depend on the sampling rate (which is unknown at the time of the function definition).
Also, a default block size would conflict with the out argument.
Therefore, the default value should be blocksize=None.
blocksize shouldn't be optional, the only reason it's defaulted to None is to allow out.

I vote for overlap=0. If I don't specify an overlap, I don't expect any overlap.

fill_value=None and out=None is great.

There is one further argument max_blocks=-1 which is useful for write-only files (and it may of course also be used in 'r' and 'rw' mode).

And then there are a lot of arguments missing from read() and open(), namely mode='r', sample_rate=None, channels=None, subtype=None, endian=None, format=None, closefd=True, dtype='float64' and always_2d=True.

I'm not quite sure if we should also add start=None and stop=None like in the read() function.
Currently I don't see a reason why not to allow them.
I don't see a reason why blocks() should be less powerful than read().
I still have to check if start and stop work with write-only files. start probably doesn't but I guess stop could be useful.

For SoundFile.blocks() some of the arguments are not necessary, there we only need blocksize=None, overlap=0, max_blocks=-1, dtype='float64' and always_2d=True, fill_value=None and out=None.
start doesn't make sense there and I think stop would be too unclear and confusing.

I both cases we could think about adding a frames=-1 argument for completeness.
Actually, this would be quite useful, I don't know why I didn't think about this before ...

@bastibe
Copy link
Owner Author

bastibe commented Jun 15, 2014

Yes, I missed quite a lot there, didn't I? This looks much better now:

SoundFile.blocks(file, blocksize=None, overlap=0, dtype='float64', always_2d=True, fill_value=None, out=None)
blocks(file, blocksize=None, overlap=0, dtype='float64', always_2d=True, fill_value=None, out=None, mode='r', sample_rate=None, channels=None, subtype=None, endian=None, format=None, closefd=None)

I like the idea of max_blocks a lot! I think that the same cause would be better served by using frames, though. With overlapping blocks, the number of blocks can become quite confusing. The main purpose of blocks is to abstract that confusion away. Thus, I think it is much easier to provide the file length in frames, than in number of blocks. Also, frames provides a nice symmetry with read.

I have not thought about writing to blocks yet though. Writing to blocks would have to either use out or generator.send (example):

out = np.empty(1024)
for data in sf.blocks('file.wav', 1024, mode='rw', out=out):
    out[:] = data[:]*0.5

for data in sf.blocks('file.wav', 1024, mode='rw') as writer:
    writer.send(data*0.5)

We could also return an array to write to:

for data, out in sf.blocks('file.wav', 1024, mode='rw'):
    out[:] = data[:]*0.5

The first example is compatible with the second and third example. The second and third however differ in their implementation, so we can only implement one of them. I guess that the second example is somewhat more pythonic, since it does not rely on mutating arguments. On the other hand, generator.send is probably one of the lesser-known features of Python and might be unfamiliar to many. Also, the third example would have a different number of arguments depending on whether blocks is called in read mode, read/write mode or write mode.

A fourth version would be to re-use the yielded value as output, thus

for data in sf.blocks('file.wav', 1024, mode='rw'):
    data[:] = data[:]*0.5

This might even be compatible with generator.send.

Any thoughts on this?

@mgeier
Copy link
Contributor

mgeier commented Jun 15, 2014

You're right, we don't need max_blocks.
I became aware of frames quite late, while I wrote the last paragraph of my previous comment. I didn't realize at the time that this would make max_blocks obsolete.

I guess your first example would work, but the out argument wasn't intended to be used like that. It's meant for reading, not for writing.
Therefore, I wouldn't advertise this usage.

Your second example isn't valid Python code.
Iteration doesn't mix well with the use of send(). You have to call send() to get the next item as return value.
In your example, you'd have to call send() in an infinite loop and use the (modified) return value in the next invocation of send().
As much as I like the send() method of generators, I don't think we should use it here.
I think the normal use case for this is to send something to the generator and receive something in response.
In our case it would be reversed, we would want to receive a block and then send a modified block back.
This would cause an awkward asymmetry because the very first time we would have to send(None) and the last return value would have to be ignored.
I think this would be a mis-use of the send() mechanism.

Your third example would work, but it would need a special case for 'rw' mode where both data and out are needed.
Also, it would need more memory than necessary in the case that the user can re-use input and output buffers.

Your fourth example is the one I used in my original implementation: e306f58

I think it's the best because all three modes work exactly the same way.

@bastibe
Copy link
Owner Author

bastibe commented Jun 17, 2014

Sounds good

So in case of out in 'rw', the function would just yield out, right?

@mgeier
Copy link
Contributor

mgeier commented Jun 17, 2014

It would yield whatever SoundFile.read() returns.
This would normally be out, except when the file has less frames or less frames were requested, then it would return a smaller view of out.

@mgeier
Copy link
Contributor

mgeier commented Jun 17, 2014

I added a possible implementation based on our discussion: b70f6d7

This depends on #46. Once this is resolved, I'll make a proper pull request.

@mgeier
Copy link
Contributor

mgeier commented Jun 18, 2014

I turned this issue into a pull request (with the very nice tool http://issue2pr.herokuapp.com/).

The logic is quite involved, there are many code paths, so it's very likely that there are some bugs.

The code would be much simpler if we would only support 'r' mode, but I think 'w' mode is equally useful, and 'rw' should be there for completeness (although I'm not quite sure if it will ever be used).

@mgeier mgeier mentioned this pull request Jun 18, 2014
@bastibe
Copy link
Owner Author

bastibe commented Jun 19, 2014

Looks pretty good.

@mgeier
Copy link
Contributor

mgeier commented Jun 23, 2014

Why did you change the default start in read?

Good question, I'm not quite sure.

I thought start=0 is a more natural default than start=None since the first frame is always 0.

The reason why I initially used None, is because this is also used in the built-in slice() function.
But I reckon that this connection is not very obvious (especially since step is not available), therefore it doesn't help anyone if we use a similar default value here.

But if you don't like it, I can change it back.
I think none of them is wrong.

Please add documentation for the underscore-methods.

Sorry I missed that. I added a new commit 1735daa.

Please add tests (possibly after #48 is resolved)

Sure, will do.

@bastibe
Copy link
Owner Author

bastibe commented Jun 24, 2014

I thought start=0 is a more natural default than start=None since the first frame is always 0.

My bad, I mistook the read function for the read method. In the function, this change makes sense.

Looks great!

@mgeier
Copy link
Contributor

mgeier commented Jun 25, 2014

blocks() can be used in list comprehensions, e.g.:

import pysoundfile as sf

rms = [np.sqrt(np.mean(b**2)) for b in sf.blocks('myfile.wav', blocksize=1024, overlap=512)]

Shall we put something like this into the docs?

Is there an even simpler (but still somehow realistic) example?

@bastibe
Copy link
Owner Author

bastibe commented Jun 26, 2014

This is awesome! What a wonderful feature!

Shall we put something like this into the docs?

Absolutely! (We should probably rewrite the readme anyway, though, so this can probably wait.)

@mgeier mgeier mentioned this pull request Jul 2, 2014
@mgeier
Copy link
Contributor

mgeier commented Jul 2, 2014

@mgeier
Copy link
Contributor

mgeier commented Jul 15, 2014

I rebased this PR onto the current master and added tests for the blocks() function.

Both the blocks() function and the blocks() method now have 100% test coverage, although the tests call only the function.

One situation that is not tested is the case when write and read position are different before calling the blocks() method (in 'rw' mode, of course).
This is a quite exotic situation, I don't know if we even want to test that ...

Any comments?
Errors?
Possible improvements?

@bastibe
Copy link
Owner Author

bastibe commented Jul 17, 2014

The whole read/write pointer logic is quite involved, and probably hard to understand. It makes anything involving rw mode pretty messy (not just blocks). It might be worth considering to simplify the pointer logic to one unified read/write-pointer. I can't really think of any valuable use for a separate pointer for reading and writing.

It is kind of sad that write mode does not allow for overlap. Overlap-Add is a very valuable technique. But maybe we should delegate that to a separate issue.

@bastibe
Copy link
Owner Author

bastibe commented Jul 17, 2014

Maybe we should remove the blocks method, and have blocks only as a function. I don't see a use case where blocks would be used on anything other than the whole file.

Also, this would create a clear distinction between the "high-level" functions and the "low-level" SoundFile.

What do you think?

@mgeier
Copy link
Contributor

mgeier commented Jul 17, 2014

The whole read/write pointer logic is quite involved, and probably hard to understand. It makes anything involving rw mode pretty messy (not just blocks).

I agree.
But it also kind of makes sense: I normally want to start reading in the beginning and start writing in the end, therefore I need separate read/write positions.

It might be worth considering to simplify the pointer logic to one unified read/write-pointer. I can't really think of any valuable use for a separate pointer for reading and writing.

I think that's a hard decision to make. I don't know a real use case either, but still I'm reluctant to deliberately reduce the functionality libsndfile provides.
Even if neither of us needs this, there may be other people who do.
And they will be really pissed if we remove this feature.

I already removed this feature in the blocks() function, knowing that whoever needs this, can use the lower-level blocks() method.

If we would change to a single read+write position, where should it be initially when opening an 'rw' file?

If we remove the separate read/write position, the next logical step would be to remove 'rw' mode altogether.
For my personal use cases this wouldn't even be a problem, but on the other hand, I would like to support as much of libsndfile as is reasonably doable.

It is kind of sad that write mode does not allow for overlap. Overlap-Add is a very valuable technique.

It indeed is.
But it cannot logically work in write mode. You have to read the values before you are able to accumulate the new values onto them.
Therefore it can only work in 'rw' mode, which I think is complicated enough already.

I think a more realistic use case is to read block-wise from one file and write block-wise to another file.
To allow overlap when writing, I'd write partial blocks, which would be easiest when using 50% overlap. If a block size of 1024 with 512 samples overlap is desired, I'd write in blocks of 512 (without overlap) and do the windowing and accumulation in user code.

But maybe we should delegate that to a separate issue.

Yes please!
This has no effect on the API and would be perfectly backwards compatible, so there's no hurry.

Maybe we should remove the blocks method, and have blocks only as a function. I don't see a use case where blocks would be used on anything other than the whole file.

I don't see one either, but that doesn't mean there won't be one.
Also, even if it's only used on the whole file it might be desirable to have a blocks() method, because you probably want to open the file first to get some information (sample rate, length, ...) and then use blocks() with arguments based on this information.

Also, this would create a clear distinction between the "high-level" functions and the "low-level" SoundFile.

Probably.
But for me this is not a strong motivation for anything.

I think it's important for new users to see that there are a few very simple functions (esp. sf.read() and sf.write()) which are probably enough for most people.
Everything else, including blocks(), is not for casual use but for people who have a reason for looking into those specialized things.
What I want to say with that is that I wouldn't draw the line between functions and methods, but already within the functions.

@mgeier
Copy link
Contributor

mgeier commented Jul 27, 2014

I added a commit to disable blocks() in 'w' mode (as discussed before): d29c034

The implementation indeed got simpler, but not as much as I hoped.
Especially the frames argument is surprisingly hard to handle correctly.
I initially missed a corner case, which I tried to fix with a0c5997.
Using out seems like a hack, but it was the only way I could come up with, that didn't require much code duplication between blocks() and read().

But probably I missed a much better solution?

One solution would be to get rid of the frames argument (and consequently also the start and stop arguments).
But I think there may actually be use cases for this, so I'm reluctant to remove this feature.

@bastibe
Copy link
Owner Author

bastibe commented Jul 27, 2014

I think we should require the blocksize (or out). It doesn't make sense to request blocks of unknown size, and defaulting to all the remaining frames just makes it a copy of read. This would also further simplify the code.

Looks good though.

@mgeier
Copy link
Contributor

mgeier commented Jul 27, 2014

It doesn't make sense to request blocks of unknown size, and defaulting to all the remaining frames just makes it a copy of read.

This wasn't my intention. If neither blocksize nor out was given, some obscure error would have been raised.
I added a more specific error message in 64e5d55.

This would also further simplify the code.

How?

@mgeier
Copy link
Contributor

mgeier commented Aug 3, 2014

Can I merge this and we do potential improvements later?

@bastibe
Copy link
Owner Author

bastibe commented Aug 3, 2014

Yes, please merge!

mgeier added a commit that referenced this pull request Aug 3, 2014
Implement a generator for optionally overlapping blocks
@mgeier mgeier merged commit b5a4abc into master Aug 3, 2014
@mgeier mgeier deleted the blocks branch August 3, 2014 16:11
@mgeier mgeier mentioned this pull request Oct 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants