Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add open(), read() and write() functions #34

Merged
merged 4 commits into from
Jun 15, 2014
Merged

Add open(), read() and write() functions #34

merged 4 commits into from
Jun 15, 2014

Conversation

mgeier
Copy link
Contributor

@mgeier mgeier commented Apr 21, 2014

This is the combination of a few commits from #18.
See also #14.

@mgeier
Copy link
Contributor Author

mgeier commented May 4, 2014

I re-read the discussion about channels_first and I didn't find any really convincing arguments, neither for nor against it (except that it would definitely make the code more complicated).
I still have the feeling that it would be a good feature, but as long as there is no really convincing argument and as long as there is no better name I guess it's better to drop it.

I removed it in 12294be, but I will change the whole pull request to get fewer and more consistent commits.

@mgeier
Copy link
Contributor Author

mgeier commented May 4, 2014

I combined the previous mess of commits into three fresh ones.
This also sent most of the discussion to some level of Github nirvana.

I'm not sure if we agree on all points now ...

@bastibe: we can take the changes to the tests from your branch open-read-write-alternative, they should still pass.

@bastibe
Copy link
Owner

bastibe commented May 5, 2014

I think we agree on the API for SoundFile.read and SoundFile.write now.

Implementation wise, have a look at a700fcf for simplifying SoundFile.read. I don't think we need the two helper functions. It would be nice to bring that complexity back down to a more manageable level.

Besides that, I am not satisfied with read write and open yet. But maybe we can postpone that and merge SoundFile.read and SoundFile.write first.

  • I think we agreed that read and write should contain some basic documentation. In particular, we should mention that file may be any IO class, a file name or a file object and that negative frames defaults to the whole file.
  • We should move the import in read to the top of the file.
  • Maybe it would be cleaner for read and write to have fixed arguments (no **kwargs). I feel that these two functions really meant to be simple above all else, so we don't lose much by limiting them to a subset SoundFile's functionality. If you want full control, using SoundFile itself is the right thing to do anyway.
  • I think that open really is redundant. I don't see how it is any simpler than SoundFile.

What do you think about these points?

@mgeier
Copy link
Contributor Author

mgeier commented May 5, 2014

I fail to see how SoundFile.read() is simpler in a700fcf, can you please elaborate?

The helper function _create_out_array() indeed seems superfluous, but I will use it later also in the blocks() method to create a block in write mode.

The other helper function _check_frames_and_channels() is used in three places (including the write() function), I wouldn't want to repeat the code three times.

But this raises a more fundamental question: how defensive should we be when handling user input?
The main idea behind _check_frames_and_channels() is that it includes some error messages which should help the user in case of wrong arguments.
E.g. if the user passes a scalar value as data or out, there is an error message saying that only one- and two-dimensional arrays are allowed.
Also, if the array happens to be empty, there is an error message saying so instead of raising a division-by-zero error a few lines later.
I think this helps unexperienced users if they pass wrong arguments by mistake, but it also may be a little bit too defensive?
I'm having a hard time where exactly to draw the line ...

I created a new pull request (#36) containing only the commits about the read() and write() methods. After merging that, we can continue here with the discussion about the open(), read() and write() functions.

@bastibe
Copy link
Owner

bastibe commented May 9, 2014

I think _check_frames_and_channels is not a great function yet. While it is used in three places, not all of those places actually need all the functionality it provides. Also, it provides two distinct sets of functionality:

  • Check data integrity and throw errors
  • Calculate the number of channels

At the very least, this should be factored in separate functions. In #37, I actually do not use the function at all.

@mgeier
Copy link
Contributor Author

mgeier commented May 11, 2014

You're right that _check_frames_and_channels() isn't a great function ...

I'm still not clear on your position regarding how defensive we should be when checking user input.
Your commits in #37 suggest that you want to be less defensive, is this true?

@bastibe
Copy link
Owner

bastibe commented May 11, 2014

Sorry I was so inactive last week. I was sick.

Anyway, about defensiveness:
Personally, I am fine with getting unclear error messages in case I blatantly mis-use something. I don't think we have to care about a user passing a scalar to out. After all, the documentation clearly states that out has to be a numpy array. We don't test the types for most inputs. This is just the reality of working with dynamic typing.
As for a zero-length out, I think we should simply not read anything. It is not technically an error to read zero frames. If the user asks for zero frames, he should get zero frames. I just added a test for this in d331c89.

This is the combination of a few commits from #18, plus a few more
things.
See also #14.
@mgeier
Copy link
Contributor Author

mgeier commented May 29, 2014

Now that we've merged #36, I rebased this PR onto the current master (0ca0eb3).

I didn't add tests yet because I want to wait for #41.

Also, I'd like to wait until we find an agreement on the discussions in #33.

I quickly checked, though, by replacing the methods with the respective functions in the tests (at least in those that don't involve seek() or other SoundFile-specific stuff) and all tests pass.

  • I think we agreed that read and write should contain some basic documentation. In particular, we should mention that file may be any IO class, a file name or a file object and that negative frames defaults to the whole file.

Yes, we agreed.
However, I think it's soon enough if we improve the docstrings in the course of #27.
I don't want to tune the docstrings now just to re-write everything in #27.

  • We should move the import in read to the top of the file.

I put it there to avoid the slightly ugly from inspect import getargspec as _getargspec.
But if you prefer that, I'll change it, no problem.

Another, probably not very convincing argument would be that if only the low-level stuff is used, the import doesn't have to be done at all.

  • Maybe it would be cleaner for read and write to have fixed arguments (no **kwargs). I feel that these two functions really meant to be simple above all else,

This wouldn't be cleaner nor simpler.
It would just be a maintenance burden.

It would be nice to manipulate the signature automatically based on the method's signatures, but AFAIK this in only possible with severe hackery involving exec() or eval().
It could probably be done like in the decorator module (https://pypi.python.org/pypi/decorator) or like there:
http://python.dzone.com/articles/copying-python-function%E2%80%99s
http://stackoverflow.com/a/18626691/500098

But as soon as we have proper docstrings (#27), I think the need for that will go away.

so we don't lose much by limiting them to a subset SoundFile's functionality.

Oh no, please let's not!

It would be crazy to unnecessarily limit the features of the functions.

All parameters as they are implemented in this PR make perfect sense.
Removing any of them would just deliberately limit the usefulness of PySoundFile as a whole, and I wouldn't want that.

If you want full control, using SoundFile itself is the right thing to do anyway.

Sure, there still remain use cases where the methods have to be used.

But I guess a huge percentage of users will never have to call any of the methods directly.

And that's a good thing!

It also makes for a much cleaner documentation. All the SoundFile stuff can be moved to "advanced topics".

  • I think that open really is redundant. I don't see how it is any simpler than SoundFile.

Obviously it's redundant in the sense that it's literally return SoundFile(*args, **kwargs) and nothing more.

But it's just so much more Pythonic!

If you want to open a text file in Python, you use f = open('myfile.txt').
You don't use f = io.TextIOWrapper(...).

Therefore, I think it perfectly makes sense to prefer f = sf.open('myfile.wav') instead of f = SoundFile('myfile.wav').

I tried to make this point already (https://github.com/bastibe/PySoundFile/pull/14#issuecomment-36464595), sorry for the repetition.

@mgeier
Copy link
Contributor Author

mgeier commented May 31, 2014

I thought a bit about the **kwargs stuff and I've probably changed my mind ...

I still think spelling out all arguments would mean an annoying amount of duplication and it would make the whole thing definitely harder to maintain.
But against these disadvantages from the view of the implementation there is a real benefit for users if they see the full function signature.
It's hard to decide, but probably the benefits outweigh the shortcomings ...?

Another benefit would be that we could avoid the whole getargspec stuff.

... instead of using *args and **kwargs.
@bastibe
Copy link
Owner

bastibe commented Jun 6, 2014

Yes, I think this definitely makes the functions much easier to understand.

However, I disagree with 49f0e23. I think the documentation should be placed where the functionality is actually implemented, not in some wrapper function. Ideally, the docstring information really should be available to both the functions and methods. Is there any way we can programmatically append the docstring from the method to the function?

Re-use constructor docstring for open()
@mgeier
Copy link
Contributor Author

mgeier commented Jun 10, 2014

It's no problem to re-use the init-docstring for open(), I did this in 5ccfdb5.

I'm not sure if this is also a good idea for read(), because the two docstrings differ.
We could of course put the second (common) part of the docstring in a separate string and concatenate it with the two original docstrings, but then the common string is somewhere in the module namespace and may be confusing people.
It could also be a class attribute, but I guess this wouldn't really improve anything.

Also, I don't know if we can still separate the two parts nicely once we switch to NumPy-style docstrings (#27).
In the upcoming Sphinx documentation I'd like to move the read() function to the very top (because I think it will be the most commonly used feature). If we do that, I think it would also make sense to put the full documentation there and a much shorter docstring in SoundFile.read().

@bastibe
Copy link
Owner

bastibe commented Jun 10, 2014

I am uncomfortable with not documenting a function/method.

If I look up the documentation for a function/method (most likely through some functionality of my text editor), I want to see the documentation. A referral to some other place is nice, but my text editor can't parse that.

I'd rather have duplicate documentation than no documentation. Right now I get no documentation at all on SoundFile.read. I think it is fine to refer to another function/method for the full documentation, but the docstring should at least contain an abbreviated version of the documentation.

Frankly, I'd prefer to have a full copy of the docstring in both read and SoundFile.read. Documentation in general is chock full of repetition anyway, so this should not be a problem. Anything else is just making the user's life harder for no benefit at all.

@mgeier
Copy link
Contributor Author

mgeier commented Jun 11, 2014

If you don't see any documentation for SoundFile.read(), there's something wrong.
Currently I get:

Read a number of frames from the file.

Reads the given number of frames in the given data format from
the current read position. This also advances the read
position by the same number of frames.
Use frames=-1 to read until the end of the file.

For further keyword arguments see the module-level function
read().

If you insist I can duplicate the text which is valid for both method and function, I don't like it but I don't want to get too religious about that, either. There are other questions where if have a much stronger opinion.

But I would like to point out that, as far as I've seen, it is common practice in NumPy to refer somewhere else for documentation.

E.g. numpy.ndarray.reshape():

Returns an array containing the same data with a new shape.

Refer to `numpy.reshape` for full documentation.

See Also
--------
numpy.reshape : equivalent function

See also http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.reshape.html#numpy.ndarray.reshape vs
http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html

Of course in our case it's not really an equivalent function, but it's still a similar situation.

@bastibe
Copy link
Owner

bastibe commented Jun 15, 2014

The main issue I see here is that we will probably have a lot of users who will use either the functions or the methods, but not both. It might be confusing for them to have full documentation available for one but not the other.

I feel that if we were to skip the documentation on one of SoundFile.read and read, it should be read that gets the abbreviated documentation, simply because the actual implementation is in SoundFile.read. However, this might be a disservice to users, as they are probably more likely to use read than SoundFile.read.

In the end, I would vote for having full documentation everywhere. I guess there is some middle ground though. In particular, I would probably be fine to simplify the function documentation a bit and defer some rarely-used details to the methods. This would make simple usage simpler and power-users more powerful.

We are in for a documentation rewrite anyway though, so we can easily defer this to a later time.

@mgeier
Copy link
Contributor Author

mgeier commented Jun 15, 2014

OK, that sounds reasonable.
Let's see how the documentation looks once we change it to NumPy style (#27).

I duplicated the documentation in read() for the time being.

Can we merge now?

bastibe added a commit that referenced this pull request Jun 15, 2014
Add open(), read() and write() functions
@bastibe bastibe merged commit c79c314 into master Jun 15, 2014
@mgeier mgeier deleted the open-read-write branch June 15, 2014 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants