Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several changes, basis for discussion #18

Closed
wants to merge 33 commits into from
Closed

Several changes, basis for discussion #18

wants to merge 33 commits into from

Conversation

mgeier
Copy link
Contributor

@mgeier mgeier commented Mar 6, 2014

WARNING: this pull request is not meant to be merged as is!
It is rather meant as a basis for discussion and a proof-of-concept of some ideas, but for many things improvements are needed.
Documentation is lacking for most of the new features.

In-between new features, there are also a few minor bugfixes which can be cherry-picked if desired.

Some of the commits became obsolete because of following commits.
e.g. the function decode_number() (d72075f) is not needed anymore because the specialized types introduced in d31a484 provide a much better solution to the problem.
With that, most changes from 2833332 became obsolete, too.

If desired, I can provide individual pull requests for certain features (I can also add documentation, where lacking).

mgeier added 19 commits March 5, 2014 10:07
In new-style classes the base class __setattr__() has to be called
instead of inserting the value into self.__dict__.

See http://docs.python.org/2/reference/datamodel.html#customizing-attribute-access
Moving 'mode' right after 'name' and 'format' after 'subtype' and 'endian'.
This way most situations can be handled with positional arguments only, e.g.:

f1 = SoundFile('file1.wav')
f2 = SoundFile('file2.wav', 'rw')
f3 = SoundFile('file3.wav', 'w', 48000, 1, sf.PCM_24)
... and set default channels=None
Also, use named constants similar to the ones used in libsndfile (but
underscore-prefixed to not clutter the module namespace).
... and a module-level function get_format_info()
This is just an idea about how to retrieve meaningful strings for format
codes etc.

Most likely there are much better methods!
Also, the quite explicit name _add_formats_to_module_namespace() should
now explain what's going on.
This is relevant for tools that do instrospection, e.g. for
auto-completion in IDEs.
There is now a much better (= cleaner and faster) solution involving
FormatType, SubtypeType and EndianType.
@mgeier
Copy link
Contributor Author

mgeier commented Mar 8, 2014

@bastibe: If you want, you can cherry-pick the following commits (in the given order they should apply cleanly):

0d0f47f
2572742
6ff89db
0162248
c2e4612

For the following commits I'll add some documentation and then do a new pull request:

2948783
beb81b7
d8afffd
1d10fdf (do you like the names format_string and subtype_string?

These didn't apply cleanly, I'll also create a new pull request

e35911c (is _snd_strings really the right name?)
c6d1fd1
ebea05b
313d359

These are a little messy, I'll create a new pull request for these:

2833332
0ed878f
d26b657
aa75302
e835cd3
d31a484
ee736ab

And these are garbage:

d72075f
3785edd

@mgeier
Copy link
Contributor Author

mgeier commented Mar 8, 2014

I'd like to put a few names up for discussion ... please tell me if any of them can be improved ...

The components of file types are now described by format, subtype and endian.
format could be misleading, because in libsndfile the combination of the three is called "format".
OTOH, format can also hold all three OR'ed together and it will still work.
An alternative name would be major or major_format but I suspect this name would be less clear.

The name subtype is used in the comments of libsndfile.h, but we could also call it subformat ...? Again, I like the first variant more.
On the website, this is also called encoding ...

I'm also not sure about endian, because correctly, this would be called "endian-ness". But both endianness and endian_ness look strange to me.

The class attributes format_string and subtype_string seem a little verbose, but OTOH it should be quite clear what they mean.
We could also try format_info and subtype_info ...
I called the function to get these strings get_format_info() because libsndfile uses SFC_GET_FORMAT_INFO. However, I think we should not mix *_string and *_info and use only one of them.

The internal name _snd_strings doesn't seem right to me, maybe it should be called _str_types? In libsndfile the values are called SF_STR_* and the variable to sf_set_string() is called str_type.

And there is a little discrepancy between sample_rate and samplerate. Former is used in PySoundFile, latter in libsndfile. I think it would be good to always use samplerate.

I invented a little helper function _avoid_format_types() ... I don't like the name but I couldn't come up with something better.

EDIT:

In 3abdaf1 I changed this to

assert _raise_error_if_format_type(channels)

... I'm not sure if that's better ...

@mgeier
Copy link
Contributor Author

mgeier commented Mar 8, 2014

Regarding 2572742: I just realized that 'rw' isn't allowed in Python. I guess it should be replaced by 'r+' (and 'w+' should be disallowed).
See http://docs.python.org/3.3/library/functions.html#open

Probably we should also accept 'rb', 'wb' and 'r+b' for completeness?

Also 'a' and 'a+' should be handled (by positioning the read/write pointer after opening).

And 'x', 't' and 'U' should be disallowed, as I think they don't make sense for libsndfile.

Also, the order of the characters seems to be arbitrary ... so this should be also handled by PySoundFile.

All in all, the situation is much more complicated than I initially thought.

See also #19.

@bastibe
Copy link
Owner

bastibe commented Mar 10, 2014

Here's an idea: How about using the binary modes for raw input/output? That actually makes some kind of sense.

@mgeier
Copy link
Contributor Author

mgeier commented Mar 10, 2014

That's an interesting idea, but I fail to see how this makes sense ... can you please elaborate?

What do you mean by "raw"?
The data type SF_FORMAT_RAW or raw bytes in memory as opposed to wrapped into a np.ndarray?

@mgeier mgeier mentioned this pull request Mar 12, 2014
@mgeier
Copy link
Contributor Author

mgeier commented Mar 12, 2014

OK, I think I now get what you mean with "raw".
You mean the functions sf_read_raw() and sf_write_raw(), right?

I don't think the 'b'-flag is suitable for that.
First, all files handled by libsndfile are binary files, there are no text files among them. Therefore it's confusing why you should open a file without 'b' if it's a normal sound file.
Second, it's the wrong time. The mode is relevant when opening the file, the selection about raw or not raw should happen when calling read() or write().

Another confusing aspect is that the "normal" read/write function use frames, and the raw functions use bytes, therefore I wouldn't recommend putting the functionality into the same function.

I guess it would be better to make new methods read_raw(buffer, bytes) and write_raw(buffer, bytes).

Mostly PCM_16, as supported according to the table at
http://www.mega-nerd.com/libsndfile/
@mgeier
Copy link
Contributor Author

mgeier commented Mar 21, 2014

The functions should probably be called read_bytes() and write_bytes() to avoid confusion with the major type "RAW" which just means header-less.

Also, the parameter should probably still be a NumPy array ...?

And read_raw() should return the buffer instead of having it as an argument.

And finally, I'm having a really hard time imagining a use case for this ...

@bastibe
Copy link
Owner

bastibe commented Mar 25, 2014

read_bytes and write_bytes look good to me. They should simply return/consume bytes. I guess you could use those if you wanted bit-perfect data. You could also use it to very efficiently stream data from one file into another without the overhead of converting to and from Numpy.

@mgeier
Copy link
Contributor Author

mgeier commented Mar 28, 2014

I created an issue for that: #25.

BTW, I think the overhead caused by NumPy (using frombuffer()) is negligible in most cases.

Anyway, I don't see a reason why not to provide this feature.

@mgeier mgeier mentioned this pull request Apr 19, 2014
bastibe added a commit that referenced this pull request Apr 20, 2014
A lot of smaller improvements based on #18, #22, and #30.
mgeier added a commit that referenced this pull request Apr 20, 2014
This commit holds several changes which were discussed in #22 (which was
itself based on #18).

Change handling of file formats:
File formats are handled as three simple strings: format, subtype and endian.

Add "which" argument to seek().
This is needed because the combination with logical or (e.g. SEEK_SET | READ)
isn't possible with string formats.

Update frames counter on write().

Hide all non-API names in module namespace (by prefixing _).
This is relevant for tools that do introspection, e.g. for
auto-completion in IDEs.

Support sf_open_fd(), remove obsolete argument virtual_io
The first constructor argument can now be:
- a string (path to a file)
- an int (a file descriptor)
- a file-like object
The argument virtual_io is not needed, because this can be detected
automatically.
Closes #19

Get file extension from 'name' attribute of a file-like object.

Change public attributes of SoundFile class to properties.

Add 'name' property.

Add properties 'format_info' and 'subtype_info'.

Proper handling of dtype's in read() and write().

Add function default_subtype().
mgeier added a commit that referenced this pull request May 29, 2014
This is the combination of a few commits from #18, plus a few more
things.
See also #14.
@mgeier mgeier closed this Jun 18, 2014
@mgeier mgeier deleted the proof-of-concept branch June 18, 2014 08:13
mgeier added a commit that referenced this pull request Dec 17, 2014
This commit holds several changes which were discussed in #22 (which was
itself based on #18).

Change handling of file formats:
File formats are handled as three simple strings: format, subtype and endian.

Add "which" argument to seek().
This is needed because the combination with logical or (e.g. SEEK_SET | READ)
isn't possible with string formats.

Update frames counter on write().

Hide all non-API names in module namespace (by prefixing _).
This is relevant for tools that do introspection, e.g. for
auto-completion in IDEs.

Support sf_open_fd(), remove obsolete argument virtual_io
The first constructor argument can now be:
- a string (path to a file)
- an int (a file descriptor)
- a file-like object
The argument virtual_io is not needed, because this can be detected
automatically.
Closes #19

Get file extension from 'name' attribute of a file-like object.

Change public attributes of SoundFile class to properties.

Add 'name' property.

Add properties 'format_info' and 'subtype_info'.

Proper handling of dtype's in read() and write().

Add function default_subtype().
mgeier added a commit that referenced this pull request Dec 17, 2014
This is the combination of a few commits from #18, plus a few more
things.
See also #14.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants