Several changes to the read() and write() methods #36
Replace ffi.new() with np.empty() and np.ascontiguousarray()

Remove dicts for readers/writers

read():
* reserve only as much memory as needed (if 'frames' is too large)
* check return value of sf_readf_*()

write():
* avoid copy if data has already the correct memory layout
* check return value of sf_writef_*()
* don't return the number of written frames!
  This is for symmetry with read() and it's redundant information anyway
Very cool! I like the way this is going! I think we are finally getting close to a merge.
I think this would clear up the structure of the code a bit. What do you think about this? Also, we should make sure to include tests before we merge (probably mostly cherry-picked from #37). |
What would be the advantage of fetching the function and not calling it? We could of course replace Probably I'm missing your point ...?
I'm not quite sure about your position regarding defensiveness. Your example with very detailed error reporting seems to contradict your statement there: https://github.com/bastibe/PySoundFile/pull/34#issuecomment-42779149. I think that I actually used too much error checking initially, and I'm trying to generally reduce the amount of error checking. What about something in between: combine all checks and raise only a single exception?
    if (data.ndim not in (1, 2) or
            data.ndim == 1 and self.channels != 1 or
            data.ndim == 2 and data.shape[1] != self.channels):
        raise ValueError("Invalid shape: %s" % repr(data.shape))
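The combined check can be tried in isolation. What follows is only a minimal sketch: it assumes a plain shape tuple in place of a NumPy array, and the helper name `check_shape` is hypothetical, not something from the PR.

```python
def check_shape(shape, channels):
    """Raise ValueError unless `shape` fits a file with `channels` channels.

    Hypothetical stand-alone helper; `shape` stands in for `data.shape`.
    """
    ndim = len(shape)
    # `and` binds tighter than `or`, so this reads as three alternatives.
    if (ndim not in (1, 2) or
            ndim == 1 and channels != 1 or
            ndim == 2 and shape[1] != channels):
        raise ValueError("Invalid shape: %s" % repr(shape))


check_shape((1024,), 1)      # mono data as a 1-D array: accepted
check_shape((1024, 2), 2)    # stereo data as a 2-D array: accepted
```

Every rejected shape ends in the same `ValueError`, so callers get one message format instead of a separate exception per condition.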
I think it makes more sense if we first merge #36 and from there you create a new PR with the changes to the tests in #37. The tests themselves have a few issues which I would prefer to work on in a separate PR authored by you. |
* add comment for documentation
* add check for array shape
* add an assertion
I added the shape check, now all tests from #37 pass except the one test for the |
Just a small measure of decoupling. I prefer methods that can be combined with other methods over methods that wrap other methods. But that's maybe not that important. The important thing is to separate reading/writing from error-checking.
I was thinking of
Right. Let's check as little as practical, but make sure we get the size right. We don't want to segfault the interpreter.
I disagree, but that's fine. Let's get this PR ready to merge as soon as possible. I am eagerly awaiting these features, and so are my colleagues. |
OK, I like functions that can be combined in different ways. Why should we split it if the functions are only used internally and only exactly two times in exactly the same combination?
I don't get it. I think as soon as we see that either part can be used somewhere else we should separate it into two functions, but not earlier. YAGNI comes to mind. If we were talking about public functions, this would probably be different, I'd often violate YAGNI, but we're talking about private functions here ... Or am I missing something?
I know that everything can be solved by using another level of indirection. And again, this isn't part of the public API, so there is no need to complicate things just to have marginally nicer string literals. I think
OK, great. |
Because a function should do one thing, and one thing only. This is maybe at the core of most of our arguments. I strongly believe that whenever you would have to put an "and" in the function description, that function should be split in two. By any other logic, all code should be lumped into one giant function with a myriad of different arguments that branch out into different functionalities (Matlab style). This is one of the lessons I learned from unit testing. Whenever functionality can be split into independent functions, one should do so, as it makes code easier to reason about.
This can be factored into two functions:
At the heart of this is the notion of splitting exception-raising code paths from memory-mutating code paths, so that they can be reasoned about independently. To make this even more useful, they can be tested independently. Code that does many things in an intertwined manner can be called complected, as coined by Rich Hickey, a hero of mine, in Simple Made Easy. Please split that function, then we're ready to merge. If you are still not convinced that this is the right way to go, do it for me. |
That sounds like a reasonable guideline.
That would be one extreme. I think we should aim for something in between.
That sounds reasonable, too.
That sounds to me like a single action with some preparation specific to this single action.
[...]
Yes they could be tested independently. But what does that improve?
Thanks for the link. I watched it and I really liked it! But the code in question is not at all complex. It's totally linear. I think we would complect the code if we added another function for no reason.
I'm indeed not convinced yet. But most probably I didn't split it the right way ... BTW: the "WIP" in the commit message means "Work In Progress", i.e. I don't want to merge this as is, I rather want to do some further work on it and then rewrite some commits. |
I indeed very much prefer it this way. In my first attempt to do this, I put the assertions into the
It improves my ability to reason about them independently from one another. Having two functions instead of one means that I can modify one without the other, and rest assured that my changes only alter the workings of one of them. Being able to test one without the other means that test failures can show me the origin of my error in more detail. Also, the code now contains two function invocations, one saying "this checks the dimensions of the array", the second saying "this puts new data into the array". This is a form of codified documentation. When I read the code for the I think that these benefits make splitting the function well worth it.
Call me crazy, but I prefer the simple
There are usually some parts which cannot reasonably be split into different functions without breaking the control flow. The question is how to cut these atomic blocks into separate functions. My rule of thumb is to put them in as few functions as can be unambiguously named. As long as a function name can aptly describe its contents, it probably does not contain too much code. In this particular case, the
In my eyes, it beats the alternative of lumping both error checking and reading/writing in one function. It took me quite a while to appreciate short functions. When I first learned programming, my teachers were scientists, and their code was atrocious. Monster functions everywhere! Matlab code, where function invocations were expensive things! But the more I grew, the more I learned to appreciate short, clearly named functions. These days, I try to keep function complexity to an absolute minimum. This usually means splitting them up in many smaller functions. Short functions are like short sentences. They are easy to understand. They are easy to reason about. They are easy to disassemble and reuse. If you want to do a refactor of some giant function, it is never clear which parts can be taken out of context and still work, and which parts are intrinsic to the function and its data structures. In a function that only delegates to a bunch of other small functions, this is always obvious. Thus, refactoring becomes simple. However, I have to admit that I didn't do a stellar job at this when I first wrote PySoundFile. At the time, it was a quick-and-dirty hack for playing with the CFFI. I probably should have taken the time back then to factor my code better. In the end, it took your invaluable feedback to make me realize this and hopefully inform future decisions about new code I write. Thank you for that!
Be sure to check out Hammock Driven Development as well. Those two talks changed the way I think about programming. |
Fair enough. I changed the commit to have a proper commit message: ab33e66 Is this all now? Are we able to merge? I still don't like the changes, but I'm OK with merging anyway.
Indeed, I'm quite sure on this one. Thanks for your detailed response. Do you happen to know some sources of information about that? I think in our concrete case it doesn't really matter, but in a larger codebase it would probably be dangerous to separate the error checking from the actual function, because one could simply forget to call the error-checking function. In a larger context, I think it's also dangerous insofar as something could move in between error checking and the actual function and change the state which was carefully checked before. But these are just thoughts, I don't have practical (bad) experience with that.
It's un-Pythonic and, in a more general context, it's unsafe because the dict could change between the check and the actual access. If you don't like the
    ffi_type = _ffi_types.get(array.dtype)
    if ffi_type is None:
        raise ValueError("Some Error")
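To make the comparison concrete, here is a minimal sketch of three ways to do the lookup. The contents of `_ffi_types` (plain strings as keys) and all three function names are assumptions for illustration, not PySoundFile's actual table or API.

```python
# Hypothetical lookup table; keys are dtype names as plain strings.
_ffi_types = {"float64": "double", "float32": "float", "int16": "short"}


def ffi_type_check_first(dtype):
    """Check, then access: two separate lookups on the dict."""
    if dtype not in _ffi_types:
        raise ValueError("Unsupported dtype: %r" % dtype)
    return _ffi_types[dtype]


def ffi_type_get(dtype):
    """Single lookup with dict.get(), then test the result."""
    ffi_type = _ffi_types.get(dtype)
    if ffi_type is None:
        raise ValueError("Unsupported dtype: %r" % dtype)
    return ffi_type


def ffi_type_try(dtype):
    """Single lookup, translating KeyError into ValueError."""
    try:
        return _ffi_types[dtype]
    except KeyError:
        raise ValueError("Unsupported dtype: %r" % dtype)
```

The last two variants touch the dict only once, so there is no window between the check and the access in which the dict could change.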
It's only guaranteed if it's also guaranteed that
Indeed.
OK. As you wish.
Exactly.
Thanks for the tip, I will. |
I really haven't ever used assertions before. But seeing how you used them and reading your comments made me realize their purpose. Very useful indeed!
Sadly, I don't. As an anecdote, one could maybe look at command line applications or web APIs, which always sanity-check/parse their arguments, and typically do so in some special function. I see our example in a similar light, since the point of the check-function is to sanity-check the array. I would not classify this as argument checking really, but that's just a feeling.
I think so! I'll run the code through the tests one more time, then I'll merge! Let's have a virtual beer for finally accomplishing this! |
Great, cheers! |
Although the issue is closed, I'd like to discuss a bit more about separating error checking from the rest of a function. I have the feeling that this point will come up again quite soon and it would be good to have some kind of guideline. I had a look at all functions/methods in PySoundFile (current master 7b0d0cb) regarding error checking. There are 3 methods just for error checking, which are nicely re-used: And there are 2 groups of functions which violate the "rule" that error checking should be separated:
I think those are perfectly fine, no need to extract any more error checking. Would it really be a good idea to create separate functions for the error checking of each of the above-mentioned functions? Very few of these error-checking functions could be re-used, so we would probably have to add around 8 new functions/methods! |
I am coming around on this. I am still thinking about it though. In general, I think we should split up functions as much as possible. We should probably even split some of the functions we already have into more, smaller, separable, orthogonal pieces (even if those pieces are only used once). In the case of splitting error checking from data usage, I might have been on the wrong track though. I'll think some more about it. |
See discussion in #34.