-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
worries about Image class #117
Comments
I agree with your recommendation about
I think pixel boundaries cannot be shared, because as soon as you make a subimage the parent's boundaries no longer match the original. We could easily throw an exception if the boundaries of an image that shares data are shifted (or if the scale is changed), but these might be useful operations as they allow us to change the coordinate system of a subimage (because there may be more than one useful coordinate system for a particular image). |
Gary's concerns are similar to the ones I raised in the comments for #114. I think the API for the current Image class is pretty confusing and not very intuitive, specifically because of issues like this with shallow copies. So I think we should really have an extensive review of what functionality and behavior we want for Image. Jim changed the behavior pretty significantly with his overhaul, and I don't think the new version was ever really code reviewed. In particular, I think most of the problems come from the desire to be able to return Images by value. Do we really need this functionality? Can't we just always provide an image object to be edited rather than ask for images to be created for us and returned? Then we could switch to using deep copies, which have the intuitive behavior that I think we want. (See my code example in the #114 comments for an example of where shallow copies are very confusing.) |
I'll add some food for thought about a possible API that I think might make more sense, which is to mimic the behavior of the
I think this three-class model captures all of what Jim was trying to accomplish with the |
Mike's API is feasible. I originally avoided it because I thought it added too much of a burden on classes and functions that want to operate on images:
Overall, I think I'm more comfortable with the way the current |
I should also say that I do definitely share the concerns Mike raise on #114 about the operators, and I think we should make fixing that a priority. I took a different approach to that in ndarray than Mike took with TMV, but IMO either one is better than what we have now from a usability standpoint. But we should consider that what we have right now might be better from a maintainability standpoint - I think any satisfactory solution would add a lot of code to the If we don't do a big overhaul, however, we should take a look at the operators in Python and make sure they aren't similarly confusing. If they are, we could consider just removing them, as we can just delegate to the NumPy operators. |
I am open to an overhaul as I think sufficient motivation has been presented that some change is necessary (though I am on the fence as to whether we need an overhaul, or whether minor changes like dealing with problematic functions and Python operators would suffice). I would just like to raise a question of timing: do you think this is something that needs to be done for the 2nd milestone, or if it could wait for a cleanup period after this milestone? My big concern is that if we plan on writing lots of code to handle multi-object outputs, that might have to be changed in a significant way if the |
So here are several possible levels of intervention:
I would vote for (1) or (2), they'll work with no or minimal change to current use. (4) would definitely work but is more effort than I think is needed here (and also Jim noted requires more effort on all the client codes). (3) would be trickier but also not worth the effort. I do think it is important to be able to return an Image() from a function instead of always having to create one to pass in as an argument and get it filled in. It makes the client code simpler. And one of the virtues of SBProfile is that it will choose appropriate pixel scale and image sizes for you, making this usage possible & convenient. |
Jim wrote:
For the first one, one solution is to define everything in terms of For the second one, you can either derive all three class from a common base class ( Gary wrote:
The automatic sizing happens for the pass by reference version of draw too. In fact, the entirety of the
Are we really saying that we can't just declare the image first and then pass it by reference? That does't strike me as a big imposition on the client code. |
I agree with Mike on the pass-by-reference issue. It makes the code a little less user-friendly, but also programmatically cleaner. The setup of things like pixel scale is something we can expect from the user to do. On the original issue how to deal with deep and shallow copies of |
Jim or Peter, Could you explain in detail why you think the python version of Image wouldn't work well with this kind of approach? It's not obvious to me what is different or hard. And I actually care more about the python layer being clear than the C++ layer, since that's the code that the end user is going to interact with. Both of you are more expert than I am wrt python, so I'm willing to believe you. But I'd like to understand the difficulty you're seeing. |
Mike, I do think we could make the So, here's a start at a concrete proposal somewhere between Gary's (2) and (4): The current That leaves us with two problems:
and here's how you'd call it in Python:
|
Jim, this sounds very reasonable. I would definitely prefer a deep assignment operator, and, as you pointed out, this would immediately get rid of some confusion. Regarding |
What is the problem with the overload in python? This seems like a case where the overload resolution is easy, since the signature is the same, just the type is changing. Since python uses duck typing, it will accept either one. Then you pass it along to the C++ draw function which actually does do the overload resolution. It would be even easier if we separate out the automatic resize functionality from the actual drawing step. Then the python version could be written as:
Then the SBProfile version only needs to be defined in terms of an ImageView. But the python version would preserve the automatic resize feature for real Image's, but not ImageView's. |
Hello all, I've not weighed in on this up until now, feeling (largely) totally unqualified to comment on most of the issues that have arisen :) It's been instructive though... I agree that assignment vs copy operator semantics (deep vs shallow) is not a terrible evil: the fact that it can be expressed relatively simply is already a big plus point. As regards |
On Apr 24, 2012, at 6:52 PM, Barnaby Rowe wrote:
Probably very little and easily worked around. I added that return value mostly because I could, and it had no disadvantage. |
Thanks Gary! So far my investigations (using, admittedly, a blunt tool like Mike: I'm afraid I don't have a definitive answer, but I know that the C++ wrapping in Boost.Python can be picky about getting the right type of argument supplied for each wrapped function. I'm probably expressing this poorly, but in order not to duplicate all the draw calling signatures I think we might have to think of a clever way to handle this in C++ without the Python wrappers "knowing" (or rather, caring) about the difference between an Image and an ImageView... No idea if this is possible... |
Mike's example would work fine; by "overloading isn't pretty in Python" I just meant "you have to write if statements with isinstance checks". It's not horrible by any means. But I don't see that adding a deep-semantics |
The value of having both Image and ImageView in both C++ and python is in clarity. You always know who owns the data, and you can very easily declare whether or not you want to duplicate data:
This seems like a useful distinction in both python and C++. I think the only difference between the C++ and python syntax is the meaning of
In python, this is always just a rename, so basically a shallow copy, whereas in C++ we can make it a deep copy. I do find the python behavior confusing, but it's a basic confusion of the python language, not something specific to this discussion. |
Hi all, As this is important and relevant to the Milestone Issues #38 and #82 (among others), and more generally, Rachel and I have 'upgraded' this into also being a Milestone Issue :) In light of this, we've also pushed the Milestone deadline back a week; we think this is important. We're going to email the code list about all this too. In the next couple of hours (I need to get the bus into work now ;) I will summarize the options presented here so far for a final discussion, poll and plan for implementation. All this will happen on this Issue discussion, so watch this space! Any further ideas, solutions hybrids etc. that people want to present, feel free and I'll make sure they get in the summary... |
OK, so here's an attempt at a summary of options that have gained some measure of popular support (using capitalized latin characters due to the mild profusion of other enumerators already on this list!). These seem to me to be the main ones, but please add more if you feel I've missed something that has been discussed: A) Gary's first "conservative" option:
B) Gary's second "conservative" option:
C) Gary's option 3, not discussed very much:
D) Basically Gary's option 4, combined with the first 'flavour' of Jim's proposed solution, which I think is being most strongly advocated by Mike based on his experience with TMV.This seems like it would be in the form of a As Jim says:
Example usage given by Mike would be:
Mike also points out that if we can separate out the automatic resize functionality from the actual drawing step of the
As Mike says:
In this design the copy constructor (shallow) and assignment operator (deep) don't have the same semantics, but many on the discussion seemed fine with that -- and indeed some were happy indeed in the change to the previous assignment semantics. Jim, however, says this:
Mike replies:
(see usage example above). E) The first option proposed by Jim which isn't (D), although much is shared with (D) such as making the bounds and pixel scale immutable, and additionally making the assignment operator deep rather than shallow. These two I will simply enumerate as (E) and (F) in what follows. The problem addressed by Jim's multiple options is that if all we have are the View-like Image classes, we don't have anything we can pass "empty" into a draw routine and have the draw routine resize and allocate it. The first solution is basically the approach advocated in (D) above. The second approach, which we are labelling (E) here, is to say (Jim):
F) Similar to (E), except that instead of giving
_Summary_ Once we know what we want to do, we'll decide on who will do it! :) |
As you might imagine my preference is D, since I think it has the clearest syntax. I think much of the confusion with the current implementation comes from having effectively two classes (
The concepts seems to track exactly with the same concepts for matrices that I dealt with for TMV, and choice of shallow/deep copy constructors and assignment operators that I use there have proven to be very intuitive. So I think this would also be true of a similar implementation for Image. And if we do choose this option, I'd be happy to take the lead on implementing it. Peace, On Apr 25, 2012, at 6:05 PM, Barnaby Rowe wrote:
|
Also unpredictably, my personal preference is for F. I think Mike's assessment is correct about there being three different concepts in C++, from the conceptual standpoint of C++'s ability to distinguish between shared ownership, transfer of ownership, and copying at a basic level (not to mention constness). But in Python, where ownership of the pixels is reference-counted (as I think it must be in any implementation that works with NumPy), and the language doesn't distinguish between pass-by-value and pass-by-reference (Python passes everything by reference), having different classes for different categories of ownership is actually more confusing. I think the following must be true for an version of option D that works with NumPy and/or behaves "naturally" in Python:
So from the Python perspective, separating images and views is just one interface for supporting deep copies and resize operations. It's the most intuitive and clearest API for those operations in C++, but I don't think it's nearly as intuitive in Python, where shallow copies are the norm, copy constructors aren't any more standard than having a All that said, I think D is a better solution for a pure C++ design, and it may be the best one for a hybrid C++/Python design in which we expect to do a lot of full-image manipulation (not just pixel-setting) in C++. Most of my arguments are predicated on the assumption that our C++ code will rarely need to allocate, resize, or deep-copy images, and when it does, option F (or E) would be sufficient. |
On Apr 25, 2012, at 9:38 PM, Jim Bosch wrote:
Mike |
On 04/25/2012 10:48 PM, Mike Jarvis wrote:
In both cases, it's not so much a use case as a safety guarantee. All That makes my concern sound like it's just an implementation question - But I also think invalidating any views - even by raising exceptions -
If there isn't any reference counting, then it makes sense to only But if you have reference counted ownership, any of the objects can do Jim |
On Apr 25, 2012, at 11:35 PM, Jim Bosch wrote:
Sorry to be a broken record about this, but I still don't see the use case where this would become an issue. Why would someone want to get a numpy array view of an Image and then expect it to be valid after the original Image goes out of scope? Mike |
On Apr 26, 2012, at 8:02 AM, Mike Jarvis wrote:
I don't know the technical python issues here so I won't intervene on that account. But I suspect, Mike, the issue is not whether someone would want to do this. The issue is whether they will do it, intentionally or not, and then if they will get a clear indication that they have made a mistake (instead of silently incorrect answers) and how to fix it. And also of course whether the code allows them to make the mistake in the first place. In other words, are we making a system that is robust to typical python coding practice and some level of malpractice by our future users who will not be as well-informed about the nature of the code as the coding group is. |
Agreed. But I still want to know the kind of use case that we need to either allow or guard against. Maybe it's due to my C++ mindset, but I just don't have a good idea of what kind of code Jim is concerned about here. |
Here's a slightly contrived example; I can't think of why we'd want to do this in the context of simulating galaxies, but I think it's a reasonable thing to be able to do with an image class and I certainly wouldn't want to rule it out. Say we wanted to load a FITS image from disk, but we really only care about a subset of the image:
There are other ways to accomplish the same thing that don't have view invalidation problems, of course, and some of them may be more efficient. But this is a not-unreasonable way to do it, and we don't want to force people writing Python to start worrying about ownership or memory management when deciding how to do something like this. |
…ses to make im3 = im1 + im2 and similar more efficient. (#117)
Merged in pull request (issue #128), closing. |
Sorry I was late getting on this morning. I hear that you gave my talk for me. Thanks. :) Mike |
There are a couple of things about the C++
Image
class that might be sources of future problems and I want to poll you all to see if the current behavior is viewed as a desired feature. The issue is that the pixel data of the class are shared whenever the image is copied or assigned, but once twoImage
s (or subimages) share pixels, there is no enforcement that they continue to share their information about pixel boundaries and scales. The oldImage
class shared the ancillary information with copies, not just the pixel array.Sharing the pixel array is essential, so that functions may return
Image
objects without placing huge arrays on the stack. But I am nervous about "partial sharing" of the objects' meaning, because it might be confusing to have two different objects in the code that have different interpretations of what the pixels mean. Or do you all consider this a feature, that you can have two Image objects that are assigning different pixel coordinates to the same data?In any case the
resize()
method is potentially troublesome as written, as it sometimes severs its relationship with the original pixels and sometimes does not. So when some routine copies your image and resizes it, you are not going to know whether this new object is still sharing pixels with your original one. I suspect that for sanity, the code should be changed so that it always allocates a new array if there are any other instances sharing the data (currently it does not if the array size is not changing).The
redefine()
method is described as "dangerous" - is there any case when this is needed? Would it be better to just make people get themselves a newImage
?The text was updated successfully, but these errors were encountered: