Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hsm code updates to use Image class / eliminate cfitsio / make python interface #38

Closed
rmandelb opened this issue Mar 20, 2012 · 43 comments
Assignees

Comments

@rmandelb
Copy link
Member

Now that we have our new Image class (#12), we will want to make a python interface to the hsm code using the Image class, and eliminate the cfitsio dependency from GalSim entirely.

I've tentatively assigned this to myself, as I am the one who knows the hsm code best, but I have a lot to learn about making a python interface and will probably be pinging Barney and Jim for help. Should be a good learning experience (and won't hold up our achieving the first project milestone if I'm slow).

@rmandelb
Copy link
Member Author

I hope to work on this issue this week, now that I have some more experience with various things and have a better idea what we want. Here's what I'm thinking (opinions would be appreciated from anyone, but I am particularly pinging @barnabytprowe , @TallJimbo , @rmjarvis ):

If you look at examples/MeasMoments.cpp and examples/MeasShape.cpp, you will see that most of the code in those files is (a) handling of FITS file read-in and (b) preparing various inputs (e.g., allocating images and reading them in) before calling a single key function that does all the work -- find_ellipmom_2 and general_shear_estimator in MeasMoments and MeasShape, respectively. Those two functions have two kinds of inputs - (1) some of their arguments are simple strings / floats / doubles, (2) others are classes that are defined in the hsm namespace, RectImage and ObjectData.

So one way to proceed would be as follows:
I could make C++ versions of find_ellipmom_2 and general_shear_estimator (with names that are more consistent with our naming conventions) that are nearly the same as the current ones, but that take as arguments Images (of various types - would be template functions) plus various quantities expressed as floats, strings, whatever. Those C++ functions would then repackage the information in the Images and other arguments to be in the form that the existing find_ellipmom_2 and general_shear_estimator want, as RectImage and ObjectData, and would then call those functions. This is rather clunky, but it would avoid my having to rewrite the guts of the various functions in Psfcorr.cpp to handle Images, which I think is a plus (not only because I'm lazy, but because I think that there is plenty to do in this project that is more urgent than this). I would write wrappers for these 2 template functions so that we can call them from python -- they would return an array of values, similar to how MeasShape and MeasMoments print several numbers to stdout. If we want to eliminate the cfitsio dependency from GalSim then we can disable examples/MeasMoments.cpp and examples/MeasShape.cpp once these functions exist (and, for those that still want command-line access to these functions, I could write analogous pythong scripts, examples/MeasMoments.py and examples/MeasShape.py).

I thought of other ways to go, but this seems like the simplest / cleanest in terms of allowing easy access to the HSM functionality from python without having to rewrite large amounts of C++. It does mean more overhead due to all inputs having to be manipulated into RectImages and ObjectData each time the program is run.

Thoughts?

@TallJimbo
Copy link
Member

This seems like a good overall plan to me. I don't think we need to worry about the extra overhead, and I agree that it would be a good idea not to spend a lot of time modifying the core implementation.

@rmjarvis
Copy link
Member

The only tweak I would make to this proposal is the return value of the functions. A bare array of numbers isn't very easy to use. e.g. One of them has e1,e2 at positions 4,5 and the other at 1,2. So I would suggest returning a simple structure that can be queried with things like result.e1 and such rather than result[4].

But I do agree that a wrapper is better than going into the guts to change the underlying implementation. The surface layer shouldn't be too much overhead. And in addition to the reasons you gave, it lowers the chance that you will introduce a bug in the conversion.

@TallJimbo
Copy link
Member

+1 on returning a struct.

In fact, it might make sense to return a galsim::Shear object for the ellipticities in that case.

@rmandelb
Copy link
Member Author

Yes, I was mostly worried about bugs that might be introduced in such a conversion! Rewrite = not too time-consuming, testing and fixing bugs = potentially an infinite amount of time. :)

You make a good point about return values. Would it make sense to define a class that carries around the information about objects, which is defined in C++ and wrapped for python access? I would have to think a little more about the design of such a class - it has the potential to be complicated since there are moments and shapes of the observed object, and then there are PSF-corrected quantities.

@rmandelb
Copy link
Member Author

Oops, I didn't see Jim's comment when I wrote mine. galsim::Shear has a lot of what I want, but I think we also want a way of carrying around observed and PSF-corrected quantities (for some objects we might just get moments in which case only the observed quantities would be known, for others we might carry out PSF-correction and save the PSF-corrected quantities). This is why I was thinking of defining something new. Or, as an alternate, we could define a class that just includes two galsim::Shear objects, one for observed and one for PSF-corrected quantities?

@rmjarvis
Copy link
Member

I think it makes sense to have a single class that has all the stuff that you want to return with descriptively named elements. But you could have result.obs_shear and result.corr_shear (or whatever) both be Shears. And if there is anything else in addition that you want to return (e.g. the moments), you can have those in the struct as well.

@barnabytprowe
Copy link
Member

The plan sounds very sensible - I agree with Mike that I don't think an extra Image > RectImage / ObjectData conversion overhead will be crippling.

The a container for two galsim::Shear objects sounds like a neat, minimal-extra-code solution to a handy output structure for HSM.

@barnabytprowe
Copy link
Member

(& Mike makes a good point about the names being descriptive... They could still both be galsim::Shear objects).

@rmandelb
Copy link
Member Author

Yes, sounds like a plan! Thanks for the useful discussion.

@rmandelb
Copy link
Member Author

A question:

Currently, the functions find_ellipmom_2 and general_shear_estimator require the specification of some arguments that we probably don't want the general python user to have to specify... usually a default would suffice. I need to figure out how to handle this for the C++ functions that call them and that I will be wrapping for access via python. One option would be to overload those C++ functions, with versions that specify all parameters and others that specify a bare minimum (the Images only). Is there some other option that would be preferable?

In case anyone is wondering what I mean, I just pushed a branch #38 that has modifications to include/galsim/hsm/Psfcorr.h which show a rather simple form for these functions. I would be overloading with versions that take additional arguments beyond what is already there. Also, that file contains the definition for the class that we discussed for the outputs of these HSM functions.

@TallJimbo
Copy link
Member

If you can specify the defaults in the form of default arguments, that's slightly easier to wrap for Python (and it feels more natural, because pure-Python functions don't support overloading). But if you need to have multiple overloads of the same function, that's also quite doable.

A side note: You probably want to use either std::string or const char * instead of simple char * for strings. I get the impression all of the strings here are literals defined in the code, so there's no worries about memory management, which would mean const char * would work fine. But if there's any question about that, use std::string (and use .c_str() to get a char const * from it to pass to the C code). Many compiler will warn about string literals in non-const char * variables.

@rmjarvis
Copy link
Member

I think what you want is to use default arguments. e.g. char * shear_est = "REGAUSS". The only down side with these is that if you only want to override one of the later arguments, you also need to specify all the previous ones as well. So they're not quite as nifty as python's kwargs.

If there will be a lot of arguments that people might want to specify, it might be cleaner to have a struct for the input parameters as well with the defaults all set in the constructor, so people would only redefine the ones that they want to change.

@TallJimbo
Copy link
Member

Mike's reply made me think of one more thing: if you use default arguments in C++, we can turn them into Python kwargs at the wrapper stage, so from Python you will be able to pass them in any order and specify later ones while leaving earlier ones defaulted.

@rmandelb
Copy link
Member Author

Wow, you guys are quick, thanks!

  1. yes, default arguments sound like what I want. I was worried about the issue of having to specify all of them if I want to change some later ones, but it occurs to me now that there are some that people are much more likely to want to change than others, so I will make sure to put those first. The fact that they can be turned into kwargs at the wrapper stage is very nifty; can you point me to an example of that in one of the existing wrappers so that I can emulate it here?

  2. you're right about the strings, will use const char *

@rmjarvis
Copy link
Member

Some other (minor) style suggestions.

  • You want at least const char*, not char* for the shear_est argument. Or possibly const std::string& to be more "C++-ish", although the functionality difference in this case is pretty non-existant. But the value isn't changed by the function, so char* alone implies the wrong behavior.
  • I would recommend using struct rather than class for HSMShapeData. The only difference between a struct and a class is that a struct starts out with an implicit public: and class starts with an implicit private:. But common practice is for structs to be things that are a collection of values that are intended to be accessed directly, whereas classes usually hide their data members and access everything through methods. Since HSMShapeData is really the former behavior, I'd switch to calling it a struct.
  • The default values for the elements of HSMShapeData are probably superfluous, since they should all be defined by your two functions anyway. So no need for a default.

@TallJimbo
Copy link
Member

Here's an example of C++ default arguments turned into Python kwargs (downside is that you have to repeat the defaults and keep them in sync):

C++
https://github.com/GalSim-developers/GalSim/blob/master/include/galsim/Shear.h#L193

Python Wrappers:
https://github.com/GalSim-developers/GalSim/blob/master/pysrc/Shear.cpp#L133

@rmandelb
Copy link
Member Author

  • struct vs. class: I realized I was making a class act just like a struct in terms of public vs. private members, which is kind of silly, but I wasn't sure if structs could be wrapped and I knew that classes can be. (Was going to be on my list to ask Jim once I got to the stage of having actually code to wrap!)
  • defaults for HSMShapeData: I wanted to have defaults because that way the moments-only function (i.e., no PSF-correction) doesn't have to define values for anything having to do with shape measurement, i.e. CorrectionMethod and CorrectionStatus. I agree that having a default for MomentStatus is pointless.

@rmjarvis
Copy link
Member

In that case, I would recommend moving the defaults into a constructor. I think having them at the declaration line will fail on at least some compilers.

Oh, and I think we've been using lower-case for variables and using underscores between words rather than camelCase.

Although looking through the code, it seems we haven't been very consistent about this either way. We should probably decide on a style for this and try to stick to it. e.g. The constructor to OpticalPsf has circular_pupil (what I thought our style was), interpolantxy (lowercase, but no underscore), and padFactor (camelCase). @barnabytprowe !!! :)

We have been pretty good I think at sticking to upper-case CamelCase for class names and lower-case camelCase for functions and methods. Just the style for variables seems to be a bit of a mix.

@rmandelb
Copy link
Member Author

  1. Okay, will move defaults into a constructor, no problem. (Or, perhaps I'm just being lazy about not wanting to set CorrectionMethod and CorrectionStatus when I don't carry out PSF correction!)

  2. As for style, I have

HSMShapeData FindAdaptiveMom(Image <T> object_image);
HSMShapeData EstimateShearHSM(Image<T> gal_image, Image<T> PSF_image, char *shear_est);  

which I think is okay, isn't it, or perhaps the functions would be better as findAdaptiveMom and estimateShearHSM? (I mean, ignoring the modifications I said I would make - I'm only asking about naming conventions for functions and classes vs. for the args) Were you concerned about the members of HSMShapeData, which are

        galsim::Shear ObservedMoments;
        galsim::Shear CorrectedMoments;
        char *CorrectionMethod="None";
        int MomentStatus = -1;
        int CorrectionStatus = -1;

?? I confess that here I was following one particular example, and I see that we're quite inconsistent throughout... not that we're going to go through and change tons of existing code, but I agree we should choose a convention for code that gets written from now onwards.

@rmjarvis
Copy link
Member

I was referring to the members of HSMShapeData. For free functions, I tend to prefer the initial capital as you had it, which is different from the LSST standard we were nominally following (poorly apparently).

So how should we decide a style to follow? Vote? You and Barney talk and dictate something?

I'll just put out my own preference as a straw man for discussion:

  • Classes (and structs) are CamelCase
  • Free functions are CamelCase
  • Member functions are camelCase
  • Public variables (including function parameters and kwargs) are lower_case
  • Private variables are _lower_case
  • Local scope variables/functions can be whatever the author prefers.

@barnabytprowe
Copy link
Member

Ooops yes, that is bad of me Mike; I think I do like the underscore at the Python level, I'll change those...

On 24 Apr 2012, at 11:14, Rachel Mandelbaum wrote:

  1. Okay, will move defaults into a constructor, no problem. (Or, perhaps I'm just being lazy about not wanting to set CorrectionMethod and CorrectionStatus when I don't carry out PSF correction!)

  2. As for style, I have

HSMShapeData FindAdaptiveMom(Image <T> object_image);
HSMShapeData EstimateShearHSM(Image<T> gal_image, Image<T> PSF_image, char *shear_est);  

which I think is okay, isn't it, or perhaps the functions would be better as findAdaptiveMom and estimateShearHSM? (I mean, ignoring the modifications I said I would make - I'm only asking about naming conventions for functions and classes vs. for the args) Were you concerned about the members of HSMShapeData, which are

       galsim::Shear ObservedMoments;
       galsim::Shear CorrectedMoments;
       char *CorrectionMethod="None";
       int MomentStatus = -1;
       int CorrectionStatus = -1;

?? I confess that here I was following one particular example, and I see that we're quite inconsistent throughout... not that we're going to go through and change tons of existing code, but I agree we should choose a convention for code that gets written from now onwards.


Reply to this email directly or view it on GitHub:
#38 (comment)

Barnaby Rowe

Jet Propulsion Laboratory
California Institute of Technology
MS 169-237
4800 Oak Grove Drive
Pasadena CA 91109
United States of America

Department of Physics & Astronomy
University College London
Gower Street
London WC1E 6BT
United Kingdom

@barnabytprowe
Copy link
Member

+1 for Mike's proposal, Rachel if you agree shall we put that in credo.txt?

On 24 Apr 2012, at 11:35, Mike Jarvis wrote:

I was referring to the members of HSMShapeData. For free functions, I tend to prefer the initial capital as you had it, which is different from the LSST standard we were nominally following (poorly apparently).

So how should we decide a style to follow? Vote? You and Barney talk and dictate something?

I'll just put out my own preference as a straw man for discussion:

  • Classes (and structs) are CamelCase
  • Free functions are CamelCase
  • Member functions are camelCase
  • Public variables (including function parameters and kwargs) are lower_case
  • Private variables are _lower_case
  • Local scope variables/functions can be whatever the author prefers.

Reply to this email directly or view it on GitHub:
#38 (comment)

Barnaby Rowe

Jet Propulsion Laboratory
California Institute of Technology
MS 169-237
4800 Oak Grove Drive
Pasadena CA 91109
United States of America

Department of Physics & Astronomy
University College London
Gower Street
London WC1E 6BT
United Kingdom

@rmandelb
Copy link
Member Author

Hmmm, you're right, I wasn't distinguishing between free vs. member functions. In that case, I could leave the functions and args the same, and just change the members of HSMShapeData.

I just checked credo.txt and unless I've missed something, we are rather vague on this where we should be explicit. Your proposal seems okay to me, but let's get some input from others. And we should keep in mind the rule we already agreed on, to violate camelCase (or CamelCase) for abbreviations like PSF, CCD, etc. (so Psfcorr.* should be PSFCorr.*, will fix on this branch). To add to your list above, file names should be CamelCase for consistency with SBProfile conventions... I think this is something that we did already agree on explicitly, but it's good to have everything listed all together.

@rmandelb
Copy link
Member Author

Sorry, Barney, I didn't see your message before I posted mine. I can make that change in credo.txt on master in a few minutes...

rmandelb pushed a commit that referenced this issue May 6, 2012
rmandelb pushed a commit that referenced this issue May 6, 2012
rmandelb pushed a commit that referenced this issue May 6, 2012
rmandelb pushed a commit that referenced this issue May 7, 2012
rmandelb pushed a commit that referenced this issue May 7, 2012
rmandelb pushed a commit that referenced this issue May 7, 2012
rmandelb pushed a commit that referenced this issue May 7, 2012
rmandelb pushed a commit that referenced this issue May 7, 2012
rmandelb pushed a commit that referenced this issue May 7, 2012
rmandelb pushed a commit that referenced this issue May 8, 2012
rmandelb pushed a commit that referenced this issue May 8, 2012
rmandelb pushed a commit that referenced this issue May 8, 2012
rmandelb pushed a commit that referenced this issue May 8, 2012
#38 - python interface for moments and shape measurement code
@rmandelb rmandelb closed this as completed May 8, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants