Better removal of bad reference stars #20
Conversation
Using the R-squared statistic as a test, reject reference stars based on the 2D Gaussian fits we perform. Also use this to get a better initial estimate of the FWHM.
Codecov Report
@@ Coverage Diff @@
## devel #20 +/- ##
==========================================
- Coverage 17.84% 17.21% -0.63%
==========================================
Files 22 22
Lines 1457 1516 +59
==========================================
+ Hits 260 261 +1
- Misses 1197 1255 +58
Currently uses a pandas DataFrame; we should rewrite this to remove the pandas dependency!
This also corrects our oversight on proper motion! We now take the RA & Dec at the current observation epoch and convert that to pixel coordinates for the references, instead of using their RA & Dec from our catalog, which is defined at epoch 2015.5 as per Gaia DR2.
Hi @emirkmo, I just did a few (mostly) cosmetic changes to the PR. I did do a change from using
Could you please check that it is still performing as you expect?
It's working well on the 5 test images! Good catch. I get different nreferences values when using len, just like with np.sum. But you're right that it should give all True and False values. It's not doing that for me. Maybe I need to consider removing the masked arrays from the masking, because I'm not used to them yet. Anyway, your suggested change works and hopefully that solves any issues for now.
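For reference, here is a small illustration (my own example, not code from the PR) of why `len` and `np.sum` can disagree when counting accepted stars on a masked boolean array:

```python
import numpy as np

# A masked R^2 array where one fit failed (NaN becomes a masked entry).
rsqs = np.ma.masked_invalid([0.9, 0.2, np.nan, 0.7])

mask = rsqs >= 0.5           # masked boolean array; the NaN entry stays masked
print(len(mask))             # 4 -> total entries, masked or not
print(int(np.sum(mask)))     # 2 -> counts only unmasked True values
```

So `len` counts every element of the mask, while `np.sum` counts only the unmasked `True` entries, which is usually what "number of accepted references" means here.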
```python
logger.info("FWHM: %f", fwhm)
if np.isnan(fwhm):
    raise Exception("Could not estimate FWHM")

# If the minimum number of references is not found, take what we can get with an even weaker cut.
# TODO: Is this right, or should we grab rsq_min (or even weaker)?
min_references_now = min_references - 2
```
Regarding this code block: are you in fact not just picking the min_references_now largest R-squared values? I don't think there is a need for a loop, which could potentially go into an infinite loop?
So the loop is there to try to get as many stars as possible: we soft-reset the minimum references and then try again with the lowest R^2 value. If that doesn't work, we decrease R^2 and the minimum references until we finally reach enough. This should exit gracefully, since min_references and the R^2 minimum are lowered in each loop and will quickly go to zero. Then nreferences will be >= min_references and it will exit.
It actually cannot go infinite, since min_references keeps decreasing towards zero, and nreferences is finite (either zero or positive).
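A minimal sketch of the loop logic described above. The names `rsq_min` and `min_references` follow the discussion; the function name and the exact step size for lowering R^2 are my own assumptions, not the PR's actual values:

```python
import numpy as np

def count_accepted(rsqs, rsq_min):
    """Count stars whose R^2 passes the current cut (R^2 in [rsq_min, 1))."""
    rsqs = np.asarray(rsqs)
    return int(np.sum((rsqs >= rsq_min) & (rsqs < 1.0)))

def relax_until_enough(rsqs, rsq_min=0.15, min_references=4):
    """Lower both thresholds together until enough stars pass.

    Terminates because min_references decreases by 1 every iteration,
    so the loop runs at most min_references times.
    """
    while count_accepted(rsqs, rsq_min) < min_references and min_references > 0:
        rsq_min = max(rsq_min - 0.07, 0.0)  # assumed step size, for illustration
        min_references -= 1
    return rsq_min, min_references
```

With only two decent stars available, the thresholds walk down until the requirement is met, e.g. `relax_until_enough([0.9, 0.1])` ends with `min_references == 2`.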
We could try to exit early by checking that there are no R^2 values between 0 and 1, and raising an exception in that case. But in reality this outcome is just a few iterations away, and we don't really need to check for it explicitly.
But since you are just lowering the limit until you get enough stars, it would be quicker to just calculate which limit would give you that required number of stars... or just sort the stars by R-squared and pick the number you require.
Something along the lines of this? (Note that np.argsort sorts ascending, so we reverse it to put the largest R-squared values first.)

```python
mask = (masked_rsqs >= rsq_min) & (masked_rsqs < 1.0)
mask = np.argsort(masked_rsqs[mask])[::-1]  # Sort by largest rsquare first
mask = mask[:min(min_references, len(mask))]  # Only take the min_references largest
```
TL;DR: Let's check the minimum R^2 that gives us at least 2 stars. If it's >= 0.15 we are happy; if it's < 0.15 but >= 0.01, we should print a warning and try; if it's < 0.01, we should raise. Like you said, we don't need a loop for this.
Well, we are doing it greedily, meaning that we change what it means to get "enough stars" at the same time as we lower R^2.
If we do as you suggest (which is more efficient), then we need to preselect the minimum acceptable R^2 value. In my opinion, that depends on the minimum acceptable number of stars said value gives. (Perhaps AIC/BIC would work better here anyway.)
I set it up here so that we first try to get at least 2 stars at our previous minimum R^2 = 0.2. (But higher is better; this all happens in the previous loop.)
Then, in this loop, we soft-reset and try to get at least 4 stars from R^2 = 0.15. If we don't, we lower to 3 stars and R^2 = 0.08, then 2 and 0.01 (the worst-case minimum acceptable scenario, which we only want if 0.15 and 0.08 don't work).
(Maybe it's easier to pre-calculate if you can code it that way?) But I think the pre-sort would work. Otherwise, we could just go to the R^2 value that will give us 2 stars. But we want to know whether there is an R^2 value that will give us more than 2 stars.
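The tiered scheme described in this thread (4 stars at R^2 >= 0.15, then 3 at 0.08, then 2 at 0.01) can be written without a loop over shrinking thresholds, just a walk down a fixed tier list. This is a sketch under those assumptions; the function name and signature are mine, not the PR's:

```python
import numpy as np

# (min_stars, rsq_min) tiers, strictest first; values taken from the discussion.
TIERS = ((4, 0.15), (3, 0.08), (2, 0.01))

def select_references(rsqs, tiers=TIERS):
    """Return indices of accepted stars from the first tier that yields enough."""
    rsqs = np.asarray(rsqs)
    for min_stars, rsq_min in tiers:
        mask = (rsqs >= rsq_min) & (rsqs < 1.0)
        if mask.sum() >= min_stars:
            return np.flatnonzero(mask)
    raise RuntimeError("Not enough reference stars with acceptable R^2")
```

For example, `select_references([0.9, 0.5, 0.2, 0.05])` fails the 4-star tier but satisfies the 3-star tier, returning indices `[0, 1, 2]`.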
I guess it is just unclear to me what the goal is here...
If it is to get as many reference stars as possible with an R-squared above 0.15, why not just take all the ones above 0.15 and then warn/stop if there are not enough?
If it is to get a set number of reference stars, then just sort by R-squared, pick the desired number of stars from the top of that list, and then warn/stop if there are not enough?
I agree it's hacky, but the goal is to filter out low-R^2 stars if possible, and if not, to go lower until we reach just enough stars. I think it's going to work better in the WCS-checking version I'm working on...
We basically want both a high R^2 and a high number of stars, so we can't really set the limits independently. The more we lower R^2, the higher the chance of bad stars. But if we just apply a high R^2 cutoff, we risk missing stars we could have used, and having too few for the zero-point statistics.
Just to illustrate:
- If we can have 20 stars, we don't really want to use R^2 < 0.5, to minimize the risk of bad stars.
- If we can have 5 stars, we don't really want to use R^2 < 0.2, to minimize the risk of bad stars while maximizing the number of stars.
- If we can only have 3 stars, we are willing to go even lower in R^2, and so on.
I think we should do as you suggest with pre-computing, but this was my hacky solution to what I wanted to accomplish above. I'm sure there's a better way!
What I suggest is adding "improving the false-star rejection" as an issue and working on it when we get a chance. I think some other goodness-of-fit criterion might work better for us. I'm not really willing to work more on this hacky solution at the moment, as I think it will become redundant when we do so.
I will merge this now, but we should definitely create a new issue for improving this further.
I created #25 to track this. I think it will come naturally when we move to a more standard goodness-of-fit statistical test. Then we can use established values for that test to guide us.
This PR accomplishes a more robust removal of bad reference stars based on the 2D Gaussian fits we are doing. Using this removal process, we also get a better initial estimate of the FWHM, which results in better ePSFs. I tested this on several images, including Fids 1243 and 1251 for 2019muj and 1791 for 2020yss.
There's still quite a bit of test code in it.