
Better removal of bad reference stars #20

Merged
merged 14 commits into SNflows:devel from fix_daofind on Jan 18, 2021
Conversation

@emirkmo (Member) commented Jan 6, 2021

This PR implements a more robust removal of bad reference stars based on the 2D Gaussian fits we are doing. Using this removal process, we also get a better initial estimate of the FWHM, which results in better ePSFs. I tested this on several images, including Fids = 1243 and 1251 for 2019muj and 1791 for 2020yss.

There's quite a bit of test code in it still.

rhandberg and others added 4 commits January 5, 2021 16:45
Using the R-squared statistic as a test, reject reference stars based on the 2D Gaussian fits we perform. Also use this to get a better initial estimate of the FWHM.
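For illustration, a rejection test of this kind could be sketched as below. This is a minimal sketch, not the actual code in flows/photometry.py; `r_squared` and its arguments are hypothetical names.

```python
import numpy as np

def r_squared(observed, model):
    """Coefficient of determination between an image stamp and a fitted
    2D Gaussian model evaluated on the same pixel grid (hypothetical
    helper, assumed interface)."""
    finite = np.isfinite(observed)
    resid = observed[finite] - model[finite]
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((observed[finite] - np.mean(observed[finite]))**2)
    return 1.0 - ss_res / ss_tot
```

Stars whose stamp yields an R^2 below some threshold (e.g. 0.15, as discussed later in this thread) would then be rejected, and the FWHM estimated only from the survivors.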
@codecov-io commented Jan 6, 2021

Codecov Report

Merging #20 (2d6dcca) into devel (c13d6e8) will decrease coverage by 0.62%.
The diff coverage is 2.63%.


```
@@            Coverage Diff             @@
##            devel      #20      +/-   ##
==========================================
- Coverage   17.84%   17.21%   -0.63%     
==========================================
  Files          22       22              
  Lines        1457     1516      +59     
==========================================
+ Hits          260      261       +1     
- Misses       1197     1255      +58     
```

| Impacted Files | Coverage Δ |
| --- | --- |
| flows/photometry.py | 8.20% <2.63%> (-1.13%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c13d6e8...2d6dcca. Read the comment docs.

Currently uses pandas dataframe, we should rewrite this to remove the pandas dependency!
@emirkmo emirkmo mentioned this pull request Jan 6, 2021
@emirkmo (Member, author) commented Jan 8, 2021

This also corrects our oversight on proper motion! We now take the RA & Dec at the current observation epoch and convert that to pixel coordinates for the references, instead of their RA & Dec from our catalog, which is defined at epoch 2015.5 as per Gaia DR2.

@rhandberg (Contributor):

Hi @emirkmo, just made a few (mostly) cosmetic changes to the PR. I did change len(np.isfinite(...)) to np.sum(np.isfinite(...)) when calculating the number of good values in your big loop: the former always gives the length of the array, no matter how many values are finite.
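A quick illustration of the difference (example array made up for this note):

```python
import numpy as np

values = np.array([1.2, np.nan, 3.4, np.inf, 5.6])

# len() counts every element of the boolean array, finite or not:
n_all = len(np.isfinite(values))        # always 5, regardless of the data

# np.sum() counts only the True entries, i.e. the finite values:
n_good = int(np.sum(np.isfinite(values)))  # 3
```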

@rhandberg (Contributor):

Could you please check that it is still performing as you expect?

@emirkmo (Member, author) commented Jan 11, 2021

It's working well on the 5 test images! Good catch on the len(np.isfinite(...)). Weirdly, I get different nreferences values when using len than with np.sum. You're right that it should return all the True and False values, but it's not doing that for me. Maybe I need to reconsider using masked arrays in the masking, because I'm not used to them yet. Anyway, your suggested change works, and hopefully that solves any issues for now.

```python
logger.info("FWHM: %f", fwhm)
if np.isnan(fwhm):
    raise Exception("Could not estimate FWHM")

# If the minimum number of references is not found, take what we can get with an even weaker cut.
# TODO: Is this right, or should we grab rsq_min (or even weaker)?
min_references_now = min_references - 2
```
@rhandberg (Contributor):
Regarding this code block, aren't you in fact just picking the min_references_now largest rsquares? I don't think there is a need for a loop, which could potentially become an infinite loop.

@emirkmo (Member, author), Jan 11, 2021:

So the loop is there to try to get as many stars as possible: we soft-reset the minimum references and then try again with the lowest R^2 value. If that doesn't work, we decrease R^2 and minimum references until we finally reach enough. This should exit gracefully, since min_references and the R^2 minimum are lowered in each loop and will quickly go to zero; then nreferences will be >= min_references and it will exit.

It actually cannot loop forever, since min_references keeps decreasing toward zero and nreferences is finite (either zero or positive).
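The fallback loop described here could be sketched as follows (a minimal sketch with hypothetical names and step sizes, not the PR's actual code). Termination follows because the R^2 cut and the required star count both shrink toward zero each iteration:

```python
import numpy as np

def select_references(rsqs, min_references=4, rsq_start=0.15):
    """Relax both the R^2 cut and the required number of stars until
    enough references pass (or the cut reaches zero)."""
    rsq_min = rsq_start
    min_refs = min_references
    while True:
        mask = (rsqs >= rsq_min) & (rsqs < 1.0)
        if np.sum(mask) >= max(min_refs, 1) or rsq_min <= 0:
            return np.flatnonzero(mask)
        rsq_min -= 0.05                  # weaken the goodness-of-fit cut
        min_refs = max(min_refs - 1, 1)  # and accept fewer stars
```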

@emirkmo (Member, author):

We could try to exit early by checking that there are no R^2 values between 0 and 1, and raising an exception in that case. But in reality this outcome is just a few iterations away, and we don't really need to check for it explicitly.

@rhandberg (Contributor):

But since you are just lowering the limit until you get enough stars, it would be quicker to just calculate which limit would give you the required number of stars... Or just sort the stars by rsquare and pick the number you require.

@rhandberg (Contributor):

Something along the lines of this?

```python
mask = (masked_rsqs >= rsq_min) & (masked_rsqs < 1.0)
mask = np.argsort(masked_rsqs[mask])[::-1]  # Sort by largest rsquare (argsort is ascending, so reverse)
mask = mask[:min(min_references, len(mask))]  # Only take the min_references largest
```

@emirkmo (Member, author), Jan 11, 2021:

TL;DR: Let's check the minimum R^2 that gives us at least 2 stars. If it's >= 0.15, we are happy; if it's < 0.15 but >= 0.01, we should print a warning and try; if it's < 0.01, we should raise. Like you said, we don't need a loop for this.


Well, we are doing it greedily, meaning that we change what it means to get "enough stars" at the same time as we lower R^2.

If we do as you suggest (which is more efficient), then we need to preselect the minimum acceptable R^2 value. In my opinion, that depends on the minimum number of acceptable stars that value gives. (Perhaps AIC/BIC would work better here anyway.)

I set it up here so that we first try to get at least 2 stars at our previous minimum of R^2 = 0.2. (Higher is better; this all happens in the previous loop.)

Then, in this loop, we soft-reset and try to get at least 4 stars at R^2 = 0.15. If we don't, we lower to 3 stars and 0.08, then 2 and 0.01 (the worst-case minimum acceptable scenario, which we only want if 0.15 and 0.08 don't work).

(Maybe it's easier to pre-calculate, if you can code it that way?) But I think the pre-sort would work.

Otherwise, we could just go to the R^2 value that will give us 2 stars. But we want to know whether there is an R^2 value that will give us more than 2 stars.

@rhandberg (Contributor):

I guess it is just unclear to me what the goal is here...

If it is to get as many reference stars as possible which have a rsquare over 0.15, why not just take all the ones which are above 0.15 and then warn/stop if there are not enough?

If it is to get a set number of reference stars, then just sort by rsquare and pick the desired number of stars from the top of that list, and then warn/stop if there are not enough?
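The two loop-free alternatives suggested here could be sketched as below (a sketch with made-up data; names follow the snippet earlier in this thread):

```python
import numpy as np

masked_rsqs = np.array([0.92, 0.40, 0.10, 0.75, np.nan, 0.18])
rsq_min, n_wanted = 0.15, 3

# Option 1: keep everything above the threshold, then warn/stop if too few.
above = np.flatnonzero((masked_rsqs >= rsq_min) & (masked_rsqs < 1.0))
enough = len(above) >= n_wanted

# Option 2: sort by rsquare (descending, NaNs pushed to the end) and
# take the top n_wanted, then warn/stop if fewer are valid.
order = np.argsort(np.nan_to_num(masked_rsqs, nan=-np.inf))[::-1]
top = order[:n_wanted]
```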

@emirkmo (Member, author), Jan 11, 2021:

I agree it's hacky, but the goal is to filter out low-R^2 stars if possible; if not, go lower until we reach just enough stars. I think it's going to work better in the WCS-checking version I'm working on...

We basically want both high R^2 and a high number of stars, so we can't really set the limits independently. The more we lower R^2, the higher the chance of bad stars. But if we just apply a high R^2 cutoff, we risk missing stars we could have used and having too few for the zero-point statistics.

Just to illustrate:
- If we can have 20 stars, we don't really want to use R^2 < 0.5, to minimize the risk of bad stars.
- If we can have 5 stars, we don't really want to use R^2 < 0.2, to minimize the risk of bad stars while maximizing the number of stars.
- If we can only have 3 stars, we are willing to go even lower in R^2, and so on.
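This trade-off could be precomputed as a tiered schedule, along these lines (a hypothetical sketch; the tier values mix the numbers quoted in this thread and are not from the PR's code):

```python
# The more stars available, the stricter the R^2 cut we can afford.
TIERS = [(20, 0.5), (5, 0.2), (3, 0.08), (2, 0.01)]

def rsq_limit_for(n_candidates):
    """Pick the strictest R^2 limit whose star requirement we can meet,
    or None if there are too few stars even for the weakest cut."""
    for n_required, rsq_min in TIERS:
        if n_candidates >= n_required:
            return rsq_min
    return None
```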

I think we should do as you suggest with pre-computing. But this was my hacky solution to what I want to accomplish above. I'm sure there's a better way!

@emirkmo (Member, author):

What I suggest is adding "improve the false star rejection" as an issue and working on it when we get a chance. I think using some other goodness-of-fit criterion might work better for us. I'm not really willing to work more on this hacky solution at the moment, as I think it will become redundant when we do.

@rhandberg (Contributor):

I will merge this now, but we should definitely create a new issue for improving this further.

@rhandberg rhandberg merged commit 1d02f5c into SNflows:devel Jan 18, 2021
@emirkmo emirkmo deleted the fix_daofind branch January 19, 2021 13:31
@emirkmo (Member, author) commented Jan 19, 2021

I created #25 to track this. I think it will naturally come when we move to a more standard goodness of fit statistical test. Then we can use established values for that test to guide us.
