Better removal of bad reference stars #20
Conversation
Using the R-squared statistic as a test, reject reference stars based on the 2D Gaussian fits we perform. Also use this to get a better initial estimate of the FWHM.
Codecov Report
@@ Coverage Diff @@
## devel #20 +/- ##
==========================================
- Coverage 17.84% 17.21% -0.63%
==========================================
Files 22 22
Lines 1457 1516 +59
==========================================
+ Hits 260 261 +1
- Misses 1197 1255 +58
Currently uses a pandas DataFrame; we should rewrite this to remove the pandas dependency!
This also corrects our oversight on proper motion! We now take the RA & Dec at the current observation epoch and convert that to pixel coordinates for the references, instead of using their RA & Dec from our catalog, which is defined at epoch 2015.5 as per Gaia DR2.
Hi @emirkmo, I just did a few (mostly) cosmetic changes to the PR. I did do a change from using
Could you please check that it is still performing as you expect?
It's working well on the 5 test images! Good catch. I get different nreferences values when using len, just like with np.sum. But you're right that it should give all True and False values. It's not doing that for me. Maybe I need to consider removing the masked arrays from the masking, because I'm not used to them yet. Anyway, your suggested change works and hopefully that solves any issues for now.
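For reference, here is a small illustration (my own example, not code from the PR) of why `len` and `np.sum` can disagree when counting accepted stars on a masked boolean array:

```python
import numpy as np

# A masked R^2 array where one fit failed (NaN becomes a masked entry).
rsqs = np.ma.masked_invalid([0.9, 0.2, np.nan, 0.7])

mask = rsqs >= 0.5           # masked boolean array; the NaN entry stays masked
print(len(mask))             # 4 -> total entries, masked or not
print(int(np.sum(mask)))     # 2 -> counts only unmasked True values
```

So `len` counts every element of the mask, while `np.sum` counts only the unmasked `True` entries, which is usually what "number of accepted references" means here.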
```python
logger.info("FWHM: %f", fwhm)
if np.isnan(fwhm):
    raise Exception("Could not estimate FWHM")

# If the minimum number of references is not found, take what we can get with an even weaker cut.
# TODO: Is this right, or should we grab rsq_min (or even weaker)?
min_references_now = min_references - 2
```
Regarding this code block: are you in fact not just picking the min_references_now largest R-squared values? I don't think there is a need for a loop, which could potentially go into an infinite loop?
So the loop is there to try to get as many stars as possible: we soft-reset the minimum references and then try again with the lowest R^2 value. If that doesn't work, we decrease R^2 and the minimum references until we finally reach enough. This should exit gracefully, since min_references and the R^2 minimum are lowered in each loop and will quickly go to zero. Then nreferences will be >= min_references and it will exit.
It actually cannot go infinite, since min_references keeps decreasing towards zero, and nreferences is finite (either zero or positive).
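A minimal sketch of the loop logic described above. The names `rsq_min` and `min_references` follow the discussion; the function name and the exact step size for lowering R^2 are my own assumptions, not the PR's actual values:

```python
import numpy as np

def count_accepted(rsqs, rsq_min):
    """Count stars whose R^2 passes the current cut (R^2 in [rsq_min, 1))."""
    rsqs = np.asarray(rsqs)
    return int(np.sum((rsqs >= rsq_min) & (rsqs < 1.0)))

def relax_until_enough(rsqs, rsq_min=0.15, min_references=4):
    """Lower both thresholds together until enough stars pass.

    Terminates because min_references decreases by 1 every iteration,
    so the loop runs at most min_references times.
    """
    while count_accepted(rsqs, rsq_min) < min_references and min_references > 0:
        rsq_min = max(rsq_min - 0.07, 0.0)  # assumed step size, for illustration
        min_references -= 1
    return rsq_min, min_references
```

With only two decent stars available, the thresholds walk down until the requirement is met, e.g. `relax_until_enough([0.9, 0.1])` ends with `min_references == 2`.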
We could try to exit early by checking that there are no R^2 values between 0 and 1, and raising an exception in that case. But in reality this outcome is just a few iterations away, and we don't really need to check for it explicitly.
But since you are just lowering the limit until you get enough stars, it would be quicker to just calculate which limit would give you that required number of stars... or just sort the stars by R-squared and pick the number you require.
Something along the lines of this? (Note that np.argsort sorts ascending, so we reverse it to put the largest R-squared values first.)

```python
mask = (masked_rsqs >= rsq_min) & (masked_rsqs < 1.0)
mask = np.argsort(masked_rsqs[mask])[::-1]  # Sort by largest rsquare first
mask = mask[:min(min_references, len(mask))]  # Only take the min_references largest
```
TL;DR: Let's check the minimum R^2 that gives us at least 2 stars. If it's >= 0.15 we are happy; if it's < 0.15 but >= 0.01, we should print a warning and try; if it's < 0.01, we should raise. Like you said, we don't need a loop for this.
Well, we are doing it greedily, meaning that we change what it means to get "enough stars" at the same time as we lower R^2.
If we do as you suggest (which is more efficient), then we need to preselect the minimum acceptable R^2 value. In my opinion, that depends on the minimum acceptable number of stars said value gives. (Perhaps AIC/BIC would work better here anyway.)
I set it up here so that we first try to get at least 2 stars at our previous minimum R^2 = 0.2. (But higher is better; this all happens in the previous loop.)
Then, in this loop, we soft-reset and try to get at least 4 stars from R^2 = 0.15. If we don't, we lower to 3 stars and R^2 = 0.08, then 2 and 0.01 (the worst-case minimum acceptable scenario, which we only want if 0.15 and 0.08 don't work).
(Maybe it's easier to pre-calculate if you can code it that way?) But I think the pre-sort would work. Otherwise, we could just go to the R^2 value that will give us 2 stars. But we want to know whether there is an R^2 value that will give us more than 2 stars.
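The tiered scheme described in this thread (4 stars at R^2 >= 0.15, then 3 at 0.08, then 2 at 0.01) can be written without a loop over shrinking thresholds, just a walk down a fixed tier list. This is a sketch under those assumptions; the function name and signature are mine, not the PR's:

```python
import numpy as np

# (min_stars, rsq_min) tiers, strictest first; values taken from the discussion.
TIERS = ((4, 0.15), (3, 0.08), (2, 0.01))

def select_references(rsqs, tiers=TIERS):
    """Return indices of accepted stars from the first tier that yields enough."""
    rsqs = np.asarray(rsqs)
    for min_stars, rsq_min in tiers:
        mask = (rsqs >= rsq_min) & (rsqs < 1.0)
        if mask.sum() >= min_stars:
            return np.flatnonzero(mask)
    raise RuntimeError("Not enough reference stars with acceptable R^2")
```

For example, `select_references([0.9, 0.5, 0.2, 0.05])` fails the 4-star tier but satisfies the 3-star tier, returning indices `[0, 1, 2]`.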
I guess it is just unclear to me what the goal is here...
If it is to get as many reference stars as possible with an R-squared above 0.15, why not just take all the ones above 0.15 and then warn/stop if there are not enough?
If it is to get a set number of reference stars, then just sort by R-squared, pick the desired number of stars from the top of that list, and then warn/stop if there are not enough?
I agree it's hacky, but the goal is to filter out low-R^2 stars if possible, and if not, to go lower until we reach just enough stars. I think it's going to work better in the WCS-checking version I'm working on...
We basically want both a high R^2 and a high number of stars, so we can't really set the limits independently. The more we lower R^2, the higher the chance of bad stars. But if we just apply a high R^2 cutoff, we risk missing stars we could have used, and having too few for the zero-point statistics.
Just to illustrate:
- If we can have 20 stars, we don't really want to use R^2 < 0.5, to minimize the risk of bad stars.
- If we can have 5 stars, we don't really want to use R^2 < 0.2, to minimize the risk of bad stars while maximizing the number of stars.
- If we can only have 3 stars, we are willing to go even lower in R^2, and so on.
I think we should do as you suggest with pre-computing, but this was my hacky solution to what I wanted to accomplish above. I'm sure there's a better way!
What I suggest is adding "improving the false-star rejection" as an issue and working on it when we get a chance. I think some other goodness-of-fit criterion might work better for us. I'm not really willing to work more on this hacky solution at the moment, as I think it will become redundant when we do so.
I will merge this now, but we should definitely create a new issue for improving this further.
I created #25 to track this. I think it will come naturally when we move to a more standard goodness-of-fit statistical test. Then we can use established values for that test to guide us.
This PR accomplishes a more robust removal of bad reference stars based on the 2D Gaussian fits we are doing. Using this removal process, we also get a better initial estimate of the FWHM, which results in better ePSFs. I tested this on several images, including Fids 1243 and 1251 for 2019muj and 1791 for 2020yss.
There's still quite a bit of test code in it.