
readiness test (to catch bugs before catalog goes into image pipeline) #57

Closed
yymao opened this issue Jan 12, 2018 · 11 comments · Fixed by #58

Comments

@yymao (Member) commented Jan 12, 2018

Now that we are preparing for Run 1.1, we should make sure there is no obvious bug (like incorrect units, labels, etc) in the catalog before the catalog goes into the image pipeline. @evevkovacs @katrinheitmann and I think it would be a good idea to have a "readiness test" that checks if the quantities that are used by CatSim have reasonable distributions (for example, max, min, mean, std, and histogram for visual inspection).

This test needs to be done by 1/19 (hopefully well before that).

@evevkovacs (Contributor)

I am thinking that we need a generic test that takes a list of input quantities from the yaml config file, plots the histogram for each quantity, and reports the max, min, mean, and std. dev. I think I can code this up pretty quickly. This would be flexible since quantities can be added/removed from the list.
Other ideas?
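
A rough sketch of what such a generic check could look like is below; the config key `quantities_to_check` and the data interface (a dict of quantity name to numpy array) are assumptions for illustration, not the actual DESCQA code.

```python
# Illustrative sketch only: read a list of quantity names from a YAML config,
# then report summary statistics and save a histogram for each one.
# `data` is assumed to be a dict of quantity name -> numpy array.
import numpy as np
import yaml
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt


def check_quantities(data, config_file, output_dir='.'):
    with open(config_file) as f:
        quantities = yaml.safe_load(f)['quantities_to_check']

    for name in quantities:
        values = np.asarray(data[name])
        print('{}: min={:g} max={:g} mean={:g} std={:g}'.format(
            name, values.min(), values.max(), values.mean(), values.std()))

        # one histogram per quantity, saved for visual inspection
        fig, ax = plt.subplots()
        ax.hist(values, bins=100, histtype='step')
        ax.set_xlabel(name)
        ax.set_ylabel('count')
        fig.savefig('{}/{}.png'.format(output_dir, name))
        plt.close(fig)
```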

@katrinheitmann commented Jan 12, 2018 via email

@rmandelb

That's a good idea. A few other ideas that should be quick:

  1. Report the median in addition to the mean (because the median is more stable against outliers, and a big difference between the median and mean can sometimes be concerning, though it's OK for intrinsically skewed distributions).

  2. Check for NaN/inf values using np.isnan and np.isinf, and report any problems.

  3. You might want to have some means of outlier rejection, and report the outlier fraction. See for example the options on https://stackoverflow.com/questions/11686720/is-there-a-numpy-builtin-to-reject-outliers-from-a-list . You can then use the non-rejected values for setting the histogram range. (Otherwise you might have some quantity with a tiny outlier fraction that results in a fairly useless histogram. If you use outlier rejection, you learn about the existence of outliers but also still get a useful histogram of everything else.) A sketch of all three checks is below.
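
A rough sketch of how these three checks could be combined; the function name, the 5-sigma threshold, and the median/MAD style of rejection are illustrative choices, not the actual test.

```python
# Illustrative sketch of the extra checks suggested above.
import numpy as np


def summarize_with_checks(values, outlier_nsigma=5.0):
    values = np.asarray(values, dtype=np.float64)

    # 2. count NaN/inf entries, then keep only finite values for the statistics
    n_nan = int(np.count_nonzero(np.isnan(values)))
    n_inf = int(np.count_nonzero(np.isinf(values)))
    finite = values[np.isfinite(values)]

    # 1. report the median alongside the mean
    mean, median = finite.mean(), np.median(finite)

    # 3. simple median/MAD outlier rejection (one of the options discussed in
    #    the Stack Overflow thread linked above); use the surviving values to
    #    set the histogram range
    mad = np.median(np.abs(finite - median))
    sigma_est = 1.4826 * mad if mad > 0 else finite.std()  # MAD-based sigma estimate
    keep = np.abs(finite - median) <= outlier_nsigma * sigma_est

    return {
        'n_nan': n_nan, 'n_inf': n_inf,
        'mean': mean, 'median': median,
        'outlier_frac': 1.0 - keep.mean(),
        'hist_range': (finite[keep].min(), finite[keep].max()),
    }
```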

@yymao (Member Author) commented Jan 12, 2018

@rmandelb good suggestions, thanks!

@evevkovacs given the time scale, we should collaborate on this one. We can work together on the readiness-test branch (i.e., https://github.com/LSSTDESC/descqa/tree/readiness-test, you have push permission to it). And we should commit frequently to avoid conflicts.

@evevkovacs (Contributor)

@rmandelb thanks for the outlier suggestion. I was worried about this issue. @yymao Yes, collaborating will be the fastest. I'll get started tomorrow morning and make a skeleton that we can flesh out. I was planning to put the acceptable ranges/tests for each quantity in a big dict (of dicts) with keys labelled by the quantity name. We can just keep adding as many checks as we want, depending on the quantity.
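
As a purely hypothetical illustration of that dict-of-dicts structure (the quantity names and bounds below are made up, not the vetted values):

```python
# Hypothetical per-quantity checks, keyed by quantity name; additional checks
# can be added per quantity as needed.
checks = {
    'ellipticity_1_bulge': {
        'range': (-1.0, 1.0),       # every value must fall in this interval
        'mean_range': (0.2, 0.8),   # acceptable range for the mean
        'allow_nan': False,
    },
    'mag_r_lsst': {
        'range': (10.0, 35.0),
        'median_range': (20.0, 28.0),
    },
}
```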

@evevkovacs (Contributor) commented Jan 17, 2018

@jchiang87 @rmandelb @yymao @katrinheitmann Here is an example of the readiness test page for checking ellipticities: https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-01-17_10, with details at https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-01-17_10&test=Check_ellipticity. If you click on a graph in the left panel, an enlarged version is displayed in the right panel. If you click on Check_protoDC2_test_summary.txt, the output is shown in the right panel. I mocked up some values for tests on the means and ranges of the histogrammed quantities. You can see that if any of the individual tests fail, then the overall test fails. The printout indicates if a quantity is out of range.
I'll post some other examples in a few minutes; I wanted to get your feedback on what I have so far. The test is written in a flexible way. The config file specifies groups of quantities that appear on the same plot and are subjected to the same test criteria. So we should be able to make lists of whatever we want tested and just feed them to the test. More soon!

@evevkovacs (Contributor)

@jchiang87 @rmandelb @yymao @katrinheitmann I just ran the test out of the box for ellipticities, sizes, and magnitudes. Here is the result: https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-01-17_11&test=Check_quantities Magnitudes worked. The plot for sizes needs some adjustment because I haven't done any outlier rejection as suggested by Rachel. And I seem to have a bug in the reporting, but this should give you an idea.

@rmandelb

I see. So for example the mean ellipticity of the bulge was set to be in the range 0.2-0.8 for this test, and that's why it failed, but everything else related to ellipticity passed. (And I assume this test value will be changed.) This looks like a promising start, modulo the range problem on the size plot.

So one more question - I was just thinking about failure modes, and I thought there was an error at some point where all e2 values were equal to e1. Can we generalize the test to catch things like that? (I could also imagine that happening with e.g. magnitudes in different bands, if there is a band indexing error when creating the catalog.) For example, could we make part of the validation test for a given column (like e1) that the values in that column should not be numerically identical to the values in any other column? Is this worthwhile? It seems like it could be useful but I realize it's adding some complexity, since currently you are just testing specific quantities rather than comparing them to others.
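
One possible way to implement that column-comparison check, as a hedged sketch (the dict-of-arrays data layout is an assumption for illustration, not the actual test code):

```python
# Flag any pair of quantities whose arrays are numerically identical,
# e.g. e2 accidentally equal to e1, or two bands with the same magnitudes.
import itertools
import numpy as np


def find_identical_columns(data):
    """data: dict mapping quantity name -> numpy array."""
    duplicates = []
    for name_a, name_b in itertools.combinations(sorted(data), 2):
        a, b = np.asarray(data[name_a]), np.asarray(data[name_b])
        if a.shape == b.shape and np.array_equal(a, b):
            duplicates.append((name_a, name_b))
    return duplicates
```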

@evevkovacs (Contributor)

@rmandelb Yes, that's correct on the ellipticities. The test values will need to be vetted. I put in obvious ones, but will need guidance on others. Regarding generalizing the tests, those kinds of checks are possible. Right now, one could catch that kind of problem by looking at the histogram and seeing that two curves lie on top of one another. That's one bonus of grouping like quantities together. In terms of what will be ready soon, we need to prioritize our effort to cover the most important checks first. It would be good to assemble a list of all the quantities that we want to check, so we can try out how that works in the present scheme and assemble the acceptable bounds and so on.

@yymao (Member Author) commented Jan 18, 2018

@rmandelb

Improving upon Eve's version, I've implemented a cleaner overview for the readiness test. See an example here

In the table on the right-hand side, numbers in red indicate that they are outside the accepted ranges. Numbers in grey mean the accepted ranges are not specified. The accepted range shows up when the mouse hovers over a value.

The test also detects potentially identical quantities, as you can see in one of the examples (here the duplication is due to the inclusion of both native and derived quantities; it is just used as an example).

Corresponding plots are displayed on the left as usual, but sorted in the same order as in the table.

The accepted ranges are specified in the yaml config file. This is still a manual step that we have to complete.
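
For illustration only (not the actual DESCQA code), the red/grey logic described above could be as simple as:

```python
# Classify a summary statistic against the accepted range from the config:
# out of range (red), in range, or range not specified (grey).
def classify(value, accepted_range=None):
    if accepted_range is None:
        return 'unspecified'   # rendered grey in the summary table
    low, high = accepted_range
    return 'ok' if low <= value <= high else 'out_of_range'  # red when out of range
```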

Comments are welcome!

@yymao closed this as completed in #58 on Jan 18, 2018
@yymao (Member Author) commented Jan 19, 2018

Posting the latest readiness test example in case people come here to check the status of the readiness test.

Further discussion should move to #60
