
Example Notebook #29

Merged
merged 31 commits into master from mybinder on Dec 28, 2021

Conversation

hammannr
Collaborator

@hammannr hammannr commented Oct 28, 2021

This PR adds an example notebook that can be used as a guide for using the package for the first time.

Additionally a few more small changes:

  • Improve the __str__ method of the GOFTest class so that print(gof_test) provides a more readable result like:

    GOFevaluation.gof_test
    GOF measures: ADTestTwoSampleGOF, KSTestTwoSampleGOF, PointToPointGOF

    ADTestTwoSampleGOF
    gof = 1.6301454042304904
    p-value = 0.08499999999999996

    KSTestTwoSampleGOF
    gof = 0.13999999999999996
    p-value = 0.09799999999999998

    PointToPointGOF
    gof = -0.7324060759792504
    p-value = 0.128

  • Add mybinder to the repo so that one can give the package a try without installing it

  • Include coveralls to monitor code coverage (and improve it in the future)

  • Make linting a bit pickier to make sure the code remains readable in the future (and to remove my formatting sins from the past)

  • Update the readme
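For illustration, a __str__ producing the kind of output shown above could be sketched like this (the GOFTest class and its results dictionary here are hypothetical stand-ins, not the package's actual implementation):

```python
class GOFTest:
    """Hypothetical sketch: stores {measure_name: (gof, p_value)} and
    renders it in the readable format shown above."""

    def __init__(self, results):
        self.results = results

    def __str__(self):
        lines = ["GOFevaluation.gof_test",
                 "GOF measures: " + ", ".join(self.results)]
        for name, (gof, p_value) in self.results.items():
            lines += ["", name, f"gof = {gof}", f"p-value = {p_value}"]
        return "\n".join(lines)


gof_test = GOFTest({"KSTestTwoSampleGOF": (0.14, 0.098)})
print(gof_test)
```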

@hammannr hammannr changed the title Mybinder Example Notebook Dec 17, 2021
@hammannr hammannr added the enhancement New feature or request label Dec 17, 2021
@hammannr hammannr marked this pull request as ready for review December 17, 2021 17:41
@WolfXeHD
Contributor

Hi @hammannr,
I think this PR is good. I only have one comment: I would promote the example notebook more in the README, since the little "mybinder" badge may not make clear to everyone what it means.
How can I start the notebook in mybinder? When I try to launch it from the corresponding branch (with the badge), I think it pulls from master (and not from mybinder).
Will the formatting of the output (bold print) be problematic when using the package from the command line?
Cheers,
Tim

@hammannr
Collaborator Author

Hey @WolfXeHD, thanks a lot for checking! I wrote a sentence about how to use mybinder. The badge link will always open the git HEAD of the repo (which I think makes sense). To open a specific branch you can specify it in the link; for the mybinder branch you can e.g. just open https://mybinder.org/v2/gh/XENONnT/GOFevaluation/mybinder .
Regarding the bold printing: this should not be a problem on the command line as long as you run Python, right? I checked on midway and everything looked good. If you have any concerns, though, I can just remove it!
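For context, bold terminal output is typically produced with ANSI escape codes, which most terminals and Jupyter render (while e.g. redirecting to a file leaves the raw escape sequences in place). A minimal illustration, not the package's actual code:

```python
# ANSI escape codes: "\033[1m" switches to bold, "\033[0m" resets.
BOLD = "\033[1m"
RESET = "\033[0m"

def bold(text):
    """Wrap text in ANSI bold markers (illustrative helper)."""
    return f"{BOLD}{text}{RESET}"

print(bold("ADTestTwoSampleGOF"))
```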

Cheers!

@WolfXeHD
Contributor

Hi @hammannr,
many thanks for your comments.
For the notebook, I would change the colors of "good fit" and "bad fit": they are hardly readable in the legend (maybe use less alpha?).

pdf_0 = ref_model_0.pdf(bin_centers)          # evaluate the pdf at the bin centers
pdf_0 /= np.sum(pdf_0)                        # normalize to unit sum
pdf_0 = pdf_0.reshape(np.shape(binned_data))  # match the shape of the binned data
expectations_0 = pdf_0 * n_expectations       # scale to the expected total counts

I would either make this code snippet a function in the notebook or add it to the utils of the package, as it might be useful for users as well.
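A utility function along these lines might look as follows (the name and signature are purely illustrative, not part of GOFevaluation):

```python
import numpy as np

def bin_centered_expectations(pdf, bin_centers, shape, n_expected):
    """Hypothetical helper: evaluate a pdf at the bin centers, normalize
    it to unit sum, and scale by the expected total number of events."""
    p = pdf(bin_centers)
    p = p / np.sum(p)
    return p.reshape(shape) * n_expected

# usage with a simple Gaussian-shaped pdf
centers = np.linspace(-3.0, 3.0, 12)
expectations = bin_centered_expectations(
    lambda x: np.exp(-x ** 2 / 2), centers, (3, 4), 100.0)
```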

When thinking about it, the __str__ might be misleading: I think the convention is that printing the object gives a string which you can copy to recreate the object. Maybe move your printing to object.result (or something similar)? (If I am wrong with my assumption about the convention, everything is fine; just convince me.)

Is it expected that for the binned GOF-tests in the example, one gets a p-value of 0?
(screenshot: binned GOF test output from the example notebook showing a p-value of 0)

Should we maybe fix some random seed when generating data?

That's it for a quick first look.
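For reference, fixing a seed with NumPy's Generator API would look like this (illustrative only, not a change proposed in the PR):

```python
import numpy as np

# Fixing a seed makes toy-data generation reproducible:
# the same seed always yields the same sample.
rng = np.random.default_rng(42)
sample_a = rng.normal(size=5)

rng = np.random.default_rng(42)
sample_b = rng.normal(size=5)
```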

@hammannr
Collaborator Author

Hey @WolfXeHD, thanks a lot for catching this! No, you are absolutely right, we should not expect a p-value of zero for the parent model. I screwed up the transposing, reshaping, and inverting of matrices. Now everything should be correct and the test produces a reasonable p-value.

I set the alpha of the reference samples to 1 and only plot the first 1000 points so one can see what's going on in the plot.

I'm not so sure about the code snippet. It might be misleading, as it only evaluates the pdf at the center of each bin. This is fine for a granular binning but introduces systematics for coarse binnings with large gradients of the PDF. I would prefer that users take care of getting the inputs right themselves, but we can discuss that!

The convention for the dunder methods that I tried to follow here is that __str__ (returned by print(gof_object)) gives pretty user-facing output, while __repr__ (returned by gof_object) should be helpful developer output for debugging or rebuilding the object. For a gof_object this will be something like GOFevaluation.evaluators_nd.BinnedPoissonChi2GOF(dict_keys(['_name', 'gof', 'pvalue', 'pdf', 'binned_reference', 'binned_data'])), which is not sufficient to rebuild the object but probably still reasonably helpful. If you deal with a data sample of 1e6 points, printing out everything is clearly not a good solution, so I thought this might do, but I'm open to discussing an improvement of these methods.
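The __str__/__repr__ convention described here, in a minimal toy example (not package code):

```python
class Point:
    """Toy example of the convention: __repr__ is unambiguous developer
    output (ideally enough to rebuild the object), __str__ is pretty
    user-facing output."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __str__(self):
        return f"({self.x}, {self.y})"


p = Point(1, 2)
print(repr(p))  # Point(x=1, y=2)
print(p)        # (1, 2)
```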

You mean fixing the random seed of the toy data used to get the empirical distribution of the test statistic under the null hypothesis? I'm afraid that this would give rise to artifacts, and if your statistics are sufficiently large, the p-values should be the same anyway. I want to look into the binomial error of the p-value in the future; returning this together with the p-value will probably be a better option than artificially removing the fluctuation.
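The binomial error mentioned here is straightforward to compute: if k of n toy test statistics are at least as extreme as the observed one, the empirical p-value is p = k/n with standard error sqrt(p(1-p)/n). A sketch (the function name is illustrative):

```python
import math

def empirical_pvalue(k, n):
    """Empirical p-value and its binomial standard error: with k of n toy
    test statistics at least as extreme as the observed one,
    p = k / n and sigma_p = sqrt(p * (1 - p) / n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)

# e.g. 128 of 1000 toys at least as extreme, as in the p = 0.128 example
p, sigma = empirical_pvalue(128, 1000)
print(f"p-value = {p:.3f} +/- {sigma:.3f}")
```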

Cheers,
Robert

@WolfXeHD
Contributor

Hi @hammannr
many thanks for checking and adding further explanations. All sounds good to me.
Cheers,
Tim

Contributor

@WolfXeHD WolfXeHD left a comment


Good!

@@ -84,27 +95,38 @@ gof_object.get_gofs(d_min=d_min)
# OUTPUT:
# OrderedDict([('ADTestTwoSampleGOF', 1.6301454042304904),
# ('KSTestTwoSampleGOF', 0.14),
# ('PointToPointGOF', 0.00048491049630050576)])
# ('PointToPointGOF', -0.7324060759792504)])
Contributor


Why does only PointToPointGOF change?

Collaborator Author


The definition of the test statistic was changed in #25 but I forgot to adapt the README accordingly.


# PointToPointGOF
# gof = -0.7324060759792504
# p-value = 0.128
Contributor


New output is great!

@@ -2,5 +2,3 @@ numpy
scipy
sklearn
matplotlib
flake8
pytest
Contributor


That's nice to get rid of these dependencies, as we only need them for CI checking.

@hammannr hammannr merged commit f1a3d87 into master Dec 28, 2021
@hammannr hammannr deleted the mybinder branch December 28, 2021 15:46