Python script for t-test #1

ypark234 · 2019-10-08T22:46:57Z

This adds a Python script that runs a two-sample t-test on two data sets.
A t-test is a statistical calculation that can be used to determine whether there is a statistically significant difference between two sample groups.
This script conducts an unpaired, equal-variance two-sample t-test to compare data that are assumed to be drawn from the same distribution.
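
As a rough illustration of the test described above, here is a minimal sketch of the pooled (equal-variance) t-statistic computed from summary statistics. The function name `pooled_t_statistic` is hypothetical and is not taken from this PR; the argument names follow the `m1`, `sd1`, `n1` convention used in the script's docstring. The sample numbers are made up.

```python
import math


def pooled_t_statistic(m1, sd1, n1, m2, sd2, n2):
    """Return (t, dof) for an unpaired, equal-variance two-sample t-test.

    Hypothetical helper for illustration; not the function from this PR.
    """
    dof = n1 + n2 - 2
    # Pooled variance: average of the two sample variances,
    # weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / std_err, dof


# Made-up summary statistics for two samples of size 10.
t, dof = pooled_t_statistic(10.0, 2.0, 10, 12.0, 2.0, 10)
print(f"t = {t:.4f}, dof = {dof}")
```

The p-value would then come from the Student t-distribution with `dof` degrees of freedom; that step is left out here to keep the sketch free of third-party dependencies.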

Future work:

Switch to a pandas DataFrame

bam241 · 2019-10-10T20:26:43Z

Maybe add some kind of README to explain how to use the script, with a reference for the statistics.

bam241 · 2019-10-10T20:29:23Z

Maybe separate the snippet and the unit tests.

In this PR or a separate PR, maybe set up CI testing?

kkiesling

Other thoughts to add:

2-D/3-D histogram of the p-values, e.g. for a mesh
write out the actual set of p-values (connecting them to the data sets/dicts that already exist)

Suggested structure changes:

expect users to provide two lists of data (not necessarily as a dictionary with keys) - perhaps two lists of tuples (data, standard error).
calculate t-test and return the data set with all the stats (np structured array? see line 212)
provide additional functions for writing that data set to a file and another function that can plot it in a histogram
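
One way the structure suggested above might look, as a sketch only: two lists of `(value, standard error)` tuples go in, a NumPy structured array with the statistics comes out, and a separate function writes it to a file-like object. The field names, `compare`, and `write_csv` are all hypothetical. Two simplifying assumptions are made and are not from the PR: both data sets have the same sample size (so the pooled standard error of the difference reduces to `sqrt(err1**2 + err2**2)`), and the p-value uses a large-sample normal approximation rather than the t-distribution.

```python
import csv
import io
import math

import numpy as np

# Hypothetical field layout; the PR does not prescribe one.
RESULT_DTYPE = np.dtype([
    ("value1", "f8"), ("err1", "f8"),
    ("value2", "f8"), ("err2", "f8"),
    ("t", "f8"), ("p", "f8"),
])


def compare(set1, set2):
    """Compare two equal-length lists of (value, standard error) tuples."""
    a = np.asarray(set1, dtype=float)
    b = np.asarray(set2, dtype=float)
    out = np.zeros(len(a), dtype=RESULT_DTYPE)
    out["value1"], out["err1"] = a[:, 0], a[:, 1]
    out["value2"], out["err2"] = b[:, 0], b[:, 1]
    # ASSUMPTION: equal sample sizes, so the standard error of the
    # difference is sqrt(err1^2 + err2^2). Errors must not both be zero.
    out["t"] = (a[:, 0] - b[:, 0]) / np.hypot(a[:, 1], b[:, 1])
    # ASSUMPTION: large samples, so a two-sided normal approximation
    # stands in for the exact t-distribution p-value.
    out["p"] = [math.erfc(abs(t) / math.sqrt(2.0)) for t in out["t"]]
    return out


def write_csv(results, stream):
    """Write the structured array to a text stream, one row per element."""
    writer = csv.writer(stream)
    writer.writerow(results.dtype.names)
    writer.writerows(results.tolist())


# Made-up data: the second pair of values differs by five standard errors.
set1 = [(1.00, 0.03), (2.00, 0.04)]
set2 = [(1.00, 0.04), (2.25, 0.03)]
results = compare(set1, set2)
buffer = io.StringIO()
write_csv(results, buffer)
print(buffer.getvalue())
```

Keeping the calculation and the output in separate functions means the same result array could also be handed to a plotting function for the histogram.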

Add a README

stats/t_test.py

gonuke · 2020-01-27T23:53:16Z

Are we waiting on CI for this?

ypark234 · 2020-01-27T23:57:25Z

I am adding a README and making the last few changes while waiting for the CI config to be added in a separate PR.

gonuke · 2020-04-10T20:39:57Z

Maybe I should clarify my earlier CI question: should I wait for CI to be implemented before merging this PR, or is this something that can be merged before CI is implemented?

ypark234 · 2020-04-10T20:49:27Z

CI will be implemented in a separate PR.
I expect this PR to be merged after CI is properly added.

ypark234 · 2020-04-19T14:29:28Z

This PR is ready to be reviewed.
(CI test for the repo still needs to be triggered properly.)

kkiesling

This is a very well done and thorough "snippet"! I mostly just have questions.

The README has a lot of theory in it, and I am wondering if it is really necessary. I appreciate all the information, but at the same time I think the README for a script should include just the information users need to run it: how the input should be structured, how to run the script, and what the results should look like. I think it's plenty to refer users to a resource about the t-test if they don't know what it is, though I also appreciate that you took the time and effort to put the resource together. Maybe the "usage" and background parts of the README just need to be two separate documents?

On a different note, I think it's awesome that you have this structured so that one can either import the module in a Python script or run your script with command-line arguments. That allows for a lot of different uses, which is nice.

t_test/README.md

t_test/twosample_ttest.py

kkiesling · 2020-04-20T19:11:11Z

t_test/twosample_ttest.py

+    Arguments:
+        m1 (float): Mean of sample 1.
+        sd1 (float): Standard deviation of sample 1.
+        n1 (float): Size of sample 1.


Theory understanding question: If this test is being run on a tally, is the sample size n the number of histories run in the simulation or only the number of histories that contributed to that tally?

ypark234 · 2020-04-22

In theory, it should be the number of histories that actually contributed to that tally, which I assume is hard or impossible to know from MCNP output? (Correct me if I'm wrong!)

gonuke · 2020-04-25

I think this should be the total number of histories. The fact that some contribute a score of 0 is part of the distribution.
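
The point that zero scores belong to the distribution can be shown with a toy calculation. The per-history scores below are made up, and the sketch assumes a tally that is normalized per source history; it does not read or model real MCNP output.

```python
import math
import statistics

# Made-up per-history scores: three of the five histories never
# contribute to the tally and therefore score exactly 0.
scores = [0.0, 0.0, 0.0, 2.0, 4.0]

n_total = len(scores)              # every history that was run
mean = statistics.fmean(scores)    # estimate per source history
sd = statistics.stdev(scores)      # sample standard deviation, zeros included
std_err = sd / math.sqrt(n_total)  # standard error of the mean

# Dropping the zeros estimates a different quantity: the mean score
# of a contributing history, not the mean score per source history.
nonzero = [s for s in scores if s != 0.0]
mean_nonzero = statistics.fmean(nonzero)

print(f"all histories:     n = {n_total}, mean = {mean:.2f}, sd = {sd:.3f}")
print(f"contributing only: n = {len(nonzero)}, mean = {mean_nonzero:.2f}")
```

Under that assumption, the mean and standard deviation describe all `n_total` histories, so that is the sample size consistent with them in the t-test.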

t_test/twosample_ttest.py

kkiesling · 2020-05-15T18:46:29Z

@ypark234 do you want me to review this again or are we waiting on CI first?

ypark234 · 2020-09-10T18:03:55Z

Sorry, let me do a final touch on this soon and ask for what is possibly a final review.

ypark234 · 2020-09-16T22:59:43Z

It is finally ready for another review & merge.

kkiesling

This looks good to me

kkiesling · 2020-10-14T02:52:43Z

pending approval from @bam241 and this could probably be merged?

gonuke · 2021-03-02T13:38:05Z

Thanks @ypark234 - This is a great example of well-documented and complete code!

I think there are some design questions worth discussing, however, and might be great for a software meeting. Primarily, I think the twosample_ttest.py should have a minimal set of functions to perform a t-test on data with a well defined structure, or variations for different well-defined structures. The functions to load and format the data into those well-defined structures should live elsewhere like the script example you have provided.

For example, it seems like you have two well-defined structures right now: (1) a vector of data, (2) a grid of data. For each of these, I would anticipate:

a way to perform an element-wise t-test
a way to plot the t-value or p-value
a histogram of t-values or p-values
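
A minimal sketch of how one function could serve both structures, assuming NumPy arrays of means and standard deviations with matching shapes. The name `elementwise_t` and the sample data are hypothetical and not taken from `twosample_ttest.py`. Because the arithmetic is element-wise, the same code handles a vector and a grid, and `np.histogram` bins the resulting t-values.

```python
import numpy as np


def elementwise_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t-value for each element of same-shaped inputs.

    Hypothetical helper; works for vectors, grids, or any array shape.
    """
    mean1, sd1, mean2, sd2 = (
        np.asarray(x, dtype=float) for x in (mean1, sd1, mean2, sd2)
    )
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof
    return (mean1 - mean2) / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


# Made-up 2 x 3 grid of means, compared against a shifted copy.
grid1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
grid2 = grid1 + np.array([[0.0, 0.5, -0.5], [0.0, 1.0, -1.0]])
sd = np.full(grid1.shape, 0.5)

t_grid = elementwise_t(grid1, sd, 10, grid2, sd, 10)

# Bin the t-values; the counts and edges could be passed to a plotting
# library to draw the histogram.
counts, edges = np.histogram(t_grid, bins=4)
print(t_grid)
print(counts, edges)
```

Functions that load MCNP mesh tallies or other data into these arrays would then live in a separate script, as suggested above.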

In addition to a script that demonstrates the basic functionality, we could also have scripts (different PRs) that apply this test to some standard data we use in CNERG, esp. MCNP mesh tallies.

gonuke

Here is one little comment.

t_test/twosample_ttest.py

Co-authored-by: Paul Wilson <paul.wilson@wisc.edu>

gonuke

Just one small suggestion- thanks @ypark234

scripts/run_twosample_ttest.py

gonuke

Thanks @ypark234 - these last changes look great!

ypark234 changed the title ~~Script for t-test~~ Python script for t-test Oct 8, 2019

ypark234 force-pushed the t-test branch from 939d65f to 90bab21 Compare October 8, 2019 23:03

kkiesling reviewed Oct 10, 2019

ypark234 force-pushed the t-test branch from a76fe1b to 2113308 Compare April 19, 2020 14:10

kkiesling reviewed Apr 20, 2020

kkiesling mentioned this pull request Jun 16, 2020

jupyter notebook for heatmap snippet, with three options #6

Open

ypark234 changed the base branch from master to main June 18, 2020 21:06

ypark234 force-pushed the t-test branch from cd34a98 to 8b75f27 Compare June 23, 2020 15:41

ypark234 changed the title ~~Python script for t-test~~ [WIP] Python script for t-test Jun 23, 2020

ypark234 force-pushed the t-test branch 2 times, most recently from ef68ee0 to 336fec6 Compare September 16, 2020 22:55

ypark234 changed the title ~~[WIP] Python script for t-test~~ Python script for t-test Sep 16, 2020

ypark234 requested review from bam241, gonuke and kkiesling September 16, 2020 22:59

kkiesling approved these changes Sep 22, 2020

ypark234 added 3 commits October 20, 2020 12:52

Initial commit for t-test script

492f5ea

Add pytest file for the script

fcc8a7f

Print out more information on rejected cases

772ccaa

ypark234 added 6 commits October 20, 2020 12:54

Trigger tests on t-test module in CI

c807ae8

Add project root dir to PYTHONPATH for Python module import

9c49aae

Minor ex instruction update

594f271

Switch pytest trigger method

d324843

Minor fix

5bd85b0

Add Docker Hub auth on new test

1e79db0

ypark234 force-pushed the t-test branch from 5846b84 to 1e79db0 Compare October 20, 2020 17:55

ypark234 added 3 commits March 1, 2021 18:43

Bring t-test module up to date - see cnerg/UWNR#371

326a1a2

Bring t-test script up to date - see cnerg/UWNR#371

15dc8d1

Minor fixes and formattings

40f8da2

ypark234 force-pushed the t-test branch 2 times, most recently from e98098a to 40f8da2 Compare March 2, 2021 01:46

init in t_test unnecessary

120215a

gonuke requested changes Mar 4, 2021

ypark234 and others added 9 commits March 4, 2021 16:16

One-line boolean

9cc39fa

Co-authored-by: Paul Wilson <paul.wilson@wisc.edu>

Move customizable functions into template script

6a2624a

Import with alias instead of wildcard

1093245

Add a new argument for reject_only

62bfb2f

Fix errors

1dc6622

Giving the rest defaults

611c5e8

Minor formatting

870fa05

Update README

da3cce7

Remove deprecated module

c3307af

gonuke requested changes Mar 9, 2021

ypark234 added 3 commits March 9, 2021 12:17

Rename data loading function

25f0b9f

Utilize numpy array to generalize slicing

b1bfdf6

Add comments

a3540e8

gonuke approved these changes Mar 14, 2021

gonuke merged commit 1880a3b into cnerg:main Mar 14, 2021
