Skip to content

Conversation

@ypark234
Copy link
Contributor

@ypark234 ypark234 commented Oct 8, 2019

This adds a Python script that runs two-sample t-test with two data sets.
t-test is a statistical calculation that can be used to determine if there is a statistically significant difference between two sample groups.
In this script, unpaired equal variance two-sample t-test will be conducted for comparison of data that are assumed to be taken from an identical distribution.

Future work:

  • Switch to Panda dataframe

@ypark234 ypark234 changed the title Script for t-test Python script for t-test Oct 8, 2019
@bam241
Copy link
Member

bam241 commented Oct 10, 2019

Maybe add some kind of readme to explain how to use and ref to the stats

@bam241
Copy link
Member

bam241 commented Oct 10, 2019

maybe separate snippet and unit_tests

in this PR or a separate PR, maybe setup CI-testing ?

Copy link
Member

@kkiesling kkiesling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other thoughts to add:

2-D/3-D histogram of the p-values for like a mesh
write out the actual set of p-values (connecting them to the data sets/dicts that already exist)

Suggested structure changes:

  1. expect users to provide two lists of data (not necessarily as a dictionary with keys) - perhaps two lists of tuples (data, standard error).
  2. calculate t-test and return the data set with all the stats (np structured array? see line 212)
  3. provide additional functions for writing that data set to a file and another function that can plot it in a histogram

Add a README

@gonuke
Copy link
Member

gonuke commented Jan 27, 2020

Are we waiting on CI for this?

@ypark234
Copy link
Contributor Author

ypark234 commented Jan 27, 2020

I am adding README and making last few changes, while waiting for CI config to be added by a separate PR.

@gonuke
Copy link
Member

gonuke commented Apr 10, 2020

Maybe I should clarify my earlier CI questions.... should I wait for CI to be implemented before merging this PR? Or is this something that can be merged prior to CI being implemented?

@ypark234
Copy link
Contributor Author

CI will be implemented in a separate PR.
I expect this PR to be merged after CI is properly added.

@ypark234
Copy link
Contributor Author

This PR is ready to be reviewed.
(CI test for the repo still needs to be triggered properly.)

Copy link
Member

@kkiesling kkiesling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very well done and thorough "snippet"! I mostly just have questions.

The README has a lot of theory information in it and I am wondering if it is really necessary? I appreciate all the information, but I at the same time I think that the README for a script should really just include the information needed for the users to run the script. So that would basically be just information on how the input should be structured and how to run and then what the results should look like. I think it's plenty to just refer users to a resource to learn about the t-test if they don't know what it is. Though I also appreciate that you took all the time and effort to put the resource together. Maybe the "usage" and the background information parts of the README just need to be two separate documents?

On a different note, I think it's awesome that you have this structured so that one could just import the modules in a python script or actually run your script with command line arguments. Allows for a lot of different use which is nice.

Arguments:
m1 (float): Mean of sample 1.
sd1 (float): Standard deviation of sample 1.
n1 (float): Size of sample 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theory understanding question: If this test is being run on a tally, is the sample size n the number of histories run in the simulation or only the number of histories that contributed to that tally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, it should be the number of histories that actually contributed to that tally,
which I assume is hard/impossible to know from MCNP output? (correct me if I'm wrong!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the total number of histories. The fact that some contribute a score of 0 is part of the distribution.

@kkiesling
Copy link
Member

@ypark234 do you want me to review this again or are we waiting on CI first?

@ypark234 ypark234 changed the base branch from master to main June 18, 2020 21:06
@ypark234 ypark234 changed the title Python script for t-test [WIP] Python script for t-test Jun 23, 2020
@ypark234
Copy link
Contributor Author

Sorry, let me do a final touch on this sooon and ask for possibly final review.

@ypark234 ypark234 force-pushed the t-test branch 2 times, most recently from ef68ee0 to 336fec6 Compare September 16, 2020 22:55
@ypark234 ypark234 changed the title [WIP] Python script for t-test Python script for t-test Sep 16, 2020
@ypark234
Copy link
Contributor Author

It is finally ready for another review & merge.

Copy link
Member

@kkiesling kkiesling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me

@kkiesling
Copy link
Member

pending approval from @bam241 and this could probably be merged?

@ypark234 ypark234 force-pushed the t-test branch 2 times, most recently from e98098a to 40f8da2 Compare March 2, 2021 01:46
@gonuke
Copy link
Member

gonuke commented Mar 2, 2021

Thanks @ypark234 - This is a great example of well-documented and complete code!

I think there are some design questions worth discussing, however, and might be great for a software meeting. Primarily, I think the twosample_ttest.py should have a minimal set of functions to perform a t-test on data with a well defined structure, or variations for different well-defined structures. The functions to load and format the data into those well-defined structures should live elsewhere like the script example you have provided.

For example, it seems like you have two well-defined structures right now: (1) a vector of data, (2) a grid of data. For each of these, I would anticipate:

  • a way to perform an element-wise t-test
  • a way to plot the t-value or p-value
  • a histogram of t-values or p-values

In addition to a script that demonstrates the basic functionality, we could also have scripts (different PRs) that apply this test to some standard data we use in CNERG, esp. MCNP mesh tallies.

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one little comment.

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small suggestion- thanks @ypark234

Copy link
Member

@gonuke gonuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ypark234 - these last changes look great!

@gonuke gonuke merged commit 1880a3b into cnerg:main Mar 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants