Python script for t-test #1
Conversation
Maybe add some kind of README to explain how to use it, with a reference to the stats.

Maybe separate the snippet and the unit tests in this PR or a separate PR, and maybe set up CI testing?
kkiesling left a comment
Other thoughts to add:
- a 2-D/3-D histogram of the p-values (e.g., for a mesh)
- write out the actual set of p-values (connecting them to the data sets/dicts that already exist)

Suggested structure changes:
- expect users to provide two lists of data (not necessarily as a dictionary with keys) - perhaps two lists of tuples (data, standard error)
- calculate the t-test and return the data set with all the stats (an np structured array? see line 212)
- provide additional functions for writing that data set to a file, and another function that can plot it in a histogram
Add a README
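The suggested structure above could be sketched roughly as follows. This is only an illustration of the reviewer's idea, not the PR's actual API: the function name `compare_samples`, the `(mean, standard_error)` tuple convention, and the field names in the structured array are all hypothetical, and it assumes `scipy` is available.

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

def compare_samples(set1, set2, n1, n2):
    """set1, set2: lists of (mean, standard_error) tuples; n1, n2: sample sizes.

    Returns a NumPy structured array so every element of the data set keeps
    its t statistic and p-value attached, ready to write to a file or plot.
    """
    records = []
    for (m1, se1), (m2, se2) in zip(set1, set2):
        # Recover standard deviations from standard errors: sd = se * sqrt(n)
        sd1 = se1 * np.sqrt(n1)
        sd2 = se2 * np.sqrt(n2)
        t, p = ttest_ind_from_stats(m1, sd1, n1, m2, sd2, n2, equal_var=True)
        records.append((m1, m2, t, p))
    return np.array(records, dtype=[("mean1", "f8"), ("mean2", "f8"),
                                    ("t", "f8"), ("p", "f8")])
```

The structured array's `"p"` field could then feed directly into the histogram-plotting function suggested above.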
Are we waiting on CI for this?

I am adding a README and making the last few changes while waiting for the CI config to be added by a separate PR.

Maybe I should clarify my earlier CI questions: should I wait for CI to be implemented before merging this PR, or is this something that can be merged prior to CI being implemented?

CI will be implemented in a separate PR.

This PR is ready to be reviewed.
kkiesling left a comment
This is a very well done and thorough "snippet"! I mostly just have questions.

The README has a lot of theory information in it, and I am wondering if it is really necessary. I appreciate all the information, but at the same time I think that the README for a script should really just include the information users need to run it: how the input should be structured, how to run the script, and what the results should look like. I think it's plenty to just refer users to a resource about the t-test if they don't know what it is, though I also appreciate the time and effort you took to put the resource together. Maybe the "usage" and background-information parts of the README just need to be two separate documents?

On a different note, I think it's awesome that you have this structured so that one can either import the modules in a Python script or run the script directly with command-line arguments. That allows for a lot of different uses, which is nice.
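The dual import/CLI pattern praised above typically looks like the sketch below. This is a generic illustration, not the PR's actual code: the function name `two_sample_t` and the flag names are hypothetical, and the t statistic is computed from summary statistics with a pooled (equal-variance) standard deviation.

```python
import argparse

def two_sample_t(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    # Pooled variance across both samples, then the usual t statistic
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / (sp2 * (1.0 / n1 + 1.0 / n2)) ** 0.5

def main(argv=None):
    # The same function is reachable both via `import` and from the shell
    parser = argparse.ArgumentParser(description="Two-sample t-test from summary stats")
    for name in ("m1", "sd1", "n1", "m2", "sd2", "n2"):
        parser.add_argument("--" + name, type=float, required=True)
    args = parser.parse_args(argv)
    print(two_sample_t(args.m1, args.sd1, args.n1, args.m2, args.sd2, args.n2))

if __name__ == "__main__":
    main()
```

A library user calls `two_sample_t(...)` directly, while a shell user runs the file with `--m1 ... --sd1 ...` flags; the `if __name__ == "__main__"` guard keeps the CLI entry point out of the way on import.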
    Arguments:
        m1 (float): Mean of sample 1.
        sd1 (float): Standard deviation of sample 1.
        n1 (float): Size of sample 1.
Theory understanding question: If this test is being run on a tally, is the sample size n the number of histories run in the simulation or only the number of histories that contributed to that tally?
In theory, it should be the number of histories that actually contributed to that tally, which I assume is hard/impossible to know from MCNP output (correct me if I'm wrong!).
I think this should be the total number of histories. The fact that some contribute a score of 0 is part of the distribution.
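A toy illustration of that point: histories that do not contribute to the tally score zero, and those zeros are part of the sample, so `n` is the total number of histories. The numbers below are made up purely for demonstration.

```python
import numpy as np

# Eight histories total; only four of them produce a nonzero score.
scores = np.array([0.0, 0.0, 0.0, 1.2, 0.8, 0.0, 1.1, 0.9])

n = scores.size           # total number of histories, zeros included
mean = scores.mean()      # zeros pull the mean down, as they should
sd = scores.std(ddof=1)   # sample standard deviation over all n histories
```

Dropping the zero-score histories would inflate both the mean and the apparent precision, which is why the total history count is the right `n` for the t-test.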
@ypark234 do you want me to review this again, or are we waiting on CI first?

Sorry, let me do a final touch on this soon and ask for a possibly final review.
It is finally ready for another review & merge.
kkiesling left a comment
This looks good to me
Pending approval from @bam241 - and this could probably be merged?
Thanks @ypark234 - This is a great example of well-documented and complete code! I think there are some design questions worth discussing, however, and it might be great for a software meeting. Primarily, I think the

For example, it seems like you have two well-defined structures right now: (1) a vector of data, (2) a grid of data. For each of these, I would anticipate:

In addition to a script that demonstrates the basic functionality, we could also have scripts (different PRs) that apply this test to some standard data we use in CNERG, esp. MCNP mesh tallies.
gonuke left a comment
Here is one little comment.
Co-authored-by: Paul Wilson <paul.wilson@wisc.edu>
gonuke left a comment
Just one small suggestion - thanks @ypark234
gonuke left a comment
Thanks @ypark234 - these last changes look great!
This adds a Python script that runs a two-sample t-test on two data sets.
The t-test is a statistical calculation that can be used to determine whether there is a statistically significant difference between two sample groups.
In this script, an unpaired, equal-variance two-sample t-test is conducted to compare data that are assumed to be drawn from an identical distribution.
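The test described above can be sketched with `scipy.stats.ttest_ind`, which performs exactly this unpaired, equal-variance two-sample t-test (`equal_var=True` is its default). The data here are synthetic samples from one normal distribution, purely for illustration.

```python
import numpy as np
from scipy import stats

# Two samples drawn from the same normal distribution
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=100)
b = rng.normal(loc=0.0, scale=1.0, size=100)

# Unpaired, equal-variance two-sample t-test
t, p = stats.ttest_ind(a, b, equal_var=True)
```

Because both samples come from the identical distribution, the p-value is typically large, i.e. the test finds no statistically significant difference.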
Future work: