Add a function for extracting a test statistic #24

ararslan · 2023-07-29T19:13:13Z

This is added to StatsAPI rather than to HypothesisTests since StatsAPI also houses `HypothesisTest`, `pvalue`, and other relevant functionality.

Defining this function will bring a resolution to the 7-year-old issue JuliaStats/HypothesisTests.jl#79, which has received a number of duplicates over the years, suggesting that it would be of general interest.

I considered the name `statistic` and still rather prefer it (`teststatistic` has too many s's and t's 😩), but I wasn't sure whether it was descriptive enough. I'd be interested in hearing thoughts both on naming and on whether this should be required for `HypothesisTest`, as `pvalue` is, or optional (as it currently is).
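To make the "optional versus required" question concrete, here is a minimal Julia sketch of how a shared interface package might declare such an accessor. Everything in it is hypothetical: `SharedAPI` stands in for StatsAPI, and `teststat`, `ToyZTest`, `ToySignTest`, and `supports_teststat` are illustrative names, not existing API. The function is declared without any methods, so each test type opts in by adding its own method, and calling it on a type that doesn't implement it raises a `MethodError`.

```julia
# Hypothetical sketch: `SharedAPI` stands in for a shared interface package
# such as StatsAPI; none of these names come from a released package.
module SharedAPI

abstract type HypothesisTest end

"""
    teststat(test::HypothesisTest)

Return the value of the test statistic computed by `test`.
"""
function teststat end  # declared with no methods: implementers opt in

end # module SharedAPI

# A downstream package extends the shared function for its own test type.
struct ToyZTest <: SharedAPI.HypothesisTest
    z::Float64
end
SharedAPI.teststat(t::ToyZTest) = t.z

# A test type that never implements the accessor.
struct ToySignTest <: SharedAPI.HypothesisTest end

# Because the accessor is optional, callers can check for support first.
supports_teststat(t) = hasmethod(SharedAPI.teststat, Tuple{typeof(t)})
```

Adding a fallback such as `teststat(t::HypothesisTest) = error(...)` would be one way to signal that every test is expected to implement it, but it would also make `hasmethod` return `true` for every subtype, so callers could no longer cheaply detect support.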


codecov-commenter · 2023-07-29T19:14:10Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (64d7d28) 100.00% compared to head (27e6dc6) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines           37        37           
=========================================
  Hits            37        37
```

| Files Changed | Coverage Δ |
|---|---|
| src/StatsAPI.jl | `100.00% <ø> (ø)` |


ararslan · 2023-07-29T23:10:16Z

@nalimilan, did you have thoughts on the function name? I find `statistic` appealing. It looks like @babaq would prefer `teststat` (at least over `teststatistic`), which seems okay, I guess, but I generally like to avoid abbreviating words when possible.

nalimilan · 2023-07-30T14:27:51Z

I'm afraid `statistic` is too general, so I prefer `teststatistic` or `teststat`. I'm not sure whether it's best to abbreviate or not, as we are quite inconsistent in that regard (e.g. `confint` vs. `loglikelihood`).

devmotion · 2023-07-31T16:26:51Z

I like `teststatistic` most, I think. I agree that `statistic` seems a bit too general, and I prefer not shortening the function name (but I'd be fine with `teststat` as well).

palday · 2023-07-31T16:27:07Z

`statistic` feels like it's asking for a name collision, and `teststatistic` is long and awkward. My ranking: `teststat` > `statistic` >> `teststatistic`

ararslan · 2023-07-31T16:52:30Z

> `statistic` feels like it's asking for a name collision

Only two registered packages define a function called `statistic`: Bootstrap and Hecke. Bootstrap's definition could (and should) extend this one from StatsAPI. Hecke defines `statistic` but doesn't use, document, or export it; it seems to be dead code.

MixedAnova, AnovaBase, and WildBootTests all define a `teststat` function.

Amusingly, HypothesisTests defines `teststatistic`, which I didn't realize. It's only defined for `VarianceEqualityTest`, though, and is neither documented nor exported.

ararslan · 2023-07-31T18:23:07Z

So actually, wouldn't the generality of `statistic` be a reasonable thing for an interface function? After all, the whole point of StatsAPI is for packages to share names. 😄 The meaning becomes unambiguous in the context of the type of the input.

palday · 2023-07-31T21:59:12Z

After a little bit of thinking, `statistic` actually sounds nice because it could also be used in other, non-testing contexts.

nalimilan · 2023-08-04T20:46:41Z

People often complain that we abuse generic functions by overloading them with methods that actually have little or nothing in common, so I thought using a more specific name like `teststat` would be more appropriate.

Maybe ping the authors of the packages you mentioned to get their opinion? We definitely want all packages to use the same function, so we need some of them to agree to switch to the new function.

ararslan · 2023-08-04T22:18:19Z

> People often complain that we abuse generic functions by overloading them with methods that actually have little or nothing in common

I was not aware of this. What are other examples?

ararslan · 2023-08-04T22:40:26Z

> Maybe ping the authors of the packages you mentioned to get their opinion? We definitely want all packages to use the same function, so we need some of them to agree to switch to the new function.

I think present company has HypothesisTests covered, and I don't think this is relevant for Hecke, but @yufongpeng for AnovaBase/MixedAnova, @droodman for WildBootTests, and @juliangehring for Bootstrap: hello! Thanks for contributing to the Julia ecosystem. We're thinking of introducing a function in StatsAPI which, if named generically, could be useful to your respective packages. For AnovaBase, MixedAnova, and WildBootTests, it would correspond to the function @yufongpeng and @droodman have called `teststat`. For Bootstrap, it would correspond to what @juliangehring has called `statistic`. Since your packages all already have StatsAPI as a transitive dependency (by way of Distributions for WildBootTests and StatsBase for the others), extending the function defined here would not require loading any additional dependencies. If you would be interested in integrating in this way, we would love your input here! What would be your preferred name for this function? The current contenders are:

- `statistic`
- `teststatistic`
- `teststat`

This was originally motivated by the need for a generic accessor function to extract the value of a test statistic from a `HypothesisTest` object, but its scope need not be limited to that.

yufongpeng · 2023-08-05T02:11:02Z

Since this function is for test statistics, I prefer a more specific name. `teststat` or `teststatistic` is good, but `statistic` is too general.

droodman · 2023-08-05T09:48:52Z

I agree. I'm happy to conform to any standards developed assuming it makes sense for my package.


ararslan · 2023-08-06T18:56:59Z

Thank you @yufongpeng and @droodman for your input!

> Since this function is for test statistics

It isn't necessarily; that was just the initial use case that prompted this discussion. Another notable example is Bootstrap, which defines a `statistic` function that returns the function used to compute the statistic (which needn't be a test statistic) on each sample. For example, `statistic(bootstrap(mean, randn(20), BasicSampling(100))) == mean`. Bootstrap could extend `statistic` from StatsAPI, but it probably wouldn't make sense for it to extend `teststatistic`/`teststat`, as those names are insufficiently general.
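The argument that the input type disambiguates a generic name can be illustrated with a small Julia sketch. It assumes a single shared `statistic` function; `ToyTTest`, `ToyBootstrap`, and `toy_bootstrap` are hypothetical stand-ins, not the real HypothesisTests or Bootstrap types (`toy_bootstrap` is a plain resampling loop, not Bootstrap's `bootstrap` with `BasicSampling`). For the test type, `statistic` returns a number; for the bootstrap result, it returns the function applied to each resample, as in the Bootstrap example above.

```julia
using Statistics  # for `mean`

# Hypothetical sketch: one shared generic function whose meaning is fixed
# by the argument type. These are illustrative types, not real package API.
function statistic end

# For a hypothesis test, `statistic` returns the numeric test statistic.
struct ToyTTest
    t::Float64
    df::Int
end
statistic(test::ToyTTest) = test.t

# For a bootstrap result, `statistic` returns the function that was applied
# to each resample, mirroring the Bootstrap behavior described above.
struct ToyBootstrap{F}
    f::F
    estimates::Vector{Float64}
end
statistic(b::ToyBootstrap) = b.f

# Toy resampling: apply `f` to `n` resamples drawn with replacement.
function toy_bootstrap(f, data::AbstractVector, n::Integer)
    estimates = [f(rand(data, length(data))) for _ in 1:n]
    return ToyBootstrap(f, estimates)
end
```

Here `statistic(ToyTTest(2.1, 19))` returns `2.1`, while `statistic(toy_bootstrap(mean, randn(20), 100))` returns the function `mean` itself.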

nalimilan · 2023-08-12T13:42:42Z

> I was not aware of this. What are other examples?

Probably the most problematic function is `fit`, for which we don't even document the possible arguments. But most other functions in StatsAPI have a well-defined signature, so that's really an exception.

ararslan requested a review from nalimilan on July 29, 2023, 19:13.

ararslan mentioned this pull request on July 29, 2023, in JuliaStats/HypothesisTests.jl#306, "[Feature Proposal] add new StatsAPI function to extract statistics" (closed).

nalimilan approved these changes on July 29, 2023.
