
P-values less than threshold values #156

Closed
0todd0000 opened this issue Sep 14, 2021 · 12 comments

Comments

@0todd0000
Owner

Redirected from #154

In all the examples shown, the shaded area was where p was less than the threshold values. Are there any examples where the shaded area is greater than the threshold, along with an associated explanation?

@0todd0000
Owner Author

(I think this is referring to either the threshold directionality, or cluster-specific p values, but if I have misinterpreted please clarify.)


Threshold directionality:
The threshold is one-directional: either the test statistic crosses the threshold or it does not. If it crosses the threshold, then one or more suprathreshold clusters (i.e., shaded areas) emerge. If it does not, then no suprathreshold clusters exist.


Maximum cluster-level p-value:
The maximum cluster-level p-value is alpha. The critical threshold is calculated based on alpha: the probability that a smooth, 1D Gaussian process will produce a test statistic whose maximum value exceeds that threshold is alpha. Thus, if the test statistic for a particular dataset just touches the threshold (and produces an infinitesimally small cluster), that cluster's associated p-value is alpha. The larger the cluster, the smaller its p-value.



The probability associated with the test statistic maximum itself can be greater than alpha, but this probability is usually not reported in the SPM literature. Please consider the figure below, which is referenced from spm1d.org/rft1d/Theory. It shows the probability that the test statistic maximum (z_max) will reach an arbitrary threshold (u) for different smoothness values (FWHM). As you can see, this probability can be greater than alpha=0.05, but usually only the alpha=0.05 threshold is reported, because this is the quantity most important to hypothesis testing.
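As a rough illustration of the curves in that figure: for a smooth 1D Gaussian field, the survival probability of the field maximum can be approximated with the standard expected-Euler-characteristic formula. The sketch below is a minimal illustration of that textbook formula, assuming a Gaussian (not t) field and arbitrary field-length and FWHM values; use rft1d's own routines for real calculations.

% Approximate P(z_max > u) for a smooth 1D Gaussian field via the
% expected-Euler-characteristic formula (illustration only):
%   P(z_max > u) ~ [1 - Phi(u)] + (L/FWHM) * sqrt(4*log(2))/(2*pi) * exp(-u^2/2)
u    = linspace(0, 5, 101);     % candidate thresholds
L    = 101;                     % field length in nodes (arbitrary)
FWHM = [5 10 20];               % smoothness values to compare (arbitrary)
hold on
for i = 1:numel(FWHM)
    resels = L / FWHM(i);                                    % 1D resel count
    p0     = 0.5 * erfc(u / sqrt(2));                        % point-level tail: 1 - Phi(u)
    p1     = resels * sqrt(4*log(2))/(2*pi) * exp(-u.^2/2);  % EC density term
    plot(u, min(1, p0 + p1))                                 % survival approximation
end
plot(u, 0.05*ones(size(u)), 'k--')                           % alpha = 0.05 reference
xlabel('u'), ylabel('P(z_max > u)'), legend('FWHM=5', 'FWHM=10', 'FWHM=20', 'alpha=0.05')

Note how the curves exceed 0.05 at low thresholds: the critical threshold reported by spm1d is simply the u at which each curve crosses alpha.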

@rude10

rude10 commented Sep 15, 2021

[Figure: SnPM{t} results plot showing a suprathreshold cluster with p = 0.063]

This is based on a non-parametric t-test of a small-sample continuous dataset. The SPM ensemble plot did look as though there were some differences, but that was the result of the test.

@0todd0000
Owner Author

OK. Do you have a question about the results?

@rude10

rude10 commented Sep 15, 2021

Yes, in this case is this saying that at 90-100% of the movement there was not a significant difference, since the p-value was greater than 0.05? And what does that then say about describing the rest of the curve? I ask because, as you mentioned earlier, there is a lack of confidence in confirming certain assumptions when going the non-parametric route. In the instances I've seen so far, the parametric test has not returned a curve that crosses any thresholds, so I'm seeking some clarity on how this scenario should be handled when reporting.

@0todd0000
Owner Author

Something is very strange with these results; the p-value shouldn't be bigger than alpha.

Can you please re-run the code below, then copy-and-paste the output of the disp(spmi) command into this issue?

spm  = spm1d.stats.ttest_paired(Y1, Y2);                       % paired t-test
spmi = spm.inference(0.05, 'two_tailed',true, 'interp',true);  % parametric inference
disp( spmi )                                                   % print the inference results

@rude10

rude10 commented Sep 15, 2021

I agree, Todd. Results below:

Parametric results:

SPM{t} inference
z: [1×101 double]
df: [1 3]
fwhm: 10.5179
resels: [1 9.5076]
alpha: 0.0500
zstar: 16.3811
h0reject: 0
p_set: 1
p: []

@0todd0000
Owner Author

Two points:

  1. (Major point) Since p is empty, and since h0reject is 0, these results imply that the t statistic does not cross the critical threshold. Please send the output for the case depicted in the figure above, where the p-value is 0.063.
  2. (Minor point) Very small sample size: it looks like the sample size is just 4. Statistical analysis is possible, but may be under-powered. (This point is irrelevant to the p = 0.063 problem.)

@rude10

rude10 commented Sep 16, 2021

Hi Todd. See below for the non-parametric test:

Code I'm running:

%(1) Conduct non-parametric test:
rng(0)
alpha = 0.05;
two_tailed = false;
iterations = -1;
snpm = spm1d.stats.nonparam.ttest_paired(s1, s2);
snpmi = snpm.inference(alpha, 'two_tailed', two_tailed, 'iterations', iterations);
disp('Non-Parametric results')
disp( snpmi )

Non-Parametric results:

SnPM{t} inference (1D)
z: [1×101 double]
nPermUnique: 16
nPermActual: 16
alpha: 0.0500
zstar: 7.3780
h0reject: 1
p: 0.0625

@0todd0000
Owner Author

Thank you very much for this output; this clarifies the problem.

The problem is that there are only 16 unique permutations (nPermUnique = 16), so the minimum possible probability is pmin = 1/16 = 0.0625. In other words, the dataset is too small to allow smaller p-values when using spm1d's nonparametric permutation approach.
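(For context: in a paired design the permutation scheme sign-flips each within-pair difference, so N pairs yield 2^N unique permutations and a minimum possible p-value of 1/2^N. A minimal sketch in plain MATLAB, not spm1d's internals:)

% Minimum attainable p-value for a paired (sign-flipping) permutation test.
N    = (1:10)';       % number of pairs
pmin = 1 ./ 2.^N;     % the observed labelling is one of 2^N equally likely permutations
disp( table(N, pmin) )   % N = 4 gives 2^4 = 16 permutations, so pmin = 0.0625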

To check this, try artificially doubling the dataset size:

s1 = [s1; s1];   % artificially double the number of pairs (debugging only)
s2 = [s2; s2];
snpm  = spm1d.stats.nonparam.ttest_paired(s1, s2);
snpmi = snpm.inference(alpha, 'two_tailed', two_tailed, 'iterations', iterations);
disp( snpmi )

You should then see that the p-value falls below 0.05.

Only artificially increase the sample size this way for debugging purposes; do not do this for reporting purposes.

I will add a warning message to the software that alerts users when the minimum possible p-value is greater than alpha. I'll work on this bug-fix and update the software tomorrow.

Thank you for reporting this problem!

@rude10

rude10 commented Sep 16, 2021

Hi Todd,

Thanks for that. Yes, I had actually already tried the doubling approach when I got the error about normality being tied to 8 observations, just to see the code's response.

I thought that the non-parametric tests took care of the potential lack of normal distribution and the potentially small sample, since my understanding is that they use some form of Wilcoxon rank test under the hood and check normality using Shapiro-Wilk. My question, then: given the small sample sizes, are you saying that these tests cannot accurately provide a result that I can report in the literature, and that I would need to use SPSS or some other tool?

@0todd0000
Owner Author

I thought that the non-parametric tests took care of the potential lack of normal distribution and the potentially small sample, since my understanding is that they use some form of Wilcoxon rank test under the hood and check normality using Shapiro-Wilk.

spm1d uses a nonparametric permutation technique. See Fig. 4 and Appendix A in this article (free download available). This test is not a rank test, but it performs quite well, especially for moderate-to-large sample sizes.
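For intuition, here is a minimal sketch of the sign-permutation idea behind such a test, using the maximum of the pointwise paired t-statistic as the test statistic. This is a toy illustration, not spm1d's actual implementation; Y1 and Y2 are assumed to be (N x nNodes) arrays of paired 1D observations.

% Toy sign-permutation test on the maximum paired t-statistic (1D data).
D     = Y1 - Y2;                                      % within-pair differences
N     = size(D, 1);
tstat = @(X) mean(X,1) ./ ( std(X,0,1) / sqrt(N) );   % pointwise paired t-statistic
t0    = max( tstat(D) );                              % observed maximum statistic
signs = 2*(dec2bin(0:2^N-1) - '0') - 1;               % all 2^N sign assignments (+/-1)
tmax  = zeros(2^N, 1);
for i = 1:2^N
    tmax(i) = max( tstat(signs(i,:)' .* D) );         % max statistic per relabelling
end
p = mean(tmax >= t0);                                 % one-sided permutation p-value
fprintf('p = %.4f  (minimum possible: %.4f)\n', p, 1/2^N)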



My question, then: given the small sample sizes, are you saying that these tests cannot accurately provide a result that I can report in the literature, and that I would need to use SPSS or some other tool?

The tests are accurate; the problem is power. No technique (parametric or nonparametric, including those in SPSS) can give powerful, stable results for such small sample sizes. If you report results for a sample size of N=4 in a submitted paper, editors and/or reviewers will likely flag the result as problematic due to the small sample size, especially if there is no clear a priori justification for that sample size.

Considering the figure below (from this site), it is clear that a sample size of N=4 is insufficient for reaching the standard power level of 0.8 unless the hypothesized effect is extremely large.

So my suggestion is: conduct an a priori power analysis to determine the approximate sample size that you need. I suspect that the sample size would be in the range of 8 to 12 for moderate effects and approximately 20 for smaller effects.
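As a rough 0D example of such a calculation, the sketch below uses MATLAB's Statistics Toolbox function sampsizepwr on the within-pair differences; the effect sizes are assumptions, and a dedicated 1D approach (e.g. the power1d package) would be needed for full SPM power analysis.

% Approximate number of pairs for a paired t-test at power = 0.8, alpha = 0.05.
% A paired test is equivalent to a one-sample t-test on the differences.
d = [0.5 0.8 1.2];                            % assumed standardized effect sizes
for i = 1:numel(d)
    n = sampsizepwr('t', [0 1], d(i), 0.8);   % H0: mean 0, SD 1; H1: mean d(i)
    fprintf('d = %.1f  ->  n = %d pairs\n', d(i), n)
end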

@rude10

rude10 commented Sep 16, 2021

Hi Todd

Understood, thanks much for the additional resources. I have already gotten approval for my sample size, so I am just looking for the best test(s) that suit my needs in terms of reporting but will consider all the above accordingly.
