Spearman cor test #304

mapi1 · 2023-07-17T13:36:46Z

This PR adds the SpearmanCorrelationTest as suggested in #236.
For the confidence interval I took inspiration from this StackExchange thread and used the suggested variance estimator to counter the non-normal distribution of the ranks.
Unfortunately, I could not add meaningful tests for it: R's cor.test does not report confidence intervals for the Spearman correlation, and it uses a different algorithm to calculate the p-value. Maybe someone has an idea here, or knows a tool that already calculates this correctly.

codecov-commenter · 2023-07-17T13:40:33Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.06 🎉

Comparison is base (932eaac) 93.75% compared to head (8869900) 93.81%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #304      +/-   ##
==========================================
+ Coverage   93.75%   93.81%   +0.06%     
==========================================
  Files          28       28              
  Lines        1729     1746      +17     
==========================================
+ Hits         1621     1638      +17     
  Misses        108      108

Impacted Files	Coverage Δ
src/correlation.jl	`100.00% <100.00%> (ø)`


nalimilan

Thanks, looks mostly good!

I'm not sure which existing implementations could be used to check results. Have you looked at those mentioned by Wikipedia, as there are several which return p-values?

nalimilan · 2023-07-30T10:04:07Z

src/correlation.jl

+"""
+    SpearmanCorrelationTest(x, y)
+
+Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the rank-based Spearman correlation 


Break all lines at 92 chars like in the CorrelationTest docstring. Also I would say "Spearman rank correlation" rather than "rank-based".

nalimilan · 2023-07-30T10:05:26Z

src/correlation.jl

+    end
+end
+
+testname(p::SpearmanCorrelationTest) =  "Spearman correlation"


Suggested change

-testname(p::SpearmanCorrelationTest) =  "Spearman correlation"
+testname(p::SpearmanCorrelationTest) = "Spearman correlation"

nalimilan · 2023-07-30T10:07:26Z

test/correlation.jl

+    let out = sprint(show, w)
+        @test occursin("reject h_0", out) && !occursin("fail to", out)
+    end
+    # let ci = confint(w)


Why is this commented out?

nalimilan · 2023-07-30T10:10:46Z

test/correlation.jl

+    #     @test first(ci) ≈ -0.1105478 atol=1e-6
+    #     @test last(ci) ≈ 0.0336730 atol=1e-6
+    # end
+    @test pvalue(x) ≈ 0.09275 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89 


Can we find a more precise value to test against? This way of writing the test is misleading as even 0.09 would pass.

In the worst case, we should test with a lower tolerance against the value we return, and just note in a comment the value returned by R.
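
To illustrate the tolerance concern, here is a minimal sketch using only Julia's `Test` standard library. The name `reference` is introduced for illustration; its value is the one quoted in the test under review.

```julia
using Test

# Value quoted in the test under review (reported by R's cor.test).
reference = 0.09275

# With atol=1e-2, a value that already differs in the third decimal still passes.
@test isapprox(0.09, reference; atol=1e-2)

# A tighter tolerance rejects the same value, so it would catch such a drift.
@test !isapprox(0.09, reference; atol=1e-6)
```

When `atol` is given explicitly, `isapprox` uses a relative tolerance of zero by default, so the comparison depends only on the absolute difference.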

nalimilan · 2023-07-30T10:12:41Z

src/correlation.jl

+Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of:
+
+* small sample sizes n < 25


Suggested change

-Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of:
-
-* small sample sizes n < 25
+Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation, which performs insufficiently in the case of:
+
+* sample sizes below 25

nalimilan · 2023-07-30T10:13:01Z

src/correlation.jl

+[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.
+
+[2] A. J. Bishara and J. B. Hittner, “Confidence intervals for correlations when data are not normal,” Behav Res, vol. 49, no. 1, pp. 294–309, Feb. 2017, doi: 10.3758/s13428-016-0702-8.
+


Suggested change

nalimilan · 2023-07-30T10:18:00Z

src/correlation.jl

+Implements `confint` using an approximate confidence interval adjusting for the non-normality of the ranks based on [1]. This is still an approximation and which performs insufficient in the case of:
+
+* small sample sizes n < 25
+* a high true population Spearman correlation


According to the StackExchange thread this is more precisely:

Suggested change

-* a high true population Spearman correlation
+* a true population Spearman correlation above 0.95

It also mentions ordinal data. Did you omit it on purpose? I admit it's not super explicit.

nalimilan · 2023-07-30T10:23:47Z

src/correlation.jl

+In these cases a bootstrap confidence interval can perform better [2].
+
+# External resources
+[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.


Suggested change

-[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating pearson, kendall and spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.
+[1] D. G. Bonett and T. A. Wright, “Sample size requirements for estimating Pearson, Kendall and Spearman correlations,” Psychometrika, vol. 65, no. 1, pp. 23–28, Mar. 2000, doi: 10.1007/BF02294183.

mapi1 · 2023-08-01T11:52:08Z

Thanks for your detailed review! I tried to incorporate it as suggested.

I used the spearmanCI R library to get values for the CIs for testing. They suffer from the same problem as the p-value: only a small number of significant digits match. I still wrote the tests as a form of documentation, and also added tests that compare against the values we return, to catch feature changes, bugs, etc.

I now also mention ordinal data, as it is the main message of the Ruscio paper (now added to the docstring).

nalimilan

Sorry for the delay. Looks almost ready. I've just made a few more comments.

Regarding comparison of CIs against R, I hadn't realized spearmanCI doesn't implement the same CI method. In that case I don't think it makes sense to test against these values, as there are mathematically legitimate reasons to get different results. Maybe you could check against code that isn't included in a package such as this one instead? Then if that matches we can just test against the exact values we return to prevent regressions.

nalimilan · 2023-09-09T08:48:30Z

src/correlation.jl

+    dof(test) > 1 || return (-one(T), one(T))  # Otherwise we can get NaNs
+    q = quantile(Normal(), 1 - (1-level) / 2)
+    fisher = atanh(test.r)
+    bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000)


Suggested change

-bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q # Estimates variance as in Bonnet et al. (2000)
+# Estimates variance as in Bonett and Wright (2000)
+bound = sqrt((1 + test.r^2 / 2) / (dof(test)-1)) * q
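
For readers following the hunk above, the interval construction can be sketched as a standalone function. This is only an illustration under stated assumptions, not the PR's code: the name `spearman_ci_sketch` is hypothetical, the degrees of freedom are assumed to be `n - 2` (so `dof(test) - 1` becomes `n - 3`), the bounds are assumed to be mapped back with `tanh`, and the default `z` is the 0.975 standard normal quantile, hard-coded to avoid a dependency.

```julia
# Hypothetical helper, not part of the PR: approximate confidence interval for a
# Spearman correlation `r` computed from `n` paired observations.
function spearman_ci_sketch(r::Real, n::Integer; z::Real=1.959963984540054)
    # Mirrors the `dof(test) > 1` guard, assuming dof = n - 2: with n <= 3 the
    # standard error is undefined, so return the trivial interval.
    n > 3 || return (-1.0, 1.0)
    fisher = atanh(r)                    # Fisher z-transform of r
    se = sqrt((1 + r^2 / 2) / (n - 3))   # variance estimate as in Bonett and Wright (2000)
    # Assumption: the bounds are transformed back to the correlation scale with tanh.
    return (tanh(fisher - z * se), tanh(fisher + z * se))
end

lo, hi = spearman_ci_sketch(0.5, 50)
```

The interval is symmetric on the Fisher scale but not on the correlation scale, which is why it is built around `atanh(r)` rather than around `r` itself.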

nalimilan · 2023-09-09T08:54:54Z

src/correlation.jl


 function population_param_of_interest(p::CorrelationTest)
-    param = p.k != 0 ? "Partial correlation" : "Correlation"
+    param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation"


It seems that nobody says "partial Pearson correlation" even if that would sound more explicit.

Suggested change

-param = p.k != 0 ? "Partial Pearson correlation" : "Pearson correlation"
+param = p.k != 0 ? "Partial correlation" : "Pearson correlation"

nalimilan · 2023-09-09T08:59:24Z

src/correlation.jl

+end
+
+function StatsAPI.confint(test::SpearmanCorrelationTest{T}, level::Float64=0.95) where T
+    dof(test) > 1 || return (-one(T), one(T))  # Otherwise we can get NaNs


Can you add a test for this case? Maybe also for other corner cases like having NaNs or Inf in the input.

nalimilan · 2023-09-09T09:51:25Z

src/correlation.jl

+Perform a t-test for the hypothesis that ``\\text{Cor}(x,y) = 0``, i.e. the Spearman rank
+correlation ρₛ of vectors `x` and `y` is zero.
+
+Implements `pvalue` for the t-test.


Suggested change

-Implements `pvalue` for the t-test.
+Implements `pvalue` for the t-test using the Fisher transformation.
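
As background for the docstring wording, the textbook t approximation for a rank correlation can be sketched as below. Whether the PR's `pvalue` computes exactly this is an assumption; the function name `spearman_t_pvalue_sketch` is hypothetical, and the sketch relies on `corspearman` from StatsBase and on `TDist` and `ccdf` from Distributions.

```julia
using Distributions: TDist, ccdf
using StatsBase: corspearman

# Hypothetical helper, not part of the PR: two-sided p-value for H0 "Spearman
# correlation is zero", using t = r * sqrt((n - 2) / (1 - r^2)) with n - 2
# degrees of freedom.
function spearman_t_pvalue_sketch(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    n == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    n > 2 || throw(ArgumentError("need at least 3 observations"))
    r = corspearman(x, y)        # Pearson correlation of the (tied) ranks
    abs(r) >= 1 && return 0.0    # perfectly monotone data: the statistic is unbounded
    t = r * sqrt((n - 2) / (1 - r^2))
    return 2 * ccdf(TDist(n - 2), abs(t))
end

p = spearman_t_pvalue_sketch([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
```

This differs from an exact permutation-based computation such as AS 89, which is one reason small discrepancies against R are expected.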

nalimilan · 2023-09-09T09:57:05Z

test/correlation.jl

+        @test first(ci) ≈ -0.1333692 atol=1e-6
+        @test last(ci) ≈ 0.01065576 atol=1e-6
+    end
+    @test pvalue(x) ≈ 0.09274721 atol=1e-2 # value from R's cor.test(..., method="spearman") which does not use a t test algorithm AS 89 


AFAICT R will use AS 89 if you pass exact=TRUE, right? It would be good to test against an implementation which uses AS 89, even if we need to use another software.

MariusPille added 2 commits July 17, 2023 15:25

Add SpearmanCorrelationTest

8508cb1

Typo

8869900

nalimilan reviewed Jul 30, 2023

incorporate review

ed3313b

nalimilan reviewed Sep 9, 2023

nalimilan mentioned this pull request Sep 9, 2023

Making CorrelationTest nonparametric #236

Closed
