-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculating a p-value for a Pearson's correlation #40
Comments
@kovasap this is something which would be an excellent addition to the library. There are a handful of significance tests already in the I've sketched a solution (using (def data [{:x 1 :y 2} {:x 2 :y 2} {:x 3 :y 4} {:x 4 :y 4}])
(require '[kixi.stats.core :as kixi]
'[kixi.stats.math :refer [sq sqrt]]
'[kixi.stats.digest :as digest]
'[kixi.stats.distribution :as dist]
'[kixi.stats.test :as t])
(defn correlation-test
[fx fy]
(kixi/post-complete
(digest/sum-squares fx fy)
(fn [{:keys [n ss-x ss-y ss-xy]}]
(let [denominator (sqrt (* ss-x ss-y))]
(when-not (zero? denominator)
(let [r (/ ss-xy denominator)
degrees-of-freedom (- n 2)
t-stat (/ (* r (sqrt degrees-of-freedom))
(sqrt (- 1 (sq r))))]
{:correlation r
:test (t/test-result t-stat (dist/t {:v degrees-of-freedom}))}))))))
(def result
(transduce identity (correlation-test :x :y) data))
;; Return the calculated correlation
(-> result :correlation)
;; Return the p-value from the test result
(-> result :test t/p-value)
;; Perform a two-tailed test of significance given 5% alpha
(-> result :test (t/significant? 0.05 :<>)) |
Thanks for the detailed response! By count of rows do you mean the number of data points used to compute the correlation ( If/when I end up writing a test for |
That's right. The number of observations is required for calculating significance and this manifests as the t distribution degrees of freedom. For this reason I haven't used It would be good to consider a consistent strategy for optionally returning significance tests alongside statistics. PRs welcome! |
Ok got it. I'm in the process of getting this working in my hands for kixi.stats/src/kixi/stats/test.cljc Lines 10 to 28 in 7dbc5fd
Specifically, I'm wondering what the difference is between the p-value calculation and the significant? calculation. Based on my (mediocre) understanding of statistics, I would expect the significant? calculation to just check if the p-value is below the given value (0.05 in your case). Instead it seems to be using a different distribution to calculate it than the one used for p-value ?
|
That's a fair question: it would be possible to leverage the p-value calculation in order to determine significance at a given alpha. Given a test statistic, a test distribution, and a desired alpha, we can determine significance in either of at least two ways:
It happens that the latter approach seems a little more straightforward to me, and I think this is borne out by the complexity of the implementations of |
Got it that makes sense. I'm asking because I want to display p-values in a table I'm making and I want to make sure that it's accurate to say that p-values less than my desired alpha are significant (so I can color/sort them appropriately). I wanted to make sure that would give the same results as using |
I'm interested in calculating a p-value for a correlation that I'm calculating using
kixi.stats.core/correlation
. I'm curious if a function to do this already exists in this library, or if functions that would help do this exist. For example, based on the description at https://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-statistics/regression/how-to/correlation/methods-and-formulas/, I need to be able to generate a t distribution (I think?) in order to calculate this p-value.Or maybe it's possible to calculate this p-value from the error functions that already exist in kixi.stats?
If this is not possible and/or if there is a better way to do this, I'm happy to hear about it!
The text was updated successfully, but these errors were encountered: