This issue was moved to a discussion.

blog/chi-square-test-of-independence-in-r/ #46

Closed
utterances-bot opened this issue Jan 7, 2021 · 9 comments
Labels
comment Comments for the blog

Comments

@utterances-bot

Chi-square test of independence in R - Stats and R

Learn when and how to use the Chi-square test of independence in R. See also how it works in practice and how to interpret the results of the Chi-square test

https://statsandr.com/blog/chi-square-test-of-independence-in-r/

@AntoineSoetewey AntoineSoetewey added the comment Comments for the blog label Jan 7, 2021 — with utterances
Owner

Comment written by Herivelto Cordeiro dos Santos on November 30, 2020 11:55:35:

Hi Antoine! I really liked your blog; I think it is clear and focused! Let me ask you about this one... I tried the scripts for method #1 and method #2 and got different p-values. Do you know what could be the reason, as they should be equal? My table is matrix(c(63, 78, 94, 65), ncol = 2).

@AntoineSoetewey
Owner

Comment written by Antoine Soetewey on November 30, 2020 12:20:12:

Thank you for your feedback!

I just tried on my side, and I have the same p-values with your data, see my code here.

One potential reason for the different p-values is that the first method applies Yates's continuity correction by default (chisq.test() does this for 2 × 2 tables). Add the argument correct = FALSE to the chisq.test() function to turn this continuity correction off.

(I've added a note at the end of this section following your comment.)
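
To make the effect of the correction concrete, here is a minimal sketch using the 2 × 2 table from the comment above. The object names (tab, with_correction, without_correction) are only illustrative; chisq.test() and its correct argument are base R.

```r
# Observed counts from the comment above (2 x 2 table)
tab <- matrix(c(63, 78, 94, 65), ncol = 2)

# Default behaviour: Yates's continuity correction is applied to 2 x 2 tables
with_correction <- chisq.test(tab)

# Same test with the continuity correction switched off
without_correction <- chisq.test(tab, correct = FALSE)

# Compare the two p-values side by side
c(corrected = with_correction$p.value,
  uncorrected = without_correction$p.value)
```

The correction shrinks the test statistic, so the corrected p-value is the larger of the two. If another method reports the smaller one, it is most likely not applying the correction.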

Hope this helps.

Regards,
Antoine

Clarice3 commented Jan 7, 2021

Hi Antoine! Thank you so much for your blog, it's very helpful! I have a question for you: I have a data frame with several (10) different categorical variables that I would like to test for possible correlations between each other. Is there a way that I can test them all at the same time, like you explained for the quantitative variables? Or is it really only possible for two variables at the same time? Not sure how I would do this for 10 variables...
Thanks in advance!
Regards,
Clarice

@AntoineSoetewey
Owner

AntoineSoetewey commented Jan 7, 2021

Dear Clarice,

Do you want to compute correlation coefficients or perform chi-square tests? You mentioned correlations but you posted the comment on the article about chi-square test, so I'm not sure.

For correlation, if your categorical variables are ordinal, you can simply use cor(dat, method = "spearman"), where dat is the name of your dataframe. See more details in this article about correlation coefficient in R.
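
One practical caveat, offered as a sketch rather than as the article's method: cor() only accepts numeric input, so ordinal variables stored as factors need to be converted to their integer level codes first. The data frame dat and its columns below are made up for illustration.

```r
# Hypothetical data: two ordinal variables stored as ordered factors
dat <- data.frame(
  satisfaction = factor(
    c("low", "medium", "high", "medium", "low", "high", "high", "low"),
    levels = c("low", "medium", "high"), ordered = TRUE
  ),
  education = factor(
    c("primary", "secondary", "tertiary", "tertiary",
      "primary", "secondary", "tertiary", "primary"),
    levels = c("primary", "secondary", "tertiary"), ordered = TRUE
  )
)

# cor() fails on factors, so replace each factor by its integer level codes
# (1 = lowest level, 2 = next level, and so on)
dat_num <- data.frame(lapply(dat, as.integer))

# Spearman rank correlation matrix between the ordinal variables
cor_mat <- cor(dat_num, method = "spearman")
cor_mat
```

The conversion only makes sense when the factor levels are declared in their natural order, as with the levels argument above.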

The standard Chi-square test of independence (performed with the chisq.test() function and presented in this article) handles only two categorical variables at a time, so you'd need to tweak your code a bit to run it on all possible pairs of variables. If your dataset contains a relatively small number of variables, you can also copy and paste your code for each pair of variables.
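
As a rough sketch of what testing all pairs could look like, the snippet below builds every pair of column names with combn() and runs chisq.test() on each. The data are simulated and the names (dat, var_pairs, results) are hypothetical, so treat this as one possible approach rather than the article's own code.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical data frame with three categorical variables
dat <- data.frame(
  sex   = sample(c("male", "female"), size = 100, replace = TRUE),
  level = sample(c("low", "mid", "high"), size = 100, replace = TRUE),
  smoke = sample(c("smoker", "non smoker"), size = 100, replace = TRUE)
)

# All unordered pairs of column names: (sex, level), (sex, smoke), (level, smoke)
var_pairs <- combn(names(dat), 2, simplify = FALSE)

# Run one Chi-square test per pair and collect the results in a data frame
results <- do.call(rbind, lapply(var_pairs, function(p) {
  test <- chisq.test(table(dat[[p[1]]], dat[[p[2]]]))
  data.frame(
    var1      = p[1],
    var2      = p[2],
    statistic = unname(test$statistic),
    df        = unname(test$parameter),
    p_value   = test$p.value
  )
}))

results
```

With 10 variables this gives 45 tests, so the output is easier to read as a table like results than as 45 printed test summaries.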

Hope this helps.

Regards,
Antoine

@Clarice3

Clarice3 commented Jan 11, 2021 via email

@AntoineSoetewey
Owner

AntoineSoetewey commented Jan 12, 2021

Hey Antoine! Thank you so much for your quick answer. My goal is to find out whether the different variables correlate with each other or not, so I can exclude them before computing a model. From your blog I learned that it’s not possible to compute correlation coefficients between two categorical variables (if I understood that correctly?) but only to do a contingency analysis. Some of my categorical variables are ordinal with 3-4 levels, most of them are nominal though. I tried to use the corr() function that you suggested, but unfortunately I just can’t make it work with my R version… not sure why. So I’ll have to find another way I guess! Best wishes, Hanna

You understood correctly:

  • You can compute the correlation between your ordinal variables with the cor() function (spelled with one r, not two as in your comment).
  • For your nominal variables, however, you cannot compute a correlation. You'll need to apply the Chi-square test of independence (with the chisq.test() function).

Regards,
Antoine

@Clarice3

Clarice3 commented Jan 13, 2021 via email


Hello Mr Antoine,
It was a nice article about Chi squared test.
I am trying to replicate your earlier blog post on running t-tests and ANOVA on multiple columns at the same time, but with the chi-squared test, and I am unable to.
I tried looping:
comparison1 <- lapply(df[, 1:4], function(x) t.test(x ~ df$var)) - this worked for t.test but not for the chi-squared test
Any suggestions?

@AntoineSoetewey
Owner

AntoineSoetewey commented May 29, 2021

Hello,

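Here is a reproducible example using a for loop:

# Fix the random seed so the simulated data are the same on every run
set.seed(42)

# Simulated data: three categorical variables
df <- data.frame(sex = sample(c("male", "female"), size = 100, replace = TRUE),
                 smoke = sample(c("smoker", "non smoker"), size = 100, replace = TRUE),
                 sport = sample(c("athlete", "non athlete"), size = 100, replace = TRUE))

# Test the first variable (sex) against each of the other variables
for (i in 2:ncol(df)) {
  print(names(df)[i])
  print(chisq.test(table(df[, 1], df[, i])))
}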
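
Since the question used lapply(), here is a sketch of the same idea in that style. It assumes a data frame shaped like the one above (first column is the grouping variable); the names tests and p_values are only illustrative.

```r
set.seed(42)  # make the simulated data reproducible

# Same kind of simulated data: three categorical variables
df <- data.frame(sex = sample(c("male", "female"), size = 100, replace = TRUE),
                 smoke = sample(c("smoker", "non smoker"), size = 100, replace = TRUE),
                 sport = sample(c("athlete", "non athlete"), size = 100, replace = TRUE))

# Apply chisq.test() to the cross-table of sex and every other column.
# chisq.test() has no formula interface, so the table is built explicitly.
tests <- lapply(df[, -1], function(x) chisq.test(table(df$sex, x)))

# Extract one p-value per tested column, named after the column
p_values <- sapply(tests, function(res) res$p.value)
p_values
```

The t.test(x ~ df$var) call from the question relies on a formula, which chisq.test() does not accept. That is probably why swapping the function name alone did not work.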

Hope this helps.

Regards,
Antoine

Repository owner locked and limited conversation to collaborators Jun 3, 2021
