Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev #55

Merged
merged 23 commits into from
Mar 25, 2020
Merged

dev #55

merged 23 commits into from
Mar 25, 2020

Conversation

DominiqueMakowski
Copy link
Member

No description provided.

@codecov-io
Copy link

codecov-io commented Mar 22, 2020

Codecov Report

Merging #55 into master will increase coverage by 0.61%.
The diff coverage is 81.25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   73.69%   74.30%   +0.61%     
==========================================
  Files          29       31       +2     
  Lines         764      833      +69     
==========================================
+ Hits          563      619      +56     
- Misses        201      214      +13     
Impacted Files Coverage Δ
R/correlation.R 78.30% <ø> (ø)
R/simulate_simpson.R 0.00% <0.00%> (ø)
R/utils_bootstrapping.R 0.00% <ø> (ø)
R/z_fisher.R 0.00% <ø> (ø)
R/cor_test_distance.R 59.21% <42.85%> (-1.61%) ⬇️
R/utils_find_correlationtype.R 80.55% <80.55%> (ø)
R/cor_test_biserial.R 88.09% <88.09%> (ø)
R/cor_test.R 84.78% <100.00%> (+0.78%) ⬆️
R/cor_test_freq.R 76.92% <100.00%> (ø)
R/cor_test_tetrachoric.R 90.90% <100.00%> (-2.20%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19f73b8...81393c9. Read the comment docs.

@DominiqueMakowski
Copy link
Member Author

@strengejacke @mattansb @IndrajeetPatil although JOSS submissions are closed anyway for now, I just felt inspired to initialize the paper. It can be fast and light work since the scope is quite narrow. Feel free to expand and add details and things.

I'd like to get a nice main figure, but I'm not sure how it would look like... ideas?

@mattansb
Copy link
Member

I really like the last two plots here:
https://easystats.github.io/see/articles/correlation.html

@DominiqueMakowski
Copy link
Member Author

I really like the last two plots here:

yeah, these will go in as part of the examples, but I was thinking about like a main figure 1 that would summarize the goal and features of the package... ^^

Images that come to my mind are:

  • a scatterplot showing a slightly non-linear relationship between two variables
    • some obvious outliers
  • And on it, or on the side or something, somehow show the output (but what? how?) of the different methods for correlations... The estimate's value as barplots on the side? behind the scatterplot?
  • Or one scatter plot repeated for each method, and for each method, we visualize what the method does. For instance, for the rank transformed, we show each point mirroring into its rank position (like a translation). For the percentage bend, we show the extreme points as transparent (suggesting they are not "included"), for Bayesian we show all of the posterior draws etc. But then this gets impossible to do for all I think.

So I don't know...

@mattansb
Copy link
Member

Hmmm.. seeing as how the methods aren't new, I am inclined to stick to plots we can actually show with our package/s?

But I'm also not sure what would be "nice" and what would be "informative" about the pkg... 🤷‍♂️

@DominiqueMakowski
Copy link
Member Author

For bayestestR, we also had in the intro nice plots to "visualize" HDIs and BFs etc., although these were not new methods either, and then in the examples, we had an example of what you could obtain using it.

Here I have in mind the one figure that will be the illustration of the paper...

#' @keywords internal
.cor_test_biserial_biserial <- function(data, x, y, continuous, binary, ci){

# TODO: get rid off psych https://www.statisticshowto.datasciencecentral.com/point-biserial-correlation/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strengejacke @mattansb the maths geniuses there's an easy formula https://www.statisticshowto.datasciencecentral.com/point-biserial-correlation/ for the biserial correlation so that we don't depend on psych... care to have a look? 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe like:

set.seed(123)
y <- rbinom(100, 1, .3)
x <- rnorm(100)

m1 <- mean(x[y == 1])
m0 <- mean(x[y == 0])
sn <- sd(x)
q <- mean(y)
p <- 1 - q

((m1 - m0) / sn) * sqrt(p * q)
#> [1] 0.06151908


y2 <- y
y2[y == 0] <- "a"
y2[y == 1] <- "f"

y3 <- performance:::.factor_to_numeric(y2, lowest = 0)

m1 <- mean(x[y == 1])
m0 <- mean(x[y == 0])
sn <- sd(x)
q <- mean(y)
p <- 1 - q

((m1 - m0) / sn) * sqrt(p * q)
#> [1] 0.06151908

Created on 2020-03-23 by the reprex package (v0.3.0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ups, that was point biseral...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are:

own_biserial <- function(x, y) {
  cc <- complete.cases(x, y)
  x <- x[cc]
  y <- y[cc]
  
  y <- performance:::.factor_to_numeric(y, lowest = 0)
  
  m1 <- mean(x[y == 1])
  m0 <- mean(x[y == 0])
  sn <- sd(x)
  q <- mean(y)
  p <- 1 - q
  
  zp <- dnorm(qnorm(q))
  
  (((m1 - m0) * (p * q / zp)) / sd(x))
}

set.seed(123)
y <- rbinom(100, 1, .3)
x <- rnorm(100)

own_biserial(x, y)
#> [1] 0.08155037
psych::biserial(x, y)
#>            [,1]
#> [1,] 0.08155037


set.seed(456)
y <- rbinom(100, 1, .3)
x <- rnorm(100)

own_biserial(x, y)
#> [1] 0.02964972
psych::biserial(x, y)
#>            [,1]
#> [1,] 0.02964972

Created on 2020-03-23 by the reprex package (v0.3.0)

@DominiqueMakowski
Copy link
Member Author

figure1

Something like this, it's just for illustrative purposes

@strengejacke strengejacke mentioned this pull request Mar 23, 2020
@strengejacke
Copy link
Member

@DominiqueMakowski See #56

@DominiqueMakowski
Copy link
Member Author

figure1

Alright, what abouuuuut we submit 😁

is it sudden? definitely, but one has to use the motivation when it comes 🤷‍♂

@mattansb
Copy link
Member

mattansb commented Mar 24, 2020 via email

@DominiqueMakowski
Copy link
Member Author

Would make sure to note in the caption that Bayesian is also Pearson (:

Is it tho? Isn't Pearson's correlation by nature frequentist corresponding to a particular formula? Isn't it more like Bayesian pseudo-Pearson 😅

@mattansb
Copy link
Member

Pearson defined the population linear correlation Rho, which both the freq and Bayesian methods try to estimate :)

(While the other methods are estimating something else - close sometimes, but not Rho exactly.)

@DominiqueMakowski
Copy link
Member Author

On a side note, what are your thoughts about standardizing the output name, for now it's sometimes "r", "rho" or "tau". What about naming them all "r"? or "rho"?

@mattansb
Copy link
Member

They're not all estimates of Rho, so that would be misleading. (This is why Spearman isn't just a robust Person!)
I thought the easyverse style guide states that column names should be specific to what they contain (and not go the broom route)? 🤔

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Mar 24, 2020

Oh, I thought that JOSS closed the papers processing but that you could still submit, but in fact they closed the submission form so we'll have to wait a bit until the situation improves 🤞

I thought the easyverse style guide states that column names should be specific to what they contain (and not go the broom route)? 🤔

Right right 😅

@strengejacke
Copy link
Member

although it doesn't seem to impact the results? Why not leave it as is?

  m1 <- mean(var_x[var_y == 1])
  m0 <- mean(var_x[var_y == 0])

@DominiqueMakowski
Copy link
Member Author

right...

@DominiqueMakowski
Copy link
Member Author

that was via voice recognition when I was on my bike

on a home exercise bike you mean 👁

@strengejacke
Copy link
Member

on a home exercise bike you mean 👁

No, I'm not doing any sports (beside driving to work by bike and running up'n'down the stairs to pick up things from the kids they just drop anywhere)

m1 <- mean(var_x[var_y == 1])
m0 <- mean(var_x[var_y == 0])
m1 <- mean(var_x[var_y == unique(var_y)[1]])
m0 <- mean(var_x[var_y == unique(var_y)[2]])
sn <- stats::sd(var_x)
q <- mean(var_y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q <- mean(var_y) computes the proportion, which only works when the values are 0/1...

@DominiqueMakowski
Copy link
Member Author

you forgot gym for fingers via coding in R

if (.vartype(data[[binary]])$is_factor | .vartype(data[[binary]])$is_character) {
data[[binary]] <- as.numeric(as.factor(data[[binary]]))
}
data[[binary]] <- as.vector((data[[binary]] - min(data[[binary]], na.rm = TRUE)) / diff(range(data[[binary]], na.rm = TRUE), na.rm = TRUE))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strengejacke it should be alright coz it's normalized here

# )[1])
m1 <- mean(var_x[var_y == unique(var_y)[2]])
m0 <- mean(var_x[var_y == unique(var_y)[1]])
sn <- stats::sd(var_x)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw @strengejacke do you know what purpose does this sn serve? it doesn't seem to be re-used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rewritten the code once or twice, maybe l just forgot to remove it...

@DominiqueMakowski DominiqueMakowski merged commit 99ef225 into master Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants