Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p_direction: return as 0.5 - 1 float or keep as 50 - 100 percentage? #168

Closed
strengejacke opened this issue Jun 6, 2019 · 21 comments
Closed

Comments

@strengejacke
Copy link
Member

@strengejacke strengejacke commented Jun 6, 2019

Not sure, probably it's too late, but I just realized that pd() returns porportions multiplied by 100, instead, say, .65 for 65%.

Accordingly, convert_pd_to_p() requires a number between 1 and 100, where I would expect a number between 0 and 1 (the convention for the probability, I think we had this discussion with ci_level as well...).

Any thoughts? Should we keep the current behaviour, or try breaking changes to the functions?

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 6, 2019

I remember we already had this discussion about pd at the very beginning 😅 , and I think it makes sense here to break this particular convention and force its report as a percentage. Two main reasons being 1) it's less likely to be confused with the other indices (for which the lower the better) and 2) it adds a level of precision (as people usually report two decimals) which is for pd is particularly relevant.

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 6, 2019

and also it's indeed probably too late :) However, we might maybe emphasize the fact that although it is called probability of direction it is actually usually reported as a percentage. But again the two are the same in the end, and since the print method returns a percentage sign I think it's okay to keep things as they are.

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 6, 2019

and since the print method returns a percentage sign I think it's okay to keep things as they are.

I think, the print-method should do all the job of "transformation", while the raw data is a value between 0 and 1. This will most likely be a confusion if users further process the return-results of pd(), or want pipe input into convert_pd_to_p(), when these are the only two functions in the world that use "percentage" values instead of their numerical equivalent (proportion) - that is my concern.

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 6, 2019

but convert_pd_to_p() works currently appropriately (i.e., it takes in percentages)?

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 6, 2019

I think I would prefer to rename the pd as being the percentage of direction than changing the return

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 6, 2019

Yes, maybe it's a confusion due to the wording. First, "probability" suggests value between 0 and 1; and second, I can't recall if there are many functions that really take a "percentage" value as input, so typically I would expect and am used to values between 0 to 1 (ratios), rather than "percentages". Even arguments that are labelled as "percentage values" take value between 0 and 1 (see level in confint(), or conf.level in t.test()).

I think these points are responsible for my initial confusion here.

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 6, 2019

Even arguments that are labelled as "percentage values" take value between 0 and 1 (see level in confint(), or conf.level in t.test()).

This was also why originally ci's level took a 0-100 value 😁😅

I understand the issue, but I think that since the probability/percentage of direction is anyway a new thing (to my knowledge it has not yet been formalized as such), people that will use it will know its output, and it's very unlikely that they would have any strong priors as to its return. So I think it's safe...

In general, I am not sure the print methods should do any transformation, as a discrepancy between what is displayed and what is returned is definitely a way to confuse people (I was really annoyed for instance by the fact that BayesFactor displays regular BFs but returns log(BFs), but that's maybe just me :)

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 6, 2019

Ok, I think we can close this issue then, maybe we just need some small refining of the docs, but... (see below, last paragraph)

In general, I am not sure the print methods should do any transformation

I think this is one of the main benefits of having print-methods, as long as the conversion is correct (i.e. a proportion of .5 in the data is printed as 50%) I think there's a difference between what you process in further analyses / scripts / whatever (the .5), and the requirements to a "human-readable" output (50%). Mathematically, all computations would proceed with .5, I think. :-)

To be nitpicking, the returned data values (in the data frame or as value) are 80 or 100, and not "80%" or "100%", so formally, the return results from p_direction() are wrong! Except you convert to character and add a percentage-sign to the return values, but then it's difficult to do further calculations. So, currently, the functions do "wrong computations" or have "wrong" labelling / values.

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 6, 2019

The more I think about it, the more I tend to revise the code and use "ratios" as values, not integers. We're currently still at a point-of-return, but I guess this chance will be over soon. But I'm open to be convinced otherwise, especially if there are good arguments and/or a majority voting to preserve the current state. :-)

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 6, 2019

it's not integerers it's percentage haha, you're too nitpick! anyway I am traveling today so ill be out for 24h, please refrain from changing until then, I'll think about this while being high (literally high) :)

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 6, 2019

I won't revise any code, this is something with consequences that is in the responsibility of the maintainer.

@DominiqueMakowski DominiqueMakowski mentioned this issue Jun 18, 2019
2 of 3 tasks complete
@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 19, 2019

Moreover, there is no risk of confusing the output of the percentage since the pd output is never < 1 (as it is between 50 and 100).

However, I am now wondering about - for consistency and readability - outputting the ROPE Percentage as a percentage...

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 22, 2019

Today I feel a bit less categorical about this (did you cast some dark magic rituals while I was asleep 😮?)

@DominiqueMakowski DominiqueMakowski changed the title proportion vs integer p_direction: return as 0.5 - 1 float or keep as 50 - 100 percentage? Jun 22, 2019
@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 22, 2019

@pdwaggoner @mattansb @humanfactors @IndrajeetPatil
What are your opinions?

@mattansb
Copy link
Member

@mattansb mattansb commented Jun 22, 2019

Proportion makes more sense to me? But I don't feel strongly one way or the other 😅

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 22, 2019

@humanfactors
Copy link
Member

@humanfactors humanfactors commented Jun 22, 2019

I feel relatively strongly (pd 99% ) that the underlying stored value should be a decimal ratio (e.g., 0.8), but I agree the print message should show as a percentage.

One really really convincing reason to do this is that an individual may want to multiple the pd by another object in a calculation. Currently, the output is wrong: 85% * 200 != 17000. This is a different issue to the Bayes Factor returning log(bf), because the difference is self-explanatory (transparent) as the base type changes. Furthermore, looking at scales::percent, the idea that we would store the underlying percentage as the wrong value would be greatly inconsistent with the rest of the R ecosystem.

This is just incorrect, and we don't want to create functions which create 'invisible' errors like this.

> as.numeric(bayestestR::pd(rnorm(1100,-1, 1)) * 10) 
[1] 847.2727

So I think float makes most sense.

@DominiqueMakowski
Copy link
Member

@DominiqueMakowski DominiqueMakowski commented Jun 23, 2019

Currently, the output is wrong: 85% * 200 != 17000

This is a strong argument indeed.

We can change it for v3. @strengejacke you can sleep easy 😅 I think the main thing it will break is report, but it is currently broken in general

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 23, 2019

Not sure, the plot-method in see might be affected, but we can update the packages short after each other, so there won't be any problems for too long.

@strengejacke
Copy link
Member Author

@strengejacke strengejacke commented Jun 23, 2019

v3

I hope you meant 0.3.0 :-P But... you re-opened this pandora's box. :-)

@pdwaggoner
Copy link
Member

@pdwaggoner pdwaggoner commented Jun 23, 2019

Agreed - proportions are typically how I think about this stuff, but like @mattansb , I don't have a strong opinion here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants
You can’t perform that action at this time.