Add quantilerank and percentilerank functions #741
I have been looking up bibliographic references for […]. I did a brief reading of the quantile reference article, found it a bit advanced for me, and couldn't see a direct relationship between the […]. As it has been difficult to find references in the documentation of other languages, it is up to you whether to keep the functions built with the 3 methods […]. Also, if you find that it's not worth implementing the function, please know that I have no problem closing this PR.
These things are really tricky, but looking at the Wikipedia table we can see that the R-7 definition, which is used by Excel's […]
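To make the R-7 definition concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the StatsBase code). It assumes an already sorted list of distinct values. `quantile_r7` follows the R-7 rule of linear interpolation at the fractional index `(n - 1) * p`, and `rank_r7` is a hypothetical helper that inverts it, which is the relationship between quantiles and ranks discussed here.

```python
import math


def quantile_r7(sorted_values, p):
    """R-7 quantile: linear interpolation at zero-based index (n - 1) * p."""
    n = len(sorted_values)
    h = (n - 1) * p                # fractional zero-based position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)        # clamp so that p == 1 stays in range
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def rank_r7(sorted_values, value):
    """Inverse of quantile_r7 (hypothetical helper; assumes distinct sorted values)."""
    n = len(sorted_values)
    for i in range(n - 1):
        lo, hi = sorted_values[i], sorted_values[i + 1]
        if lo <= value <= hi:
            # position of `value` between neighbours i and i + 1, scaled to [0, 1]
            return (i + (value - lo) / (hi - lo)) / (n - 1)
    raise ValueError("value lies outside the data range")
```

With `data = [10, 20, 30, 40]`, `quantile_r7(data, 0.25)` gives `17.5`, and `rank_r7(data, 17.5)` maps back to `0.25`.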
I checked the relationship of […]. So I proceeded with the implementation of the different methods using the […]
Thanks! Let's get the […]
The question arose whether or not the array needs to be sorted, and looking at the LibreOffice implementation, I discovered that it is necessary. I saw in the quantile code that there is a specific sort used there. So, since we need to sort the vector in some methods, could you please help us find the best recommended sort for the task? I don't have a CS background, so I'm not the best person to answer that problem. I made some benchmark tests and saw that if I use […]. I leave this sorting question up to you; whatever you suggest, we'll do.
Sorting is not necessary. What you need to do is:
(I have written this without looking at @nalimilan's code so as not to be influenced by it, but it seems to be the same.) Then you should be able to compute all required variants using […]. You have not provided the definitions you use in the docstring (you gave only references and I do not have enough time currently to digest them, sorry for this), but there should be a natural mapping. E.g. for […]
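The idea that sorting is not necessary can be sketched as follows. This is a language-neutral illustration in Python, not the loop posted in this thread: one pass over any iterable collects a few counts, and simple rank variants follow from them. The function names and the method labels (`"strict"`, `"weak"`, `"tied"`) are assumptions made for this sketch and may not match the methods in the PR.

```python
def rank_counts(itr, value):
    """Single pass over any iterable: no sorting, no indexing."""
    count_less = count_equal = n = 0
    for x in itr:
        if x < value:
            count_less += 1
        elif x == value:
            count_equal += 1
        n += 1
    return count_less, count_equal, n


def quantile_rank(itr, value, method="weak"):
    """Rank of `value` in `itr` as a fraction in [0, 1] (hypothetical method names)."""
    count_less, count_equal, n = rank_counts(itr, value)
    if n == 0:
        raise ValueError("itr must not be empty")
    if method == "strict":   # share of elements strictly below value
        return count_less / n
    if method == "weak":     # share of elements below or equal to value
        return (count_less + count_equal) / n
    if method == "tied":     # ties count for half
        return (count_less + count_equal / 2) / n
    raise ValueError(f"unknown method: {method}")
```

Because the loop only iterates, a generator works as well as a list: `quantile_rank((x for x in [1, 2, 2, 3, 4]), 2, "weak")` evaluates to `0.6`.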
@bkamins, I used the for-loop you wrote; it worked perfectly, and I also adapted all the other methods using the 4 variables generated in the for-loop. All tests passed after the changes made in commit dca02ea. Thanks. I understand that an important point that remains to be resolved is to find a way to incorporate the use of the […]
I have written my implementation to work on any iterable. It does not rely on the fact that what you pass is a vector. So it can be just dropped. The only consideration is (but we could potentially add it later, cc @nalimilan) if we want […]
Yes. To ensure that all the methods are right, I reproduced in the tests the same examples given in the docs of each method.
Let us wait for @nalimilan's approval before merging.
2e2ba27 to 3e30dd1
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
Thanks for bearing with us @AugustoCL! The road was long but this is a very nice addition to StatsBase! I believe we now have one of the most complete and best documented implementations around.
I've made a few direct edits to the docstrings, let me know whether that's OK.
I like those edits to the docstrings. They look very good to me.
You're welcome. I also must thank you for guiding me on this path. I did the best I could for my level and learned a lot on this journey.
I wondered if these functions were relevant enough to be accepted in a base package like StatsBase.jl. So knowing that makes me proud of this work. Thanks for everything.
Just out of curiosity, is there an estimated date for the merge?
For now we are waiting for a few days to hear if other people have comments on the PR.
Yes. So I'll keep an eye out until this Friday.
First ping 🏓
Let's merge, it sounds unlikely that somebody new is going to comment at this point!
This PR continues the discussion of the merged/closed PR #733.
So, it has the code updated, having:
[…] (quantilerank and percentilerank) as combined in the previous discussion.
[…] :rank according to this book reference.