-
-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sugar median function #424
Comments
I like I do recall from those discussions, and from having looked at the stats::median code that there was quite a bit of detail to cover. Or am I now confusing median with running median? In any event, it would be nice to have something in sugar, and it would be equally nice it was rigorously tested against what base R has. As we have good reason to believe that is once again a reference implementation. Oh, and what you have above for logicals may be bug-report worthy. |
Thanks for the feedback Dirk. And I will look into submitting something on the R Bugs tracker. I can't imagine that behavior is intended... |
I think static (ie frozen data) |
That sounds good - but I was more referring to the two ordinary median implementations I linked to here. Would you rather go with the version that has the option of dropping template <int RTYPE, bool NA, typename T>
inline typename sugar::median_detail::result<RTYPE>::type
median(const Rcpp::VectorBase<RTYPE, NA, T>& x, bool na_rm = false) {
switch (static_cast<int>(na_rm)) {
case 0:
return sugar::Median<RTYPE, NA, T, false>(x);
case 1:
return sugar::Median<RTYPE, NA, T, true>(x);
default:
return Rcpp::traits::get_na<RTYPE>();
}
} or the cleaner version which just accepts a single vector argument: template <int RTYPE, bool NA, typename T>
inline sugar::Median<RTYPE, NA, T>
median(const Rcpp::VectorBase<RTYPE, NA, T>& x) {
return sugar::Median<RTYPE, NA, T>(x);
} |
I know, sorry. I don't find the first one that distasteful. And coping with Either one is fine. |
Okay that works for me. |
This can be closed now. Thanks again for the PR! |
In light of a recent Stack Overflow question, I put together a sugar function corresponding to
stats::median
- more specifically,median.default
. At the moment, I have two implementations worked out: one which has an extra boolean argument for specifying whether or not to remove missing values (likemedian.default
'sna.rm
option), and one which does not (it just returnsNA
for input containing one or more missing values). I've posted what are more or less final drafts of these source files here.With the na.rm option
The main benefit of this version is of course having the option to remove missing values - thus more closely conforming to
median.default
. As a result, however, the interface function is a little ugly:The
na.rm
is encoded as a template parameter, and after playing around with some TMP for a while and failing to find a more elegant solution, I settled on theswitch
statement logic above. I'm actually not sure that this is even avoidable, but am certainly open to suggestions if there are better alternatives.Without the na.rm option
While this version obviously lacks some of the above functionality, it is more consistent with similar sugar functions such as
mean
andmax
which do not provide anna.rm
option. Accordingly, the interface is more sugar-esque:General remarks
In the testing I've done so far, both versions seem to mirror
median.default
pretty closely. These are some slight differences that I've come across:Date
andPOSIXt
objects; this is also true of theRcpp::mean
, however.NA
and issues a warning, whereas my versions simply returnNA
. I'm not sure ifmedian.default
was actually intended to work with character vectors; I think it may be the case that odd-length character vectors just happen to work with the other generic functions used inside ofmedian.default
, because the warning that is issued comes frommean.default
, notmedian.default
. I still question whether there are legitimate use cases for callingmedian
on a character vector, but regardless, I can add a warning message is desired.median.default
has a very, very strange way of handling logical vectors, sometimes returning a logical, sometimes an integer, and sometimes a floating point number. I don't think it's possible to replicate this from C++, nor do I think it's desirable:This may be the quirkiest aspect of R I've come across to date.
The only other thing I want to address is the handling of (odd-length) character vectors, which is actually what caused me to bump into the issue with
std::sort
in #419. TheINTSXP
,REALSXP
, andCPLXSXP
versions ofmedian
utilizestd::nth_element
andstd::max_element
for a more efficient calculation. Sincenth_element
won't work with thestring_proxy
class, I just fell back to theVector::sort
member function for a less efficient, but safer, approach to getting the median value. But again, I can't imagine there are many warranted uses ofmedian
for character vectors, so hopefully no one will be too put off by the less efficient full sort used in theSTRSXP
specializations.Any thoughts or concerns? (Apologies for the essay.)
The text was updated successfully, but these errors were encountered: