-
-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sugar function 'trimws' with unit tests (closes #679) #680
Sugar function 'trimws' with unit tests (closes #679) #680
Conversation
Codecov Report
@@ Coverage Diff @@
## master #680 +/- ##
=======================================
Coverage 89.77% 89.77%
=======================================
Files 66 66
Lines 3511 3511
=======================================
Hits 3152 3152
Misses 359 359 Continue to review full report at Codecov.
|
return str; | ||
} | ||
|
||
inline const char* trim_right(const char* str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like these functions should also accept the string length as well -- this would help avoid a call to ::strlen()
when the length is already known, and it should already be known for R's CHARSXP
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't thrilled about using strlen
, but I assumed this was unavoidable because that is how the length is determined in the string_proxy
class. I'm not terribly familiar with R's internal string cache, but if the lengths of CHARSXP
s are known as you suggest, I'd be more than happy to find a way to incorporate that in place of strlen
. Can you get back to me on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can simply call Rf_length()
on a CHARSXP
to retrieve its length, as the length will be stored in the SEXP
header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it looks like you've effectively done that now (with the call to LENGTH()
). Great!
} | ||
|
||
inline const char* trim_right(const char* str) { | ||
static std::string buff; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the use of a static buffer here implies that we would effectively hold on to a (potentially large) string in memory each time this function is called (and that memory is never released). Is there any chance we could avoid this?
In theory, it implies someone calling this with a very large string would end up 'leaking' memory (or, I guess more aptly, letting a large bit of memory remain allocated until R exits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You make a good point here. I was primarily looking to avoid repeatedly allocating and deallocating memory for a new std::string
on each function call, which I assumed would happen if had used a buffer with automatic storage duration. However, I think I can just create a (non-static
) variable in the enclosing function (trimws
) and pass it by reference (or rather, as a pointer, per one of your previous reviews) to trim_right
and trim_both
so that the memory gets released when trimws
returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possibility -- this function could instead return a pair of pointers (the new 'start' and 'end' of the string), and the calling function can decide how to use those pointers (e.g. duplicate into some new memory, or just use as-is if were appropriate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of a std::string*
passed as a pointer in the latest commits looks fine to me as well.
} | ||
|
||
R_xlen_t i = 0, sz = x.size(); | ||
Vector<STRSXP> res(sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might consider using Vector<STRSXP> res = no_init(sz);
to avoid an unneeded initial allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch -- I'll fix this too.
if (traits::is_na<STRSXP>(x[i])) { | ||
res[i] = x[i]; | ||
} else { | ||
res[i] = (*trim)(x[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we could avoid calling through a function pointer here (just to avoid the extra indirection), although not strictly necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. I had done this just to keep the body more concise, but in hindsight I think a few extra lines of code are preferable to the extra cost of dereferencing (assuming these aren't inlined).
A couple mostly minor comments but LGTM! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
LGTM as well! |
As described in #679, this adds a sugar version of the base R function
trimws
.