Sugar function 'trimws' with unit tests (closes #679) #680

Merged: eddelbuettel merged 6 commits into RcppCore:master from nathan-russell:feature/sugar-trimws on Apr 23, 2017

4 participants
@nathan-russell
Contributor

nathan-russell commented Apr 22, 2017

As described in #679, this adds a sugar version of the base R function trimws.

codecov-io Apr 22, 2017

Codecov Report

Merging #680 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #680   +/-   ##
=======================================
  Coverage   89.77%   89.77%           
=======================================
  Files          66       66           
  Lines        3511     3511           
=======================================
  Hits         3152     3152           
  Misses        359      359

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7492cff...c56e54b. Read the comment docs.

inst/include/Rcpp/sugar/functions/strings/trimws.h
return str;
}
inline const char* trim_right(const char* str) {

@kevinushey

kevinushey Apr 23, 2017

Contributor

It seems like these functions should also accept the string length -- this would avoid a call to ::strlen() when the length is already known, which should be the case for R's CHARSXPs.

@nathan-russell

nathan-russell Apr 23, 2017

Contributor

I wasn't thrilled about using strlen, but I assumed this was unavoidable because that is how the length is determined in the string_proxy class. I'm not terribly familiar with R's internal string cache, but if the lengths of CHARSXPs are known as you suggest, I'd be more than happy to find a way to incorporate that in place of strlen. Can you get back to me on this?

@kevinushey

kevinushey Apr 23, 2017

Contributor

I believe you can simply call Rf_length() on a CHARSXP to retrieve its length, as the length will be stored in the SEXP header.

@kevinushey

kevinushey Apr 23, 2017

Contributor

Ah, it looks like you've effectively done that now (with the call to LENGTH()). Great!
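
The point of this thread can be illustrated outside of Rcpp. The following is a minimal sketch, not the code merged in this PR: `is_ws` and `trim_right_n` are hypothetical names, and the whitespace set (space, tab, newline, carriage return) is assumed to match base R's `trimws`. Inside Rcpp, the `len` argument would come from `LENGTH()` on the CHARSXP as discussed above; here it is simply passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the characters treated as whitespace in this sketch.
inline bool is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Trim trailing whitespace when the length is already known.
// No strlen() scan is needed; only the trailing characters are inspected.
inline std::string trim_right_n(const char* str, std::size_t len) {
    assert(str != NULL);
    while (len > 0 && is_ws(str[len - 1])) {
        --len;
    }
    return std::string(str, len);
}

// Fallback for callers that do not know the length: this pays for one
// full strlen() pass before trimming.
inline std::string trim_right_n(const char* str) {
    assert(str != NULL);
    return trim_right_n(str, std::strlen(str));
}
```

The two-argument overload is the one a caller with a cached length would use; the one-argument overload shows the extra pass that the review comment wanted to avoid.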

inst/include/Rcpp/sugar/functions/strings/trimws.h
}
inline const char* trim_right(const char* str) {
static std::string buff;

@kevinushey

kevinushey Apr 23, 2017

Contributor

IIUC, the use of a static buffer here implies that we would effectively hold on to a (potentially large) string in memory each time this function is called (and that memory is never released). Is there any chance we could avoid this?

In theory, it implies someone calling this with a very large string would end up 'leaking' memory (or, I guess more aptly, letting a large bit of memory remain allocated until R exits)

@nathan-russell

nathan-russell Apr 23, 2017

Contributor

You make a good point here. I was primarily looking to avoid repeatedly allocating and deallocating memory for a new std::string on each function call, which I assumed would happen if I had used a buffer with automatic storage duration. However, I think I can just create a (non-static) variable in the enclosing function (trimws) and pass it by reference (or rather, as a pointer, per one of your previous reviews) to trim_right and trim_both so that the memory gets released when trimws returns.

@kevinushey

kevinushey Apr 23, 2017

Contributor

One possibility -- this function could instead return a pair of pointers (the new 'start' and 'end' of the string), and the calling function can decide how to use those pointers (e.g. duplicate into some new memory, or just use as-is if that were appropriate).
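
A minimal sketch of the pointer-pair idea, written in plain C++ rather than against the Rcpp headers. `is_ws` and `trim_both_range` are hypothetical names, and the half-open `[first, last)` convention is an assumption of this sketch, not something the PR specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: the characters treated as whitespace in this sketch.
inline bool is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Returns the half-open range [first, last) that delimits the trimmed
// content inside the original buffer. Nothing is copied or allocated;
// the caller decides whether to duplicate the range or use it in place.
inline std::pair<const char*, const char*>
trim_both_range(const char* str, std::size_t len) {
    assert(str != NULL);
    const char* first = str;
    const char* last = str + len;
    while (first < last && is_ws(*first)) {
        ++first;
    }
    while (last > first && is_ws(*(last - 1))) {
        --last;
    }
    return std::make_pair(first, last);
}
```

A caller that needs an owning copy can build one with `std::string(r.first, r.second)`; a caller that only needs to inspect the trimmed text can read the range directly, provided the original buffer outlives the pair.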

@kevinushey

kevinushey Apr 23, 2017

Contributor

The use of a std::string* passed as a pointer in the latest commits looks fine to me as well.
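
The buffer-ownership change discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the merged code: it uses `std::vector<std::string>` in place of Rcpp's string vector, and `is_ws` and `trim_right_all` are hypothetical names. The buffer lives in the enclosing function, so its capacity is reused across elements and its memory is released when that function returns, instead of staying allocated until R exits as a `static` buffer would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the characters treated as whitespace in this sketch.
inline bool is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Writes the right-trimmed copy into *buff and returns a pointer to its
// contents. The pointer is valid only until *buff is modified or destroyed.
inline const char* trim_right(const char* str, std::size_t len,
                              std::string* buff) {
    assert(str != NULL && buff != NULL);
    while (len > 0 && is_ws(str[len - 1])) {
        --len;
    }
    buff->assign(str, len);  // reuses existing capacity when it is large enough
    return buff->c_str();
}

// The caller owns the buffer: automatic storage, freed on return.
inline std::vector<std::string>
trim_right_all(const std::vector<std::string>& x) {
    std::string buff;
    std::vector<std::string> res;
    res.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Each result is copied out of buff before the next iteration
        // overwrites it.
        res.push_back(trim_right(x[i].c_str(), x[i].size(), &buff));
    }
    return res;
}
```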

inst/include/Rcpp/sugar/functions/strings/trimws.h
}
R_xlen_t i = 0, sz = x.size();
Vector<STRSXP> res(sz);

@kevinushey

kevinushey Apr 23, 2017

Contributor

You might consider using Vector<STRSXP> res = no_init(sz); to avoid an unneeded initial allocation.

@nathan-russell

nathan-russell Apr 23, 2017

Contributor

Good catch -- I'll fix this too.

inst/include/Rcpp/sugar/functions/strings/trimws.h
if (traits::is_na<STRSXP>(x[i])) {
res[i] = x[i];
} else {
res[i] = (*trim)(x[i]);

@kevinushey

kevinushey Apr 23, 2017

Contributor

It'd be nice if we could avoid calling through a function pointer here (just to avoid the extra indirection), although not strictly necessary.

@nathan-russell

nathan-russell Apr 23, 2017

Contributor

That's fair. I had done this just to keep the body more concise, but in hindsight I think a few extra lines of code are preferable to the extra cost of dereferencing (assuming these aren't inlined).
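
One way to trade "a few extra lines" for the removal of the function pointer is to branch once, outside the loop, and call each trimming function directly. The sketch below assumes that shape; it is not the merged implementation. It uses `std::vector<std::string>` instead of Rcpp types, and `trimws_sketch`, the single-character `which` codes, and the helper names are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the characters treated as whitespace in this sketch.
inline bool is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

inline std::string trim_left(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size() && is_ws(s[i])) ++i;
    return s.substr(i);
}

inline std::string trim_right(const std::string& s) {
    std::size_t n = s.size();
    while (n > 0 && is_ws(s[n - 1])) --n;
    return s.substr(0, n);
}

inline std::string trim_both(const std::string& s) {
    return trim_left(trim_right(s));
}

// `which` is 'l' (left), 'r' (right) or 'b' (both). The branch is taken
// once; every loop body is a direct call the compiler is free to inline.
inline std::vector<std::string>
trimws_sketch(const std::vector<std::string>& x, char which) {
    assert(which == 'l' || which == 'r' || which == 'b');
    const std::size_t n = x.size();
    std::vector<std::string> res(n);
    switch (which) {
    case 'l':
        for (std::size_t i = 0; i < n; ++i) res[i] = trim_left(x[i]);
        break;
    case 'r':
        for (std::size_t i = 0; i < n; ++i) res[i] = trim_right(x[i]);
        break;
    default:
        for (std::size_t i = 0; i < n; ++i) res[i] = trim_both(x[i]);
        break;
    }
    return res;
}
```

The cost is three near-identical loops; whether the saved indirection is measurable depends on the compiler and on how much work each trim does.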

@kevinushey

kevinushey Apr 23, 2017

Contributor

A couple mostly minor comments but LGTM!

@eddelbuettel

LGTM!

@kevinushey

kevinushey Apr 23, 2017

Contributor

LGTM as well!

@eddelbuettel eddelbuettel merged commit e81493b into RcppCore:master Apr 23, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@nathan-russell nathan-russell deleted the nathan-russell:feature/sugar-trimws branch Apr 23, 2017
