Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sugar function 'trimws' with unit tests (closes #679) #680

Merged
merged 6 commits into from
Apr 23, 2017

Conversation

nathan-russell
Copy link
Contributor

As described in #679, this adds a sugar version of the base R function trimws.

@codecov-io
Copy link

codecov-io commented Apr 22, 2017

Codecov Report

Merging #680 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #680   +/-   ##
=======================================
  Coverage   89.77%   89.77%           
=======================================
  Files          66       66           
  Lines        3511     3511           
=======================================
  Hits         3152     3152           
  Misses        359      359

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7492cff...c56e54b. Read the comment docs.

return str;
}

inline const char* trim_right(const char* str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these functions should also accept the string length as well -- this would help avoid a call to ::strlen() when the length is already known, and it should already be known for R's CHARSXPs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't thrilled about using strlen, but I assumed this was unavoidable because that is how the length is determined in the string_proxy class. I'm not terribly familiar with R's internal string cache, but if the lengths of CHARSXPs are known as you suggest, I'd be more than happy to find a way to incorporate that in place of strlen. Can you get back to me on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can simply call Rf_length() on a CHARSXP to retrieve its length, as the length will be stored in the SEXP header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it looks like you've effectively done that now (with the call to LENGTH()). Great!

}

inline const char* trim_right(const char* str) {
static std::string buff;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the use of a static buffer here implies that we would effectively hold on to a (potentially large) string in memory each time this function is called (and that memory is never released). Is there any chance we could avoid this?

In theory, it implies someone calling this with a very large string would end up 'leaking' memory (or, I guess more aptly, letting a large bit of memory remain allocated until R exits)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make a good point here. I was primarily looking to avoid repeatedly allocating and deallocating memory for a new std::string on each function call, which I assumed would happen if had used a buffer with automatic storage duration. However, I think I can just create a (non-static) variable in the enclosing function (trimws) and pass it by reference (or rather, as a pointer, per one of your previous reviews) to trim_right and trim_both so that the memory gets released when trimws returns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibility -- this function could instead return a pair of pointers (the new 'start' and 'end' of the string), and the calling function can decide how to use those pointers (e.g. duplicate into some new memory, or just use as-is if were appropriate).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of a std::string* passed as a pointer in the latest commits looks fine to me as well.

}

R_xlen_t i = 0, sz = x.size();
Vector<STRSXP> res(sz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider using Vector<STRSXP> res = no_init(sz); to avoid an unneeded initial allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- I'll fix this too.

if (traits::is_na<STRSXP>(x[i])) {
res[i] = x[i];
} else {
res[i] = (*trim)(x[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if we could avoid calling through a function pointer here (just to avoid the extra indirection), although not strictly necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I had done this just to keep the body more concise, but in hindsight I think a few extra lines of code are preferable to the extra cost of dereferencing (assuming these aren't inlined).

@kevinushey
Copy link
Contributor

A couple mostly minor comments but LGTM!

Copy link
Member

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kevinushey
Copy link
Contributor

LGTM as well!

@eddelbuettel eddelbuettel merged commit e81493b into RcppCore:master Apr 23, 2017
@nathan-russell nathan-russell deleted the feature/sugar-trimws branch April 23, 2017 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants