Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31573][R] Apply fixed=TRUE as appropriate to regex usage in R #28367

Closed
wants to merge 4 commits into from

Conversation

MichaelChirico
Copy link
Contributor

What changes were proposed in this pull request?

For regex functions in base R (gsub, grep, grepl, strsplit, gregexpr), supplying the fixed=TRUE option will be more performant.

Why are the changes needed?

This is a minor fix for performance

Does this PR introduce any user-facing change?

No (although some internal code was applying fixed-as-regex in some cases that could technically have been over-broad and caught unintended patterns)

How was this patch tested?

Not

@HyukjinKwon
Copy link
Member

ok to test

@@ -606,7 +606,7 @@ getClientModeSparkSubmitOpts <- function(submitOps, sparkEnvirMap) {
# process only if --option is not already specified
if (!is.null(opsValue) &&
nchar(opsValue) > 1 &&
!grepl(sparkConfToSubmitOps[[conf]], submitOps)) {
!grepl(sparkConfToSubmitOps[[conf]], submitOps, fixed = TRUE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could actually be a bug fix too for very, very unlikely corner cases such as spark.driver.memory <> sparkAdriverXmemory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly. there were a few other like this in utils.R e.g.

grep("org.apache.spark.sql.streaming.StreamingQueryException: "

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me

@HyukjinKwon HyukjinKwon changed the title [SPARK-31573][R] apply fixed=TRUE as appropriate to regex usage in R [SPARK-31573][R] Apply fixed=TRUE as appropriate to regex usage in R Apr 27, 2020
@SparkQA
Copy link

SparkQA commented Apr 27, 2020

Test build #121894 has finished for PR 28367 at commit 8676253.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

R/pkg/R/SQLContext.R Outdated Show resolved Hide resolved
@SparkQA
Copy link

SparkQA commented Apr 27, 2020

Test build #121912 has finished for PR 28367 at commit 14652d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon reopened this Apr 28, 2020
@HyukjinKwon
Copy link
Member

Let me rerun the AppVeyor build to make sure.

@MichaelChirico
Copy link
Contributor Author

same transient failure for linter here...

@HyukjinKwon
Copy link
Member

Let's see if AppVeyor tests pass.

@SparkQA
Copy link

SparkQA commented Apr 28, 2020

Test build #121955 has finished for PR 28367 at commit cb6d263.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon pushed a commit that referenced this pull request Apr 28, 2020
### What changes were proposed in this pull request?

For regex functions in base R (`gsub`, `grep`, `grepl`, `strsplit`, `gregexpr`), supplying the `fixed=TRUE` option will be more performant.

### Why are the changes needed?

This is a minor fix for performance

### Does this PR introduce any user-facing change?

No (although some internal code was applying fixed-as-regex in some cases that could technically have been over-broad and caught unintended patterns)

### How was this patch tested?

Not

Closes #28367 from MichaelChirico/r-regex-fixed.

Authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit c011502)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants