[SPARK-48114][CORE] Precompile template regex to avoid unnecessary work #46365

@vladimirg-db vladimirg-db commented May 3, 2024

What changes were proposed in this pull request?

The error message template regex is now precompiled to avoid unnecessary work.

Why are the changes needed?

SparkRuntimeException uses SparkThrowableHelper, which uses ErrorClassesJsonReader to build the error message string from the templates in error-conditions.json. The template regex, however, is compiled on every SparkRuntimeException constructor invocation. This slows down error construction, particularly in UnivocityParser + FailureSafeParser, where it is a hot path.
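
To illustrate the idea, here is a minimal sketch rather than the actual patch. The object name `TemplateSketch`, its methods, and the `<name>` placeholder pattern are all assumptions made for the example; they are not taken from `ErrorClassesJsonReader`. It contrasts compiling the pattern on each call with compiling it once and reusing it.

```scala
import scala.util.matching.Regex

// Hypothetical names throughout; this is not Spark's ErrorClassesJsonReader.
object TemplateSketch {
  // Compiled once, when the object is initialised, and shared by every call.
  // The pattern is an assumption: placeholders are written as <name>.
  private val TemplateRegex: Regex = "<([a-zA-Z0-9_-]+)>".r

  // Before: `.r` compiles a new java.util.regex.Pattern on every invocation.
  def placeholdersRecompiled(template: String): List[String] =
    "<([a-zA-Z0-9_-]+)>".r.findAllMatchIn(template).map(_.group(1)).toList

  // After: the precompiled Regex is reused, so only matching work remains.
  def placeholders(template: String): List[String] =
    TemplateRegex.findAllMatchIn(template).map(_.group(1)).toList

  // Substitute known parameters; unknown placeholders are left untouched.
  // quoteReplacement keeps '$' and '\' in parameter values literal.
  def render(template: String, params: Map[String, String]): String =
    TemplateRegex.replaceAllIn(template, m =>
      Regex.quoteReplacement(params.getOrElse(m.group(1), m.matched)))
}
```

Both variants return the same result; the difference is only where the compilation cost is paid. A compiled `Regex` is immutable and safe to share between threads, which is why holding it in a `val` on an object is a reasonable choice for a hot path.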

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • testOnly org.apache.spark.sql.errors.QueryExecutionErrorsSuite
  • Manually checked a CSV parsing error

Was this patch authored or co-authored using generative AI tooling?

No

@vladimirg-db vladimirg-db marked this pull request as ready for review May 3, 2024 12:47

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @vladimirg-db .
Merged to master for Apache Spark 4.0.0-preview.

@vladimirg-db vladimirg-db deleted the vladimirg-db/precompile-regexes-in-error-classes-json-reader branch May 4, 2024 09:34
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
Closes apache#46365 from vladimirg-db/vladimirg-db/precompile-regexes-in-error-classes-json-reader.

Authored-by: Vladimir Golubev <vladimir.golubev@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>