Speed up `parseDateTime[InJodaSyntax]Or(Null|Zero)` on invalid inputs #62634
Thanks, I'll check, but please expect some delay: for the next 1.5 weeks I am working from a remote location with only occasional access to my laptop.
@liuneng1994 Thanks a lot. I fixed a few things:
The only way to deal with that is to reduce the dataset size... I also did that (1 million rows --> 100k rows).
4e881be
This is a re-implementation of #62538 based on #62538 (comment).

`parseDateTime` previously always threw an exception when there was a parse error. This error was handled in the top-level loop and rethrown (for standard `parseDateTime`) or suppressed (`*OrNull`/`*OrZero` variants). With this PR, we no longer throw internal exceptions; instead, faults are communicated back as error codes.

In my local measurements, the runtime to parse 100k invalid date/time strings (`max_threads = 1`) goes down from 2.1 sec to 0.0015 sec.

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Functions `parseDateTimeOrNull`, `parseDateTimeOrZero`, `parseDateTimeInJodaSyntaxOrNull` and `parseDateTimeInJodaSyntaxOrZero` now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values.

Documentation entry for user-facing changes
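
For readers unfamiliar with the pattern, the change from internal exceptions to error codes described in the summary can be illustrated with a small standalone sketch. This is not the code from the PR: `ParseError`, `parseYearThrowing`, `parseYear` and `parseAllOrNull` are hypothetical names, and parsing a plain integer year stands in for the real date/time parsing. It only shows why an `*OrNull`-style loop avoids the cost of throwing and catching once per invalid row.

```cpp
#include <charconv>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical error codes; the names are illustrative, not ClickHouse's.
enum class ParseError { None, BadFormat };

// Old style: every parse failure is signalled by throwing.
// A caller that wants NULL on failure must catch once per bad row.
int parseYearThrowing(std::string_view s)
{
    int year = 0;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), year);
    if (ec != std::errc() || ptr != s.data() + s.size())
        throw std::invalid_argument("cannot parse year");
    return year;
}

// New style: the failure is reported as a return value, no exception involved.
ParseError parseYear(std::string_view s, int & year)
{
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), year);
    if (ec != std::errc() || ptr != s.data() + s.size())
        return ParseError::BadFormat;
    return ParseError::None;
}

// An *OrNull-like top-level loop: invalid rows become empty optionals.
std::vector<std::optional<int>> parseAllOrNull(const std::vector<std::string_view> & rows)
{
    std::vector<std::optional<int>> result;
    result.reserve(rows.size());
    for (std::string_view row : rows)
    {
        int year = 0;
        if (parseYear(row, year) == ParseError::None)
            result.emplace_back(year);
        else
            result.emplace_back(std::nullopt);
    }
    return result;
}
```

In this sketch, a throwing variant of the top-level function would call `parseYear` as well and raise a single exception only when it sees a non-`None` code, so the suppressing variants never pay for stack unwinding.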