Fix performance regression in split by avoiding allocating substring per char by JoshRosen · Pull Request #237 · databricks/sjsonnet

JoshRosen · 2024-12-12T09:11:28Z

This PR fixes a performance regression from #227 / 4c85bde that I overlooked during review:

When generalizing the optimized non-`Pattern`-based split code, that commit introduced a `.substring()` call at every character position, producing a large amount of garbage.
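To make the regression concrete, here is a minimal Scala sketch of the allocation-heavy pattern. It is hypothetical: the object name `SubstringProbeSplit` and its signature are illustrative assumptions, not the actual sjsonnet code from #227. At every candidate position it copies `sep.length` characters into a new `String` just to compare them with the separator. An input of length n therefore produces up to n - sep.length + 1 throwaway strings, even when the separator never occurs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of the allocating probe; NOT the real sjsonnet code.
object SubstringProbeSplit {
  def split(str: String, sep: String): List[String] = {
    require(sep.nonEmpty, "separator must be non-empty")
    val parts = ArrayBuffer.empty[String]
    var start = 0 // beginning of the current part
    var i = 0     // current probe position
    while (i + sep.length <= str.length) {
      // Allocates a fresh String of length sep.length on every iteration,
      // only to discard it immediately after the comparison.
      if (str.substring(i, i + sep.length) == sep) {
        parts += str.substring(start, i)
        i += sep.length
        start = i
      } else {
        i += 1
      }
    }
    parts += str.substring(start) // trailing part (possibly empty)
    parts.toList
  }
}
```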

Instead, I think we can call `.startsWith(splitPattern, i)`, which checks for a match at offset `i` without building a new string. This should be much faster because it avoids creating unnecessary garbage strings (plus I'm fairly sure that `startsWith` is well optimized in modern JDKs).
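A minimal sketch of the `startsWith`-based probe, under stated assumptions: the `StartsWithSplit` object and its signature are hypothetical, the separator is assumed non-empty, and an empty input yields a single empty part. `java.lang.String.startsWith(prefix, toffset)` compares characters in place, so the only strings allocated are the parts actually returned.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of a split that probes with startsWith instead of substring.
object StartsWithSplit {
  def split(str: String, sep: String): List[String] = {
    require(sep.nonEmpty, "separator must be non-empty")
    val parts = ArrayBuffer.empty[String]
    var start = 0 // beginning of the current part
    var i = 0     // current probe position
    while (i + sep.length <= str.length) {
      // Compares sep against str starting at offset i; no temporary String.
      if (str.startsWith(sep, i)) {
        parts += str.substring(start, i) // allocate only real output parts
        i += sep.length
        start = i
      } else {
        i += 1
      }
    }
    parts += str.substring(start) // trailing part (possibly empty)
    parts.toList
  }
}
```

The loop structure is otherwise unchanged, so the output is identical to the substring-probing version; only the per-position allocation disappears.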

I also removed the use of `breakable` and moved the early-exit check into the `while` loop condition instead.
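`scala.util.control.Breaks.breakable` implements `break` by throwing a control-flow exception, so folding the exit test into the loop condition keeps the loop plain. Below is a hypothetical `splitLimit` sketch in that style. The object name, signature, and the convention that a negative `maxSplits` means "unlimited" are assumptions here, loosely modeled on Jsonnet's `std.splitLimit`; this is not the merged sjsonnet code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: the split limit is enforced by the while condition,
// so no breakable/break is needed to stop early.
object SplitLimitSketch {
  // maxSplits < 0 is treated as "no limit" (an assumption for this sketch).
  def splitLimit(str: String, sep: String, maxSplits: Int): List[String] = {
    require(sep.nonEmpty, "separator must be non-empty")
    val parts = ArrayBuffer.empty[String]
    var start = 0  // beginning of the current part
    var i = 0      // current probe position
    var splits = 0 // number of separators consumed so far
    while (i + sep.length <= str.length && (maxSplits < 0 || splits < maxSplits)) {
      if (str.startsWith(sep, i)) {
        parts += str.substring(start, i)
        splits += 1
        i += sep.length
        start = i
      } else {
        i += 1
      }
    }
    parts += str.substring(start) // remainder, including any unsplit tail
    parts.toList
  }
}
```

With `maxSplits = 1`, the loop exits right after the first match and the remainder `"b,c"` of `"a,b,c"` is kept as the final part.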

JoshRosen added 3 commits December 12, 2024 01:03

fix split perf issue

1eee196

ws

082440a

cleanup import

880552e

JoshRosen changed the title ~~Fix performance regression in splitLimit by avoiding allocating substring per char~~ Fix performance regression in split by avoiding allocating substring per char Dec 12, 2024

stephenamar-db approved these changes Dec 12, 2024


stephenamar-db merged commit 680b1a8 into databricks:master Dec 12, 2024

JoshRosen deleted the fix-split-perf-regression branch December 31, 2024 22:40

stephenamar-db merged 3 commits into databricks:master from JoshRosen:fix-split-perf-regression
