[Enhancement] Optimize row-string converter #68

Merged: 1 commit merged into StarRocks:main on Jul 28, 2023


scutzou (Contributor) commented Jul 27, 2023

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues does this PR fix:

Fixes #

Problem Summary (Required):

When inserting into an RS table, each InternalRow has to be converted to a JSON or CSV string. Previously this was done by AbstractRowStringConverter.fromRow, which still calls the convert method for every row even though the schema is fixed and known in advance.
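The general idea of doing schema-dependent work once instead of once per row can be sketched as follows. This is a minimal, hypothetical illustration, not the connector's actual code: `RowStringConverterSketch`, `FieldType`, `convertPerRow`, `buildConverters` and `convertWithConverters` are invented names, plain `Object[]` rows stand in for Spark's `InternalRow`, and nulls, CSV quoting and JSON output are ignored. `convertPerRow` re-evaluates the type switch for every field of every row, while `buildConverters` resolves one converter per column up front and `convertWithConverters` only applies them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: simplified stand-ins for Spark's schema types and InternalRow.
class RowStringConverterSketch {

    enum FieldType { INT, STRING, DOUBLE }

    // Per-row path: the type switch runs again for every field of every row.
    static String convertPerRow(FieldType[] schema, Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < schema.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(convertValue(schema[i], row[i]));
        }
        return sb.toString();
    }

    static String convertValue(FieldType type, Object value) {
        switch (type) {
            case INT: return Integer.toString((Integer) value);
            case DOUBLE: return Double.toString((Double) value);
            case STRING: return (String) value;
            default: throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    // Precomputed path: resolve one converter per column, once, from the fixed schema.
    static List<Function<Object, String>> buildConverters(FieldType[] schema) {
        List<Function<Object, String>> converters = new ArrayList<>(schema.length);
        for (FieldType type : schema) {
            switch (type) {
                case INT: converters.add(v -> Integer.toString((Integer) v)); break;
                case DOUBLE: converters.add(v -> Double.toString((Double) v)); break;
                case STRING: converters.add(v -> (String) v); break;
                default: throw new IllegalArgumentException("unsupported type: " + type);
            }
        }
        return converters;
    }

    // Per row, only the already-resolved converters are applied; no type dispatch.
    static String convertWithConverters(List<Function<Object, String>> converters, Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < converters.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(converters.get(i).apply(row[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FieldType[] schema = {FieldType.INT, FieldType.STRING, FieldType.DOUBLE};
        List<Function<Object, String>> converters = buildConverters(schema); // built once
        Object[][] rows = {{1, "a", 1.5}, {2, "b", 2.5}};
        for (Object[] row : rows) {
            System.out.println(convertWithConverters(converters, row));
        }
    }
}
```

Running `main` prints `1,a,1.5` and `2,b,2.5`. Both paths produce the same strings; they differ only in where the type dispatch happens, which is the kind of per-row overhead the problem summary describes.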

This PR was tested on real data: 640 MB of ORC data with roughly 3 columns and 100 million rows. With one partition, the optimized stage completed in 3 minutes, compared with 9.1 minutes before. Below are flame graphs of a 120-second sample of the test query's execution using Spark 3.3 in local mode.
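As a rough back-of-the-envelope check on the figures above, assuming the stage time is spread evenly over all 100 million rows (which ignores I/O and fixed overheads), the reported timings work out to:

```latex
\begin{aligned}
\text{speedup} &= \frac{9.1\ \text{min}}{3\ \text{min}} = \frac{546\ \text{s}}{180\ \text{s}} \approx 3.0\times \\
t_{\text{row,before}} &= \frac{546\ \text{s}}{10^{8}\ \text{rows}} = 5.46\ \mu\text{s per row} \\
t_{\text{row,after}} &= \frac{180\ \text{s}}{10^{8}\ \text{rows}} = 1.8\ \mu\text{s per row}
\end{aligned}
```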

Before this PR:
[Flame graph: graph-1]
[Spark UI screenshot: spark-ui-1]

After this PR:
[Flame graph: graph-2]
[Spark UI screenshot: spark-ui-2]

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR will affect users' behavior
  • This PR needs user documentation (for new or modified features or behaviors)
  • I have added documentation for my new feature or new function

Signed-off-by: ruochenzou <scutzou@outlook.com>
banmoy changed the title from "[Enhancement] optimize row-string converter" to "[Enhancement] Optimize row-string converter" on Jul 28, 2023
banmoy (Collaborator) commented Jul 28, 2023

@scutzou Good optimization. Thanks very much

banmoy merged commit d4eeac8 into StarRocks:main on Jul 28, 2023 (3 checks passed).