workload: avoid some string format/parse roundtrips #121417

Draft
wants to merge 3 commits into
base: master

@dt dt commented Mar 30, 2024

Previously we formatted UUIDs and Timestamps into strings, allocating those strings, to put them into the column vector, only for import's workload reader to parse the strings back into native types as it read the vector. This round trip accounted for 5% or more of total CPU usage when IMPORTing from workload-generated TPCC data. Keeping these two types in their native representations through the vector avoids this.
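To illustrate the round trip described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `UUID` type, `Row`, `roundTrip`, `passThrough`, and the choice of `time.RFC3339Nano` as the timestamp format are all assumptions made for the example, and plain Go slices stand in for the column vector.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"time"
)

// UUID is a hypothetical stand-in for a native 16-byte UUID value.
type UUID [16]byte

// Row pairs the two native types discussed in the PR description.
type Row struct {
	ID UUID
	TS time.Time
}

// formatUUID renders the canonical 8-4-4-4-12 form and allocates a string.
func formatUUID(u UUID) string {
	var buf [36]byte
	hex.Encode(buf[0:8], u[0:4])
	hex.Encode(buf[9:13], u[4:6])
	hex.Encode(buf[14:18], u[6:8])
	hex.Encode(buf[19:23], u[8:10])
	hex.Encode(buf[24:36], u[10:16])
	buf[8], buf[13], buf[18], buf[23] = '-', '-', '-', '-'
	return string(buf[:])
}

// parseUUID reverses formatUUID.
func parseUUID(s string) (UUID, error) {
	var u UUID
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return u, fmt.Errorf("invalid UUID %q", s)
	}
	compact := s[0:8] + s[9:13] + s[14:18] + s[19:23] + s[24:36]
	_, err := hex.Decode(u[:], []byte(compact))
	return u, err
}

// roundTrip mimics the old path: every value is formatted into a string
// column on the writer side and parsed back on the reader side.
func roundTrip(rows []Row) ([]Row, error) {
	idCol := make([]string, len(rows))
	tsCol := make([]string, len(rows))
	for i, r := range rows {
		idCol[i] = formatUUID(r.ID)
		tsCol[i] = r.TS.Format(time.RFC3339Nano)
	}
	out := make([]Row, len(rows))
	for i := range rows {
		id, err := parseUUID(idCol[i])
		if err != nil {
			return nil, err
		}
		ts, err := time.Parse(time.RFC3339Nano, tsCol[i])
		if err != nil {
			return nil, err
		}
		out[i] = Row{ID: id, TS: ts}
	}
	return out, nil
}

// passThrough mimics the new path: native values are copied as-is,
// with no formatting, parsing, or per-value string allocation.
func passThrough(rows []Row) []Row {
	out := make([]Row, len(rows))
	copy(out, rows)
	return out
}

// sampleRows returns a small fixed input for experimenting.
func sampleRows() []Row {
	return []Row{
		{ID: UUID{0xde, 0xad, 0xbe, 0xef, 15: 0x01}, TS: time.Date(2024, 3, 30, 1, 15, 0, 0, time.UTC)},
		{ID: UUID{0x01, 0x02, 15: 0xff}, TS: time.Date(2024, 4, 1, 20, 53, 0, 123, time.UTC)},
	}
}

// sameRows reports whether two row slices hold equal values.
func sameRows(a, b []Row) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i].ID != b[i].ID || !a[i].TS.Equal(b[i].TS) {
			return false
		}
	}
	return true
}
```

Both functions yield the same values for `sampleRows()`; the difference is only the extra formatting, allocation, and parsing work in `roundTrip`, which is the kind of cost the description attributes to the old path.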

dt added 3 commits March 30, 2024 01:15
Release note: none.
Epic: none.
@dt dt requested a review from yuzefovich March 30, 2024 17:45
@dt dt requested review from a team as code owners March 30, 2024 17:45
@dt dt requested review from srosenberg and DarrylWong and removed request for a team March 30, 2024 17:45
@cockroach-teamcity

This change is Reviewable

@dt dt removed request for a team, srosenberg and DarrylWong March 31, 2024 12:34
@yuzefovich yuzefovich left a comment

:lgtm: but there are a couple of test failures.

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt)

dt commented Apr 1, 2024

I spent a couple of hours on the failures and it is kind of annoying: the tests don't have the desired SQL types, only the collapsed colVec types, so there is no way to tell strings, bytes, and UUIDs apart, since they are all colvec bytes. In real crdb usage we have the destination type to know how to parse those bytes, but in the tests we don't.

I'm going to put this on ice until after PTO, or indefinitely (though it is a big chunk of workload import CPU time), until I can figure out these silly tests.
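The ambiguity described in this comment can be sketched as follows. This is an illustration only, not CockroachDB code: `destType`, its constants, and `describeBytes` are hypothetical names, and a plain `[]byte` stands in for one entry of a bytes column vector.

```go
package main

import "fmt"

// destType is a hypothetical stand-in for the destination SQL type that
// real usage has available but the tests lack.
type destType int

const (
	destString destType = iota
	destBytes
	destUUID
)

// describeBytes interprets one raw entry of a bytes vector. The same 16
// bytes are valid under all three interpretations, so the caller must
// supply the destination type; the bytes alone cannot say which applies.
func describeBytes(raw []byte, typ destType) (string, error) {
	switch typ {
	case destString:
		return fmt.Sprintf("string(%q)", raw), nil
	case destBytes:
		return fmt.Sprintf("bytes(%x)", raw), nil
	case destUUID:
		if len(raw) != 16 {
			return "", fmt.Errorf("uuid needs 16 bytes, got %d", len(raw))
		}
		return fmt.Sprintf("uuid(%x-%x-%x-%x-%x)",
			raw[0:4], raw[4:6], raw[6:8], raw[8:10], raw[10:16]), nil
	default:
		return "", fmt.Errorf("unknown destination type %d", typ)
	}
}
```

Calling `describeBytes([]byte("0123456789abcdef"), t)` with each of the three types succeeds and gives three different readings of the same input, which is why a test that only sees the collapsed vector type cannot pick one.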

@dt dt marked this pull request as draft April 1, 2024 20:53
@yuzefovich

First two commits should pass CI, right? Maybe just merge those two for now then?
