Add GUC 'gp_random_insert_segments' to control the segments used for random distributed table insertion #406

Merged
merged 1 commit into from
Apr 18, 2024


foreyes
Collaborator

@foreyes foreyes commented Apr 10, 2024

Introduces the 'gp_random_insert_segments' GUC to avoid creating an excessive number of small, fragmented files when a small amount of data is inserted into a cluster with many segments (e.g., 1000 records into 100 segments).

Fragmented data insertion can significantly degrade performance, especially with append-optimized or cloud-based storage. With the 'gp_random_insert_segments' GUC, users can limit the number of segments used for insertion into randomly distributed tables, which significantly reduces the number of fragmented files.
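
To illustrate the effect described above, here is a minimal Python model, not the actual implementation. It assumes rows are assigned uniformly at random, and the function name `segments_touched`, its parameters, and the convention that a non-positive limit means "no limit" are all hypothetical choices made for this sketch.

```python
import random


def segments_touched(num_rows, num_segments, insert_segments=0, seed=0):
    """Return the set of segment ids that receive at least one row.

    Hypothetical convention for this sketch: insert_segments <= 0, or a
    value >= num_segments, means every segment is eligible.
    """
    rng = random.Random(seed)
    if 0 < insert_segments < num_segments:
        # Restrict the insert to a random subset of segments.
        targets = rng.sample(range(num_segments), insert_segments)
    else:
        targets = list(range(num_segments))

    touched = set()
    for _ in range(num_rows):
        # Each row goes to one randomly chosen eligible segment.
        touched.add(rng.choice(targets))
    return touched


if __name__ == "__main__":
    unlimited = segments_touched(1000, 100)
    limited = segments_touched(1000, 100, insert_segments=4)
    print("segments written without a limit:", len(unlimited))
    print("segments written with a limit of 4:", len(limited))
```

In this model, each touched segment stands for at least one small file written, so capping the eligible segments at 4 caps the number of files produced by the insert at 4 as well.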

@foreyes foreyes self-assigned this Apr 10, 2024
@foreyes foreyes force-pushed the dev/limited_insert branch 4 times, most recently from 519d20c to 50efffc Compare April 11, 2024 00:18
@foreyes
Collaborator Author

foreyes commented Apr 17, 2024

Update

  1. Disable INSERT with ORCA when this feature is used.
  2. Directly change the segments in the slice table to avoid dispatching to unrelated segments.
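
The second item can be pictured with a small Python sketch. This is only a conceptual model under stated assumptions: the `Slice` class and `restrict_slice` function are hypothetical names invented here, and the real slice table in the executor is a C structure whose layout is not shown in this PR page.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Slice:
    """Hypothetical stand-in for one slice table entry."""
    segments: List[int]  # segment ids this slice would be dispatched to


def restrict_slice(slice_: Slice, chosen: Iterable[int]) -> Slice:
    """Keep only the chosen segments so nothing is dispatched elsewhere."""
    chosen_set = set(chosen)
    slice_.segments = [s for s in slice_.segments if s in chosen_set]
    return slice_


if __name__ == "__main__":
    insert_slice = Slice(segments=list(range(8)))
    restrict_slice(insert_slice, chosen=[1, 5])
    print(insert_slice.segments)  # only segments 1 and 5 remain
```

The point of editing the segment list itself, rather than filtering rows later, is that segments outside the chosen set never receive the plan at all.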

@foreyes foreyes requested a review from my-ship-it April 17, 2024 06:24
@my-ship-it
Contributor

Could you please add a test case?

Contributor

@my-ship-it my-ship-it left a comment

LGTM

@my-ship-it my-ship-it merged commit 143b3df into cloudberrydb:main Apr 18, 2024
10 checks passed

3 participants