Conversation

Contributor

@fernst fernst commented Dec 1, 2021

This prevents the workers from running out of memory for large partitions.

@sanjanasandeep sanjanasandeep left a comment


LGTM

Contributor

@snehalakshmisha snehalakshmisha left a comment


lgtm

// Submit a batch to the SQL engine every 10k records
// This is done to reduce memory usage in the worker, as processed records can now be GC'd.
numWrittenRecords++;
if (numWrittenRecords % 10000 == 0) {

Can we make this 10K configurable? In some rare cases 10K can be too high (e.g., in CDF our internal records are multi-megabyte, and a customer may have the same). It doesn't need a UI, but a runtime argument would be nice.
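The flush pattern under discussion can be sketched as follows, with the batch size taken as a constructor parameter (as it would be after reading a runtime argument) rather than hardcoded. All names here (`BatchWriter`, `submitBatch`, etc.) are illustrative, not from the actual codebase:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: submit a batch every N records so processed records can be GC'd. */
public class BatchWriter {
  // Hypothetical configurable batch size; the PR reads this from a pipeline argument.
  private final int batchSize;
  private final List<String> batch = new ArrayList<>();
  private long numWrittenRecords = 0;
  private int batchesSubmitted = 0;

  public BatchWriter(int batchSize) {
    this.batchSize = batchSize;
  }

  public void write(String record) {
    batch.add(record);
    numWrittenRecords++;
    // Submit a batch to the SQL engine every batchSize records.
    // This bounds memory usage in the worker regardless of partition size.
    if (numWrittenRecords % batchSize == 0) {
      submitBatch();
    }
  }

  public void close() {
    // Flush any trailing records that did not fill a complete batch.
    if (!batch.isEmpty()) {
      submitBatch();
    }
  }

  private void submitBatch() {
    // Stand-in for handing the accumulated batch to the SQL engine.
    batchesSubmitted++;
    batch.clear();
  }

  public int getBatchesSubmitted() {
    return batchesSubmitted;
  }
}
```

Making the threshold a parameter addresses the review comment above: a deployment with multi-megabyte records can pass a much smaller batch size without a code change.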

@fernst fernst force-pushed the ensure-batches-every-10k-records branch from df97de6 to 69630a2 Compare February 3, 2022 21:39
@fernst fernst force-pushed the ensure-batches-every-10k-records branch from 69630a2 to 8883bd3 Compare February 3, 2022 22:14
@fernst fernst changed the title from "Added logic to ensure batches are submitted to the engine every 10k records." to "Added logic to ensure batches are submitted to the engine every 1k records." Feb 4, 2022
…cords by default. This is configurable using a pipeline argument.

This prevents the workers from running out of memory for large partitions.
@fernst fernst force-pushed the ensure-batches-every-10k-records branch from 8883bd3 to 4081484 Compare February 4, 2022 00:05
@fernst fernst merged commit 625002e into develop Feb 4, 2022