Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a check to enable size based threshold for realtime tables #12016

Merged
merged 5 commits into from Nov 18, 2023

Conversation

soumitra-st
Copy link
Contributor

@soumitra-st soumitra-st commented Nov 16, 2023

As per the Pinot document, to enable the segment size based threshold rows must be set to 0. Adding a check during table creation, if realtime.segment.flush.threshold.segment.size is set, realtime.segment.flush.threshold.rows must be set to 0.

Added a test, and also ran below curl command to ensure if the table config has invalid combination, then the API fails:

% curl -X POST http://localhost:9000/tables -H 'accept: application/json' -H 'Content-Type: application/json' -d @/Users/soumitra/pinot-tutorial/transcript/transcript-table-realtime.json
{"code":400,"error":"Invalid config: realtime.segment.flush.threshold.rows=50000, it must be set to 0 for size based segment threshold to work."}

backward-incompat

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (244c947) 61.69% compared to head (eff600f) 27.57%.

Files Patch % Lines
...he/pinot/segment/local/utils/TableConfigUtils.java 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #12016       +/-   ##
=============================================
- Coverage     61.69%   27.57%   -34.12%     
  Complexity      207      207               
=============================================
  Files          2385     2385               
  Lines        129352   129357        +5     
  Branches      20029    20032        +3     
=============================================
- Hits          79799    35669    -44130     
- Misses        43731    90998    +47267     
+ Partials       5822     2690     -3132     
Flag Coverage Δ
custom-integration1 ?
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 0.00% <0.00%> (-61.61%) ⬇️
java-21 27.57% <80.00%> (-33.97%) ⬇️
skip-bytebuffers-false 27.56% <80.00%> (-34.10%) ⬇️
skip-bytebuffers-true 27.56% <80.00%> (-33.93%) ⬇️
temurin 27.57% <80.00%> (-34.12%) ⬇️
unittests 27.56% <80.00%> (-34.12%) ⬇️
unittests1 ?
unittests2 27.56% <80.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 290 to 297
streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_ROWS, "1000000");
streamConfigMap.put(StreamConfigProperties.SEGMENT_FLUSH_THRESHOLD_SEGMENT_SIZE, "100M");
try {
new StreamConfig(tableName, streamConfigMap);
fail("Invalid config: flush threshold rows must be 0, when flush threshold size is set.");
} catch (Exception e) {
// Expected
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure this is what the document intended to mean
what i read was:

You can pick the appropriate value for segment size and number of hours in the table config, and set the number of rows to zero. Note that you don't have to pick values exactly as given in each of these combinations (they are calculated guesses anyway).

it wouldn't throw exception if num rows is set to non-zero.

my concern is that this will cause issue during cluster upgrade. if any user have both configuration set table will be unable to load in an error state .

@soumitra-st soumitra-st marked this pull request as draft November 17, 2023 02:05
@@ -157,7 +157,8 @@ public static void validate(TableConfig tableConfig, @Nullable Schema schema, @N
// Validate that StreamConfig can be created
streamConfig = new StreamConfig(tableConfig.getTableName(), streamConfigMap);
} catch (Exception e) {
throw new IllegalStateException("Could not create StreamConfig using the streamConfig map", e);
throw new IllegalStateException("Could not create StreamConfig using the streamConfig map: " + e.getMessage(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already included the exception, do not add message again

@@ -172,7 +172,7 @@ public StreamConfig(String tableNameWithType, Map<String, String> streamConfigMa

_flushThresholdRows = extractFlushThresholdRows(streamConfigMap);
_flushThresholdTimeMillis = extractFlushThresholdTimeMillis(streamConfigMap);
_flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap);
_flushThresholdSegmentSizeBytes = extractFlushThresholdSegmentSize(streamConfigMap, _flushThresholdRows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add check within the constructor because it will break existing table configs. For new checks we want to enforce, add it into TableConfigUtils.validate()

@Jackie-Jiang Jackie-Jiang added the user-experience Related to user experience label Nov 17, 2023
…ment.flush.threshold.rows must be set to 0 for size base flush threshold updater to be active.
…reation only. Existing tables will continue to work.
@soumitra-st soumitra-st marked this pull request as ready for review November 18, 2023 19:06
@Jackie-Jiang Jackie-Jiang merged commit 6aecd41 into apache:master Nov 18, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
user-experience Related to user experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants