Truncate API added#5573
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5573 +/- ##
==========================================
- Coverage 66.44% 66.27% -0.18%
==========================================
Files 1075 1123 +48
Lines 54773 57490 +2717
Branches 8168 8609 +441
==========================================
+ Hits 36396 38101 +1705
- Misses 15700 16560 +860
- Partials 2677 2829 +152
Continue to review full report at Codecov.
|
| return new SuccessResponse("Table config updated for " + tableName); | ||
| } | ||
|
|
||
| @POST |
There was a problem hiding this comment.
Can we please add documentation highlighting the differences between this and table drop?
Generally, the standard implemented by other databases has the following distinction:
DROP will delete table data, indexes, metadata etc. So any statement (to fetch data or query metadata) will fail stating "table does not exist"
TRUNCATE will delete table data, indexes, but the metadata will be kept around. So you can still see table's schema, table config etc and the queries will return empty results AFAIK.
Are we following the same distinction? If so, we should update the docs to state when should the user choose DROP vs TRUNCATE.
Also, we should add tests to ensure that both behave expectedly. If TRUNCATE is not expected to delete config and schema, then we should add a test for that.
There was a problem hiding this comment.
+1. drop and truncate behavior should follow the standards.
| @ApiOperation(value = "Truncate table") | ||
| public SuccessResponse truncateTable( | ||
| @ApiParam(value = "Name of the table to update", required = true) @PathParam("tableName") String tableName, | ||
| @ApiParam(value = "realtime|offline") @QueryParam("type") String tableTypeStr) throws Exception { |
There was a problem hiding this comment.
We still have to define what truncate means for realtime. Typically, when you create a realtime table, it immediately starts consumption. So on truncate, after dropping all segments, we should again restart consumption based on configured offset.
That will not happen with these changes. Consumption will be restarted only on the next run of the RealtimeSegmentValidationManager. Which I don't think is an acceptable flow.
There was a problem hiding this comment.
3 options
- this is ok as part of the version
- we can invoke the validation manager at the end of the call
- we can add an separate call to pause/resume Kafka consumption
There was a problem hiding this comment.
Can we take the discussion to the Issue? I believe we need to agree on a spec first before we start publishing PRs.
|
I think this task is still pending discussion about race conditions between this API and segment completion/uploads. The issue has more details |
|
Can we take the conversation to the Issue instead of this PR? |
|
@kishoreg we cannot invoke validation manager unless we get to the lead controller for that particular table. With controller separation using helix controller management, all controllers take part in valdation. etc. |
| @Path("/tables/{tableName}/truncate") | ||
| @Produces(MediaType.APPLICATION_JSON) | ||
| @ApiOperation(value = "Truncate table") | ||
| public SuccessResponse truncateTable( |
There was a problem hiding this comment.
See #5559 (comment)
Let us first agree that adding a new API (that we need to keep backward compatible, and so on) is the best answer to the problem.
| @ApiOperation(value = "Truncate table") | ||
| public SuccessResponse truncateTable( | ||
| @ApiParam(value = "Name of the table to update", required = true) @PathParam("tableName") String tableName, | ||
| @ApiParam(value = "realtime|offline") @QueryParam("type") String tableTypeStr) throws Exception { |
There was a problem hiding this comment.
Can we take the discussion to the Issue? I believe we need to agree on a spec first before we start publishing PRs.
| toggleTableState(tableName, tableType, StateType.DISABLE); | ||
|
|
||
| // Get all segment names for table and delete all segments | ||
| String tableNameWithType = getExistingTableNamesWithType(tableName, tableType).get(0); |
There was a problem hiding this comment.
How about moving line String tableNameWithType = getExistingTableNamesWithType(tableName, tableType).get(0); before the toggle, so that we are validating if table exists before calling disable.
Then you can also directly call _pinotHelixResourceManager.toggleTableState(tableNameWithType, status);
| // Get all segment names for table and delete all segments | ||
| String tableNameWithType = getExistingTableNamesWithType(tableName, tableType).get(0); | ||
| List<String> segmentNames = _pinotHelixResourceManager.getSegmentsFor(tableNameWithType); | ||
| PinotResourceManagerResponse response = _pinotHelixResourceManager.deleteSegments(tableNameWithType, segmentNames); |
There was a problem hiding this comment.
before deleting segments, you need to check that all segments went into OFFLINE state in the external view. Pasting the steps Kishore had suggested in the issue:
disable the table
wait for all segments to go to offline state
delete all the segments
enable the table
call the setup table method
| toggleTableState(tableName, tableType, StateType.ENABLE); | ||
|
|
||
| // Setup table for real-time : (ensureRealtimeClusterIsSetUp) | ||
| if (tableType != TableType.OFFLINE && _pinotHelixResourceManager.hasRealtimeTable(tableName)) { |
There was a problem hiding this comment.
why not just if(tableType == REALTIME) ?
and you already have tableNameWithType variable above, no need to construct the tableName again
| } | ||
| } | ||
|
|
||
| private List<String> getExistingTableNamesWithType(String tableName, @Nullable TableType tableType) { |
There was a problem hiding this comment.
tableType doesn't need annotation Nullable - you're checking for not-null before this call
| @@ -1240,7 +1240,7 @@ private void verifyIndexingConfig(String tableNameWithType, IndexingConfig index | |||
| } | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
Since we made this a public method, can you add some javadocs explaining what this method does?
| } | ||
|
|
||
| /** | ||
| * Truncate table, delete all contents of table without removing table configuration and schema. |
There was a problem hiding this comment.
Let's add here that for Truncate for REALTIME includes creating new CONSUMING segments in segment metadata and ideal state
| @Produces(MediaType.APPLICATION_JSON) | ||
| @ApiOperation(value = "Truncate table") | ||
| public SuccessResponse truncateTable( | ||
| @ApiParam(value = "Name of the table to update", required = true) @PathParam("tableName") String tableName, |
There was a problem hiding this comment.
Name of table to "truncate" (not update)
Let's make "type" also a required param
| } | ||
|
|
||
| @Test | ||
| public void testTruncateTable() throws IOException { |
There was a problem hiding this comment.
we need some more testing
- create a OFFLINE table and schema, add some segments, then call truncate. Check that segments/idealstate/external view are deleted, but the tableconfig and schema are intact
- create a REALTIME table and schema, let it create some CONSUMING segments, then call truncate. Check that table config and schema re intact. Also check that idealstate/external view/segments now have the newly created segments.
Description
#5559: Truncate API Added.
Offline : Disabled Table and deleted all segments
RealTime: Disabled Table and deleted all segments.
@Jackie-Jiang
Implemented as per your comment. (
PinotSegmentRestletResource.deleteAllSegments())