Flink: Compact in sink v2 support Coordinator Lock #15459

Guosmilesmile wants to merge 4 commits into apache:main from
Conversation
```diff
-Preconditions.checkArgument(lockTime != null, "Lock time is null, Can't release lock");
+if (lockTime == null) {
+  LOG.warn("Lock time is null, Can't release lock");
+  return;
+}
```
When we hit the max-watermark case, this will error, so I opened a new PR to deal with it:
#15458
I've left a comment on that PR.
Thanks for pointing it out, I have changed the approach.
Force-pushed from 9e40643 to b1262dd.
docs/docs/flink-maintenance.md
Outdated
````markdown
env.execute("Table Maintenance Job");
```

Use Coordinator Lock
````
Suggested change:

```diff
-Use Coordinator Lock
+#### Managing table locking via Flink
```
docs/docs/flink-maintenance.md
Outdated
```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

TableLoader tableLoader = TableLoader.fromCatalog(
    CatalogLoader.hive("my_catalog", configuration, properties),
    TableIdentifier.of("database", "table")
);

TableMaintenance.forTable(env, tableLoader)
    .uidSuffix("my-maintenance-job")
    .rateLimit(Duration.ofMinutes(10))
    .lockCheckDelay(Duration.ofSeconds(10))
    .add(ExpireSnapshots.builder()
        .scheduleOnCommitCount(10)
        .maxSnapshotAge(Duration.ofMinutes(10))
        .retainLast(5)
        .deleteBatchSize(5)
        .parallelism(8))
    .add(RewriteDataFiles.builder()
        .scheduleOnDataFileCount(10)
        .targetFileSizeBytes(128 * 1024 * 1024)
        .partialProgressEnabled(true)
        .partialProgressMaxCommits(10))
    .append();

env.execute("Table Maintenance Job");
```
Everything except line 184 is identical (no lock parameter). Maybe consolidate the two sections and just explain that the builder can be either `TableMaintenance.forTable(env, tableLoader, lockFactory)` or `TableMaintenance.forTable(env, tableLoader)`.
Right, I combined the two code sections.
```markdown
| Key | Description | Default |
|-----|----------------------|---------|
| `flink-maintenance.lock.type` | Set to `` or not set | |
```
Would it make sense to have this default to `flink`? (which will be using the coordinator)
If the coordinator lock becomes the default later and the other lock options are removed, I'd prefer not to require users to set this parameter now; that keeps configuration simpler and consistent with how they will use it in the future.
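To make the discussion concrete, here is a hedged sketch of what opting into the lock type via SQL hints could look like, mirroring the test setup quoted later in this thread. The key `flink-maintenance.lock.type` appears in the PR's docs table, but the `'flink'` value and the table name are assumptions from this discussion, not confirmed option names:

```sql
-- Hypothetical: explicitly selecting the coordinator lock for a sink.
-- 'flink' as the lock-type value is an assumption from this thread.
INSERT INTO my_table /*+ OPTIONS('flink-maintenance.lock.type'='flink') */
SELECT id, data FROM sourceTable;
```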
Force-pushed from e15c09d to 3ae3a07.

Force-pushed from 3ae3a07 to 7193f0b.
```java
if (!Watermark.MAX_WATERMARK.equals(mark)) {
  operatorEventGateway.sendEventToCoordinator(
      new LockReleaseEvent(tableName, mark.getTimestamp()));
}
```
Let's compare the timestamp, not the object.
OK, I made it compare the timestamps.
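As a hedged illustration of that suggestion, the sketch below shows the timestamp comparison in plain Java (assumed shapes, not the actual Flink/Iceberg classes). In Flink, `Watermark.MAX_WATERMARK` carries `Long.MAX_VALUE` as its timestamp, so comparing timestamps avoids relying on object identity or `equals` semantics:

```java
// Minimal sketch (hypothetical class, not the PR's code): release the lock
// only for regular watermarks, never for the final max watermark, by comparing
// the watermark's timestamp rather than the watermark object itself.
public class WatermarkCompareSketch {
    // Flink's Watermark.MAX_WATERMARK carries Long.MAX_VALUE as its timestamp.
    static final long MAX_WATERMARK_TIMESTAMP = Long.MAX_VALUE;

    static boolean shouldReleaseLock(long watermarkTimestamp) {
        return watermarkTimestamp != MAX_WATERMARK_TIMESTAMP;
    }

    public static void main(String[] args) {
        System.out.println(shouldReleaseLock(1_000L));          // true
        System.out.println(shouldReleaseLock(Long.MAX_VALUE));  // false
    }
}
```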
```diff
 }

-@Test
+@TestTemplate
```
In one of my PRs it was suggested not to use `TestTemplate`, as it was only introduced for backward-compatibility reasons. I was pointed to use `@ParameterizedTest` for every test instead:

```java
@ParameterizedTest
@FieldSource("FILE_FORMATS")
```
```java
        + "'flink-maintenance.rewrite.schedule.data-file-size'='1',"
        + "'flink-maintenance.lock-check-delay-seconds'='60'";
private static final String TABLE_PROPERTIES_COORDINATOR =
    "'flink-maintenance.lock.jdbc.init-lock-table'='true',"
```
```markdown
#### Flink-maintained lock

Maintain the lock within Flink itself. This does not require configuring external systems. One prerequisite is that there are no parallel table maintenance jobs for a given table.
```
Should this be?

> The only prerequisite is that there are no parallel table maintenance jobs for a given table.
```java
    jdbcProps // JDBC connection properties
);

// Option 1: With external lock factory
```
Shall we mention that we plan to deprecate Option 1?
Do we plan to do it?
```diff
 return Arrays.asList(
-    new Object[] {"testhadoop_basenamespace", Namespace.of("l0", "l1"), true},
-    new Object[] {"testhadoop_basenamespace", Namespace.of("l0", "l1"), false});
+    new Object[] {"testhadoop_basenamespace", Namespace.of("l0", "l1"), true, "jdbc"},
```
Could this be `LockConfig.JdbcLockConfig.JDBC`?
```java
sql(
    "INSERT INTO %s /*+ OPTIONS(%s) */ SELECT id,data from sourceTable",
    TABLE_NAME, TABLE_PROPERTIES);
if (lockType.equals("jdbc")) {
```
Could this be `LockConfig.JdbcLockConfig.JDBC`?
We already support the Coordinator Lock, but it hasn’t been introduced into Sink Table Maintenance yet. This PR adds support for configuring the Coordinator Lock in Sink Table Maintenance.