Flink: SQL: Make dynamic sink options configurable in SQL #15780
swapna267 wants to merge 2 commits into apache:main
Conversation
| * specific language governing permissions and limitations |
| * under the License. |
| */ |
| package org.apache.iceberg.flink; |
Should probably be in the dynamic package. Or should we create a config package?
| FlinkDynamicSinkConf flinkDynamicSinkConf = |
|     new FlinkDynamicSinkConf(writeProperties, flinkConfig); |
Can we directly pass FlinkDynamicSinkConf to the constructor?
| writeOptions.put( |
|     FlinkDynamicSinkOptions.IMMEDIATE_TABLE_UPDATE.key(), |
|     Boolean.toString(newImmediateUpdate)); |
I'm not sure this should go into WriteOptions. I think it is better to have a separate config for DynamicSink options.
With all of them written into WriteOptions, it is easier to pass these configs from SQL by using setAll(Map<String, String> properties) when initializing DynamicIcebergSink.
If we separate them, we either need to handle the split inside setAll, or upstream users need to provide them separately.
Since the dynamic sink configs are scoped with the prefix dynamic-sink, should it be OK to keep them in the same map?
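To illustrate the prefix-scoping argument, here is a minimal sketch of keeping regular write options and dynamic-sink options in one map and splitting them back out by the `dynamic-sink.` prefix. The key names and helper are illustrative assumptions, not the actual FlinkDynamicSinkOptions keys or Iceberg API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: one combined options map, partitioned by the "dynamic-sink." prefix.
// Keys are hypothetical examples, not the real option strings.
public class PrefixScopedOptions {

  static final String DYNAMIC_SINK_PREFIX = "dynamic-sink.";

  // Return either the dynamic-sink-scoped entries or the remaining write options.
  public static Map<String, String> filterByPrefix(
      Map<String, String> all, boolean dynamicSink) {
    return all.entrySet().stream()
        .filter(e -> e.getKey().startsWith(DYNAMIC_SINK_PREFIX) == dynamicSink)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("write-parallelism", "4");                      // regular write option
    options.put("dynamic-sink.immediate-table-update", "true"); // dynamic-sink scoped
    options.put("dynamic-sink.cache-max-size", "100");

    System.out.println(filterByPrefix(options, true).size());  // 2
    System.out.println(filterByPrefix(options, false).size()); // 1
  }
}
```

Because the prefix keeps the two key spaces disjoint, a single setAll call can carry both without ambiguity, which is the point being made in this thread.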
| FlinkDynamicSinkConf flinkDynamicSinkConf = |
|     new FlinkDynamicSinkConf(writeProperties, flinkConfig); |
Could we create the config only once and pass it to the constructor?
I did that initially, but removed it for consistency.
DynamicRecordProcessor needs FlinkDynamicSinkConf and also writeProperties/flinkConfig.
The writeProperties and flinkConfig maps are required to create FlinkWriteConf in open(), since FlinkWriteConf is not serializable.
Maybe I can simply pass FlinkDynamicSinkConf along with writeProperties/flinkConfig.
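The serializability constraint behind this reply can be sketched as follows: the operator ships the raw (serializable) maps and rebuilds the derived config object in open(). The class and method names below are illustrative stand-ins, not the actual DynamicRecordProcessor or FlinkWriteConf API.

```java
import java.io.Serializable;
import java.util.Map;

// Sketch: serializable raw maps travel with the operator; the derived,
// non-serializable config is marked transient and rebuilt in open().
public class ProcessorSketch implements Serializable {

  private final Map<String, String> writeProperties;
  private final Map<String, String> flinkConfig;

  // Not serializable, so it cannot be shipped; rebuilt on the task manager.
  private transient DerivedConf conf;

  public ProcessorSketch(Map<String, String> writeProperties, Map<String, String> flinkConfig) {
    this.writeProperties = writeProperties;
    this.flinkConfig = flinkConfig;
  }

  // Stand-in for the Flink open() lifecycle hook.
  public void open() {
    this.conf = new DerivedConf(writeProperties, flinkConfig);
  }

  public DerivedConf conf() {
    return conf;
  }

  // Stand-in for FlinkWriteConf: write properties win over the Flink config.
  public static final class DerivedConf {
    private final Map<String, String> writeProperties;
    private final Map<String, String> flinkConfig;

    DerivedConf(Map<String, String> writeProperties, Map<String, String> flinkConfig) {
      this.writeProperties = writeProperties;
      this.flinkConfig = flinkConfig;
    }

    public String resolve(String key, String defaultValue) {
      return writeProperties.getOrDefault(key, flinkConfig.getOrDefault(key, defaultValue));
    }
  }
}
```

This is why both the raw maps and any pre-built config would need to be passed: the former to reconstruct the config at runtime, the latter only as a convenience on the job-manager side.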
| if (super.writeParallelism() != Integer.MAX_VALUE) { |
|   return super.writeParallelism(); |
| } |
Not sure about this logic. The default for writeParallelism is 0.
The DynamicRecord constructor takes writeParallelism as a primitive int, and we use Integer.MAX_VALUE as a sentinel to fall back to another value, such as the job parallelism.
* @param writeParallelism The number of parallel writers. Can be set to any value {@literal > 0},
* but will always be automatically capped by the maximum write parallelism, which is the
* parallelism of the sink. Set to Integer.MAX_VALUE for always using the maximum available
* write parallelism.
We have a similar issue with upsertMode, since it uses a primitive boolean.
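A minimal sketch of the sentinel-based fallback discussed in this thread: because the field is a primitive int, Integer.MAX_VALUE marks "not set" and triggers the fallback chain. The method name and the exact fallback order (record value, then configured option, then job parallelism) are assumptions for illustration, not the PR's actual code.

```java
// Sketch: resolve write parallelism with Integer.MAX_VALUE as the
// "not set on the record" sentinel. Names are hypothetical.
public class ParallelismFallback {

  static final int NOT_SET = Integer.MAX_VALUE;

  // Record value wins if set; else the configured option; else job parallelism.
  public static int resolveWriteParallelism(int recordValue, Integer optionValue, int jobParallelism) {
    if (recordValue != NOT_SET) {
      return recordValue;
    }
    if (optionValue != null) {
      return optionValue;
    }
    return jobParallelism;
  }

  public static void main(String[] args) {
    System.out.println(resolveWriteParallelism(4, 8, 16));           // 4: set on the record
    System.out.println(resolveWriteParallelism(NOT_SET, 8, 16));     // 8: falls back to the option
    System.out.println(resolveWriteParallelism(NOT_SET, null, 16));  // 16: job parallelism
  }
}
```

The upsertMode problem noted above is harder: a primitive boolean has no spare value to use as a sentinel, so it would need a boxed Boolean or a separate "is set" flag.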
| * have fields set. |
| */ |
| @Internal |
| class DynamicRecordWithDefaults extends DynamicRecord { |
Suggested change:
- class DynamicRecordWithDefaults extends DynamicRecord {
+ class DynamicRecordWithConfig extends DynamicRecord {
Should we add a test to verify config handling?
Support the following configs to be configurable from SQL for the dynamic sink.
Fall back to the write properties or the Flink configuration if the following are not set on DynamicRecord:
writeParallelism(int) → FlinkWriteOptions.WRITE_PARALLELISM
distributionMode → FlinkWriteOptions.DISTRIBUTION_MODE
toBranch(String) → FlinkWriteOptions.BRANCH
Provide options to configure the following behavior of the Dynamic Sink in SQL:
cacheMaxSize(int)
immediateTableUpdate(boolean)
dropUnusedColumns(boolean)
cacheRefreshMs(long)
inputSchemasPerTableCacheMaxSize(int)
caseSensitive(boolean)
More context here: #15471 (comment)
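The per-record fallback described in this PR (record value first, then the configured write property) can be sketched roughly as below, using the branch field as the example. The key string "branch", the "main" default, and the helper method are hypothetical illustrations, not the exact FlinkWriteOptions constants.

```java
import java.util.Map;

// Sketch: a DynamicRecord field falls back to the write properties
// when not set on the record. Key names and default are hypothetical.
public class RecordDefaults {

  // Return the record-level branch when present, else the configured one.
  public static String branchOrDefault(String recordBranch, Map<String, String> writeProps) {
    if (recordBranch != null) {
      return recordBranch;
    }
    return writeProps.getOrDefault("branch", "main");
  }

  public static void main(String[] args) {
    Map<String, String> props = Map.of("branch", "audit");
    System.out.println(branchOrDefault(null, props));     // audit: from write properties
    System.out.println(branchOrDefault("hotfix", props)); // hotfix: set on the record
  }
}
```

The same pattern would apply to distributionMode and writeParallelism, with the primitive-int case handled via the Integer.MAX_VALUE sentinel discussed in the review.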