Description
Hi,
- Hudi version : 0.11.1
- Spark version : 3.2.1
- Hive version : NA
- Hadoop version : NA
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : no
We have a Spark Streaming application running with a batch interval of 5 minutes. We added the configs below to avoid creating small files.
HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key() -> String.valueOf(104857600)
HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> String.valueOf(125829120)
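For reference, here is a minimal sketch of what those two byte values mean. It assumes the two constants resolve to the string keys `hoodie.parquet.small.file.limit` and `hoodie.parquet.max.file.size`. The object name `FileSizeLimits` and the value names are only illustrative and are not part of our application.

```scala
// Illustrative sketch only: expresses the two size limits in MiB and
// builds the same options with plain string keys.
object FileSizeLimits extends App {
  val MiB: Long = 1024L * 1024L

  // 100 MiB: the small file limit we configured
  val smallFileLimitBytes: Long = 100L * MiB // 104857600

  // 120 MiB: the maximum Parquet file size we configured
  val maxFileSizeBytes: Long = 120L * MiB // 125829120

  // Assumed string keys behind PARQUET_SMALL_FILE_LIMIT and PARQUET_MAX_FILE_SIZE
  val sizingOptions: Map[String, String] = Map(
    "hoodie.parquet.small.file.limit" -> smallFileLimitBytes.toString,
    "hoodie.parquet.max.file.size" -> maxFileSizeBytes.toString
  )

  sizingOptions.foreach { case (key, value) => println(s"$key = $value") }
}
```

So the intent is a 100 MiB small file limit and a 120 MiB maximum file size.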
However, when I run the application, the Parquet files it creates are smaller than the configured small file limit.
Here is the complete Hudi config we are using in the application.
HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key() -> String.valueOf(104857600),
HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key() -> String.valueOf(125829120),
HoodieCompactionConfig.INLINE_COMPACT_TRIGGER_STRATEGY.key() -> CompactionTriggerStrategy.TIME_ELAPSED.name,
HoodieCompactionConfig.INLINE_COMPACT_TIME_DELTA_SECONDS.key() -> String.valueOf(60 * 60),
HoodieCompactionConfig.CLEANER_POLICY.key() -> HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name(),
HoodieCompactionConfig.CLEANER_COMMITS_RETAINED.key() -> "936",
HoodieCompactionConfig.MIN_COMMITS_TO_KEEP.key() -> "937",
HoodieCompactionConfig.MAX_COMMITS_TO_KEEP.key() -> "960",
HoodieCompactionConfig.ASYNC_CLEAN.key() -> "false",
HoodieCompactionConfig.INLINE_COMPACT.key() -> "true",
HoodieMetricsConfig.TURN_METRICS_ON.key() -> "true",
HoodieMetricsConfig.METRICS_REPORTER_TYPE_VALUE.key() -> MetricsReporterType.DATADOG.name(),
HoodieMetricsDatadogConfig.API_SITE_VALUE.key() -> "US",
HoodieMetricsDatadogConfig.METRIC_PREFIX_VALUE.key() -> "tacticalnovusingest.hudi",
HoodieMetricsDatadogConfig.API_KEY_SUPPLIER.key() -> "com.tr.indigo.tacticalnovusingest.utils.DatadogKeySupplier",
HoodieMetadataConfig.ENABLE.key() -> "false",
HoodieWriteConfig.ROLLBACK_USING_MARKERS_ENABLE.key() -> "false",
The Parquet files that were created are shown below.
How can we avoid creating small files?
@koochiswathiTR is my teammate, in case you need more info.
Appreciate all the help you guys do.
Thanks, JK
