write.metadata.delete-after-commit.enabled not deleting old metadata #2931

@ayush-san

After reading this mail archive thread (https://www.mail-archive.com/dev@iceberg.apache.org/msg01416.html), I turned on write.metadata.delete-after-commit.enabled for my tables, which are written to by a streaming job.

ALTER TABLE hive.db_name.table_name SET TBLPROPERTIES ('write.metadata.delete-after-commit.enabled'='true')

But even after enabling it, the table's size on S3 is still very large because it still contains all of the old metadata files. Do I need to enable some other property too?

Latest metadata file content:

{
  "format-version" : 2,
  "table-uuid" : "7e6356d7-2e6e-40f0-a462-073b4cbd40fc",
  "location" : "S3_LOCATION",
  "last-sequence-number" : 4072,
  "last-updated-ms" : 1627992967421,
  "last-column-id" : 16,
  "schema" : {
    "type" : "struct",
    "fields" : [...]
  },
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "engine.hive.enabled" : "true",
    "write.format.default" : "parquet",
    "write.parquet.compression-codec" : "snappy",
    "write.metadata.delete-after-commit.enabled" : "true"
  },
  "current-snapshot-id" : 6552119266625920959,
...
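
One way to narrow this down might be to look at the metadata-log list in the latest metadata file, which the Iceberg table spec describes as the record of previous metadata file locations. As far as I understand, the cleanup only considers files tracked in that list, and the related property write.metadata.previous-versions-max controls how many are kept. The sketch below is only an illustration under those assumptions: the helper name summarize_metadata_log, the sample document, and the default of 100 are mine, not taken from the table above.

```python
import json


def summarize_metadata_log(metadata: dict) -> dict:
    """Hypothetical helper: report cleanup-related settings and tracked files."""
    props = metadata.get("properties", {})
    # "metadata-log" is the spec's list of previous metadata file locations.
    log = metadata.get("metadata-log", [])
    return {
        "delete_after_commit": props.get(
            "write.metadata.delete-after-commit.enabled", "false"
        ).lower() == "true",
        # Assumption: 100 is the default when the property is not set.
        "previous_versions_max": int(
            props.get("write.metadata.previous-versions-max", 100)
        ),
        "tracked_previous_files": len(log),
        # Assumption: entries are appended in commit order, oldest first.
        "oldest_tracked_file": log[0]["metadata-file"] if log else None,
    }


# Made-up sample; the bucket, paths, and timestamps are placeholders.
sample = json.loads("""
{
  "format-version": 2,
  "properties": {
    "write.metadata.delete-after-commit.enabled": "true"
  },
  "metadata-log": [
    {"timestamp-ms": 1627992900000,
     "metadata-file": "s3://example-bucket/tbl/metadata/00001-aaaa.metadata.json"},
    {"timestamp-ms": 1627992960000,
     "metadata-file": "s3://example-bucket/tbl/metadata/00002-bbbb.metadata.json"}
  ]
}
""")

summary = summarize_metadata_log(sample)
print(summary)
```

For a real table, the same function could be pointed at the downloaded latest metadata file via json.load. If the number of tracked files is far smaller than the number of metadata files sitting in S3, the extra files were probably written before the property was enabled and are no longer tracked.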
