[SUPPORT] MOR upsert table grows in size when ingesting same records

**Describe the problem you faced**

When we ingest the same records into a MOR table, the used disk space grows with each run, even though compaction and cleaning have been enabled.

Writing the test data as plain Parquet takes 8.3M. Hudi table size after each run:
- 9.4M
- 51M
- 83M
- 125M
- 157M
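
To make the growth pattern explicit, here is a small sketch that computes the per-run increase from the numbers above. It assumes "M" means megabytes; the object and value names are only illustrative. The increase stays between roughly 32 and 42 MB per run rather than tapering off.

```scala
object GrowthCheck {
  // Sizes in MB, copied from the list above (assumption: "M" = megabytes)
  val plainParquetMb = 8.3
  val tableSizesMb = Vector(9.4, 51.0, 83.0, 125.0, 157.0)

  // Growth between each pair of consecutive runs
  val deltasMb: Vector[Double] =
    tableSizesMb.sliding(2).map { case Seq(a, b) => b - a }.toVector

  def main(args: Array[String]): Unit = {
    deltasMb.zipWithIndex.foreach { case (d, i) =>
      println(f"run ${i + 1} -> run ${i + 2}: +$d%.1f MB")
    }
    // How much larger the table is than the plain Parquet output after the last run
    println(f"last run / plain parquet: ${tableSizesMb.last / plainParquetMb}%.1fx")
  }
}
```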

**To Reproduce**
Write the same DataFrame multiple times:
```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode
// ComplexKeyGenerator must be imported as well; its package depends on the Hudi release.

df
  .coalesce(1)
  .write
  .format("org.apache.hudi")
  .option("hoodie.insert.shuffle.parallelism", "2")
  .option("hoodie.upsert.shuffle.parallelism", "2")
  // Cleaner settings
  .option("hoodie.cleaner.commits.retained", "3")
  .option("hoodie.cleaner.fileversions.retained", "2")
  // Inline compaction, triggered after 2 delta commits
  .option("hoodie.compact.inline", "true")
  .option("hoodie.compact.inline.max.delta.commits", "2")
  .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY, "true")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "some_unique_key")
  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "date")
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, classOf[ComplexKeyGenerator].getName)
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "version")
  .option(HoodieWriteConfig.TABLE_NAME, tableName)
  .mode(SaveMode.Append)
  .save("/tmp/test_hudi_mor")
```

**Expected behavior**
The used disk space should stop growing.
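
One way to check this between runs is to total the bytes under the table path, split by file type, so it is visible whether base files or log files are accumulating. This is a minimal sketch using only `java.nio.file`; `TableSize`, `kind` and `bytesByKind` are hypothetical helper names, and the classification assumes that base files end in `.parquet` and delta log file names contain `.log.`.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object TableSize {
  // Classify a file by name (assumption: base files end in ".parquet",
  // delta log file names contain ".log."; everything else is "other")
  def kind(p: Path): String = {
    val name = p.getFileName.toString
    if (name.endsWith(".parquet")) "parquet"
    else if (name.contains(".log.")) "log"
    else "other"
  }

  // Sum the sizes of all regular files under `root`, grouped by kind
  def bytesByKind(root: Path): Map[String, Long] = {
    val stream = Files.walk(root)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .toVector // materialize before the stream is closed
        .groupBy(kind)
        .map { case (k, paths) => k -> paths.map(p => Files.size(p)).sum }
    } finally stream.close()
  }
}
```

Calling `TableSize.bytesByKind(java.nio.file.Paths.get("/tmp/test_hudi_mor"))` after each write gives one map per run to compare.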

**Environment Description**

* Hudi version : 0.5.2

* Spark version : 2.4.4

* Hive version :

* Hadoop version : 2.7

* Storage (HDFS/S3/GCS..) : local

* Running on Docker? (yes/no) : no




[SUPPORT] MOR upsert table grows in size when ingesting same records #1625
