Support merge manifests on writes (MergeAppend) #363

Open · wants to merge 15 commits into main

Conversation

@HonahX (Contributor) commented Feb 4, 2024:

Add MergeAppendFiles. This PR will enable the following configurations:

  • commit.manifest-merge.enabled: Controls whether to automatically merge manifests on writes.
  • commit.manifest.min-count-to-merge: Minimum number of manifests to accumulate before merging.
  • commit.manifest.target-size-bytes: Target size when merging manifest files.

Since commit.manifest-merge.enabled defaults to True, we need to make MergeAppend the default way to append data, to align with the property definition and the Java implementation.
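
For context, a minimal sketch of setting these properties on a table (the catalog and table names are illustrative, and the transaction-based set_properties call is an assumption based on pyiceberg's API):

from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")  # illustrative catalog name
table = catalog.load_table("db.events")  # illustrative table name

# Tune when and how manifests are merged on write.
with table.transaction() as txn:
    txn.set_properties(
        **{
            "commit.manifest-merge.enabled": "true",
            "commit.manifest.min-count-to-merge": "100",
            "commit.manifest.target-size-bytes": str(8 * 1024 * 1024),
        }
    )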

@Fokko (Contributor) left a comment:

Great start @HonahX! Maybe we want to see if there is anything we can split out, such as the rolling manifest writer.

pyiceberg/table/__init__.py
# TODO: need to re-consider the name here: manifest containing positional deletes and manifest containing deleted entries
unmerged_deletes_manifests = [manifest for manifest in existing_manifests if manifest.content == ManifestContent.DELETES]

data_manifest_merge_manager = ManifestMergeManager(
@Fokko (Contributor):

We're changing the append operation from a fast-append to a regular append once it hits a threshold. I would be more comfortable with keeping the compaction separate. This way we know that an append/overwrite is always fast and runs in predictable time. For example, if you have a process that appends data, you know how fast it will run (in practice it is a function of the number of manifests).
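
To make the suggested split concrete, a hypothetical sketch (fast_append and append_data_file are assumed from this PR's API; the rewrite_manifests call is invented to stand in for a separate compaction step, mirroring Java's rewriteManifests):

# Appends only ever write new manifests, so commit time stays predictable.
with table.transaction() as txn:
    with txn.update_snapshot().fast_append() as update_snapshot:
        for data_file in data_files:  # data_files: assumed list of DataFile
            update_snapshot.append_data_file(data_file)

# Compaction runs separately, on its own schedule (hypothetical API):
table.rewrite_manifests()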

@HonahX (Contributor, author) replied:

Thanks for the explanation! Totally agree! I was thinking it might be a good time to bring FastAppend and MergeAppend to pyiceberg, making them inherit from a shared _SnapshotProducer.
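
A rough sketch of the hierarchy being proposed here (names follow this comment; the bodies are placeholders, not the eventual implementation):

class _SnapshotProducer:
    # Shared commit machinery: collect data files, write manifests,
    # produce and commit the new snapshot.
    def commit(self) -> None: ...

class _FastAppend(_SnapshotProducer):
    # Only writes new manifests; never rewrites existing ones.
    ...

class _MergeAppend(_SnapshotProducer):
    # May merge existing manifests once the min-count/target-size
    # thresholds are hit.
    ...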

@Fokko added this to the PyIceberg 0.7.0 release milestone Feb 7, 2024
@@ -944,7 +949,8 @@ def append(self, df: pa.Table) -> None:
    if len(self.spec().fields) > 0:
        raise ValueError("Cannot write to partitioned tables")

    merge = _MergingSnapshotProducer(operation=Operation.APPEND, table=self)
    # TODO: need to consider how to support both _MergeAppend and _FastAppend
@Fokko (Contributor):

Do we really want to support both? This part of the Java code has been a major source of (hard to debug) problems. Splitting out the commit and compaction path completely would simplify that quite a bit.

@HonahX (Contributor, author) replied:

I think it is a good idea to have a separate API in UpdateSnapshot in #446 to compact manifests only. However, I believe retaining MergeAppend is also necessary because of the commit.manifest-merge.enabled setting. When enabled (which is the default), users expect manifests to be merged automatically when they append/overwrite data, rather than having to compact manifests through a separate API. What do you think?
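
For illustration, the property-driven behavior described above might look roughly like this (a sketch only; the helpers are hypothetical, while merge_append/fast_append come from this PR):

def _manifest_merge_enabled(properties: dict) -> bool:
    # "commit.manifest-merge.enabled" defaults to True, matching the
    # constants added in this PR.
    return str(properties.get("commit.manifest-merge.enabled", "true")).lower() == "true"

def _new_append(txn, properties: dict):
    # With the default property value, users get automatic manifest merging
    # on append/overwrite; otherwise they get the constant-time fast path.
    if _manifest_merge_enabled(properties):
        return txn.update_snapshot().merge_append()
    return txn.update_snapshot().fast_append()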

@HonahX changed the title from Support merge manifests on writes to Support merge manifests on writes (MergeAppend) Feb 23, 2024
@HonahX marked this pull request as ready for review February 26, 2024 10:51
tests/conftest.py
@Fokko (Contributor) left a comment:

Hey @HonahX, thanks for working on this and sorry for the late reply. I wanted to take the time to test this properly.

It looks like either the snapshot inheritance is not working properly, or something is off with the writer. I converted the Avro manifest files to JSON using avro-tools, and noticed the following:

{
    "status": 1,
    "snapshot_id": {
        "long": 6972473597951752000
    },
    "data_sequence_number": {
        "long": -1
    },
    "file_sequence_number": {
        "long": -1
    },
...
}
{
    "status": 0,
    "snapshot_id": {
        "long": 3438738529910612500
    },
    "data_sequence_number": {
        "long": -1
    },
    "file_sequence_number": {
        "long": -1
    },
...
}
{
    "status": 0,
    "snapshot_id": {
        "long": 1638533332780464400
    },
    "data_sequence_number": {
        "long": 1
    },
    "file_sequence_number": {
        "long": 1
    },
...
}

So it looks like the snapshot inheritance is not working properly when rewriting the manifests: the first two entries above still carry the -1 sentinel in data_sequence_number and file_sequence_number instead of their assigned sequence numbers.
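
For reference, the Avro-to-JSON conversion above can be done with avro-tools' tojson subcommand (the jar version and file path here are illustrative):

java -jar avro-tools-1.11.3.jar tojson /path/to/manifest-file.avro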

@@ -355,6 +355,44 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w
    assert [row.deleted_data_files_count for row in rows] == [0, 0, 1, 0, 0]


@pytest.mark.integration
@Fokko (Contributor):

Can you parameterize the test for both V1 and V2 tables?
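
For instance, along these lines (a sketch; _create_table is an assumed helper, and session_catalog and arrow_table_with_null are fixtures from the surrounding test module):

import pytest
import pyarrow as pa

from pyiceberg.catalog import Catalog

@pytest.mark.integration
@pytest.mark.parametrize("format_version", [1, 2])
def test_merge_manifests(session_catalog: Catalog, arrow_table_with_null: pa.Table, format_version: int) -> None:
    # One table per format version, so both the V1 and V2 paths are exercised.
    tbl = _create_table(
        session_catalog,
        f"default.test_merge_manifests_v{format_version}",
        {"format-version": str(format_version)},
    )
    tbl.append(arrow_table_with_null)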

@Fokko (Contributor):

We want to assert the manifest entries as well (only for the merge-appended one).
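
For example, something like the following (a sketch; it uses pyiceberg's manifest-reading API as I understand it, tbl is the test table, and the expected statuses are illustrative):

from pyiceberg.manifest import ManifestEntryStatus

snapshot = tbl.current_snapshot()
assert snapshot is not None

# After a merge-append, rewritten manifests should contain EXISTING entries
# alongside the newly ADDED ones.
entries = [
    entry
    for manifest in snapshot.manifests(tbl.io)
    for entry in manifest.fetch_manifest_entry(tbl.io, discard_deleted=False)
]
assert {entry.status for entry in entries} <= {ManifestEntryStatus.ADDED, ManifestEntryStatus.EXISTING}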

MANIFEST_MIN_MERGE_COUNT_DEFAULT = 100

MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled"
MANIFEST_MERGE_ENABLED_DEFAULT = True
@Fokko (Contributor):

Can you add these to the docs as well? :)

@syun64 (Collaborator) left a comment:

Thank you very much for adding this @HonahX. Just one small nit, and otherwise it looks good to me!

@@ -1091,7 +1111,7 @@ def append(self, df: pa.Table) -> None:
     _check_schema(self.schema(), other_schema=df.schema)

     with self.transaction() as txn:
-        with txn.update_snapshot().fast_append() as update_snapshot:
+        with txn.update_snapshot().merge_append() as update_snapshot:
@syun64 (Collaborator):

Could we update the new add_files method to also use merge_append?

That seems to be the default choice of snapshot producer in Java.

@Fokko (Contributor) replied:

@syun64 Could you elaborate on the motivation to pick merge-append over fast-append? For Java, it is for historical reasons, since the fast-append was added later. The fast-append creates more metadata, but it also has advantages:

  • It takes less time to commit, since it doesn't rewrite any existing manifests. This reduces the chance of a conflict.
  • The time it takes to commit is more predictable, scaling with the number of data files being written.
  • When you static-overwrite partitions, as in typical ETL, deletes are faster, since a whole manifest produced by a previous fast-append can simply be dropped.

The main downside is that full-table scans have to evaluate more metadata.

@syun64 (Collaborator) replied:

That's a good argument @Fokko. Especially in a world where we are potentially moving the work of table scans into the REST catalog, compacting manifests on write isn't important for a function that already prioritizes commit speed over everything else.

I think it makes sense to leave the function using fast_append and let users rely on other means of optimizing their table scans.
