Skip to content

Feature: Write to branches #941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Jul 2, 2025
Merged

Conversation

vinjai
Copy link
Contributor

@vinjai vinjai commented Jul 18, 2024

Fixes: #306

kevinjqliu and others added 4 commits July 15, 2024 03:36
@vinjai vinjai changed the title Feature/write to branch Feature: Write to branches Jul 18, 2024
@vinjai vinjai marked this pull request as ready for review October 16, 2024 08:15
@vinjai
Copy link
Contributor Author

vinjai commented Oct 16, 2024

@sungwy @kevinjqliu
This PR is ready for review

@vinjai
Copy link
Contributor Author

vinjai commented Oct 16, 2024

Fixed another bug.
Writes with same name but different ref types were being successful. This is a bug in the current release version too

Please review whenever you get some time.

@kevinjqliu
Copy link
Contributor

Thanks for the contribution! I'll take a look.
I remember adding support for branch is complicated since we need to consider different edge cases.

@vinjai
Copy link
Contributor Author

vinjai commented Oct 19, 2024

I have mostly tried to cover all edge cases.
The idea is that the branch is just another iceberg table where the snapshots append independently of the main branch.

I also agree with your concern.
If it helps, we can add more test cases

@kevinjqliu kevinjqliu added this to the PyIceberg 0.9.0 release milestone Oct 30, 2024
@vinjai
Copy link
Contributor Author

vinjai commented Nov 7, 2024

Hey @kevinjqliu
Did you get a chance to look at this?

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! And sorry for the delay, I was running the 0.8.0 release.
Generally LGTM, I left a comment about add more tests integrating with Spark

@vinjai
Copy link
Contributor Author

vinjai commented Nov 14, 2024

Thank you for the review @kevinjqliu

@vinjai
Copy link
Contributor Author

vinjai commented Nov 20, 2024

@kevinjqliu What are the next steps to get this merged?

@vinjai vinjai requested a review from kevinjqliu November 20, 2024 10:00
@Fokko
Copy link
Contributor

Fokko commented Apr 24, 2025

Thanks @dbuades for pinging me, and sorry for letting this one linger for so long. Let me review this tomorrow morning.

@vinjai Do you have time to resolve the conflicts by any chance?

@vinjai
Copy link
Contributor Author

vinjai commented Apr 28, 2025

Hey @Fokko
Will try to resolve the conflicts over the weekend.

@SebastienN15
Copy link

Hey @vinjai, did you get a chance to fix those conflicts? 🙂

@SebastienN15
Copy link

@vinjai I fixed the conflicts on this branch, feel free to cherry pick the last commit #2009
Otherwise I can open a PR on my side and take it from there 🙂

@vinjai
Copy link
Contributor Author

vinjai commented May 23, 2025

Thanks @SebastienN15 — I’ll review your commit and take it forward from there.
I’ll share the working PR tomorrow.

@vinjai
Copy link
Contributor Author

vinjai commented May 26, 2025

Identified and fixed a bug related to empty tables.
Planning to add test cases to cover this scenario.

@vinjai
Copy link
Contributor Author

vinjai commented May 27, 2025

Hey @Fokko
This PR is ready for review again

@vinjai
Copy link
Contributor Author

vinjai commented Jun 1, 2025

Hey @Fokko
I noticed that the integration tests for Python 3.9 are failing.
Possibly, a runtime issue:

ServerError: NoSuchBucketException: The specified bucket does not exist (Service: S3, Status Code: 404, Request ID: 18436A88DE8BCC82, Extended Request ID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8) (SDK Attempt Count: 1)

I ran the same tests locally with Python 3.9.6, and they passed without issues.
Is there any change required in the PR to resolve the above failure?

@Fokko
Copy link
Contributor

Fokko commented Jun 1, 2025

@vinjai Thanks for pinging me. The issue has been fixed in #2049. Locally, you're still using an older version of minio, and therefore the tests pass. Could you pull in the latest main branch to fix the CI? Thanks!

@vinjai
Copy link
Contributor Author

vinjai commented Jun 3, 2025

Hey @Fokko
I've pulled the latest changes from the main branch. Please re-trigger the workflow.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vinjai for working on this, and sorry for the long wait. I think this looks great, and let's move this forward 👍

Comment on lines 662 to 665
if branch is None:
files = self._scan(row_filter=delete_filter, case_sensitive=case_sensitive).plan_files()
else:
files = self._scan(row_filter=delete_filter, case_sensitive=case_sensitive).use_ref(branch).plan_files()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in a subsequent PR we can pass in the ref to the constructor 👍

@Fokko
Copy link
Contributor

Fokko commented Jun 18, 2025

@vinjai I'm sorry, can you resolve the conflicts once more? I'll merge right after

@SebastienN15
Copy link

Hey @vinjai, would you be able to resolve the conflicts? I would love to see this PR merged 🙏

@vinjai
Copy link
Contributor Author

vinjai commented Jun 24, 2025

Hey @Fokko
I have resolved all conflicts and merged the recent changes.
While running integration tests, I am facing this error for a few tests:

java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (360448)
Allocator(toArrowBatchIterator) 0/360448/360448/9223372036854775807 (res/actual/peak/limit)

The same error is coming on the main branch too.

Any idea on how to resolve this?

@vinjai vinjai requested a review from Fokko June 28, 2025 06:29
@vinjai
Copy link
Contributor Author

vinjai commented Jun 28, 2025

Hey @Fokko,

I tried the tests on a different machine. The tests are running fine.
Request you to run the workflow and merge this PR ahead.

Thanks

@Fokko
Copy link
Contributor

Fokko commented Jul 2, 2025

@vinjai Thanks, let's move this forward. Thanks @SebastienN15 for pinging me on this matter :)

@Fokko Fokko merged commit 33c8931 into apache:main Jul 2, 2025
10 checks passed
amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
Fixes: apache#306

---------

Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support writing to a branch
5 participants