Skip to content

Fix TypeError in create_match_filter for Composite Keys with Single Unique Condition #1693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Feb 21, 2025

Conversation

omkenge
Copy link
Contributor

@omkenge omkenge commented Feb 20, 2025

Old Code Behavior:

Even if there's only one such condition, the code wraps it in an Or() operator.
But Or() is meant to combine two or more conditions (like “condition A OR condition B”).
If you give it only one condition, it complains because it expects a second condition.

New Code Behavior:

The new change checks how many conditions you have.
If there's only one condition, it simply returns that condition.
If there are more than one, it uses Or() to combine them.

@omkenge omkenge mentioned this pull request Feb 20, 2025
@omkenge omkenge changed the title Upsert Issue Fix TypeError in create_match_filter for Composite Keys with Single Unique Condition Feb 20, 2025
Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding this! can you also add a test case too?

@omkenge
Copy link
Contributor Author

omkenge commented Feb 20, 2025

@kevinjqliu
The existing test already covers the composite key scenario. With the fix, it exercises the code path that previously caused the error, so no new test case is needed.

@omkenge
Copy link
Contributor Author

omkenge commented Feb 20, 2025

Old Senario
Problem: The Or() function expects two or more conditions (one on the left, one on the right). But here it is given only one condition.
Result: This produces a TypeError because Or() is missing its "right" argument.

@corleyma
Copy link

@omkenge I think the ask is to add a test covering the case where there is a single condition. This test would have failed prior to your change, and should pass with your change.

@omkenge
Copy link
Contributor Author

omkenge commented Feb 20, 2025

@corleyma @kevinjqliu
I tried , But not sure I need help from you

@omkenge
Copy link
Contributor Author

omkenge commented Feb 21, 2025

Issue:
When no rows are updated (i.e., none of the source composite keys match the target), the code calls pa.Table.from_arrays([], names=...). This passes 0 arrays for a schema that expects 6 arrays(origianl schema of table), causing the error.
ERROR

    rows_to_update = upsert_util.get_rows_to_update(df, matched_iceberg_table, join_cols)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/pyiceberg/pyiceberg/table/upsert_util.py", line 89, in get_rows_to_update
    rows_to_update_table = pa.Table.from_arrays([], names=source_table.column_names)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4851, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 1593, in pyarrow.lib._sanitize_arrays
  File "pyarrow/table.pxi", line 1557, in pyarrow.lib._schema_from_arrays
ValueError: Length of names (6) does not match length of arrays (0)

Solution:
Create an empty array for each column in the source schema:

empty_arrays = [pa.array([], type=field.type) for field in source_table.schema]
rows_to_update_table = pa.Table.from_arrays(empty_arrays, schema=source_table.schema)

@Fokko Fokko added this to the PyIceberg 0.9.0 milestone Feb 21, 2025
@Fokko
Copy link
Contributor

Fokko commented Feb 21, 2025

@omkenge I ran into the same issue, and saw that you already fixed this 🚀 Thanks for jumping on this right away! I left some suggestions, but it looks like it is heading into the right direction 👍

omkenge and others added 2 commits February 21, 2025 14:43
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@omkenge
Copy link
Contributor Author

omkenge commented Feb 21, 2025

@Fokko What is your opinion on Test Cases? my point of view is we do not need test cases for this change

omkenge and others added 2 commits February 21, 2025 15:15
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Fokko
Copy link
Contributor

Fokko commented Feb 21, 2025

@omkenge can fix the import?

from pyiceberg.expressions import (
    And,
    BooleanExpression,
    EqualTo,
    In,
    Or,
)

Should be:

from pyiceberg.expressions import (
    AlwaysFalse,
    And,
    BooleanExpression,
    EqualTo,
    In,
    Or,
)

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Fokko Fokko merged commit 4e9c66d into apache:main Feb 21, 2025
7 checks passed
@Fokko
Copy link
Contributor

Fokko commented Feb 21, 2025

Thanks @omkenge for fixing this! Thanks @corleyma and @kevinjqliu for the review, I'll be merging this now since I'm also blocked by it :)

@omkenge omkenge deleted the upsert_issue branch February 22, 2025 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants