Skip to content

Fix Table.scan to enable case sensitive argument #1423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 16, 2024

Conversation

jiakai-li
Copy link
Contributor

This pull request fixes below issue:

The change modified the DataScan._build_partition_projection method to pass case_sensitive argument when calling inclusive_projection. So that Table.scan method respects this configuration. Also added related tests.

@jiakai-li jiakai-li changed the title Fix Table.scan to enable case sensitivity argument Fix Table.scan to enable case sensitive argument Dec 11, 2024
@sungwy
Copy link
Collaborator

sungwy commented Dec 11, 2024

Hi @jiakai-li - thank you for reporting this issue and putting up this PR!

The behavior looks right to me, but I think it would be worth thinking through this behavior change holistically. In _DeleteFiles we also build partition projection using inclusive_projection where we only support case_sensitive=True matching.

def _build_partition_projection(self, spec_id: int) -> BooleanExpression:
schema = self._transaction.table_metadata.schema()
spec = self._transaction.table_metadata.specs()[spec_id]
project = inclusive_projection(schema, spec)
return project(self._predicate)

If we introduce case_sensitive=False as a behavior that can be controlled by the user on DataScan for inclusive_projection, I think its also worth thinking if we want to introduce a discrepancy in our behaviors based on API, or if we should be keeping a consistent behavior (one way or another).

Tagging @Fokko for review

@sungwy sungwy requested a review from Fokko December 11, 2024 14:18
@Fokko
Copy link
Contributor

Fokko commented Dec 11, 2024

First of all, thanks @jiakai-li for raising this PR, and thanks @sungwy for raising that issue. And I agree, I think we should also be able to control case-sensitivity when doing deletes (defaulting to true as this is the current behavior).

@jiakai-li
Copy link
Contributor Author

Thanks very much for the guidance guys @sungwy and @Fokko . Is it ok for me to pick up the delete part as well? I'll update this PR to include both operations if that's ok. Thanks!

@sungwy
Copy link
Collaborator

sungwy commented Dec 11, 2024

Yes, I think updating this PR to include the changes for both makes sense @jiakai-li 👍

Thank you again for tackling this issue!

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left a comment on making the tests more readable

- Add more readable integration test for case-sensitive and case-insensitive `Table.scan`
- Remove less readable test
- Enable `case_sensitive` delete and overwrite
@jiakai-li
Copy link
Contributor Author

Hey guys, thanks a lot for your kind guidance and great suggestions. I've updated the PR to:

  • Enable Table.delete and Table.overwrite operations to control case-sensitivity
  • Updated test cases to make it more readable

Please let me know if there is other changes you would like me to make, thanks.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few nit comments, generally LGTM! Thanks!

@jiakai-li
Copy link
Contributor Author

jiakai-li commented Dec 14, 2024

added a few nit comments, generally LGTM! Thanks!

Thank you @kevinjqliu ! I've updated the test cases and resolved the comments. Please let me know if you have further comments.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jiakai-li this looks very good. I've left some small comments, but in general this looks good 👍

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @jiakai-li for working on this, and thanks @sungwy for the sharp eye 👍

@Fokko Fokko merged commit b34d8dd into apache:main Dec 16, 2024
@jiakai-li jiakai-li deleted the fix-table-scan-case-sensitive branch December 18, 2024 02:43
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 24, 2024
* fix-table-scan-enable-case-sensitivity

* Updates included:
- Add more readable integration test for case-sensitive and case-insensitive `Table.scan`
- Remove less readable test
- Enable `case_sensitive` delete and overwrite

* Remove less readable test

* Add integration test `Table.delete` and `Table.overwrite`

* Fix typo

* Add test cases for default `Table.delete` case-sensitivity

* Update `case_sensitive` argument position
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants