fix: add metadata_properties to _construct_parameters when update hive table #2013

kadai0308
Contributor

@kadai0308 kadai0308 commented May 18, 2025

Closes: #2010

Rationale for this change

This change adds metadata_properties to the _construct_parameters function to ensure metadata properties are included in the parameters.
I'm not entirely confident about the changes, so please let me know whether my understanding is correct; if so, I'll proceed to add tests.
Thank you!

Are these changes tested?

Not yet.

Are there any user-facing changes?

Not sure.

@kadai0308 kadai0308 force-pushed the fix/hive-client-does-not-update-table-properties branch from fa27421 to df294d1 on May 18, 2025 13:50
Contributor

@Fokko Fokko left a comment

@kadai0308 thanks for working on this. Would it be possible to add a test? We have a Hive container running that we use for tests. This way we don't break it in the future.

Maybe we can add a test somewhere here:

def test_table_properties(catalog: Catalog) -> None:

The session_catalog_hive is a HiveCatalog.

@kadai0308
Contributor Author

sure

@kadai0308 kadai0308 force-pushed the fix/hive-client-does-not-update-table-properties branch from df294d1 to ff149e8 on May 25, 2025 13:17
@kadai0308
Contributor Author

@Fokko Can you help me review the PR? Thank you.

Contributor

@kevinjqliu kevinjqliu left a comment

Generally LGTM! I added a few comments. Thanks for working on this!

properties = {PROP_EXTERNAL: "TRUE", PROP_TABLE_TYPE: "ICEBERG", PROP_METADATA_LOCATION: metadata_location}
if previous_metadata_location:
    properties[PROP_PREVIOUS_METADATA_LOCATION] = previous_metadata_location

if metadata_properties:
    for key, value in metadata_properties.items():
        if key not in properties:
Contributor

👍 This is fine; it avoids overwriting PROP_EXTERNAL, PROP_TABLE_TYPE, PROP_METADATA_LOCATION, and PROP_PREVIOUS_METADATA_LOCATION.
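A minimal sketch of the merge behaviour under review, for readers following the thread. It assumes the four reserved key values shown below (restated for illustration, not imported from pyiceberg) and assumes values are passed through `str()` because HMS table parameters are a string-to-string map. `construct_parameters` is a hypothetical stand-in for `_construct_parameters`, not its actual source.

```python
from typing import Dict, Optional

# Assumed values for the catalog-managed HMS parameter keys; restated for illustration.
PROP_EXTERNAL = "EXTERNAL"
PROP_TABLE_TYPE = "table_type"
PROP_METADATA_LOCATION = "metadata_location"
PROP_PREVIOUS_METADATA_LOCATION = "previous_metadata_location"


def construct_parameters(
    metadata_location: str,
    previous_metadata_location: Optional[str] = None,
    metadata_properties: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in for _construct_parameters: build HMS table parameters."""
    properties = {PROP_EXTERNAL: "TRUE", PROP_TABLE_TYPE: "ICEBERG", PROP_METADATA_LOCATION: metadata_location}
    if previous_metadata_location:
        properties[PROP_PREVIOUS_METADATA_LOCATION] = previous_metadata_location

    if metadata_properties:
        for key, value in metadata_properties.items():
            # Skip keys already set above so catalog-managed parameters are not overwritten.
            if key not in properties:
                properties[key] = str(value)
    return properties


params = construct_parameters(
    "s3://bucket/t/metadata/00001.metadata.json",
    previous_metadata_location="s3://bucket/t/metadata/00000.metadata.json",
    metadata_properties={"abc": "def", "table_type": "HIVE"},
)
```

Because the reserved keys are written first, a user-supplied `table_type` is ignored rather than replacing the catalog's value. Note that in this sketch `previous_metadata_location` is only protected when a previous location is actually set.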

@@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None:
table.transaction().set_properties(property_name=None).commit_transaction()
assert "None type is not a supported value in properties: property_name" in str(exc_info.value)

if isinstance(catalog, HiveCatalog):
Contributor

This is a great test! Could you move this into its own test function?

With just the Hive catalog:

@pytest.mark.integration
@pytest.mark.parametrize("catalog", [pytest.lazy_fixture("session_catalog_hive")])

@@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None:
table.transaction().set_properties(property_name=None).commit_transaction()
Contributor

I was wondering why the rest of these tests pass since we're not setting the properties in the HMS. Turns out the table properties are saved in the table metadata using its properties field.

This is not what the table metadata's properties field should be used for. From the Iceberg table spec:

properties: A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, commit.retry.num-retries is used to control the number of commit retries.

This is a side effect of

@property
def properties(self) -> Dict[str, str]:
    """Properties of the table."""
    return self.metadata.properties

and

return Table(
    identifier=(table.dbName, table.tableName),
    metadata=metadata,
    metadata_location=metadata_location,
    io=self._load_file_io(metadata.properties, metadata_location),
    catalog=self,
)

We should fix this behavior and read/write properties using the HMS's table parameters. We can fix this separately from the current issue.
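One possible shape for that follow-up, sketched under assumptions: a hypothetical helper that reads user-facing properties from the HMS table parameters and filters out the keys the catalog writes itself. The reserved key names are assumed values restated for illustration; a real implementation may need to exclude other keys as well.

```python
from typing import Dict, FrozenSet

# Assumed keys written by the catalog itself; restated here for illustration only.
RESERVED_HMS_KEYS: FrozenSet[str] = frozenset(
    {"EXTERNAL", "table_type", "metadata_location", "previous_metadata_location"}
)


def user_properties_from_hms(parameters: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: keep only user-facing properties from HMS table parameters."""
    return {key: value for key, value in parameters.items() if key not in RESERVED_HMS_KEYS}


hms_parameters = {
    "EXTERNAL": "TRUE",
    "table_type": "ICEBERG",
    "metadata_location": "s3://bucket/t/metadata/00001.metadata.json",
    "abc": "def",
}
props = user_properties_from_hms(hms_parameters)
```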

Contributor

I opened #2064 to track this

Contributor

@kevinjqliu kevinjqliu Jun 3, 2025

This PR will save the properties in both the HMS table parameters and the table metadata's properties field.

Contributor Author

Thanks for your review and suggestion.

Yes, this also confused me while developing this PR. Then I found:
https://github.com/kadai0308/iceberg-python/blob/ff149e8e9d8e0b8dd9e74158b1fb89724833b5b4/pyiceberg/catalog/hive.py#L342-L344

So, to make sure the properties were actually written to HMS, I needed to create a new _HiveClient and read hive_table.parameters:

hive_client: _HiveClient = _HiveClient(catalog.properties["uri"])

with hive_client as open_client:
    hive_table = open_client.get_table(*TABLE_NAME)
    assert hive_table.parameters.get("abc") == "def"
    assert hive_table.parameters.get("p1") == "123"

instead of just testing like this:

table = create_table(catalog)
assert table.properties == dict(p1="123", **DEFAULT_PROPERTIES)

I think I can also help with #2064.
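To illustrate why asserting on `table.properties` alone would not catch the bug, here is a self-contained sketch with hypothetical fake classes (`FakeCatalog` and `FakeHiveTable`, not part of pyiceberg) that model the two stores discussed above: the table metadata's properties and the HMS table parameters. Without the fix, the metadata check still passes while the HMS check fails.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeHiveTable:
    """Hypothetical stand-in for the Thrift table object; only `parameters` is modelled."""

    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class FakeCatalog:
    """Hypothetical catalog with two stores: Iceberg metadata properties and HMS parameters."""

    metadata_properties: Dict[str, str] = field(default_factory=dict)
    hms_table: FakeHiveTable = field(default_factory=FakeHiveTable)
    sync_to_hms: bool = True

    def set_properties(self, **updates: str) -> None:
        # Properties always land in the table metadata...
        self.metadata_properties.update(updates)
        # ...but only reach the HMS parameters if the catalog forwards them (the fix).
        if self.sync_to_hms:
            self.hms_table.parameters.update(updates)


def check(catalog: FakeCatalog) -> Dict[str, bool]:
    catalog.set_properties(abc="def", p1="123")
    return {
        # Mirrors `assert table.properties == ...`: reads the metadata store only.
        "metadata_check": catalog.metadata_properties.get("p1") == "123",
        # Mirrors reading hive_table.parameters through a separate _HiveClient.
        "hms_check": catalog.hms_table.parameters.get("p1") == "123",
    }


before_fix = check(FakeCatalog(sync_to_hms=False))
after_fix = check(FakeCatalog(sync_to_hms=True))
```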

Contributor

@Fokko Fokko left a comment

This looks good to me @kadai0308, thanks for adding the test 👍

@@ -112,6 +113,27 @@ def test_table_properties(catalog: Catalog) -> None:
assert "None type is not a supported value in properties: property_name" in str(exc_info.value)


@pytest.mark.integration
@pytest.mark.parametrize("catalog", [pytest.lazy_fixture("session_catalog_hive")])
Contributor

Nit: we don't need to use parametrize with a single argument.

@Fokko
Contributor

Fokko commented Jun 8, 2025

@kadai0308, there is an issue with the code formatting, can you run make lint? Thanks!

@Fokko
Contributor

Fokko commented Jun 13, 2025

Gentle ping @kadai0308

@Fokko
Contributor

Fokko commented Jun 13, 2025

@kadai0308 It looks like some relevant tests are failing:

tests/catalog/test_hive.py::test_create_table[True] FAILED               [ 46%]
tests/catalog/test_hive.py::test_create_table[False] FAILED              [ 46%]
tests/catalog/test_hive.py::test_create_table_with_given_location_removes_trailing_slash[True] FAILED [ 46%]
tests/catalog/test_hive.py::test_create_table_with_given_location_removes_trailing_slash[False] FAILED [ 46%]

@kadai0308
Contributor Author

Yeah, I'm looking into it.

@Fokko
Contributor

Fokko commented Jun 13, 2025

@kadai0308 Thanks, appreciate it. I think apart from that this PR looks good to go in 👍

@kadai0308 kadai0308 force-pushed the fix/hive-client-does-not-update-table-properties branch from 5273b95 to 985d77d on June 13, 2025 14:54
Contributor

@kevinjqliu kevinjqliu left a comment

LGTM

@kevinjqliu
Contributor

@kadai0308 looks like linter failed, can you run make lint locally?

ruff-format..............................................................Failed
- hook id: ruff-format
- files were modified by this hook

1 file reformatted, 153 files left unchanged

@kadai0308
Contributor Author

Sorry for forgetting to run lint before pushing the commit.

@kevinjqliu kevinjqliu merged commit 4f0d7ef into apache:main Jun 14, 2025
10 checks passed
@kevinjqliu
Contributor

kevinjqliu commented Jun 14, 2025

Thanks for the fix @kadai0308 and thanks @Fokko @geruh for the review :)

@kadai0308 kadai0308 deleted the fix/hive-client-does-not-update-table-properties branch June 15, 2025 04:50
Fokko pushed a commit that referenced this pull request Jul 3, 2025
# Rationale for this change

See #2013
Closes #2064

Continuing the trend, but with Glue.

# Are these changes tested?

See test below

# Are there any user-facing changes?

When a user specifies property updates when committing a table, those parameters
will be passed to the Glue client.
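A minimal sketch of what forwarding properties to Glue could look like, assuming a Glue `TableInput` dictionary shaped like boto3's (`Name`, `TableType`, `Parameters`) and assuming the `table_type` and `metadata_location` parameter names. `build_glue_table_input` is hypothetical and not the code from this commit; it mirrors the Hive fix by letting catalog-managed keys win over user properties.

```python
from typing import Any, Dict, Optional


def build_glue_table_input(
    table_name: str,
    metadata_location: str,
    properties: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: shape a Glue TableInput that carries Iceberg table properties."""
    # Assumed catalog-managed parameter names; restated for illustration.
    parameters: Dict[str, str] = {"table_type": "ICEBERG", "metadata_location": metadata_location}
    for key, value in (properties or {}).items():
        # As in the Hive fix, keep catalog-managed keys authoritative.
        parameters.setdefault(key, value)
    return {"Name": table_name, "TableType": "EXTERNAL_TABLE", "Parameters": parameters}


table_input = build_glue_table_input(
    "events", "s3://bucket/events/metadata/00001.metadata.json", {"owner": "data-team"}
)
# A real commit would hand this to the Glue client (e.g. boto3's update_table with
# DatabaseName=... and TableInput=table_input); that call is not made here.
```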

amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
amitgilad3 pushed a commit to amitgilad3/iceberg-python that referenced this pull request Jul 7, 2025
Successfully merging this pull request may close these issues.

[bug] hive client does not update table properties
4 participants