My code:
```python
import pandas as pd
import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="my_catalog_name",
    uri="http://localhost:8080/catalog/",
    warehouse="test",
    # For now we need a dummy token even for unauthenticated catalogs.
    # This is a requirement of pyiceberg.
    token="dummy",
)

# Let's work with the catalog
namespace = ("my_namespace",)
if namespace not in catalog.list_namespaces():
    catalog.create_namespace(namespace)

df = pd.DataFrame(
    [[1, 1.2, "foo"], [2, 2.2, "bar"]], columns=["my_ints", "my_floats", "strings"]
)
tbl = pa.Table.from_pandas(df)
table = catalog.create_table((*namespace, "my_table"), schema=tbl.schema)
table.overwrite(tbl)  # <-- This is the important part
```
Apache Iceberg version
main (development)
Please describe the bug 🐞
This is a harder one:
I am currently unhappy with the way pyiceberg handles the RestCatalog's `name` property.
There is one probably undisputed bug here:
iceberg-python/pyiceberg/catalog/rest.py
Line 356 in 20b7b53
In the previous line we cut the first segment of the namespace (presumably because this is the `name`), even if the catalog name is `None`. If the catalog name is `None`, it isn't even added to the identifier namespace:
iceberg-python/pyiceberg/catalog/rest.py
Line 464 in 20b7b53
(notice the `if self.name`)
Thus, when `name` is `None` or `""`, the first segment of the namespace identifier is wrongly removed in line 356 (first snippet).
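To make the asymmetry concrete, here is a minimal sketch. The two helper functions are hypothetical stand-ins for the conditional prepend and the unconditional slice described above; they are not pyiceberg's actual code.

```python
from typing import Optional, Tuple

Identifier = Tuple[str, ...]


def add_catalog_name(name: Optional[str], identifier: Identifier) -> Identifier:
    # Hypothetical stand-in for the prepend step: the name is only added when truthy.
    return (name, *identifier) if name else identifier


def drop_first_segment(identifier: Identifier) -> Identifier:
    # Hypothetical stand-in for the slice: always removes the first segment.
    return identifier[1:]


ident = ("my_namespace", "my_table")

named = drop_first_segment(add_catalog_name("my_catalog_name", ident))
unnamed = drop_first_segment(add_catalog_name(None, ident))

print(named)    # ('my_namespace', 'my_table')
print(unnamed)  # ('my_table',)
```

With a name set, the round trip restores the original identifier. Without a name, nothing was prepended, so the slice removes the real namespace segment instead.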
If the catalog has a name, we could argue that removing it is correct.
However, there is a second problem:
I would argue that the RestCatalog's name should NEVER be prepended to a table's namespace identifier in the first place.
And pyiceberg agrees with this about half of the time ;) Unfortunately, the namespace provided in the URL and the namespace provided in some request bodies diverge.
With the script included in this issue I receive a request to the endpoint POST
http://localhost:8080/catalog/v1/2cb6e8f0-0b7e-11ef-a4fb-c3ca318d8520/namespaces/my_namespace/tables/my_table
(so the namespace in the URL is "my_namespace").
Notice how the namespace provided in the body is different from the namespace in the URL. The URL is not prefixed (due to the `[1:]` that the path magic function does, as seen above), while the body contains a prefixed namespace.
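The divergence can be illustrated with a small sketch. Both functions are hypothetical, the `.` separator is an assumption, and the dictionary layout of the body is a simplified assumption rather than the exact payload pyiceberg sends.

```python
from typing import Dict, List, Tuple, Union

Identifier = Tuple[str, ...]


def namespace_for_url(identifier: Identifier) -> str:
    # Assumed behaviour: drop the leading catalog name and the trailing table name.
    return ".".join(identifier[1:-1])


def identifier_for_body(identifier: Identifier) -> Dict[str, Union[List[str], str]]:
    # Assumed behaviour: keep every segment before the table name, prefix included.
    return {"namespace": list(identifier[:-1]), "name": identifier[-1]}


prefixed = ("my_catalog_name", "my_namespace", "my_table")

url_ns = namespace_for_url(prefixed)
body = identifier_for_body(prefixed)

print(url_ns)             # my_namespace
print(body["namespace"])  # ['my_catalog_name', 'my_namespace']
```

The same identifier yields two different namespaces, which is the mismatch a server would see between the path and the payload.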
My solution would be to get rid of the name as part of the namespace altogether, thus removing the name prefix here:
iceberg-python/pyiceberg/catalog/rest.py
Line 464 in 20b7b53
And consequently also remove the `[1:]` here:
iceberg-python/pyiceberg/catalog/rest.py
Line 356 in 20b7b53
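A rough sketch of what this proposal amounts to, assuming identifiers are never prefixed with the catalog name. All function names and the `.` separator are hypothetical and only illustrate the idea; this is not a patch against rest.py.

```python
from typing import Dict, List, Tuple, Union

Identifier = Tuple[str, ...]


def split_identifier(identifier: Identifier) -> Tuple[Identifier, str]:
    """Split an unprefixed identifier into (namespace, table name)."""
    if len(identifier) < 2:
        raise ValueError(f"Expected a namespace and a table name, got: {identifier}")
    return identifier[:-1], identifier[-1]


def namespace_for_url(identifier: Identifier, separator: str = ".") -> str:
    # No slicing of a catalog name: the namespace is used as given.
    namespace, _ = split_identifier(identifier)
    return separator.join(namespace)


def identifier_for_body(identifier: Identifier) -> Dict[str, Union[List[str], str]]:
    # Derived from the same split, so URL and body cannot diverge.
    namespace, name = split_identifier(identifier)
    return {"namespace": list(namespace), "name": name}


ident = ("my_namespace", "my_table")
url_ns = namespace_for_url(ident)
body = identifier_for_body(ident)

# The baseline requirement: URL and body describe the same namespace.
assert url_ns == ".".join(body["namespace"])
```

Because both values come from one split of the same tuple, the result no longer depends on whether the catalog has a name.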
Any thoughts welcome!
My baseline is that at least the URL and the body should match.