Inconsistency in catalog.list_tables Behavior Across Python and Java: Returns Non-Iceberg Tables in Python Only #314

@HonahX

Feature Request / Improvement

I noticed that in Python, the Hive, Glue, and DynamoDB catalogs list all tables in the namespace, including non-Iceberg ones:

def list_tables(self, namespace: Union[str, Identifier]) -> List[Identifier]:
    """List tables under the given namespace in the catalog (including non-Iceberg tables).
    When the database doesn't exist, it will just return an empty list.
    Args:
        namespace: Database to list.
    Returns:
        List[Identifier]: list of table identifiers.
    Raises:
        NoSuchNamespaceError: If a namespace with the given name does not exist, or the identifier is invalid.
    """
    database_name = self.identifier_to_database(namespace, NoSuchNamespaceError)
    with self._client as open_client:
        return [(database_name, table_name) for table_name in open_client.get_all_tables(db_name=database_name)]

def list_tables(self, namespace: Union[str, Identifier]) -> List[Identifier]:
    """List tables under the given namespace in the catalog (including non-Iceberg tables).
    Args:
        namespace (str | Identifier): Namespace identifier to search.
    Returns:
        List[Identifier]: list of table identifiers.
    Raises:
        NoSuchNamespaceError: If a namespace with the given name does not exist, or the identifier is invalid.
    """
    database_name = self.identifier_to_database(namespace, NoSuchNamespaceError)
    table_list: List[TableTypeDef] = []
    next_token: Optional[str] = None
    try:
        while True:
            table_list_response = (
                self.glue.get_tables(DatabaseName=database_name)
                if not next_token
                else self.glue.get_tables(DatabaseName=database_name, NextToken=next_token)
            )
            table_list.extend(table_list_response["TableList"])
            next_token = table_list_response.get("NextToken")
            if not next_token:
                break
    except self.glue.exceptions.EntityNotFoundException as e:
        raise NoSuchNamespaceError(f"Database does not exist: {database_name}") from e
    return [(database_name, table["Name"]) for table in table_list]

However, in Java, we apply a filter so that only Iceberg tables in the given namespace are returned:
GlueCatalog.listTables
HiveCatalog.listTables
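
For illustration, a rough sketch of what an equivalent filter could look like on the Python side for Glue. It assumes that Iceberg tables are marked by a `table_type` entry in the Glue table's `Parameters` map with the value `ICEBERG` (compared case-insensitively), which is my understanding of the Java check. The helper names `is_iceberg_table` and `filter_iceberg_tables` are hypothetical and not part of the current code; the dictionaries only mimic the shape of the `TableList` entries returned by `get_tables`.

```python
from typing import Any, Dict, List, Tuple

Identifier = Tuple[str, ...]

# Assumption: Iceberg tables carry Parameters["table_type"] == "ICEBERG".
TABLE_TYPE_KEY = "table_type"
ICEBERG = "iceberg"


def is_iceberg_table(table: Dict[str, Any]) -> bool:
    """Return True if a Glue-style table dict looks like an Iceberg table (hypothetical helper)."""
    parameters = table.get("Parameters") or {}
    return str(parameters.get(TABLE_TYPE_KEY, "")).lower() == ICEBERG


def filter_iceberg_tables(database_name: str, table_list: List[Dict[str, Any]]) -> List[Identifier]:
    """Keep only the identifiers of tables that pass is_iceberg_table (hypothetical helper)."""
    return [(database_name, table["Name"]) for table in table_list if is_iceberg_table(table)]


# Hand-written sample shaped like entries of a Glue "TableList" response.
sample_tables = [
    {"Name": "events", "Parameters": {"table_type": "ICEBERG"}},
    {"Name": "legacy_hive", "Parameters": {"EXTERNAL": "TRUE"}},
    {"Name": "no_params"},
    {"Name": "lowercase", "Parameters": {"table_type": "iceberg"}},
]

iceberg_only = filter_iceberg_tables("db", sample_tables)
print(iceberg_only)
```

With a filter like this, the last line of the Glue `list_tables` above would return only the entries that pass the check instead of every table in `table_list`. The Hive client call shown above returns only table names, so a comparable filter there would presumably need a way to read each table's parameters as well.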

I don't remember whether we discussed this before: why did we choose to include non-Iceberg tables in the result in Python?

cc @Fokko
