Skip to content

[python] Implement tag CRUD on FileSystemCatalog#7751

Merged
JingsongLi merged 2 commits intoapache:masterfrom
TheR1sing3un:py-filesystem-catalog-tag-crud
May 1, 2026
Merged

[python] Implement tag CRUD on FileSystemCatalog#7751
JingsongLi merged 2 commits intoapache:masterfrom
TheR1sing3un:py-filesystem-catalog-tag-crud

Conversation

@TheR1sing3un
Copy link
Copy Markdown
Member

Purpose

PR #7746 added abstract tag stubs to Catalog and a complete RESTCatalog implementation, but left FileSystemCatalog inheriting NotImplementedError. The merge note said implementing filesystem tag CRUD required porting a Python TagManager. As it turns out, pypaimon/tag/tag_manager.py::TagManager and FileStoreTable.create_tag / delete_tag / list_tags / tag_manager() already exist on master, so this PR is a thin catalog-layer wrapper that delegates to those helpers and translates ValueError / return-value-False into the typed catalog exceptions used by the rest of the API.

Linked issue

Follow-up to #7746 — closes the FileSystemCatalog gap noted in that PR's "Out of scope" section.

Tests

  • pypaimon/tests/filesystem_catalog_tag_test.py (12 new cases) — mirrors RESTCatalogTagCRUDTest on the local filesystem path:
    • test_create_and_get_tag, test_create_tag_with_snapshot_id
    • test_create_tag_already_exists_raises, test_create_tag_already_exists_ignore
    • test_create_tag_with_time_retained_raises_not_implemented (filesystem-only)
    • test_create_tag_table_not_exists
    • test_get_tag_not_exists
    • test_delete_tag_happy, test_delete_tag_not_exists
    • test_list_tags_paged_basic (paging via max_results + next_page_token)
    • test_list_tags_paged_with_prefix, test_list_tags_paged_empty_table
  • pypaimon/tests/rest/rest_tag_test.py — removed the now-stale FilesystemCatalogTagInheritsNotImplementedTest class (its assertions no longer hold; new behavior is covered by the new file). Cleaned up unused imports left behind.
  • Regression: pypaimon/tests/api/test_tag_dto_serde.py (8 cases) and the REST tag test matrix (11 cases) all still pass.
  • All edited files pass flake8 --config=dev/cfg.ini.

API and format

No new public Catalog method or wire format. This PR only overrides the four existing abstract stubs in FileSystemCatalog:

Method Delegate / behavior
create_tag table.create_tag(tag_name, snapshot_id, ignore_if_exists). ValueError("...already exists.")TagAlreadyExistException (only when not ignore_if_exists).
delete_tag table.delete_tag(tag_name) -> bool. FalseTagNotExistException.
get_tag table.tag_manager().get(tag_name) -> Optional[Tag]. NoneTagNotExistException. Result wrapped in GetTagResponse with snapshot=tag.trim_to_snapshot().
list_tags_paged table.list_tags() -> List[str]. Client-side lexicographic sort + prefix filter + page_token continuation cursor (no Java equivalent of paged listing exists in TagManager itself).

Documentation

n/a — this is a parity fix for an existing public API.

Out of scope (separate PRs)

  • time_retained real support — Python's Tag dataclass currently inherits only Snapshot fields and has no place to store tag_create_time / tag_time_retained. This PR rejects time_retained != None with NotImplementedError rather than silently dropping it (which would mislead callers about TTL effects). Extending Tag + TagManager.create_tag to carry these fields requires a coordinated change to the on-disk tag JSON format and is tracked as a follow-up.
  • GetTagResponse.tag_create_time — for the same reason, this catalog returns None for tag_create_time and tag_time_retained. Cross-FileIO mtime extraction is not reliable (different FileIO implementations expose different metadata).
  • Branch CRUD on FileSystemCatalog — branch CRUD requires the abstract rename_branch stub and BranchAlreadyExistException handling introduced by [python] Add branch CRUD REST implementation to RESTCatalog #7747 (still under review). The sister PR for FileSystemCatalog branch CRUD will follow once [python] Add branch CRUD REST implementation to RESTCatalog #7747 lands.
  • Tag callbacks — Java has them; Python doesn't, and they're not needed for filesystem semantics.

Verification

# Targeted
pytest pypaimon/tests/filesystem_catalog_tag_test.py -v

# Regression
pytest pypaimon/tests/rest/rest_tag_test.py pypaimon/tests/api/test_tag_dto_serde.py -v

# Master-vs-fix
git stash
python -c "from pypaimon.catalog.filesystem_catalog import FileSystemCatalog; \
           print('create_tag in __dict__:', 'create_tag' in FileSystemCatalog.__dict__)"
# Expected: False (master inherits NotImplementedError)
git stash pop
python -c "..."  # Expected: True (PR adds the override)

# Lint
flake8 --config=dev/cfg.ini pypaimon/catalog/filesystem_catalog.py \
                            pypaimon/tests/filesystem_catalog_tag_test.py \
                            pypaimon/tests/rest/rest_tag_test.py

Generative AI disclosure

Implementation, tests, and PR description were drafted with the assistance of Claude (Anthropic). Claude was supplied with the existing pypaimon helpers (TagManager, FileStoreTable.create_tag / delete_tag / list_tags, RESTCatalog.create_tag's exception mapping pattern from #7746) and the REST tag test matrix; the design plan and code were reviewed and integrated by the author.

PR apache#7746 added abstract tag stubs to ``Catalog`` and a complete
``RESTCatalog`` implementation, but left ``FileSystemCatalog``
inheriting NotImplementedError. The merge note said implementing
filesystem tag CRUD required porting a Python TagManager. As it turns
out, ``pypaimon/tag/tag_manager.py::TagManager`` and
``FileStoreTable.create_tag/delete_tag/list_tags/tag_manager()``
already exist on master, so this PR is a thin catalog-layer wrapper
that delegates to those helpers and translates ``ValueError`` /
return-value-False into the typed catalog exceptions used by the rest
of the API.

Implementations:
- create_tag: delegates to ``table.create_tag``; ``ValueError`` carrying
  "already exists" is translated to ``TagAlreadyExistException``
  (``ignore_if_exists`` is honored inside ``table.create_tag`` itself,
  so any conflict that bubbles up means the caller asked us to fail).
- delete_tag: delegates to ``table.delete_tag(tag_name) -> bool``;
  ``False`` (tag missing) is translated to ``TagNotExistException``.
- get_tag: delegates to ``table.tag_manager().get(tag_name)``; ``None``
  is translated to ``TagNotExistException``. The result is wrapped in
  ``GetTagResponse`` with the tag's snapshot via
  ``tag.trim_to_snapshot()``.
- list_tags_paged: client-side prefix filter + lexicographic
  pagination (Python TagManager has no built-in paged listing).

``time_retained`` is intentionally rejected with NotImplementedError:
the Python ``Tag`` dataclass currently inherits only ``Snapshot``
fields and has no place to store ``tag_create_time`` /
``tag_time_retained``. Silently dropping the option would lead callers
to believe their TTL took effect; raising surfaces the gap explicitly.
Extending ``Tag`` + ``TagManager.create_tag`` to carry these fields is
tracked as a follow-up (and would need a coordinated schema change in
the on-disk tag JSON format).

For the same reason, ``GetTagResponse.tag_create_time`` and
``tag_time_retained`` are returned as ``None`` from this catalog —
neither value is tracked filesystem-side today.

Tests:
- pypaimon/tests/filesystem_catalog_tag_test.py (12 cases) mirrors the
  REST tag CRUD test matrix on the local filesystem path: create+get,
  snapshot_id selection, already-exists raises / ignore, table-not-
  exists, get/delete-not-exists, list_tags_paged with paging + prefix
  + empty table, plus ``time_retained`` raises NotImplementedError.
- pypaimon/tests/rest/rest_tag_test.py: removed the now-stale
  ``FilesystemCatalogTagInheritsNotImplementedTest`` class — its
  assertions ("FileSystemCatalog raises NotImplementedError on tag
  methods") no longer hold; the new behavior is covered by the new
  filesystem test file. Cleaned up unused imports left behind.
Self-review caught a residual `unittest.main()` block whose import was
removed alongside the `FilesystemCatalogTagInheritsNotImplementedTest`
class in the previous commit. Running the file directly with
`python pypaimon/tests/rest/rest_tag_test.py` raised `NameError`.
pytest didn't catch this (it doesn't go through `__main__`), but
reviewers commonly run test files directly. Re-add the import.
@JingsongLi
Copy link
Copy Markdown
Contributor

+1

@JingsongLi JingsongLi merged commit f816b3e into apache:master May 1, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants