BLDX-434 | Migrate to `msgspec.Struct` models by Aryamanz29 · Pull Request #811 · atlanhq/atlan-python

Aryamanz29 · 2026-02-15T11:32:53Z

✨ Description

https://linear.app/atlan-epd/issue/BLDX-434/plan-productionization-of-msgspecstruct-models

🧩 Type of change

Select all that apply:

🚀 New feature (non-breaking change that adds functionality)
🐛 Bug fix (non-breaking change that fixes an issue) — please include tests! Refer testing-toolkit 🧪
🔄 Refactor (code change that neither fixes a bug nor adds a feature)
🧹 Maintenance (chores, cleanup, minor improvements)
💥 Breaking change (fix or feature that may break existing functionality)
📦 Dependency upgrade/downgrade
📚 Documentation updates

📋 Checklist

My code follows the project’s style guidelines
I’ve performed a self-review of my code
I’ve added comments in tricky or complex areas
I’ve updated the documentation as needed
There are no new warnings from my changes
I’ve added tests to cover my changes
All new and existing tests pass locally

fyzanshaik-atlan · 2026-02-16T15:59:19Z

@greptile review

greptile-apps · 2026-02-16T15:59:39Z

Too many files changed for review. (1032 files found, 500 file limit)

fyzanshaik-atlan · 2026-02-16T16:05:02Z

@greptile re-review

fyzanshaik-atlan · 2026-02-17T15:27:08Z

@claude /review

fyzanshaik-atlan · 2026-02-20T07:27:14Z

@claude /review

claude · 2026-02-20T07:28:46Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

Aryamanz29 · 2026-02-22T10:54:41Z

@claude /review

claude · 2026-02-22T10:56:10Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

fyzanshaik-atlan · 2026-03-03T22:21:47Z

Code Review

This PR migrates the pyatlan SDK's internal model layer from Pydantic BaseModel to msgspec.Struct, introducing a parallel pyatlan_v9 package with its own client, model, and test trees. The approach runs both model systems side-by-side, bridging them via a custom _is_model_instance helper and a registered Pydantic encoder for msgspec structs. The change is marked as breaking and covers ~990 files.

Confidence Score: 2/5

This PR is a draft and not yet intended for merge — reviewed in that context
Checked for correctness bugs, security, standards compliance, CI integration, and packaging
Score deducted for two infrastructure-level blockers: missing runtime dependency declaration and absent CI coverage for the new test suite, plus a confirmed logic bug in the audit model

Important Files Changed

File	Change	Risk
`pyproject.toml`	Modified	High — runtime dependency missing
`.github/workflows/pyatlan-pr.yaml`	Modified	High — v9 tests not wired
`pyatlan/validate.py`	Added	Medium — cross-boundary type matching
`pyatlan_v9/client/atlan.py`	Added	High — core client entry point
`pyatlan_v9/model/audit.py`	Added	Medium — logic bug in sort fallback
`pyatlan_v9/model/typedef.py`	Added	Medium — module-level mutable state
`pyatlan_v9/client/transport.py`	Added	Medium — custom HTTP retry transport
`tests_v9/unit/`	Added	Medium — not run in CI

Change Flow

sequenceDiagram
    participant User as User Code
    participant Client as AtlanClient (v9)
    participant SubClient as Sub-client (e.g. AssetClient)
    participant Validate as validate_arguments
    participant Bridge as _is_model_instance
    participant Msgspec as msgspec.Struct model
    participant HTTP as httpx + PyatlanTransport
    participant Legacy as Legacy Pydantic layer

    User->>Client: AtlanClient(base_url, api_key)
    Client->>Client: __post_init__ (env fallbacks, session init)
    User->>SubClient: client.assets.get_by_guid(guid)
    SubClient->>Validate: @validate_arguments checks type
    Validate->>Bridge: _is_model_instance(value, expected_type)
    Bridge-->>Validate: match by MRO name
    SubClient->>HTTP: _call_api(endpoint, request_obj)
    HTTP->>Legacy: deserialize response (Pydantic)
    Legacy-->>SubClient: legacy Asset object
    SubClient->>Msgspec: (v9 path) msgspec.convert(raw, TargetStruct)
    Msgspec-->>User: typed v9 model instance

Findings

#	Severity	File	Issue
1	Critical	`pyproject.toml:29-41`	`msgspec` is not declared as a runtime dependency. Any installation of the package will produce `ModuleNotFoundError` when importing `pyatlan_v9`.
2	Critical	`.github/workflows/pyatlan-pr.yaml:117`	The CI workflow only runs `pytest tests/unit` (legacy). The `tests_v9/unit` suite — 1,852 tests — is never executed on PRs. Regressions in v9 code will not be caught.
3	Warning	`pyatlan_v9/model/audit.py:92,117,149`	Logic bug in sort fallback: `sort=sort if LATEST_FIRST else []`. `LATEST_FIRST` is a non-empty list and is always truthy, so the condition always takes the left branch. The empty-list fallback is dead code. Intended expression is likely `sort if sort else LATEST_FIRST`.
4	Warning	`.github/workflows/pyatlan-pr.yaml:109-110`	QA checks (ruff-format, ruff-lint, mypy) are commented out. One of the commits is explicitly named `[temp/ci] Skip QA checks`. These must be restored before merge.
5	Warning	`pyatlan_v9/model/typedef.py:448,557,675`	`_OPTIONS_PARENT_MAP: dict[int, Any] = {}` is a module-level mutable dict keyed by `id(self)`. Object IDs are reused after garbage collection, and there is no cleanup path, causing unbounded growth in long-running processes. This is also not thread-safe.
6	Warning	`pyatlan/validate.py:91,110`	`_is_model_instance` generic fallback matches any class whose MRO contains a name equal to `expected_type.__name__`. A user-defined class named `Asset` or `Query` will pass validation for a `pyatlan.model.assets.core.Asset` expected type, silently accepting the wrong object.

- Remove dead code: admin/, checkpoint.py, exceptions.py, py.typed - Delete old single-file client.py (replaced by client/ package later) - Rename models/ → model/assets/ for consistency with legacy pyatlan/model/assets/ - Move infrastructure files (conversion_utils.py, serde.py, transform.py) to model/ - Add model/__init__.py for package-level re-exports - Update all import paths (pyatlan_v9.models → pyatlan_v9.model.assets) - Update all model test imports to match new paths

Migrate all legacy pyatlan model files (AtlanObject/Pydantic BaseModel) to pyatlan_v9 msgspec.Struct equivalents: Infrastructure: - core.py: AtlanObject base, AtlanTag, AtlanField helpers - structs.py: SourceTagAttachment, BadgeCondition, etc. - translators.py/retranslators.py: Tag name translation pipeline Models (28 new files): - search.py: DSL, IndexSearchRequest, Query types - typedef.py: EnumDef, StructDef, AtlanTagDef, CustomMetadataDef, etc. - lineage.py: LineageListRequest, FluentLineage, LineageResponse, etc. - audit.py, search_log.py: AuditSearchRequest, SearchLogRequest - response.py: AssetMutationResponse, AssetResponse - group.py, user.py, role.py: GroupRequest, AtlanUser, AtlanRole - credential.py, oauth_client.py, sso.py, api_tokens.py - events.py, keycloak_events.py: AtlanEvent, KeycloakEvent - query.py, task.py, workflow.py, suggestions.py - aggregation.py, atlan_image.py, contract.py, custom_metadata.py - data_mesh.py, dq_rule_conditions.py, file.py, internal.py, lineage_ref.py Assets: - purpose.py: Purpose model with tag translation support - snowflake_dynamic_table.py: SnowflakeDynamicTable model All models use msgspec conventions: kw_only=True, UNSET/UnsetType, rename='camel' where needed, and proper serialization methods.

…, add S3Object.create_with_prefix alias - Add _normalize_camel_key to handle API keys with uppercase abbreviations (e.g., apiPathRawURI→apiPathRawUri, dataProductAssetsDSL→dataProductAssetsDsl) - Fix announcement_type UNSET assertions in 5 asset test files (data_studio, gcs, preset, superset, api) for partial update responses - Add S3Object.create_with_prefix as alias for creator_with_prefix Made-with: Cursor

…ests Change `is None` to `not` for immediate_downstream and immediate_upstream checks to handle msgspec UNSET values in v9 lineage test assertions. Made-with: Cursor

…UNSET assertions, Anaplan creator params - Add explicit msgspec.field(name=...) mappings for QuickSight model fields (quick_sight_folder_type, quick_sight_dataset_import_mode, quick_sight_dataset_field_type) - Fix retranslators.py NoneType iteration when classification_names is None - Update S3 and test_client assertions to handle UNSET vs None for optional fields - Normalize Anaplan creator connection_qualified_name parameter signatures - Fix glossary _assert_relationship for v9 flat model structure Made-with: Cursor

…ration tests Systematically convert `assert field is None` to `assert not field` for msgspec model attributes that return UNSET instead of None when absent. Changes across 21 test files: - test_client.py: certificate_status, anchor, classification_names, views - test_sql_assets.py: mutated_entities.CREATE, trim_to_required checks - test_index_search.py: aggregations, nested_results, timestamp coercion - custom_metadata_test.py: RACI, IPR, DQ empty attribute checks - glossary_test.py: mutated_entities checks - document_db_asset_test.py: trim_to_required checks - s3_asset_test.py: aws_arn, s3_object_key absent fields - suggestions_test.py: owner_groups, meanings, description - test_workflow_client.py: credential extras/level/metadata - All aio/ async counterparts with same patterns - Convert integer timestamps to datetime for Pydantic validation Made-with: Cursor

…on, glossary UDR - Change certificate_status/message defaults from None to UNSET in Asset so setting them to None is properly serialized (not omitted by omit_defaults) - Simplify PopularityInsights.record_last_timestamp type to Union[int, None] to fix msgspec type union restriction while keeping __post_init__ conversion - Construct PopularityInsights via kwargs (not post-init mutation) so datetime→epoch conversion runs during construction - Add msgspec.convert in test_sql_assets verify_popularity to handle dict responses from API for source_read_recent_user_record_list - Fix glossary _assert_relationship to check relationship_attributes dict Made-with: Cursor

…, Entity UNSET defaults - Procedure.updater(): make definition optional (default '') so updater(qualified_name, name) works - Entity: set classification_names, meanings, labels, pending_tasks defaults to UNSET so workflow package serialization omits them (fixes test_packages AssertionError on connection payload) - Audit search: audit_search_paging.json entity audit keys to camelCase (entityQualifiedName, typeName, entityId) for EntityAudit decode; assert helpers use first['entityId'] etc. Made-with: Cursor

tests/unit and tests/unit/aio assert on audit_search_paging.json which was updated to camelCase (entityQualifiedName, typeName, entityId); update _assert_audit_search_results to read first['entityId'] etc. Made-with: Cursor

…validate_arguments Delete the custom validate_arguments wrapper and _is_model_instance utility. All legacy client, model, and UI code now uses pydantic.v1's validate_arguments directly. Also removes the msgspec cross-type encoder registration from model/core.py and _is_msgspec_cross_type_match from utils.py, and updates the batch __track method to support v9 models via getattr type_name check. Made-with: Cursor

…essages Pydantic v1's validate_arguments produces different error messages than the removed custom validator (e.g. "value is not a valid dict" instead of "instance of X expected"). Update all assertion strings and change exact equality checks to substring checks where pydantic appends type info. Made-with: Cursor

Adds validate_arguments decorators to async SSO client, fixes SSO model fields, improves asset/batch client v9 compatibility, and various model enhancements (badge, entity, persona, schema, etc.). Includes v9 validate module and pkg utilities. Made-with: Cursor

Fix SSO test missing group_map_name param, task client assertion, batch test mock patches. Add v9-specific validation constants with custom validator error messages (separate from pydantic messages in legacy). Made-with: Cursor

Remove legacy client usage from v9 integration tests, fix SSO test assertions, update async conftest and utilities for v9 client. Made-with: Cursor

…l qa-checks (ruff format, ruff check, mypy) Made-with: Cursor

Legacy tests under tests/ are kept in sync with main branch. V9-specific tests live under tests_v9/. Made-with: Cursor

…scriptors and overlay system - Regenerate 557 asset files from Pkl type definitions with field descriptors (ClassVar placeholders + deferred KeywordField/BooleanField/NumericField init) - Add 82 overlay files (_overlays/) with custom methods (creator, updater, policies) - Add /generate-v9-models Claude Code skill for model regeneration from models repo - Add pyatlan_v9 model generation section to README - Post-sync patches: set[str] fields in asset.py, relationship_attributes in related_entity.py, Process.Attributes backward-compat alias Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Regenerate all asset models using updated Pkl generator: - set[str] fields now generated natively (no post-sync sed patch) - validate/minimize/relate methods removed (dead code) - relationship_attributes field in RelatedEntity base class - Move overlay files to SDK repo (_overlays/ directory) - Fix overlay imports: ai_model.py, collection.py, connection.py - Add ruff exclusion for _overlays/ and F821 per-file-ignores - Update generate-v9-models skill: sdkOnly mode, temp staging, ruff auto-fix+format step, removed asset.py sed patch - Update process_test.py to use Process.generate_qualified_name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Updated 60 existing asset files with new fields/changes - Added 5 new Snowflake Semantic files (dimension, fact, logical_table, metric, view) - Ruff-cleaned 339 unused import errors Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…notations All struct field annotations now use Union[X, None, UnsetType] instead of X | None | UnsetType, and List/Dict/Set from typing instead of builtin subscripting. Verified passing 6376 tests on both Python 3.9 and 3.11. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Aryamanz29

LGTM - lets goooooooooo 🚀

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Aryamanz29 self-assigned this Feb 15, 2026

Aryamanz29 added feature New feature or request dependencies Pull requests that update a dependency file change Pyatlan change pull request breaking-change labels Feb 15, 2026

Aryamanz29 force-pushed the BLDX-434 branch from 5ad9abd to 9625aac Compare February 19, 2026 12:19