Skip to content

BLDX-434 | Migrate to msgspec.Struct models#811

Merged
Aryamanz29 merged 45 commits intomainfrom
BLDX-434
Mar 5, 2026
Merged

BLDX-434 | Migrate to msgspec.Struct models#811
Aryamanz29 merged 45 commits intomainfrom
BLDX-434

Conversation

@Aryamanz29
Copy link
Copy Markdown
Member

✨ Description

https://linear.app/atlan-epd/issue/BLDX-434/plan-productionization-of-msgspecstruct-models

🧩 Type of change

Select all that apply:

  • 🚀 New feature (non-breaking change that adds functionality)
  • 🐛 Bug fix (non-breaking change that fixes an issue) — please include tests! Refer testing-toolkit 🧪
  • 🔄 Refactor (code change that neither fixes a bug nor adds a feature)
  • 🧹 Maintenance (chores, cleanup, minor improvements)
  • 💥 Breaking change (fix or feature that may break existing functionality)
  • 📦 Dependency upgrade/downgrade
  • 📚 Documentation updates

📋 Checklist

  • My code follows the project’s style guidelines
  • I’ve performed a self-review of my code
  • I’ve added comments in tricky or complex areas
  • I’ve updated the documentation as needed
  • There are no new warnings from my changes
  • I’ve added tests to cover my changes
  • All new and existing tests pass locally

@Aryamanz29 Aryamanz29 self-assigned this Feb 15, 2026
@Aryamanz29 Aryamanz29 added feature New feature or request dependencies Pull requests that update a dependency file change Pyatlan change pull request breaking-change labels Feb 15, 2026
@fyzanshaik-atlan
Copy link
Copy Markdown
Contributor

@greptile review

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Feb 16, 2026

Too many files changed for review. (1032 files found, 500 file limit)

@fyzanshaik-atlan
Copy link
Copy Markdown
Contributor

@greptile re-review

@fyzanshaik-atlan
Copy link
Copy Markdown
Contributor

@claude /review

@fyzanshaik-atlan
Copy link
Copy Markdown
Contributor

@claude /review

@claude
Copy link
Copy Markdown

claude Bot commented Feb 20, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@Aryamanz29
Copy link
Copy Markdown
Member Author

@claude /review

@claude
Copy link
Copy Markdown

claude Bot commented Feb 22, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@fyzanshaik-atlan
Copy link
Copy Markdown
Contributor

Code Review

This PR migrates the pyatlan SDK's internal model layer from Pydantic BaseModel to msgspec.Struct, introducing a parallel pyatlan_v9 package with its own client, model, and test trees. The approach runs both model systems side-by-side, bridging them via a custom _is_model_instance helper and a registered Pydantic encoder for msgspec structs. The change is marked as breaking and covers ~990 files.

Confidence Score: 2/5

  • This PR is a draft and not yet intended for merge — reviewed in that context
  • Checked for correctness bugs, security, standards compliance, CI integration, and packaging
  • Score deducted for two infrastructure-level blockers: missing runtime dependency declaration and absent CI coverage for the new test suite, plus a confirmed logic bug in the audit model
Important Files Changed
File Change Risk
pyproject.toml Modified High — runtime dependency missing
.github/workflows/pyatlan-pr.yaml Modified High — v9 tests not wired
pyatlan/validate.py Added Medium — cross-boundary type matching
pyatlan_v9/client/atlan.py Added High — core client entry point
pyatlan_v9/model/audit.py Added Medium — logic bug in sort fallback
pyatlan_v9/model/typedef.py Added Medium — module-level mutable state
pyatlan_v9/client/transport.py Added Medium — custom HTTP retry transport
tests_v9/unit/ Added Medium — not run in CI

Change Flow

sequenceDiagram
    participant User as User Code
    participant Client as AtlanClient (v9)
    participant SubClient as Sub-client (e.g. AssetClient)
    participant Validate as validate_arguments
    participant Bridge as _is_model_instance
    participant Msgspec as msgspec.Struct model
    participant HTTP as httpx + PyatlanTransport
    participant Legacy as Legacy Pydantic layer

    User->>Client: AtlanClient(base_url, api_key)
    Client->>Client: __post_init__ (env fallbacks, session init)
    User->>SubClient: client.assets.get_by_guid(guid)
    SubClient->>Validate: @validate_arguments checks type
    Validate->>Bridge: _is_model_instance(value, expected_type)
    Bridge-->>Validate: match by MRO name
    SubClient->>HTTP: _call_api(endpoint, request_obj)
    HTTP->>Legacy: deserialize response (Pydantic)
    Legacy-->>SubClient: legacy Asset object
    SubClient->>Msgspec: (v9 path) msgspec.convert(raw, TargetStruct)
    Msgspec-->>User: typed v9 model instance
Loading

Findings

# Severity File Issue
1 Critical pyproject.toml:29-41 msgspec is not declared as a runtime dependency. Any installation of the package will produce ModuleNotFoundError when importing pyatlan_v9.
2 Critical .github/workflows/pyatlan-pr.yaml:117 The CI workflow only runs pytest tests/unit (legacy). The tests_v9/unit suite — 1,852 tests — is never executed on PRs. Regressions in v9 code will not be caught.
3 Warning pyatlan_v9/model/audit.py:92,117,149 Logic bug in sort fallback: sort=sort if LATEST_FIRST else []. LATEST_FIRST is a non-empty list and is always truthy, so the condition always takes the left branch. The empty-list fallback is dead code. Intended expression is likely sort if sort else LATEST_FIRST.
4 Warning .github/workflows/pyatlan-pr.yaml:109-110 QA checks (ruff-format, ruff-lint, mypy) are commented out. One of the commits is explicitly named [temp/ci] Skip QA checks. These must be restored before merge.
5 Warning pyatlan_v9/model/typedef.py:448,557,675 _OPTIONS_PARENT_MAP: dict[int, Any] = {} is a module-level mutable dict keyed by id(self). Object IDs are reused after garbage collection, and there is no cleanup path, causing unbounded growth in long-running processes. This is also not thread-safe.
6 Warning pyatlan/validate.py:91,110 _is_model_instance generic fallback matches any class whose MRO contains a name equal to expected_type.__name__. A user-defined class named Asset or Query will pass validation for a pyatlan.model.assets.core.Asset expected type, silently accepting the wrong object.

Comment thread pyatlan_v9/model/audit.py
Comment thread pyatlan_v9/validate.py
Comment thread .github/workflows/pyatlan-pr.yaml Outdated
Comment thread .github/workflows/pyatlan-pr.yaml
Comment thread pyatlan_v9/model/typedef.py
Comment thread pyatlan_v9/__init__.py
@Aryamanz29 Aryamanz29 force-pushed the BLDX-434 branch 4 times, most recently from d6a8ffc to ec57960 Compare March 5, 2026 09:30
- Remove dead code: admin/, checkpoint.py, exceptions.py, py.typed
- Delete old single-file client.py (replaced by client/ package later)
- Rename models/ → model/assets/ for consistency with legacy pyatlan/model/assets/
- Move infrastructure files (conversion_utils.py, serde.py, transform.py) to model/
- Add model/__init__.py for package-level re-exports
- Update all import paths (pyatlan_v9.models → pyatlan_v9.model.assets)
- Update all model test imports to match new paths
Migrate all legacy pyatlan model files (AtlanObject/Pydantic BaseModel)
to pyatlan_v9 msgspec.Struct equivalents:

Infrastructure:
- core.py: AtlanObject base, AtlanTag, AtlanField helpers
- structs.py: SourceTagAttachment, BadgeCondition, etc.
- translators.py/retranslators.py: Tag name translation pipeline

Models (28 new files):
- search.py: DSL, IndexSearchRequest, Query types
- typedef.py: EnumDef, StructDef, AtlanTagDef, CustomMetadataDef, etc.
- lineage.py: LineageListRequest, FluentLineage, LineageResponse, etc.
- audit.py, search_log.py: AuditSearchRequest, SearchLogRequest
- response.py: AssetMutationResponse, AssetResponse
- group.py, user.py, role.py: GroupRequest, AtlanUser, AtlanRole
- credential.py, oauth_client.py, sso.py, api_tokens.py
- events.py, keycloak_events.py: AtlanEvent, KeycloakEvent
- query.py, task.py, workflow.py, suggestions.py
- aggregation.py, atlan_image.py, contract.py, custom_metadata.py
- data_mesh.py, dq_rule_conditions.py, file.py, internal.py, lineage_ref.py

Assets:
- purpose.py: Purpose model with tag translation support
- snowflake_dynamic_table.py: SnowflakeDynamicTable model

All models use msgspec conventions: kw_only=True, UNSET/UnsetType,
rename='camel' where needed, and proper serialization methods.
Aryamanz29 and others added 24 commits March 5, 2026 19:06
…, add S3Object.create_with_prefix alias

- Add _normalize_camel_key to handle API keys with uppercase abbreviations
  (e.g., apiPathRawURI→apiPathRawUri, dataProductAssetsDSL→dataProductAssetsDsl)
- Fix announcement_type UNSET assertions in 5 asset test files
  (data_studio, gcs, preset, superset, api) for partial update responses
- Add S3Object.create_with_prefix as alias for creator_with_prefix

Made-with: Cursor
…ests

Change `is None` to `not` for immediate_downstream and immediate_upstream
checks to handle msgspec UNSET values in v9 lineage test assertions.

Made-with: Cursor
…UNSET assertions, Anaplan creator params

- Add explicit msgspec.field(name=...) mappings for QuickSight model fields
  (quick_sight_folder_type, quick_sight_dataset_import_mode, quick_sight_dataset_field_type)
- Fix retranslators.py NoneType iteration when classification_names is None
- Update S3 and test_client assertions to handle UNSET vs None for optional fields
- Normalize Anaplan creator connection_qualified_name parameter signatures
- Fix glossary _assert_relationship for v9 flat model structure

Made-with: Cursor
…ration tests

Systematically convert `assert field is None` to `assert not field` for
msgspec model attributes that return UNSET instead of None when absent.

Changes across 21 test files:
- test_client.py: certificate_status, anchor, classification_names, views
- test_sql_assets.py: mutated_entities.CREATE, trim_to_required checks
- test_index_search.py: aggregations, nested_results, timestamp coercion
- custom_metadata_test.py: RACI, IPR, DQ empty attribute checks
- glossary_test.py: mutated_entities checks
- document_db_asset_test.py: trim_to_required checks
- s3_asset_test.py: aws_arn, s3_object_key absent fields
- suggestions_test.py: owner_groups, meanings, description
- test_workflow_client.py: credential extras/level/metadata
- All aio/ async counterparts with same patterns
- Convert integer timestamps to datetime for Pydantic validation

Made-with: Cursor
…on, glossary UDR

- Change certificate_status/message defaults from None to UNSET in Asset
  so setting them to None is properly serialized (not omitted by omit_defaults)
- Simplify PopularityInsights.record_last_timestamp type to Union[int, None]
  to fix msgspec type union restriction while keeping __post_init__ conversion
- Construct PopularityInsights via kwargs (not post-init mutation) so
  datetime→epoch conversion runs during construction
- Add msgspec.convert in test_sql_assets verify_popularity to handle dict
  responses from API for source_read_recent_user_record_list
- Fix glossary _assert_relationship to check relationship_attributes dict

Made-with: Cursor
…, Entity UNSET defaults

- Procedure.updater(): make definition optional (default '') so updater(qualified_name, name) works
- Entity: set classification_names, meanings, labels, pending_tasks defaults to UNSET so workflow
  package serialization omits them (fixes test_packages AssertionError on connection payload)
- Audit search: audit_search_paging.json entity audit keys to camelCase (entityQualifiedName,
  typeName, entityId) for EntityAudit decode; assert helpers use first['entityId'] etc.

Made-with: Cursor
tests/unit and tests/unit/aio assert on audit_search_paging.json which was
updated to camelCase (entityQualifiedName, typeName, entityId); update
_assert_audit_search_results to read first['entityId'] etc.

Made-with: Cursor
…validate_arguments

Delete the custom validate_arguments wrapper and _is_model_instance utility.
All legacy client, model, and UI code now uses pydantic.v1's validate_arguments
directly. Also removes the msgspec cross-type encoder registration from model/core.py
and _is_msgspec_cross_type_match from utils.py, and updates the batch __track method
to support v9 models via getattr type_name check.

Made-with: Cursor
…essages

Pydantic v1's validate_arguments produces different error messages than
the removed custom validator (e.g. "value is not a valid dict" instead of
"instance of X expected"). Update all assertion strings and change exact
equality checks to substring checks where pydantic appends type info.

Made-with: Cursor
Adds validate_arguments decorators to async SSO client, fixes SSO model
fields, improves asset/batch client v9 compatibility, and various model
enhancements (badge, entity, persona, schema, etc.). Includes v9 validate
module and pkg utilities.

Made-with: Cursor
Fix SSO test missing group_map_name param, task client assertion, batch
test mock patches. Add v9-specific validation constants with custom
validator error messages (separate from pydantic messages in legacy).

Made-with: Cursor
Remove legacy client usage from v9 integration tests, fix SSO test
assertions, update async conftest and utilities for v9 client.

Made-with: Cursor
…l qa-checks (ruff format, ruff check, mypy)

Made-with: Cursor
Legacy tests under tests/ are kept in sync with main branch.
V9-specific tests live under tests_v9/.

Made-with: Cursor
…scriptors and overlay system

- Regenerate 557 asset files from Pkl type definitions with field descriptors
  (ClassVar placeholders + deferred KeywordField/BooleanField/NumericField init)
- Add 82 overlay files (_overlays/) with custom methods (creator, updater, policies)
- Add /generate-v9-models Claude Code skill for model regeneration from models repo
- Add pyatlan_v9 model generation section to README
- Post-sync patches: set[str] fields in asset.py, relationship_attributes in related_entity.py,
  Process.Attributes backward-compat alias

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Regenerate all asset models using updated Pkl generator:
  - set[str] fields now generated natively (no post-sync sed patch)
  - validate/minimize/relate methods removed (dead code)
  - relationship_attributes field in RelatedEntity base class
- Move overlay files to SDK repo (_overlays/ directory)
- Fix overlay imports: ai_model.py, collection.py, connection.py
- Add ruff exclusion for _overlays/ and F821 per-file-ignores
- Update generate-v9-models skill: sdkOnly mode, temp staging,
  ruff auto-fix+format step, removed asset.py sed patch
- Update process_test.py to use Process.generate_qualified_name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Updated 60 existing asset files with new fields/changes
- Added 5 new Snowflake Semantic files (dimension, fact, logical_table, metric, view)
- Ruff-cleaned 339 unused import errors

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…notations

All struct field annotations now use Union[X, None, UnsetType] instead of
X | None | UnsetType, and List/Dict/Set from typing instead of builtin
subscripting. Verified passing 6376 tests on both Python 3.9 and 3.11.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member Author

@Aryamanz29 Aryamanz29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - lets goooooooooo 🚀

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aryamanz29 Aryamanz29 marked this pull request as ready for review March 5, 2026 13:53
@Aryamanz29 Aryamanz29 merged commit 60ca25d into main Mar 5, 2026
78 of 85 checks passed
@Aryamanz29 Aryamanz29 deleted the BLDX-434 branch March 5, 2026 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change change Pyatlan change pull request dependencies Pull requests that update a dependency file feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants