Skip to content

Refactor classification terminology to traits across taxonomy system#641

Merged
jfrench9 merged 5 commits intomainfrom
feature/element-trait-breakout
Apr 26, 2026
Merged

Refactor classification terminology to traits across taxonomy system#641
jfrench9 merged 5 commits intomainfrom
feature/element-trait-breakout

Conversation

@jfrench9
Copy link
Copy Markdown
Member

Summary

This PR introduces a fundamental terminology refactor across the taxonomy and classification system, renaming the concept of "classifications" to "traits" when associated with elements. This aligns the domain model more accurately with the intended semantics — elements possess traits rather than being assigned to classifications — and introduces a new Trait model as a first-class entity distinct from the legacy Classification model.

Key Changes

New Models & Renamed Entities

  • Added robosystems/models/extensions/trait.py — New Trait model replacing the classification concept for element-level attributes
  • Added robosystems/models/extensions/element_trait.py — New junction model replacing ElementClassification to associate elements with traits
  • Deleted robosystems/models/extensions/element_classification.py — Superseded by element_trait.py
  • Modified robosystems/models/extensions/classification.py — Refactored to coexist with the new Trait model; likely scoped to non-element contexts (e.g., association classifications)

Database Migration

  • Updated 0002_taxonomy_library.py migration with significant schema changes (~214 lines changed) to reflect the new trait-based table structures and relationships

API Layer Updates

  • Updated API models across extensions/ (accounts, reports, taxonomies, trial_balance, account_rollups) to use trait terminology
  • Updated information_block.py, library.py, and taxonomy_block.py API models
  • Updated GraphQL types and resolvers for ledger, library, and taxonomy_block to expose traits instead of classifications

Operations & Business Logic

  • Refactored all taxonomy block operations: chart_of_accounts, library_creator, cascade, custom_ontology, reporting_extension, reporting_standard, update_apply, update_validator, validators
  • Updated roboledger reads (accounts, reports, taxonomies, trial_balance, account_rollups) and commands (elements)
  • Updated materialization logic in extensions/materialize.py
  • Updated library read operations and element queries
  • Refactored the AI mapping agent and its prompt to use trait terminology

Taxonomy Seeds & Loaders

  • Updated all JSONLD taxonomy seed files (FAC, rs-gaap-to-metamodel, us-gaap-metamodel) to reflect trait-based schema
  • Updated jsonld_loader.py and taxonomy/model.py to parse and represent traits
  • Updated tenant_writer.py for trait-aware persistence

Schema & Infrastructure

  • Updated robosystems/schemas/base.py with trait-related schema definitions
  • Updated MCP taxonomy tools for trait-based queries

Breaking Changes

⚠️ This is a breaking change across multiple layers:

  • Database schema: The migration renames/restructures tables. Existing deployments will require migration execution. Data in element_classification tables must be migrated to the new element_trait structure.
  • API contracts: GraphQL types and API response models now use trait/traits instead of classification/classifications for element-level attributes. All API consumers must be updated.
  • Taxonomy seed files: JSONLD seeds have been updated; any external tooling consuming these formats must adapt.
  • Agent prompts: The mapping agent prompt now references traits, which may affect LLM behavior for in-flight or cached agent sessions.

Testing Notes

  • Updated tests across 8 test files covering:
    • Arelle pipeline tests
    • GraphQL taxonomy block tests
    • MCP taxonomy tools tests
    • Materialization operation tests
    • Roboledger element command tests
    • Custom ontology, library creator, and validator operation tests
    • Classification migration shape tests (renamed/refactored for traits)
  • Verify that the database migration applies cleanly on both fresh and existing databases
  • Validate end-to-end taxonomy creation, element querying, and report generation flows with the new trait terminology

Infrastructure Considerations

  • The database migration must be run before deploying the application code; coordinate deployment order carefully
  • Any downstream services or clients consuming the GraphQL API or reading taxonomy data will need corresponding updates
  • Cached taxonomy data or materialized views referencing the old classification columns will need to be refreshed post-migration

🤖 Generated with Claude Code

Branch Info:

  • Source: feature/element-trait-breakout
  • Target: main
  • Type: feature

Co-Authored-By: Claude noreply@anthropic.com

- Renamed classification-related variables and methods to use "trait" terminology for consistency.
- Updated database schema references from classifications to traits, including relevant SQL queries.
- Adjusted test cases to reflect the new trait terminology, ensuring all references to classifications are replaced with traits.
- Enhanced the CopyStats class to include traits and element_traits counts.
- Ensured all relevant documentation and comments are updated to reflect the changes in terminology.
@jfrench9
Copy link
Copy Markdown
Member Author

@claude please review this PR

@claude
Copy link
Copy Markdown

claude Bot commented Apr 25, 2026

Claude finished @jfrench9's task in 2m 56s —— View job


PR Review: Refactor classification terminology to traits

  • Gather context and understand request
  • Read new model files (trait.py, element_trait.py)
  • Read database migration changes
  • Read key operation files (library_creator, tenant_writer, materialize)
  • Read GraphQL types/resolvers
  • Read test files
  • Compile and post review feedback

Summary

This is a well-structured, large-scale terminology refactor that cleanly separates the semantics of "traits" (element-level FASB metamodel vocabulary) from "classifications" (association-level structural patterns). The architecture is sound. I found one confirmed bug, two correctness issues, several stale docstrings, and one incomplete feature worth calling out.


🐛 Bug: _update_efs_classification wipes all traits, not just EFS

File: robosystems/operations/taxonomy_block/chart_of_accounts.py:119-122

def _update_efs_classification(session, element_id, identifier):
    session.execute(
        ElementTrait.__table__.delete().where(ElementTrait.element_id == element_id)
        # ❌ This deletes ALL trait categories for the element
    )
    _assign_efs_classification(session, element_id, identifier)

The delete has no category filter, so updating an element's EFS trait will also destroy any other category assignments (liquidity, activityType, etc.) that may have been applied to the same element. The fix should filter on the joined Trait.category:

session.execute(
    ElementTrait.__table__
    .delete()
    .where(
        ElementTrait.element_id == element_id,
        ElementTrait.trait_id.in_(
            select(Trait.id).where(
                Trait.category == "elementsOfFinancialStatements"
            )
        ),
    )
)

Fix this →


⚠️ Bug: _RULE_COLS missing target_taxonomy_id

File: robosystems/taxonomy/writers/tenant_writer.py:77-82

_RULE_COLS = (
    "id, taxonomy_id, rule_category, rule_pattern, rule_expression, "
    "rule_message, rule_severity, rule_origin, target_kind, "
    "target_structure_id, target_element_id, target_association_id, "
    # ❌ Missing: target_taxonomy_id
    "rule_variables, metadata, created_at, updated_at, created_by"
)

The target_taxonomy_id column exists in both the Rule model (rule.py:131) and the migration DDL (0002_taxonomy_library.py:1137), but is absent from the copy column list. Any rule targeting a taxonomy (target_kind='taxonomy') will silently lose its target_taxonomy_id when copied from public to tenant schemas, breaking the polymorphism constraint for that row.

Fix this →


⚠️ Issue: Downgrade doesn't drop tenant library tables

File: migrations/extensions/versions/0002_taxonomy_library.py:1297-1311

_teardown_tenant() in downgrade() removes triggers, narrows CHECK constraints, and drops the Phase d/theta columns — but never drops the 6 tables created by _create_tenant_library_tables during upgrade (element_labels, element_references, traits, element_traits, classifications, association_classifications). After migrate-down, tenant schemas will have orphaned tables that 0001 doesn't know about.


⚠️ Issue: create_library_arcs has dead classification_assignment code

File: robosystems/operations/taxonomy_block/library_creator.py:360-367

counts: dict[str, int] = {
    ...
    "classification_assignments": 0,      # ← always stays 0
    "classification_assignments_skipped": 0,  # ← always stays 0
}

TaxonomyPackage has no classification_assignments field, jsonld_loader.py never populates one, and there is no corresponding ClassificationAssignmentSpec. The AssociationClassification junction exists in the schema but there's currently no seed-time loader path for it. These dead counters are harmless but misleading — and the phase-2 log message in the migration will always emit classification_assignments=0 regardless of what the seeds contain.

If AssociationClassification loading is intentionally deferred, a comment here would prevent future confusion.


📝 Stale docstrings

Location Issue
library_creator.py:126-128 Docstring says "Classifications per package" — now Traits.
library_creator.py:355 Docstring says "Insert Associations + ClassificationAssignments" — now TraitAssignments.
0002_taxonomy_library.py:797-800 Comment says "element_classifications junction rows" — should be "element_traits".

✅ What looks good

  • Model design is clean. Trait (element-side, 25 categories) and Classification (association-side, 3 categories) are clearly documented and have correct CHECK constraints that mirror each other.
  • ElementTrait composite PK with a partial index on (element_id, is_primary) WHERE is_primary=true is the right index for EFS lookups.
  • efs_trait_by_element is the single source of truth for EFS lookups (elements.py:94) — this is the right abstraction and eliminates the N+1 pattern.
  • _bulk_resolve_element_ids avoids the O(N) round-trip problem for large packages — good performance hygiene.
  • Deterministic UUID5 key derivation ensures idempotent re-seeding.
  • Immutability triggers cover element_traits and traits in addition to structural tables — library-seeded trait data is correctly protected.
  • Migration downgrade() correctly handles public schema — FK-safe delete order (rules → junction → vocabulary → elements), then table drops.
  • Test coverage is updated across all affected files; the test_classifications_migration_shape.py static shape tests are a good pattern for this kind of plumbing.

@jfrench9 jfrench9 merged commit 8cfa8eb into main Apr 26, 2026
7 checks passed
@jfrench9 jfrench9 deleted the feature/element-trait-breakout branch April 26, 2026 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant