Refactor classification terminology to traits across taxonomy system by jfrench9 · Pull Request #641 · RoboFinSystems/robosystems

jfrench9 · 2026-04-25T23:07:33Z

Summary

This PR introduces a fundamental terminology refactor across the taxonomy and classification system, renaming the concept of "classifications" to "traits" when associated with elements. This aligns the domain model more accurately with the intended semantics — elements possess traits rather than being assigned to classifications — and introduces a new Trait model as a first-class entity distinct from the legacy Classification model.

Key Changes

New Models & Renamed Entities

Added robosystems/models/extensions/trait.py — New Trait model replacing the classification concept for element-level attributes
Added robosystems/models/extensions/element_trait.py — New junction model replacing ElementClassification to associate elements with traits
Deleted robosystems/models/extensions/element_classification.py — Superseded by element_trait.py
Modified robosystems/models/extensions/classification.py — Refactored to coexist with the new Trait model; likely scoped to non-element contexts (e.g., association classifications)

Database Migration

Updated 0002_taxonomy_library.py migration with significant schema changes (~214 lines changed) to reflect the new trait-based table structures and relationships

API Layer Updates

Updated API models across extensions/ (accounts, reports, taxonomies, trial_balance, account_rollups) to use trait terminology
Updated information_block.py, library.py, and taxonomy_block.py API models
Updated GraphQL types and resolvers for ledger, library, and taxonomy_block to expose traits instead of classifications

Operations & Business Logic

Refactored all taxonomy block operations: chart_of_accounts, library_creator, cascade, custom_ontology, reporting_extension, reporting_standard, update_apply, update_validator, validators
Updated roboledger reads (accounts, reports, taxonomies, trial_balance, account_rollups) and commands (elements)
Updated materialization logic in extensions/materialize.py
Updated library read operations and element queries
Refactored the AI mapping agent and its prompt to use trait terminology

Taxonomy Seeds & Loaders

Updated all JSONLD taxonomy seed files (FAC, rs-gaap-to-metamodel, us-gaap-metamodel) to reflect trait-based schema
Updated jsonld_loader.py and taxonomy/model.py to parse and represent traits
Updated tenant_writer.py for trait-aware persistence

Schema & Infrastructure

Updated robosystems/schemas/base.py with trait-related schema definitions
Updated MCP taxonomy tools for trait-based queries

Breaking Changes

⚠️ This is a breaking change across multiple layers:

Database schema: The migration renames/restructures tables. Existing deployments will require migration execution. Data in element_classification tables must be migrated to the new element_trait structure.
API contracts: GraphQL types and API response models now use trait/traits instead of classification/classifications for element-level attributes. All API consumers must be updated.
Taxonomy seed files: JSONLD seeds have been updated; any external tooling consuming these formats must adapt.
Agent prompts: The mapping agent prompt now references traits, which may affect LLM behavior for in-flight or cached agent sessions.

Testing Notes

Updated tests across 8 test files covering:
- Arelle pipeline tests
- GraphQL taxonomy block tests
- MCP taxonomy tools tests
- Materialization operation tests
- Roboledger element command tests
- Custom ontology, library creator, and validator operation tests
- Classification migration shape tests (renamed/refactored for traits)
Verify that the database migration applies cleanly on both fresh and existing databases
Validate end-to-end taxonomy creation, element querying, and report generation flows with the new trait terminology

Infrastructure Considerations

The database migration must be run before deploying the application code; coordinate deployment order carefully
Any downstream services or clients consuming the GraphQL API or reading taxonomy data will need corresponding updates
Cached taxonomy data or materialized views referencing the old classification columns will need to be refreshed post-migration

🤖 Generated with Claude Code

Branch Info:

Source: feature/element-trait-breakout
Target: main
Type: feature

Co-Authored-By: Claude noreply@anthropic.com

- Renamed classification-related variables and methods to use "trait" terminology for consistency. - Updated database schema references from classifications to traits, including relevant SQL queries. - Adjusted test cases to reflect the new trait terminology, ensuring all references to classifications are replaced with traits. - Enhanced the CopyStats class to include traits and element_traits counts. - Ensured all relevant documentation and comments are updated to reflect the changes in terminology.

jfrench9 · 2026-04-25T23:07:34Z

@claude please review this PR

claude · 2026-04-25T23:07:59Z

Claude finished @jfrench9's task in 2m 56s —— View job

PR Review: Refactor classification terminology to traits

Gather context and understand request
Read new model files (trait.py, element_trait.py)
Read database migration changes
Read key operation files (library_creator, tenant_writer, materialize)
Read GraphQL types/resolvers
Read test files
Compile and post review feedback

Summary

This is a well-structured, large-scale terminology refactor that cleanly separates the semantics of "traits" (element-level FASB metamodel vocabulary) from "classifications" (association-level structural patterns). The architecture is sound. I found one confirmed bug, two correctness issues, several stale docstrings, and one incomplete feature worth calling out.

🐛 Bug: `_update_efs_classification` wipes all traits, not just EFS

File: robosystems/operations/taxonomy_block/chart_of_accounts.py:119-122

def _update_efs_classification(session, element_id, identifier):
    session.execute(
        ElementTrait.__table__.delete().where(ElementTrait.element_id == element_id)
        # ❌ This deletes ALL trait categories for the element
    )
    _assign_efs_classification(session, element_id, identifier)

The delete has no category filter, so updating an element's EFS trait will also destroy any other category assignments (liquidity, activityType, etc.) that may have been applied to the same element. The fix should filter on the joined Trait.category:

session.execute(
    ElementTrait.__table__
    .delete()
    .where(
        ElementTrait.element_id == element_id,
        ElementTrait.trait_id.in_(
            select(Trait.id).where(
                Trait.category == "elementsOfFinancialStatements"
            )
        ),
    )
)

Fix this →

⚠️ Bug: `_RULE_COLS` missing `target_taxonomy_id`

File: robosystems/taxonomy/writers/tenant_writer.py:77-82

_RULE_COLS = (
    "id, taxonomy_id, rule_category, rule_pattern, rule_expression, "
    "rule_message, rule_severity, rule_origin, target_kind, "
    "target_structure_id, target_element_id, target_association_id, "
    # ❌ Missing: target_taxonomy_id
    "rule_variables, metadata, created_at, updated_at, created_by"
)

The target_taxonomy_id column exists in both the Rule model (rule.py:131) and the migration DDL (0002_taxonomy_library.py:1137), but is absent from the copy column list. Any rule targeting a taxonomy (target_kind='taxonomy') will silently lose its target_taxonomy_id when copied from public to tenant schemas, breaking the polymorphism constraint for that row.

Fix this →

⚠️ Issue: Downgrade doesn't drop tenant library tables

File: migrations/extensions/versions/0002_taxonomy_library.py:1297-1311

_teardown_tenant() in downgrade() removes triggers, narrows CHECK constraints, and drops the Phase d/theta columns — but never drops the 6 tables created by _create_tenant_library_tables during upgrade (element_labels, element_references, traits, element_traits, classifications, association_classifications). After migrate-down, tenant schemas will have orphaned tables that 0001 doesn't know about.

⚠️ Issue: `create_library_arcs` has dead classification_assignment code

File: robosystems/operations/taxonomy_block/library_creator.py:360-367

counts: dict[str, int] = {
    ...
    "classification_assignments": 0,      # ← always stays 0
    "classification_assignments_skipped": 0,  # ← always stays 0
}

TaxonomyPackage has no classification_assignments field, jsonld_loader.py never populates one, and there is no corresponding ClassificationAssignmentSpec. The AssociationClassification junction exists in the schema but there's currently no seed-time loader path for it. These dead counters are harmless but misleading — and the phase-2 log message in the migration will always emit classification_assignments=0 regardless of what the seeds contain.

If AssociationClassification loading is intentionally deferred, a comment here would prevent future confusion.

📝 Stale docstrings

Location	Issue
`library_creator.py:126-128`	Docstring says "Classifications per package" — now Traits.
`library_creator.py:355`	Docstring says "Insert Associations + ClassificationAssignments" — now TraitAssignments.
`0002_taxonomy_library.py:797-800`	Comment says "element_classifications junction rows" — should be "element_traits".

✅ What looks good

Model design is clean. Trait (element-side, 25 categories) and Classification (association-side, 3 categories) are clearly documented and have correct CHECK constraints that mirror each other.
ElementTrait composite PK with a partial index on (element_id, is_primary) WHERE is_primary=true is the right index for EFS lookups.
efs_trait_by_element is the single source of truth for EFS lookups (elements.py:94) — this is the right abstraction and eliminates the N+1 pattern.
_bulk_resolve_element_ids avoids the O(N) round-trip problem for large packages — good performance hygiene.
Deterministic UUID5 key derivation ensures idempotent re-seeding.
Immutability triggers cover element_traits and traits in addition to structural tables — library-seeded trait data is correctly protected.
Migration downgrade() correctly handles public schema — FK-safe delete order (rules → junction → vocabulary → elements), then table drops.
Test coverage is updated across all affected files; the test_classifications_migration_shape.py static shape tests are a good pattern for this kind of plumbing.

…elated functions

…bases

…oject dependencies

jfrench9 added 4 commits April 25, 2026 18:16

feat: Update terminology from classifications to traits and enhance r…

78ffe49

…elated functions

Refactor code structure for improved readability and maintainability

28867cd

feat: Enhance S3 bucket lifecycle policies for graph backups and data…

04867cf

…bases

fix: Update robosystems-client version and source configuration in pr…

2e73165

…oject dependencies

jfrench9 merged commit 8cfa8eb into main Apr 26, 2026
7 checks passed

jfrench9 deleted the feature/element-trait-breakout branch April 26, 2026 00:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor classification terminology to traits across taxonomy system#641

Refactor classification terminology to traits across taxonomy system#641
jfrench9 merged 5 commits intomainfrom
feature/element-trait-breakout

jfrench9 commented Apr 25, 2026

Uh oh!

jfrench9 commented Apr 25, 2026

Uh oh!

claude Bot commented Apr 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jfrench9 commented Apr 25, 2026

Summary

Key Changes

New Models & Renamed Entities

Database Migration

API Layer Updates

Operations & Business Logic

Taxonomy Seeds & Loaders

Schema & Infrastructure

Breaking Changes

Testing Notes

Infrastructure Considerations

Uh oh!

jfrench9 commented Apr 25, 2026

Uh oh!

claude Bot commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Refactor classification terminology to traits

Summary

🐛 Bug: _update_efs_classification wipes all traits, not just EFS

⚠️ Bug: _RULE_COLS missing target_taxonomy_id

⚠️ Issue: Downgrade doesn't drop tenant library tables

⚠️ Issue: create_library_arcs has dead classification_assignment code

📝 Stale docstrings

✅ What looks good

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Apr 25, 2026 •

edited

Loading

🐛 Bug: `_update_efs_classification` wipes all traits, not just EFS

⚠️ Bug: `_RULE_COLS` missing `target_taxonomy_id`

⚠️ Issue: `create_library_arcs` has dead classification_assignment code