
Fix SEC historical provisioning schema resolution issues #352

Merged
jfrench9 merged 2 commits into main from bugfix/fix-sec-historical-provisioning
Feb 13, 2026



@jfrench9
Member

Summary

This PR resolves schema resolution and contextual schema loading issues in the SEC historical data provisioning process. The changes enhance the LadybugMaterializer component to properly handle schema resolution during materialization operations.

Key Accomplishments

  • Enhanced schema resolution logic in the SEC ingestion materialization processor
  • Improved contextual schema loading mechanisms to ensure proper data structure validation
  • Added robust error handling and fallback mechanisms for schema resolution failures
  • Strengthened the materialization pipeline's ability to handle diverse SEC data formats

Breaking Changes

None. This is a backward-compatible enhancement that improves existing functionality without altering public APIs.

Testing Notes

  • Verify that SEC historical data provisioning completes successfully without schema-related errors
  • Test materialization process with various SEC data formats to ensure proper schema resolution
  • Validate that existing materialization workflows continue to function as expected
  • Monitor for any performance impacts during schema loading operations

Infrastructure Considerations

  • The enhanced schema resolution may result in slightly increased memory usage during materialization
  • Consider monitoring materialization job execution times to ensure performance remains within acceptable bounds
  • No additional infrastructure dependencies or configuration changes required

🤖 Generated with Claude Code

Branch Info:

  • Source: bugfix/fix-sec-historical-provisioning
  • Target: main
  • Type: bugfix

Co-Authored-By: Claude <noreply@anthropic.com>

…oading

- Added logic to resolve schema from parent graph for subgraphs when no active schema is found.
- Implemented fallback to load contextual schema if no schema DDL is available, improving handling of repository names.
- Updated database creation parameters to include schema type and custom schema DDL based on new logic.
- Enhanced logging for better traceability during schema loading and database recreation.
@jfrench9
Member Author

@claude please review this PR

@claude

claude bot commented Feb 13, 2026

Claude finished @jfrench9's task in 1m 22s


PR Review

Review Tasks:

  • Read the modified file
  • Analyze the code changes
  • Provide comprehensive review feedback

Summary

This PR enhances the _rebuild_ladybug_database method in the LadybugMaterializer to properly handle schema resolution when the SEC repository database is being rebuilt. The changes add a multi-tier fallback mechanism for schema resolution that handles both regular graphs and subgraphs.

Changes Overview

The PR modifies robosystems/adapters/sec/processors/ingestion/materialization.py:668-770 by adding:

  1. Subgraph-aware schema resolution - Falls back to parent schema for subgraphs
  2. Contextual schema loader fallback - Uses get_contextual_schema_loader when database schema is not available
  3. Enhanced error handling - Better handling of missing schemas with clear logging
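
The three tiers listed above can be sketched roughly as follows. This is a minimal illustration, not the code in materialization.py: SchemaRecord, resolve_schema, the callable parameters, and default_schema_type are all hypothetical names introduced here, and the schema type used on the contextual path is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SchemaRecord:
    """Hypothetical stand-in for a stored schema row (not the real GraphSchema)."""
    schema_type: str
    schema_ddl: Optional[str]


def resolve_schema(
    graph_id: str,
    parent_graph_id: Optional[str],
    get_active_schema: Callable[[str], Optional[SchemaRecord]],
    load_contextual_ddl: Callable[[str], str],
    default_schema_type: str,
) -> Tuple[str, str, str]:
    """Return (source, schema_type, schema_ddl) via a three-tier fallback."""
    # Tier 1: the graph's own active schema.
    source = "own"
    schema = get_active_schema(graph_id)

    # Tier 2: a subgraph with no schema of its own inherits from its parent.
    if schema is None and parent_graph_id is not None:
        source = "parent"
        schema = get_active_schema(parent_graph_id)

    if schema is not None and schema.schema_ddl:
        return source, schema.schema_type, schema.schema_ddl

    # Tier 3: no stored DDL was found, so build it from the contextual loader,
    # keyed by the repository name (the parent for subgraphs).
    repo_name = parent_graph_id or graph_id
    return "contextual", default_schema_type, load_contextual_ddl(repo_name)
```

Passing the lookups in as callables keeps the sketch free of database and loader dependencies, which also makes each tier easy to exercise in isolation.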

Code Quality: ✅ Good

Strengths:

  • Clear separation of concerns with well-documented fallback logic
  • Proper logging at each fallback stage
  • Consistent with existing patterns in the codebase (similar logic likely exists in other graph creation paths)
  • No breaking changes to public APIs

Code Style:

  • Follows existing code formatting standards
  • Variable naming is clear and descriptive
  • Comments explain the "why" appropriately

Issues & Concerns

1. Import Placement (Minor)

The new imports are added inside the function:

from robosystems.middleware.graph.utils.subgraph import (
  is_subgraph,
  parse_subgraph_id,
)

This is consistent with the existing pattern in this function (other imports are also inside the function), so it's acceptable. However, moving frequently-used imports to module level could provide a small performance benefit.

2. Potential None Access (Medium Severity)

At line 724, there's a potential None access issue:

schema_ddl = schema.schema_ddl if schema else None

Then later at line 754:

"schema_type": schema_type if not schema else schema.schema_type,

If schema is not None but schema.schema_ddl is None or empty, we fall back to the contextual loader (good). However, at line 754, if schema exists but schema.schema_type is None, the expression does not raise; it passes None through as the schema type. Consider adding a null check:

"schema_type": (schema.schema_type if schema and schema.schema_type else schema_type),

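To make the difference between the two expressions concrete, here is a small self-contained comparison. The helper names and the SimpleNamespace objects are hypothetical stand-ins for the real schema model, which is assumed to be truthy whenever it is not None.

```python
from types import SimpleNamespace
from typing import Any, Optional


def schema_type_original(schema: Optional[Any], fallback: str) -> Optional[str]:
    """Mirrors the expression quoted from line 754."""
    return fallback if not schema else schema.schema_type


def schema_type_null_safe(schema: Optional[Any], fallback: str) -> str:
    """Mirrors the suggested null check: use the fallback when the type is unset."""
    return schema.schema_type if schema and schema.schema_type else fallback


# Three cases: no schema row, a row with no type, and a fully populated row.
no_schema = None
untyped = SimpleNamespace(schema_type=None)
typed = SimpleNamespace(schema_type="custom")
```

With these inputs, both helpers agree for no_schema and typed. They differ only for untyped, where the original expression returns None and the null-safe one returns the fallback.
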
3. Redundant Variable Assignment (Minor)

The subgraph_info variable is parsed twice (lines 714 and 732). Consider caching it:

subgraph_info = parse_subgraph_id(self.graph_id)

schema = GraphSchema.get_active_schema(self.graph_id, db)
if not schema and subgraph_info:
  schema = GraphSchema.get_active_schema(subgraph_info.parent_graph_id, db)
  # ... rest of logic

# Later, reuse the same variable instead of re-parsing
repo_name = subgraph_info.parent_graph_id if subgraph_info else self.graph_id

4. Missing Test Coverage (Critical)

The PR description mentions "Testing Notes" but doesn't reference actual test additions. For changes this significant to the schema resolution logic, I'd expect:

  • Unit tests for the schema resolution fallback chain
  • Tests for subgraph schema inheritance
  • Tests for the contextual schema loader fallback
  • Integration tests verifying the rebuild works for both regular graphs and subgraphs

Recommendation: Add tests before merging.
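
As a starting point, the fallback chain could be covered with mock-based unit tests along these lines. This is only a sketch: resolve_ddl is a hypothetical stand-in for the logic under test, not the real _rebuild_ladybug_database method, and real tests would need to patch whatever the method actually calls.

```python
import unittest
from unittest.mock import Mock, call


def resolve_ddl(graph_id, parent_id, get_active_schema, load_contextual):
    """Hypothetical stand-in for the fallback chain under test."""
    schema = get_active_schema(graph_id)
    if schema is None and parent_id is not None:
        schema = get_active_schema(parent_id)
    if schema is not None and schema.schema_ddl:
        return schema.schema_ddl
    return load_contextual(parent_id or graph_id)


class SchemaFallbackTests(unittest.TestCase):
    def test_subgraph_inherits_parent_schema(self):
        parent_schema = Mock(schema_ddl="PARENT DDL")
        # First lookup (the subgraph) misses, second lookup (the parent) hits.
        get_active = Mock(side_effect=[None, parent_schema])
        load_contextual = Mock()

        ddl = resolve_ddl("sec_sub", "sec", get_active, load_contextual)

        self.assertEqual(ddl, "PARENT DDL")
        self.assertEqual(get_active.call_args_list, [call("sec_sub"), call("sec")])
        load_contextual.assert_not_called()

    def test_contextual_loader_used_when_no_ddl(self):
        get_active = Mock(return_value=None)
        load_contextual = Mock(return_value="CONTEXTUAL DDL")

        ddl = resolve_ddl("sec", None, get_active, load_contextual)

        self.assertEqual(ddl, "CONTEXTUAL DDL")
        load_contextual.assert_called_once_with("sec")
```

Saved in a test module, these run with python -m unittest. Asserting on call order and on the loader not being called checks the fallback order itself, not just the final value.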

5. Logging Clarity (Minor)

At line 747, the log message could be more specific:

logger.info(
  f"Loaded contextual schema for {self.graph_id} "
  f"({len(loader.nodes)} nodes, {len(loader.relationships)} relationships)"
)

Consider mentioning this is a fallback path:

logger.info(
  f"No database schema found - falling back to contextual schema loader for {self.graph_id} "
  f"({len(loader.nodes)} nodes, {len(loader.relationships)} relationships)"
)

6. Repository Name Resolution (Question)

At lines 732-733:

subgraph_info = parse_subgraph_id(self.graph_id)
repo_name = subgraph_info.parent_graph_id if subgraph_info else self.graph_id

Is it guaranteed that self.graph_id is a valid repository name when it's not a subgraph? For the SEC use case, the materializer is initialized with graph_id="sec" (line 56 default), so this seems fine. But the comment says "Resolve the repository name" - should there be validation that the resolved name is actually a known repository?
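
If validation turns out to be worthwhile, one possible shape is sketched below. The function name and the known_repositories argument are hypothetical; where the list of valid repository names would actually come from is not shown in this PR.

```python
from typing import Iterable, Optional


def resolve_repo_name(
    graph_id: str,
    parent_graph_id: Optional[str],
    known_repositories: Iterable[str],
) -> str:
    """Pick the repository name and fail fast if it is not recognised."""
    # Subgraphs resolve to their parent; regular graphs resolve to themselves.
    repo_name = parent_graph_id or graph_id
    known = set(known_repositories)
    if repo_name not in known:
        raise ValueError(
            f"{repo_name!r} is not a known repository (expected one of {sorted(known)})"
        )
    return repo_name
```

Failing here, before the contextual loader is invoked, would surface a misconfigured graph_id as a clear error instead of a schema-loading failure later in the rebuild.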

Performance Considerations

The PR description mentions "slightly increased memory usage during materialization" - this seems accurate given:

  1. Additional schema parsing/compilation when fallback is triggered
  2. Temporary storage of the compiled schema object

For the SEC historical provisioning use case, this should be negligible since it's a one-time operation during rebuild.

Security Considerations

No security concerns identified. The changes:

  • Don't introduce user input validation issues
  • Don't expose sensitive data
  • Use existing validated utilities (parse_subgraph_id, get_contextual_schema_loader)

Recommendations

  1. Add test coverage (Critical) - See issue #4 above
  2. Add null safety for schema.schema_type access (Medium) - See issue #2 above
  3. Cache subgraph_info to avoid redundant parsing (Minor optimization)
  4. Improve logging to clarify when fallback paths are taken (Minor)

Approval Status

Conditional Approval - The logic is sound and solves the stated problem, but I recommend:

  1. Adding test coverage before merging
  2. Addressing the potential None access issue at line 754

The changes are backward-compatible and align with the architectural patterns in the codebase. Once tests are added, this should be good to merge.


| Branch: bugfix/fix-sec-historical-provisioning

- Updated schema resolution to clarify the distinction between parent graphs and contextual loaders.
- Simplified the logic for determining the active schema and improved logging for better traceability.
- Adjusted database creation parameters to ensure correct schema type handling based on the new resolution logic.
@jfrench9 jfrench9 merged commit a43f671 into main Feb 13, 2026
7 checks passed
@jfrench9 jfrench9 deleted the bugfix/fix-sec-historical-provisioning branch February 14, 2026 04:28