feat: classify JSON export schema field mismatches, fix validator view mappings, duplicate field handling, and array type parsing, and add implementation roadmap by Copilot · Pull Request #8528 · Hack23/cia

Copilot · 2026-04-05T20:31:00Z

Description

The 5 JSON export schemas define 168 fields but only 24 exist in database sample data (144 mismatches). Analysis reveals most "mismatches" are either JSON structural grouping objects (67) or fields derivable from existing DB columns (42) — not true gaps. This PR classifies every field into 4 validator-backed categories (implemented/structural/computed/planned), annotates all schema docs, enhances the validator to report status with fully deterministic output, fixes a critical view-mapping bug, a duplicate field-name handling bug, and an array type parsing bug in the validator, and provides a prioritized implementation roadmap.

A critical bug was discovered and fixed in validate_schemas.py: the ministry schema_to_views mapping used riksdagen_government (English spelling) but the actual CSV files and DB views use riksdagen_goverment (matching the database identifiers). This caused the validator to miss 6 ministry CSV data files entirely. Additional missing view patterns were added for committee (committee_productivity) and ministry (ministry_productivity_matrix, ministry_risk_evolution), increasing implemented fields from 21 to 24.
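A minimal sketch of why the spelling mattered, assuming the validator matches sample files by substring against per-schema view patterns. The `SCHEMA_TO_VIEWS` layout, the `match_sample_files` helper, and the `_sample.csv` file names are hypothetical; only the `riksdagen_goverment` spelling and the two ministry pattern names come from this PR.

```python
# Hypothetical layout: schema name -> substrings expected in sample file names.
SCHEMA_TO_VIEWS = {
    "ministry": [
        "riksdagen_goverment",            # DB spelling (sic), not "government"
        "ministry_productivity_matrix",
        "ministry_risk_evolution",
    ],
}


def match_sample_files(schema, file_names, mapping=SCHEMA_TO_VIEWS):
    """Return the sample files whose name contains any view pattern of the schema."""
    patterns = mapping.get(schema, [])
    return sorted(name for name in file_names
                  if any(pattern in name for pattern in patterns))


# Hypothetical sample file names, for illustration only.
files = [
    "view_riksdagen_goverment_sample.csv",
    "view_riksdagen_goverment_proposals_sample.csv",
    "view_ministry_risk_evolution_sample.csv",
    "view_riksdagen_party_sample.csv",
]

# The English spelling is not a substring of the DB spelling, so nothing matches.
misspelled = {"ministry": ["riksdagen_government"]}
print(match_sample_files("ministry", files, misspelled))
print(match_sample_files("ministry", files))
```

With the English spelling the ministry schema matches no files in this toy data; with the DB spelling it picks up the three ministry files.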

A second bug was fixed in the mermaid field parser: duplicate field names across different classes/paths (e.g., politician has both +String party and +PartyLink party) would silently overwrite each other. Non-scalar types (PartyLink, Trends, VoteAlignment, etc.) are now tracked using composite keys (e.g., PartyLink:party) and are always classified as STRUCTURAL. A SCALAR_TYPES constant distinguishes scalar field types from non-scalar/link types. The PartyLink party entry was removed from politician-schema-validated.md since CSV sample data validates scalar columns, not nested link objects.
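The overwrite and its fix can be sketched as follows. This is an illustration under stated assumptions, not the validator's actual code: `parse_fields` is a hypothetical helper, and it uses the pre-array-fix regex because the two fixes are independent.

```python
import re

SCALAR_TYPES = {"String", "Integer", "Float", "Long", "Double", "Boolean",
                "Date", "DateTime", "int", "float", "string", "boolean", "date"}


def parse_fields(diagram):
    """Map field keys to mermaid types, using Type:name keys for non-scalar types."""
    fields = {}
    for match in re.finditer(r'\+(\w+)\s+(\w+)', diagram):
        field_type, name = match.group(1), match.group(2)
        # Scalar fields keep the plain name; non-scalar fields get a composite
        # key so they cannot overwrite a same-named scalar field.
        key = name if field_type in SCALAR_TYPES else f"{field_type}:{name}"
        fields[key] = field_type
    return fields


diagram = """
class Politician {
    +String party
    +PartyLink party
}
"""
fields = parse_fields(diagram)
print(fields)
```

Keyed by name alone, the second `party` entry would replace the first; with composite keys both survive as `party` and `PartyLink:party`.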

A third bug was fixed in the mermaid field regex: the pattern \+(\w+)\s+(\w+) did not match array type forms like String[] or CommitteeLink[] (neither \w+ nor the following \s+ can consume the []), so array fields in schema diagrams were silently skipped and never classified or validated. The regex now captures optional [] notation, and the extracted base type is normalized for the scalar check (e.g., String[] → String is still scalar, CommitteeLink[] remains non-scalar/STRUCTURAL).
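A short sketch of the fixed pattern, using the regex and the `removesuffix("[]")` normalization quoted later in this thread. The `is_scalar` helper and the `MERMAID_FIELD` name are illustrative assumptions, and `str.removesuffix` requires Python 3.9 or newer.

```python
import re

SCALAR_TYPES = {"String", "Integer", "Float", "Long", "Double", "Boolean",
                "Date", "DateTime", "int", "float", "string", "boolean", "date"}

# Regex as reported in the review reply: the type may carry an optional "[]".
MERMAID_FIELD = re.compile(r'\+(\w+(?:\[\])?)\s+(\w+)')


def is_scalar(field_type):
    """Strip array notation before the scalar lookup (hypothetical helper)."""
    return field_type.removesuffix("[]") in SCALAR_TYPES


diagram = """
+String[] subcategories
+CommitteeLink[] committees
+Float strengthScore
"""
parsed = [(t, n, is_scalar(t)) for t, n in MERMAID_FIELD.findall(diagram)]
for row in parsed:
    print(row)
```

The sketch only covers parsing and the scalar check; which status the validator finally assigns to a scalar array field is decided elsewhere and is not modeled here.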

Additionally, meetings was removed from the committee structural fields list in FIELD_MAPPING.md — it was contradictorily listed as both PLANNED and STRUCTURAL, and the validator classifies it as planned. All view/file lists (matched_views, missing_views) are now sorted before writing to validation-results.json for fully deterministic output across runs.
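As a rough illustration of the determinism change, sorting each list before serialization makes the JSON independent of load order. The `build_schema_result` helper and the view names below are hypothetical.

```python
import json


def build_schema_result(matched_views, missing_views):
    """Sort view lists so repeated runs serialize identically (hypothetical helper)."""
    return {
        "matched_views": sorted(matched_views),
        "missing_views": sorted(missing_views),
    }


# Two runs that discover the same views in a different order.
run_a = build_schema_result({"view_b", "view_a"}, ["view_d", "view_c"])
run_b = build_schema_result(["view_a", "view_b"], ["view_c", "view_d"])

print(json.dumps(run_a, indent=2) == json.dumps(run_b, indent=2))
```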

Field classifications distinguish between nested object types (e.g., +Trends trend, +VoteAlignment alignment, +Productivity productivity) which are STRUCTURAL, and leaf metrics (e.g., +Integer committeeChairs, +Float strengthScore) which are COMPUTED or PLANNED. All doc table classifications — including validated schema files, FIELD_MAPPING.md per-schema sections, and summary reports — are fully reconciled with validation-results.json output across 9 rounds of review alignment. All structural JSON grouping fields consistently use ❌ STRUCTURAL labeling (not DEPRECATED) across all documentation sections. Ministry view references use the actual DB view identifiers (view_riksdagen_goverment*) throughout all sections including the Top 25 implementation roadmap table.

Type of Change

Primary Changes

🔄 Enhancement
🐛 Bug Fix

Political Analysis

Technical Changes

📝 Documentation
- Technical Documentation
✅ Testing
- Unit Tests

Impact Analysis

Political Analysis Impact

Impact on data quality: Field-level implementation status now tracked across all 4 political entity schemas with corrected view mappings, accurate non-scalar field tracking, and complete array field parsing
Impact on analysis accuracy: Clear distinction between implemented (24), structural (67), computed (42), and planned (35) fields — all backed by validator output with corrected ministry CSV file matching, composite-key deduplication, and array type handling
Impact on transparency features: Validated schemas now classify fields with accurate counts; ministry schema now correctly matches 8 CSV data files (was 2 due to view-mapping bug); non-scalar link types no longer silently overwrite same-named scalar fields; array fields are no longer silently skipped

Technical Impact

Performance impact: None — documentation and validation tooling only
Security implications: None
Dependency changes: None

Testing

Unit tests added/updated
Political data validation completed

23 tests pass (5 new/updated):

test_structural_fields_defined — verifies STRUCTURAL_FIELDS constant (including trend, alignment)
test_computed_fields_defined — verifies COMPUTED_FIELDS constant (including id, committeeChairs, strengthScore)
test_field_status_in_validation_results — verifies results structure with all 4 categories
test_field_classification_with_data — end-to-end classification with aggregation across structural/computed/planned
test_mismatch_includes_status — verifies STRUCTURAL/COMPUTED/PLANNED status in mismatch entries

Documentation

Package/module documentation updated
Political analysis documentation updated

Key changes:

validate_schemas.py — Fixed critical ministry view-mapping bug (riksdagen_government → riksdagen_goverment matching actual DB spelling). Fixed duplicate field-name handling: non-scalar types (PartyLink, Trends, VoteAlignment, etc.) now use composite keys (Type:name) to prevent overwriting same-named scalar fields. Fixed mermaid regex to match Type[] array notation; base type is normalized for scalar check (String[] → String is scalar, CommitteeLink[] remains non-scalar). Added SCALAR_TYPES constant to distinguish scalar vs non-scalar field types; non-scalar types are always classified as STRUCTURAL. Added missing view patterns: committee_productivity, ministry_productivity_matrix, ministry_risk_evolution. Enhanced with 3-way mismatch classification, fully deterministic output (sorted schema_fields, unmapped_data_columns, matched_views, missing_views), and dynamic file counts:
```
SCALAR_TYPES = {"String", "Integer", "Float", "Long", "Double", "Boolean",
                "Date", "DateTime", "int", "float", "string", "boolean", "date"}
STRUCTURAL_FIELDS = {"attributes", "labels", "relationships", "intelligence",
                     "trend", "alignment", "productivity", "performance",
                     "budget", "predictions", ...}  # 32 entries
COMPUTED_FIELDS = {"fullName", "partyLoyalty", "totalMembers", "performanceScore",
                   "id", "committeeChairs", "strengthScore", ...}  # 31 entries
```
FIELD_MAPPING.md — Every field now has ✅ IMPLEMENTED / 🔀 COMPUTED / 🔄 PLANNED / ❌ STRUCTURAL status. Status legend maps directly to validator output categories. Validation Summary table aligned with validator output (67/42/35). Per-schema sections reconciled with validation-results.json: politician implemented fields include riskScore, riskLevel, attendanceRate; party implemented includes totalVotes; committee implemented includes regularMembers, totalMembers; ministry implemented includes name, id. All structural/computed/planned lists match validator per-schema breakdowns. meetings removed from committee structural list (it's PLANNED per validator). Committee and ministry structural field sections consistently use ❌ STRUCTURAL labeling. Ministry view references corrected to actual DB identifiers (view_riksdagen_goverment*) across all sections including the Top 25 implementation roadmap table. Mismatch reduction strategy table uses 67 structural fields with cascading count adjustments. Validation exclusions count updated to 67 structural fields.
4 schema files (politician-, party-, committee-, ministry-schema.md) — Added 📋 Field Implementation Status section after each mermaid diagram. Nested object types correctly classified as STRUCTURAL. Leaf metrics classified as COMPUTED. Non-scalar link types (e.g., MinistryLink:ministry) reclassified as STRUCTURAL. Array types (e.g., CommitteeMembership[] current/historical, ElectionHistory[] history, RegionalSupport[] regions, CommitteeLink[] committees) classified as STRUCTURAL per validator logic. Politician schema: current/historical reclassified from PLANNED to STRUCTURAL (non-scalar CommitteeMembership[] arrays); field count note updated to match validator's 55 fields_defined. Party schema: subcategories/intelligenceTags reclassified from PLANNED to STRUCTURAL (array types per validator); history/regions reclassified from PLANNED to STRUCTURAL (non-scalar ElectionHistory[]/RegionalSupport[] arrays); field count note updated to match validator's 51 fields_defined. Summary counts in both schemas reconciled with per-schema breakdowns in validation-results.json. Committee schema: regularMembers/totalMembers now IMPLEMENTED (found in view_committee_productivity). Ministry schema: id now IMPLEMENTED (found in view_riksdagen_goverment_proposals).
4 validated schemas (*-validated.md) — Field counts aligned with validation-results.json per-schema breakdowns. PartyLink party removed from politician-schema-validated.md (CSV validates scalars only; scalar String party retained). party-schema-validated.md mermaid diagram updated to include totalVotes. Duplicate ## Migration Notes headings removed.
SCHEMA_VALIDATION_REPORT.md — Uses dynamic files_analyzed count. Total Schema Mismatches corrected to 144 (168 − 24 implemented). Field Status Summary shows full 4-category breakdown with corrected numbers: 24 implemented, 67 structural, 42 computed, 35 planned. Checklist counts updated to match validator output. Mismatch resolution strategy section updated.
VALIDATED_SCHEMAS_SUMMARY.md — Numbers aligned with actual validator output (24 implemented, 67 structural, 42 computed, 35 planned).
.gitignore — Added __pycache__/, *.pyc, and .pytest_cache/ to prevent committing Python build artifacts.
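The constants quoted in the validate_schemas.py entry suggest a lookup-based classifier. The sketch below is one plausible reading, not the validator's code: the `classify` function, the order of its checks, and the shortened constant sets are assumptions, and `data_columns` is assumed to hold names already normalized to schema field names.

```python
# Shortened, illustrative subsets; the full sets are elided in the PR description.
STRUCTURAL_FIELDS = {"attributes", "labels", "trend", "alignment"}
COMPUTED_FIELDS = {"fullName", "totalMembers", "committeeChairs", "strengthScore"}


def classify(key, data_columns):
    """Return one of the four status categories for a schema field key."""
    if ":" in key:
        return "structural"      # composite Type:name key means a non-scalar type
    if key in data_columns:
        return "implemented"     # present in sample data wins over the lookups
    if key in STRUCTURAL_FIELDS:
        return "structural"
    if key in COMPUTED_FIELDS:
        return "computed"
    return "planned"             # everything else needs a new data source


print(classify("PartyLink:party", set()))
print(classify("totalMembers", {"totalMembers"}))
print(classify("totalMembers", set()))
print(classify("meetings", set()))
```

Checking sample data before the lookup sets is what lets a field such as totalMembers count as implemented for one schema and computed for another.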

Screenshots

N/A

Related Issues

Related to #893

Checklist

Code follows project coding standards
Comments are clear and helpful
Documentation is updated
Tests are passing
Security compliance is maintained
Performance impact is acceptable
Breaking changes are documented
Changes are backward compatible

Additional Notes

Mismatch breakdown from validator output:

- 67 structural: JSON grouping objects like attributes, labels, trend, alignment, productivity, budget, predictions; non-scalar link types like PartyLink:party, MinistryLink:ministry; and array link types like CommitteeLink[]:committees, CommitteeMembership[]:current, ElectionHistory[]:history, RegionalSupport[]:regions
- 42 computed: derivable fields like fullName, totalMembers, committeeChairs, strengthScore, established
- 35 planned: fields needing new data sources

Together these make 144 total mismatches (24 fields implemented, 168 total fields).

The critical view-mapping fix corrected the ministry schema_to_views mapping from riksdagen_government to riksdagen_goverment (matching the actual DB view spelling in full_schema.sql), which unlocked 6 additional ministry CSV data files. The composite-key fix prevents non-scalar types from silently overwriting same-named scalar fields; non-scalar types are tracked as Type:name keys and always classified as STRUCTURAL. The array regex fix ensures Type[] fields (e.g., String[], CommitteeLink[]) are parsed and classified rather than silently skipped. Committee gained 2 implemented fields (regularMembers, totalMembers from view_committee_productivity) and ministry gained 1 (id from view_riksdagen_goverment_proposals).

All documentation tables across validated schemas, FIELD_MAPPING.md per-schema sections, schema field implementation status tables, and summary reports reflect these validator-verified numbers. Per-schema field implementation status tables in politician-schema.md and party-schema.md reconciled: non-scalar array types (CommitteeMembership[], ElectionHistory[], RegionalSupport[]) correctly classified as STRUCTURAL; scalar array types (String[] for subcategories/intelligenceTags) also classified as STRUCTURAL per validator output; field count notes updated to match validator fields_defined totals (55 for politician, 51 for party).
All view/file lists are sorted for fully deterministic output across runs. All structural JSON grouping fields consistently use ❌ STRUCTURAL labeling across all documentation sections. Ministry source view references use the actual DB identifiers (view_riksdagen_goverment, view_riksdagen_goverment_role_member) throughout all documentation sections including the Top 25 implementation roadmap table.
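The category counts above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones reported in this PR.

```python
# Category counts as reported by the validator in this PR.
counts = {"implemented": 24, "structural": 67, "computed": 42, "planned": 35}

total_fields = sum(counts.values())
mismatches = total_fields - counts["implemented"]

print(total_fields, mismatches)
```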

Security Considerations

No sensitive data exposed
Security best practices followed
Compliance requirements met

Release Notes

Fixed critical validator view-mapping bug (ministry riksdagen_government → riksdagen_goverment matching actual DB spelling), fixed duplicate field-name handling (non-scalar types now use composite Type:name keys and are classified as STRUCTURAL), fixed mermaid regex to parse Type[] array fields (previously silently skipped), removed contradictory meetings from committee structural list (it's PLANNED per validator), sorted all view/file lists for fully deterministic output, and added missing view patterns (committee_productivity, ministry_productivity_matrix, ministry_risk_evolution). Removed PartyLink party from politician validated schema (CSV validates scalars only). Reconciled all per-schema field implementation status tables with validator output — non-scalar array types (CommitteeMembership[], ElectionHistory[], RegionalSupport[]) and scalar array types (String[]) correctly classified as STRUCTURAL; field count notes aligned with validator fields_defined totals; total mismatches header corrected to 144. Updated validation results: 144 mismatches (168 total fields), 24 implemented, 67 structural, 42 computed, 35 planned. Classified all JSON export schema field mismatches into 4 validator-backed categories, added field status annotations to all schema and validated schema documentation, fully reconciled all per-schema field lists and counts with validation-results.json output across 9 rounds of review alignment, ensured consistent STRUCTURAL labeling across all documentation sections, added .gitignore for Python artifacts, and created a prioritized 25-field implementation roadmap.

github-actions · 2026-04-05T20:31:30Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 1ad51e8.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-05T21:21:52Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-05T21:21:51.812855+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 121

Schema Status

| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 12 | 31 | ⚠️ REVIEW |
| Party | 42 | 17 | 38 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (121 mismatches documented).

Add implementation status (IMPLEMENTED/COMPUTED/PLANNED/DEPRECATED) for all 142 fields across 5 schemas based on validation run showing 121 mismatches.

Key additions:
- Status column with icons for every field mapping table
- Validation summary showing 21 implemented, 43 computed, 28 planned, 45 deprecated structural grouping fields
- Top 25 High-Value Fields prioritization with effort estimates
- Mismatch resolution strategy reducing 121 gaps to 18 true gaps
- Per-schema unmapped DB columns with priority recommendations
- Implementation roadmap (3 sprints)
- Field status promotion lifecycle documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Add a Field Implementation Status section after the mermaid schema diagram in politician-schema.md. The table categorizes all 55 fields from the class diagram into four statuses:
- IMPLEMENTED (11): fields available in current database exports
- COMPUTED (13): fields derivable from existing data
- PLANNED (11): fields requiring new data sources
- STRUCTURAL (20): JSON grouping objects

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

…otations, and enhanced validator Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-05T21:39:09Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-05T21:39:09.005242+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 121

Schema Status

| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 12 | 31 | ⚠️ REVIEW |
| Party | 42 | 17 | 38 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (121 mismatches documented).

…rrent analysis

- Update version to 2.0.0 / 2026-04-05
- Update Overall Assessment table with per-schema coverage percentages
- Add Field Status Summary (21 implemented, 50 structural, 71 planned)
- Add Mismatch Resolution Strategy (3-category classification)
- Add Top 25 Priority Fields for Implementation
- Add Remediation Roadmap (3 sprints with effort/impact tracking)
- Update Validation Checklist with current status

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

…eld status tests Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-05T21:46:18Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-05T21:46:16.742709+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 121

Schema Status

| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 12 | 31 | ⚠️ REVIEW |
| Party | 42 | 17 | 38 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (121 mismatches documented).

…d aggregation test Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-05T21:48:29Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-05T21:48:28.196458+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 121

Schema Status

| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 45 | 12 | 31 | ⚠️ REVIEW |
| Party | 42 | 17 | 38 | ⚠️ REVIEW |
| Committee | 26 | 10 | 24 | ⚠️ REVIEW |
| Ministry | 29 | 2 | 28 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (121 mismatches documented).

Copilot · 2026-04-06T13:12:20Z

@copilot apply changes based on the comments in this thread

Applied all 3 review comments in 8f558ea:

Duplicate field name handling — Non-scalar mermaid types (PartyLink, Trends, VoteAlignment, etc.) now use composite keys (Type:name) to prevent overwriting same-named scalar fields. Added SCALAR_TYPES constant. Non-scalar types are always classified as STRUCTURAL automatically.
PartyLink party in validated schema — Removed PartyLink party from politician-schema-validated.md mermaid diagram and field descriptions (CSV can only validate scalar values). Scalar String party retained with note about the distinction.
__pycache__ cleanup — Confirmed no __pycache__ files are tracked in git. Added .pytest_cache to .gitignore.

Updated counts: 144 total fields, 24 implemented, 56 structural, 42 computed, 22 planned, 120 mismatches. All 23 tests pass. CodeQL clean.

Copilot

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

Copilot · 2026-04-06T15:50:43Z

json-export-specs/validate_schemas.py

        # Pattern matches: +Type fieldName
+        # Use composite key (Type:fieldName) to avoid duplicate-name overwrites
        mermaid_pattern = r'\+(\w+)\s+(\w+)'
        for match in re.finditer(mermaid_pattern, content):
            field_type = match.group(1)
            field_name = match.group(2)


The mermaid field regex (\+(\w+)\s+(\w+)) doesn’t match common type forms like String[] / CommitteeLink[] (the [] breaks the \s+ match), so array fields in the schema diagrams are silently skipped and never classified/validated. This makes fields_defined/mismatch counts incomplete for several schemas. Update the regex to capture optional [] (and possibly generic forms), and normalize the extracted type so scalar arrays (e.g., String[]) are still treated as scalar while object arrays remain non-scalar.
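The skip described in this comment is easy to reproduce with the pattern from the excerpt. The three-line diagram below is made up for illustration.

```python
import re

# Pattern from the excerpt above, before the array fix.
OLD_PATTERN = r'\+(\w+)\s+(\w+)'

diagram = (
    "+String name\n"
    "+String[] subcategories\n"
    "+CommitteeLink[] committees\n"
)

# After the type name, \s+ needs whitespace but finds "[", so array lines never match.
old_matches = re.findall(OLD_PATTERN, diagram)
print(old_matches)
```

Only the plain `String name` line is found; both array fields drop out without any warning.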

Copilot · 2026-04-06T15:50:44Z

json-export-specs/FIELD_MAPPING.md

+| `decisions` | Container section for decision data | ❌ **STRUCTURAL** |
+| `intelligence` | Container section for intelligence data | ❌ **STRUCTURAL** |
+| `policy` | Container section for policy data | ❌ **STRUCTURAL** |
+| `meetings` | Container section for meeting data (when structural) | ❌ **STRUCTURAL** |


meetings is listed as a PLANNED data field above, but also appears under “Structural Fields” as a JSON grouping container. This is contradictory and conflicts with the validator output (where meetings is a planned mismatch field). Remove meetings from the structural list (or rename the structural container if you intended a separate grouping object).

Copilot · 2026-04-06T15:50:44Z

json-export-specs/validate_schemas.py

        # Map schema fields to database columns (convert camelCase to snake_case)
-        schema_fields = set(schema_info["fields"].keys())
+        # Sort for deterministic output order
+        schema_fields = sorted(schema_info["fields"].keys())
+
+        # Initialize field status tracking
+        field_status = {
+            "implemented": [],
+            "structural": [],
+            "computed": [],
+            "planned": []
+        }

        # Check if schema fields exist in data (with flexible matching)
        unmapped_schema_fields = []
-        unmapped_data_columns = list(all_columns)
-
-        for field in schema_fields:
-            # Try various naming conventions (using set to avoid duplicates)
-            possible_names = list({
-                field,
-                self._camel_to_snake(field),
-                field.lower(),
-                field.upper()
-            })
+        unmapped_data_columns = sorted(all_columns)


You’ve made field-related output deterministic by sorting schema_fields/unmapped_data_columns, but the overall report/JSON still includes lists like matched_views/missing_views whose ordering depends on CSV load/iteration order. For truly stable output (to avoid noisy diffs across runs), consider sorting those view/file lists before writing them into schema_result and validation-results.json.
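The excerpt above refers to a `_camel_to_snake` helper and to trying several naming conventions per schema field. A minimal sketch of that idea follows; the helper bodies are guesses for illustration, not the validator's implementation, and the result is sorted in the same spirit as the determinism change.

```python
import re


def camel_to_snake(name):
    """Insert "_" before an upper-case letter that follows a lower-case letter or digit."""
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', name).lower()


def possible_names(field):
    """Candidate column names for a schema field, de-duplicated and sorted."""
    return sorted({field, camel_to_snake(field), field.lower(), field.upper()})


print(camel_to_snake("totalMembers"))
print(possible_names("riskScore"))
print(possible_names("id"))
```

Building the candidates in a set removes duplicates such as `id` appearing twice, and sorting the set keeps the iteration order stable between runs.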

Copilot · 2026-04-06T15:50:44Z

json-export-specs/SCHEMA_VALIDATION_REPORT.md

+- [x] Structural fields (53) documented as JSON grouping objects
+- [x] Computable fields (42) mapped to source DB columns
+- [ ] Planned fields (23) tracked with data source requirements


The checklist counts appear out of sync with the validator output in validation-results.json: structural is now 56 (not 53) and planned is 22 (not 23). Update these numbers (and any related text) so the checklist stays consistent with the reported field_status_summary totals.

Suggested change

Before:
- [x] Structural fields (53) documented as JSON grouping objects
- [x] Computable fields (42) mapped to source DB columns
- [ ] Planned fields (23) tracked with data source requirements

After:
- [x] Structural fields (56) documented as JSON grouping objects
- [x] Computable fields (42) mapped to source DB columns
- [ ] Planned fields (22) tracked with data source requirements

Copilot · 2026-04-06T15:50:45Z

json-export-specs/FIELD_MAPPING.md

+`validate_schemas.py` currently reports the raw mismatch total and does **not** yet exclude all non-implemented categories from its counts.
+
+The following categories are candidates for exclusion from mismatch counts in a future validator enhancement:
+- All ❌ **STRUCTURAL** JSON grouping fields (53 fields)


The “Validation Exclusions” bullet still says structural = 53 fields, but the validator summary in validation-results.json (and this PR’s docs elsewhere) reports 56 structural fields. Update this count (and verify planned/computed counts too) to keep the mismatch math consistent.

Suggested change

Before:
- All ❌ **STRUCTURAL** JSON grouping fields (53 fields)

After:
- All ❌ **STRUCTURAL** JSON grouping fields (56 fields)

pethers · 2026-04-06T16:04:44Z

@copilot apply changes based on the comments in this thread

…ural list, sort view lists, update all counts

- Fix mermaid regex to match Type[] array notation (String[], CommitteeLink[], etc.)
- Normalize base type for scalar check (String[] → String is still scalar)
- Remove meetings from committee structural fields (it's PLANNED, not structural)
- Sort matched_views and missing_views lists for deterministic output
- Update SCHEMA_VALIDATION_REPORT.md checklist: structural 56→67, planned 22→35
- Update FIELD_MAPPING.md validation exclusions: structural 56→67
- Update all documentation counts: 168 total fields, 24 implemented, 67 structural, 42 computed, 35 planned, 144 mismatches
- Re-run validator to regenerate validation-results.json and auto-generated report
- All 23 tests pass

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/dafa1381-0a99-4091-bcd6-ad65a11dc1eb
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-06T16:19:54Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-06T16:19:53.438014+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 144

Schema Status

| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 55 | 12 | 41 | ⚠️ REVIEW |
| Party | 51 | 17 | 47 | ⚠️ REVIEW |
| Committee | 29 | 12 | 25 | ⚠️ REVIEW |
| Ministry | 33 | 8 | 31 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (144 mismatches documented).
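As a cross-check, the per-schema rows in this report add up to the totals quoted in the PR description. The dictionary below simply copies the Fields and Mismatches columns; the variable names are illustrative.

```python
# (fields, mismatches) per schema, copied from the Schema Status table above.
latest = {
    "Politician": (55, 41),
    "Party": (51, 47),
    "Committee": (29, 25),
    "Ministry": (33, 31),
    "Intelligence": (0, 0),
}

total_fields = sum(fields for fields, _ in latest.values())
total_mismatches = sum(mismatches for _, mismatches in latest.values())
implemented = total_fields - total_mismatches

print(total_fields, total_mismatches, implemented)
```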

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/dafa1381-0a99-4091-bcd6-ad65a11dc1eb Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-06T16:21:00Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-06T16:20:59.465024+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 144

Schema Status

| Schema | Fields | Views Matched | Mismatches | Status |
|---|---|---|---|---|
| Politician | 55 | 12 | 41 | ⚠️ REVIEW |
| Party | 51 | 17 | 47 | ⚠️ REVIEW |
| Committee | 29 | 12 | 25 | ⚠️ REVIEW |
| Ministry | 33 | 8 | 31 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (144 mismatches documented).

Copilot · 2026-04-06T16:21:05Z

@copilot apply changes based on the comments in this thread

Applied all 5 review comments in 331d769 and 1683756:

Mermaid regex — Updated to r'\+(\w+(?:\[\])?)\s+(\w+)' to capture Type[] array notation. Scalar arrays (String[]) normalized via removesuffix("[]") to match SCALAR_TYPES; non-scalar arrays (CommitteeLink[]) are classified as STRUCTURAL. Detected 24 previously-skipped array fields.
meetings conflict — Removed from committee structural fields table (kept as PLANNED only). Heading count updated 8→7.
Deterministic view lists — matched_views and missing_views sorted before writing to schema_result and JSON output.
SCHEMA_VALIDATION_REPORT.md checklist — Updated: structural 53→67, planned 23→35.
FIELD_MAPPING.md exclusions — Updated: structural 53→67.

Updated counts: 168 total fields, 24 implemented, 67 structural, 42 computed, 35 planned, 144 mismatches. All 23 tests pass. CodeQL clean.

Copilot

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.

Copilot · 2026-04-06T16:53:08Z

json-export-specs/schemas/politician-schema.md

+| **Committees** | | | |
+| `current` | 🔄 PLANNED | Committees | Requires committee membership data |
+| `historical` | 🔄 PLANNED | Committees | Requires historical committee data |
+| `chairPositions` | 🔄 PLANNED | Committees | Requires chair position tracking |
+


The current/historical committee membership fields are shown as 🔄 PLANNED, but in the mermaid diagram they are non-scalar array types (+CommitteeMembership[] current/historical). The validator logic classifies all non-scalar types as STRUCTURAL (and emits composite keys like CommitteeMembership[]:current), so these rows (and the Status Summary counts) should be reconciled to avoid contradicting validation-results.json.

Copilot · 2026-04-06T16:53:09Z

json-export-specs/schemas/politician-schema.md

+| 🔄 PLANNED | 8 | Fields requiring new data sources or engine integration |
+| ❌ STRUCTURAL | 22 | JSON grouping objects and non-scalar types, not individual data fields |
+
+> **Note:** The table above documents all 55 fields from the mermaid diagram (14+11+8+22). The validator tracks 46 of these (14 impl + 19 struct + 11 comp + 2 plan); 9 nested class fields are only documented in this table. See `validation-results.json` for per-field status.


The note claims the validator “tracks 46” fields and that “9 nested class fields are only documented in this table”, but the validator’s mermaid regex scans the entire diagram and (per validation-results.json) counts 55 fields for this schema. This note should be updated/removed so it matches the validator behavior and reported counts.

Suggested change

Before:
> **Note:** The table above documents all 55 fields from the mermaid diagram (14+11+8+22). The validator tracks 46 of these (14 impl + 19 struct + 11 comp + 2 plan); 9 nested class fields are only documented in this table. See `validation-results.json` for per-field status.

After:
> **Note:** The table above documents all 55 fields from the mermaid diagram (14+11+8+22). The validator scans the entire mermaid diagram and reports all 55 fields for this schema. See `validation-results.json` for per-field status.

Copilot · 2026-04-06T16:53:09Z

json-export-specs/schemas/party-schema.md

+| subcategories | 🔄 PLANNED | Labels | Requires new data source |
+| descriptions | ❌ STRUCTURAL | Labels | JSON grouping object |
+| intelligenceTags | 🔄 PLANNED | Labels | Requires new data source |


Copilot · 2026-04-06T16:53:09Z

json-export-specs/schemas/party-schema.md

+| 🔄 PLANNED | 17 | Requires new data sources |
+| ❌ STRUCTURAL | 18 | JSON grouping objects |
+
+> **Note:** The table above documents all 51 fields from the mermaid diagram (4+12+17+18). The validator tracks 43 of these (4 impl + 18 struct + 12 comp + 9 plan); 8 nested class fields are only documented in this table. See `validation-results.json` for per-field status.


The note says the validator “tracks 43” fields and that 8 nested fields are only documented here, but validation-results.json reports fields_defined: 51 for the Party schema, which implies the validator is already counting all mermaid fields. Please update/remove this note to avoid contradicting the validator output.

Suggested change

> **Note:** The table above documents all 51 fields from the mermaid diagram (4+12+17+18). The validator tracks 43 of these (4 impl + 18 struct + 12 comp + 9 plan); 8 nested class fields are only documented in this table. See `validation-results.json` for per-field status.

> **Note:** The table above documents all 51 fields from the mermaid diagram (4+12+17+18), which matches the Party schema `fields_defined: 51` reported in `validation-results.json`. See that file for per-field status details.

Copilot · 2026-04-06T16:53:10Z

json-export-specs/SCHEMA_VALIDATION_REPORT.md

+**Date:** 2026-04-05  
+**Status:** Analysis Complete - Remediation Plan Provided  
+**Validated Against:** service.data.impl/src/main/resources/full_schema.sql  
+**Total Schema Mismatches:** 120 (across 4 entity schemas)


This report header states Total Schema Mismatches: 120, but the updated validator output and the Field Status Summary in the same document indicate 144 mismatches (168 total fields − 24 implemented). Please reconcile the 120 vs 144 mismatch totals so readers don’t get conflicting metrics.

Suggested change

**Total Schema Mismatches:** 120 (across 4 entity schemas)

**Total Schema Mismatches:** 144 (across 4 entity schemas)

pethers · 2026-04-06T17:01:53Z

@copilot apply changes based on the comments in this thread

… output

- politician: current/historical → STRUCTURAL (CommitteeMembership[] non-scalar), summary 8→6 planned, 22→24 structural
- politician: note updated to reflect validator tracks all 55 fields (not 46)
- party: subcategories/intelligenceTags → STRUCTURAL (String[] array types per validator)
- party: history/regions → STRUCTURAL (ElectionHistory[]/RegionalSupport[] non-scalar)
- party: summary 17→13 planned, 18→22 structural; note updated to match fields_defined: 51
- SCHEMA_VALIDATION_REPORT.md: Total Schema Mismatches 120 → 144 (168 - 24 implemented)

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/87ca677d-6eed-4545-9caa-615aaee059c1 Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

github-actions · 2026-04-06T17:25:35Z

📊 JSON Schema Validation Results

Validation Summary:

⏰ Generated: 2026-04-06T17:25:34.505154+00:00
📋 Schemas Validated: 5
📁 Sample Files Analyzed: 202
⚠️ Total Mismatches: 144

Schema Status

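| Schema | Fields | Views Matched | Mismatches | Status |
|--------|--------|---------------|------------|--------|
| Politician | 55 | 12 | 41 | ⚠️ REVIEW |
| Party | 51 | 17 | 47 | ⚠️ REVIEW |
| Committee | 29 | 12 | 25 | ⚠️ REVIEW |
| Ministry | 33 | 8 | 31 | ⚠️ REVIEW |
| Intelligence | 0 | 5 | 0 | ✅ PASS |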

📄 Full Report: See uploaded artifacts for detailed validation report

ℹ️ Status: Field mismatches are documented and tracked. These represent differences between original projected schemas and actual implemented data.

📄 Data-Validated Schemas: See *-schema-validated.md files for schemas matching actual data (144 mismatches documented).
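
As a quick consistency check on the Schema Status numbers, the per-schema columns can be summed. The dictionary below only transcribes the Fields and Mismatches columns from this report; treating "implemented" as fields minus mismatches is an assumption taken from the review comments (168 − 24 = 144), not from the validator source.

```python
# (fields, mismatches) per schema, copied from the Schema Status table.
results = {
    "Politician": (55, 41),
    "Party": (51, 47),
    "Committee": (29, 25),
    "Ministry": (33, 31),
    "Intelligence": (0, 0),
}

total_fields = sum(fields for fields, _ in results.values())
total_mismatches = sum(mismatches for _, mismatches in results.values())
# Assumed relationship: every field that is not a mismatch is implemented.
implemented = total_fields - total_mismatches

print(total_fields, implemented, total_mismatches)  # 168 24 144
```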

Copilot · 2026-04-06T17:25:53Z

> @copilot apply changes based on the comments in this thread

Applied all 5 review comments in 1ad51e8:

- Politician current/historical: Changed from PLANNED to ❌ STRUCTURAL (CommitteeMembership[] non-scalar array type per validator). Summary updated: STRUCTURAL 22→24, PLANNED 8→6.
- Politician note: Updated to state validator scans and reports all 55 mermaid fields (was incorrectly claiming 46 with 9 undocumented).
- Party subcategories/intelligenceTags: Changed from PLANNED to ❌ STRUCTURAL (array types per validator output). Also fixed history (ElectionHistory[]) and regions (RegionalSupport[]) which had the same issue. Summary updated: STRUCTURAL 18→22, PLANNED 17→13.
- Party note: Updated to state all 51 fields match fields_defined: 51 in validation-results.json (was incorrectly claiming 43).
- SCHEMA_VALIDATION_REPORT.md: Total mismatches corrected from 120 to 144 (168 total − 24 implemented).

All 23 tests pass.
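
The reclassifications above only move fields between PLANNED and STRUCTURAL, so each schema's status summary should still add up to its field total. A small sketch of that check, using the counts quoted in this thread (the dictionary layout and category labels are illustrative, not the format of `validation-results.json`):

```python
# Status counts after the fixes, as quoted in this thread.
summaries = {
    "politician": {"implemented": 14, "computed": 11, "planned": 6, "structural": 24},
    "party": {"implemented": 4, "computed": 12, "planned": 13, "structural": 22},
}
# Field totals reported for each schema.
fields_defined = {"politician": 55, "party": 51}

for schema, counts in summaries.items():
    total = sum(counts.values())
    # Moving a field between categories must not change the overall total.
    assert total == fields_defined[schema], (schema, total)
    print(schema, total)
```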

sonarqubecloud · 2026-04-06T17:27:03Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Initial plan

d5e989b

Copilot AI assigned Copilot and pethers Apr 5, 2026

pull-request-size bot added the size/XS label Apr 5, 2026

Copilot AI linked an issue Apr 5, 2026 that may be closed by this pull request

Align JSON export schema field definitions with actual database schema — resolve 125 field mismatches #8521

Closed

8 tasks

Copilot started work on behalf of pethers April 5, 2026 21:11

Initial analysis of 121 field mismatches across 4 schemas

816c898

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

pull-request-size bot added size/XXL and removed size/XS labels Apr 5, 2026

github-actions bot added documentation infrastructure labels Apr 5, 2026

Copilot AI and others added 7 commits April 5, 2026 21:27

docs: add field implementation status section to party-schema.md

da3d744

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

docs: add Field Implementation Status section to committee schema

8ee6c17

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

docs: add Field Implementation Status section to ministry schema

2b8924f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

docs: align ministry schema field status table with project conventions

50f9f40

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

feat: add field implementation status to FIELD_MAPPING.md, schema annotations, and enhanced validator

94e397e

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI and others added 2 commits April 5, 2026 21:42

feat: update validated schemas, VALIDATED_SCHEMAS_SUMMARY, and add field status tests

f19de53

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

refactor: address code review feedback - extract helper method and add aggregation test

5b2be52

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/57d6f1f4-def9-4f9c-b234-5b04ee040d5e Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Align JSON export schema field definitions with database schema~~ feat: classify 121 JSON export schema field mismatches and add implementation roadmap Apr 5, 2026

Copilot AI requested a review from pethers April 5, 2026 21:51

Copilot finished work on behalf of pethers April 5, 2026 21:51

pethers marked this pull request as ready for review April 5, 2026 21:58

Copilot AI changed the title ~~feat: classify JSON export schema field mismatches, fix validator view mappings, and add implementation roadmap~~ feat: classify JSON export schema field mismatches, fix validator view mappings and duplicate field handling, and add implementation roadmap Apr 6, 2026

Copilot finished work on behalf of pethers April 6, 2026 13:16

pethers requested a review from Copilot April 6, 2026 15:43

Copilot started reviewing on behalf of pethers April 6, 2026 15:45

pethers requested review from Copilot and removed request for Copilot April 6, 2026 15:47

Copilot started reviewing on behalf of pethers April 6, 2026 15:49

Copilot AI reviewed Apr 6, 2026


Copilot started work on behalf of pethers April 6, 2026 16:04

fix: use removesuffix instead of rstrip for array type normalization

1683756

Agent-Logs-Url: https://github.com/Hack23/cia/sessions/dafa1381-0a99-4091-bcd6-ad65a11dc1eb Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
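
For context on why this fix matters: `str.rstrip` takes a set of characters rather than a suffix, so `rstrip("[]")` strips every trailing `[` or `]`, whereas `str.removesuffix` (Python 3.9+) removes the exact suffix at most once. A small illustration with made-up type names; the helper name `element_type` is hypothetical and not taken from `validate_schemas.py`:

```python
def element_type(type_name: str) -> str:
    """Strip a single trailing '[]' array marker, if present."""
    return type_name.removesuffix("[]")


# rstrip treats "[]" as the character set {'[', ']'} and strips greedily.
print("String[][]".rstrip("[]"))   # String
# removesuffix removes only one exact "[]" suffix.
print(element_type("String[][]"))  # String[]
# Names without the suffix are returned unchanged.
print(element_type("PartyLink"))   # PartyLink
```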

Copilot finished work on behalf of pethers April 6, 2026 16:23

pethers requested a review from Copilot April 6, 2026 16:46

Copilot started reviewing on behalf of pethers April 6, 2026 16:47

Copilot AI reviewed Apr 6, 2026


Copilot started work on behalf of pethers April 6, 2026 17:12

Copilot finished work on behalf of pethers April 6, 2026 17:39

pethers merged commit da6ea92 into master Apr 6, 2026
14 of 15 checks passed

pethers deleted the copilot/align-json-export-schema branch April 6, 2026 17:58

	- All ❌ STRUCTURAL JSON grouping fields (53 fields)
	- All ❌ STRUCTURAL JSON grouping fields (56 fields)
