-
Notifications
You must be signed in to change notification settings - Fork 0
Feat: Enhance validation with missing data normalization #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdded normalize_missing(data) to deeply copy and replace None/"" with "N/A" across dicts/lists and produce a missing_data_report; perform_cross_source_checks() now sets cross_source_checks to "COMPLETED_WITH_ALERTS" when alerts exist, otherwise "PASSED". Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant ValidationEngine
Note over ValidationEngine: normalize_missing(data)
Caller->>ValidationEngine: normalize_missing(data)
ValidationEngine->>ValidationEngine: deepcopy input
ValidationEngine->>ValidationEngine: traverse dicts & lists
ValidationEngine->>ValidationEngine: replace None/"" -> "N/A" and record path
ValidationEngine-->>Caller: return normalized_data + missing_data_report
Note over ValidationEngine: perform_cross_source_checks(...)
Caller->>ValidationEngine: perform_cross_source_checks(...)
ValidationEngine->>ValidationEngine: compute alerts
alt alerts found
ValidationEngine-->>Caller: cross_source_checks = "COMPLETED_WITH_ALERTS"
else no alerts
ValidationEngine-->>Caller: cross_source_checks = "PASSED"
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
backend/app/services/validation/validation_engine.py (1)
165-165: Remove unused loop variable.The loop variable
iis not used within the loop body.As per coding guidelines.
Apply this diff:
- for i, part in enumerate(path_parts[:-1]): + for part in path_parts[:-1]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
backend/app/services/validation/__pycache__/validation_engine.cpython-313.pycis excluded by!**/*.pycbackend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.4.2.pycis excluded by!**/*.pyc
📒 Files selected for processing (2)
backend/app/services/validation/tests/test_validation_engine.py(1 hunks)backend/app/services/validation/validation_engine.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/validation/tests/test_validation_engine.py (1)
backend/app/services/validation/validation_engine.py (2)
perform_cross_source_checks(58-127)normalize_missing(130-180)
🪛 Ruff (0.14.5)
backend/app/services/validation/validation_engine.py
165-165: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
🔇 Additional comments (2)
backend/app/services/validation/tests/test_validation_engine.py (1)
2-2: LGTM!The import statement correctly adds
normalize_missingalongside the existing import.backend/app/services/validation/validation_engine.py (1)
122-125: LGTM!The addition of the
cross_source_checksstatus field provides a clear validation state indicator. The logic correctly sets "COMPLETED_WITH_ALERTS" when alerts exist and "PASSED" otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
backend/app/services/validation/__pycache__/validation_engine.cpython-313.pycis excluded by!**/*.pycbackend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.4.2.pycis excluded by!**/*.pyc
📒 Files selected for processing (2)
backend/app/services/validation/tests/test_validation_engine.py(1 hunks)backend/app/services/validation/validation_engine.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/services/validation/tests/test_validation_engine.py
🧰 Additional context used
🪛 Ruff (0.14.5)
backend/app/services/validation/validation_engine.py
166-166: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
🔇 Additional comments (5)
backend/app/services/validation/validation_engine.py (5)
6-6: LGTM! Previous critical issue resolved.The addition of
deepcopyimport correctly addresses the previous review comment about shallow copy mutations.
123-127: LGTM! Clear status reporting.The conditional status setting appropriately distinguishes between clean validation and validation with alerts.
131-143: LGTM! Function setup is correct.The function signature is well-typed, the docstring is clear, and the use of
deepcopycorrectly prevents mutation of the original input data.
146-158: LGTM! Dictionary traversal logic is sound.The logic correctly handles nested dictionaries, builds appropriate paths, and recurses into nested structures.
179-181: Approve with note on potential key collision.The function correctly invokes the traversal and returns the normalized data with the report attached.
Note that if the input
dataalready contains a"missing_data_report"key, it will be overwritten. This is likely acceptable given the function's purpose, but consider documenting this behavior if it hasn't been already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/services/validation/validation_engine.py (1)
173-180: Remove or fix ineffective type-conversion logic.The reassignment
normalized_data_ref = [](line 176) only updates the local variable and doesn't modify the parent structure innormalized_data. This means the type conversion won't actually take effect. Similar issue exists at lines 203-204.If this defensive code is meant to handle malformed inputs, it should either be fixed to properly update the parent structure, or removed entirely if the function assumes well-formed inputs.
Consider either:
- Removing these checks if the function assumes well-formed input
- Or properly updating the parent structure (which would require tracking parent references)
- if not isinstance(normalized_data_ref, list): - # If it's not a list, and it's an empty dict, convert to list - if isinstance(normalized_data_ref, dict) and not normalized_data_ref: - normalized_data_ref = [] - else: - # If it's not a list and not an empty dict, this is an error in path or structure - # For now, let's assume the path is well-formed and the type matches - pass + # Assume well-formed input where structure matches path + if not isinstance(normalized_data_ref, list): + # Skip or log unexpected structure + continue
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
backend/app/services/validation/__pycache__/validation_engine.cpython-313.pycis excluded by!**/*.pycbackend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.4.2.pycis excluded by!**/*.pyc
📒 Files selected for processing (1)
backend/app/services/validation/validation_engine.py(2 hunks)
🔇 Additional comments (2)
backend/app/services/validation/validation_engine.py (2)
5-7: LGTM! Necessary imports added.The
remodule is used for path parsing in list traversal, anddeepcopycorrectly addresses the previous shallow-copy concern, ensuring the original input data remains unmodified.
124-128: LGTM! Clear status summary logic.The addition of a summary status based on alert presence provides a clear, actionable signal for downstream consumers of the validation results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
backend/app/services/validation/__pycache__/validation_engine.cpython-313.pycis excluded by!**/*.pycbackend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.4.2.pycis excluded by!**/*.pyc
📒 Files selected for processing (1)
backend/app/services/validation/validation_engine.py(2 hunks)
🔇 Additional comments (2)
backend/app/services/validation/validation_engine.py (2)
5-7: LGTM! Imports are appropriate.The
remodule for regex-based path parsing anddeepcopyfor safe data cloning are both necessary for the newnormalize_missingfunction.
124-128: LGTM! Status field provides clear validation outcome.The addition of
cross_source_checksstatus based on alert presence improves the clarity of validation results.
| if not isinstance(normalized_data_ref, dict): | ||
| # If it's not a dict, and it's an empty list, convert to dict | ||
| if isinstance(normalized_data_ref, list) and not normalized_data_ref: | ||
| normalized_data_ref = {} | ||
| else: | ||
| # If it's not a dict and not an empty list, this is an error in path or structure | ||
| # For now, let's assume the path is well-formed and the type matches | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major: Type conversion doesn't update parent structure.
Lines 203-204 attempt to convert an empty list to a dict by reassigning normalized_data_ref, but this only changes the local variable without updating the parent structure. This is the same issue as in lines 173-180.
If this conversion is necessary, update the parent structure directly. Otherwise, consider removing the conversion or raising an exception to flag structural inconsistencies.
🤖 Prompt for AI Agents
In backend/app/services/validation/validation_engine.py around lines 201 to 208,
the code reassigns normalized_data_ref to {} when encountering an empty list but
only updates the local variable, leaving the parent container unchanged; update
the parent structure instead (e.g., set parent_container[parent_key_or_index] =
{} using the stored parent reference and key/index used to reach
normalized_data_ref) so the change persists, or if conversion is not allowed,
replace the conversion with raising a clear exception to flag the structural
mismatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/services/validation/validation_engine.py (1)
132-169: normalize_missing recursion is sound; consider small robustness and cleanup tweaksThe recursive traversal over dicts/lists and use of
deepcopyare correct and preserve the original input. A couple of small refinements would make this more robust and simpler:
- Guard against invalid input early
Ifdatais everNoneor not a dict, this will currently fail with an opaqueAttributeErroronnormalized_data.update(...). You can mirrorcheck_missing_values’ behavior with explicit checks:def normalize_missing(data: Dict[str, Any]) -> Dict[str, Any]: @@ - normalized_data = deepcopy(data) + if data is None: + raise ValueError("data must not be None") + if not isinstance(data, dict): + raise TypeError("data must be a dict") + + normalized_data = deepcopy(data)
- Drop the redundant update at the end
temp_root['__root__']andnormalized_datarefer to the same dict instance, so this line is effectively a no-op:- normalized_data.update(temp_root['__root__']) # Update normalized_data with the modified content + # normalized_data already holds the modified structure via temp_root['__root__']You can safely remove it to simplify the function.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
backend/app/services/validation/__pycache__/validation_engine.cpython-313.pycis excluded by!**/*.pycbackend/app/services/validation/tests/__pycache__/test_validation_engine.cpython-313-pytest-8.4.2.pycis excluded by!**/*.pyc
📒 Files selected for processing (1)
backend/app/services/validation/validation_engine.py(2 hunks)
🔇 Additional comments (2)
backend/app/services/validation/validation_engine.py (2)
5-7: Deep copy import correctly supports non-mutating normalizationUsing
deepcopyfornormalized_datais the right fix to ensure callers’ input isn’t mutated while still allowing in-place updates during traversal. This now matches the docstring promise of returning “a new dictionary.”
124-127: cross_source_checks status wiring looks consistentThe new
cross_source_checksflag cleanly summarizes the result:"COMPLETED_WITH_ALERTS"whenever any alert message is recorded, and"PASSED"otherwise. This covers all branches of the preceding logic and should make downstream consumption simpler.
|
the |
Overview: This PR expands the validation module to gracefully handle missing data, ensuring more robust report generation.
Changes
validationmodule for handling missing fields and failed agent outputs.normalize_missing(data)function to replace absent data points with explicit placeholders.missing_data_reportJSON field to provide clear explanations for any data gaps.Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.