@sujata-m sujata-m commented Sep 1, 2025

Dev Board Ticket

https://dev.azure.com/TDEI-UW/TDEI/_workitems/edit/1634/

Changes

  • Add geometry-aware schema selection (Point/LineString/Polygon) with sensible defaults
  • Stream jsonschema_rs errors; keep legacy errors capped by max_errors
  • Introduce ValidationResult.issues (one best error per feature) and set errors=None when empty
  • Add friendly formatting:
    • compact Enum messages
    • summarize anyOf required keys (“must include one of: …”)
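A minimal sketch of what geometry-aware schema selection could look like, assuming the schema is chosen from the first feature that carries a recognized geometry type and falls back to the file name when no such feature exists. The function name `pick_schema`, the schema file names, the filename hints, and the `Point` default are all hypothetical and are not taken from the library.

```python
from typing import Any, Dict

# Hypothetical schema file names; the real paths in the library may differ.
SCHEMA_BY_GEOMETRY = {
    "Point": "point.schema.json",
    "LineString": "line.schema.json",
    "Polygon": "polygon.schema.json",
}

# Assumed filename hints, used only when no feature carries a usable geometry.
GEOMETRY_BY_FILENAME_HINT = {
    "nodes": "Point",
    "points": "Point",
    "edges": "LineString",
    "lines": "LineString",
    "zones": "Polygon",
    "polygons": "Polygon",
}

DEFAULT_GEOMETRY = "Point"  # assumed default, not confirmed by the PR


def pick_schema(geojson: Dict[str, Any], filename: str) -> str:
    """Return a schema file name for a GeoJSON FeatureCollection."""
    # 1. Prefer the geometry type of the first feature that has one.
    for feature in geojson.get("features") or []:
        geom_type = ((feature or {}).get("geometry") or {}).get("type")
        if geom_type in SCHEMA_BY_GEOMETRY:
            return SCHEMA_BY_GEOMETRY[geom_type]

    # 2. Fall back to a hint in the file name (e.g. "wa.zones.geojson").
    lowered = filename.lower()
    for hint, geom_type in GEOMETRY_BY_FILENAME_HINT.items():
        if hint in lowered:
            return SCHEMA_BY_GEOMETRY[geom_type]

    # 3. Last resort: the assumed default.
    return SCHEMA_BY_GEOMETRY[DEFAULT_GEOMETRY]
```

With this sketch, an empty `features` list in a file named `wa.zones.geojson` would resolve to the polygon schema through the filename fallback.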

helpers:

  • _feature_index_from_error: extract feature index from error.instance_path
  • _err_kind: normalize/derive validator kind from jsonschema_rs error
  • _pretty_message: produce concise, user-friendly text; handles Enum/AnyOf specially
  • _rank_for: rank errors to choose a single best error per feature
  • _get_colset: safe set extraction with diagnostics for missing columns
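The helpers above combine into a "one best error per feature" pass. The following is a rough sketch under stated assumptions, not the library's code: `FakeError` is a stand-in for a validator error with only `message`, `instance_path`, and `kind` fields, the names `feature_index`, `rank`, and `best_error_per_feature` are hypothetical, and the priority order in `KIND_PRIORITY` is invented for illustration. The shorter-message tie-breaker follows the test name `test_tiebreaker_shorter_message_is_better`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple, Union


@dataclass
class FakeError:
    """Stand-in for a validator error; only the fields used below."""
    message: str
    instance_path: List[Union[str, int]]
    kind: str


# Assumed priority order: a lower number means "more useful to show".
KIND_PRIORITY = {"Required": 0, "Enum": 1, "Type": 2, "AnyOf": 3}


def feature_index(error: FakeError) -> Optional[int]:
    """Return N for a path like ["features", N, ...], else None."""
    path = list(error.instance_path)
    for pos, part in enumerate(path[:-1]):
        if part == "features" and isinstance(path[pos + 1], int):
            return path[pos + 1]
    return None


def rank(error: FakeError) -> Tuple[int, int]:
    """Sort key: kind priority first, then shorter messages win ties."""
    priority = KIND_PRIORITY.get(error.kind, len(KIND_PRIORITY))
    return (priority, len(error.message))


def best_error_per_feature(errors: Iterable[FakeError]) -> Dict[int, FakeError]:
    """Single pass over a (possibly lazy) error stream."""
    best: Dict[int, FakeError] = {}
    for err in errors:
        idx = feature_index(err)
        if idx is None:
            continue  # not tied to a specific feature
        if idx not in best or rank(err) < rank(best[idx]):
            best[idx] = err
    return best
```

Because the loop keeps only one error per feature index, it can consume a generator of errors without materializing the full list.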

test: add comprehensive unit tests

  • Added 31 new unit test cases covering the functionality introduced in this change.
  • Left the existing unit test cases untouched to confirm that the older functionality is still intact.

test results:

> coverage run --source=src/python_osw_validation -m unittest discover -v tests/unit_tests

test_duplicate_files (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_empty_directory (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_invalid_directory (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_missing_optional_file (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_no_geojson_files (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_valid_data_at_root (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_valid_data_inside_folder (test_extracted_data_validator.TestExtractedDataValidator) ... ok
test_no_noise_no_change (test_helpers.TestCleanEnumMessage) ... ok
test_strips_other_candidates_and_trims (test_helpers.TestCleanEnumMessage) ... ok
test_empty_when_unknown (test_helpers.TestErrKind) ... ok
test_fallback_to_message (test_helpers.TestErrKind) ... ok
test_fallback_to_validator (test_helpers.TestErrKind) ... ok
test_prefers_kind_object (test_helpers.TestErrKind) ... ok
test_feature_index_absent (test_helpers.TestFeatureIndexFromError) ... ok
test_feature_index_next_not_int (test_helpers.TestFeatureIndexFromError) ... ok
test_feature_index_present (test_helpers.TestFeatureIndexFromError) ... ok
test_anyof_unions_required_fields (test_helpers.TestPrettyMessage) ... ok
test_default_first_line_from_message (test_helpers.TestPrettyMessage) ... ok
test_enum_compacts_message (test_helpers.TestPrettyMessage) ... ok
test_ordering_by_kind (test_helpers.TestRankFor) ... ok
test_tiebreaker_shorter_message_is_better (test_helpers.TestRankFor) ... ok
test_edges_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_edges_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 102, in load_osw_schema
    with open(schema_path, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 175, in validate
    if not self.validate_osw_errors(file_path=str(file_path), max_errors=max_errors):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 404, in validate_osw_errors
    schema = self.load_osw_schema(schema_path)
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 110, in load_osw_schema
    raise Exception(f'Invalid or missing schema file: {e}')
Exception: Invalid or missing schema file: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'
ok
test_edges_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_external_extension_file_inside_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_external_extension_file_inside_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... ok
test_external_extension_file_inside_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_extra_field_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_id_missing_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_invalid_geometry_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_invalid_serialization_file (test_osw_validation.TestOSWValidation) ... ok
test_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_invalid_zipfile_default_error_count (test_osw_validation.TestOSWValidation) ... ok
test_invalid_zipfile_should_specific_errors_counts (test_osw_validation.TestOSWValidation) ... ok
test_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 102, in load_osw_schema
    with open(schema_path, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 175, in validate
    if not self.validate_osw_errors(file_path=str(file_path), max_errors=max_errors):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 404, in validate_osw_errors
    schema = self.load_osw_schema(schema_path)
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 110, in load_osw_schema
    raise Exception(f'Invalid or missing schema file: {e}')
Exception: Invalid or missing schema file: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'
ok
test_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_invalid_zones_file (test_osw_validation.TestOSWValidation) ... ok
test_minimal_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_minimal_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 102, in load_osw_schema
    with open(schema_path, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 175, in validate
    if not self.validate_osw_errors(file_path=str(file_path), max_errors=max_errors):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 404, in validate_osw_errors
    schema = self.load_osw_schema(schema_path)
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 110, in load_osw_schema
    raise Exception(f'Invalid or missing schema file: {e}')
Exception: Invalid or missing schema file: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'
ok
test_minimal_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_missing_identifier_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_no_entity_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_nodes_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_nodes_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 102, in load_osw_schema
    with open(schema_path, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 175, in validate
    if not self.validate_osw_errors(file_path=str(file_path), max_errors=max_errors):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 404, in validate_osw_errors
    schema = self.load_osw_schema(schema_path)
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 110, in load_osw_schema
    raise Exception(f'Invalid or missing schema file: {e}')
Exception: Invalid or missing schema file: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'
ok
test_nodes_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_points_invalid_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_points_invalid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 102, in load_osw_schema
    with open(schema_path, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 175, in validate
    if not self.validate_osw_errors(file_path=str(file_path), max_errors=max_errors):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 404, in validate_osw_errors
    schema = self.load_osw_schema(schema_path)
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 110, in load_osw_schema
    raise Exception(f'Invalid or missing schema file: {e}')
Exception: Invalid or missing schema file: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'
ok
test_points_invalid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_unmatched_ids_limited_to_20 (test_osw_validation.TestOSWValidation) ... ok
test_valid_osw_file (test_osw_validation.TestOSWValidation) ... ok
test_valid_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_valid_zipfile_with_invalid_schema (test_osw_validation.TestOSWValidation) ... Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 102, in load_osw_schema
    with open(schema_path, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 175, in validate
    if not self.validate_osw_errors(file_path=str(file_path), max_errors=max_errors):
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 404, in validate_osw_errors
    schema = self.load_osw_schema(schema_path)
  File "/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/__init__.py", line 110, in load_osw_schema
    raise Exception(f'Invalid or missing schema file: {e}')
Exception: Invalid or missing schema file: [Errno 2] No such file or directory: '/Users/anuj/Work/Gaussian/TDEI-python-lib-osw-validation/src/python_osw_validation/schema/opensidewalk.schema.json'
ok
test_valid_zipfile_with_schema (test_osw_validation.TestOSWValidation) ... ok
test_valid_zones_file (test_osw_validation.TestOSWValidation) ... ok
test_wrong_datatypes_zipfile (test_osw_validation.TestOSWValidation) ... ok
test_duplicate_ids_detection (test_osw_validation_extras.TestOSWValidationExtras)
Duplicates inside a single file are reported. ... ok
test_extracted_data_validator_invalid (test_osw_validation_extras.TestOSWValidationExtras)
If folder structure is invalid, its error is surfaced. ... ok
test_issues_populated_for_invalid_zip (test_osw_validation_extras.TestOSWValidationExtras)
Ensure `issues` contains per-feature messages when validation fails. ... ok
test_missing_u_id_logged_and_no_keyerror (test_osw_validation_extras.TestOSWValidationExtras)
Edges missing `_u_id` should log a friendly error instead of raising KeyError. ... ok
test_pick_schema_by_geometry_and_by_filename (test_osw_validation_extras.TestOSWValidationExtras)
Point/LineString/Polygon ⇒ proper schema; filename fallback when features empty. ... ok
test_unmatched_u_id_is_limited_to_20 (test_osw_validation_extras.TestOSWValidationExtras)
When there are many unmatched _u_id values, only 20 are listed. ... ok
test_unmatched_w_id_is_limited_to_20 (test_osw_validation_extras.TestOSWValidationExtras) ... ok
test_zip_extract_failure_bubbles_as_error (test_osw_validation_extras.TestOSWValidationExtras)
If zip extraction fails, we get a clean error and False result. ... ok
test_get_colset_handles_unhashable_by_stringifying (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_get_colset_logs_and_returns_empty_when_missing (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_get_colset_returns_set_when_column_present (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_get_colset_with_none_gdf (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_pick_schema_by_geometry (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_pick_schema_filename_fallback (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_pick_schema_force_single_schema_override (test_osw_validation_extras.TestOSWValidationInternals) ... ok
test_invalid_geometry_logs_ids_when__id_present (test_osw_validation_extras.TestOSWValidationInvalidGeometryLogging)
When _id exists, the message should list some _id values and total count. ... ok
test_invalid_geometry_logs_index_when__id_missing_and_caps_20 (test_osw_validation_extras.TestOSWValidationInvalidGeometryLogging)
When _id is missing, it falls back to index and caps display at 20 of N. ... ok
test_create_zip_failure (test_zipfile_handler.TestZipFileHandler) ... ok
test_create_zip_success (test_zipfile_handler.TestZipFileHandler) ... ok
test_extract_invalid_zip (test_zipfile_handler.TestZipFileHandler) ... ok
test_extract_valid_zip (test_zipfile_handler.TestZipFileHandler) ... ok
test_remove_extracted_files (test_zipfile_handler.TestZipFileHandler) ... ok

----------------------------------------------------------------------
Ran 77 tests in 12.771s

OK

- Add geometry-aware schema selection (Point/LineString/Polygon) with sensible defaults
- Stream jsonschema_rs errors; keep legacy `errors` capped by `max_errors`
- Introduce `ValidationResult.issues` (one best error per feature) and set `errors=None` when empty
- Add friendly formatting:
  - compact Enum messages
  - summarize anyOf required keys (“must include one of: …”)

helpers:
- _feature_index_from_error: extract feature index from error.instance_path
- _err_kind: normalize/derive validator kind from jsonschema_rs error
- _pretty_message: produce concise, user-friendly text; handles Enum/AnyOf specially
- _rank_for: rank errors to choose a single best error per feature
- _get_colset: safe set extraction with diagnostics for missing columns

test: add comprehensive unit tests
- tests/test_helpers.py: covers _feature_index_from_error, _err_kind, _pretty_message, _rank_for, _get_colset
- tests/test_schema_selection.py: geometry-derived and filename fallback selection
- tests/test_validation_issues.py: streaming aggregation, best-error wins, `max_errors` cap behavior
- tests/test_integrity_checks.py: _id uniqueness; edges._u_id/_v_id and zones._w_id membership vs nodes._id
- tests/test_extensions.py: extension read failures, invalid geometries, _id fallback to index, non-serializable props

BREAKING CHANGE: ValidationResult now exposes `issues` and may return `errors=None` (not `[]`).
Downstream consumers must treat `errors` as Optional and prefer `issues` for per-feature UX.
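For downstream code, the change roughly means guarding against `errors` being `None` and reading `issues` first. The sketch below uses a stand-in dataclass rather than the real `ValidationResult`; the `is_valid` field, the `summarize` helper, and the element type of `issues` are assumptions, so each issue is treated as an opaque value and rendered with `str()`.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ResultStandIn:
    """Stand-in for ValidationResult; field shapes are assumed."""
    is_valid: bool
    errors: Optional[List[str]] = None   # may be None, no longer []
    issues: Optional[List[Any]] = None   # one best error per feature


def summarize(result: ResultStandIn) -> List[str]:
    """Prefer per-feature issues; fall back to legacy errors; never fail on None."""
    issues = getattr(result, "issues", None) or []
    if issues:
        return [str(issue) for issue in issues]
    return list(result.errors or [])
```

Code that previously did `len(result.errors)` or iterated over `result.errors` directly would raise `TypeError` on `None`, which is why the `or []` guard matters.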
@sujata-m sujata-m merged commit 67f6e49 into develop Sep 3, 2025
4 checks passed