fix(annotation): handle all VRS allele state types in deserialization #737

Open
bencap wants to merge 2 commits into release-2026.2.3 from bugfix/bencap/736/rle-support-in-va-spec-output

bencap (Collaborator) commented May 14, 2026

`allele_from_mapped_variant_dictionary_result` unconditionally constructed a `LiteralSequenceExpression` from the stored state dict. This raised a Pydantic `ValidationError` (500) for score sets containing reference-identical variants whose state is a `ReferenceLengthExpression`.

  • Dispatch on state type to construct RLE, LengthExpression, or LSE
  • Raise ValueError with an actionable message for unknown state types, replacing the cryptic multi-field Pydantic error as the failure mode
  • Add test constants for RLE and LengthExpression allele dicts
  • Parametrize state-type and CisPhasedBlock member tests to cover all three state types and enforce coverage of future additions
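
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the three dataclasses are hypothetical stand-ins for the VRS model classes, and `state_from_dict` and `_STATE_BUILDERS` are assumed names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for the VRS state models; the real classes are
# Pydantic models with more fields and stricter validation.
@dataclass
class LiteralSequenceExpression:
    sequence: str


@dataclass
class ReferenceLengthExpression:
    length: int
    repeatSubunitLength: int
    sequence: Optional[str] = None


@dataclass
class LengthExpression:
    length: Optional[int] = None


# Map the stored "type" discriminator to a builder for that state class.
_STATE_BUILDERS = {
    "LiteralSequenceExpression": lambda s: LiteralSequenceExpression(
        sequence=s["sequence"]
    ),
    "ReferenceLengthExpression": lambda s: ReferenceLengthExpression(
        length=s["length"],
        repeatSubunitLength=s["repeatSubunitLength"],
        sequence=s.get("sequence"),
    ),
    "LengthExpression": lambda s: LengthExpression(length=s.get("length")),
}


def state_from_dict(state: dict[str, Any]):
    """Build the state object that matches the dict's "type" field."""
    state_type = state.get("type")
    builder = _STATE_BUILDERS.get(state_type)
    if builder is None:
        # Fail early with a message that names the offending type and the
        # supported ones, instead of a multi-field validation error.
        supported = ", ".join(sorted(_STATE_BUILDERS))
        raise ValueError(
            f"Unsupported allele state type {state_type!r}; "
            f"expected one of: {supported}"
        )
    return builder(state)
```

Keeping the supported types in a single mapping means a parametrized test can iterate over its keys, so a newly added state type without a test case is easy to spot.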

Additionally, fixes a bug where certain evidence lines could not be reconstructed from a serialized dict. Evidence generation now passes model instances directly instead of serializing them first.

…elds

Serializing evidence to dicts via `serialize_evidence_items` before
assigning to `hasEvidenceItems`/`hasEvidenceLines` caused
`VariantPathogenicityEvidenceLine` validation to fail for Statement
objects containing nested VRS Alleles with production genomic data
(regex constraints on `digest`, `refgetAccession`, etc. failed during
dict reconstruction).

- Remove `serialize_evidence_items` from `util.py` entirely
- `acmg_evidence_line`: pass `list(evidence)` directly; Statement is
  in `has_evidence_items_models` and passes the isinstance check
- `functional_evidence_line`: wrap items as `StudyResult(root=item)`;
  `ExperimentalVariantFunctionalImpactStudyResult` does not inherit
  `StudyResult` so direct instances fail the isinstance check
- `mapped_variant_to_pathogenicity_statement`: pass `list(evidence)`
- Add regression tests in `test_evidence_line.py` and `test_annotate.py`
  that assert evidence items are model instances, not raw dicts
- Remove `TestSerializeEvidenceItems` from `test_util.py`
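
The isinstance behavior behind the `StudyResult(root=item)` wrapping can be illustrated with a small sketch. Everything here is a hypothetical stand-in: the dataclasses only mimic the shape of the va-spec models, and `ALLOWED_EVIDENCE_TYPES` and `check_evidence_items` are assumed names, not the library's actual validation code.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for the va-spec models named in the commit message.
@dataclass
class Statement:
    id: str


@dataclass
class ExperimentalVariantFunctionalImpactStudyResult:
    # Deliberately does not inherit from StudyResult, as in the commit message.
    id: str


@dataclass
class StudyResult:
    # Root-model style wrapper: the concrete result lives in `root`.
    root: Any


# Assumed allow-list of types accepted as evidence items.
ALLOWED_EVIDENCE_TYPES = (Statement, StudyResult)


def check_evidence_items(items: list) -> list:
    """Accept only model instances of an allowed type; reject raw dicts."""
    for item in items:
        if not isinstance(item, ALLOWED_EVIDENCE_TYPES):
            raise TypeError(
                f"{type(item).__name__} is not an accepted evidence item type"
            )
    return list(items)


result = ExperimentalVariantFunctionalImpactStudyResult(id="result-1")

# Wrapping satisfies the isinstance check; passing `result` directly would not.
wrapped = check_evidence_items([StudyResult(root=result)])
```

Under these assumptions, a bare `ExperimentalVariantFunctionalImpactStudyResult` or a plain dict is rejected, while a `Statement` or a wrapped study result passes through as a model instance.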
@bencap bencap changed the base branch from main to release-2026.2.3 May 15, 2026 18:44

Development

Successfully merging this pull request may close these issues.

Support ReferenceLengthExpression state in annotated-variants study-result pipeline