### Demonstration Overview: Translating a Fully Populated VRS Allele into a FHIR Allele Profile

This notebook demonstrates a full round-trip translation: converting a fully populated **GA4GH VRS Allele** object into a **FHIR Allele Profile**, and then translating it back into its original VRS form.

Unlike the earlier `vrs_allele_translation.ipynb` notebook, which focused on translating the minimal output of the `vrs-python` module, this notebook showcases an enhanced translator that handles the structure of a richly annotated VRS Allele. This includes nested metadata such as `expressions`, `aliases`, nested `extensions`, and the fields of `sequenceReference`. The top-level `extensions` field of the Allele is not yet supported by the translator and is omitted from the example.

#### Key features of this notebook:

- **Synthetic test data** is used to simulate a fully populated VRS Allele object.
- **Comprehensive translation** from VRS to FHIR, aiming to preserve all relevant data fields and structure.
- **Round-trip validation**, confirming that converting back from FHIR to VRS yields an equivalent result.

This notebook serves as a testbed and demonstration for accurate and detailed data exchange between the GA4GH VRS Allele model and the HL7 FHIR Allele Profile standard for representing molecular variation.


In [1]:
# Import core modules for VRS ↔ FHIR Allele translation

from ga4gh.vrs.models import Allele                           # VRS Allele model definition
from translators.vrs_to_fhir import VrsToFhirAlleleTranslator # Converts VRS Alleles to FHIR Allele Profile
from translators.fhir_to_vrs import FhirToVrsAlleleTranslator # Converts FHIR Alleles back to VRS format
import json

# Instantiate one translator per direction
vrs_translator = VrsToFhirAlleleTranslator()
fhir_translator = FhirToVrsAlleleTranslator()


In [2]:
# Example: Richly populated synthetic VRS Allele object used for translation testing
# NOTE:
# The top-level `extensions` field of the Allele is not currently supported by the translator and is commented out below.
# Nested `extensions` (on the expressions, location, sequence reference, and state) are included.
# Commented-out placeholders show possible extension structures for future implementation and testing.

example_synthetic_data = {
    "id": "ga4gh:VA.j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L",
    "type": "Allele",
    "name": "V600E",
    "description": "BRAF V600E variant",
    "digest": "j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L",
    "expressions": [
        {
            "id": "expression:1",
            "syntax": "hgvs.p",
            "value": "NP_004324.2:p.Val600Glu",
            "syntax_version": "21.0",
            "extensions": [
                {
                    "id": "sub-expression:1",
                    "name": "expression.name.1",
                    "value": False, # This should be represented as a valueBoolean 
                    "description": "expression.description.1",
                    # "extensions": [
                    #     {
                    #         "id": "sub-sub-expression:2",
                    #         "name": "expression.sub.name.2",
                    #         "value": 11.11, # This should be represented as a valueDecimal
                    #         "description": "expression.description.2"
                    #     }
                    # ]
                }
            ]
        },
        {"syntax": "hgvs.c", "value": "NM_004333.4:c.1799T>A"},
        {"syntax": "hgvs.g", "value": "NC_000007.13:g.140453136A>T"},
    ],
    "aliases": ["VAL600GLU", "V640E", "VAL640GLU"],
    # TODO: Translation of the top-level Allele `extensions` field is not implemented yet
    # "extensions": [
    #     {
    #         "name": "civic_variant_url",
    #         "value": "civicdb.org/links/variants/12",
    #         "description": "CIViC Variant URL",
    #         "extensions": [
    #             {
    #                 "id": "extension.sub_extension:1",
    #                 "name": "extension.sub_extension.name",
    #                 "value": "extension.sub_extension.value",
    #                 "description": "extension.sub_extension.description"
    #             }
    #         ]
    #     }
    # ],
    "location": {
        "id": "ga4gh:SL.t-3DrWALhgLdXHsupI-e-M00aL3HgK3y",
        "name": "NP_004324.2",
        "description": "My location description",
        "digest": "t-3DrWALhgLdXHsupI-e-M00aL3HgK3y",
        "type": "SequenceLocation",
        "sequenceReference": {
            "id": "sequence_reference.id",
            "name": "sequence_reference.name",
            "aliases": ["sequence_reference.aliase"],
            "description": "sequence_reference.description",
            "refgetAccession": "SQ.cQvw4UsHHRRlogxbWCB8W-mKD4AraM9y",
            "type": "SequenceReference",
            "residueAlphabet": "aa",
            "moleculeType": "protein",
            "circular": False,
            "sequence": "V",
            "extensions": [
                {
                    "id": "sequence_reference.extension:1",
                    "name": "sequence_reference.extension.name",
                    "value": "sequence_reference.extension.value",
                    "description": "sequence_reference.extension.description",
                    "extensions": [
                        {
                            "id": "sequence_reference.sub_extension:1",
                            "name": "sequence_reference.sub_extension.name",
                            "value": "sequence_reference.sub_extension.value",
                            "description": "sequence_reference.sub_extension.description"
                        }
                    ]
                }
            ]
        },
        "aliases": ["Ensembl:ENSP00000288602.6"],
        "start": 599,
        "end": 600,
        "sequence": "V",
        "extensions": [
            {
                "id": "sequence_location.extension:1",
                "name": "sequence_location.name",
                "value": "sequence_location.value",
                "description": "sequence_location.description",
                "extensions": [
                    {
                        "id": "sequence_location.sub_extension:1",
                        "name": "sequence_location.sub_extension.name",
                        "value": "sequence_location.sub_extension.value",
                        "description": "sequence_location.sub_extension.description"
                    }
                ]
            }
        ]
    },
    "state": {
        "id": "state:1",
        "name": "state",
        "description": "My description for state",
        "sequence": "E",
        "type": "LiteralSequenceExpression",
        "extensions": [
            {
                "id": "state.extension:1",
                "name": "state.name",
                "value": "state.value",
                "description": "state.description",
                "extensions": [
                    {
                        "id": "state.sub_extension:1",
                        "name": "state.sub_extension.name",
                        "value": "state.sub_extension.value",
                        "description": "state.sub_extension.description"
                    }
                ]
            }
        ],
        "aliases": ["my_sequence"]
    }
}
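
The inline comments in the synthetic data note that a boolean extension value should be represented as a `valueBoolean` and a floating-point value as a `valueDecimal`. The following is a minimal sketch of how such a choice could be made, not the translator's actual implementation. The helper names `fhir_value_key` and `to_fhir_extension` are hypothetical; only the FHIR element names (`valueBoolean`, `valueInteger`, `valueDecimal`, `valueString`) come from the FHIR `value[x]` convention.

```python
def fhir_value_key(value):
    """Return the FHIR value[x] element name for a Python scalar (hypothetical helper)."""
    # bool must be tested before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "valueBoolean"
    if isinstance(value, int):
        return "valueInteger"
    if isinstance(value, float):
        return "valueDecimal"
    if isinstance(value, str):
        return "valueString"
    raise TypeError(f"unsupported extension value type: {type(value).__name__}")


def to_fhir_extension(url, value):
    """Build a single FHIR-style extension dict for a scalar value (hypothetical helper)."""
    return {"url": url, fhir_value_key(value): value}
```

Testing `bool` before `int` matters here: without that ordering, `False` would be emitted as a `valueInteger` and the round trip would no longer return a boolean.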

In [3]:
# Create a fully populated VRS Allele object using example synthetic data and display its contents
full_vrs_example = Allele(**example_synthetic_data)

# Display the full content of the Allele object using Pydantic's model_dump()
full_vrs_example.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L',
 'type': 'Allele',
 'name': 'V600E',
 'description': 'BRAF V600E variant',
 'aliases': ['VAL600GLU', 'V640E', 'VAL640GLU'],
 'digest': 'j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L',
 'expressions': [{'id': 'expression:1',
   'extensions': [{'id': 'sub-expression:1',
     'name': 'expression.name.1',
     'value': False,
     'description': 'expression.description.1'}],
   'syntax': 'hgvs.p',
   'value': 'NP_004324.2:p.Val600Glu',
   'syntax_version': '21.0'},
  {'syntax': 'hgvs.c', 'value': 'NM_004333.4:c.1799T>A'},
  {'syntax': 'hgvs.g', 'value': 'NC_000007.13:g.140453136A>T'}],
 'location': {'id': 'ga4gh:SL.t-3DrWALhgLdXHsupI-e-M00aL3HgK3y',
  'type': 'SequenceLocation',
  'name': 'NP_004324.2',
  'description': 'My location description',
  'aliases': ['Ensembl:ENSP00000288602.6'],
  'extensions': [{'id': 'sequence_location.extension:1',
    'extensions': [{'id': 'sequence_location.sub_extension:1',
      'name': 'sequence_location.sub_extensi

In [4]:
# Translate a fully populated VRS Allele object into its FHIR Allele Profile representation
translated_fhir_allele_profile = vrs_translator.translate_allele_to_fhir(full_vrs_example)

# Serialize the FHIR AlleleProfile to a formatted JSON string for readable display
print(json.dumps(translated_fhir_allele_profile.model_dump(), indent=2))

{
  "resourceType": "MolecularDefinition",
  "contained": [
    {
      "resourceType": "MolecularDefinition",
      "id": "vrs-location-sequence",
      "moleculeType": {
        "coding": [
          {
            "system": "http://hl7.org/fhir/sequence-type",
            "code": "protein"
          }
        ]
      },
      "representation": [
        {
          "literal": {
            "value": "V"
          }
        }
      ]
    },
    {
      "resourceType": "MolecularDefinition",
      "id": "vrs-location-sequenceReference",
      "extension": [
        {
          "url": "https://w3id.org/ga4gh/schema/vrs/2.0.1/json/SequenceReference#properties/id",
          "valueString": "sequence_reference.id"
        },
        {
          "url": "https://w3id.org/ga4gh/schema/vrs/2.0.1/json/SequenceReference#properties/name",
          "valueString": "sequence_reference.name"
        },
        {
          "url": "https://w3id.org/ga4gh/schema/vrs/2.0.1/json/SequenceReference#properti

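The output above stores VRS properties that have no direct FHIR counterpart as extensions whose `url` points at the matching property in the VRS JSON schema. A minimal sketch of that pattern follows, assuming string-valued properties only. The helper names (`property_url`, `pack_properties`, `unpack_properties`) are hypothetical and are not taken from the translator; the URL prefix is copied from the output above.

```python
# URL prefix as it appears in the translated output above
VRS_SCHEMA_BASE = "https://w3id.org/ga4gh/schema/vrs/2.0.1/json"


def property_url(vrs_type, prop):
    """Build the schema URL for one property of a VRS type (hypothetical helper)."""
    return f"{VRS_SCHEMA_BASE}/{vrs_type}#properties/{prop}"


def pack_properties(vrs_type, props):
    """Turn {property: string value} into a list of valueString extensions."""
    return [
        {"url": property_url(vrs_type, name), "valueString": value}
        for name, value in props.items()
    ]


def unpack_properties(vrs_type, extensions):
    """Recover {property: string value} from extensions built by pack_properties."""
    prefix = property_url(vrs_type, "")
    return {
        ext["url"][len(prefix):]: ext["valueString"]
        for ext in extensions
        if ext["url"].startswith(prefix)
    }


packed = pack_properties(
    "SequenceReference",
    {"id": "sequence_reference.id", "name": "sequence_reference.name"},
)
restored = unpack_properties("SequenceReference", packed)
```

Because the property name is encoded in the URL, the reverse direction can restore each value to its original field without any extra bookkeeping.
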
In [5]:
# Translate the FHIR Allele Profile back into a VRS Allele and dump it to a dictionary
translated_allele = fhir_translator.translate_allele_to_vrs(translated_fhir_allele_profile).model_dump(exclude_none=True)

# Display the resulting VRS Allele structure (fields set to None are omitted)
translated_allele

{'id': 'ga4gh:VA.j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L',
 'type': 'Allele',
 'name': 'V600E',
 'description': 'BRAF V600E variant',
 'aliases': ['VAL600GLU', 'V640E', 'VAL640GLU'],
 'digest': 'j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L',
 'expressions': [{'id': 'expression:1',
   'extensions': [{'id': 'sub-expression:1',
     'name': 'expression.name.1',
     'value': False,
     'description': 'expression.description.1'}],
   'syntax': 'hgvs.p',
   'value': 'NP_004324.2:p.Val600Glu',
   'syntax_version': '21.0'},
  {'syntax': 'hgvs.c', 'value': 'NM_004333.4:c.1799T>A'},
  {'syntax': 'hgvs.g', 'value': 'NC_000007.13:g.140453136A>T'}],
 'location': {'id': 'ga4gh:SL.t-3DrWALhgLdXHsupI-e-M00aL3HgK3y',
  'type': 'SequenceLocation',
  'name': 'NP_004324.2',
  'description': 'My location description',
  'aliases': ['Ensembl:ENSP00000288602.6'],
  'extensions': [{'id': 'sequence_location.extension:1',
    'extensions': [{'id': 'sequence_location.sub_extension:1',
      'name': 'sequence_location.sub_extensi

In [6]:
# Dump the original VRS Allele object to a dictionary, omitting fields set to None
original_vrs_example = full_vrs_example.model_dump(exclude_none=True)

# Compare the original and round-tripped VRS Allele dictionaries for equality
# NOTE: The two Alleles are also equivalent when None values are not excluded
original_vrs_example == translated_allele


True
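
A single `True` or `False` confirms the round trip but does not say where a mismatch occurs if the comparison ever fails. As a debugging aid, the sketch below walks two nested dictionaries and reports each differing path. `diff_paths` and the `MISSING` marker are hypothetical names introduced for illustration and are not part of the translator or `vrs-python`; the sample dictionaries are made up.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def diff_paths(a, b, path=""):
    """Yield (path, left, right) for every place two nested structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in a:
                yield (sub, MISSING, b[key])
            elif key not in b:
                yield (sub, a[key], MISSING)
            else:
                yield from diff_paths(a[key], b[key], sub)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        # Lists of equal length are compared element by element
        for i, (x, y) in enumerate(zip(a, b)):
            yield from diff_paths(x, y, f"{path}[{i}]")
    elif a != b:
        # Scalars, mismatched types, or lists of different length
        yield (path, a, b)


# Made-up dictionaries standing in for the original and round-tripped dumps
left = {"id": "x", "location": {"start": 599, "end": 600}, "aliases": ["a", "b"]}
right = {"id": "x", "location": {"start": 599, "end": 601}, "aliases": ["a", "c"], "name": "n"}

for where, lhs, rhs in diff_paths(left, right):
    print(f"{where}: {lhs!r} != {rhs!r}")
```

In this notebook the same function could be called as `list(diff_paths(original_vrs_example, translated_allele))`, which would be expected to return an empty list whenever the equality check above is `True`.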

### Conclusion
This notebook demonstrates that a richly populated VRS 2.0 Allele can be translated into a FHIR Allele Profile and then back into VRS, with the round-tripped output equal to the original input. The one gap is the top-level `extensions` field of the Allele, which the translator does not yet support and which was therefore left out of the test data.