### Demonstration Overview: Translating a Fully Populated VRS Allele into a FHIR Allele Profile

This notebook demonstrates how to translate a fully populated VRS Allele into a FHIR Allele Profile and back again.

Unlike the `vrs_allele_translation.ipynb` notebook, which showcases a translator modeled after the `vrs-python` translator module and operating on its output, this notebook uses an updated translator that handles all attributes of a fully populated VRS Allele object.

The previous translator supports only a minimal subset of VRS Allele data, because the `vrs-python` module does not output every possible attribute. In contrast, the translator used here supports full-fidelity translation of rich VRS Allele data into the FHIR Allele Profile.

Throughout the notebook, we document areas where certain VRS elements may not map cleanly to FHIR and provide commentary on those limitations.
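
One way to spot elements that do not survive translation is to compare the original allele with the result of a VRS to FHIR to VRS round trip. The helper below is a minimal sketch of that idea, not part of the translator package: `diff_paths` is a hypothetical name, and the two dictionaries are toy data rather than translator output.

```python
from typing import Any


def diff_paths(original: Any, round_tripped: Any, path: str = "") -> list[str]:
    """Return the paths at which two nested dict/list structures differ."""
    if isinstance(original, dict) and isinstance(round_tripped, dict):
        diffs = []
        for key in sorted(set(original) | set(round_tripped)):
            child = f"{path}.{key}" if path else str(key)
            if key not in original or key not in round_tripped:
                # Key exists on one side only: report it and stop descending
                diffs.append(child)
            else:
                diffs.extend(diff_paths(original[key], round_tripped[key], child))
        return diffs
    if isinstance(original, list) and isinstance(round_tripped, list):
        if len(original) != len(round_tripped):
            return [path]
        diffs = []
        for i, (a, b) in enumerate(zip(original, round_tripped)):
            diffs.extend(diff_paths(a, b, f"{path}[{i}]"))
        return diffs
    # Leaf values (or mismatched container types): compare directly
    return [] if original == round_tripped else [path]


# Toy data for illustration only; not produced by the translators
before = {"name": "V600E", "location": {"start": 599, "end": 600, "sequence": "V"}}
after = {"name": "V600E", "location": {"start": 599, "end": 600}}
print(diff_paths(before, after))  # ['location.sequence']
```

In this notebook, the same helper could be applied to `model_dump(exclude_none=True)` of the original allele and of the round-tripped allele; an empty list would suggest that nothing was lost.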


In [1]:
# Importing the necessary modules and classes
from ga4gh.vrs.models import Allele
from translators.allele_translator import VrsFhirAlleleTranslation
from normalizers.allele_normalizer import AlleleNormalizer
from api.seqrepo import SeqRepoAPI
from translators.vrs_to_fhir import VRSAlleleToFHIRTranslator
from translators.fhir_to_vrs import FhirToVrsAllele
import json

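# Instantiate the helpers used throughout the notebook
normalize = AlleleNormalizer()
alleleTranslator = VrsFhirAlleleTranslation()
seqrepo_api = SeqRepoAPI()
# One translator per direction: VRS -> FHIR and FHIR -> VRS
vrs_to = VRSAlleleToFHIRTranslator()
fhir_to = FhirToVrsAllele()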


In [2]:
# Synthetic, fully populated VRS Allele (BRAF V600E) used as translator input
example_synthetic_data = {
        "id": "ga4gh:VA.j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L",
        "type": "Allele",
        "name": "V600E",
        "description": "BRAF V600E variant",
        "digest": "j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L",
        "expressions": [
            {"id": "expression:1", "syntax": "hgvs.p", "value": "NP_004324.2:p.Val600Glu", "syntax_version": "21.0"},
        ],
        "aliases": ["VAL600GLU", "V640E", "VAL640GLU"],
        "extensions": [
            {
                "name": "civic_variant_url",
                "value": "civicdb.org/links/variants/12",
                "description": "CIViC Variant URL",
                "extensions": [
                        {   "id": "extension.sub_extension:1",
                            "name": "extension.sub_extension.name",
                            "value":"extension.sub_extension.value",
                            "description": "extension.sub_extension.description"
                        }
                ]
            }
        ],
        "location": {
            "id": "ga4gh:SL.t-3DrWALhgLdXHsupI-e-M00aL3HgK3y",
            "name": "NP_004324.2",
            "description": "My location description",
            "digest": "t-3DrWALhgLdXHsupI-e-M00aL3HgK3y",
            "type": "SequenceLocation",
            "sequenceReference": {
                "refgetAccession": "SQ.cQvw4UsHHRRlogxbWCB8W-mKD4AraM9y",
                "type": "SequenceReference",
                "residueAlphabet": "aa",
                "moleculeType": "protein",
                "circular": False,
                "sequence": "V", # A sequenceString that is a literal representation of the referenced sequence.
                "extensions": [
                    {   
                        "id": "sequence_reference.extension:1",
                        "name": "sequence_reference.name",
                        "value": "sequence_reference.value",
                        "description": "sequence_reference.description",
                        "extensions": [
                            {   "id": "sequence_reference.sub_extension:1",
                                "name": "sequence_reference.sub_extension.name",
                                "value":"sequence_reference.sub_extension.value",
                                "description": "sequence_reference.sub_extension.description"
                            }
                        ] 
                    }
                ]
            },
            "aliases": ["Ensembl:ENSP00000288602.6"],
            "start": 599,
            "end": 600,
            "sequence": "V", # The literal sequence encoded by the sequenceReference at these coordinates.
            "extensions": [
                {   
                    "id": "sequence_location.extension:1",
                    "name": "sequence_location.name",
                    "value": "sequence_location.value",
                    "description": "sequence_location.description",
                    "extensions": [
                        {   "id": "sequence_location.sub_extension:1",
                            "name": "sequence_location.sub_extension.name",
                            "value":"sequence_location.sub_extension.value",
                            "description": "sequence_location.sub_extension.description"
                        }
                    ] 
                }
            ]
        },
        "state": {
            "id": "state:1",
            "name": "state",
            "description": "My description for state",
            "sequence": "E", 
            "type": "LiteralSequenceExpression",
            "extensions":[
                {   
                    "id": "state.extension:1",
                    "name": "state.name",
                    "value": "state.value",
                    "description": "state.description",
                    "extensions": [
                        {   "id": "state.sub_extension:1",
                            "name": "state.sub_extension.name",
                            "value":"state.sub_extension.value",
                            "description": "state.sub_extension.description"
                        }
                    ] 
                },
            ],
            "aliases": ["my_sequence"]
        },
    }
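
The nested `extensions` entries above are the part of the allele that depends most on FHIR's own extension mechanism. In the FHIR output shown further down, each VRS extension appears as a FHIR extension that keeps the original `id` and holds one child entry per field, keyed by a `gks-core` schema URL. The sketch below mimics that visible shape with plain dictionaries. It is an illustration under assumptions, not the translator's implementation: `vrs_extension_to_fhir` is a hypothetical name, `valueString` is inferred from the truncated output, and `description` and nested sub-extensions are left out because the output is cut off before they appear.

```python
# URL prefix copied from the FHIR output shown later in this notebook
GKS_EXTENSION_URL = (
    "https://github.com/ga4gh/gks-core/blob/1.0/schema/gks-core/json/"
    "Extension#properties/"
)


def vrs_extension_to_fhir(ext: dict) -> dict:
    """Map the id, name and value of one VRS extension to a FHIR-style dict."""
    fhir_ext: dict = {}
    if "id" in ext:
        fhir_ext["id"] = ext["id"]
    fhir_ext["extension"] = []
    # Only the two fields visible in the notebook output are handled here
    for field in ("name", "value"):
        if field in ext:
            fhir_ext["extension"].append(
                {"url": GKS_EXTENSION_URL + field, "valueString": str(ext[field])}
            )
    return fhir_ext


example = {
    "id": "sequence_reference.extension:1",
    "name": "sequence_reference.name",
    "value": "sequence_reference.value",
}
fhir_example = vrs_extension_to_fhir(example)
print(fhir_example["extension"][0]["valueString"])  # sequence_reference.name
```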

In [3]:
# Create a fully populated VRS Allele object from the synthetic example data
full_vrs_example = Allele(**example_synthetic_data)

In [4]:
# Translate the full VRS Allele object into its FHIR Allele Profile representation
transalted_fhir_allele_profile = vrs_to.full_allele_translator(full_vrs_example)

In [5]:
# Serialize the FHIR Allele Profile to a formatted JSON string for display
print(json.dumps(transalted_fhir_allele_profile.model_dump(), indent=2))

{
  "resourceType": "MolecularDefinition",
  "contained": [
    {
      "resourceType": "MolecularDefinition",
      "id": "vrs-location-sequence",
      "moleculeType": {
        "coding": [
          {
            "system": "http://hl7.org/fhir/sequence-type",
            "code": "protein"
          }
        ]
      },
      "representation": [
        {
          "literal": {
            "value": "V"
          }
        }
      ]
    },
    {
      "resourceType": "MolecularDefinition",
      "id": "vrs-location-sequenceReference",
      "extension": [
        {
          "id": "sequence_reference.extension:1",
          "extension": [
            {
              "url": "https://github.com/ga4gh/gks-core/blob/1.0/schema/gks-core/json/Extension#properties/name",
              "valueString": "sequence_reference.name"
            },
            {
              "url": "https://github.com/ga4gh/gks-core/blob/1.0/schema/gks-core/json/Extension#properties/value",
              "valueStrin

In [6]:
# Translate the FHIR Allele Profile back into a VRS Allele object and dump it without None fields
fhir_to.full_allele_translator(transalted_fhir_allele_profile).model_dump(exclude_none=True)

{'id': 'ga4gh:VA.j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L',
 'type': 'Allele',
 'name': 'V600E',
 'description': 'BRAF V600E variant',
 'aliases': ['VAL600GLU', 'V640E', 'VAL640GLU'],
 'digest': 'j4XnsLZcdzDIYa5pvvXM7t1wn9OITr0L',
 'expressions': [{'id': 'expression:1',
   'syntax': 'hgvs.p',
   'value': 'NP_004324.2:p.Val600Glu',
   'syntax_version': '21.0'}],
 'location': {'id': 'ga4gh:SL.t-3DrWALhgLdXHsupI-e-M00aL3HgK3y',
  'type': 'SequenceLocation',
  'name': 'NP_004324.2',
  'description': 'My location description',
  'aliases': ['Ensembl:ENSP00000288602.6'],
  'extensions': [{'id': 'sequence_location.extension:1',
    'extensions': [{'id': 'sequence_location.sub_extension:1',
      'name': 'sequence_location.sub_extension.name',
      'value': 'sequence_location.sub_extension.value',
      'description': 'sequence_location.sub_extension.description'}],
    'name': 'sequence_location.name',
    'value': 'sequence_location.value',
    'description': 'sequence_location.description'}],
  'd