### Demonstration Overview: Translating an AlleleProfile into a VRS (Version 1.3) Object

This notebook demonstrates the conversion of an **HL7 FHIR AlleleProfile** into a **GA4GH VRS (Version 1.3)** representation. Using example data, an `AlleleProfile` object is generated and subsequently translated into a VRS allele via the `translate_allele_profile_to_vrs_allele` method from the `VrsFhirAlleleTranslation` class. This method facilitates a one-way transformation from **FHIR AlleleProfiles** to **VRS Alleles**.

For a more comprehensive understanding of the **HL7 FHIR AlleleProfile**, we recommend reviewing the [AlleleProfile Demo Notebook](allele_profile_demo.ipynb) as well as the [HL7 FHIR MolecularDefinition Documentation](https://build.fhir.org/branches/cg-im-moldef_work_in_progress_2/moleculardefinition.html).


### Prerequisites and Setup

To support the one-way transformation from AlleleProfile to VRS Allele in this demonstration, we set up the environment by importing the necessary libraries and modules. These include:

1. **Custom Project Modules**:
   - `AlleleProfile` from `profiles.alleleprofile`: A data structure representing an HL7 FHIR AlleleProfile
   - `VrsFhirAlleleTranslation` from `moldeftranslator.allele_translator`: A translation component whose `translate_allele_profile_to_vrs_allele` method converts an AlleleProfile into a VRS Allele.

In [9]:
from profiles.alleleprofile import AlleleProfile
from moldeftranslator.allele_translator import VrsFhirAlleleTranslation

alleleTrans = VrsFhirAlleleTranslation()

### Data Requirements for Translation

While the HL7 FHIR AlleleProfile structure defines certain fields with varying cardinalities, this notebook requires specific data elements to be present for successful translation into a GA4GH VRS Allele. These fields are essential for ensuring accurate and meaningful conversion, even if they are not strictly mandated by the FHIR specification. 

* **Reference Sequence**: `location[0]["sequenceLocation"]["sequenceContext"]["display"]`  
* **Coordinate System**: `location[0]["sequenceLocation"]["coordinateInterval"]["coordinateSystem"]["system"]["coding"][0]["code"]`  
* **Start Position**: `location[0]["sequenceLocation"]["coordinateInterval"]["startQuantity"]["value"]`  
* **End Position**: `location[0]["sequenceLocation"]["coordinateInterval"]["endQuantity"]["value"]`  
* **Literal Value (Allele Representation)**: `representation[0]["literal"]["value"]`  

⚠ **Note:** The translation step includes a validation process that verifies the presence of these required fields. If any of these fields are missing, an error will be raised, and the translation will not be performed.
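
To check a raw dictionary against the list above before constructing an `AlleleProfile`, a small pre-flight helper can walk each path. The sketch below is an illustration only: `REQUIRED_PATHS` and `missing_required_fields` are hypothetical names, and this is not the validation performed inside `VrsFhirAlleleTranslation`.

```python
# Hypothetical pre-flight check; NOT the validation used by VrsFhirAlleleTranslation.
# Each path mirrors one bullet in the list of required fields.
_INTERVAL = ("location", 0, "sequenceLocation", "coordinateInterval")
REQUIRED_PATHS = {
    "reference sequence": ("location", 0, "sequenceLocation", "sequenceContext", "display"),
    "coordinate system": _INTERVAL + ("coordinateSystem", "system", "coding", 0, "code"),
    "start position": _INTERVAL + ("startQuantity", "value"),
    "end position": _INTERVAL + ("endQuantity", "value"),
    "literal value": ("representation", 0, "literal", "value"),
}


def missing_required_fields(resource):
    """Return the labels of required fields that are absent or None."""
    missing = []
    for label, path in REQUIRED_PATHS.items():
        node = resource
        try:
            for key in path:
                node = node[key]
        except (KeyError, IndexError, TypeError):
            missing.append(label)
            continue
        if node is None:
            missing.append(label)
    return missing


# Smallest dictionary that satisfies every path (values taken from Example 1).
minimal_resource = {
    "location": [{
        "sequenceLocation": {
            "sequenceContext": {"display": "NC_000002.12"},
            "coordinateInterval": {
                "coordinateSystem": {"system": {"coding": [{"code": "LA30100-4"}]}},
                "startQuantity": {"value": 27453448},
                "endQuantity": {"value": 27453449},
            },
        }
    }],
    "representation": [{"literal": {"value": "T"}}],
}

print(missing_required_fields(minimal_resource))   # []
without_literal = {k: v for k, v in minimal_resource.items() if k != "representation"}
print(missing_required_fields(without_literal))    # ['literal value']
```

Returning labels rather than raising lets a caller report every missing field at once; the translation step itself still raises its own error as described in the note.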

### Example 1

In [None]:
example_substitution = {
    "resourceType" : "MolecularDefinition",
    "id" : "demo-example-hgvs-substitution",
    "meta" : {
      "profile" : ["http://hl7.org/fhir/StructureDefinition/allelesliced"]
    },
    "moleculeType" : {
      "coding" : [{
        "system" : "http://hl7.org/fhir/sequence-type",
        "code" : "dna",
        "display" : "DNA Sequence"
      }]
    },
    "location" : [
        {
      "sequenceLocation" : {
        "sequenceContext" : {
          "reference" : "MolecularDefinition/example-sequence-nc000002-url",
          "type" : "MolecularDefinition",
          # Example needs to contain the reference sequence for translation 
          "display" : "NC_000002.12"
        },
        "coordinateInterval" : {
          "coordinateSystem" : {
            "system" : {
              "coding" : [{
                "system" : "http://loinc.org",
                # Example needs to contain the coordinate system code for translation
                "code" : "LA30100-4",
                "display" : "0-based interval counting"
              }],
              "text" : "0-based interval counting"
            }
          },
          # Example needs to contain the startQuantity for translation 
          "startQuantity" : {
            "value" : 27453448
          },
          # Example needs to contain the endQuantity for translation 
          "endQuantity" : {
            "value" : 27453449
          }
        }
      }
    }
    ],
    "representation" : [{
      "focus" : {
        "coding" : [{
          # Example needs to contain system 
          "system" : "http://hl7.org/fhir/moleculardefinition-focus",
          # Example needs to contain code 
          "code" : "allele-state",
          "display" : "Allele State"
        }]
      },
      "literal" : {
        # Example needs to contain the literal value for translation
        "value" : "T"
      }
    }]
  }

In [3]:
example_allele_substitution = AlleleProfile(**example_substitution)
example_allele_substitution.model_dump()

{'resourceType': 'MolecularDefinition',
 'id': 'demo-example-hgvs-substitution',
 'meta': {'profile': ['http://hl7.org/fhir/StructureDefinition/allelesliced']},
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': 'MolecularDefinition/example-sequence-nc000002-url',
     'type': 'MolecularDefinition',
     'display': 'NC_000002.12'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}],
       'text': '0-based interval counting'}},
     'startQuantity': {'value': Decimal('27453448')},
     'endQuantity': {'value': Decimal('27453449')}}}}],
 'representation': [{'focus': {'coding': [{'system': 'http://hl7.org/fhir/moleculardefinition-focus',
      'code': 'allele-state',
      'display': 'Allele State'}]},
   'literal'

### Normalized VRS allele translation

In [4]:
vrs_example_allele_substitution_norm = alleleTrans.translate_allele_profile_to_vrs_allele(example_allele_substitution)
vrs_example_allele_substitution_norm.as_dict()

{'_id': 'ga4gh:VA.fXvhngewkkyVwzEeSJRr5tro8Jcol6Q-',
 'type': 'Allele',
 'location': {'_id': 'ga4gh:VSL.nLMbYalHO4OEI2axqkyTMCQxrH98UpDN',
  'type': 'SequenceLocation',
  'sequence_id': 'ga4gh:SQ.pnAqCRBrTsUoBghSD1yp_jXWSmlbdh4g',
  'interval': {'type': 'SequenceInterval',
   'start': {'type': 'Number', 'value': 27453448},
   'end': {'type': 'Number', 'value': 27453449}}},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'T'}}

### Non-normalized VRS allele translation

In [5]:
vrs_example_allele_substitution_unnorm = alleleTrans.translate_allele_profile_to_vrs_allele(example_allele_substitution, normalize=False)
vrs_example_allele_substitution_unnorm.as_dict()

{'type': 'Allele',
 'location': {'type': 'SequenceLocation',
  'sequence_id': 'refseq:NC_000002.12',
  'interval': {'type': 'SequenceInterval',
   'start': {'type': 'Number', 'value': 27453448},
   'end': {'type': 'Number', 'value': 27453449}}},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'T'}}
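
To see at a glance how the two outputs above differ, the printed dictionaries can be compared key by key. This is a minimal sketch using only the standard library; `flatten` is a hypothetical helper, and the two dictionaries are copied from the outputs shown above rather than regenerated.

```python
def flatten(obj, prefix=""):
    """Map nested dict keys to dotted paths, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Copied from the normalized output above.
normalized = {
    "_id": "ga4gh:VA.fXvhngewkkyVwzEeSJRr5tro8Jcol6Q-",
    "type": "Allele",
    "location": {
        "_id": "ga4gh:VSL.nLMbYalHO4OEI2axqkyTMCQxrH98UpDN",
        "type": "SequenceLocation",
        "sequence_id": "ga4gh:SQ.pnAqCRBrTsUoBghSD1yp_jXWSmlbdh4g",
        "interval": {
            "type": "SequenceInterval",
            "start": {"type": "Number", "value": 27453448},
            "end": {"type": "Number", "value": 27453449},
        },
    },
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}

# Copied from the non-normalized output above.
unnormalized = {
    "type": "Allele",
    "location": {
        "type": "SequenceLocation",
        "sequence_id": "refseq:NC_000002.12",
        "interval": {
            "type": "SequenceInterval",
            "start": {"type": "Number", "value": 27453448},
            "end": {"type": "Number", "value": 27453449},
        },
    },
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}

norm_flat, unnorm_flat = flatten(normalized), flatten(unnormalized)
only_in_normalized = sorted(set(norm_flat) - set(unnorm_flat))
changed = sorted(k for k in set(norm_flat) & set(unnorm_flat)
                 if norm_flat[k] != unnorm_flat[k])

print(only_in_normalized)  # ['_id', 'location._id']
print(changed)             # ['location.sequence_id']
```

In these outputs, the default call adds computed `_id` identifiers and a `ga4gh:SQ.` sequence digest, while `normalize=False` keeps the `refseq:` accession; the interval and state are identical in both.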

### Example 2

In [None]:
example_insertion = {
    "resourceType" : "MolecularDefinition",
    "id" : "demo-example-hgvs-insertion",
    "meta" : {
      "profile" : ["http://hl7.org/fhir/StructureDefinition/allelesliced"]
    },
    "moleculeType" : {
      "coding" : [{
        "system" : "http://hl7.org/fhir/sequence-type",
        "code" : "dna",
        "display" : "DNA Sequence"
      }]
    },
    "location" : [
        {
      "sequenceLocation" : {
        "sequenceContext" : {
          "reference" : "MolecularDefinition/example-sequence-nc000001-url",
          "type" : "MolecularDefinition",
          # Must contain only the reference sequence accession for translation
          "display" : "NC_000001.11" 
        },
        "coordinateInterval" : {
          "coordinateSystem" : {
            "system" : {
              "coding" : [{
                "system" : "http://loinc.org",
                # Example needs to contain the coordinate system code for translation
                "code" : "LA30100-4",
                "display" : "0-based interval counting"
              }],
              "text" : "0-based interval counting"
            }
          },
          # Example needs to contain the startQuantity for translation
          "startQuantity" : {
            "value" : 113901365
          },
          # Example needs to contain the endQuantity for translation 
          "endQuantity" : {
            "value" : 113901365
          }
        },
      }
    }
    ],
    "representation" : [{
      "focus" : {
        "coding" : [{
          # Example needs to contain system 
          "system" : "http://hl7.org/fhir/moleculardefinition-focus",
          # Example needs to contain code
          "code" : "allele-state",
          "display" : "Allele State"
        }]
      },
      "literal" : {
        # Example needs to contain the literal value for translation
        "value" : "ATA"
      }
    }]
  }


In [7]:
example_allele_insertion = AlleleProfile(**example_insertion)
example_allele_insertion.model_dump()

{'resourceType': 'MolecularDefinition',
 'id': 'demo-example-hgvs-insertion',
 'meta': {'profile': ['http://hl7.org/fhir/StructureDefinition/allelesliced']},
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': 'MolecularDefinition/example-sequence-nc000001-url',
     'type': 'MolecularDefinition',
     'display': 'NC_000001.11'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}],
       'text': '0-based interval counting'}},
     'startQuantity': {'value': Decimal('113901365')},
     'endQuantity': {'value': Decimal('113901365')}}}}],
 'representation': [{'focus': {'coding': [{'system': 'http://hl7.org/fhir/moleculardefinition-focus',
      'code': 'allele-state',
      'display': 'Allele State'}]},
   'literal':

In [8]:
vrs_example_allele_insertion = alleleTrans.translate_allele_profile_to_vrs_allele(example_allele_insertion)
vrs_example_allele_insertion.as_dict()

{'_id': 'ga4gh:VA.J9BMdktHGGjE843oD0T_bwUV6WxojkCW',
 'type': 'Allele',
 'location': {'_id': 'ga4gh:VSL.TMxdXtmi4ctcTRipHMD6py1Nv1kLMyJd',
  'type': 'SequenceLocation',
  'sequence_id': 'ga4gh:SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO',
  'interval': {'type': 'SequenceInterval',
   'start': {'type': 'Number', 'value': 113901365},
   'end': {'type': 'Number', 'value': 113901365}}},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'ATA'}}
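
Both examples use LOINC code `LA30100-4` (0-based interval counting), so the coordinates name positions between bases. Example 1 spans one base (`end - start = 1`), while Example 2 has `start` equal to `end`, a zero-width interval that marks where `ATA` is inserted. The sketch below makes that distinction explicit; `describe_edit` is a hypothetical helper that looks only at lengths and is not part of the translator.

```python
def describe_edit(start, end, alt):
    """Classify an edit from a 0-based interval and its literal state.

    Hypothetical helper for illustration; it compares lengths only and
    never reads the reference sequence.
    """
    ref_len = end - start
    if ref_len < 0:
        raise ValueError("end must not be smaller than start")
    if ref_len == 0:
        return "insertion" if alt else "no change"
    if not alt:
        return "deletion"
    if ref_len == len(alt):
        # Same length as the replaced span; it could also equal the reference.
        return "substitution"
    return "deletion-insertion"


print(describe_edit(27453448, 27453449, "T"))       # substitution
print(describe_edit(113901365, 113901365, "ATA"))   # insertion
```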

### Conclusion

In this notebook, we demonstrated how to translate HL7 FHIR AlleleProfile resources into GA4GH VRS alleles. We began by using Python’s `**` unpacking syntax to generate a class instance directly from the complete JSON structure, and then converted the AlleleProfile object into a VRS Allele object using the `VrsFhirAlleleTranslation` class.

If you would like to learn more about building AlleleProfile resources step by step, we recommend reviewing the [AlleleProfile Demo Notebook](allele_profile_demo.ipynb).

We recognize that the HL7 FHIR MolecularDefinition schema is continuously evolving, which may affect the structure of AlleleProfile resources. As the schema changes, this code will also continue to evolve to maintain compatibility.
