### Demonstration Overview: Translating a VRS (version 2.0) object into an Allele Profile

This notebook demonstrates how to translate a GA4GH VRS (version 2.0) Allele into the HL7 FHIR Allele profile.

In this notebook, we use example data to build VRS objects with the `vrs-python` library. Once created, each object is converted into a FHIR-compliant Allele with the `translate_allele_to_fhir` method of the `VrsFhirAlleleTranslator` class. The translation is one-way, from VRS to FHIR Allele.

### Prerequisites and Setup

To support the one-way transformation from VRS Allele to FHIR Allele in this demonstration, we set up the environment by importing the necessary libraries and modules. These include:

1. **External Package**:
   - `models` from `ga4gh.vrs`: Provides foundational data models for working with the GA4GH Variation Representation Specification (VRS).
   - `create_dataproxy` from `ga4gh.vrs.dataproxy`: Creates the sequence data proxy that is passed to both the normalizer and the translator.

2. **Custom Project Modules**:
   - `VariantNormalizer` from `vrs_tools.normalizer`: A utility for normalizing a VRS Allele object.
   - `VrsFhirAlleleTranslator` from `translators.vrs_fhir_translator`: A translation component for converting VRS Alleles into FHIR Allele.

In [1]:
# Import the VRS models, the data proxy factory, and the project modules
from ga4gh.vrs.dataproxy import create_dataproxy
from ga4gh.vrs.models import (
    Allele, LiteralSequenceExpression, SequenceLocation, SequenceReference, sequenceString,
)
from translators.vrs_fhir_translator import VrsFhirAlleleTranslator
from vrs_tools.normalizer import VariantNormalizer


In [2]:
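# Sequence data proxy backed by a local SeqRepo snapshot; adjust the path to your installation.
dp = create_dataproxy(uri="seqrepo+file:///usr/local/share/seqrepo/2024-12-20")

# The translator and the normalizer share the same data proxy.
allele_translator = VrsFhirAlleleTranslator(dp=dp)
norm = VariantNormalizer(dp=dp)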

### Example 1: Creating, Normalizing, and Translating a VRS Allele

In this example, we first generate a VRS allele object and normalize it. Next, we translate the normalized allele into a FHIR-compliant Allele.
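
The cells in this notebook hard-code `start` and `end` as 0-based interbase coordinates, while the HGVS expressions in the comments use 1-based positions. The sketch below illustrates that conversion; the helper names `hgvs_span_to_interbase` and `hgvs_insertion_to_interbase` are hypothetical and are not part of `vrs-python` or of this project.

```python
def hgvs_span_to_interbase(first: int, last: int) -> tuple[int, int]:
    """Convert a 1-based, inclusive HGVS span to 0-based interbase (start, end)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return first - 1, last


def hgvs_insertion_to_interbase(left_flank: int, right_flank: int) -> tuple[int, int]:
    """Convert an HGVS insertion between two adjacent bases to a zero-width span."""
    if right_flank != left_flank + 1:
        raise ValueError("insertion flanks must be adjacent positions")
    return left_flank, left_flank


# g.27453449C>T (Example 3): exactly one base is replaced.
substitution = hgvs_span_to_interbase(27453449, 27453449)
# g.113901365_113901366insATA (Example 2): no reference base is replaced.
insertion = hgvs_insertion_to_interbase(113901365, 113901366)
# g.1014265del (Example 1): minimal span of the single deleted base.
deletion = hgvs_span_to_interbase(1014265, 1014265)

print(substitution, insertion, deletion)
```

The substitution and insertion results match the values used in Examples 3 and 2. For the deletion, the cell below supplies a wider span (1014263 to 1014265) together with the base that remains, instead of the minimal span with an empty replacement.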

In [3]:
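# Create the VRS Allele, then normalize it.
# Example 1 - Deletion, from the HGVS expression "NC_000001.11:g.1014265del"
# start/end are 0-based interbase coordinates; the span covers two reference
# bases and alt_seq is the single base left after the deletion.
start = 1014263
end = 1014265
refseq = "NC_000001.11"
alt_seq = "C"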


refget_accession = dp.derive_refget_accession(f"refseq:{refseq}")
seq_ref = SequenceReference(
    refgetAccession=refget_accession.split("refget:")[-1]
)

seq_location = SequenceLocation(
    sequenceReference=seq_ref,
    start=start,
    end=end,
)

lit_seq_expr = LiteralSequenceExpression(
    sequence=sequenceString(alt_seq)
)

del_example_1 = Allele(
    location=seq_location,
    state=lit_seq_expr
)

norm_del_example_1 = norm.normalize(del_example_1)
norm_del_example_1.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.LFsYSeoQjboSTao-ChLlORiHwqUgh_Q1',
 'type': 'Allele',
 'digest': 'LFsYSeoQjboSTao-ChLlORiHwqUgh_Q1',
 'location': {'id': 'ga4gh:SL.avvnxuqix2Teyyqc1jEwbb8-cE2FLIv9',
  'type': 'SequenceLocation',
  'digest': 'avvnxuqix2Teyyqc1jEwbb8-cE2FLIv9',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'},
  'start': 1014263,
  'end': 1014265},
 'state': {'type': 'ReferenceLengthExpression',
  'length': 1,
  'sequence': 'C',
  'repeatSubunitLength': 1}}
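
The normalized state above is a `ReferenceLengthExpression` rather than the literal sequence that was passed in. As a rough illustration of how `length` and `repeatSubunitLength` relate to the literal `sequence`, the sketch below cycles the repeat unit taken from the reference span until `length` bases are produced. The function name `expand_rle` is hypothetical, and the reference bases `"CC"` are an assumption inferred from the two-base span and the one-base repeat unit in the output, not read from SeqRepo.

```python
def expand_rle(reference_span: str, length: int, repeat_subunit_length: int) -> str:
    """Expand a reference-length expression into the literal sequence it implies.

    The first `repeat_subunit_length` bases of the reference span are treated
    as the repeat unit and cycled until `length` bases have been produced.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if not 0 < repeat_subunit_length <= len(reference_span):
        raise ValueError("repeat unit must fit inside the reference span")
    unit = reference_span[:repeat_subunit_length]
    return "".join(unit[i % len(unit)] for i in range(length))


# Example 1: assumed two-base reference span "CC", length 1, repeat unit of 1 base.
print(expand_rle("CC", length=1, repeat_subunit_length=1))  # C
# A made-up trinucleotide repeat, to show a partial final unit.
print(expand_rle("CAGCAG", length=4, repeat_subunit_length=3))  # CAGC
```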

In [4]:
# Translating the normalized allele into an allele profile
allele_profile_del_example = allele_translator.translate_allele_to_fhir(norm_del_example_1)

print(type(allele_profile_del_example))
allele_profile_del_example.model_dump()

<class 'profiles.allele.Allele'>


{'resourceType': 'MolecularDefinition',
 'contained': [{'resourceType': 'MolecularDefinition',
   'id': 'ref-to-nc000001',
   'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
      'code': 'dna',
      'display': 'DNA Sequence'}]},
   'representation': [{'code': [{'coding': [{'system': 'http://www.ncbi.nlm.nih.gov/refseq',
         'code': 'NC_000001.11'}]}]}]}],
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': '#ref-to-nc000001',
     'type': 'MolecularDefinition'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}]},
      'origin': {'coding': [{'system': 'http://hl7.org/fhir/uv/molecular-definition-data-types/CodeSystem/coordinate-origin',
         'code': 'sequence-start',
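
In the output above, the location does not name the reference sequence directly: `sequenceContext.reference` holds a local reference (`#ref-to-nc000001`) that points at an entry in `contained`. The sketch below shows one way to resolve that pointer on the plain dictionary returned by `model_dump()`. The `profile` literal is a hand-copied excerpt of the fields visible above, closed off where the notebook output is cut, and `resolve_sequence_context` is a hypothetical helper, not part of the translator.

```python
# Excerpt of the model_dump() output; fields not visible in the notebook are omitted.
profile = {
    "resourceType": "MolecularDefinition",
    "contained": [
        {
            "resourceType": "MolecularDefinition",
            "id": "ref-to-nc000001",
            "representation": [
                {"code": [{"coding": [{
                    "system": "http://www.ncbi.nlm.nih.gov/refseq",
                    "code": "NC_000001.11",
                }]}]}
            ],
        }
    ],
    "location": [
        {"sequenceLocation": {"sequenceContext": {
            "reference": "#ref-to-nc000001",
            "type": "MolecularDefinition",
        }}}
    ],
}


def resolve_sequence_context(resource: dict) -> str:
    """Return the RefSeq accession that the first location refers to."""
    context = resource["location"][0]["sequenceLocation"]["sequenceContext"]
    # A leading '#' marks a reference to a resource in `contained`.
    local_id = context["reference"].removeprefix("#")
    for contained in resource.get("contained", []):
        if contained.get("id") == local_id:
            return contained["representation"][0]["code"][0]["coding"][0]["code"]
    raise KeyError(f"no contained resource with id {local_id!r}")


print(resolve_sequence_context(profile))  # NC_000001.11
```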
       

### Example 2: Insertion

In [5]:
# Create the VRS Allele, then normalize it.
# Example 2 - Insertion, from the HGVS expression "NC_000001.11:g.113901365_113901366insATA"
# start == end marks the zero-width position between the two flanking bases.
start = 113901365
end = 113901365
refseq = "NC_000001.11"
alt_seq = "ATA"


refget_accession = dp.derive_refget_accession(f"refseq:{refseq}")
seq_ref = SequenceReference(
    refgetAccession=refget_accession.split("refget:")[-1]
)

seq_location = SequenceLocation(
    sequenceReference=seq_ref,
    start=start,
    end=end,
)
lit_seq_expr = LiteralSequenceExpression(
    sequence=sequenceString(alt_seq)
)
insertion_example = Allele(
    location=seq_location,
    state=lit_seq_expr
)

norm_insertion_example = norm.normalize(insertion_example)
norm_insertion_example.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.3edM6TTGAmx8DnPV-uzA6IYlAfatAP2s',
 'type': 'Allele',
 'digest': '3edM6TTGAmx8DnPV-uzA6IYlAfatAP2s',
 'location': {'id': 'ga4gh:SL.OUMCiUkn_AGlFuFCFTdfppig932_HV2k',
  'type': 'SequenceLocation',
  'digest': 'OUMCiUkn_AGlFuFCFTdfppig932_HV2k',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'},
  'start': 113901365,
  'end': 113901365},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'ATA'}}
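
Each normalized object carries both an `id` and a `digest`, and the `id` is the digest behind a type prefix (`VA` for the Allele, `SL` for the SequenceLocation). The sketch below checks that relationship on the values printed above; `split_ga4gh_id` is a hypothetical helper written for this illustration.

```python
def split_ga4gh_id(identifier: str) -> tuple[str, str]:
    """Split 'ga4gh:<prefix>.<digest>' into (prefix, digest)."""
    namespace, _, rest = identifier.partition(":")
    prefix, _, digest = rest.partition(".")
    if namespace != "ga4gh" or not prefix or not digest:
        raise ValueError(f"not a GA4GH computed identifier: {identifier!r}")
    return prefix, digest


# Identifier fields copied from the normalized insertion output above.
normalized = {
    "id": "ga4gh:VA.3edM6TTGAmx8DnPV-uzA6IYlAfatAP2s",
    "digest": "3edM6TTGAmx8DnPV-uzA6IYlAfatAP2s",
    "location": {
        "id": "ga4gh:SL.OUMCiUkn_AGlFuFCFTdfppig932_HV2k",
        "digest": "OUMCiUkn_AGlFuFCFTdfppig932_HV2k",
    },
}

allele_prefix, allele_digest = split_ga4gh_id(normalized["id"])
location_prefix, location_digest = split_ga4gh_id(normalized["location"]["id"])

print(allele_prefix, allele_digest == normalized["digest"])
print(location_prefix, location_digest == normalized["location"]["digest"])
```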

In [6]:
# Translating the normalized allele into an allele profile
allele_profile_insertion = allele_translator.translate_allele_to_fhir(norm_insertion_example)

print(type(allele_profile_insertion))
allele_profile_insertion.model_dump()

<class 'profiles.allele.Allele'>


{'resourceType': 'MolecularDefinition',
 'contained': [{'resourceType': 'MolecularDefinition',
   'id': 'ref-to-nc000001',
   'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
      'code': 'dna',
      'display': 'DNA Sequence'}]},
   'representation': [{'code': [{'coding': [{'system': 'http://www.ncbi.nlm.nih.gov/refseq',
         'code': 'NC_000001.11'}]}]}]}],
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': '#ref-to-nc000001',
     'type': 'MolecularDefinition'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}]},
      'origin': {'coding': [{'system': 'http://hl7.org/fhir/uv/molecular-definition-data-types/CodeSystem/coordinate-origin',
         'code': 'sequence-start',
       

### Example 3: Substitution

In [7]:
# Create the VRS Allele, then normalize it.
# Example 3 - Substitution, from the HGVS expression "NC_000002.12:g.27453449C>T"
# The 1-based HGVS position 27453449 maps to the interbase span 27453448-27453449.
start = 27453448
end = 27453449
refseq = "NC_000002.12"
alt_seq = "T"


refget_accession = dp.derive_refget_accession(f"refseq:{refseq}")
seq_ref = SequenceReference(
    refgetAccession=refget_accession.split("refget:")[-1]
)

seq_location = SequenceLocation(
    sequenceReference=seq_ref,
    start=start,
    end=end,
)
lit_seq_expr = LiteralSequenceExpression(
    sequence=sequenceString(alt_seq)
)
sub_example = Allele(
    location=seq_location,
    state=lit_seq_expr
)

norm_sub_example = norm.normalize(sub_example)
norm_sub_example.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.xfKU4c8mG_yegL5ZOL26JDiznySNkoMl',
 'type': 'Allele',
 'digest': 'xfKU4c8mG_yegL5ZOL26JDiznySNkoMl',
 'location': {'id': 'ga4gh:SL.y0ckc1_lhMYKnh0f6FAEoEpgHyfX13OW',
  'type': 'SequenceLocation',
  'digest': 'y0ckc1_lhMYKnh0f6FAEoEpgHyfX13OW',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.pnAqCRBrTsUoBghSD1yp_jXWSmlbdh4g'},
  'start': 27453448,
  'end': 27453449},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'T'}}
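
The three normalized outputs differ mainly in how the length of the location span compares with the length of the state. The sketch below uses that comparison as a rough heuristic to label each example; `describe_change` is a hypothetical helper, and because it never looks at the reference bases it cannot tell a true substitution from a state that equals the reference.

```python
def describe_change(start: int, end: int, state: dict) -> str:
    """Label an allele by comparing the span length with the state length."""
    ref_len = end - start
    if state["type"] == "ReferenceLengthExpression":
        alt_len = state["length"]
    else:
        alt_len = len(state["sequence"])
    if alt_len > ref_len:
        return "insertion"
    if alt_len < ref_len:
        return "deletion"
    return "substitution"


# Spans and states copied from the three normalized outputs in this notebook.
examples = {
    "Example 1": (1014263, 1014265, {
        "type": "ReferenceLengthExpression",
        "length": 1,
        "sequence": "C",
        "repeatSubunitLength": 1,
    }),
    "Example 2": (113901365, 113901365, {
        "type": "LiteralSequenceExpression",
        "sequence": "ATA",
    }),
    "Example 3": (27453448, 27453449, {
        "type": "LiteralSequenceExpression",
        "sequence": "T",
    }),
}

labels = {name: describe_change(*args) for name, args in examples.items()}
for name, label in labels.items():
    print(name, label)
```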

In [8]:
# Translating the normalized allele into an allele profile
allele_profile_sub_example = allele_translator.translate_allele_to_fhir(norm_sub_example)

print(type(allele_profile_sub_example))
allele_profile_sub_example.model_dump()

<class 'profiles.allele.Allele'>


{'resourceType': 'MolecularDefinition',
 'contained': [{'resourceType': 'MolecularDefinition',
   'id': 'ref-to-nc000002',
   'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
      'code': 'dna',
      'display': 'DNA Sequence'}]},
   'representation': [{'code': [{'coding': [{'system': 'http://www.ncbi.nlm.nih.gov/refseq',
         'code': 'NC_000002.12'}]}]}]}],
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': '#ref-to-nc000002',
     'type': 'MolecularDefinition'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}]},
      'origin': {'coding': [{'system': 'http://hl7.org/fhir/uv/molecular-definition-data-types/CodeSystem/coordinate-origin',
         'code': 'sequence-start',
       

### Conclusion

In this notebook, we demonstrated the translation of GA4GH VRS Alleles into HL7 FHIR Allele. We created VRS Allele objects, normalized them, and converted them into FHIR-compliant representations using the `VrsFhirAlleleTranslator` class. The examples covered deletion, insertion, and substitution variants. This process improves interoperability between the GA4GH and HL7 FHIR standards and supports the integration of genomic data into healthcare systems.

We recognize that VRS continues to evolve. This notebook targets VRS 2.0; as later revisions of the standard stabilize, we plan to assess their impact and, where needed, refactor our implementation to remain compatible.