### Demonstration Overview: Translating a VRS (version 1.3) object into an Allele Profile

This notebook demonstrates how to translate a GA4GH VRS (version 1.3) Allele into the HL7 FHIR AlleleProfile format. At the time of development, VRS 1.3 was the stable version of the model; the VRS community is actively working toward VRS 2.0 as the next stable release. For more details on VRS, refer to the official [VRS Documentation](https://vrs.ga4gh.org/en/1.3/).

In this notebook, we use example data to generate VRS objects with the `vrs-python` library. Once created, these objects can be converted into FHIR-compliant AlleleProfiles using the `vrs_allele_to_allele_profile` method from the `VrsFhirAlleleTranslation` class. This method enables a one-way transformation from VRS to FHIR AlleleProfiles.

### Prerequisites and Setup

To support the one-way transformation from VRS Allele to AlleleProfile in this demonstration, we set up the environment by importing the necessary libraries and modules. These include:

1. **External Package**:
   - `models` from `ga4gh.vrs`: Provides the foundational data models for working with the GA4GH Variation Representation Specification (VRS).

2. **Custom Project Modules**:
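   - `AlleleNormalizer` from `normalize.allele_normalizer`: A utility for normalizing a VRS Allele object.
   - `VrsFhirAlleleTranslation` from `moldeftranslator.allele_translator`: A translation component for converting VRS Alleles into AlleleProfiles.
   - `SeqRepoAPI` from `api.seqrepo_api`: Provides the SeqRepo data proxy used to derive refget accessions from RefSeq identifiers.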

In [12]:
# Import the VRS models, the project translator and normalizer, and the SeqRepo wrapper
from ga4gh.vrs import models
from ga4gh.vrs.models import (
    Allele,
    LiteralSequenceExpression,
    SequenceLocation,
    SequenceReference,
    sequenceString,
)
from moldeftranslator.allele_translator import VrsFhirAlleleTranslation
from normalize.allele_normalizer import AlleleNormalizer
from api.seqrepo_api import SeqRepoAPI

# Shared helpers reused by every example below
seqrepo_api = SeqRepoAPI()
normalize = AlleleNormalizer()
alleleTranslator = VrsFhirAlleleTranslation()

### Example 1

In [None]:
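# Template for building a VRS Allele. It assumes `refseq`, `start`, `end`, and
# `alt_seq` are already defined, as they are at the top of each example cell below.
refget_accession = seqrepo_api.seqrepo_dataproxy.derive_refget_accession(f"refseq:{refseq}")
seq_ref = SequenceReference(
    refgetAccession=refget_accession.split("refget:")[-1]
)

seq_location = SequenceLocation(
    sequenceReference=seq_ref,
    start=start,
    end=end,
)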
lit_seq_expr = LiteralSequenceExpression(
    sequence=sequenceString(alt_seq)
)
allele = Allele(
    location=seq_location,
    state=lit_seq_expr
)
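
To make the shape of the object assembled above easier to see, here is a minimal plain-dict sketch of the same structure. It does not use `vrs-python` and computes no ids or digests; the function name `build_allele_dict` is hypothetical and exists only for illustration. The field names are copied from the `model_dump` outputs shown in this notebook.

```python
def build_allele_dict(refget_accession: str, start: int, end: int, alt_seq: str) -> dict:
    """Assemble a plain-dict view of a VRS Allele (illustrative only, no ids or digests)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Mirror the notebook's split: drop an optional 'refget:' prefix if present.
    accession = refget_accession.split("refget:")[-1]
    return {
        "type": "Allele",
        "location": {
            "type": "SequenceLocation",
            "sequenceReference": {
                "type": "SequenceReference",
                "refgetAccession": accession,
            },
            "start": start,
            "end": end,
        },
        "state": {"type": "LiteralSequenceExpression", "sequence": alt_seq},
    }


# Values taken from Example 1 in this notebook.
example = build_allele_dict(
    "refget:SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO", 113901365, 113901365, "ATA"
)
print(example["location"]["sequenceReference"]["refgetAccession"])
```

The split on `"refget:"` leaves an accession without that prefix unchanged, so the helper accepts either form.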

In [13]:
# Creating the VRS object and then normalizing it.
# Example 1 - Insertion origin: "NC_000001.11:g.113901365_113901366insATA"
start = 113901365
end = 113901365
refseq = "NC_000001.11"
alt_seq = "ATA"

refget_accession = seqrepo_api.seqrepo_dataproxy.derive_refget_accession(f"refseq:{refseq}")
seq_ref = SequenceReference(
    refgetAccession=refget_accession.split("refget:")[-1]
)

seq_location = SequenceLocation(
    sequenceReference=seq_ref,
    start=start,
    end=end,
)
lit_seq_expr = LiteralSequenceExpression(
    sequence=sequenceString(alt_seq)
)
insertion_example = Allele(
    location=seq_location,
    state=lit_seq_expr
)
norm_insertion_example = normalize.post_normalize_allele(insertion_example)
norm_insertion_example.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.3edM6TTGAmx8DnPV-uzA6IYlAfatAP2s',
 'type': 'Allele',
 'digest': '3edM6TTGAmx8DnPV-uzA6IYlAfatAP2s',
 'location': {'id': 'ga4gh:SL.OUMCiUkn_AGlFuFCFTdfppig932_HV2k',
  'type': 'SequenceLocation',
  'digest': 'OUMCiUkn_AGlFuFCFTdfppig932_HV2k',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO'},
  'start': 113901365,
  'end': 113901365},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'ATA'}}
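
The `start` and `end` values in the examples are 0-based interbase coordinates, whereas the HGVS expressions in the comments use 1-based positions. The sketch below shows how the values in this notebook relate to those expressions. The helper names are hypothetical, and the functions cover only the two simple cases used here (an insertion between adjacent bases and a single-base substitution).

```python
from typing import Tuple


def insertion_to_interbase(left_flank: int, right_flank: int) -> Tuple[int, int]:
    """Map HGVS `g.<left>_<right>ins...` (1-based flanking bases) to 0-based (start, end)."""
    if left_flank < 1 or right_flank != left_flank + 1:
        raise ValueError("insertion flanks must be adjacent 1-based positions")
    # The insertion point sits between the two flanking bases: a zero-length interval.
    return left_flank, left_flank


def substitution_to_interbase(position: int) -> Tuple[int, int]:
    """Map HGVS `g.<pos>X>Y` (1-based position) to 0-based (start, end)."""
    if position < 1:
        raise ValueError("HGVS positions are 1-based")
    # A single base at 1-based position N spans the interbase interval [N-1, N].
    return position - 1, position


print(insertion_to_interbase(113901365, 113901366))   # Example 1
print(substitution_to_interbase(27453449))            # Example 2
```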

In [14]:
# Translating the normalized allele into an allele profile
allele_profile_insertion = alleleTranslator.vrs_allele_to_allele_profile(norm_insertion_example)

print(type(allele_profile_insertion))
allele_profile_insertion.model_dump()

<class 'profiles.alleleprofile.AlleleProfile'>


{'resourceType': 'MolecularDefinition',
 'contained': [{'resourceType': 'MolecularDefinition',
   'id': 'ref-to-nc000001',
   'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
      'code': 'dna',
      'display': 'DNA Sequence'}]},
   'representation': [{'code': [{'coding': [{'system': 'http://www.ncbi.nlm.nih.gov/refseq',
         'code': 'NC_000001.11'}]}]}]}],
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': '#ref-to-nc000001',
     'type': 'MolecularDefinition'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}]}},
     'startQuantity': {'value': 113901365.0},
     'endQuantity': {'value': 113901365.0}}}}],
 'representation': [{'focus': {'coding': [{'system': 'http://hl7.org/fhir
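
As a rough illustration of how the dumped profile can be read back, the sketch below resolves the local `#...` reference to the contained sequence and pulls out the coordinates. The `profile` dict is a trimmed copy of the visible part of the output above (the `moleculeType`, `coordinateSystem`, and truncated `representation` entries are omitted), and `summarize_location` is a hypothetical helper, not part of the translator.

```python
# Trimmed copy of the dumped AlleleProfile from Example 1 (illustrative only).
profile = {
    "resourceType": "MolecularDefinition",
    "contained": [{
        "resourceType": "MolecularDefinition",
        "id": "ref-to-nc000001",
        "representation": [{"code": [{"coding": [{
            "system": "http://www.ncbi.nlm.nih.gov/refseq",
            "code": "NC_000001.11",
        }]}]}],
    }],
    "location": [{"sequenceLocation": {
        "sequenceContext": {"reference": "#ref-to-nc000001", "type": "MolecularDefinition"},
        "coordinateInterval": {
            "startQuantity": {"value": 113901365.0},
            "endQuantity": {"value": 113901365.0},
        },
    }}],
}


def summarize_location(profile: dict) -> dict:
    """Return the reference accession and integer coordinates of the first location."""
    seq_loc = profile["location"][0]["sequenceLocation"]
    reference = seq_loc["sequenceContext"]["reference"]
    if not reference.startswith("#"):
        raise ValueError("expected a local reference to a contained resource")
    # A '#id' reference points at an entry in `contained` with that id.
    contained = next(r for r in profile["contained"] if r["id"] == reference[1:])
    accession = contained["representation"][0]["code"][0]["coding"][0]["code"]
    interval = seq_loc["coordinateInterval"]
    return {
        "accession": accession,
        "start": int(interval["startQuantity"]["value"]),
        "end": int(interval["endQuantity"]["value"]),
    }


print(summarize_location(profile))
```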

### Example 2

In [15]:
# Creating the VRS object and then normalizing it.
# Example 2 - Substitution origin: "NC_000002.12:g.27453449C>T"
start = 27453448
end = 27453449
refseq = "NC_000002.12"
alt_seq = "T"

refget_accession = seqrepo_api.seqrepo_dataproxy.derive_refget_accession(f"refseq:{refseq}")
seq_ref = SequenceReference(
    refgetAccession=refget_accession.split("refget:")[-1]
)

seq_location = SequenceLocation(
    sequenceReference=seq_ref,
    start=start,
    end=end,
)
lit_seq_expr = LiteralSequenceExpression(
    sequence=sequenceString(alt_seq)
)
sub_example = Allele(
    location=seq_location,
    state=lit_seq_expr
)
norm_sub_example = normalize.post_normalize_allele(sub_example)
norm_sub_example.model_dump(exclude_none=True)

{'id': 'ga4gh:VA.xfKU4c8mG_yegL5ZOL26JDiznySNkoMl',
 'type': 'Allele',
 'digest': 'xfKU4c8mG_yegL5ZOL26JDiznySNkoMl',
 'location': {'id': 'ga4gh:SL.y0ckc1_lhMYKnh0f6FAEoEpgHyfX13OW',
  'type': 'SequenceLocation',
  'digest': 'y0ckc1_lhMYKnh0f6FAEoEpgHyfX13OW',
  'sequenceReference': {'type': 'SequenceReference',
   'refgetAccession': 'SQ.pnAqCRBrTsUoBghSD1yp_jXWSmlbdh4g'},
  'start': 27453448,
  'end': 27453449},
 'state': {'type': 'LiteralSequenceExpression', 'sequence': 'T'}}
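
The two examples differ mainly in the length of the reference interval: the insertion has `start == end`, while the substitution spans one base. The sketch below turns that observation into a simple heuristic. `classify_allele` is a hypothetical helper that looks only at the interval length and the alternate sequence; it does not compare against the reference sequence, and it may not hold for alleles whose interval was expanded during normalization.

```python
def classify_allele(start: int, end: int, alt_seq: str) -> str:
    """Heuristically label an allele from its 0-based interval and alternate sequence."""
    ref_len = end - start
    if ref_len < 0:
        raise ValueError("end must not be smaller than start")
    if ref_len == 0:
        if not alt_seq:
            raise ValueError("empty interval with empty sequence describes no change")
        return "insertion"      # nothing replaced, bases added
    if not alt_seq:
        return "deletion"       # bases replaced by nothing
    if ref_len == 1 and len(alt_seq) == 1:
        return "substitution"   # one base replaced by one base
    return "delins"             # any other replacement


print(classify_allele(113901365, 113901365, "ATA"))  # Example 1
print(classify_allele(27453448, 27453449, "T"))      # Example 2
```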

In [16]:
# Translating the normalized allele into an allele profile
allele_profile_sub_example = alleleTranslator.vrs_allele_to_allele_profile(norm_sub_example)

print(type(allele_profile_sub_example))
allele_profile_sub_example.model_dump()

<class 'profiles.alleleprofile.AlleleProfile'>


{'resourceType': 'MolecularDefinition',
 'contained': [{'resourceType': 'MolecularDefinition',
   'id': 'ref-to-nc000002',
   'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
      'code': 'dna',
      'display': 'DNA Sequence'}]},
   'representation': [{'code': [{'coding': [{'system': 'http://www.ncbi.nlm.nih.gov/refseq',
         'code': 'NC_000002.12'}]}]}]}],
 'moleculeType': {'coding': [{'system': 'http://hl7.org/fhir/sequence-type',
    'code': 'dna',
    'display': 'DNA Sequence'}]},
 'location': [{'sequenceLocation': {'sequenceContext': {'reference': '#ref-to-nc000002',
     'type': 'MolecularDefinition'},
    'coordinateInterval': {'coordinateSystem': {'system': {'coding': [{'system': 'http://loinc.org',
         'code': 'LA30100-4',
         'display': '0-based interval counting'}]}},
     'startQuantity': {'value': 27453448.0},
     'endQuantity': {'value': 27453449.0}}}}],
 'representation': [{'focus': {'coding': [{'system': 'http://hl7.org/fhir/m

### Conclusion

In this notebook, we demonstrated the translation of GA4GH VRS alleles into HL7 FHIR AlleleProfiles. We began by creating VRS allele objects, normalizing them, and converting them into FHIR-compliant representations using the `VrsFhirAlleleTranslation` class. This process improves interoperability between the GA4GH and HL7 FHIR standards, making it easier to integrate genomic data into healthcare systems. The examples covered an insertion and a substitution variant.

We recognize that VRS is continuously evolving, with VRS 2.0 on the horizon. As the new version stabilizes, we plan to assess its impact and potentially refactor our implementation to align with the latest standard, ensuring continued compatibility and functionality.