### SPDI Module Overview

The `SPDI` module provides functionality for creating, validating, and representing SPDI expressions.


#### Features
- **Validation of SPDI Expressions**: SPDI expressions undergo validation to ensure adherence to the SPDI format rules.
- **Representation Conversion**: SPDI objects can be converted to strings or dictionaries, enhancing their usability and interoperability.

#### Dependencies
The SPDI module does not have external dependencies and operates solely on Python's built-in functionality.


In [1]:
import json

# Import SPDI class 
from src.spdi.spdi_class import SPDI

# SPDI translation method
from src.spdi.spdi_utils import SPDITranslate
spdi_translator = SPDITranslate()



#### Creating a SPDI Expression
* Create SPDI objects with validation steps that check all 4 attributes (sequence:position:deletion:insertion).
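
As a rough illustration of what such validation might involve, the sketch below checks four field strings against simple patterns. The function name `validate_spdi_fields` and all three regular expressions are assumptions made for this example, not the rules implemented in `src.spdi.spdi_class`. In particular, it only accepts an explicit nucleotide string for the deletion.

```python
import re

# Assumed patterns for illustration only; the real SPDI class may differ.
SEQUENCE_RE = re.compile(r"[A-Za-z0-9_.]+")  # accession-like identifier, e.g. NC_000001.11
POSITION_RE = re.compile(r"[0-9]+")          # non-negative integer written as a string
ALLELE_RE = re.compile(r"[ACGTN]*")          # nucleotide string; empty is allowed


def validate_spdi_fields(sequence, position, deletion, insertion):
    """Return a list of problems; an empty list means all four fields passed."""
    problems = []
    if not SEQUENCE_RE.fullmatch(sequence):
        problems.append(f"invalid sequence: {sequence!r}")
    if not POSITION_RE.fullmatch(position):
        problems.append(f"invalid position: {position!r}")
    if not ALLELE_RE.fullmatch(deletion):
        problems.append(f"invalid deletion: {deletion!r}")
    if not ALLELE_RE.fullmatch(insertion):
        problems.append(f"invalid insertion: {insertion!r}")
    return problems


# A well-formed example followed by one with a bad position and a bad insertion
print(validate_spdi_fields("NC_000001.11", "1014263", "CC", "C"))
print(validate_spdi_fields("NC_000001.11", "-5", "CC", "Z"))
```

Returning a list of problems rather than raising on the first one makes it easier to report every invalid field at once.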

In [2]:
spdi_example_data = [
    # Example of Deletion
    {'sequence': 'NC_000001.11', 'position': '1014263', 'deletion': 'CC', 'insertion': 'C'},
    # Example of Insertion
    {'sequence': 'NC_000001.11', 'position': '113901365', 'deletion': '', 'insertion': 'ATA'},
    # Example of Duplication
    {'sequence': 'NC_000001.11', 'position': '5880117', 'deletion': 'TGAGCTTCCA', 'insertion': 'TGAGCTTCCATGAGCTTCCA'}
    ]

# Build one SPDI object per example dictionary
print("SPDI objects created:")
spdi_objects = [SPDI(**spdi) for spdi in spdi_example_data]
spdi_objects

SPDI objects created:


[<src.spdi.spdi_class.SPDI at 0x11027b7d0>,
 <src.spdi.spdi_class.SPDI at 0x105c40690>,
 <src.spdi.spdi_class.SPDI at 0x110264310>]

#### Converting a SPDI Object to a String or Dictionary

In [3]:
# Converting the SPDI object to a string using SPDI class method: to_string()
# The string format is: sequence:position:deletion:insertion
print("SPDI Objects to String:")
for spdi_object in spdi_objects:
    print(spdi_object.to_string())


SPDI Objects to String:
NC_000001.11:1014263:CC:C
NC_000001.11:113901365::ATA
NC_000001.11:5880117:TGAGCTTCCA:TGAGCTTCCATGAGCTTCCA


In [4]:
# Taking a SPDI object and converting it to a SPDI dictionary
print('SPDI Object to Dictionary:')
for spdi_object in spdi_objects:
    print(spdi_object.to_dict())

SPDI Object to Dictionary:
{'sequence': 'NC_000001.11', 'position': '1014263', 'deletion': 'CC', 'insertion': 'C'}
{'sequence': 'NC_000001.11', 'position': '113901365', 'deletion': '', 'insertion': 'ATA'}
{'sequence': 'NC_000001.11', 'position': '5880117', 'deletion': 'TGAGCTTCCA', 'insertion': 'TGAGCTTCCATGAGCTTCCA'}
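
Going the other way, from a string back to the four fields, is mostly a matter of splitting on the colons. The helpers `parse_spdi_string` and `format_spdi_fields` below are hypothetical and not part of the SPDI module. They only mirror the `sequence:position:deletion:insertion` layout shown above.

```python
SPDI_KEYS = ("sequence", "position", "deletion", "insertion")


def parse_spdi_string(expression):
    """Split 'sequence:position:deletion:insertion' into a dict of four strings."""
    parts = expression.split(":")
    if len(parts) != len(SPDI_KEYS):
        raise ValueError(
            f"expected {len(SPDI_KEYS)} colon-separated fields, "
            f"got {len(parts)}: {expression!r}"
        )
    return dict(zip(SPDI_KEYS, parts))


def format_spdi_fields(fields):
    """Join the four fields back into a colon-delimited SPDI string."""
    return ":".join(fields[key] for key in SPDI_KEYS)


# An empty deletion survives the round trip as an empty string
fields = parse_spdi_string("NC_000001.11:113901365::ATA")
print(fields)
print(format_spdi_fields(fields))
```

Under these assumptions, the resulting dictionary could be passed to the constructor as `SPDI(**fields)`, in the same way as the example data earlier.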


### SPDITranslate Module Overview

The `SPDITranslate` module translates SPDI expressions to HGVS and VRS formats. It relies on external APIs for translation.

#### Features

- **Translation to Right-Shifted HGVS**: Converts SPDI expressions to right-shifted HGVS using the NCBI Variation Services API.
- **Translation to VRS**: Translates SPDI expressions to VRS using the VRS Python translator module.

#### Dependencies
- **External APIs**:
  - Biocommons SeqRepo API
  - NCBI Variation Services API


In [5]:
# Convert each SPDI object to a right-shifted HGVS expression
for spdi in spdi_objects:
    print(f'SPDI Expression: {spdi.to_string()}') 
    print(f'Translated to HGVS: {spdi_translator.from_spdi_to_rightshift_hgvs(spdi)}\n')


SPDI Expression: NC_000001.11:1014263:CC:C
Translated to HGVS: NC_000001.11:g.1014265del

SPDI Expression: NC_000001.11:113901365::ATA
Translated to HGVS: NC_000001.11:g.113901365_113901366insATA

SPDI Expression: NC_000001.11:5880117:TGAGCTTCCA:TGAGCTTCCATGAGCTTCCA
Translated to HGVS: NC_000001.11:g.5880118_5880127dup



In [6]:
# Convert each SPDI object to a VRS Allele and pretty-print it as JSON
for spdi in spdi_objects:
    print(f'SPDI Expression: {spdi.to_string()}')
    print(f'Translated to VRS:\n{json.dumps(spdi_translator.from_spdi_to_vrs(spdi).as_dict(), indent=2)}\n')

SPDI Expression: NC_000001.11:1014263:CC:C
Translated to VRS:
{
  "_id": "ga4gh:VA.BmF3zr2l6XLpLaK8GInM6Q3Emc3JyPD3",
  "type": "Allele",
  "location": {
    "_id": "ga4gh:VSL.i6Of9s2jVDuJ4vwU6sCeG-jT7ygmlfx6",
    "type": "SequenceLocation",
    "sequence_id": "ga4gh:SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Number",
        "value": 1014263
      },
      "end": {
        "type": "Number",
        "value": 1014265
      }
    }
  },
  "state": {
    "type": "LiteralSequenceExpression",
    "sequence": "C"
  }
}

SPDI Expression: NC_000001.11:113901365::ATA
Translated to VRS:
{
  "_id": "ga4gh:VA.J9BMdktHGGjE843oD0T_bwUV6WxojkCW",
  "type": "Allele",
  "location": {
    "_id": "ga4gh:VSL.TMxdXtmi4ctcTRipHMD6py1Nv1kLMyJd",
    "type": "SequenceLocation",
    "sequence_id": "ga4gh:SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
    "interval": {
      "type": "SequenceInterval",
      "start": {
        "type": "Num

### CVCTranslator Module Overview

The `CVCTranslator` module offers functionality for translating variations from HGVS, SPDI, or VRS formats into a standardized representation known as `CoreVariantClass`.

#### Features
- **SPDI to CoreVariantClass Translation**: Translates SPDI expressions into CoreVariantClass objects.

- **HGVS to CoreVariantClass Translation**: Translates HGVS expressions into CoreVariantClass objects.

- **VRS to CoreVariantClass Translation**: Translates VRS expressions into CoreVariantClass objects.

#### Dependencies
- **External APIs**:
  - Biocommons SeqRepo API
  - NCBI Variation Services API

- **Python Packages**:
  - bioutils.normalize
  - hgvs

In [7]:
from src.variant_to_cvc_translate import CVCTranslator
cvc_translator = CVCTranslator()

In [8]:
for spdi in spdi_objects: 
    print(f'SPDI Expression: {spdi.to_string()}') 
    print(f'Translated to CVC:\n{cvc_translator.spdi_to_cvc(spdi.to_string())}\n')

SPDI Expression: NC_000001.11:1014263:CC:C
Translated to CVC:
CoreVariantClass(0-based interbase,DNA,CC,C,1014263,1014265,None,None,None,None,None,NC_000001.11,{})

SPDI Expression: NC_000001.11:113901365::ATA
Translated to CVC:
CoreVariantClass(0-based interbase,DNA,,ATA,113901365,113901365,None,None,None,None,None,NC_000001.11,{})

SPDI Expression: NC_000001.11:5880117:TGAGCTTCCA:TGAGCTTCCATGAGCTTCCA
Translated to CVC:
CoreVariantClass(0-based interbase,DNA,TGAGCTTCCA,TGAGCTTCCATGAGCTTCCA,5880117,5880127,None,None,None,None,None,NC_000001.11,{})
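
The start and end values in the CoreVariantClass output above are consistent with 0-based interbase coordinates, where the interval begins at the SPDI position and spans the length of the deleted sequence. A minimal sketch of that arithmetic, using a hypothetical helper `interbase_interval` that is not part of `CVCTranslator`:

```python
def interbase_interval(position, deletion):
    """Return (start, end) in 0-based interbase coordinates for a SPDI expression.

    Hypothetical helper for illustration: start is the SPDI position and
    end is start plus the number of deleted bases.
    """
    start = int(position)
    end = start + len(deletion)
    return start, end


# The three examples used throughout this notebook
examples = [
    ("1014263", "CC"),
    ("113901365", ""),
    ("5880117", "TGAGCTTCCA"),
]
for position, deletion in examples:
    print(interbase_interval(position, deletion))
```

A pure insertion has an empty deletion, so its start and end coincide, which matches the `113901365,113901365` pair in the second result.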



#### TODO: Voca Normalize SPDI Expression. Need to find examples; may also need to edit this and include it in the CVC translator module.


In [None]:
from src.spdi.spdi_normalize import VocaNormalizeSpdi
vnormspdi = VocaNormalizeSpdi()

In [None]:
# Example of a SPDI object and a SPDI string 

normalize_example = [
    
    {'sequence': 'NC_000023.11', 'position': '32386322', 'deletion': 'T', 'insertion': 'GA'},
    {'sequence': 'NC_000019.10', 'position': '44908821', 'deletion': 'C', 'insertion': 'T'},
    # Must have at least two distinct nucleotides in the deletion and insertion
    # {'sequence': 'NC_000013.11', 'position': '32936731', 'deletion': 'C', 'insertion': 'C'},
    {'sequence': 'NC_000013.11', 'position': '19993837', 'deletion': 'GT', 'insertion': 'GTGT'}
]

# Build the same examples as SPDI objects and as colon-delimited strings
example_spdi_obj = [SPDI(**example) for example in normalize_example]
example_spdi_string = [':'.join(example.values()) for example in normalize_example]


In [None]:
#SPDI Object
print("SPDI Object Example:")
for example in example_spdi_obj:
    print(vnormspdi.spdi_voca_normalize(example))


In [None]:
#SPDI String
print("SPDI String Example:")
for example in example_spdi_string:
    print(vnormspdi.spdi_voca_normalize(example))