# Transcript Accession Analysis
This notebook walks through an example BCR::ABL1 fusion that is described in the literature using older transcript accession versions, and shows how FUSOR can be used to represent it.

In [1]:
from os import environ

# These are the configurations for the UTA and SeqRepo databases. These should
# be adjusted by the user based on the locations where these databases exist.
environ["UTA_DB_URL"] = "postgresql://anonymous@localhost:5432/uta/uta_20241220"
environ["SEQREPO_ROOT_DIR"] = "/usr/local/share/seqrepo/2024-12-20"

In [2]:
from fusor.fusor import FUSOR

fusor = FUSOR()

***Using Gene Database Endpoint: http://localhost:8000***


Suppose our example fusion event is reported in an article from 2018 using the following syntax:
**NM_004327.3:c.3000::NM_005157.5:c.2000**.

These transcript accessions have since been updated to reflect newer versions.

Assume these coordinates are described with respect to the CDS start site, and that these are **residue** coordinates. We would like to describe this fusion using genomic coordinates from the **GRCh38** build.
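
As a rough illustration of how the reported string breaks down, the sketch below splits it into its two junctions with plain Python. The `Junction` type, the `parse_fusion` helper, and the regular expression are hypothetical names introduced here for illustration; they are not part of FUSOR, and the pattern only covers simple `accession.version:c.position` junctions like the ones in this example.

```python
import re
from typing import NamedTuple


class Junction(NamedTuple):
    """One side of a reported fusion junction (hypothetical helper type)."""

    accession: str  # versionless accession, e.g. "NM_004327"
    version: int  # accession version, e.g. 3
    position: int  # c. position, counted from the CDS start


# Matches simple junctions such as "NM_004327.3:c.3000" (assumed format)
JUNCTION_RE = re.compile(r"^(?P<ac>[A-Z]{2}_\d+)\.(?P<version>\d+):c\.(?P<pos>\d+)$")


def parse_fusion(description: str) -> tuple[Junction, Junction]:
    """Split a 'five_prime::three_prime' string into its two junctions."""
    parts = description.split("::")
    if len(parts) != 2:
        raise ValueError(f"Expected exactly one '::' separator: {description!r}")
    junctions = []
    for part in parts:
        match = JUNCTION_RE.match(part)
        if match is None:
            raise ValueError(f"Unrecognized junction: {part!r}")
        junctions.append(
            Junction(match["ac"], int(match["version"]), int(match["pos"]))
        )
    return junctions[0], junctions[1]


five_prime, three_prime = parse_fusion("NM_004327.3:c.3000::NM_005157.5:c.2000")
print(five_prime)
print(three_prime)
```

Keeping the version as a separate field makes it easy to see which accession version the article used, which matters here because the versions have since changed.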

## A. Using the `quick_start` module
Run the cell below to quickly generate the `AssayedFusion` object using the `quick_start` module:

In [3]:
from fusor.quick_start import FUSORQuickStart

fqs = FUSORQuickStart(
    fusor=fusor,
    five_prime_junction="NM_004327.3:c.3000",
    three_prime_junction="NM_005157.5:c.2000",
    five_prime_gene="BCR",
    three_prime_gene="ABL1",
    cds_start_site=True,
)
fusion = await fqs.quick_start()
fusion.model_dump(exclude_none=True)

{'structure': [{'type': <FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT: 'TranscriptSegmentElement'>,
   'transcript': 'refseq:NM_004327.3',
   'transcriptStatus': <TranscriptPriority.LONGEST_COMPATIBLE_REMAINING: 'longest_compatible_remaining'>,
   'strand': <Strand.POSITIVE: 1>,
   'exonEnd': 16,
   'exonEndOffset': -12,
   'gene': {'conceptType': 'Gene',
    'name': 'BCR',
    'primaryCoding': {'id': 'hgnc:1014',
     'system': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/',
     'code': 'HGNC:1014'}},
   'elementGenomicEnd': {'id': 'ga4gh:SL.tQkBo8Xr_nDIElZFWqsy44yXXLxjW5Rd',
    'type': 'SequenceLocation',
    'extensions': [{'name': 'is_exonic', 'value': True}],
    'digest': 'tQkBo8Xr_nDIElZFWqsy44yXXLxjW5Rd',
    'sequenceReference': {'id': 'refseq:NC_000022.11',
     'type': 'SequenceReference',
     'refgetAccession': 'SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ'},
    'end': 23295143}},
  {'type': <FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT: 'TranscriptSegmentElement'>,
   'trans

## B. Using FUSOR/Cool-Seq-Tool modules
This section shows how the same object can be generated from individual calls to FUSOR and Cool-Seq-Tool whose results are then combined.

## Representing the 5' partner
The following cells generate a `TranscriptSegmentElement` for the 5' partner.

In [4]:
cds_range = await fusor.cool_seq_tool.uta_db.get_cds_start_end(tx_ac="NM_004327.3")
cds_start = cds_range[0]
f"The CDS start is: {cds_start}"

'The CDS start is: 596'

In [5]:
from cool_seq_tool.schemas import Assembly

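# c.3000 is counted from the CDS start, so shift it by cds_start to get the
# transcript position, expressed as a (start, end) pair one base wide.
tx_data = await fusor.cool_seq_tool.uta_db.get_genomic_tx_data(
    tx_ac="NM_004327.3",
    pos=(cds_start + 3000 - 1, cds_start + 3000),
    target_genome_assembly=Assembly.GRCH38,
)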
tx_data.model_dump(exclude_none=True)

{'gene': 'BCR',
 'strand': <Strand.POSITIVE: 1>,
 'tx_pos_range': (3476, 3608),
 'alt_pos_range': (23295023, 23295155),
 'alt_aln_method': 'splign',
 'tx_exon_id': 956565,
 'alt_exon_id': 6619783,
 'tx_ac': 'NM_004327.3',
 'alt_ac': 'NC_000022.11',
 'coding_start_site': 0,
 'coding_end_site': 0,
 'alt_pos_change_range': (23295142, 23295143),
 'pos_change': (119, 12)}
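
The numbers in this output can be reproduced by hand. The sketch below is a simplified illustration, not the Cool-Seq-Tool implementation: it assumes the CDS start is a 0-based offset, that the transcript is on the positive strand, and that the exon aligns to the genome without gaps. The function names are hypothetical.

```python
def c_pos_to_tx_range(cds_start: int, c_pos: int) -> tuple[int, int]:
    """Convert a 1-based c. position into a one-base-wide transcript range."""
    return cds_start + c_pos - 1, cds_start + c_pos


def map_to_genome_positive_strand(
    tx_range: tuple[int, int],
    tx_exon_range: tuple[int, int],
    alt_exon_range: tuple[int, int],
) -> tuple[tuple[int, int], tuple[int, int]]:
    """Map a transcript range to the genome within one ungapped exon.

    Returns the genomic range and the distances from the exon start and end.
    """
    start_offset = tx_range[0] - tx_exon_range[0]
    end_offset = tx_exon_range[1] - tx_range[1]
    genomic = (alt_exon_range[0] + start_offset, alt_exon_range[1] - end_offset)
    return genomic, (start_offset, end_offset)


# Values taken from the outputs above for NM_004327.3
tx_range = c_pos_to_tx_range(cds_start=596, c_pos=3000)
genomic, offsets = map_to_genome_positive_strand(
    tx_range,
    tx_exon_range=(3476, 3608),  # tx_pos_range
    alt_exon_range=(23295023, 23295155),  # alt_pos_range
)
print(tx_range)  # (3595, 3596)
print(offsets)  # (119, 12)
print(genomic)  # (23295142, 23295143)
```

The computed offsets and genomic range match the `pos_change` and `alt_pos_change_range` fields shown above.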

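We can now use this output to create a `TranscriptSegmentElement` with FUSOR. The genomic accession comes from `alt_ac`, and the segment end is the second value of `alt_pos_change_range`: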

In [6]:
from cool_seq_tool.schemas import CoordinateType

five_prime_transcript = await fusor.transcript_segment_element(
    tx_to_genomic_coords=False,
    transcript="NM_004327.3",
    genomic_ac="NC_000022.11",
    seg_end_genomic=23295143,
    coordinate_type=CoordinateType.RESIDUE
)
five_prime_transcript = five_prime_transcript[0]
five_prime_transcript.model_dump(exclude_none=True)

{'type': <FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT: 'TranscriptSegmentElement'>,
 'transcript': 'refseq:NM_004327.3',
 'transcriptStatus': <TranscriptPriority.LONGEST_COMPATIBLE_REMAINING: 'longest_compatible_remaining'>,
 'strand': <Strand.POSITIVE: 1>,
 'exonEnd': 16,
 'exonEndOffset': -12,
 'gene': {'conceptType': 'Gene',
  'name': 'BCR',
  'primaryCoding': {'id': 'hgnc:1014',
   'system': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/',
   'code': 'HGNC:1014'}},
 'elementGenomicEnd': {'id': 'ga4gh:SL.tQkBo8Xr_nDIElZFWqsy44yXXLxjW5Rd',
  'type': 'SequenceLocation',
  'extensions': [{'name': 'is_exonic', 'value': True}],
  'digest': 'tQkBo8Xr_nDIElZFWqsy44yXXLxjW5Rd',
  'sequenceReference': {'id': 'refseq:NC_000022.11',
   'type': 'SequenceReference',
   'refgetAccession': 'SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ'},
  'end': 23295143}}
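
The `exonEndOffset` of -12 appears consistent with the transcript data retrieved earlier: the junction sits 12 bases before the end of the exon. The sketch below shows that arithmetic under the assumption that a negative offset means "upstream of the exon boundary"; `exon_end_offset` is a hypothetical helper, not a FUSOR function.

```python
def exon_end_offset(exon_tx_end: int, segment_tx_end: int) -> int:
    """Signed distance from the exon end to the segment end, in transcript space.

    Negative values mean the segment ends before the exon does.
    """
    return segment_tx_end - exon_tx_end


# Exon end from tx_pos_range (3608); segment end is cds_start + 3000 = 3596
offset = exon_end_offset(exon_tx_end=3608, segment_tx_end=596 + 3000)
print(offset)  # -12
```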

## Representing the 3' partner
The following cells generate a `TranscriptSegmentElement` for the 3' partner.

In [7]:
cds_range = await fusor.cool_seq_tool.uta_db.get_cds_start_end(tx_ac="NM_005157.5")
cds_start = cds_range[0]
f"The CDS start is: {cds_start}"

'The CDS start is: 192'

In [8]:
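# c.2000 is counted from the CDS start, so shift it by cds_start to get the
# transcript position, expressed as a (start, end) pair one base wide.
tx_data = await fusor.cool_seq_tool.uta_db.get_genomic_tx_data(
    tx_ac="NM_005157.5",
    pos=(cds_start + 2000 - 1, cds_start + 2000),
    target_genome_assembly=Assembly.GRCH38,
)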
tx_data.model_dump(exclude_none=True)

{'gene': 'ABL1',
 'strand': <Strand.POSITIVE: 1>,
 'tx_pos_range': (1870, 5577),
 'alt_pos_range': (130883968, 130887675),
 'alt_aln_method': 'splign',
 'tx_exon_id': 3530423,
 'alt_exon_id': 6630666,
 'tx_ac': 'NM_005157.5',
 'alt_ac': 'NC_000009.12',
 'coding_start_site': 0,
 'coding_end_site': 0,
 'alt_pos_change_range': (130884289, 130884290),
 'pos_change': (321, 3385)}

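We can now use this output to create a `TranscriptSegmentElement` with FUSOR. The genomic accession comes from `alt_ac`, and the segment start is the second value of `alt_pos_change_range`: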

In [9]:
three_prime_transcript = await fusor.transcript_segment_element(
    tx_to_genomic_coords=False,
    transcript="NM_005157.5",
    genomic_ac="NC_000009.12",
    seg_start_genomic=130884290,
    coordinate_type=CoordinateType.RESIDUE
)
three_prime_transcript = three_prime_transcript[0]
three_prime_transcript.model_dump(exclude_none=True)

{'type': <FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT: 'TranscriptSegmentElement'>,
 'transcript': 'refseq:NM_005157.5',
 'transcriptStatus': <TranscriptPriority.LONGEST_COMPATIBLE_REMAINING: 'longest_compatible_remaining'>,
 'strand': <Strand.POSITIVE: 1>,
 'exonStart': 11,
 'exonStartOffset': 321,
 'gene': {'conceptType': 'Gene',
  'name': 'ABL1',
  'primaryCoding': {'id': 'hgnc:76',
   'system': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/',
   'code': 'HGNC:76'}},
 'elementGenomicStart': {'id': 'ga4gh:SL.4AalyLSqDQxfxmlLmx0T9dizutqExADN',
  'type': 'SequenceLocation',
  'extensions': [{'name': 'is_exonic', 'value': True}],
  'digest': '4AalyLSqDQxfxmlLmx0T9dizutqExADN',
  'sequenceReference': {'id': 'refseq:NC_000009.12',
   'type': 'SequenceReference',
   'refgetAccession': 'SQ.KEO-4XBcm1cxeo_DIQ8_ofqGUkp4iZhI'},
  'start': 130884289}}
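
Note that the stored `start` (130884289) is one less than the `seg_start_genomic` value that was passed in (130884290), while the 5' partner's `end` (23295143) was stored unchanged. This is what one would expect if residue (1-based) input is converted to a 0-based, half-open interval. The sketch below illustrates that conversion; `residue_to_inter_residue` is a hypothetical helper written for this example, not a FUSOR or Cool-Seq-Tool function.

```python
def residue_to_inter_residue(position: int) -> tuple[int, int]:
    """Return the 0-based, half-open interval covering one 1-based residue."""
    return position - 1, position


# 3' partner: the segment starts at residue 130884290, so use the interval start
three_prime_start, _ = residue_to_inter_residue(130884290)

# 5' partner: the segment ends at residue 23295143, so use the interval end
_, five_prime_end = residue_to_inter_residue(23295143)

print(three_prime_start)  # 130884289
print(five_prime_end)  # 23295143
```

Both intervals, (130884289, 130884290) and (23295142, 23295143), match the `alt_pos_change_range` values returned for the two transcripts.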

## Create `AssayedFusion` object
We can now assemble the fusion event with FUSOR, using the transcript accessions reported in the literature:

In [10]:
fusion = fusor.assayed_fusion(structure=[five_prime_transcript, three_prime_transcript])
fusion.model_dump(exclude_none=True)

{'structure': [{'type': <FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT: 'TranscriptSegmentElement'>,
   'transcript': 'refseq:NM_004327.3',
   'transcriptStatus': <TranscriptPriority.LONGEST_COMPATIBLE_REMAINING: 'longest_compatible_remaining'>,
   'strand': <Strand.POSITIVE: 1>,
   'exonEnd': 16,
   'exonEndOffset': -12,
   'gene': {'conceptType': 'Gene',
    'name': 'BCR',
    'primaryCoding': {'id': 'hgnc:1014',
     'system': 'https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/',
     'code': 'HGNC:1014'}},
   'elementGenomicEnd': {'id': 'ga4gh:SL.tQkBo8Xr_nDIElZFWqsy44yXXLxjW5Rd',
    'type': 'SequenceLocation',
    'extensions': [{'name': 'is_exonic', 'value': True}],
    'digest': 'tQkBo8Xr_nDIElZFWqsy44yXXLxjW5Rd',
    'sequenceReference': {'id': 'refseq:NC_000022.11',
     'type': 'SequenceReference',
     'refgetAccession': 'SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ'},
    'end': 23295143}},
  {'type': <FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT: 'TranscriptSegmentElement'>,
   'trans