# FUSOR Demonstrative Analysis
This notebook contains a demonstrative analysis of the FUSOR package, showing how fusion events detected from patient samples at Nationwide Children's Hospital's Institute for Genomic Medicine (IGM) can be matched to the CIViC and Molecular Oncology Almanac (MOA) databases.

The cells below set the environment variables and load FUSOR.

In [1]:
from os import environ
import logging
from pathlib import Path
from tqdm import tqdm

# These are the configurations for the UTA and SeqRepo databases. These should
# be adjusted by the user based on the locations where these databases exist.
environ["UTA_DB_URL"] = "postgresql://anonymous@localhost:5432/uta/uta_20241220"
environ["SEQREPO_ROOT_DIR"] = "/usr/local/share/seqrepo/2024-12-20"

logging.getLogger("cool_seq_tool").setLevel(logging.ERROR)
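
Before instantiating FUSOR, it can help to confirm that both settings are visible to the process. The sketch below is illustrative only: `check_data_sources` is a hypothetical helper, not part of FUSOR or Cool-Seq-Tool. It checks only that the variables are set and that the SeqRepo path is a directory; it does not test the UTA connection.

```python
import os
from pathlib import Path


def check_data_sources(env: dict) -> list:
    """Return a list of problems with the UTA/SeqRepo settings (hypothetical helper)."""
    problems = []
    # UTA: only check that a connection URL has been provided
    if not env.get("UTA_DB_URL"):
        problems.append("UTA_DB_URL is unset")
    # SeqRepo: the variable must be set and point at an existing directory
    seqrepo_dir = env.get("SEQREPO_ROOT_DIR")
    if not seqrepo_dir:
        problems.append("SEQREPO_ROOT_DIR is unset")
    elif not Path(seqrepo_dir).is_dir():
        problems.append(f"SEQREPO_ROOT_DIR is not a directory: {seqrepo_dir}")
    return problems


# Report any problems found in the current environment
for problem in check_data_sources(dict(os.environ)):
    print(problem)
```

An empty result means both settings look plausible, not that the databases are reachable.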

In [2]:
from fusor.fusor import FUSOR

fusor = FUSOR()

***Using Gene Database Endpoint: http://localhost:8000***


## Prepare and Load `CategoricalFusion` Data

### CIViC

The cell below loads the saved CIViC cache (from 9/22/25) and creates a list of `CategoricalFusion` objects, keeping only CIViC variants with `accepted` or `submitted` status.

In [3]:
from fusor.harvester import CIVICHarvester

harvester = CIVICHarvester(fusor=fusor, local_cache_path="data/caches/civic_cache_20250922.pkl", include_status=["accepted", "submitted"])
civic_fusions = await harvester.load_records()

ERROR:fusor.harvester:Cannot translate fusion: FGFR3(entrez:2261)::v due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints
Traceback (most recent call last):
  File "/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py", line 412, in load_records
    cat_fusion = await self.translator.translate(civic=fusion)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py", line 984, in translate
    raise ValueError(msg)
ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints
ERROR:fusor.harvester:Cannot translate fusion: TCF3(entrez:6929)::PBX1(entrez:5087) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints
Traceback (most recent call last):
  File "/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py", line 412, in load_records
    cat_fusion

### MOA

The cell below loads the saved MOA cache (from 9/4/25) and creates a list of `CategoricalFusion` objects.

In [4]:
from fusor.harvester import MOAHarvester

harvester = MOAHarvester(fusor=fusor, cache_dir=Path("data/caches"), use_local=True)
moa_fusions = harvester.load_records()

## Translate `AssayedFusion` objects from EnFusion files

The cell below can be run to generate a list of translated `AssayedFusion` objects from a subset of EnFusion output from 18 patients at IGM. This cell takes around 6 minutes to run. The fusion events were detected at a sequencing depth of 200 million reads, and were returned if they were detected by three or more of the following fusion callers: FusionMap, FusionCatcher, JAFFA, STAR-Fusion, CICERO, Arriba.
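
The "three or more callers" rule can be pictured as a simple consensus filter. The sketch below is not EnFusion's implementation. The event labels and caller sets are made-up examples, and `MIN_CALLERS` and `passes_consensus` are names introduced here for illustration.

```python
# The six fusion callers named in the text above
CALLERS = {"FusionMap", "FusionCatcher", "JAFFA", "STAR-Fusion", "CICERO", "Arriba"}
MIN_CALLERS = 3  # assumed threshold, taken from "three or more" in the text


def passes_consensus(detected_by: set, min_callers: int = MIN_CALLERS) -> bool:
    """Return True if enough recognized callers reported the event."""
    return len(detected_by & CALLERS) >= min_callers


# Hypothetical events mapped to the callers that reported them
calls = {
    "GENE_A::GENE_B": {"Arriba", "STAR-Fusion", "JAFFA"},
    "GENE_C::GENE_D": {"FusionCatcher", "CICERO"},
}

kept = [event for event, callers in calls.items() if passes_consensus(callers)]
print(kept)  # ['GENE_A::GENE_B']
```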

In [5]:
from fusor.harvester import EnFusionHarvester
from cool_seq_tool.schemas import Assembly
import re

harvester = EnFusionHarvester(fusor=fusor, assembly=Assembly.GRCH38)
assayed_fusions_enfusion = []
patient_ids = []
files = [patient_file.name for patient_file in Path("data/fusion-test-data").iterdir() if patient_file.is_file()]

for file in tqdm(files):
    fusions = await harvester.load_records(fusion_path=Path(f"data/fusion-test-data/{file}"))
    for fusion in fusions:
        patient_ids.append(re.search(r"S-\d+-\d+", file).group())
    assayed_fusions_enfusion.extend(fusions)

100%|██████████| 18/18 [06:06<00:00, 20.34s/it]
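
The patient identifier is taken from each file name with the pattern `S-\d+-\d+`. The sketch below shows that extraction in isolation. The file names are invented for illustration, and `extract_patient_id` is a hypothetical helper. Unlike the cell above, which would raise an `AttributeError` on a non-matching name, it returns `None`.

```python
import re
from typing import Optional

# Same pattern as in the cell above: "S-", digits, "-", digits
PATIENT_ID_PATTERN = re.compile(r"S-\d+-\d+")


def extract_patient_id(file_name: str) -> Optional[str]:
    """Return the first patient ID found in the file name, or None if absent."""
    match = PATIENT_ID_PATTERN.search(file_name)
    return match.group() if match else None


# Hypothetical file names
print(extract_patient_id("S-2024-000123_enfusion.tsv"))  # S-2024-000123
print(extract_patient_id("README.txt"))  # None
```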


## 1. Use `FusionMatcher` to match patient fusions against CIViC and MOA

The cell below uses the `FusionMatcher` module to match the translated EnFusion output, containing patient fusion data, against CIViC and MOA. Matches are prioritized and returned according to predefined [match criteria](https://github.com/cancervariants/fusor/wiki/Fusion-Match-Classes).

In [6]:
from fusor.fusion_matching import FusionMatcher

fm = FusionMatcher(assayed_fusions=assayed_fusions_enfusion,
                   comparator_fusions=civic_fusions + moa_fusions)

matches = await fm.match_fusion()

### Create Patient Variant Dictionary
The cell below builds a dictionary that maps each patient sample to the match results for its `AssayedFusion` objects that have at least one match.

In [7]:
# Start every patient with an empty list so that patients without matches still appear
patient_match_dict = {key: [] for key in patient_ids}

for matching_output, patient_id in zip(matches, patient_ids):
    # Skip fusions that returned no matches
    if matching_output:
        patient_match_dict[patient_id].append(matching_output)
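
The same grouping can be tried on toy data without running the matcher. In the sketch below, `group_matches_by_patient` is a hypothetical helper and the strings stand in for real match results. Because it uses `defaultdict`, patients with no matches are omitted rather than kept with an empty list as in the cell above.

```python
from collections import defaultdict


def group_matches_by_patient(matches: list, patient_ids: list) -> dict:
    """Collect the non-empty match results for each patient ID."""
    grouped = defaultdict(list)
    for matching_output, patient_id in zip(matches, patient_ids):
        # None and empty lists both mean "no match" and are skipped
        if matching_output:
            grouped[patient_id].append(matching_output)
    return dict(grouped)


# Placeholder data: one entry per assayed fusion, aligned with its patient ID
toy_matches = [["m1"], None, ["m2", "m3"], []]
toy_patient_ids = ["S-1-1", "S-1-1", "S-2-2", "S-3-3"]

print(group_matches_by_patient(toy_matches, toy_patient_ids))
# {'S-1-1': [['m1']], 'S-2-2': [['m2', 'm3']]}
```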

## 2. Fusion Match Characterization
The cells below analyze the results of the matching analysis, describing the types of `CategoricalFusion` matches that are returned for an `AssayedFusion` query and the types of evidence associated with those matches.

The helper functions below are used throughout this section:

In [8]:
from fusor.models import CategoricalFusion
from fusor.fusion_matching import MatchType
from collections import Counter

def count_match_types(matches: list[tuple[CategoricalFusion, MatchType]]) -> dict[str, int]:
    """Count the number of match types across a list of matching output

    :param matches: A list of tuples containing CategoricalFusion and MatchType objects
    :return: A dictionary mapping each shared fusion match category to the
        number of fusions that belong in it
    """
    # The tens digit of the priority identifies the match category
    priority_list = [match[1].priority // 10 for match in matches]
    element_counts = Counter(priority_list)
    # Named category_names rather than map to avoid shadowing the built-in
    category_names = {1: "exact", 2: "shared_genes_exact_one_partner", 3: "shared_genes", 4: "exact_one_partner", 5: "shared_gene_one_partner"}
    return {category_names[key]: value for key, value in element_counts.items()}
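
The counting logic can be checked without FUSOR installed by using a stand-in object. In the sketch below, `StubMatchType`, `count_tiers`, and the priority values 11, 12, and 31 are assumptions made for illustration; the actual `MatchType` priorities may differ. Only the `priority // 10` grouping and the category names come from the helper above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StubMatchType:
    """Stand-in for fusor's MatchType; only the priority attribute is modeled."""
    priority: int


CATEGORY_BY_TIER = {
    1: "exact",
    2: "shared_genes_exact_one_partner",
    3: "shared_genes",
    4: "exact_one_partner",
    5: "shared_gene_one_partner",
}


def count_tiers(matches: list) -> dict:
    """Group matches by the tens digit of their priority and count each group."""
    tiers = Counter(match_type.priority // 10 for _, match_type in matches)
    return {CATEGORY_BY_TIER[tier]: count for tier, count in tiers.items()}


# Hypothetical matches: (fusion label, match type) pairs with invented priorities
toy = [
    ("fusion_a", StubMatchType(11)),
    ("fusion_b", StubMatchType(12)),
    ("fusion_c", StubMatchType(31)),
]
print(count_tiers(toy))  # {'exact': 2, 'shared_genes': 1}
```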