# TreeAlign with clone labels as input
## Introduction
TreeAlign is a model for integrating scRNA and scDNA data. It can take total copy number information, allele-specific copy number information, or both, and assigns cells from scRNA data to clones identified with scDNA.

## Loading data

In [1]:
from treealign import CloneAlignClone
from treealign import CloneAlignTree

import pandas as pd
from Bio import Phylo

In [2]:
# load total copy number input

# scRNA read count matrix where each row represents a gene, 
# each column represents a cell
expr = pd.read_csv("../data/example_expr.csv", index_col=0)

# scDNA copy number matrix where each row represents a gene,
# each column represents a cell
# each entry is the copy number of the given gene in the given cell
cnv = pd.read_csv("../data/example_gene_cnv.csv", index_col=0)
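
Both matrices are indexed by gene, so it can be useful to check how many genes they share before running the model. The sketch below uses small toy matrices in place of the example files; `shared_genes` and the gene and cell names are hypothetical helpers for illustration and are not part of TreeAlign.

```python
import pandas as pd

# Toy stand-ins for the real matrices (genes as rows, cells as columns).
toy_expr = pd.DataFrame(
    {"rna_cell_1": [5, 0, 2], "rna_cell_2": [1, 3, 0]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
toy_cnv = pd.DataFrame(
    {"dna_cell_1": [2, 3, 1], "dna_cell_2": [2, 4, 2]},
    index=["GENE_B", "GENE_C", "GENE_D"],
)


def shared_genes(expr, cnv):
    """Return the gene IDs present in both matrices (hypothetical helper)."""
    return expr.index.intersection(cnv.index)


common = shared_genes(toy_expr, toy_cnv)
print(f"{len(common)} genes shared between expression and copy number input")
```

With the real data, the same call would be `shared_genes(expr, cnv)`. A very small overlap usually points to mismatched gene identifiers between the two files.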

In [3]:
# load allele specific input

# B allele frequency (BAF) matrix
# each row represents a SNP
# each column represents a cell
# each entry is the B allele frequency at the given SNP in the given cell
hscn = pd.read_csv("../data/example_snp_baf.csv", index_col=0)

# reference allele count matrix from scRNA
# each row represents a SNP
# each column represents a cell
snv_allele = pd.read_csv("../data/example_snp_allele.csv", index_col=0)

# total count matrix at SNPs from scRNA
# each row represents a SNP
# each column represents a cell
snv_total = pd.read_csv("../data/example_snp_total.csv", index_col=0)
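
The two scRNA count matrices describe the same SNPs and cells, so the allele count at any position should not exceed the total count there. The sketch below checks this on toy data; `check_allele_counts` and the SNP and cell names are hypothetical, and it assumes both matrices contain no missing values.

```python
import pandas as pd

# Toy stand-ins for the allele count and total count matrices
# (SNPs as rows, cells as columns).
toy_allele = pd.DataFrame(
    {"cell_1": [3, 0], "cell_2": [1, 2]}, index=["snp_1", "snp_2"]
)
toy_total = pd.DataFrame(
    {"cell_1": [5, 0], "cell_2": [4, 2]}, index=["snp_1", "snp_2"]
)


def check_allele_counts(allele, total):
    """Return True if both matrices share the same layout and allele <= total
    everywhere (hypothetical helper, assumes no missing values)."""
    same_layout = allele.index.equals(total.index) and allele.columns.equals(
        total.columns
    )
    if not same_layout:
        return False
    return bool((allele <= total).all().all())


counts_ok = check_allele_counts(toy_allele, toy_total)

# Per-SNP, per-cell allele fraction; positions with zero coverage become NaN.
ref_fraction = toy_allele.div(toy_total.where(toy_total > 0))
```

With the real data, the call would be `check_allele_counts(snv_allele, snv_total)`.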

In [4]:
# clone labels for each cell in scDNA
clone = pd.read_csv("../data/example_cell_clone.csv")

In [5]:
# the example dataset has four clone labels: A, B, C and None
clone

Unnamed: 0,cell_id,clone_id
0,SPECTRUM-OV-022_S1_LEFT_ADNEXA-A108833A-R03-C08,C
1,SPECTRUM-OV-022_S1_LEFT_ADNEXA-A108833A-R03-C09,A
2,SPECTRUM-OV-022_S1_LEFT_ADNEXA-A108833A-R03-C10,B
3,SPECTRUM-OV-022_S1_LEFT_ADNEXA-A108833A-R03-C13,A
4,SPECTRUM-OV-022_S1_LEFT_ADNEXA-A108833A-R03-C14,C
...,...,...
1057,SPECTRUM-OV-022_S1_RIGHT_ADNEXA-A98179A-R30-C61,B
1058,SPECTRUM-OV-022_S1_RIGHT_ADNEXA-A98179A-R30-C64,C
1059,SPECTRUM-OV-022_S1_RIGHT_ADNEXA-A98179A-R30-C65,B
1060,SPECTRUM-OV-022_S1_RIGHT_ADNEXA-A98179A-R30-C67,B
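
Before running the model, it can help to see how many scDNA cells fall into each clone and to confirm that no cell is listed twice. The sketch below does this on a toy table that uses the same `cell_id` and `clone_id` column names as the output above; the toy cell names are made up.

```python
import pandas as pd

# Toy clone table with the same columns as the example file.
toy_clone = pd.DataFrame(
    {
        "cell_id": ["cell_1", "cell_2", "cell_3", "cell_4", "cell_5"],
        "clone_id": ["C", "A", "B", "A", "C"],
    }
)

# Number of scDNA cells per clone label.
clone_sizes = toy_clone["clone_id"].value_counts()

# Number of cell IDs that appear more than once (expected to be 0).
duplicated_cells = int(toy_clone["cell_id"].duplicated().sum())

print(clone_sizes)
print(f"duplicated cell IDs: {duplicated_cells}")
```

With the real data, replace `toy_clone` with `clone`.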


## Running TreeAlign with clone labels

In [None]:
# construct a CloneAlignClone object for data preprocessing

# `repeat` is set to 1 here for demonstration purposes; in practice, a value larger than 5 is recommended.
# obj = CloneAlignClone(clone=clone, expr=expr, cnv=cnv, hscn=hscn, snv_allele=snv_allele, snv=snv_total, repeat=1)

# it is possible to run TreeAlign with total copy number data only
obj = CloneAlignClone(clone=clone, expr=expr, cnv=cnv, repeat=1)

# it is also possible to run TreeAlign with allele specific data only
# obj = CloneAlignClone(clone=clone, hscn=hscn, snv_allele=snv_allele, snv=snv_total, repeat=1)

# run TreeAlign to assign scRNA cells to clones
obj.assign_cells_to_clones()

gene count: 1675
cell count: 1000
seed = 19, initial_loss = 7799439.516691416
Start Inference.


In [None]:
# view the parameters you can customize when running TreeAlign
help(CloneAlignClone)

## Getting results
The output of TreeAlign includes:

1. a table indicating the clone to which each cell in the scRNA data is assigned;
2. for each gene, a score between 0 and 1 reflecting dosage effects.

In [None]:
clone_assign_df, gene_type_score_df, allele_assign_prob_df = obj.generate_output()

In [None]:
# clone assignment for each cell in scRNA data
clone_assign_df

# save clone assignment results to csv file
# clone_assign_df.to_csv("test.csv")

In [None]:
# the probability of having dosage effects for each gene
gene_type_score_df
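
One possible way to use the per-gene scores is to keep only the genes whose score exceeds a cutoff. The sketch below runs on a toy table: the column names `gene` and `gene_type_score` and the cutoff of 0.5 are assumptions made for illustration, so inspect `gene_type_score_df.columns` and adjust them to match the real output.

```python
import pandas as pd

# Toy stand-in for the gene score table; the column names are assumed,
# not taken from the TreeAlign output.
toy_scores = pd.DataFrame(
    {
        "gene": ["GENE_A", "GENE_B", "GENE_C"],
        "gene_type_score": [0.92, 0.15, 0.60],
    }
)

# Illustrative cutoff, not a recommended value.
threshold = 0.5

# Genes whose score suggests a dosage effect under this cutoff.
dosage_genes = toy_scores.loc[
    toy_scores["gene_type_score"] > threshold, "gene"
].tolist()

print(dosage_genes)
```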

In [None]:
allele_assign_prob_df