# scClone2DR Tutorial

This notebook demonstrates how to use the scClone2DR package for analyzing single-cell drug response data. It covers both real data analysis and simulated data experiments.

## Prerequisites: Generate Test Data

**Important**: Before running this tutorial, you must first generate synthetic test data by running the notebook:

```
./data/generate_fake_data.ipynb
```

This will create fake data that mimics the structure of the real datasets used in the paper (which are confidential and cannot be shared). The synthetic data includes:
- Single-cell RNA expression data with clone annotations
- Fast Drug response measurements
- Clone metadata

## Tutorial Contents

This notebook is divided into two main sections:

### 1. Real Data Analysis
Demonstrates the complete workflow on data in the same format as the real datasets:
- Loading single-cell RNA and drug response data
- Training the scClone2DR model
- Making predictions on held-out test data
- Visualizing results (fold changes, cell counts, survival probabilities)
- Analyzing clone proportions and drug effects

### 2. Simulated Data Analysis
Shows how to work with fully synthetic data where ground truth is known:
- Generating simulated training data with known parameters
- Training and evaluating model performance
- Comparing predictions against ground truth
- Comprehensive result visualization

## Quick Start

1. Run `./data/generate_fake_data.ipynb` to create test data
2. Execute cells sequentially in this notebook
3. Adjust parameters (train/test split, regularization, training steps) as needed

In [None]:
import scClone2DR as sccdr
import matplotlib.pyplot as plt
import numpy as np
from copy import deepcopy

# Real Data

## Initialize Real-Data Model
Set file paths and create the scClone2DR model instance for real data.

In [None]:
path_rna = '/{Set path to the repository}/package/data/'
path_fastdrug = '/{Set path to the repository}/package/data/FD_data.csv'
model = sccdr.models.scClone2DR(path_fastdrug=path_fastdrug, path_rna=path_rna)

In [None]:
data_ref = model.get_real_data(concentration_DMSO=5, concentration_drug=5)

## Train/Test Split and Training
Use the first 80% of samples for training and the remaining 20% for testing, then train the model with L1 and L2 penalties (both set to 0.1) for 600 steps.

In [None]:
n_train = int(0.8*data_ref['N'])  # first 80% of samples are used for training
idxs_train = list(range(n_train))
idxs_test = list(range(n_train, data_ref['N']))

data_train, data_test, sample_names_train, sample_names_test = model.get_real_data_split(idxs_train, idxs_test)
params_svi = model.train(data_train, penalty_l1=0.1, penalty_l2=0.1, n_steps=600)
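
The validation-parameter step below slices per-sample arrays with `[len(idxs_train):]`, which selects the right rows only when the test indices are exactly the samples that follow the training indices. A minimal sketch of such a contiguous split, with a check of that property; `contiguous_split` is a hypothetical helper, not part of scClone2DR.

```python
def contiguous_split(n_samples, train_fraction):
    """Hypothetical helper: the first `train_fraction` of samples train, the rest test."""
    n_train = int(train_fraction * n_samples)
    idxs_train = list(range(n_train))
    idxs_test = list(range(n_train, n_samples))
    return idxs_train, idxs_test


idxs_train, idxs_test = contiguous_split(10, 0.8)

# Slicing a per-sample sequence from len(idxs_train) onward keeps exactly the test samples.
per_sample = list("abcdefghij")
assert per_sample[len(idxs_train):] == [per_sample[i] for i in idxs_test]
print(idxs_train, idxs_test)
```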

## Estimate Latent Gamma
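Average 100 draws from the variational guide to get a stable Monte Carlo estimate of the latent gamma values, then store one block of the averaged vector per clone label in `params_svi`.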

In [None]:
import torch
n_samples = 100  # number of guide draws to average
mean_gamma = sum(model.guide.sample_latent() for _ in range(n_samples)) / n_samples
dim = len(mean_gamma) // model.n_clonelabels  # latent entries per clone label
for i, clonelabel in enumerate(model.clonelabels):
    params_svi['gamma_{0}'.format(clonelabel)] = mean_gamma[i*dim:(i+1)*dim]
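
Averaging reduces Monte Carlo noise: the standard error of a mean of `n` independent draws shrinks like `1/sqrt(n)`. The loop above also assumes that the flat latent vector stores the clone labels in consecutive blocks of equal size. A NumPy sketch of both ideas; `sample_latent` here is a hypothetical stand-in for a guide draw, and the sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clonelabels, dim = 3, 4  # hypothetical sizes
true_mean = np.arange(n_clonelabels * dim, dtype=float)


def sample_latent():
    """Hypothetical stand-in for a guide draw: a noisy flat vector around true_mean."""
    return true_mean + rng.normal(scale=0.5, size=true_mean.shape)


n_samples = 100
mean_latent = np.mean([sample_latent() for _ in range(n_samples)], axis=0)

# Slicing the flat vector into consecutive blocks, one per clone label ...
blocks = [mean_latent[i * dim:(i + 1) * dim] for i in range(n_clonelabels)]
# ... gives the same result as reshaping it to (n_clonelabels, dim).
assert np.allclose(np.stack(blocks), mean_latent.reshape(n_clonelabels, dim))
```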

## Prepare Validation Parameters
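Copy the fitted parameters (detaching tensors from the computation graph) and keep only the rows of the per-sample entries, `proportions` and `theta_fd`, that belong to the validation samples.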

In [None]:
params_svi_validation = {}
for key, val in params_svi.items():
    if torch.is_tensor(val):
        params_svi_validation[key] = val.clone().detach()
    else:
        params_svi_validation[key] = val

params_svi_validation['proportions'] = params_svi_validation['proportions'][len(idxs_train):,:]
params_svi_validation['theta_fd'] = params_svi_validation['theta_fd'][len(idxs_train):]
data_validation = deepcopy(data_test)
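
The same copy-then-slice pattern, sketched with NumPy arrays in place of tensors. `slice_per_sample` and the toy values are hypothetical; treating only `proportions` and `theta_fd` as per-sample entries follows the two slices in the cell above.

```python
import copy

import numpy as np


def slice_per_sample(params, start, per_sample_keys):
    """Hypothetical helper: copy `params`, keeping rows from `start` onward for per-sample keys."""
    out = copy.deepcopy(params)  # leaves the training parameters untouched
    for key in per_sample_keys:
        out[key] = out[key][start:]
    return out


params = {
    "proportions": np.arange(12.0).reshape(6, 2),  # hypothetical (samples, clones) array
    "theta_fd": np.arange(6.0),                    # hypothetical per-sample vector
    "shared_param": np.ones(3),                    # hypothetical entry shared by all samples
}
params_val = slice_per_sample(params, start=4, per_sample_keys=["proportions", "theta_fd"])
print(params_val["proportions"].shape, params_val["theta_fd"])
```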

## Posterior Sampling
Sample from the model on the validation data.

In [None]:
res = model.sampling(data_validation, params=params_svi_validation)
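
The structure of `res` is not described here. If it holds repeated draws from the fitted model, a common way to summarize them is a per-entry mean with a central 95% interval. A sketch with simulated Poisson draws standing in for `res`; the (draws × wells) shape is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical draws: 500 samples of predicted cell counts for 4 wells.
draws = rng.poisson(lam=[10, 50, 100, 200], size=(500, 4))

mean = draws.mean(axis=0)
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)  # central 95% interval per well
for m, lo, hi in zip(mean, lower, upper):
    print(f"mean={m:.1f}  95% interval=[{lo:.0f}, {hi:.0f}]")
```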

## Fold Change Computation
Compute fold changes and collect outputs for evaluation.

In [None]:
fold_changes, pi, colors = model.get_fold_change(data_validation, params_svi_validation, output_results=True)
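
This tutorial does not spell out how `get_fold_change` defines a fold change. A common convention in drug screens is the ratio of cells remaining in drug-treated wells to cells in control (DMSO) wells. A minimal sketch of that convention with made-up counts; it illustrates the idea and is not necessarily the package's formula.

```python
import numpy as np

# Hypothetical tumor-cell counts: 3 DMSO control wells, and 2 wells for each of 2 drugs.
control_counts = np.array([100.0, 120.0, 110.0])
drug_counts = np.array([[55.0, 45.0],    # drug 0
                        [110.0, 99.0]])  # drug 1

fold_change = drug_counts.mean(axis=1) / control_counts.mean()
print(fold_change)           # values below 1 mean fewer cells than in control wells
print(np.log2(fold_change))  # log scale makes halving and doubling symmetric around 0
```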

## Fold Change Scatter Plot
Compare predicted vs observed fold changes visually.

In [None]:
plt.scatter(fold_changes['not pred'], fold_changes['pred'], c=colors)
plt.xlabel('Observed fold change')
plt.ylabel('Predicted fold change')
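
The scatter plot shows the agreement qualitatively; a correlation and an error summary make it quantitative. A sketch that computes both on log fold changes, assuming the two inputs are positive ratios that convert to arrays; `agreement` is a hypothetical helper.

```python
import numpy as np


def agreement(observed, predicted):
    """Hypothetical helper: Pearson r and RMSE between log fold changes."""
    obs = np.log(np.asarray(observed, dtype=float))
    pred = np.log(np.asarray(predicted, dtype=float))
    r = np.corrcoef(obs, pred)[0, 1]
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return r, rmse


observed = [0.5, 0.8, 1.0, 0.3, 0.9]
predicted = [0.55, 0.75, 0.95, 0.35, 1.0]
r, rmse = agreement(observed, predicted)
print(f"Pearson r = {r:.3f}, RMSE (log scale) = {rmse:.3f}")
```

Under those assumptions, it could be applied as `agreement(fold_changes['not pred'], fold_changes['pred'])`, and likewise with `fold_changes['true']` in the simulated section.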

## Fraction Visualization
Show tumor fractions for the validation data.

In [None]:
sccdr.resultanalysis.show_fractions(data_validation, res, idxdrug=0)

## Cell Count Visualization
Display predicted number of non-malignant cells in control wells.

In [None]:
sccdr.resultanalysis.show_cells(data_validation, res)

## Proportion Visualization
Plot clone proportions inferred by the model.

In [None]:
sccdr.resultanalysis.show_proportions(data_validation, params_svi_validation)

## Beta Effects
Inspect beta parameters to interpret drug effects.

In [None]:
sccdr.resultanalysis.show_beta(data_validation, params_svi_validation)

## Count Scatter Plot
Visualize observed vs predicted counts.

In [None]:
sccdr.resultanalysis.scatter_counts(data_validation, res)

## Survival Probabilities
Compute survival probabilities from single-cell features and plot them.

In [None]:
params_svi_tensor = model.convert_to_tensor(params_svi_validation)
pi = model.compute_survival_probas_single_cell_features(data_validation, params=params_svi_tensor)
sccdr.resultanalysis.survival_probabilities(data_validation, pi.detach().numpy(), model.cluster2clonelabel, idxdrug=0)
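
`survival_probabilities` receives per-cell probabilities together with `model.cluster2clonelabel`, which suggests that results are grouped by clone label. A sketch of one such grouping, the mean survival probability per clone label. The toy arrays, the dictionary form of the mapping, and the label names are all assumptions.

```python
import numpy as np

# Hypothetical inputs: survival probability of 6 cells under one drug,
# the cluster of each cell, and a cluster -> clone-label mapping.
pi_cells = np.array([0.9, 0.8, 0.2, 0.3, 0.25, 0.7])
cell_cluster = np.array([0, 0, 1, 1, 1, 2])
cluster2clonelabel = {0: "healthy", 1: "clone_A", 2: "clone_A"}

labels = np.array([cluster2clonelabel[c] for c in cell_cluster])
mean_by_label = {str(lab): float(pi_cells[labels == lab].mean()) for lab in np.unique(labels)}
print(mean_by_label)
```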

# Simulated Data

## Initialize Simulated-Data Model
Import libraries and create a fresh model for simulated data.

In [None]:
import scClone2DR as sccdr
import matplotlib.pyplot as plt
import numpy as np
model = sccdr.models.scClone2DR()

## Generate Simulated Data
Create synthetic training data with known parameters.

In [None]:
data_ref = model.get_simulated_training_data({'C':24,'R':5,'N':100,'Kmax':7, 'D':30, 'theta_rna':15}, neg_bin_n=100)
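
The `neg_bin_n` argument suggests that simulated counts are drawn from a negative binomial distribution. NumPy parameterizes that distribution by `n` and a success probability `p`. To hit a target mean `mu`, set `p = n / (n + mu)`, which gives a variance of `mu + mu**2 / n`, so larger `n` brings the counts closer to Poisson. The sketch assumes `neg_bin_n` plays the role of `n`; this is an interpretation, not documented behavior.

```python
import numpy as np


def sample_counts(mu, n, size, rng):
    """Negative binomial counts with mean `mu` and dispersion `n` (NumPy's parameterization)."""
    p = n / (n + mu)
    return rng.negative_binomial(n, p, size=size)


rng = np.random.default_rng(0)
mu = 40.0
for n in (5, 100):
    counts = sample_counts(mu, n, size=200_000, rng=rng)
    # Empirical mean and variance next to the theoretical variance mu + mu**2 / n.
    print(n, counts.mean().round(1), counts.var().round(1), mu + mu**2 / n)
```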

## Simulated Train/Test Split
Use the first half of the simulated samples for training and the second half for testing, then train the model for 1000 steps.

In [None]:
n_train = int(0.5*data_ref['N'])  # first half of samples are used for training
idxs_train = list(range(n_train))
idxs_test = list(range(n_train, data_ref['N']))

data_train, data_test = model.get_data_split_simu(data_ref, idxs_train, idxs_test)
params_svi = model.train(data_train, penalty_l1=0.1, penalty_l2=0.1, n_steps=1000)

## Sample Simulated Data
Run posterior sampling on the simulated training data.

In [None]:
res = model.sampling(data_train, params=params_svi)

## Fold Change vs Ground Truth
Compute fold changes and compare to true parameters.

In [None]:
fold_changes, pi, truepi, colors = model.get_fold_change(data_ref, params_svi, true_params=data_ref, output_results=True)

## True vs Predicted Plot
Scatter plot of true versus predicted fold changes.

In [None]:
plt.scatter(fold_changes['true'], fold_changes['pred'], c=colors)
plt.xlabel('True fold change')
plt.ylabel('Predicted fold change')

## Observed vs Predicted Plot
Scatter plot of observed versus predicted fold changes.

In [None]:
plt.scatter(fold_changes['not pred'], fold_changes['pred'], c=colors)
plt.xlabel('Observed fold change')
plt.ylabel('Predicted fold change')

## Simulated Tumor Fractions
Visualize the tumor fractions in control wells.

In [None]:
sccdr.resultanalysis.show_fractions(data_train, res, idxdrug=0)

## Simulated Cell Counts
Show predicted number of non-malignant cells in control wells.

In [None]:
sccdr.resultanalysis.show_cells(data_train, res)

## Simulated Proportions
Plot clone proportions for simulated data.

In [None]:
sccdr.resultanalysis.show_proportions(data_train, params_svi)

## Simulated Beta Effects
Inspect beta parameters in the simulated setting.

In [None]:
sccdr.resultanalysis.show_beta(data_ref, params_svi)

## Simulated Count Scatter
Visualize observed vs predicted counts for simulated data.

In [None]:
sccdr.resultanalysis.scatter_counts(data_train, res)

## Subclone Survival Probabilities
Compute survival probabilities from subclone-level features.

In [None]:
params_svi_tensor = model.convert_to_tensor(params_svi)
pi = model.compute_survival_probas_subclone_features(data_ref, params=params_svi_tensor)
sccdr.resultanalysis.survival_probabilities(data_train, pi, model.cluster2clonelabel, idxdrug=0)

## Compute Summary Statistics
Aggregate statistics comparing predictions to ground truth.

In [None]:
params_svi_tensor['pi'] = model.compute_survival_probas_subclone_features(data_ref, params_svi)
data_ref['pi'] = model.compute_survival_probas_subclone_features(data_ref, data_ref)
model.compute_all_stats(data_ref, data_ref, params_svi_tensor)
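
`compute_all_stats` produces the package's own summary. To illustrate what such a comparison can involve, here is a sketch that scores estimated against true survival probabilities drug by drug, assuming both are arrays of shape (drugs, clones). The toy arrays and the `per_drug_scores` helper are hypothetical, and these metrics are not necessarily the ones `compute_all_stats` reports.

```python
import numpy as np


def per_drug_scores(true_pi, est_pi):
    """Hypothetical helper: mean absolute error and Pearson r across clones, per drug (row)."""
    true_pi = np.asarray(true_pi, dtype=float)
    est_pi = np.asarray(est_pi, dtype=float)
    mae = np.abs(true_pi - est_pi).mean(axis=1)
    corr = np.array([np.corrcoef(t, e)[0, 1] for t, e in zip(true_pi, est_pi)])
    return mae, corr


true_pi = np.array([[0.9, 0.5, 0.1],
                    [0.8, 0.8, 0.2]])
est_pi = np.array([[0.85, 0.55, 0.15],
                   [0.7, 0.9, 0.3]])
mae, corr = per_drug_scores(true_pi, est_pi)
print("MAE per drug:", mae.round(3))
print("Pearson r per drug:", corr.round(3))
```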