# PhyliCS: tutorial

PhyliCS is a pipeline for **multi-sample** copy-number variation (CNV) analysis on
**single-cell DNA** sequencing data. It lets you quantify **intra-tumor heterogeneity** and investigate the **temporal and spatial evolution** of tumors.

This tutorial shows how to run the main stages of the pipeline. Specifically, we will execute the code needed to reproduce the second use case *(Temporal evolution)* presented in the Supplementary materials of our paper. For the first use case *(Spatial intra-tumor heterogeneity)* we will only list the required commands: its dataset contains approximately 6000 cells, and the execution time would be impractical for a tutorial.

## Temporal evolution
Here we use PhyliCS to investigate the temporal evolution of CNVs in a cancer case. To this end, we take advantage of the results of one of the CNV analyses performed by Garvin et al. [1] to validate their tool (*Ginkgo*). Specifically, that analysis uses the single-cell data of two samples, a breast tumor and its liver metastasis (T16P/M), used by Navin et al. [2] in their study on the characterization of intra-tumor heterogeneity. Since the CNV calls are publicly available in the Ginkgo GitHub repository, we skip the CNV calling stage and move directly to the analysis.

### Single-sample analysis
To perform single-sample analysis, type:

In [1]:
!mkdir -p data/navin_out
!phylics --run --run_single --input_dirs primary:data/navin_primary metastasis:data/navin_metastasis --meta_format json --output_path data/navin_out --verbose

Successfully created the directory data/navin_out/primary_post_CNV 
Successfully created the directory data/navin_out/metastasis_post_CNV 
[single_sample_post_analysis]  Complete analysis
--------------------------------------------------------------------------------
[single_sample_post_analysis]  Computing heatmap and phylogenetic tree (method = complete, metric = euclidean)
[single_sample_post_analysis] -- cophenet coefficient: 0.9922210550775276
--------------------------------------------------------------------------------
[single_sample_post_analysis]  Plotting mean ploidy distribution
--------------------------------------------------------------------------------
[single_sample_post_analysis]  Computing mean CNV profile
[single_sample_post_analysis] -- mean ploidy = 2.884057
--------------------------------------------------------------------------------
[single_sample_post_analysis]  Computing the optimal number of clusters
[single_sample_post_analysis] -- n_clusters = 2 - Th

#### Output
- `data/navin_out/<sample_name>_post_CNV`
    - `heatmap.png`: heatmap and dendrogram computed by the phylogenetic algorithm. 
    - `mean_cnv.png`: average copy-number plot. It shows the average copy-number, computed over all cells, at each genome position.
    - `mean_ploidy_distribution.svg`: mean ploidy density distribution plot. The mean ploidy of a cell is its mean copy-number, and this plot shows how the mean ploidies are distributed across cells. It helps highlight groups of pseudo-diploid cells.
    - `silhouette_results.png`: silhouette plot for each of the tested K's.
    - `per_k_silhouette_scores.csv`: average silhouette scores for each of the tested K's.
    - `silhouette_summary.png`: dot plot of the silhouette score for the tested K's.
    - `clusters.tsv`: composition and mean copy-number of the clusters built with the Silhouette method.
    - `clusters_heatmap.png`: heatmaps of the clusters built with the Silhouette method.
    - `metadata.json`: file containing some general information about the current analysis.
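
The two averages behind `mean_cnv.png` and `mean_ploidy_distribution.svg` can be illustrated with a minimal sketch. It is not PhyliCS code: the `cnv` dictionary and its values are made up for illustration, with one row per cell and one column per genomic bin.

```python
from statistics import mean

# Toy copy-number matrix (made-up values, not PhyliCS output):
# one entry per cell, one value per genomic bin.
cnv = {
    "cell_1": [2.0, 2.0, 2.0, 2.0],
    "cell_2": [2.0, 3.0, 4.0, 3.0],
    "cell_3": [4.0, 4.0, 2.0, 2.0],
}

# Mean ploidy: the mean copy-number of each single cell (row-wise mean).
mean_ploidy = {cell: mean(profile) for cell, profile in cnv.items()}

# Mean CNV profile: the mean copy-number over all cells at each bin
# (column-wise mean).
mean_profile = [mean(column) for column in zip(*cnv.values())]

for cell, ploidy in mean_ploidy.items():
    print(f"{cell}: mean ploidy = {ploidy:.2f}")
print("mean CNV profile:", [round(value, 2) for value in mean_profile])
```

In this toy matrix, `cell_1` has a flat diploid profile, so its mean ploidy is exactly 2; cells like this are the ones that form the pseudo-diploid group in the density plot.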
    

### Cell filtering
To filter out normal cells, type:

In [2]:
!phylics --run_cell_filtering --input_dirs primary:data/navin_primary --intervals 1.5-2.3 --output_path data/navin_out --verbose

Cell filtering execution
Successfully created the directory data/navin_out/primary_filtered 
[valid_cells]  Initial cells: 52
[valid_cells]  Filtered out cells: 33
[valid_cells]  Remaining cells: 19


In [3]:
!phylics --run_cell_filtering --input_dirs metastasis:data/navin_metastasis --intervals 1.5-2.3 --output_path data/navin_out --verbose

Cell filtering execution
Successfully created the directory data/navin_out/metastasis_filtered 
[valid_cells]  Initial cells: 48
[valid_cells]  Filtered out cells: 25
[valid_cells]  Remaining cells: 23
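
The idea behind the `--intervals 1.5-2.3` filter can be sketched in a few lines of Python. This is an illustration, not the PhyliCS implementation: it assumes that a cell is discarded when its mean ploidy falls inside the interval, that the bounds are inclusive, and the helper names and ploidy values are hypothetical.

```python
def parse_interval(text):
    """Turn a string such as '1.5-2.3' into a (low, high) pair of floats."""
    low, high = text.split("-")
    return float(low), float(high)


def filter_cells(mean_ploidy, intervals):
    """Split cells into (kept, removed).

    A cell is removed when its mean ploidy lies inside any interval.
    Inclusive bounds are an assumption of this sketch.
    """
    kept, removed = [], []
    for cell, ploidy in mean_ploidy.items():
        inside = any(low <= ploidy <= high for low, high in intervals)
        (removed if inside else kept).append(cell)
    return kept, removed


# Made-up mean ploidies, for illustration only.
toy_ploidy = {"cell_a": 2.0, "cell_b": 1.9, "cell_c": 3.4, "cell_d": 2.9}
kept, removed = filter_cells(toy_ploidy, [parse_interval("1.5-2.3")])

print("Initial cells:", len(toy_ploidy))
print("Filtered out cells:", len(removed))
print("Remaining cells:", len(kept))
```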


### Multiple-sample analysis
To perform multiple-sample analysis, type:

In [4]:
!phylics --run --run_multiple --input_dirs primary:data/navin_out/primary_filtered metastasis:data/navin_out/metastasis_filtered --n_permutations 1000 --meta_format json --output_path data/navin_out --verbose

Successfully created the directory data/navin_out/primary_metastasis_postCNV 
[multi_sample_post_analysis]  CNV calls merging
--------------------------------------------------------------------------------
[multi_sample_post_analysis]  Complete analysis
--------------------------------------------------------------------------------
[multi_sample_post_analysis]  Heterogeneity score computation
[multi_sample_post_analysis] -- Permutation test (n_permutations = 1000)
[multi_sample_post_analysis] ---- iteration: 0
[multi_sample_post_analysis] ---- iteration: 100
[multi_sample_post_analysis] ---- iteration: 200
[multi_sample_post_analysis] ---- iteration: 300
[multi_sample_post_analysis] ---- iteration: 400
[multi_sample_post_analysis] ---- iteration: 500
[multi_sample_post_analysis] ---- iteration: 600
[multi_sample_post_analysis] ---- iteration: 700
[multi_sample_post_analysis] ---- iteration: 800
[multi_sample_post_analysis] ---- iteration: 900
[multi_sample_post_analysis] ---- Permuta
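
For readers unfamiliar with permutation tests, the general scheme reported in the log can be sketched as follows. This is a generic illustration under stated assumptions: the statistic used here (absolute difference of group means) is a stand-in and not the PhyliCS heterogeneity score, and the function names, seed, and sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, statistic, n_permutations=1000, seed=0):
    """Estimate how often a random relabelling of the pooled observations
    yields a statistic at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of sample labels
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def mean_gap(a, b):
    """Stand-in statistic: absolute difference of the group means."""
    return abs(mean(a) - mean(b))


# Made-up per-cell values for two samples, for illustration only.
sample_a = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
sample_b = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8]

p_value = permutation_p_value(sample_a, sample_b, mean_gap, n_permutations=1000)
print(f"p-value = {p_value:.4f}")
```

A small p-value means that the observed separation between the two samples is rarely matched when the sample labels are shuffled.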

## Spatial intra-tumor heterogeneity

### Download data
```
!mkdir -p 10x_dataset/breast_tissue_A 10x_dataset/breast_tissue_B 10x_dataset/breast_tissue_C
!wget -O 10x_dataset/breast_tissue_A/possorted_bam.bam http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-dna/1.1.0/breast_tissue_A_2k/breast_tissue_A_2k_possorted_bam.bam
!wget -O 10x_dataset/breast_tissue_A/per_cell_summary_metrics.csv http://cf.10xgenomics.com/samples/cell-dna/1.1.0/breast_tissue_A_2k/breast_tissue_A_2k_per_cell_summary_metrics.csv
!wget -O 10x_dataset/breast_tissue_B/possorted_bam.bam http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-dna/1.1.0/breast_tissue_B_2k/breast_tissue_B_2k_possorted_bam.bam
!wget -O 10x_dataset/breast_tissue_B/per_cell_summary_metrics.csv http://cf.10xgenomics.com/samples/cell-dna/1.1.0/breast_tissue_B_2k/breast_tissue_B_2k_per_cell_summary_metrics.csv
!wget -O 10x_dataset/breast_tissue_C/possorted_bam.bam http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-dna/1.1.0/breast_tissue_C_2k/breast_tissue_C_2k_possorted_bam.bam
!wget -O 10x_dataset/breast_tissue_C/per_cell_summary_metrics.csv http://cf.10xgenomics.com/samples/cell-dna/1.1.0/breast_tissue_C_2k/breast_tissue_C_2k_per_cell_summary_metrics.csv
```
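
Because the six downloads differ only in the section letter, the commands can also be generated in a loop, which keeps each URL paired with its own sample directory. The sketch below only prints the `wget` commands and does not download anything. The URL patterns are taken from the commands above; the function and constant names are hypothetical.

```python
# Base URLs copied from the download commands in this tutorial.
BAM_BASE = "http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-dna/1.1.0"
CSV_BASE = "http://cf.10xgenomics.com/samples/cell-dna/1.1.0"


def download_plan(sections=("A", "B", "C"), out_root="10x_dataset"):
    """Return (url, destination) pairs for the BAM and metrics file of each section."""
    plan = []
    for section in sections:
        local = f"breast_tissue_{section}"   # local directory name
        remote = f"{local}_2k"               # remote sample name
        plan.append((
            f"{BAM_BASE}/{remote}/{remote}_possorted_bam.bam",
            f"{out_root}/{local}/possorted_bam.bam",
        ))
        plan.append((
            f"{CSV_BASE}/{remote}/{remote}_per_cell_summary_metrics.csv",
            f"{out_root}/{local}/per_cell_summary_metrics.csv",
        ))
    return plan


plan = download_plan()
for url, destination in plan:
    print(f"wget -O {destination} {url}")
```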

## References
1. Tyler Garvin, Robert Aboukhalil, Jude Kendall, Timour Baslan, Gurinder S Atwal, James Hicks, Michael Wigler, and Michael C Schatz. Interactive analysis and assessment of single-cell copy-number variations. Nature Methods, 12(11):1058, 2015.

2. Nicholas Navin, Alexander Krasnitz, Linda Rodgers, Kerry Cook, Jennifer Meth, Jude Kendall, Michael Riggs, Yvonne Eberling, Jennifer Troge, Vladimir Grubor, et al. Inferring tumor progression from genomic heterogeneity. Genome Research, 20(1):68–80, 2010.
