# BESCAPE - tutorial on deconvolution of bulk RNA using single-cell annotations

BESCAPE (BESCA Proportion Estimator) is a deconvolution module. It uses single-cell annotations from the BESCA workflow to build a Gene Expression Profile (GEP). This GEP serves as the basis vector for deconvoluting bulk RNA samples, i.e. predicting the cell type proportions within each sample.
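The notion of a GEP as a basis vector can be made concrete. A bulk profile is modelled as the GEP (genes × cell types) multiplied by a vector of cell type proportions, and deconvolution solves for that vector. The sketch below is for illustration only. It uses plain non-negative least squares via projected gradient descent, not BESCAPE's nu-SVR method, and `deconvolve` is a hypothetical helper.

```python
def deconvolve(gep, bulk, n_iter=2000):
    """Toy NNLS deconvolution (not BESCAPE's nu-SVR): estimate w >= 0 with gep @ w ~ bulk."""
    n_genes, n_types = len(gep), len(gep[0])
    # Step size 1/||G||_F^2 keeps gradient descent stable, because the squared
    # Frobenius norm upper-bounds the largest eigenvalue of G^T G.
    step = 1.0 / sum(v * v for row in gep for v in row)
    w = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # residual r = G w - b for every gene
        resid = [sum(gep[g][k] * w[k] for k in range(n_types)) - bulk[g] for g in range(n_genes)]
        # gradient of 0.5 * ||G w - b||^2 is G^T r
        grad = [sum(gep[g][k] * resid[g] for g in range(n_genes)) for k in range(n_types)]
        # gradient step, then project back onto non-negative values
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]
    total = sum(w)
    # rescale so the estimates read as proportions summing to 1
    return [x / total for x in w] if total > 0 else w


# hypothetical GEP: 3 genes x 2 cell types
gep = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk = [7.0, 3.0, 5.0]  # built as 70 % of type A plus 30 % of type B
print(deconvolve(gep, bulk))  # approximately [0.7, 0.3]
```

Real methods add gene selection, noise handling and different loss functions. The shared idea is the linear mixing model.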

BESCAPE lets the user supply their own GEP and choose any of the supported deconvolution methods. It thereby decouples the deconvolution algorithm from its underlying GEP (basis vector).

This tutorial presents the workflow for deconvolution, as well as the link to BESCA single-cell annotations.

We assume that Docker or Singularity is already installed.
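To confirm that assumption before going further, check which container runtime executables are on the `PATH`. The helper name `available_runtimes` is hypothetical. It only looks for the `docker` and `singularity` binaries and does not check that the services are running.

```python
import shutil


def available_runtimes(candidates=("docker", "singularity")):
    """Return the container runtimes whose executables are found on PATH."""
    return [name for name in candidates if shutil.which(name) is not None]


if __name__ == "__main__":
    found = available_runtimes()
    print("Found:", found or "none, install Docker or Singularity first")
```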

# Initialising the predictor object

Initialise the deconvolution predictor object. It requires either a Docker or a Singularity image to run; both options are shown below.

## 1. Docker
To initialise the Bescape deconvolution object with Docker, set `service='docker'` and `docker_image='bedapub/bescape:version'`. Bescape first looks for a matching local Docker image and, if none is available, pulls the bescape image from DockerHub. This also means you can build a customised Docker image locally from the BESCAPE source and use it in the Bescape object.

All bescape docker images are hosted on DockerHub here: https://hub.docker.com/r/bedapub/bescape/tags
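The "look locally first, otherwise pull" behaviour described above can be reproduced with the plain Docker CLI. For example, you might pre-fetch an image on a machine with a slow connection. The sketch below is not Bescape's internal code. `ensure_docker_image` is a hypothetical helper, and the `run` parameter exists only so the logic can be tested without Docker.

```python
import subprocess


def ensure_docker_image(image, run=subprocess.run):
    """Return True if `image` is available locally, pulling it from the registry if needed."""
    # `docker image inspect` exits with a non-zero status when the image is not present locally
    local = run(["docker", "image", "inspect", image], capture_output=True)
    if local.returncode == 0:
        return True
    # fall back to pulling the image (e.g. 'bedapub/bescape:0.1') from DockerHub
    pulled = run(["docker", "pull", image], capture_output=True)
    return pulled.returncode == 0
```

For example, `ensure_docker_image('bedapub/bescape:0.1')` could be called before creating `Bescape(service='docker', ...)`, although the object performs the same lookup itself.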


In [None]:
import os
from bescape import Bescape

# docker
# may take some time if the Docker image has to be pulled for the first time
deconv = Bescape(service='docker', docker_image='bedapub/bescape:0.1')

## 2. Singularity
When using Singularity, specify the absolute path to the Singularity container file. If no path is given, Bescape will attempt to pull the latest Docker image from DockerHub and build a new Singularity container file from it.
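Paths such as `'~/singularity_images/bescape_singularity.sif'` are not expanded automatically in Python. A small resolver can turn them into absolute paths before they reach `path_singularity`. `resolve_sif_path` is a hypothetical helper. Raising an error for a missing file, rather than passing `None`, is a design choice so that a typo does not silently trigger a fresh image build.

```python
import os


def resolve_sif_path(path):
    """Return an absolute path to an existing .sif file, or None to let Bescape build one."""
    if path is None:
        return None  # None asks Bescape to pull the Docker image and build a new container file
    abs_path = os.path.abspath(os.path.expanduser(path))  # expand '~' and make the path absolute
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"Singularity image not found: {abs_path}")
    return abs_path
```

Usage would be along the lines of `Bescape(service='singularity', path_singularity=resolve_sif_path('~/singularity_images/bescape_singularity.sif'))`.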

In [None]:
import os
from bescape import Bescape

# singularity
deconv = Bescape(service='singularity', path_singularity=None)

# Performing Deconvolution
Once the Bescape object has been initialised, the methods are the same for both `docker` and `singularity`. The module distinguishes between two types of basis vectors as input:

## 1. Gene Expression Profile (GEP) 
- generated from single-cell annotations using the __BESCA.export__ functions
- currently supported packages: 
    1. bescape - in-house method based on nu-SVR (CIBERSORT)
    2. EPIC
- implemented in the __Bescape.deconvolute_gep( )__ method
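At its core, a GEP built from single-cell annotations summarises the expression of each gene within each annotated cell type. The sketch below shows only that averaging step on hypothetical toy data. It is not the __BESCA.export__ implementation, which may additionally select marker genes and writes the files BESCAPE expects. `build_gep` is a hypothetical helper.

```python
from collections import defaultdict


def build_gep(expression, labels, genes):
    """Average single-cell expression per annotated cell type.

    expression: per-cell expression vectors (cells x genes)
    labels: cell type annotation of each cell
    genes: gene names, in the same order as the expression vectors
    Returns {gene: {cell_type: mean expression}}.
    """
    sums = defaultdict(lambda: [0.0] * len(genes))
    counts = defaultdict(int)
    for vec, cell_type in zip(expression, labels):
        counts[cell_type] += 1
        acc = sums[cell_type]
        for i, value in enumerate(vec):
            acc[i] += value
    cell_types = sorted(counts)
    return {gene: {ct: sums[ct][i] / counts[ct] for ct in cell_types} for i, gene in enumerate(genes)}


# three hypothetical cells, two genes
expression = [[2, 0], [4, 0], [0, 6]]
labels = ["B", "B", "T"]
print(build_gep(expression, labels, ["CD19", "CD3E"]))
# {'CD19': {'B': 3.0, 'T': 0.0}, 'CD3E': {'B': 0.0, 'T': 6.0}}
```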

### 1.1. method = Bescape

In [None]:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()  # assumes the notebook runs from the docs/ folder of the cloned bescape GitHub repository
annot = wd + '/datasets/bescape/gep'
inpt = wd + '/datasets/bescape/input'
output = wd + '/datasets/bescape/output'

print(output)
# deconvolute using the in-house bescape method with a BESCA-generated GEP
deconv.deconvolute_gep(dir_annot=annot,
                       dir_input=inpt,
                       dir_output=output,
                       method='bescape')
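Each example in this tutorial builds absolute `gep`, `input` and `output` directories by string concatenation. As a sketch, a small helper can build the same layout with `os.path.join` and create the output folder in case it does not exist yet. The helper `dataset_dirs` and the decision to create the folder are assumptions, not part of Bescape.

```python
import os


def dataset_dirs(method, root=None):
    """Absolute gep/input/output paths under <root>/datasets/<method> (hypothetical helper)."""
    root = os.path.abspath(root or os.getcwd())
    base = os.path.join(root, "datasets", method)
    dirs = {name: os.path.join(base, name) for name in ("gep", "input", "output")}
    os.makedirs(dirs["output"], exist_ok=True)  # make sure results have somewhere to go
    return dirs
```

With it, the call above could read `d = dataset_dirs('bescape')` followed by `deconv.deconvolute_gep(dir_annot=d['gep'], dir_input=d['input'], dir_output=d['output'], method='bescape')`.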

### 1.2. method = EPIC

As bulk input, EPIC takes an ExpressionSet whose `@assayData` slot holds the gene expression values of each bulk sample. When using the prebuilt reference profiles, these values should be given in TPM, RPKM or FPKM.
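If your bulk data are raw read counts, one sample can be converted to TPM as follows. First divide each count by its gene length in kilobases, then rescale so the sample sums to one million. This is a minimal sketch that assumes you have one length per gene in base pairs. Pipelines that use effective transcript lengths will give slightly different values. `counts_to_tpm` is a hypothetical helper.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts of one sample to TPM, given gene lengths in base pairs."""
    # reads per kilobase: normalise each gene's count by its length
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk) / 1e6
    if scale == 0:
        raise ValueError("sample has no counts")
    # rescale so that the values of the sample sum to one million
    return [r / scale for r in rpk]


print(counts_to_tpm([100, 200, 300], [1000, 2000, 3000]))
```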

If we set `dir_annot='epic'`, EPIC uses its prebuilt reference profile, which can predict: __B cells, CAFs, CD4+ T cells, CD8+ T cells, NK cells, and Macrophages__.

In [None]:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/epic/gep'  # custom GEP directory; not used when dir_annot='epic'
inpt = wd + '/datasets/epic/input'
output = wd + '/datasets/epic/output'

# deconvolute using EPIC with its prebuilt reference profile
deconv.deconvolute_gep(dir_annot='epic',
                       dir_input=inpt,
                       dir_output=output,
                       method='epic')

## 2. Single-cell annotation AnnData object 
- should contain single-cell annotations of multiple samples from which the deconvolution method generates its own GEP
- currently supported packages:
    1. MuSiC
    2. SCDC
- implemented in the __Bescape.deconvolute_sc( )__ method
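Because the annotations for these methods should span multiple samples, it can help to check how many samples each cell type was observed in before running them. In the sketch below, each cell's metadata is a hypothetical dict standing in for one row of the annotation table, such as the `@phenoData` of an ExpressionSet. The column names `cluster` and `sample` follow the SCDC example in this tutorial. `samples_per_celltype` is a hypothetical helper.

```python
def samples_per_celltype(annotations, celltype_var="cluster", sample_var="sample"):
    """Map each annotated cell type to the number of distinct samples it occurs in."""
    samples = {}
    for row in annotations:
        samples.setdefault(row[celltype_var], set()).add(row[sample_var])
    return {ct: len(s) for ct, s in sorted(samples.items())}


cells = [
    {"cluster": "alpha", "sample": "donor1"},
    {"cluster": "alpha", "sample": "donor2"},
    {"cluster": "beta", "sample": "donor1"},
]
print(samples_per_celltype(cells))  # {'alpha': 2, 'beta': 1}
```

A cell type seen in only one sample, like `beta` here, gives the method no information about between-sample variation.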

### 2.1. MuSiC

In [None]:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/music/gep'
inpt = wd + '/datasets/music/input'
output = wd + '/datasets/music/output'

# deconvolute using MuSiC - sc based basis vector
deconv.deconvolute_sc(dir_annot=annot,
                      dir_input=inpt,
                      dir_output=output,
                      method='music')

### 2.2. SCDC

Using SCDC requires additional parameters:
* `celltype_var` - name of the variable in the @phenoData of the ExpressionSet that holds the cell type annotation
* `celltype_sel` - cell types of interest to estimate
* `sample_var` - name of the variable in @phenoData that identifies the sample
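A typo in `celltype_sel` is easy to make, so it may be worth checking that every requested cell type actually appears in the `celltype_var` column before launching a container. As in the previous sketch, each cell is a hypothetical dict standing in for one row of the annotation table. `missing_celltypes` is a hypothetical helper.

```python
def missing_celltypes(annotations, celltype_var, celltype_sel):
    """Return the requested cell types that never appear in the `celltype_var` column."""
    present = {row[celltype_var] for row in annotations}
    return [ct for ct in celltype_sel if ct not in present]


cells = [{"cluster": "alpha"}, {"cluster": "beta"}, {"cluster": "ductal"}]
print(missing_celltypes(cells, "cluster", ["alpha", "beta", "delta"]))  # ['delta']
```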

In [None]:
from bescape import Bescape
import os
# singularity
# Bescape expects an absolute path, so expand '~' explicitly
deconv = Bescape(service='singularity',
                 path_singularity=os.path.expanduser('~/singularity_images/bescape_singularity.sif'))
wd = os.getcwd()
print(wd)
dir_annot = wd + '/datasets/music/gep/'
dir_input = wd + '/datasets/music/input'
dir_output = wd + '/datasets/music/output'

deconv.deconvolute_sc(dir_annot=dir_annot, 
                      dir_input=dir_input,
                      dir_output=dir_output, 
                      method='music', 
                      celltype_var='cluster', 
                      celltype_sel=["alpha","beta","delta","gamma","acinar","ductal"], 
                      sample_var='sample')