# Example workflow

In [1]:
import rectanglepy as rectangle
from anndata import AnnData

## Creating the input data

Rectangle requires the single-cell data in the form of a scverse [`AnnData`](https://anndata.readthedocs.io/en/latest/) object, and the bulk data as a pandas DataFrame.

To see an example of this, we can load the tutorial data provided by Rectangle.

In [2]:
sc_counts, annotations, bulks = rectangle.load_tutorial_data()

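The tutorial data set contains single-cell RNA-seq counts as a pandas DataFrame (cells as rows, genes as columns), the cell type annotations as a pandas Series indexed by cell, and the bulk RNA-seq data as a pandas DataFrame with samples as rows and genes as columns (shown transposed below).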

In [3]:
sc_counts.iloc[:, :5].head()

| | MIR1302-2HG | AL627309.1 | AL627309.4 | AC114498.1 | AL669831.5 |
|---|---|---|---|---|---|
| E2L4_GATGCTACAGGCACAA | 0 | 0 | 0 | 0 | 0 |
| L5_AACAACCAGGAACTAT | 0 | 0 | 0 | 0 | 0 |
| L5_TCCTTCTGTACTCCGG | 0 | 0 | 0 | 0 | 0 |
| L2_GCCCGAACACGTATAC | 0 | 0 | 0 | 0 | 0 |
| E2L2_ATGCATGTCACACCCT | 0 | 0 | 0 | 0 | 0 |


In [12]:
annotations.head()

E2L4_GATGCTACAGGCACAA    Monocytes
L5_AACAACCAGGAACTAT      Monocytes
L5_TCCTTCTGTACTCCGG      Monocytes
L2_GCCCGAACACGTATAC      Monocytes
E2L2_ATGCATGTCACACCCT    Monocytes
Name: 0, dtype: object

In [13]:
bulks.T.head()

| | pbmc_1 | pbmc_10 | pbmc_12 |
|---|---|---|---|
| UBE2Q2P2 | 0.0 | 0.081115 | 0.0 |
| SSX9 | 0.0 | 0.0 | 0.0 |
| CXorf67 | 0.118865 | 0.086782 | 0.188464 |
| EFCAB8 | 0.0 | 0.0 | 0.03157 |
| SPATA31B1P | 0.0 | 0.0 | 0.0 |


The count DataFrame and its annotations can easily be converted to an AnnData object.


In [4]:
sc_adata = AnnData(sc_counts, obs=annotations.to_frame(name="cell_type"))
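
AnnData pairs each row of the count matrix with a row of `obs`, so the annotations should cover every cell in `sc_counts`, ideally in the same order, and the bulk data should share genes with the single-cell data. The following is a minimal pandas sketch of such a check, assuming the tutorial layout (cells and bulk samples as rows, genes as columns). The helper `check_inputs` and the toy inputs are hypothetical and not part of Rectangle, which may handle gene overlap on its own; the check only makes a mismatch visible early.

```python
import pandas as pd


def check_inputs(sc_counts: pd.DataFrame, annotations: pd.Series, bulks: pd.DataFrame) -> pd.Index:
    """Hypothetical helper: validate tutorial-style inputs and return the shared genes.

    Assumes cells x genes for ``sc_counts``, a Series of labels indexed by cell,
    and bulk samples x genes for ``bulks``.
    """
    # Every cell in the count matrix needs an annotation.
    missing = sc_counts.index.difference(annotations.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} cells have no annotation")
    # Duplicate barcodes would make the cell-to-label mapping ambiguous.
    if annotations.index.has_duplicates:
        raise ValueError("annotations contain duplicate cell barcodes")
    # Genes present in both the single-cell and the bulk data.
    shared = sc_counts.columns.intersection(bulks.columns)
    if len(shared) == 0:
        raise ValueError("no genes shared between single-cell and bulk data")
    return shared


# Toy inputs with made-up values, laid out like the tutorial data.
toy_counts = pd.DataFrame(
    [[3, 0, 1], [0, 5, 2]],
    index=["cell_a", "cell_b"],
    columns=["GENE1", "GENE2", "GENE3"],
)
toy_annotations = pd.Series(["Monocytes", "T cells CD8"], index=["cell_b", "cell_a"])
toy_bulks = pd.DataFrame([[1.0, 2.0]], index=["bulk_1"], columns=["GENE3", "GENE4"])

shared_genes = check_inputs(toy_counts, toy_annotations, toy_bulks)
# Put the annotations in count-matrix order before using them as `obs`.
toy_obs = toy_annotations.reindex(toy_counts.index).to_frame(name="cell_type")
```

On the tutorial data, the same call would be `check_inputs(sc_counts, annotations, bulks)`.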

# Single step Rectangle workflow

To deconvolute the bulk data in a single step, use the `rectangle.rectangle` function. It returns a tuple of the estimated cell type proportions and the signature result.

In [None]:
estimations, signature_result = rectangle.rectangle(sc_adata, bulks)

The `rectangle` function returns two objects:
1. `estimations`: a pandas DataFrame with the estimated cell type proportions for each bulk sample.
2. `signature_result`: a [`RectangleSignatureResult`](../generated/rectanglepy.pp.RectangleSignatureResult.rst) object containing additional information about the signature and the unknown content.


In [6]:
estimations

| | B cells | ILC | Monocytes | NK cells | Plasma cells | Platelet | T cells CD4 conv | T cells CD8 | Tregs | mDC | pDC | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pbmc_1 | 0.090739 | 0.011386 | 0.229843 | 0.022667 | 0.005956 | 0.004898 | 0.02833039 | 0.162392 | 0.416964 | 0.024425 | 0.002401 | 0.0 |
| pbmc_10 | 0.110518 | 0.01691 | 0.297931 | 0.016344 | 0.000659 | 0.021996 | 8.220139e-18 | 0.118382 | 0.39183 | 0.02075 | 0.004678 | 0.0 |
| pbmc_12 | 0.071532 | 0.007335 | 0.225517 | 0.101075 | 0.003095 | 0.018637 | 0.0 | 0.275009 | 0.256503 | 0.010054 | 0.002985 | 0.028258 |
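
The estimations are an ordinary pandas DataFrame with one row per bulk sample and one column per cell type plus `Unknown`; in the output above, each row sums to 1 up to rounding. The sketch below shows typical post-processing with plain pandas, using the values printed above. Grouping `T cells CD4 conv`, `T cells CD8` and `Tregs` into one T-cell total is an illustrative choice, not something Rectangle does.

```python
import pandas as pd

# Estimated proportions copied from the output above (rows: bulk samples).
estimations = pd.DataFrame(
    {
        "B cells": [0.090739, 0.110518, 0.071532],
        "ILC": [0.011386, 0.01691, 0.007335],
        "Monocytes": [0.229843, 0.297931, 0.225517],
        "NK cells": [0.022667, 0.016344, 0.101075],
        "Plasma cells": [0.005956, 0.000659, 0.003095],
        "Platelet": [0.004898, 0.021996, 0.018637],
        "T cells CD4 conv": [0.02833039, 8.220139e-18, 0.0],
        "T cells CD8": [0.162392, 0.118382, 0.275009],
        "Tregs": [0.416964, 0.39183, 0.256503],
        "mDC": [0.024425, 0.02075, 0.010054],
        "pDC": [0.002401, 0.004678, 0.002985],
        "Unknown": [0.0, 0.0, 0.028258],
    },
    index=["pbmc_1", "pbmc_10", "pbmc_12"],
)

# Proportions of each sample, including Unknown, add up to about 1.
row_sums = estimations.sum(axis=1)
# Most abundant cell type per sample, ignoring the Unknown column.
dominant = estimations.drop(columns="Unknown").idxmax(axis=1)
# Illustrative grouping of the three T-cell populations into one total.
t_cells = estimations[["T cells CD4 conv", "T cells CD8", "Tregs"]].sum(axis=1)
```

On a real run, the last three statements apply unchanged to the `estimations` returned by `rectangle.rectangle`.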


In [7]:
signature_result

<rectanglepy.pp.rectangle_signature.RectangleSignatureResult at 0x166e33280>

# 2-step Rectangle workflow

Rectangle can also be run in two steps, first creating the signature and then deconvoluting the bulk data.

## Create Signature result

In [None]:
signature_result = rectangle.pp.build_rectangle_signatures(sc_adata, bulks=bulks)

This creates a [`RectangleSignatureResult`](../generated/rectanglepy.pp.RectangleSignatureResult.rst) object.

## Deconvolute bulk data

We can then use the signature result to deconvolute the bulk data with the `rectangle.tl.deconvolution` function, which takes the `RectangleSignatureResult` object and the bulk data.

In [None]:
estimations, _ = rectangle.tl.deconvolution(signature_result, bulks)

The first element of the returned tuple is a pandas DataFrame with the estimated cell type proportions for each bulk sample.

In [12]:
estimations

| | B cells | ILC | Monocytes | NK cells | Plasma cells | Platelet | T cells CD4 conv | T cells CD8 | Tregs | mDC | pDC | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pbmc_1 | 0.090739 | 0.011386 | 0.229843 | 0.022667 | 0.005956 | 0.004898 | 0.02833039 | 0.162392 | 0.416964 | 0.024425 | 0.002401 | 0.0 |
| pbmc_10 | 0.110518 | 0.01691 | 0.297931 | 0.016344 | 0.000659 | 0.021996 | 8.220139e-18 | 0.118382 | 0.39183 | 0.02075 | 0.004678 | 0.0 |
| pbmc_12 | 0.071532 | 0.007335 | 0.225517 | 0.101075 | 0.003095 | 0.018637 | 0.0 | 0.275009 | 0.256503 | 0.010054 | 0.002985 | 0.028258 |
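
This table matches the single-step result. To check agreement between two runs programmatically rather than by eye, a small helper can compare the tables after aligning their labels. `max_abs_difference`, `single_step` and `two_step` are hypothetical names used for illustration, and the values are toy numbers.

```python
import numpy as np
import pandas as pd


def max_abs_difference(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Largest absolute difference between two estimation tables.

    The tables are aligned on sample and cell-type labels first, so a
    different row or column order does not count as a difference.
    """
    if set(a.index) != set(b.index) or set(a.columns) != set(b.columns):
        raise ValueError("tables have different samples or cell types")
    b_aligned = b.loc[a.index, a.columns]
    return float(np.abs(a.to_numpy() - b_aligned.to_numpy()).max())


# Toy tables standing in for the single-step and two-step results;
# same values, different row and column order.
single_step = pd.DataFrame({"Monocytes": [0.3, 0.2], "Unknown": [0.0, 0.1]}, index=["s1", "s2"])
two_step = pd.DataFrame({"Unknown": [0.1, 0.0], "Monocytes": [0.2, 0.3]}, index=["s2", "s1"])

diff = max_abs_difference(single_step, two_step)
```

`pandas.testing.assert_frame_equal(a, b, check_like=True)` is a stricter alternative that also ignores row and column order but raises on any mismatch beyond its tolerance.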


# Analysis of unknown content estimations

Rectangle not only estimates the unknown content, it also helps identify the genes associated with it. Specifically, it computes the correlation between the unknown content and (1) gene expression levels and (2) the gene-wise expression error, computed as the difference between the true bulk expression and the expression in the reconstructed bulk.

These correlations are stored in `signature_result.unkn_gene_corr`, a DataFrame with one row per gene and two columns, each computed across the bulk samples:
1. `corr_expr`: the correlation between the unknown content and the gene's expression in the bulk samples.
2. `corr_err`: the correlation between the unknown content and the gene's error (`bulk - bulk_est`) from the bulk reconstruction step used to estimate the unknown content.
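
The bulk reconstruction behind `corr_err` can be illustrated with a toy example: the reconstructed bulk is the signature (genes × cell types) weighted by the estimated proportions, and the error is the observed bulk minus that reconstruction. All numbers below are made up, and the exact computation inside Rectangle may include steps not shown here; this is only the basic idea.

```python
import pandas as pd

# Toy signature: expression of each gene in each cell type (made-up values).
signature = pd.DataFrame(
    {"Monocytes": [10.0, 0.0, 4.0], "T cells CD8": [0.0, 8.0, 4.0]},
    index=["GENE1", "GENE2", "GENE3"],
)
# Toy proportions of the known cell types; the remaining 0.25 is unexplained.
fractions = pd.Series({"Monocytes": 0.5, "T cells CD8": 0.25})
# Toy observed bulk expression for the same genes.
bulk = pd.Series({"GENE1": 5.0, "GENE2": 2.0, "GENE3": 6.0})

# Reconstructed bulk: proportion-weighted sum of the cell-type profiles.
bulk_est = signature @ fractions
# Gene-wise error; a large positive value marks expression the known cell types do not explain.
error = bulk - bulk_est
```

Here `error` is 0 for GENE1 and GENE2 and 3.0 for GENE3. Across many bulk samples, a gene whose error grows together with the unknown content gets a high `corr_err`.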

In [8]:
unkn_gene_corr = signature_result.unkn_gene_corr
unkn_gene_corr.head()

| | corr_expr | corr_err |
|---|---|---|
| A1BG-AS1 | -0.470345 | -0.433278 |
| A2M | 0.20356 | -0.797782 |
| AAAS | -0.95778 | -0.980177 |
| AAED1 | 0.99992 | 0.999998 |
| ABAT | -0.727324 | -0.144902 |
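
Because `unkn_gene_corr` is a regular DataFrame, candidate genes for the unknown content can be shortlisted with ordinary pandas filtering. The sketch below uses the five rows printed above; the cutoff of 0.8 is an arbitrary illustrative value, not a Rectangle default. With only three bulk samples in the tutorial data these correlations are very noisy, so the example only illustrates the mechanics.

```python
import pandas as pd

# Correlations copied from the output above (genes as rows).
unkn_gene_corr = pd.DataFrame(
    {
        "corr_expr": [-0.470345, 0.20356, -0.95778, 0.99992, -0.727324],
        "corr_err": [-0.433278, -0.797782, -0.980177, 0.999998, -0.144902],
    },
    index=["A1BG-AS1", "A2M", "AAAS", "AAED1", "ABAT"],
)

# Hypothetical cutoff: keep genes whose expression and reconstruction error
# both increase with the unknown content.
threshold = 0.8
candidates = unkn_gene_corr[
    (unkn_gene_corr["corr_expr"] > threshold) & (unkn_gene_corr["corr_err"] > threshold)
].sort_values("corr_err", ascending=False)
```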


Rectangle also stores the gene-wise error `bulk - bulk_est` in `signature_result.unkn_bulk_err`: the difference between the observed bulk expression and the expression in the reconstructed bulk used to estimate the unknown content, with one row per bulk sample and one column per gene:

In [21]:
unkn_bulk_err = signature_result.unkn_bulk_err
unkn_bulk_err.iloc[:, :5].head()

| | A1BG-AS1 | A2M | AAAS | AAED1 | ABAT |
|---|---|---|---|---|---|
| pbmc_1 | 1.757772 | -0.551459 | 10.402414 | -8.103443 | -8.706732 |
| pbmc_10 | 0.7 | -0.729346 | 9.421015 | -8.095237 | -11.24954 |
| pbmc_12 | 0.788494 | -0.844238 | 5.706905 | -4.165561 | -10.300633 |
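
Since `corr_err` is a per-gene correlation across bulk samples, it can be cross-checked from `unkn_bulk_err` and the `Unknown` column of the estimations. A minimal sketch with pandas `DataFrame.corrwith`, assuming Pearson correlation; with the rounded values printed above it reproduces the `corr_err` column of `unkn_gene_corr` to about six decimals (for example -0.433278 for A1BG-AS1). This is a consistency check, not Rectangle's own code.

```python
import pandas as pd

samples = ["pbmc_1", "pbmc_10", "pbmc_12"]

# Bulk reconstruction errors copied from the output above (samples x genes).
unkn_bulk_err = pd.DataFrame(
    {
        "A1BG-AS1": [1.757772, 0.7, 0.788494],
        "A2M": [-0.551459, -0.729346, -0.844238],
        "AAAS": [10.402414, 9.421015, 5.706905],
        "AAED1": [-8.103443, -8.095237, -4.165561],
        "ABAT": [-8.706732, -11.24954, -10.300633],
    },
    index=samples,
)
# Estimated unknown content per sample (the "Unknown" column of `estimations`).
unknown = pd.Series([0.0, 0.0, 0.028258], index=samples)

# Pearson correlation of each gene's error with the unknown content, across samples.
corr_err = unkn_bulk_err.corrwith(unknown)
```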


# Spatial data

To see how Rectangle can be used with spatial data, we load a Visium dataset with the `spatialdata` package.
See the [spatialdata documentation](https://spatialdata.scverse.org/en/stable/index.html) for more details.
The dataset we will use is the [Visium](https://spatialdata.scverse.org/en/stable/tutorials/notebooks/datasets/README.html#id19) example dataset from the spatialdata documentation, downloaded beforehand and read here from a local Zarr store.

In [None]:
import spatialdata
from anndata import read_h5ad

# Load the Visium dataset from a local Zarr store
data_path = "../visium.zarr"
spatial_data = spatialdata.read_zarr(data_path)

# Load the WU single-cell dataset, used as the reference for building the signature
wu_data = read_h5ad("../wu.h5ad")


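To run Rectangle on the spatial data, take the `table` element of the `SpatialData` object, convert it to a pandas DataFrame with `to_df()` (one row per spot, one column per gene), and pass it to `rectangle.rectangle` as the bulk data, with the WU single-cell data as the reference.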

In [None]:
spatial_bulks = spatial_data["table"].to_df()
estimations, signature_result = rectangle.rectangle(wu_data, spatial_bulks)