# Quick start

In [1]:
import rectanglepy as rectangle
from anndata import AnnData

## Creating the input data

Rectangle requires the single-cell data in the form of a scverse [`AnnData`](https://anndata.readthedocs.io/en/latest/) object, and the bulk data as a pandas DataFrame.

To see an example of this, we can load the tutorial data provided by Rectangle.

In [2]:
sc_counts, annotations, bulks = rectangle.load_tutorial_data()

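The tutorial data set contains single-cell RNA-seq counts as a pandas DataFrame (cells as rows, genes as columns), the cell type annotations as a pandas Series, and the bulk samples as a pandas DataFrame (samples as rows, genes as columns).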

In [3]:
sc_counts.iloc[:, :5].head()

| | MIR1302-2HG | AL627309.1 | AL627309.4 | AC114498.1 | AL669831.5 |
|---|---|---|---|---|---|
| E2L4_GATGCTACAGGCACAA | 0 | 0 | 0 | 0 | 0 |
| L5_AACAACCAGGAACTAT | 0 | 0 | 0 | 0 | 0 |
| L5_TCCTTCTGTACTCCGG | 0 | 0 | 0 | 0 | 0 |
| L2_GCCCGAACACGTATAC | 0 | 0 | 0 | 0 | 0 |
| E2L2_ATGCATGTCACACCCT | 0 | 0 | 0 | 0 | 0 |


In [12]:
annotations.head()

E2L4_GATGCTACAGGCACAA    Monocytes
L5_AACAACCAGGAACTAT      Monocytes
L5_TCCTTCTGTACTCCGG      Monocytes
L2_GCCCGAACACGTATAC      Monocytes
E2L2_ATGCATGTCACACCCT    Monocytes
Name: 0, dtype: object

In [13]:
bulks.T.head()

| | pbmc_1 | pbmc_10 | pbmc_12 |
|---|---|---|---|
| UBE2Q2P2 | 0.0 | 0.081115 | 0.0 |
| SSX9 | 0.0 | 0.0 | 0.0 |
| CXorf67 | 0.118865 | 0.086782 | 0.188464 |
| EFCAB8 | 0.0 | 0.0 | 0.03157 |
| SPATA31B1P | 0.0 | 0.0 | 0.0 |


The count DataFrame and its annotations can easily be converted to an `AnnData` object.


In [4]:
sc_adata = AnnData(sc_counts, obs=annotations.to_frame(name="cell_type"))
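Before building the `AnnData` object, it can help to confirm that the count matrix and the annotations describe the same cells in the same order. The following is a minimal sketch using only pandas; the helper name `check_cell_alignment` and the toy data are hypothetical and not part of Rectangle.

```python
import pandas as pd


def check_cell_alignment(counts: pd.DataFrame, labels: pd.Series) -> pd.Series:
    """Return labels reordered to match the rows of counts.

    Hypothetical helper (not part of rectanglepy): raises if any cell in
    counts has no annotation.
    """
    missing = counts.index.difference(labels.index)
    if len(missing) > 0:
        raise ValueError(
            f"{len(missing)} cells have no annotation, e.g. {missing[0]}"
        )
    # Reorder the annotations so that row i of counts matches label i.
    return labels.loc[counts.index]


# Toy data: three cells (rows) by two genes (columns), labels in another order.
toy_counts = pd.DataFrame(
    [[0, 3], [1, 0], [2, 2]],
    index=["cell_a", "cell_b", "cell_c"],
    columns=["gene_1", "gene_2"],
)
toy_labels = pd.Series(
    ["T cells", "Monocytes", "B cells"],
    index=["cell_c", "cell_a", "cell_b"],
)

aligned = check_cell_alignment(toy_counts, toy_labels)
print(aligned)
```

With the tutorial data, the aligned Series could then be passed on as `obs=aligned.to_frame(name="cell_type")`.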

## Single step Rectangle workflow

To deconvolute the bulk data in a single step, use the `rectangle` function.

In [None]:
estimations, signature_result = rectangle.rectangle(sc_adata, bulks)

The `rectangle` function returns two objects:
1. `estimations`: a pandas DataFrame with the estimated cell type proportions for each bulk sample.
2. `signature_result`: a [`RectangleSignatureResult`](../generated/rectanglepy.pp.RectangleSignatureResult.rst) object containing additional information about the signature and the unknown content.


In [6]:
estimations

| | B cells | ILC | Monocytes | NK cells | Plasma cells | Platelet | T cells CD4 conv | T cells CD8 | Tregs | mDC | pDC | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pbmc_1 | 0.090739 | 0.011386 | 0.229843 | 0.022667 | 0.005956 | 0.004898 | 0.02833039 | 0.162392 | 0.416964 | 0.024425 | 0.002401 | 0.0 |
| pbmc_10 | 0.110518 | 0.01691 | 0.297931 | 0.016344 | 0.000659 | 0.021996 | 8.220139000000001e-18 | 0.118382 | 0.39183 | 0.02075 | 0.004678 | 0.0 |
| pbmc_12 | 0.071532 | 0.007335 | 0.225517 | 0.101075 | 0.003095 | 0.018637 | 0.0 | 0.275009 | 0.256503 | 0.010054 | 0.002985 | 0.028258 |
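A quick way to read the `estimations` table is to check that each sample's fractions add up to roughly one and to look up the most abundant cell type per sample. The sketch below copies the values printed above into a standalone DataFrame (named `estimations_demo` here purely for illustration) so that it runs without Rectangle installed; with real results you would use `estimations` directly.

```python
import pandas as pd

cell_types = [
    "B cells", "ILC", "Monocytes", "NK cells", "Plasma cells", "Platelet",
    "T cells CD4 conv", "T cells CD8", "Tregs", "mDC", "pDC", "Unknown",
]

# Values copied from the tutorial output shown above.
estimations_demo = pd.DataFrame(
    [
        [0.090739, 0.011386, 0.229843, 0.022667, 0.005956, 0.004898,
         0.02833039, 0.162392, 0.416964, 0.024425, 0.002401, 0.0],
        [0.110518, 0.01691, 0.297931, 0.016344, 0.000659, 0.021996,
         8.220139000000001e-18, 0.118382, 0.39183, 0.02075, 0.004678, 0.0],
        [0.071532, 0.007335, 0.225517, 0.101075, 0.003095, 0.018637,
         0.0, 0.275009, 0.256503, 0.010054, 0.002985, 0.028258],
    ],
    index=["pbmc_1", "pbmc_10", "pbmc_12"],
    columns=cell_types,
)

# Sum of all fractions per sample, including the "Unknown" column.
row_sums = estimations_demo.sum(axis=1)

# Most abundant known cell type per sample ("Unknown" is left out).
dominant = estimations_demo.drop(columns="Unknown").idxmax(axis=1)

print(row_sums.round(4))
print(dominant)
```

For the tutorial values, the most abundant cell type is Tregs for `pbmc_1` and `pbmc_10`, and T cells CD8 for `pbmc_12`.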


In [7]:
signature_result

<rectanglepy.pp.rectangle_signature.RectangleSignatureResult at 0x166e33280>

## 2-step Rectangle workflow

Rectangle can also be run in two steps: first creating the signature, then deconvoluting the bulk data.

### Create signature result

In [None]:
signature_result = rectangle.pp.build_rectangle_signatures(sc_adata, bulks=bulks)

This creates a [`RectangleSignatureResult`](../generated/rectanglepy.pp.RectangleSignatureResult.rst) object.

### Deconvolute bulk data

We can then use the signature result to deconvolute the bulk data. This is done with the `deconvolution` function in `rectangle.tl`, which takes the `RectangleSignatureResult` object and the bulk data.

In [None]:
estimations, _ = rectangle.tl.deconvolution(signature_result, bulks)

The first returned value is a pandas DataFrame with the estimated cell type proportions.

In [12]:
estimations

| | B cells | ILC | Monocytes | NK cells | Plasma cells | Platelet | T cells CD4 conv | T cells CD8 | Tregs | mDC | pDC | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pbmc_1 | 0.090739 | 0.011386 | 0.229843 | 0.022667 | 0.005956 | 0.004898 | 0.02833039 | 0.162392 | 0.416964 | 0.024425 | 0.002401 | 0.0 |
| pbmc_10 | 0.110518 | 0.01691 | 0.297931 | 0.016344 | 0.000659 | 0.021996 | 8.220139000000001e-18 | 0.118382 | 0.39183 | 0.02075 | 0.004678 | 0.0 |
| pbmc_12 | 0.071532 | 0.007335 | 0.225517 | 0.101075 | 0.003095 | 0.018637 | 0.0 | 0.275009 | 0.256503 | 0.010054 | 0.002985 | 0.028258 |


# Spatial transcriptomics deconvolution

To see how Rectangle can be used with spatial data, we can load a remote dataset using the `spatialdata` package.
See the [spatialdata documentation](https://spatialdata.scverse.org/en/stable/index.html) for more details.

We use the [10x Visium data generated from the human dorsolateral prefrontal cortex](https://github.com/LieberInstitute/HumanPilot/tree/master/10X/151673), which can be downloaded from:
[https://spatial-dlpfc.s3.us-east-2.amazonaws.com/h5/151673_filtered_feature_bc_matrix.h5](https://spatial-dlpfc.s3.us-east-2.amazonaws.com/h5/151673_filtered_feature_bc_matrix.h5)

In [None]:
import spatialdata
import spatialdata_io
import pandas as pd

s_data = spatialdata_io.visium(
    path='../data',
    dataset_id='151673',
    scalefactors_file='scalefactors_json.json',
    tissue_positions_file='tissue_positions.csv',
    counts_file='151673_filtered_feature_bc_matrix.h5',
)


To build the signature, we load the [M1 Allen Brain atlas](https://portal.brain-map.org/atlases-and-data/rnaseq/human-m1-10x).

In [None]:
counts = pd.read_csv('../data/human_m1/matrix.csv', index_col=0)
counts = counts.astype(int)
metadata = pd.read_csv('../data/human_m1/metadata.csv', index_col=0)
annotations = metadata['cell_type_alias_label']


We do a simple preprocessing of the annotations to group the cell types into broader categories.

In [None]:
# keep only the first two space-separated words of each label
annotations = annotations.str.split(' ', n=2).str[:2].str.join(' ')
# remove everything from the first hyphen onwards
annotations = annotations.str.split('-', n=1).str[0]
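To make the effect of these two string operations concrete, here is a small sketch applied to a handful of labels. The labels are made up for illustration (including the placeholder gene names) and are not taken from the atlas metadata.

```python
import pandas as pd

# Hypothetical labels, only loosely shaped like fine-grained cell type names.
labels = pd.Series(
    [
        "Exc L2-3 GENE_A GENE_B",
        "Inh L1-2 GENE_C",
        "Astro L1-6 GENE_D GENE_E",
        "Micro",
    ]
)

# Step 1: keep only the first two space-separated words.
two_words = labels.str.split(" ", n=2).str[:2].str.join(" ")

# Step 2: drop everything from the first hyphen onwards.
grouped = two_words.str.split("-", n=1).str[0]

for before, after in zip(labels, grouped):
    print(f"{before!r} -> {after!r}")
```

In this toy example, `"Exc L2-3 GENE_A GENE_B"` becomes `"Exc L2"`, while a single-word label such as `"Micro"` passes through unchanged.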

We can run Rectangle on the spatial data with the `rectangle` function, using the `table` element of the `SpatialData` object as the bulk input so that each spot is treated as one bulk sample.

In [None]:
adata = AnnData(counts, obs=annotations.to_frame(name='cell_type'))

data_table = s_data['table']
bulks = data_table.to_df()
# Convert bulks from counts to CPM
bulks_cpm = bulks.div(bulks.sum(axis=1), axis=0) * 1e6
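The CPM (counts per million) conversion rescales every row so that its values sum to one million. The sketch below shows the same computation on a toy DataFrame. The function name `counts_to_cpm` and the handling of all-zero rows are additions for illustration: the one-line version above would produce NaN for a spot with zero total counts.

```python
import pandas as pd


def counts_to_cpm(counts: pd.DataFrame) -> pd.DataFrame:
    """Scale each row (sample or spot) so that its counts sum to one million."""
    totals = counts.sum(axis=1)
    # Assumption for this sketch: rows with zero total counts stay all-zero
    # instead of becoming NaN through a division by zero.
    safe_totals = totals.replace(0, 1)
    return counts.div(safe_totals, axis=0) * 1e6


# Toy data: three spots (rows) by three genes (columns).
toy = pd.DataFrame(
    [[10, 30, 60], [0, 5, 5], [0, 0, 0]],
    index=["spot_1", "spot_2", "spot_3"],
    columns=["gene_1", "gene_2", "gene_3"],
)

toy_cpm = counts_to_cpm(toy)
print(toy_cpm)
print(toy_cpm.sum(axis=1))
```

Every non-empty row of `toy_cpm` sums to one million, which makes spots with different sequencing depths comparable.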


In [None]:
estimations, signature_result = rectangle.rectangle(adata, bulks_cpm)


We can annotate the spatial data with the estimations by creating a new `AnnData` object from the estimations and adding it to the `SpatialData` object.

In [None]:
from spatialdata.models import TableModel

table_data = AnnData(estimations)
adata_for_sdata = TableModel.parse(table_data)

adata_for_sdata.uns["spatialdata_attrs"] = {
    "region": "spots",
    "region_key": "region",
    "instance_key": "spot_id",
}

adata_for_sdata.obs["region"] = pd.Categorical(["spots"] * len(adata_for_sdata))
adata_for_sdata.obs["spot_id"] = s_data.tables.data['table'].obs["spot_id"]

s_data.tables["rectangle_results"] = adata_for_sdata