# Conversion between R and Python objects
This notebook covers two general methods for converting between R SingleCellExperiment objects and Python AnnData objects. Information about the AnnData format can be found [here](https://anndata.readthedocs.io/en/stable/).

## 0. Load libraries
This notebook uses rpy2 to run both R and Python.

In [1]:
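import anndata2ri
from rpy2.robjects import r

# Enable automatic conversion between SingleCellExperiment and AnnData
anndata2ri.activate()
# Load the rpy2 IPython extension, which provides the %%R cell magic
%load_ext rpy2.ipython

# Only show messages from R at ERROR level or above
import rpy2.rinterface_lib.callbacks
import logging
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR)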

## 1. Using Loom files
SCE and AnnData objects can be exported as Loom files, which can then be read into both R and Python. The Loom file format is based on HDF5. Loom file format specs can be found [here](https://linnarssonlab.org/loompy/format/index.html). The Loom file format preserves the original count matrix as well as all annotations. 

### 1.1. Exporting Loom file from R
This uses the LoomExperiment package in R.

In [2]:
%%R
library(scRNAseq)
library(LoomExperiment)
library(AnnotationHub)
library(scater)

# Load in data, preprocess
sce.416b <- LunSpikeInData(which="416b")
sce.416b$block <- factor(sce.416b$block)

# Rename rows with symbols
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
rowData(sce.416b)$ENSEMBL <- rownames(sce.416b)
rowData(sce.416b)$SYMBOL <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
    keytype="GENEID", column="SYMBOL")
rowData(sce.416b)$SEQNAME <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
    keytype="GENEID", column="SEQNAME")
rownames(sce.416b) <- uniquifyFeatureNames(rowData(sce.416b)$ENSEMBL, 
                                           rowData(sce.416b)$SYMBOL)

# Convert and export to Loom file
scle.416b <- LoomExperiment(sce.416b)
export(scle.416b, 'scle416b.loom')

### 1.2. Importing Loom file in Python
This requires the scanpy package:

In [3]:
import scanpy as sc

adata = sc.read_loom("scle416b.loom")
adata

AnnData object with n_obs × n_vars = 192 × 46604 
    obs: 'Source.Name', 'block', 'cell.line', 'cell.type', 'colnames', 'genotype', 'phenotype', 'single.cell.well.quality', 'spike.in.addition', 'strain'
    var: 'ENSEMBL', 'Length', 'SEQNAME', 'SYMBOL', 'rownames'

## 2. Converting between objects within a notebook
With the rpy2 package, both R and Python can be run within the same notebook. The [anndata2ri package](https://github.com/theislab/anndata2ri) converts between SingleCellExperiment and AnnData objects within a notebook (it must be activated when rpy2 is loaded, as in section 0). Two methods are shown here.

### 2.1. Using an R code block
Inputs and outputs to the R code block can be specified using -i and -o. With anndata2ri activated, any AnnData object passed in to an R code block will be automatically converted to a SingleCellExperiment object. Similarly, any SingleCellExperiment object that is output from an R code block will be automatically converted to an AnnData object.

In [4]:
%%R -o adata
library(scRNAseq)
sce.416b <- LunSpikeInData(which="416b")
adata <- as(sce.416b, 'SingleCellExperiment')

In [5]:
adata

AnnData object with n_obs × n_vars = 192 × 46604 
    obs: 'Source Name', 'cell line', 'cell type', 'single cell well quality', 'genotype', 'phenotype', 'strain', 'spike-in addition', 'block'
    var: 'Length'

In [6]:
%%R -i adata
adata

class: SingleCellExperiment 
dim: 46604 192 
metadata(0):
assays(1): X
rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
  ENSMUSG00000095742 CBFB-MYH11-mcherry
rowData names(1): Length
colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
  SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
  SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
  SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
colData names(9): Source.Name cell.line ... spike.in.addition block
reducedDimNames(0):
spikeNames(0):
altExpNames(0):


### 2.2. Inline
Conversion to an AnnData object can also be done in a single line:

In [7]:
adata = r('as(sce.416b, "SingleCellExperiment")')
adata

AnnData object with n_obs × n_vars = 192 × 46604 
    obs: 'Source Name', 'cell line', 'cell type', 'single cell well quality', 'genotype', 'phenotype', 'strain', 'spike-in addition', 'block'
    var: 'Length'

## 3. A few notes
Conversion is not always perfect; a few small issues with naming conventions can arise. For example, when data is read into Python from a Loom file, the gene IDs often end up in a separate column of `adata.var` (here `rownames`) rather than in the index. This may need to be adjusted manually:

In [8]:
adata = sc.read_loom("scle416b.loom")
adata.var = adata.var.rename(columns={"rownames": "gene_ids"})
adata.var.index = adata.var.gene_ids
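
The index adjustment above can be tried on a toy table without loading the Loom file. The sketch below uses a plain pandas DataFrame as a stand-in for `adata.var`. The helper name `promote_column_to_index` is hypothetical (it is not part of scanpy or anndata), and the `Length` values are made up for illustration.

```python
import pandas as pd

def promote_column_to_index(var, column="rownames", new_name="gene_ids"):
    """Return a copy of `var` whose index is taken from `column`.

    The column is kept (renamed to `new_name`) so the identifiers stay
    available as an ordinary annotation, as in the cell above.
    """
    if column not in var.columns:
        raise KeyError(f"column {column!r} not found in var")
    out = var.rename(columns={column: new_name})
    # Use a plain array so the new index carries no name of its own.
    out.index = out[new_name].astype(str).to_numpy()
    return out

# Toy stand-in for adata.var as read from a Loom file (made-up values).
var = pd.DataFrame(
    {"rownames": ["Xkr4", "Rp1", "Sox17"], "Length": [3634, 9747, 3130]},
    index=["0", "1", "2"],
)
fixed = promote_column_to_index(var)
print(fixed.index.tolist())
```

On a real object, the same helper could be applied as `adata.var = promote_column_to_index(adata.var)`, assuming the Loom import produced a `rownames` column as shown in section 1.2.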

Also, column naming conventions often differ between R and Python. Periods and spaces in column names are awkward in Python (for example, they prevent attribute-style access such as `adata.obs.cell_line`), so it may be necessary to replace them with underscores:

In [9]:
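# regex=False makes '.' match a literal period regardless of the pandas version
adata.obs.columns = adata.obs.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('.', '_', regex=False)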
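
The renaming step above can be wrapped in a small helper that is easier to reuse and check. The sketch below applies it to a few of the `obs` column names produced by the two conversion routes shown earlier. `normalize_column_names` is a hypothetical helper name. As an extra assumption beyond the original one-liner, it also replaces hyphens, so that `spike-in addition` and `spike.in.addition` map to the same name; drop the hyphen from the tuple if hyphens should be kept. `regex=False` is passed explicitly because the default for `regex` in `str.replace` has differed across pandas versions.

```python
import pandas as pd

def normalize_column_names(columns):
    """Lower-case names and replace spaces, periods and hyphens with '_'."""
    cols = pd.Index(columns).str.strip().str.lower()
    for ch in (" ", ".", "-"):
        # regex=False: treat each character literally, not as a pattern
        cols = cols.str.replace(ch, "_", regex=False)
    return cols

# Column names as they appear after each conversion route in this notebook.
from_loom = ["Source.Name", "cell.line", "spike.in.addition"]
from_anndata2ri = ["Source Name", "cell line", "spike-in addition"]

print(list(normalize_column_names(from_loom)))
print(list(normalize_column_names(from_anndata2ri)))
```

On a real object this could be used as `adata.obs.columns = normalize_column_names(adata.obs.columns)`.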

AnnData objects typically store the counts matrix in a sparse format. When converting directly from R, this format is sometimes not preserved and the matrix arrives as a dense array. For certain functions, it may be necessary to convert the counts matrix to a sparse matrix manually:

In [10]:
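from scipy.sparse import csr_matrix
# Fetch the SingleCellExperiment from the R session (converted to AnnData)
adata = r('sce.416b')
# Store the counts as a compressed sparse row matrix
adata.X = csr_matrix(adata.X)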
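
Before converting, it can help to check whether the matrix is already sparse and how many entries are nonzero. The following sketch uses a small simulated count matrix in place of `adata.X`. The helper `ensure_csr` and the simulated data are assumptions for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

def ensure_csr(X):
    """Return X as a CSR matrix, converting only when necessary."""
    if issparse(X):
        return X.tocsr()
    return csr_matrix(np.asarray(X))

# Simulated counts (cells x genes) with many zeros, standing in for adata.X.
rng = np.random.default_rng(0)
dense = rng.poisson(0.1, size=(192, 1000)).astype(np.float32)

X = ensure_csr(dense)
# Fraction of entries that are nonzero; low values favor sparse storage.
density = X.nnz / (X.shape[0] * X.shape[1])
print(type(X).__name__, X.shape, f"density={density:.3f}")
```

With a real object, `adata.X = ensure_csr(adata.X)` would leave an already sparse matrix in CSR form and convert a dense one.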

Finally, when an AnnData object is read into R, the counts matrix is automatically labeled "X" rather than "counts". This is problematic for some Bioconductor functions, which look for an assay named "counts" or "logcounts". It may be necessary to rename the matrix to "counts" (or "logcounts" if the data has been log-normalized) before applying the desired operations:

In [11]:
%%R -i adata
names(assays(adata)) <- "counts" # or "logcounts" if log-normalized
adata

class: SingleCellExperiment 
dim: 46604 192 
metadata(0):
assays(1): X
rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
  ENSMUSG00000095742 CBFB-MYH11-mcherry
rowData names(1): Length
colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
  SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
  SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
  SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
colData names(9): Source.Name cell.line ... spike.in.addition block
reducedDimNames(0):
spikeNames(0):
altExpNames(0):
