In [1]:
import os
import sys
import warnings

os.chdir("..")
sys.path.append("../../")
warnings.filterwarnings("ignore")

# Use `scdesign3()` to achieve all-in-one simulation

## Introduction

This section shows how to use the `scdesign3()` method of the `scDesign3` class to run the whole simulation pipeline in one call and obtain a new dataset.

For details on the function's inputs and outputs, see the [API](../set_up/_autosummary/scDesign3Py.scDesign3.scdesign3.rst).

## Step 1: Import packages and read in data

### Import packages

When the `scDesign3Py` package is imported, its initialization locates the **R** interpreter and checks whether the **R** package **scDesign3** is installed. If **scDesign3** is not installed, `scDesign3Py` tries to install the dependency automatically.

In [2]:
import anndata as ad
import pandas as pd
import scDesign3Py

The R project used is located at /home/ld/anaconda3/envs/pyscdesign/lib/R


### Read in data

The input data should be an `anndata.AnnData` object because, so far, only the conversion from `anndata.AnnData` to the **R** `SingleCellExperiment` object has been implemented.

Here, we read in the `h5ad` file directly. The raw data comes from [scvelo](https://scvelo.readthedocs.io/scvelo.datasets.pancreas/), and we keep only the first 30 genes to save time.

```{eval-rst}
.. Note::
    If you have trouble building an `anndata.AnnData` object, see the `anndata` `documentation <https://anndata.readthedocs.io/en/latest/>`_.
```

In [3]:
data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data

View of AnnData object with n_obs × n_vars = 2087 × 30
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'
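
Before fitting, it can help to confirm that the covariates used later in this tutorial are present in `obs`. The sketch below is a minimal check under stated assumptions: `missing_columns` is a hypothetical helper (not part of `scDesign3Py`), and the column list is copied from the printed summary above. With a real object you would pass `data.obs.columns` instead.

```python
# Column names copied from the `obs` line of the AnnData summary above.
obs_columns = [
    "clusters_coarse",
    "clusters",
    "S_score",
    "G2M_score",
    "cell_type",
    "sizeFactor",
    "pseudotime",
]


def missing_columns(available, required):
    """Return the required column names that are absent from `available`."""
    available = set(available)
    return [name for name in required if name not in available]


# Covariates that this tutorial passes to `scdesign3()` in Step 3.
required = ["cell_type", "pseudotime"]
missing = missing_columns(obs_columns, required)
if missing:
    raise KeyError(f"obs is missing required columns: {missing}")
print("All required obs columns are present.")
```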

## Step 2: Create an instance of the scDesign3 class

When creating the instance, you can specify the basic settings: how many cores to use for computation, which parallelization method to use, and whether to return a more Pythonic output.

Details of the settings are shown in [API](../set_up/_autosummary/scDesign3Py.scDesign3.__init__.rst).

```{eval-rst}
.. Note::
    If you are a Windows user, please refer to the :doc:`Get BPPARAM <./bpparam>` section to use more than one core for parallel computing.
```

In [4]:
test = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply", return_py=True)
test.set_r_random_seed(123)
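
The value `n_cores=3` above is hard-coded. As a minimal sketch, assuming you want to avoid requesting more cores than the machine has, the core count could be clamped first. `pick_n_cores` is a hypothetical helper written for illustration and is not part of `scDesign3Py`.

```python
import os


def pick_n_cores(requested, reserve=1):
    """Clamp `requested` to the available CPU count minus `reserve`, never below 1."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    usable = max(1, available - reserve)
    return max(1, min(requested, usable))


n_cores = pick_n_cores(3)
print(n_cores)
# The result could then be passed as `n_cores=n_cores` when creating the instance.
```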

## Step 3: Call the `scdesign3()` method

In [5]:
simu_res = test.scdesign3(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    mu_formula="s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
    family_use="nb",
    usebam=False,
    corr_formula="1",
    copula="gaussian",
)

R[write to console]: Input Data Construction Start

R[write to console]: Input Data Construction End

R[write to console]: Start Marginal Fitting





R[write to console]: Marginal Fitting End

R[write to console]: Start Copula Fitting

R[write to console]: Convert Residuals to Multivariate Gaussian





R[write to console]: Converting End

R[write to console]: Copula group 1 starts

R[write to console]: Copula Fitting End

R[write to console]: Start Parameter Extraction





R[write to console]: Parameter Extraction End

R[write to console]: Start Generate New Data

R[write to console]: Use Copula to sample a multivariate quantile matrix

R[write to console]: Sample Copula group 1 starts





R[write to console]: New Data Generating End
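
The `mu_formula` and `sigma_formula` arguments in the call above are plain strings written in the style of `mgcv` smooth terms, where `k` is the basis dimension and `bs = 'cr'` selects a cubic regression spline. The sketch below only shows how such strings could be assembled programmatically. `smooth_term` is a hypothetical helper for illustration and is not part of `scDesign3Py`.

```python
def smooth_term(covariate, k, bs="cr"):
    """Build an mgcv-style smooth term string, e.g. s(pseudotime, k = 10, bs = 'cr')."""
    return f"s({covariate}, k = {k}, bs = '{bs}')"


# Reproduce the two formula strings used in the call above.
mu_formula = smooth_term("pseudotime", k=10)
sigma_formula = smooth_term("pseudotime", k=5)
print(mu_formula)
print(sigma_formula)
```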



## Step 4: Check the simulation results and run downstream analysis if needed

Because `return_py=True` was set at initialization, the return value of `scdesign3()` is converted to types that are more familiar to Python users, such as `pandas.DataFrame`.

In [6]:
simu_res["new_count"].iloc[0:6,0:6]

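|                  | Pyy   | Iapp  | Chgb  | Rbp4 | Spp1 | Chga |
|------------------|-------|-------|-------|------|------|------|
| AAACCTGAGAGGGATA | 0.0   | -0.0  | 207.0 | 10.0 | 0.0  | 68.0 |
| AAACCTGGTAAGTGGC | 1.0   | -0.0  | 9.0   | 1.0  | 0.0  | 10.0 |
| AAACGGGCAAAGAATC | 60.0  | 2.0   | 22.0  | 12.0 | -0.0 | 25.0 |
| AAACGGGGTACAGTTC | 305.0 | 334.0 | 17.0  | 25.0 | 1.0  | 6.0  |
| AAACGGGGTGAAATCA | 0.0   | -0.0  | 0.0   | -0.0 | 1.0  | 0.0  |
| AAACGGGTCAAACAAG | 1.0   | 1.0   | 0.0   | -0.0 | 5.0  | -0.0 |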
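
The preview above shows whole-number counts stored as floats, with some zeros printed as `-0.0`. For downstream analysis you may prefer integer counts. The following is a minimal sketch, assuming the simulated values are whole numbers as in the preview. It uses a small hand-copied subset of the table rather than the real `simu_res["new_count"]`.

```python
import pandas as pd

# First three cells and four genes copied from the preview above.
preview = pd.DataFrame(
    {
        "Pyy": [0.0, 1.0, 60.0],
        "Iapp": [-0.0, -0.0, 2.0],
        "Chgb": [207.0, 9.0, 22.0],
        "Rbp4": [10.0, 1.0, 12.0],
    },
    index=["AAACCTGAGAGGGATA", "AAACCTGGTAAGTGGC", "AAACGGGCAAAGAATC"],
)

# Casting to an integer dtype also drops the sign of negative zero.
counts = preview.astype("int64")
print(counts)
```

The same cast could be applied to the full result, for example `simu_res["new_count"].astype("int64")`, provided every value is a whole number.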


In [7]:
simu_res["model_aic"]

aic.marginal    265624.396442
aic.copula       -1870.915929
aic.total       263753.480514
dtype: float64
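
In this output, `aic.total` matches the sum of `aic.marginal` and `aic.copula` up to rounding of the printed values. A quick sanity check, with the numbers copied by hand from the output above, might look like this:

```python
import math

# Values copied from the `model_aic` output above.
model_aic = {
    "aic.marginal": 265624.396442,
    "aic.copula": -1870.915929,
    "aic.total": 263753.480514,
}

recomputed_total = model_aic["aic.marginal"] + model_aic["aic.copula"]

# The printed values are rounded to six decimals, so allow a small absolute tolerance.
is_consistent = math.isclose(recomputed_total, model_aic["aic.total"], abs_tol=1e-5)
print(is_consistent)
```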

The class property `whole_pipeline_res` also stores the simulation result, but as an `rpy2.robjects.vectors.ListVector`. (For the full list of class properties, please refer to the [API](../set_up/_autosummary/scDesign3Py.scDesign3.rst).)

If `return_py=False`, the return value is exactly the object stored in this property. You can call `print()` on it to show the result in pure **R** style.

In [8]:
print(test.whole_pipeline_res)

$new_count
        AAACCTGAGAGGGATA AAACCTGGTAAGTGGC AAACGGGCAAAGAATC AAACGGGGTACAGTTC
Pyy                    0                1               60              305
Iapp                   0                0                2              334
Chgb                 207                9               22               17
Rbp4                  10                1               12               25
Spp1                   0                0                0                1
Chga                  68               10               25                6
Cck                    5               52                2               17
Ins1                   0                0                0              266
Nnat                   0                0                4               33
Ins2                   0                0                6               15
Neurog3                2                5                0                0
Tmsb4x                 4               21                0                6
X

Use the `rx2` method to extract the result you are interested in.

In [9]:
print(test.whole_pipeline_res.rx2("model_aic"))

aic.marginal   aic.copula    aic.total 
  265624.396    -1870.916   263753.481 



```{eval-rst}
.. Caution::
    If you are familiar with the `rpy2` package, or if you do not need to manipulate the result, you may set `return_py` to False.
    
    If you are new to `rpy2`, you may prefer to set `return_py` to True: the output is converted to types that are likely more familiar to you, although the conversion adds some overhead.
```