In [None]:
import os
import sys

os.chdir("..")
sys.path.append("../../")

# Quick Start

## Introduction

The **R** package **scDesign3** is a unified probabilistic framework that learns from real datasets to generate realistic in silico high-dimensional single-cell omics data of various cell states, including discrete cell types, continuous trajectories, and spatial locations. **pyscDesign3** is the Python interface for **scDesign3**.

As a quick start, we demonstrate how to use **pyscDesign3** to simulate an scRNA-seq dataset with one continuous developmental trajectory.

## Step 1: Import packages and read in data

### Import packages

In [None]:
import anndata as ad
import numpy as np
import pyscDesign3

### Read in data

The raw data come from the [scvelo](https://scvelo.readthedocs.io/scvelo.datasets.pancreas/) pancreas dataset, which describes pancreatic endocrinogenesis. We pre-selected the top 1000 highly variable genes and filtered out some cell types to ensure a **single trajectory**.

To save time, we only use the first 30 of these genes.

In [None]:
data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data

## Step 2: `scdesign3()` performs all-in-one simulation

First, create an instance of the `scDesign3` class so that its `scdesign3()` method can be called.

In [None]:
test = pyscDesign3.scDesign3()

The function `scdesign3()` takes an `anndata.AnnData` object with the cell covariates (such as cell types, pseudotime, or spatial coordinates) stored in `anndata.AnnData.obs`, and performs the all-in-one simulation.

In [None]:
simu_res = test.scdesign3(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    mu_formula="s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
    family_use="nb",
    usebam=True,
    corr_formula="1",
    copula="gaussian",
)
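The `mu_formula` and `sigma_formula` strings use the smooth-term syntax of the R package mgcv: `s(pseudotime, k = 10, bs = 'cr')` requests a smooth function of `pseudotime` with basis dimension `k = 10` and a cubic regression spline basis (`'cr'`). When trying different basis sizes, it can be convenient to build these strings programmatically. The helper `smooth_term` below is hypothetical and not part of **pyscDesign3**; it only formats a string.

```python
def smooth_term(covariate: str, k: int, bs: str = "cr") -> str:
    """Format an mgcv-style smooth term (hypothetical helper, illustration only)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return f"s({covariate}, k = {k}, bs = '{bs}')"


# Reproduce the two formula strings used in the call above.
mu_formula = smooth_term("pseudotime", k=10)
sigma_formula = smooth_term("pseudotime", k=5)
print(mu_formula)
print(sigma_formula)
```

The resulting strings can be passed as the `mu_formula` and `sigma_formula` arguments in place of the literals.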

```{eval-rst}
.. Note::
    Details on the usage of the `scdesign3()` function are given in the :doc:`tutorial <./all_in_one>` section.
```

## Step 3: Construct new `anndata.AnnData` object with the simulated result

After constructing the simulated `anndata.AnnData` object, we also compute log-transformed values (`log1p`) for both the simulated and the reference data, which are used for visualization.

In [None]:
simu_data = ad.AnnData(X=simu_res["new_count"], obs=simu_res["new_covariate"])
simu_data.layers["log_transformed"] = np.log1p(simu_data.X)
data.layers["log_transformed"] = np.log1p(data.X)
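`np.log1p` computes log(1 + x) elementwise, so zero counts map to zero and the transform can be inverted with `np.expm1`. A small sketch on a made-up dense count matrix:

```python
import numpy as np

# Made-up counts: 2 cells x 3 genes (illustrative values only).
counts = np.array([[0.0, 1.0, 9.0], [3.0, 0.0, 99.0]])

# log(1 + x) keeps zeros at zero and compresses large counts.
log_counts = np.log1p(counts)
print(log_counts)

# expm1 is the inverse transform, so the original counts are recovered.
recovered = np.expm1(log_counts)
print(np.allclose(recovered, counts))
```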

## Step 4: Visualization

In [None]:
plot = pyscDesign3.plot_reduceddim(
    ref_anndata=data,
    anndata_list=simu_data,
    name_list=["Reference", "scDesign3"],
    assay_use="log_transformed",
    if_plot=True,
    color_by="pseudotime",
    n_pc=20,
    point_size=5,
)

UMAP plot

In [None]:
plot["p_umap"]

PCA plot

In [None]:
plot["p_pca"]