# Mirroring a basic `cellxgene prepare` workflow
### The following demonstrates how a typical run of `cellxgene prepare` corresponds to the underlying scanpy processing. For more details, see https://chanzuckerberg.github.io/cellxgene/data.html

In [1]:
import scanpy as sc

## Load the raw pbmc3k dataset provided through scanpy
raw_data = sc.datasets.pbmc3k()
## Write it to an h5ad file that `cellxgene prepare` can read
sc.write('./pbmc3k-raw.h5ad', raw_data)

### This is where the `cellxgene prepare` workflow picks up  
The following `cellxgene prepare` command runs all of the steps below under the hood:  
`cellxgene prepare pbmc3k-raw.h5ad --run-qc --recipe seurat --layout tsne --layout umap --output pbmc3k-prepared.h5ad`
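
As a rough sketch, the same command can be assembled and launched from Python. The flags are copied from the command above. The helper name `build_prepare_command` is hypothetical, and the command only runs if a `cellxgene` executable is on the PATH and the input file exists.

```python
import os
import shutil
import subprocess


def build_prepare_command(input_path, output_path, recipe="seurat",
                          layouts=("tsne", "umap"), run_qc=True):
    """Assemble the `cellxgene prepare` argument list (hypothetical helper).

    Only the flags that appear in the command above are used.
    """
    cmd = ["cellxgene", "prepare", input_path]
    if run_qc:
        cmd.append("--run-qc")
    cmd += ["--recipe", recipe]
    for layout in layouts:
        cmd += ["--layout", layout]
    cmd += ["--output", output_path]
    return cmd


cmd = build_prepare_command("pbmc3k-raw.h5ad", "pbmc3k-prepared.h5ad")
print(" ".join(cmd))

# Execute only when the CLI is installed and the input file is present.
if shutil.which("cellxgene") is not None and os.path.exists("pbmc3k-raw.h5ad"):
    subprocess.run(cmd, check=True)
else:
    print("cellxgene or input file not found; command not executed")
```

Passing the arguments as a list avoids shell quoting problems if the file paths contain spaces.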

In [2]:
## Step 1: Calculate QC metrics and store in the anndata object
sc.pp.calculate_qc_metrics(raw_data, inplace=True) 
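
To make the QC step concrete, here is a minimal NumPy sketch of the kind of per-cell and per-gene summaries that `calculate_qc_metrics` stores in `.obs` and `.var`. The toy count matrix is made up for illustration. The variable names mirror a few of scanpy's column names (`total_counts`, `n_genes_by_counts`, `n_cells_by_counts`, `mean_counts`), but this is not scanpy's implementation.

```python
import numpy as np

# Toy count matrix (made up): 3 cells (rows) x 4 genes (columns)
counts = np.array([
    [0, 2, 1, 0],
    [5, 0, 0, 0],
    [1, 1, 1, 1],
])

# Per-cell summaries (scanpy stores these in .obs)
total_counts = counts.sum(axis=1)             # total counts in each cell
n_genes_by_counts = (counts > 0).sum(axis=1)  # genes with at least one count

# Per-gene summaries (scanpy stores these in .var)
n_cells_by_counts = (counts > 0).sum(axis=0)  # cells in which each gene is detected
mean_counts = counts.mean(axis=0)             # mean count of each gene across cells

print(total_counts, n_genes_by_counts, n_cells_by_counts, mean_counts)
```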

In [3]:
## Step 2: Normalize with scanpy's Seurat-style recipe (corresponds to `--recipe seurat`)
normalized_data = sc.pp.recipe_seurat(raw_data, copy=True)

In [4]:
## Step 3: Run PCA and compute the neighbor graph, which the clustering and UMAP steps rely on  
sc.pp.pca(normalized_data)
sc.pp.neighbors(normalized_data)

In [5]:
## Step 4: Infer clusters with the Louvain algorithm  
sc.tl.louvain(normalized_data)

In [6]:
## Step 5: Compute tsne and umap embeddings (corresponds to `--layout tsne --layout umap`)  
sc.tl.tsne(normalized_data)
sc.tl.umap(normalized_data)

In [7]:
## Write to output file  
sc.write('pbmc3k-prepared.h5ad', normalized_data)