# DevCellPy Tutorial

DevCellPy is a Python package designed for hierarchical multilayered classification of cells based on single-cell RNA-sequencing (scRNA-seq) data. It uses the machine learning algorithm Extreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016) to automatically predict cell identities across complex permutations of layers and sublayers of annotation. An example classification hierarchy is illustrated below.

![image-4.png](attachment:image-4.png)

Because DevCellPy's classification scheme is highly customizable, users can supply the annotation hierarchy of their scRNA-seq datasets to guide the automatic classification and prediction of cells according to that hierarchy. DevCellPy allows users to designate any identity at each layer of classification and is not constrained by cell type. For example, assigning timepoint as one of the annotation layers allows cell identity predictions at that layer to be conditioned on the age of the cells. In addition to hierarchical cell classification, DevCellPy integrates the SHapley Additive exPlanations (SHAP) package (Lundberg et al., 2020), which provides interpretability methods for the model and identifies the positive and negative gene predictors of cell identities across all annotation layers.

Below we provide a comprehensive tutorial on DevCellPy's usage as well as overall concepts in its design.

Paper: Galdos, Xu, et al., 2021

## DevCellPy Back End

DevCellPy implements a `Layer` object to maintain information regarding each level in the classification. The `Layer` object is encapsulated and independent, and it contains key information including the name of the layer, a dictionary of any sublayers succeeding it, a trained XGBoost model, and more. `Layer` objects can be exported from and imported into the DevCellPy module.
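The idea of a layer that knows its name, its sublayers, and its model can be illustrated with a small sketch. Everything here is an assumption made for illustration: `LayerSketch`, its methods, the JSON round trip, and the cell-type labels are hypothetical and are not DevCellPy's actual `Layer` class or export format. The `model` field is a placeholder for where a trained XGBoost model would be stored and is left out of the export for simplicity.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional, Tuple


@dataclass
class LayerSketch:
    """Hypothetical stand-in for a layer in a classification hierarchy."""
    name: str
    # Maps a sublayer name to the layer that refines it.
    sublayers: Dict[str, "LayerSketch"] = field(default_factory=dict)
    # Placeholder: a real layer would hold a trained XGBoost model here.
    model: Optional[Any] = None

    def add_sublayer(self, layer: "LayerSketch") -> "LayerSketch":
        """Attach a child layer and return it so calls can be chained."""
        self.sublayers[layer.name] = layer
        return layer

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, name) pairs in depth-first order."""
        yield depth, self.name
        for child in self.sublayers.values():
            yield from child.walk(depth + 1)

    def to_dict(self) -> dict:
        """Export the hierarchy structure only (the model is omitted)."""
        return {
            "name": self.name,
            "sublayers": {k: v.to_dict() for k, v in self.sublayers.items()},
        }

    @classmethod
    def from_dict(cls, data: dict) -> "LayerSketch":
        """Rebuild a hierarchy from the structure produced by to_dict."""
        layer = cls(name=data["name"])
        for child in data["sublayers"].values():
            layer.add_sublayer(cls.from_dict(child))
        return layer


# Build a small, made-up hierarchy.
root = LayerSketch("Root")
cm = root.add_sublayer(LayerSketch("Cardiomyocytes"))
cm.add_sublayer(LayerSketch("Ventricular"))
cm.add_sublayer(LayerSketch("Atrial"))
root.add_sublayer(LayerSketch("Endothelial"))

# Export and re-import the structure.
exported = json.dumps(root.to_dict())
restored = LayerSketch.from_dict(json.loads(exported))

for depth, name in restored.walk():
    print("  " * depth + name)
```

Keeping each layer self-contained in this way means a subtree can be exported, reloaded, or retrained without touching the rest of the hierarchy.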

## Installation Notes

DevCellPy is packaged as a command-line wrapper that can be installed through pip and run from the Terminal or Command Prompt.

**NOTE:** The Python and XGBoost versions must remain the same across all training, prediction, and feature ranking runs. For example, if Python 3.7 is used to train on a dataset, Python 3.7 must also be used to predict a query dataset with the resulting trained model.
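One way to guard against version drift is to record the environment at training time and compare it before predicting. The sketch below is not part of DevCellPy; the helper names `environment_fingerprint` and `check_environment` are hypothetical, and the sketch assumes that comparing exact version strings is strict enough for your workflow.

```python
import importlib
import platform


def environment_fingerprint(packages=("xgboost",)):
    """Return the Python version and the versions of the given packages."""
    versions = {"python": platform.python_version()}
    for pkg in packages:
        try:
            module = importlib.import_module(pkg)
            versions[pkg] = getattr(module, "__version__", None)
        except ImportError:
            # Package is not installed in this environment.
            versions[pkg] = None
    return versions


def check_environment(recorded, current=None):
    """Return {name: (recorded, current)} for every version that differs."""
    if current is None:
        pkgs = tuple(k for k in recorded if k != "python")
        current = environment_fingerprint(pkgs)
    return {
        name: (version, current.get(name))
        for name, version in recorded.items()
        if version != current.get(name)
    }


# At training time: store this dictionary next to the trained output.
recorded = environment_fingerprint()

# At prediction time: an empty result means the versions match.
mismatches = check_environment(recorded)
print(mismatches)
```

If `mismatches` is not empty, switch to an environment that matches the recorded versions before running prediction or feature ranking.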

<a id='toc'></a>

# Table of Contents

1) [Pre-DevCellPy Data Preparation](1.predevcellpy_tutorial.ipynb#predevcellpy)

2) [DevCellPy Train](2.train_tutorial.ipynb#train)

3) [DevCellPy Predict](3.predict_tutorial.ipynb#predict)

4) [DevCellPy Feature Ranking](4.featureranking_tutorial.ipynb#featureranking)

5) [Post-DevCellPy R Analysis](5.postdevcellpy_tutorial.ipynb#postdevcellpy)

6) [Cardiac Developmental Atlas Option](6.cardiacdevatlas_tutorial.ipynb#cardiac)

7) [DevCellPy Run Options Summary and Examples](7.summary.ipynb#summary)

8) [DevCellPy Code](8.code.ipynb#code)

9) [References](9.references.ipynb#references)