# 7. Training and Prediction Example Using Cardiac Cell Atlas

The following examples demonstrate how to use the Cardiac Cell Atlas to train the devCellPy algorithm and how to use the trained algorithm to predict cell types in sample data.

***Note: Some of the scripts below are written in R and others in Bash. Each code cell is labeled with its language.

## Setting Up Files for Training devCellPy

Export Normalized Expression data as a CSV File from the Cardiac Atlas Seurat Object that can be found at: 
    https://zenodo.org/records/7183939#.Y0TuEC-B3T8 under name "cardiac_atlas_seurat_object.rds"
    
Alternatively, the cardiac atlas Seurat object can be converted to an H5AD file, the format used by Scanpy.

devCellPy accepts the normalized expression data either as a CSV file or as an H5AD object.

***Below we illustrate how to export the Normalized Expression CSV file in R as well as how to convert the cardiac atlas Seurat object to an H5AD file. 
   

### Export Normalized Expression Data as CSV

In [None]:
# R Code

## Import Required R Packages
library(Seurat)
library(scrattch.io)

## Read in Cardiac Atlas Object
cardiacatlas = readRDS("/path/to/cardiac_atlas_seurat_object.rds")

## Export Normalized Expression Data as CSV File

write_dgCMatrix_csv(cardiacatlas@assays$RNA@data, "/path/to/cardiacatlas_norm_express.csv")


### Convert Seurat Object to H5AD Object

In [None]:
# R Code

## Import Required R Packages
library(Seurat)
library(SeuratDisk)

## Read in Cardiac Atlas Object
cardiacatlas = readRDS("/path/to/cardiac_atlas_seurat_object.rds")

## Save Object as an H5Seurat File (SeuratDisk format)

SaveH5Seurat(cardiacatlas, "/path/to/cardiacatlas.h5seurat")

## Convert SeuratDisk Object to AnnData H5AD
## Note: this will save the H5seurat as "cardiacatlas.h5ad". The .X slot will contain normalized expression data

Convert("/path/to/cardiacatlas.h5seurat", dest = "h5ad")


### Create Bash Variables with Input File Paths

Three files are required for training devCellPy:
1. Normalized expression file (can be a CSV file or an H5AD object)
2. Metadata file
3. Label file

Please refer to Section 2 of the Training Tutorial for more in-depth explanation of each of these file types.

***Note: Metadata and Label Files for the Cardiac Atlas can be found at the Zenodo link:
  https://zenodo.org/records/7183939#.Y0TuEC-B3T8 

In [None]:
# Bash Code

## Normalized Expression File (can be path to CSV or H5AD object)

norm_express="/path/to/cardiacatlas_norm_express.csv"

## Metadata File For All Cells

metadata_file="/path/to/cardiac_atlas_metadata.csv"

## Label File for Layered Training (passed to --labelInfo)

label_info="/path/to/cardiac_atlas_labels.csv"
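
Because training depends on all three input files, it can help to confirm that each path points to a readable file before starting a long run. The following is a minimal sketch; `check_inputs` is a hypothetical helper written for this tutorial, not part of devCellPy.

```shell
# Hypothetical helper (not part of devCellPy): report every argument that is
# not a readable regular file, and return non-zero if any is missing.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "Missing or unreadable input: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with the three training inputs (placeholder variables):
# check_inputs "$norm_express" "$metadata_file" "$label_info" || exit 1
```

The function only inspects the file system; it does not validate the contents or format of the files.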


## Training devCellPy Using Cardiac Atlas

In [None]:
# Bash Code

## Run devCellPy Training
## Note: because our atlas contains multiple timepoints, we designate which layer contains timepoint info

## TrainAll: training all layers w/o cross validation and metrics. Note: "--testSplit" is removed.
devCellPy \
    --runMode trainAll \
    --trainNormExpr "$norm_express" \
    --trainMetadata "$metadata_file" \
    --labelInfo "$label_info" \
    --rejectionCutoff 0.5 \
    --timepointLayer "Cardiomyocytes"
    

## TrainAll: training all layers w/ cross validation and metrics. Takes longer to run due to additional calculations
devCellPy \
    --runMode trainAll \
    --trainNormExpr "$norm_express" \
    --trainMetadata "$metadata_file" \
    --labelInfo "$label_info" \
    --rejectionCutoff 0.5 \
    --testSplit 0.1 \
    --timepointLayer "Cardiomyocytes"
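
The two training commands differ only in whether `--testSplit` is present. The sketch below makes that explicit by assembling the argument list in a Bash array and printing the resulting command instead of running it. `build_train_cmd` is a hypothetical wrapper, and the file paths in the example calls are placeholders; only flags that already appear in this tutorial are used.

```shell
# Hypothetical wrapper (not part of devCellPy): print the trainAll command,
# adding --testSplit only when a split fraction is supplied.
# Usage: build_train_cmd NORM_EXPR METADATA LABEL_INFO TIMEPOINT_LAYER [TEST_SPLIT]
build_train_cmd() {
    local cmd=(devCellPy
        --runMode trainAll
        --trainNormExpr "$1"
        --trainMetadata "$2"
        --labelInfo "$3"
        --rejectionCutoff 0.5
        --timepointLayer "$4")
    if [ -n "${5:-}" ]; then
        cmd+=(--testSplit "$5")
    fi
    # %q quotes each argument so the printed line can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

# Without cross validation and metrics:
build_train_cmd "/path/to/norm_express.csv" "/path/to/metadata.csv" \
    "/path/to/labels.csv" "Cardiomyocytes"

# With cross validation and metrics (10% test split):
build_train_cmd "/path/to/norm_express.csv" "/path/to/metadata.csv" \
    "/path/to/labels.csv" "Cardiomyocytes" 0.1
```

Printing the command first is a design choice for illustration: it lets the arguments be reviewed before a long training run is launched.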
    

## Feature Ranking to Retrieve Gene Importance Information

Below we show how to run feature ranking using the SHAP algorithm implemented within devCellPy. The path to the .pkl file containing the trained devCellPy LayerObject must be provided; this object is used to identify the top positive and negative gene predictors of each cell type.

While multiple pickle objects can be given as input paths, users on a computing cluster may find it more useful to run the command separately for each LayerObject.

In [None]:
# Bash Code

devCellPy \
    --runMode featureRankingOne \
    --trainNormExpr "$norm_express" \
    --trainMetadata "$metadata_file" \
    --layerObjectPaths "/Cardiac_Atlas_Trained_Model/Root_object.pkl" \
    --featureRankingSplit 0.1
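
Running feature ranking separately for each LayerObject can be sketched as a loop over the pickle files in the trained-model directory. The example below only prints one command per object rather than executing it. `print_ranking_cmds` is a hypothetical helper, and it assumes that trained objects are saved with names ending in `_object.pkl`, as in the paths shown in this tutorial.

```shell
# Hypothetical helper (not part of devCellPy): print one featureRankingOne
# command for each LayerObject pickle found under a trained-model directory.
# Assumption: trained objects are files whose names end in "_object.pkl".
# Usage: print_ranking_cmds MODEL_DIR NORM_EXPR METADATA
print_ranking_cmds() {
    local model_dir=$1 norm_express=$2 metadata_file=$3
    local pkl
    find "$model_dir" -type f -name '*_object.pkl' | sort |
    while IFS= read -r pkl; do
        # %q quotes each path so the printed command is safe to paste.
        printf 'devCellPy --runMode featureRankingOne --trainNormExpr %q --trainMetadata %q --layerObjectPaths %q --featureRankingSplit 0.1\n' \
            "$norm_express" "$metadata_file" "$pkl"
    done
}

# Example usage (placeholder paths):
# print_ranking_cmds "/Cardiac_Atlas_Trained_Model" "$norm_express" "$metadata_file"
```

Each printed line is an independent command, so the lines can be run one after another or distributed across cluster jobs.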

## Prediction of New Data Using Trained devCellPy Object

Below, we provide an example of how to use the trained devCellPy Cardiac Cell Atlas algorithm to predict new data. We illustrate how to conduct single layer prediction as well as to predict cell types across multiple layers.

***Note: Single-layer prediction lets users obtain the prediction probabilities for all categories within the training dataset.

### Set Up New Dataset Files for Prediction

In this example we will use data from Li et al. 2019 (PMID: 31142541) as input for prediction using the trained Cardiac Atlas.

***Note: The Li et al. 2019 Seurat object can be found in the devCellPy GitHub repository under the folder "Example Prediction Dataset":
    https://github.com/devCellPy-Team/devCellPy/tree/main/Example%20Prediction%20Dataset

In [None]:
# R Code

## Import Required R Packages
library(Seurat)
library(SeuratDisk)
library(scrattch.io)  # provides write_dgCMatrix_csv

## Read in Li et al. 2019 Object
lietal2019 = readRDS("/path/to/lietal2019_object.rds")

## Export Normalized Expression Data as CSV File

write_dgCMatrix_csv(lietal2019@assays$RNA@data, "/path/to/lietal2019_norm_express.csv")

## Export as an H5AD object if desired

SaveH5Seurat(lietal2019, "/path/to/lietal2019.h5seurat")
Convert("/path/to/lietal2019.h5seurat", dest = "h5ad")


### Predict Cell Types of New Dataset Using ONE Layer

In [None]:
# Bash Code

## PredictOne: prediction of query using single layer
## (runMode = predictOne, predNormExpr, layerObjectPaths, rejectionCutoff)

devCellPy \
    --runMode predictOne \
    --predNormExpr "/path/to/lietal2019_norm_express.csv" \
    --layerObjectPaths "/Cardiac_Atlas_Trained_Model/Root_object.pkl" \
    --rejectionCutoff 0.5

### Predict Cell Types of New Data Across MULTIPLE Layers

In [None]:
# Bash Code

## PredictAll: prediction of all layers w/o val_metadata; each layer influences the next layer
## (runMode = predictAll, predNormExpr, layerObjectPaths, rejectionCutoff)
## Note: the paths are written as one comma-separated argument; the continuation lines are unindented so bash does not split the list

devCellPy \
    --runMode predictAll \
    --predNormExpr "/path/to/lietal2019_norm_express.csv" \
    --layerObjectPaths "/Cardiac_Atlas_Trained_Model/Root_object.pkl",\
"/Cardiac_Atlas_Trained_Model/Cardiomyocytes/E825/E825_object.pkl",\
"/Cardiac_Atlas_Trained_Model/Cardiomyocytes/E825/E825VentricularCM/E825VentricularCM_object.pkl" \
    --rejectionCutoff 0.5
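
When several LayerObjects are used, the list of paths can be easier to maintain as a Bash array that is joined with commas just before the call. This is a minimal sketch: `join_by_comma` is a hypothetical helper, and it assumes that `--layerObjectPaths` accepts the paths as a single comma-separated argument, as the listing above suggests.

```shell
# Hypothetical helper (not part of devCellPy): join all arguments with commas.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# LayerObjects to apply, from the root layer downward (paths from this tutorial).
layer_objects=(
    "/Cardiac_Atlas_Trained_Model/Root_object.pkl"
    "/Cardiac_Atlas_Trained_Model/Cardiomyocytes/E825/E825_object.pkl"
    "/Cardiac_Atlas_Trained_Model/Cardiomyocytes/E825/E825VentricularCM/E825VentricularCM_object.pkl"
)

layer_paths=$(join_by_comma "${layer_objects[@]}")
echo "$layer_paths"

# Assumed usage: pass the joined list as one quoted argument.
# devCellPy \
#     --runMode predictAll \
#     --predNormExpr "/path/to/lietal2019_norm_express.csv" \
#     --layerObjectPaths "$layer_paths" \
#     --rejectionCutoff 0.5
```

Keeping the paths in an array makes it straightforward to add or remove a layer without editing one long line.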