DAFi: User-directed unsupervised filtering and identification of cell populations from flow cytometry data
Yu "Max" Qian, Ph.D., mqian@jcvi.org or qianyu.cs@gmail.com, Ivan Chang, Ph.D., ichang@jcvi.org, and Bob Sinkovits, Ph.D., sinkovit@sdsc.edu
Paper link: https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.23371
Lee AJ, Chang I, Burel JG, Lindestam Arlehamn CS, Mandava A, Weiskopf D, Peters B, Sette A, Scheuermann RH, Qian Y. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data. Cytometry A, 2018. 93(6):597-610. PMCID: PMC6030426.
https://github.com/PedroMilanezAlmeida/DAFi
The plugin is developed by Dr. Pedro Milanez-Almeida of the Tsang group at NIH/NIAID/CHI, with help and support from Dr. Josef Spidlen and Miguel Velazquez-Palafox of BD/FlowJo.
The DAFi package provides a framework for identifying cell populations in flow cytometry (FCM) data. The framework is compatible with many existing clustering algorithms, such as K-means, K-means++, mini-batch K-means, Gaussian mixture models, k-medoids, and self-organizing maps. It lets the user supply a user-defined gating hierarchy that turns these unsupervised algorithms into a semi-supervised, automated approach to cell population identification.

First, the data are read and preprocessed; then the configuration files are parsed and applied as user-defined gating directions. The clustering algorithm chosen at initialization is applied to all events in the FCM data at the start of the iterative gating/filtering loop, and optionally again on each specified population subset (reclustering). At each gating step, the user can choose one of three filters: bisecting (events are filtered by user-defined boundaries, as in manual gating), slope-based (events are filtered by user-defined slopes), or cluster-centroid-based (all member events of a cluster are kept or dropped according to whether the cluster's centroid falls inside or outside the user-defined gate).

Outputs include a table of population event counts and percentages, and an event-level table with every event's transformed channel values and population membership for external analysis and plotting. Several built-in plotting options are also available, e.g. 2D dot plots of the user-specified gating channels, with an optional centroid overlay.
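The difference between bisecting and centroid-based filtering can be illustrated with a minimal Python sketch. This is not the DAFi implementation: the function names, the toy data, and the choice of inclusive gate boundaries are assumptions made for illustration, and slope-based filtering is not covered.

```python
from collections import defaultdict

def in_gate(x, y, gate):
    """Return True if the point (x, y) lies inside the rectangular gate.

    Boundaries are treated as inclusive, which is an assumption of this sketch.
    """
    min_x, max_x, min_y, max_y = gate
    return min_x <= x <= max_x and min_y <= y <= max_y

def bisecting_filter(events, gate):
    """Keep each event whose own coordinates fall inside the gate."""
    return [i for i, (x, y) in enumerate(events) if in_gate(x, y, gate)]

def centroid_filter(events, labels, gate):
    """Keep every event of a cluster whose centroid falls inside the gate."""
    members = defaultdict(list)
    for i, label in enumerate(labels):
        members[label].append(i)
    kept = []
    for idx in members.values():
        cx = sum(events[i][0] for i in idx) / len(idx)
        cy = sum(events[i][1] for i in idx) / len(idx)
        if in_gate(cx, cy, gate):
            kept.extend(idx)
    return sorted(kept)

# Hypothetical toy data: (x, y) values on two gating channels
# and one cluster label per event.
events = [(30, 20), (40, 30), (75, 50), (150, 150), (160, 140)]
labels = [0, 0, 0, 1, 1]
gate = (20, 70, 5, 55)  # Min_X, Max_X, Min_Y, Max_Y

print(bisecting_filter(events, gate))         # indices of events inside the gate
print(centroid_filter(events, labels, gate))  # indices of events in kept clusters
```

In this toy example, event 2 lies just outside the gate and is dropped by the bisecting filter, but the centroid-based filter retains it because the centroid of its cluster falls inside the gate.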
- C - DAFi written in C
- FCSTrans - DAFi preprocessing of FCS files written in R via the flowCore package
- Notebooks - Jupyter Notebook templates used for DAFi report generation and data analysis
- Python - Accessory Python scripts for the DAFi pipeline
- R - DAFi written in R
- docker - Docker container build definitions for DAFi-jupyter and R-DAFi
- docs - R-DAFi documentation generated by pkgdown
- inst - DAFi installation and testing support files
- man - R-DAFi manual pages
- vignettes - R-DAFi vignettes
There are two concurrent implementations of the DAFi framework. One targets the HPC environment and uses optimized C code to provide extensive parallelization for large datasets. The other targets the desktop environment and uses existing R packages such as flowCore, FlowSOM, and ClusterR, giving flexibility in the choice of clustering algorithm and recursive filtering strategy. Source code and binary releases for both versions are available through the GitHub repository, along with Docker images for trouble-free installation.
- A raw FCS file for the R implementation of DAFi or a transformed text-based FCS file for the C implementation of DAFi.
- inclusion.config: a 12-column tab-delimited file for recursive data filtering
Pop_ID | DimensionX | DimensionY | Min_X | Max_X | Min_Y | Max_Y | Parent_ID | Cluster_Type(0: Clustering; 1: Bisecting; 2: Slope-based) | Visualize_or_Not | Recluster_or_Not | Cell_Phenotype(optional) |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 4 | 20 | 70 | 5 | 55 | 0 | 0 | 0 | 0 | Lymphocyte |
2 | 1 | 2 | 30 | 90 | 0 | 110 | 1 | 1 | 0 | 1 | Singlets |
3 | 4 | 5 | 100 | 150 | 80 | 140 | 2 | 2 | 1 | 1 | LiveSinglets |
4 | 19 | 17 | 76 | 140 | 106 | 200 | 3 | 1 | 0 | 1 | CD4T |
5 | 19 | 17 | 76 | 140 | 55 | 105 | 3 | 1 | 0 | 0 | CD8T |
6 | 8 | 7 | 81 | 140 | 50 | 120 | 3 | 1 | 0 | 0 | CD4Treg |
7 | 8 | 7 | 20 | 80 | 25 | 90 | 3 | 1 | 0 | 0 | CD4Tnonreg |
- exclusion.config: an 11-column tab-delimited file with the same format (without the optional Cell_Phenotype column), but for reversed filtering:
Pop_ID | DimensionX | DimensionY | Min_X | Max_X | Min_Y | Max_Y | Parent_ID | Cluster_Type(0: Clustering; 1: Bisecting; 2: Slope-based) | Visualize_or_Not | Recluster_or_Not |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 4 | 0 | 85 | 100 | 200 | 0 | 0 | 1 | 0 |
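
To show how the gating hierarchy is encoded through Pop_ID and Parent_ID, here is a minimal Python sketch that parses config rows and traces a population back to its root. It is not part of DAFi: the `Gate` type and the `parse_config` and `lineage` functions are hypothetical names, and the sketch assumes that a Parent_ID of 0 refers to the full set of events and that any row not starting with an integer (such as a header) can be skipped.

```python
import csv
import io
from typing import NamedTuple, Optional

class Gate(NamedTuple):
    pop_id: int
    dim_x: int
    dim_y: int
    min_x: float
    max_x: float
    min_y: float
    max_y: float
    parent_id: int
    cluster_type: int         # 0: clustering, 1: bisecting, 2: slope-based
    visualize: bool
    recluster: bool
    phenotype: Optional[str]  # only present in the 12-column inclusion format

def parse_config(text):
    """Parse tab-delimited gate rows; rows not starting with an integer are skipped."""
    gates = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        fields = [f.strip() for f in row]
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # blank line, header row, or malformed row
        ids = [int(f) for f in fields[:3]]
        bounds = [float(f) for f in fields[3:7]]
        parent, ctype, vis, recl = (int(f) for f in fields[7:11])
        phenotype = fields[11] if len(fields) > 11 and fields[11] else None
        gates.append(Gate(*ids, *bounds, parent, ctype, bool(vis), bool(recl), phenotype))
    return gates

def lineage(gates, pop_id):
    """Return the path of populations from the root down to pop_id."""
    by_id = {g.pop_id: g for g in gates}
    path = []
    while pop_id != 0:  # assumption: Parent_ID 0 means "all events"
        gate = by_id[pop_id]
        path.append(gate.phenotype or str(gate.pop_id))
        pop_id = gate.parent_id
    return list(reversed(path))

# The first four rows of the inclusion.config example, tab-delimited.
sample = "\n".join([
    "1\t1\t4\t20\t70\t5\t55\t0\t0\t0\t0\tLymphocyte",
    "2\t1\t2\t30\t90\t0\t110\t1\t1\t0\t1\tSinglets",
    "3\t4\t5\t100\t150\t80\t140\t2\t2\t1\t1\tLiveSinglets",
    "4\t19\t17\t76\t140\t106\t200\t3\t1\t0\t1\tCD4T",
])
gates = parse_config(sample)
print(lineage(gates, 4))
```

The same parser accepts the 11-column exclusion format, in which case the `phenotype` field is left as `None`.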
Check the releases page to obtain the latest release.
The R implementation of DAFi requires R version > 3.4 (https://cran.r-project.org/bin/). Please also install:
- flowCore (https://www.bioconductor.org/packages/release/bioc/html/flowCore.html)
- flowViz (http://bioconductor.org/packages/release/bioc/html/flowViz.html)
- ClusterR (https://cran.r-project.org/web/packages/ClusterR/index.html)
- FlowSOM (https://bioconductor.org/packages/release/bioc/html/FlowSOM.html)
For an automated build and install, including the dependent packages listed above, first install the devtools package:
install.packages("devtools")
Then start the automated install of the DAFi package:
devtools::install_github("JCVenterInstitute/DAFi-gating", build_vignettes = TRUE)
Then check the built-in vignettes of the DAFi package for more documentation:
library(DAFi)
browseVignettes("DAFi")
An icc or gcc compiler is required to compile the binary from source.
For optimal performance, compile with Intel optimization flags:
icc -O3 -xHost -o dafi_gating DAFi-gating_omp.c -lm
In addition, precompiled binaries without the enhanced optimizations are available under releases.
If you have Docker installed, you can download the pre-configured Docker build definitions and build the DAFi container, which lets you run a local Jupyter or RStudio server with all necessary packages. The dafi-jupyter and r-dafi containers are also available on Docker Hub.