Ultrafast clustering of single-cell flow cytometry data using FlowGrid
Authors: Xiaoxin Ye and Joshua W. K. Ho
Contact: email@example.com
Copyright © 2018, Victor Chang Cardiac Research Institute
Input data format
The FlowGrid algorithm can be applied to data sets in many formats, but the sample code only accepts CSV files. In the CSV file, the first row contains the feature names and the columns are separated by commas (","). If you have a file of true labels, pass it with --l filename to compute the ARI (adjusted Rand index) of the FlowGrid result.
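As an illustration of the input layout described above, the following minimal sketch writes and reads back a small CSV with a header row of feature names and comma-separated values. The feature names (marker_1, marker_2), the values, and the helper functions are hypothetical and are not part of the FlowGrid sample code.

```python
import csv
import io


def write_input_csv(handle, feature_names, rows):
    """Write a header row of feature names followed by one row per cell."""
    writer = csv.writer(handle)  # comma is the default delimiter
    writer.writerow(feature_names)
    writer.writerows(rows)


def read_input_csv(handle):
    """Read the header row, then parse every remaining row as floats."""
    reader = csv.reader(handle)
    feature_names = next(reader)
    data = [[float(value) for value in row] for row in reader if row]
    return feature_names, data


# Hypothetical two-marker data set with two cells, kept in memory for the demo.
buffer = io.StringIO()
write_input_csv(buffer, ["marker_1", "marker_2"], [[0.1, 2.5], [3.0, 4.2]])
buffer.seek(0)
names, data = read_input_csv(buffer)
print(names)
print(data)
```

To produce a real input file, the same helpers could be used with a handle opened via open("my_data.csv", "w", newline="").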
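Before using the package, install its dependencies: scikit-learn, NumPy, and SciPy. You can install them from the requirements file:
pip install -r requirements.txt --user
or install them directly (note that the package name on PyPI is scikit-learn, not sklearn):
pip install scikit-learn numpy scipy --user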
The arguments of the sample code are summarized in the table below.
| Argument | Description | Required |
|---|---|---|
| --f | the input file name | required |
| --n | number of bins | required |
| --eps | maximum distance between two bins | required |
| --t | threshold for high density bin | optional (default: 40) |
| --o | the output file name | optional (default: out.csv) |
| --l | the true label file name | optional |
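For readers who want to wrap or reimplement the command line, the table above could be expressed with argparse roughly as follows. This is only a sketch and not the actual contents of sample_code.py: the argument types (int for --n and --t, float for --eps) and the function name build_parser are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (types are assumed)."""
    parser = argparse.ArgumentParser(description="Run FlowGrid on a CSV file.")
    parser.add_argument("--f", required=True, help="the input file name")
    parser.add_argument("--n", required=True, type=int, help="number of bins")
    parser.add_argument("--eps", required=True, type=float,
                        help="maximum distance between two bins")
    parser.add_argument("--t", type=int, default=40,
                        help="threshold for high density bin")
    parser.add_argument("--o", default="out.csv", help="the output file name")
    parser.add_argument("--l", default=None, help="the true label file name")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--f", "sample_data.csv", "--n", "4", "--eps", "1.1",
     "--l", "sample_label.csv"]
)
print(args.f, args.n, args.eps, args.t, args.o, args.l)
```

Omitting any of --f, --n, or --eps makes argparse exit with a usage error, which matches the "required" column of the table.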
After installing the dependencies, you can run FlowGrid on the sample data with the sample code.
python sample_code.py --f sample_data.csv --n 4 --eps 1.1 --l sample_label.csv
The predicted labels are saved to out.csv, and the sample output is as follows.
The number of cells is: 23377
The number of dimensions is: 4
runing time: 0.027
ARI:0.9816
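The ARI in the output is the adjusted Rand index, which compares the predicted labels with the true labels and does not depend on how the clusters are numbered. The sketch below computes it in plain Python from the standard pair-counting formula, purely to show what the score measures. It is not taken from the FlowGrid code, and the function name is hypothetical; scikit-learn offers the same metric as sklearn.metrics.adjusted_rand_score.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index of two labelings of the same items."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Count pairs of items grouped together in both, and in each, labeling.
    both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())

    expected = in_true * in_pred / total_pairs  # agreement expected by chance
    maximum = (in_true + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in one cluster
    return (both - expected) / (maximum - expected)


# Same partition with swapped cluster names gives a perfect score.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

A value of 1 means the two partitions are identical, and values near 0 mean agreement no better than chance.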