Pattern Recognition for Cell-free DNA

Predict whether a FASTQ file is cfDNA or not

# predict a single file
python <single_fastq_file>

# predict files
python <fastq_file1> <fastq_file2> ... 

# predict files with wildcard
python *.fq

Warning: this tool doesn't work for trimmed FASTQ files.

Prediction output

For each file given on the command line, this tool prints one line of the form <prediction>: <filename>, for example:

cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz
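Because each output line has the fixed shape <prediction>: <filename>, it is easy to post-process. The following is a minimal sketch, not part of the tool itself; the helper name parse_prediction_line is hypothetical, and it assumes the separator is exactly a colon followed by one space, as in the sample above.

```python
from collections import Counter


def parse_prediction_line(line):
    """Split a '<prediction>: <filename>' line into (prediction, filename).

    Hypothetical helper: assumes the separator is ': ' as in the sample output.
    """
    prediction, sep, filename = line.strip().partition(": ")
    if not sep:
        raise ValueError("not a prediction line: %r" % line)
    return prediction, filename


# Sample output copied from the example above.
sample_output = """\
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz
"""

# Count how many files fell into each predicted class.
counts = Counter(
    parse_prediction_line(line)[0] for line in sample_output.splitlines()
)
```

In practice the text would come from the tool's standard output (for example, captured with subprocess or redirected to a file) rather than from a literal string.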

Add -q or --quite to enable quiet output mode, in which the tool only reports files whose prediction disagrees with their filename:

  • a file whose name contains cfdna, but whose prediction is not-cfdna
  • a file whose name does not contain cfdna, but whose prediction is cfdna
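The rule behind this mode can be sketched as a single comparison. This is an illustration of the rule as described here, not the tool's actual code: the function name is_conflict is hypothetical, and matching case-insensitively against the base filename only (not the directory path) is an assumption.

```python
import os


def is_conflict(prediction, filename, flag="cfdna"):
    """Return True when the filename and the prediction disagree.

    Assumptions (not confirmed by the tool): the flag is matched
    case-insensitively, and only against the base filename.
    """
    named_cfdna = flag in os.path.basename(filename).lower()
    # Compare for equality: 'not-cfdna' also contains the substring 'cfdna'.
    predicted_cfdna = prediction == "cfdna"
    return named_cfdna != predicted_cfdna
```

For example, is_conflict("not-cfdna", "20160220-cfdna-001_S1_R1_001.fastq.gz") is True, so that file would be reported, while a gdna file predicted as not-cfdna would stay silent.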

Train a model

This tool ships with a pre-trained model (cfdna.model) that can be used for prediction, but you can also train a model yourself.

  • copy or link all your fastq files into one folder
  • for files from cfdna, include cfdna (case-insensitive) in the filename, like 20160220-cfdna-015_S15_R1_001.fq
  • for files from genomic DNA, include gdna (case-insensitive) in the filename, like 20160220-gdna-002_S2_R1_001.fq
  • for files from FFPE DNA, include ffpe (case-insensitive) in the filename, like 20160123-ffpe-040_S0_R1_001.fq
  • run:
python /fastq_folder/*.fq
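Before training, it can help to check how your filenames will be labeled under the naming convention above. The following is a minimal sketch, assuming the default flags listed under the -c and -o options (cfdna, and gdna;ffpe separated by semicolons). The function name label_for and the label string "other" are hypothetical and are not taken from the tool's code.

```python
import os


def label_for(filename, cfdna_flag="cfdna", other_flag="gdna;ffpe"):
    """Return 'cfdna', 'other', or None for a FASTQ filename.

    Flags are semicolon-separated and matched case-insensitively against
    the base filename. The label names are illustrative only.
    """
    name = os.path.basename(filename).lower()
    cfdna_flags = [f for f in cfdna_flag.lower().split(";") if f]
    other_flags = [f for f in other_flag.lower().split(";") if f]
    if any(f in name for f in cfdna_flags):
        return "cfdna"
    if any(f in name for f in other_flags):
        return "other"
    return None  # filename carries no recognized flag


# Example filenames taken from the list above, plus one unlabeled file.
examples = [
    "20160220-cfdna-015_S15_R1_001.fq",
    "20160220-gdna-002_S2_R1_001.fq",
    "20160123-ffpe-040_S0_R1_001.fq",
    "sample_without_flag.fq",
]
labels = {name: label_for(name) for name in examples}
```

Files that map to None carry neither flag, so they are worth renaming before training. How the tool itself treats such files is not stated here.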


If you used CfdnaPattern for your publication, please cite:

Full options:

python <fastq_files> [options] 

  --version             show program's version number and exit
  -h, --help            show this help message and exit
                        specify which file to store the built model.
  -a ALGORITHM, --algorithm=ALGORITHM
                        specify which algorithm to use for classification,
                        candidates are svm/knn/rbf/rf/gnb/benchmark, rbf means
                        svm using rbf kernel, rf means random forest, gnb
                        means Gaussian Naive Bayes, benchmark will try every
                        algorithm and plot the score figure, default is knn.
  -c CFDNA_FLAG, --cfdna_flag=CFDNA_FLAG
                        specify the filename flag of cfdna files, separated by
                        semicolon. default is: cfdna
  -o OTHER_FLAG, --other_flag=OTHER_FLAG
                        specify the filename flag of other files, separated by
                        semicolon. default is: gdna;ffpe
  -p PASSES, --passes=PASSES
                        specify how many passes to do training and validating,
                        default is 10.
  -n, --no_cache_check  if the cache file exists, use it without checking the
                        identity with input files