Oct 16, 2017

Sample re-identification pipeline:

This pipeline uses DNA reads generated by the MinION to re-identify DNA samples.

Cell line authentication:

Re-identification of cell lines in (pre)clinical research is crucial for verifying working materials. Using the MinION together with our identification pipeline allows samples to be authenticated on-site, either in the lab or in the clinic.

Running the pipeline requires a cell line database and MinION reads.


The database for Cancer Cell line authentication is available at .

The Cancer Cell line database is built from data generated by the CCLE (Broad Institute:

MinION reads:

The MinION libraries can be prepared with the appropriate genomic library preparation method provided by Oxford Nanopore Technologies. Input material ranges from 200 ng to 1000 ng.

Re-identification of forensic samples:


The database for forensic purposes is not provided here, to respect the genetic privacy of the individuals in our database.

Publicly available genomes: OpenSNP:

MinION reads:

The MinION libraries can be prepared with the appropriate genomic library preparation method provided by Oxford Nanopore Technologies.

Analysis pipeline


The pipeline requires the following components:

Detailed Installation Instructions

These programs have been tested on an Ubuntu 14.04 64-bit GNU/Linux machine. Other environments might require some adjustments.

Download the latest pipeline code

# Using GIT
git clone
cd personal-identification-pipeline

# Or using ZIP

The setup directory contains helper scripts to install the required software:

# Install required packages:
sudo ./setup/

# Install python modules:
sudo pip install -r ./setup/requirements.txt

# Install samtools 1.3.1 (will use 'sudo' automatically)

# Install bgzip/tabix 1.3.1 (will use 'sudo' automatically)

# Install BWA 0.7.15 (will use 'sudo' automatically)
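
After installation, it can help to confirm that the required programs are actually on the PATH. The following is a minimal sketch and is not part of the repository: the function name check_tools is hypothetical, and the tool names simply mirror the programs listed in the installation steps.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline repository):
# report which of the given programs cannot be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status 0 only if every tool was found.
    return "$missing"
}
```

For this pipeline, the call would presumably be `check_tools samtools bgzip tabix bwa`. A non-zero exit status means at least one program is missing.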

Building data files

The personal-identification pipeline requires a few pre-processed data files.

Download the hg19 reference genome and build the BWA index (this will take some time, depending on the machine's hardware: roughly 70 minutes on a 2.5 GHz Intel Xeon E5):
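
For orientation, the indexing step can be sketched as below. This is an illustrative assumption, not the repository's own setup script: the function name build_bwa_index is hypothetical, and it assumes the reference FASTA has already been downloaded and that bwa is installed.

```shell
#!/bin/sh
# Hypothetical helper (not the repository's own setup script):
# build a BWA index for a reference FASTA that is already on disk.
build_bwa_index() {
    fasta="$1"
    if [ ! -f "$fasta" ]; then
        echo "reference FASTA not found: $fasta" >&2
        return 1
    fi
    # 'bwa index' writes its index files (.amb, .ann, .bwt, .pac, .sa)
    # next to the FASTA file; for a human genome this is the slow step.
    bwa index "$fasta"
}
```

With the file layout used in the example commands in this README, the call would be `build_bwa_index hg19/hg19.fa`.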


Download dbSNP-138 Common and build the database (requires downloading about 620 MB; the time needed depends on the network speed):


Optionally, download Yaniv Erlich's genotype file:



The demo directory contains a simplified example of the pipeline workflow. See ./demo/ for more details.

Test re-ID:

DIR THP1 FAST5 files:

THP1 SNP-array-file:

Example usage for THP1 re-id:

./ test-run THP1-FAST5-dir/ thp1-snp-file-dir/ hg19/hg19.fa


./ -s snp138Common.fixed.txt.gz test-run THP1-FAST5-dir/ thp1-snp-file-dir/ hg19/hg19.fa

Help and Usage information

The pipeline scripts print help and usage information when run with the --help parameter (-h in the case of the shell script).

“Rapid DNA Re-Identification for Cell Line Authentication and Forensics” Zaaijer et al., 2017


Yaniv Erlich Sophie Zaaijer


Copyright (C) 2016 Yaniv Erlich (

All Rights Reserved. This program is licensed under GPL version 3. See LICENSE file for full details.