To make this information easier to access, some of the data available through the Harvard Personal Genome Project page and the GET-Evidence site has been consolidated into a small SQLite database (~120 MB uncompressed). This project is a collection of scripts that download the data, consolidate it into a SQLite database, upload it to an Arvados project, and create an HTML visualization front end for easy exploration of the data.

You can explore the most recent snapshot of the Harvard Personal Genome Project database through a Curoverse-hosted collection.

Quick start

To grab the repository:

$ git clone
$ cd untap

The application needs to be served by an HTTP server, such as nginx:

$ cd $HOME
$ sudo apt-get install nginx
$ sudo /etc/init.d/nginx start
$ sudo mkdir -p /var/www
$ sudo tee /etc/nginx/sites-enabled/untap <<EOF
	server {
	  root /var/www;

	  location / {
	  }
	}
EOF
$ sudo ln -s $HOME/untap /var/www/untap
$ sudo chmod -R 777 /var/www/untap
$ sudo nginx -s reload

or with Python's built-in HTTP server module:

$ cd html
$ python -m SimpleHTTPServer    # Python 2; on Python 3 use: python3 -m http.server

Now we need to obtain a dataset. Either 1) download the snapshot from the Untap collection hosted on Curoverse, or 2) follow the instructions in the next section to scrape Tapestry and build your own snapshot. In both cases, put the database in the root directory, i.e. /untap/hu-pgp.sqlite3.gz.
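
Before pointing the browser at the application, it can help to confirm that the downloaded snapshot is intact. The following is a minimal sketch, not part of the Untap scripts: it decompresses the gzipped file, checks the standard 16-byte SQLite file header, and runs SQLite's built-in integrity check. The function names and the output path are hypothetical and chosen only for illustration.

```python
import gzip
import shutil
import sqlite3

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def unpack_snapshot(gz_path, out_path):
    """Decompress a gzipped snapshot (e.g. hu-pgp.sqlite3.gz) to out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def looks_like_sqlite(path):
    """Return True if the file begins with the SQLite 3 header."""
    with open(path, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC


def passes_integrity_check(path):
    """Return True if PRAGMA integrity_check reports 'ok'."""
    con = sqlite3.connect(path)
    try:
        (result,) = con.execute("PRAGMA integrity_check").fetchone()
    finally:
        con.close()
    return result == "ok"

# Example usage (paths are assumptions):
#   db = unpack_snapshot("hu-pgp.sqlite3.gz", "/tmp/hu-pgp.sqlite3")
#   print(looks_like_sqlite(db), passes_integrity_check(db))
```

The decompressed copy is only used for checking; the application itself expects the .gz file at the location given above.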

Now if you go to Untap.html you should see the application running and tabs such as "Summary" should show graphs when you select a dropdown option (e.g. "allergies").

Updating the Database

The Quick start uses a static snapshot of the database and may not be up-to-date. To re-scrape all the data yourself for a more up-to-date copy, see the following instructions.

You may need several dependencies if they're not installed already.

$ sudo apt-get install jq
$ sudo add-apt-repository -y ppa:ethereum/ethereum
$ sudo apt-get install golang
$ mkdir -p ~/go; echo 'export GOPATH=$HOME/go' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HOME/go/bin:/usr/local/go/bin' >> ~/.bashrc
$ source ~/.bashrc
$ go get
$ sudo apt-get install parallel

To download the data and build the database snapshot, run:

$ ./public-database-snapshot

If you would like to upload to an Arvados project (requires an account on an Arvados system and appropriate config files):

$ ./upload-to-arvados

Installing the html directory in the appropriate place will allow you to see the visualization. Care needs to be taken to make sure the SQLite database file gets copied over properly.
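
One way to make sure the database file is copied over properly is to compare checksums of the source and the copy. The sketch below is an illustration under assumptions, not something the Untap scripts provide: the helper names are hypothetical, and the source and destination paths depend on where your web server root lives.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst_dir):
    """Copy src into dst_dir and raise if the copy's checksum differs."""
    src = Path(src)
    dst = Path(dst_dir) / src.name
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch after copying {src} to {dst}")
    return dst

# Example usage (both paths are assumptions):
#   copy_verified("hu-pgp.sqlite3.gz", "/var/www/untap")
```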

Guided Walkthrough

For a guided walkthrough of how to use this application, see Introduction.


Since the SQLite database is so small (~120 MB uncompressed), it can be loaded into the browser and explored directly. There are a few canned visualizations, explanations of the SQLite schema, and custom visualizations available. The database sometimes takes a while to load, so please be patient if you don't immediately see any graphs in the Summary, Variants or Custom section.

Summary Information Visualizations

This includes some canned summary statistics for the Harvard Personal Genome Project cohort, such as age distribution, gender, and ethnicity.



This shows a matrix of participants who have genomic data and variants.


Custom Visualization

This allows you to run your own custom queries. There are some example queries that can be selected in the lower right-hand corner.



This page gives the schema for the SQLite database provided.



This page gives some simple queries that allow you to explore the underlying tables that exist in the SQLite database.
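
The same kind of exploration can be done outside the browser. As a minimal sketch, assuming you have a decompressed copy of the database on disk, the following lists every table with its column names and row count. It relies only on SQLite's own catalog (sqlite_master and PRAGMA table_info), so it makes no assumptions about the actual Untap schema; the function name is hypothetical.

```python
import sqlite3


def describe_tables(db_path):
    """Return {table_name: {"columns": [...], "rows": n}} for a SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
            columns = [row[1] for row in con.execute(f"PRAGMA table_info({quoted})")]
            (count,) = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            summary[name] = {"columns": columns, "rows": count}
        return summary
    finally:
        con.close()

# Example usage (the path is an assumption):
#   for table, info in describe_tables("/tmp/hu-pgp.sqlite3").items():
#       print(table, info["rows"], info["columns"])
```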


Source code is provided under AGPLv3. All collected data from the Harvard Personal Genome Project is under CC0.