Snaptron

fast webservices based query tool for searching exon-exon splice junctions and related sample metadata

Ask questions in the project's Gitter chat: https://gitter.im/snaptron/Lobby

User Guide

http://snaptron.cs.jhu.edu

Deployment

git clone https://github.com/ChristopherWilks/snaptron.git

NOTE: Avoid setting Snaptron up on a Lustre or NFS filesystem since Snaptron's reliance on SQLite may cause problems on those systems.

To set up an instance based on a particular compilation (srav1, srav2, gtex, tcga):

./deploy_snaptron.sh srav1

This process will take at least tens of minutes and may take an hour or more depending on your bandwidth, storage, and compute capacity.

It has to build all the dependencies, download all the source data, and create multiple indices.

For the largest of the compilations (srav2) the final data footprint will be ~75 gigabytes on disk. About 54 gigabytes of this is the SQLite database, which is created locally once the raw data has been downloaded from the Snaptron server. The data transfer itself is therefore ~20 gigabytes.

The PyLucene install in particular requires extensive dependencies and several minutes to build. It will display various errors and warnings along the way; these are not critical as long as the build ends with this output:

Finished processing dependencies for lucene==4.10.1

Enabling uncompressed Tabix

To enable the uncompressed version of the Tabix indices (faster than the compressed version), you must first download our modified HTSlib 1.2.1 source (http://snaptron.cs.jhu.edu/data/htslib-1.2.1_nocomp.tar.gz), which sets the compression level to 0 (no compression).

You must build the source and make sure the resulting bgzip binary comes before any other bgzip version in your PATH.
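A minimal build sketch is shown below; the extracted directory name and the PATH handling are assumptions, so adjust them to your environment:

wget http://snaptron.cs.jhu.edu/data/htslib-1.2.1_nocomp.tar.gz
tar -xzf htslib-1.2.1_nocomp.tar.gz
cd htslib-1.2.1            # extracted directory name may differ
make                       # builds bgzip in the source directory
export PATH=$(pwd):$PATH   # put this bgzip ahead of any system copy
which bgzip                # should resolve to the freshly built binary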

Then run the above script with an additional argument:

./deploy_snaptron.sh srav1 1

Running the Snaptron server

Within the Snaptron working directory:

source python/bin/activate
python ./srav1_snaptron_server --no-daemon

The Snaptron server defaults to port 1555 on localhost.
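As a quick smoke test, you can query the running server directly. The example below is only illustrative: it assumes a gene-region query in the style described in the user guide, and the exact local path may differ from the public, proxied URLs, so consult the user guide for the precise query syntax.

curl 'http://localhost:1555/snaptron?regions=CD99'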

Tests

Snaptron has both unit tests and system tests ("round trip testing"). These only work for the SRAv1 and SRAv2 compilations.

In a separate terminal in the Snaptron working directory run:

Unit Tests

source ./python/bin/activate
python ./test_snaptron.py

System Tests

These require the Snaptron server to be running.

./tests.sh 1

The system tests use file diffing to determine if the services are working correctly.
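Conceptually, each check boils down to comparing a fresh server response against a stored expected output, for example (the file names and query here are purely illustrative, not the actual test fixtures):

curl -s 'http://localhost:1555/snaptron?regions=CD99' > observed.tsv
diff observed.tsv expected.tsv && echo PASS || echo FAIL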