Cancer Gene Trust Daemon
[The Cancer Gene Trust] (https://genomicsandhealth.org/work-products-demonstration-projects/cancer-gene-trust) is a simple, real-time, global network for sharing somatic cancer data and associated clinical information.
The Cancer Gene Trust Daemon (cgtd) stores steward submissions in a distributed, replicated and decentralized content addressable database. It provides a basic HTML interface and RESTful API to add, list and authenticate submissions as well as the peering relationship between stewards.
search.cancergenetrust.org is an example search engine accross all current stewards that builds a searchable index with network viewer.
Submissions consist of a JSON manifest with a list of fields and files. Fields typically include de-identified clinical data (i.e. tumor type). Files typically consist of somatic variant vcf files and gene expression tsv file. Manifest's and files are stored and referenced by the [multihash] (https://github.com/jbenet/multihash) of their content.
Eash steward has a top level JSON index file containing it's dns domain, list of submissions by multihash and list of peers by address. Each steward has a public private key pair which is used to authenticate their submissions. A steward's address is the multihash of their public key. A steward's address resolves to the multihash of the latest version of its index. Updates to a steward's index file are signed using their private key. This provides authentication and authorization for its contents as well as any other content referenced via multihash within it including all submissions.
The current underlying implementation leverages [ipfs] (http://ipfs.io) for storage and replication and ipns for address resolution and public/private key operations. The server is implemented using python and [flask] (http://flask.pocoo.org/)
Running a Production Instance
Note: The only dependencies for the following is make and docker.
Start the ipfs server and store in ./data/ and generate a default configuration and public/private key pair in ./data/config:
Reset the steward's index to no submissions, no peers, set a domain:
DOMAIN=lorem.edu make reset
Startup the cgtd container listening on port 80:
To verify both cgtd and ipfs are working you can query your steward's address:
To make a test submission:
or via curl:
docker exec -it cgtd curl -X POST localhost:5000/v0/submissions \ -F "a_field_name=a_field_value" \ -F files=@tests/ALL/SSM-PAKMVD-09A-01D.vcf
To access the submission:
Access control for mutable operations such as adding submissions or peers is restricted to localhost as a poor mans authentication. As a result we curl from within the cgtd container above.
To populate a server with a bunch of test data:
Finally to see the index for you server including submissions:
Build, Debug and Test Locally
Build a local cgtd docker container:
Start ipfs, initialize, and start a cgtd container in debug:
make ipfs make reset make debug
This runs the cgtd container listening on port 5000 out of the local folder so you can make changes and it will live reload.
To run tests open another terminal window and:
Vancouver GA4GH 2016 Presentation: https://goo.gl/F5Asym
Overview and White Paper: https://goo.gl/1RyUQ2
Search Example: http://search.cancergenetrust.org