This repository has been archived by the owner on May 31, 2024. It is now read-only.

An open-source Java tool to automatically extract and convert all clinical and genomic data from the Genomic Data Commons to BED, GTF, CSV, and JSON format

cumbof/OpenGDC

OpenGDC

OpenGDC is an open-source Java tool that automatically extracts, extends, and converts all the genomic experiments and clinical information from the Genomic Data Commons (GDC) portal (https://gdc.cancer.gov/) into BED, GTF, CSV, JSON, and XML formats.

How to use

This is a NetBeans project. Clone the repo, load it into NetBeans, set the GUI.java class as the main class, and compile (JRE 1.8 or higher is required). Double-click the produced JAR to start using OpenGDC.

Build a repository

The software includes a built-in mode that creates a repository containing all the original, publicly available GDC data together with the converted data. To enable this mode, set the UpdateScheduler.java class as the main class of the project and build your JAR. The JAR requires a date as its argument, as in the following example:

java -jar UpdateScheduler.jar 2020-01-01

The specified date is used internally to filter and retrieve the GDC data produced from that date onwards (Jan 01, 2020 in this case). To keep the repository up to date automatically, the easiest solution is to use crontab to schedule the execution of the software once every X days. This can be done with a simple bash script like the following one:

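#!/bin/bash
# Read the date of the last run from the final line of the history file
datetime=$( tail -n 1 opengdc-history.txt )
# Retrieve the GDC data produced from that date onwards
java -jar UpdateScheduler.jar "$datetime"
# Record today's date as the new last-run date
date +%Y-%m-%d >> opengdc-history.txt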

This script relies on an external text file, opengdc-history.txt, which keeps track of the last day on which the software was run. We recommend initialising opengdc-history.txt with a single line containing a date well before the start of the GDC program, so that the first run builds the repo with all the publicly available data.
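
The bookkeeping described above can be made a little more defensive. The following is only a sketch under stated assumptions, not part of OpenGDC: the function names, the HISTORY_FILE and SEED_DATE variables, and the seed date 2000-01-01 are illustrative, and recording the date only on success assumes that the JAR exits with a non-zero status when a run fails.

```shell
#!/bin/bash
# Hypothetical wrapper around UpdateScheduler.jar (names and defaults are assumptions).

HISTORY_FILE="${HISTORY_FILE:-opengdc-history.txt}"
# Illustrative seed: any date that precedes the start of the GDC program would do.
SEED_DATE="${SEED_DATE:-2000-01-01}"

# Create the history file with the seed date if it is missing or empty.
init_history() {
    if [ ! -s "$HISTORY_FILE" ]; then
        echo "$SEED_DATE" > "$HISTORY_FILE"
    fi
}

# Succeed only if the argument has the shape YYYY-MM-DD (digits and dashes only;
# this does not check that the month and day are in range).
is_valid_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the date stored on the last line of the history file.
last_run_date() {
    tail -n 1 "$HISTORY_FILE"
}

# Run the update from the last recorded date and append today's date
# only if the java command reports success.
run_update() {
    init_history
    since=$(last_run_date)
    if ! is_valid_date "$since"; then
        echo "Unexpected date in $HISTORY_FILE: '$since'" >&2
        return 1
    fi
    java -jar UpdateScheduler.jar "$since" && date +%Y-%m-%d >> "$HISTORY_FILE"
}
```

To use it, call run_update at the end of the script and schedule the script with crontab. As an example with a hypothetical path, the entry `0 3 * * 0 /path/to/update-opengdc.sh` would run it every Sunday at 03:00. Cron jobs usually start in the user's home directory, so the script should first change to the directory that contains the JAR and the history file, or use absolute paths.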

Credits

Please credit OpenGDC in your manuscript by citing:

Eleonora Cappelli, Fabio Cumbo, Anna Bernasconi, Arif Canakoglu, Stefano Ceri, Marco Masseroli, and Emanuel Weitschek. "OpenGDC: unifying, modeling, integrating cancer genomic data and clinical metadata" Appl. Sci. 2020, 10(18), 6367. https://doi.org/10.3390/app10186367
