Example: using Cohorts to manage TCGA-BLCA data for analysis
- Query the GDC for clinical and sample data for TCGA-BLCA (the query code will eventually be merged into pygdc)
- Set up a Cohort using Cohorts to manage these data
- Run a mock analysis of this Cohort to demonstrate the functionality of Cohorts
Before using this code, you will need to complete a few setup steps:
- Copy config_template.ini to config.ini
- Install gdc-client (the GDC Data Transfer Tool)
  - Install it per the instructions
  - Edit the variable GDC_CLIENT_PATH in config.ini
- Log in to the GDC, request access to TCGA, and download an authentication token
  - Gain authorization
  - Download the authentication token
  - Edit the variable GDC_TOKEN_PATH in config.ini
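Before fetching any data, it can help to confirm that config.ini is readable and that both paths point at real files. The sketch below assumes config.ini is a standard INI file readable by Python's configparser; since the section layout of config_template.ini is not shown here, it searches every section for GDC_CLIENT_PATH and GDC_TOKEN_PATH. The helper name check_config is hypothetical and not part of this repo.

```python
import configparser
from pathlib import Path

# Variable names come from the setup steps above; the section they live in
# is an assumption, so every section (including DEFAULT) is searched.
REQUIRED_KEYS = ("GDC_CLIENT_PATH", "GDC_TOKEN_PATH")


def check_config(path="config.ini"):
    """Hypothetical helper: return {key: problem} for config.ini (empty if OK)."""
    # interpolation=None so paths containing '%' are read literally
    parser = configparser.ConfigParser(interpolation=None)
    try:
        found = parser.read(path)
    except configparser.Error as exc:
        return {path: f"could not parse: {exc}"}
    if not found:
        return {path: "file not found"}

    problems = {}
    sections = [parser.default_section] + parser.sections()
    for key in REQUIRED_KEYS:
        # Key lookup is case-insensitive with the default optionxform
        values = [parser[s][key] for s in sections if key in parser[s]]
        if not values:
            problems[key] = "not set"
        elif not Path(values[0]).expanduser().exists():
            problems[key] = f"path does not exist: {values[0]}"
    return problems


if __name__ == "__main__":
    issues = check_config()
    for key, problem in issues.items():
        print(f"{key}: {problem}")
    if not issues:
        print("config.ini looks OK")
```

An empty result only means the two paths exist; it does not check that the token is still valid on the GDC side.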
Once these items are set up, you can run one or both of the refresh_*.py scripts to fetch data from the GDC portal.
Then you can try out the various notebooks (*.ipynb) in the repo, or use them as a starting point for further analysis.
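If you prefer to drive the refresh step from Python (for example, from a notebook), a minimal sketch is shown below. It assumes the refresh_*.py scripts sit in the repo root, take no command-line arguments, and expect to be run from that directory so they can find config.ini; none of this is confirmed here. The helper run_refresh_scripts and its names parameter are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_refresh_scripts(repo_dir=".", names=None):
    """Hypothetical helper: run refresh_*.py scripts with the current interpreter.

    names: optional iterable of script file names to restrict the run to
    ("one or both" of the scripts). Returns the names that were run, sorted.
    """
    repo = Path(repo_dir)
    scripts = sorted(repo.glob("refresh_*.py"))
    if names is not None:
        wanted = set(names)
        scripts = [s for s in scripts if s.name in wanted]
    for script in scripts:
        # cwd=repo so relative paths such as config.ini resolve (assumption);
        # check=True raises CalledProcessError on the first failing script
        subprocess.run([sys.executable, script.name], cwd=repo, check=True)
    return [s.name for s in scripts]


if __name__ == "__main__":
    print(run_refresh_scripts())
```

Using sys.executable ensures the scripts run in the same environment where query_tcga is installed.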
The refresh_*.py scripts use the query_tcga package, which cannot currently be installed from PyPI with pip.
Instead, install it directly from GitHub:
pip install git+https://github.com/jburos/query_tcga
This code will eventually be merged into the cleaner pygdc package; for now, merging the two codebases is a work in progress.
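To confirm the install worked before running the refresh scripts, you can check whether the package is importable. This sketch assumes the package installs a top-level module named query_tcga (matching the repository name); the helper is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Hypothetical helper: True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module name assumed to match the repository name
    if is_installed("query_tcga"):
        print("query_tcga is installed")
    else:
        print("query_tcga not found; install it from GitHub as described above")
```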