Translating Cancer Data Aggregator (CDA) data to 🔥 FHIR (Fast Healthcare Interoperability Resources) format.
- from source
# clone the repo and set up a virtual environment
python3 -m venv venv
. venv/bin/activate
pip install -e .
To run the transformer, ensure that CDA raw data is located in the ./data/raw/ directory. If you need to retrieve the raw data, please contact cancerdataaggregator @ gmail.
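As a quick pre-flight check, the requirement above can be verified before starting a long run. This is a minimal sketch, not part of cda2fhir: the helper name `raw_data_ready` is hypothetical, and it only assumes the `./data/raw/` location stated above.

```python
from pathlib import Path


def raw_data_ready(raw_dir: str = "./data/raw") -> bool:
    """Return True if raw_dir exists and contains at least one entry.

    Hypothetical helper for illustration; cda2fhir does not ship it.
    """
    path = Path(raw_dir)
    # any() stops at the first entry, so large directories are cheap to check.
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    if raw_data_ready():
        print("Raw data found; ready to run the transformer.")
    else:
        print("No raw data found in ./data/raw/.")
```

The check says nothing about whether the files are the right ones; it only catches a missing or empty directory early.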
Usage: cda2fhir transform [OPTIONS]

Options:
  -s, --save                Save FHIR ndjson to CDA2FHIR/data/META folder.
                            [default: True]
  -v, --verbose
  -ns, --n_samples TEXT     Number of samples to randomly select - max 100.
  -nd, --n_diagnosis TEXT   Number of diagnosis to randomly select - max 100.
  -nf, --n_files TEXT       Number of files to randomly select - max 100.
  -f, --transform_files     Transform CDA files to FHIR DocumentReference and Group.
  -p, --path TEXT           Path to save the FHIR NDJSON files. default is
                            CDA2FHIR/data/META.
  --help                    Show this message and exit.
- example
cda2fhir transform
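The options listed in the usage text can be combined with the bare command. The sketch below builds such an invocation from Python; the helper `build_transform_command` is hypothetical, the flags are taken from the help output, and the lower bound of 1 on the sample count is an assumption (the help text only states a maximum of 100).

```python
import shlex


def build_transform_command(n_samples=None, path=None, verbose=False):
    """Build the argument list for `cda2fhir transform` (hypothetical helper)."""
    command = ["cda2fhir", "transform"]
    if n_samples is not None:
        # The help text caps random selection at 100; the minimum of 1 is assumed.
        if not 1 <= n_samples <= 100:
            raise ValueError("n_samples must be between 1 and 100")
        command += ["--n_samples", str(n_samples)]
    if path is not None:
        command += ["--path", str(path)]
    if verbose:
        command.append("--verbose")
    return command


if __name__ == "__main__":
    command = build_transform_command(n_samples=10, path="./out", verbose=True)
    print(shlex.join(command))
    # To actually run it (requires cda2fhir to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if the output path contains spaces.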
NOTE: If you want to validate your FHIR data with GEN3, first work through the user guide, setup, and documentation of the GEN3 tracker before running the cda2fhir commands.
mv ~/.gen3/gen3_client_config.ini ~/.gen3/gen3_client_config.ini-xxx
mv ~/.gen3/gen3-client ~/.gen3/gen3-client-xxx
time cda2fhir validate
{'summary': {'Specimen': 721837, 'Observation': 731005, 'ResearchStudy': 423, 'BodyStructure': 163, 'Condition': 95262, 'ResearchSubject': 160649, 'Patient': 138738}}
real 5m
user 5m
sys 0m5.1s
mv ~/.gen3/gen3-client-xxx ~/.gen3/gen3-client
mv ~/.gen3/gen3_client_config.ini-xxx ~/.gen3/gen3_client_config.ini
This command validates your FHIR entities and the references between them. It also generates a summary count of the entities in each NDJSON file.
NOTE: Because of the size of the current data, this process may take 5 minutes or more, depending on your platform and compute power.
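To make the two checks described above concrete (reference resolution and per-type counts), here is a minimal sketch. It is not the implementation behind `cda2fhir validate`: the function names are hypothetical, and it assumes every resource sits on its own line in a `*.ndjson` file, carries `resourceType` and `id`, and uses relative references of the form `Type/id`.

```python
import json
from collections import Counter
from pathlib import Path


def iter_references(node):
    """Yield every 'reference' string found anywhere inside a FHIR resource."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def summarize_and_check(meta_dir):
    """Count resources per type and list references that match no known resource."""
    counts = Counter()
    known_ids = set()
    references = []
    for ndjson_file in sorted(Path(meta_dir).glob("*.ndjson")):
        with ndjson_file.open() as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines
                resource = json.loads(line)
                resource_type = resource["resourceType"]
                counts[resource_type] += 1
                known_ids.add(f"{resource_type}/{resource.get('id')}")
                references.extend(iter_references(resource))
    # Any reference that is not a known "Type/id" is reported as dangling,
    # including absolute URLs, which this sketch does not try to resolve.
    dangling = sorted({ref for ref in references if ref not in known_ids})
    return {"summary": dict(counts)}, dangling
```

Calling `summary, dangling = summarize_and_check("data/META")` would return a dictionary shaped like the summary shown above, plus a list of unresolved references. This sketch holds all IDs and references in memory, which is a simplification for data of this size.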
Current integration testing runs on all data and may take approximately 2 hours.
pytest --cov