Transplant2Mongo allows users to easily insert the complex set of UNOS STAR tab-separated value files containing U.S. organ transplant data into a MongoDB database using Python.
The intended user is a researcher interested in analyzing UNOS STAR data. This software parses the UNOS STAR text files and inserts them into a MongoDB database so that the data can be easily queried using open-source software.
Transplant2Mongo has been tested with UNOS STAR files from the OPTN obtained in June 2014 and March 2015.
Installation requires the following components:
- MongoDB 2.6+
- Python 3.6 and the packages: pymongo, pandas, and tqdm
- UNOS STAR File data, which must be obtained directly from the OPTN.
These instructions have been tested on macOS 10.14, MongoDB 3.4, and Python 3.6; and Ubuntu 14.04, MongoDB 2.6, and Python 3.7.
After installing MongoDB using the installation instructions, start it with
mongod
If you're new to MongoDB, you may first need to create the default directory mongo uses to run the server.
mkdir -p /data/db
Permission issues may prevent creation of the default directory. In that case, you can specify a different directory where you do have read/write access:
mkdir -p ~/data/db
mongod --dbpath ~/data/db
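Before moving on, it can help to confirm that something is actually listening where mongod was started. The following is a minimal sketch using only the Python standard library. It assumes MongoDB's default port, 27017, and the helper name `port_is_open` is hypothetical, not part of transplant2mongo. It only checks that the TCP port accepts connections, not that the listener is a healthy MongoDB server.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:27017")
    else:
        print("Nothing is listening on localhost:27017; is mongod running?")
```

If mongod was started with a non-default port or on another machine, pass the matching `host` and `port` values.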
Next, install Python dependencies using
pip install pymongo pandas tqdm seaborn
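To check that the packages named in the pip command above are visible to the Python interpreter you intend to use, a small standard-library check along these lines may help. This is a sketch, not part of transplant2mongo; the function name `missing_packages` is hypothetical.

```python
import importlib.util

# Packages named in the pip command above; adjust if your setup differs.
REQUIRED = ["pymongo", "pandas", "tqdm", "seaborn"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    # find_spec locates a package without importing it; None means not found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the same `python` executable you will use for the import, since pip may have installed into a different environment.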
Download transplant2mongo from GitHub using
git clone https://github.com/ceharvs/transplant2mongo/
cd transplant2mongo
The GitHub repository includes sample data (synthetic, not based on real patient information) for testing the install. We suggest running the code with the sample data first to verify that the install completed and your environment is set up properly.
To create a database from the sample data, execute the following from the command line:
make
This will import the test data into a Mongo database. To test the import, execute
make test-sample-data
After verifying the installation using the test data, either modify the UNOS_DATA variable in the Makefile to point to the directory of actual STAR files or pass it as a variable on the command line. Quote the path if it contains spaces, e.g.,
make UNOS_DATA="/path/to/Delimited Text File/"
Parameters in the Makefile include:
- UNOS_DATA: The location of the 'Delimited Text File' directory. By default, this is the sample data directory included in the repository.
- COMPONENTS: The UNOS STAR files that you have access to and that are in the UNOS_DATA directory. This is a list of file types (which include deceased living intestine kidpan liver thoracic) separated by a single space. The default setting for the Makefile uses all possible components.
- SERVER: The location of the database client. By default this is 'localhost', and it should remain 'localhost' unless the database will be hosted on a remote server. MongoDB must be running at this location.
- DB: The name of the database to be used within MongoDB. By default, this is 'organ_data'.
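When the import is driven from a script, the parameters above can be passed to make as an argument list, which sidesteps shell word-splitting for paths that contain spaces. The sketch below assumes only the four variable names listed above; the helper names `build_make_command` and `run_make` are hypothetical, and whether the Makefile itself copes with spaces inside UNOS_DATA is not something this sketch can guarantee.

```python
import subprocess


def build_make_command(unos_data, components=None, server=None, db=None, target=None):
    """Build an argument list for `make` with Makefile variable overrides."""
    cmd = ["make"]
    if target:
        cmd.append(target)
    # Each VAR=value pair is one list element, so spaces stay inside the value.
    cmd.append("UNOS_DATA=" + unos_data)
    if components:
        cmd.append("COMPONENTS=" + " ".join(components))
    if server:
        cmd.append("SERVER=" + server)
    if db:
        cmd.append("DB=" + db)
    return cmd


def run_make(cmd):
    """Run the command from the current directory and raise if make fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder path and a subset of components, for illustration only.
    print(build_make_command(
        "/path/to/Delimited Text File/",
        components=["deceased", "living"],
        db="organ_data",
    ))
```

Calling `run_make(...)` on the result from inside the transplant2mongo directory would start the import. Variables that are left out keep the defaults set in the Makefile.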
Robo 3T (https://robomongo.org/) can be used to browse the MongoDB database. After installation, select "new connection" and use the defaults.
An example Jupyter notebook, query-samples.ipynb, can be used to get started with queries, statistical analysis, and graphics.
Launch Jupyter Notebooks by running
jupyter notebook query-examples.ipynb
from the command line within the directory transplant2mongo. This should open a web browser where you can click to open the file.
The included Python script, query.py, performs sample database queries and prints the output to a CSV file. By default the output will print to output.csv, but this can be altered via command line arguments:
python query.py --server localhost --db organ_data --collection Deceased_Donor --attributes ABO AGE_DON GENDER_DON --file_name output.csv
The script takes in the client, database name, collection name, and a list of attributes to retrieve and print out to the CSV file. Running with the --test option will only pull the first five entries and won't save the output to a file.
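For readers who want to adapt this kind of query in their own code, the following is a rough sketch of the same idea using pymongo and the standard csv module. It is not the implementation of query.py. The function names are hypothetical, and the collection and attribute names in the usage note are taken from the example command above.

```python
import csv


def build_projection(attributes):
    """Map each requested attribute to 1 and drop MongoDB's automatic _id."""
    projection = {name: 1 for name in attributes}
    projection["_id"] = 0
    return projection


def fetch_rows(server, db, collection, attributes, limit=0):
    """Query MongoDB and return the matching documents as a list of dicts."""
    # Imported here so the helpers in this sketch work without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(server)
    try:
        cursor = client[db][collection].find({}, build_projection(attributes))
        if limit:
            # A limit of 0 means "no limit", so only apply positive values.
            cursor = cursor.limit(limit)
        return list(cursor)
    finally:
        client.close()


def write_csv(rows, attributes, file_name):
    """Write rows to a CSV file, leaving blanks for attributes a row lacks."""
    with open(file_name, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=attributes, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
```

With MongoDB running and the data imported, `fetch_rows("localhost", "organ_data", "Deceased_Donor", ["ABO", "AGE_DON", "GENDER_DON"], limit=5)` would return up to five documents, which `write_csv` can then save. Documents in MongoDB need not share the same fields, which is why missing attributes are written as blanks.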
Christine Harvey, The MITRE Corporation (ceharvey@mitre.org / ceharvs@gmail.com)
Approved for Public Release; Distribution Unlimited. The MITRE Corporation. Case Number 16-2039.
The data reported here have been supplied by the United Network for Organ Sharing as the contractor for the Organ Procurement and Transplantation Network. The interpretation and reporting of these data are the responsibility of the author(s) and in no way should be seen as an official policy of or interpretation by the OPTN or the U.S. Government.