This repository contains a tutorial for exploring FHIR data with Apache Spark in an interactive notebook. It is built on the Bunsen library and uses synthetic data generated by the Synthea project, which eliminates data security concerns.
Also see the proposal for SQL on FHIR. The Bunsen project is being enhanced to comply with that specification, so a future version of this tutorial can follow it.
The easiest way to run this is in a Docker container. If you don't have Docker already, download and install the community edition here.
Next, the Bunsen project offers a Docker image that bundles Bunsen, Project Jupyter, Apache Spark, and the needed dependencies. It totals about 5 gigabytes when uncompressed, but that figure includes common OS and Python images that may be reused for other needs. All of it is installed with the following command:
docker pull cerner/bunsen-notebook
See the Bunsen Docker documentation for more information.
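To confirm that the pull succeeded before moving on, a small check like the following may help. It is only a sketch: the `image_status` function is a hypothetical helper written for this README, not part of Bunsen or Docker, and it relies on the standard `docker image inspect` command returning a non-zero exit code when an image is not available locally.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bunsen): reports whether an image
# is available locally. Prints one of three status words.
image_status() {
    if ! command -v docker >/dev/null 2>&1; then
        # The docker CLI is not on the PATH at all.
        echo "docker-not-installed"
    elif docker image inspect "$1" >/dev/null 2>&1; then
        # inspect exits 0 only when the image exists locally.
        echo "present"
    else
        # Also reached when the Docker daemon is not running.
        echo "missing"
    fi
}

IMAGE="cerner/bunsen-notebook"
echo "$IMAGE: $(image_status "$IMAGE")"
```

If this reports `missing`, run the `docker pull` command above again and check that the Docker daemon is running.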
Launch the Tutorial
Once the above Docker image is installed, clone or download this bunsen-tutorial git repository. Then change your directory to the bunsen-tutorial folder and launch the tutorial with the following command:
docker run -p 8888:8888 -p 4040:4040 -v $PWD:/home/jovyan/work cerner/bunsen-notebook
or on Windows:
docker run -p 8888:8888 -p 4040:4040 -v %cd%:/home/jovyan/work cerner/bunsen-notebook
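The launch commands above differ only in how the current directory is written. As a rough illustration of what each flag does, the sketch below assembles the same command from its parts and prints it. The `bunsen_run_cmd` function is a hypothetical helper for this README. The port descriptions assume the usual defaults (8888 for Jupyter, 4040 for the Spark application UI).

```shell
#!/bin/sh
# Hypothetical helper: prints the launch command for a given tutorial
# directory (defaults to the current directory). It only prints the
# command; it does not start a container.
bunsen_run_cmd() {
    workdir="${1:-$PWD}"
    # -p 8888:8888  publishes the Jupyter notebook port (assumed default)
    # -p 4040:4040  publishes the Spark application UI port (assumed default)
    # -v DIR:/home/jovyan/work  mounts the tutorial folder into the
    #               container so the notebooks appear under work/
    printf 'docker run -p 8888:8888 -p 4040:4040 -v %s:/home/jovyan/work cerner/bunsen-notebook\n' "$workdir"
}

bunsen_run_cmd "$PWD"
```

Because the tutorial folder is mounted rather than copied, edits made in Jupyter are written back to the files on the host. Note that the printed command is meant for reading; a path containing spaces would need quoting before it could be pasted into a shell.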
Once the container starts, click on the URL displayed at the bottom of the terminal output and Jupyter will open in your web browser. From there, navigate to work/getting_started.ipynb and follow the instructions in that notebook!
First-time users can simply read the instructions and execute the cells. As you become familiar with the system, feel free to experiment by editing queries or code and seeing what happens. Any changes you make will be saved to your copy of the notebooks themselves.