# Synthea Setup & 1 000‑Patient FHIR Generation
This Jupyter notebook walks you through cloning **Synthea**, building it with **Gradle**, and generating a 1 000‑patient **FHIR R4** dataset on Windows 10.

Open this notebook in **VS Code** (File ▶ Open File) and execute each cell.  
When you see a `%%bash` cell it will run in the bash shell that the Python kernel uses (Git‑Bash if that is first on your *PATH*).

## 0 Prerequisites
* **Git** & **Git Bash** (install [Git for Windows](https://gitforwindows.org/))
* **winget** client (Windows 10 21H1 or later) – or install Java manually
* **Docker Desktop** (for later HAPI‑FHIR tasks)
* This notebook assumes your Windows username is `evan_` and you want data on remote drive `//Desktop-family/K/`.
Adjust paths as needed.

## 1 Install Java 17
Need to mannually install Java 17
1. Need to add PATH for system environment
2. Need to add JAVA_HOME in system environment

 make sure following powershell command show Java information

In [2]:
!powershell -Command "java -version; Write-Output \$env:JAVA_HOME"


\


java version "17.0.12" 2024-07-16 LTS
Java(TM) SE Runtime Environment (build 17.0.12+8-LTS-286)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.12+8-LTS-286, mixed mode, sharing)


## 2 Clone Synthea

In [12]:
!powershell -Command "if (Test-Path '~/synthea') { Write-Output 'Synthea directory exists at ~/synthea' } else { Write-Output 'Synthea not found in home directory' }"

Synthea not found in home directory


In [13]:
!powershell -Command "$ProgressPreference = 'Continue'; git clone https://github.com/synthetichealth/synthea.git 'C:\Users\evan_\synthea' -v"

Cloning into 'C:\Users\evan_\synthea'...
POST git-upload-pack (193 bytes)
POST git-upload-pack (gzip 3635 to 1822 bytes)


## 3 Build & test Synthea

In [16]:
!powershell -Command "$ProgressPreference = 'Continue'; cd ~/synthea -v"

In [18]:
!powershell -Command "$ProgressPreference = 'Continue'; cd ~/synthea; .\gradlew build check test -v"

Downloading https://services.gradle.org/distributions/gradle-8.9-bin.zip
............10%.............20%.............30%.............40%.............50%.............60%.............70%.............80%.............90%.............100%

Welcome to Gradle 8.9!

Here are the highlights of this release:
 - IDE Integration Improvements
 - Daemon JVM Information

For more details see https://docs.gradle.org/8.9/release-notes.html


------------------------------------------------------------
Gradle 8.9
------------------------------------------------------------

Build time:    2024-07-11 14:37:41 UTC
Revision:      d536ef36a19186ccc596d8817123e5445f30fef8

Kotlin:        1.9.23
Groovy:        3.0.21
Ant:           Apache Ant(TM) version 1.10.13 compiled on January 4 2023
Launcher JVM:  17.0.12 (Oracle Corporation 17.0.12+8-LTS-286)
Daemon JVM:    C:\Program Files\Java\jdk-17 (no JDK specified, using current Java home)
OS:            Windows 11 10.0 amd64



## 4 Generate 1 000 synthetic patients
We’ll export everything into `D:/synthea_out`.  Change `EXPORT_DIR` to any path you like.

In [None]:
!powershell -Command "$ProgressPreference = 'Continue'; 
$EXPORT_DIR='//Desktop-family/K/synthea_out'; 
cd ~/synthea; .\run_synthea -p 1000 --exporter.baseDirectory=$EXPORT_DIR
"

ARG = -p
ARG = 1000
ARG = --exporter.baseDirectory
ARG = //Desktop-family/K/synthea_out
ARG = 
syntheaArgs =  '-p','1000','--exporter.baseDirectory','//Desktop-family/K/synthea_out',
> Task :versionTxt
> Task :compileJava UP-TO-DATE
> Task :processResources UP-TO-DATE
> Task :classes UP-TO-DATE

> Task :run
Scanned 89 modules and 157 submodules.
Loading submodule modules\allergies\allergy_panel.json
Loading submodule modules\allergies\drug_allergy_incidence.json
Loading submodule modules\allergies\environmental_allergy_incidence.json
Loading submodule modules\allergies\food_allergy_incidence.json
Loading submodule modules\allergies\immunotherapy.json
Loading submodule modules\allergies\outgrow_env_allergies.json
Loading submodule modules\allergies\outgrow_food_allergies.json
Loading submodule modules\allergies\severe_allergic_reaction.json
Loading submodule modules\anemia\anemia_sub.json
Loading submodule modules\breast_cancer\chemotherapy_breast.json
Loading submodule modules\breast_c

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#noProviders for further details.


## 5 Verify output

In [None]:
!powershell -Command "$ProgressPreference = 'Continue';$EXPORT_DIR='//Desktop-family/K/synthea_out';Write-Output 'Number of patient bundles:';Get-ChildItem -Path $EXPORT_DIR/fhir/*.json | Measure-Object | Select-Object -ExpandProperty Count;Write-Output 'Peek at first bundle''s type field (should be ''transaction''):';$firstFile = (Get-ChildItem -Path $EXPORT_DIR/fhir/*.json | Select-Object -First 1).FullName;Get-Content -Path $firstFile -TotalCount 20 | Select-String -Pattern 'type' | Select-Object -First 1"

Number of patient bundles:
1148
Peek at first bundle's type field (should be 'transaction'):

  "resourceType": "Bundle",


