<a target="_parent" href="https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/safe-synthetics/hipaa-transform-synthesize.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# ⚕ Using Safe Synthetics to support HIPAA compliance

This notebook uses Safe Synthetics configurations tailored to support HIPAA compliance. You can run it with the sample dataset or with your own dataset.

After you specify a dataset, this notebook holds out 5% of the records to calculate quality and privacy metrics at the end. It then redacts true identifiers in your dataset, such as names and addresses, and synthesizes your data to obfuscate quasi-identifiers. Finally, it generates a report so you can measure the quality and privacy of your synthetic data.
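As a rough illustration of the holdout idea described above, the sketch below sets aside 5% of a list of records using only the Python standard library. The function name `split_holdout`, the seed, and the rounding rule are assumptions made for this example; Safe Synthetics performs its own holdout internally and may select and round differently.

```python
import random


def split_holdout(rows, holdout_fraction=0.05, seed=0):
    """Split rows into (train, holdout) using a reproducible shuffle.

    Hypothetical helper for illustration only; not part of gretel-client.
    """
    if not 0 <= holdout_fraction < 1:
        raise ValueError("holdout_fraction must be in [0, 1)")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Assumption: round to the nearest whole record.
    n_holdout = round(len(rows) * holdout_fraction)
    holdout_idx = set(indices[:n_holdout])
    train = [row for i, row in enumerate(rows) if i not in holdout_idx]
    holdout = [row for i, row in enumerate(rows) if i in holdout_idx]
    return train, holdout


# Stand-in records: integers instead of real patient rows.
rows = list(range(1725))
train, holdout = split_holdout(rows)
print(len(train), len(holdout))
```

The holdout records never reach the model, so they can serve as an unseen reference when quality and privacy are measured at the end.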

## 💾 Install Gretel SDK

In [1]:
%%capture

%pip install -U gretel-client

## 🌐 Configure your Gretel Session

In [2]:
from gretel_client.navigator_client import Gretel

gretel = Gretel(api_key="prompt")

Gretel API Key: ··········
Logged in as 20040817dkn@gmail.com ✅


INFO:gretel_client.navigator_client:Using project: default-sdk-project-5551d3d6ac5971e
INFO:gretel_client.navigator_client:Project link: https://console.gretel.ai/proj_2zp921NWiDOu8m1uwNxlAfSzWlj


## 🔬 Preview input data

In [3]:
import pandas as pd

# Sample dataset; replace this URL to try your own data.
ds = "https://gretel-datasets.s3.us-west-2.amazonaws.com/hipaa_patients.csv"
df = pd.read_csv(ds)

print(f"Number of rows: {len(df)}")
df.head()

Number of rows: 1725


| Unnamed: 0 | patient_id | first_name | last_name | date_of_birth | race | weight | height | event_type | event_date | event_name | provider_name | reason | result | details | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pmc-6431471-1 | Aisha | Liang | 04/17/1960 | Asian | 135.0 | 61.0 | Admission | 04/17/2023 | Initial admission | Dr. Rosa Fernandez | Generalized malaise, dyspnea, cough | \N | {"intensity": "N/A", "location": "N/A"} | Patient admitted with symptoms including malai... |
| 1 | pmc-6203866-2 | Alejandro | Gomez | 05/16/1978 | Hispanic | 165.0 | 70.0 | Admission | 01/10/2023 | \N | St. Mary's Hospital | \N | \N | {"intensity": "medium", "location": "thorax"} | Patient admitted for work-up related to thorax... |
| 2 | pmc-6589433-1 | Harpreet | Chaudhry | 03/10/1941 | Asian | 144.0 | 61.0 | Symptom | 02/20/2018 | Foreign body sensation under tongue and dry mouth | Oral Health Associates | foreign body sensation under tongue and dry mouth | \N | {"intensity":"medium","location":"under the to... | Patient reported foreign body sensation under ... |
| 3 | pmc-6320630-1 | Aisha | Smith | 06/20/2005 | Black or African American | 134.0 | 64.0 | Admission | 08/01/2023 | \N | Central Medical Hospital | Checkup for kyphoscoliosis | Admitted | {"intensity": null, "location": null} | The patient is admitted for further diagnosis ... |
| 4 | pmc-8678078-1 | Ana | Luna | 11/23/1976 | Hispanic | 140.0 | 65.0 | Symptom | 02/12/2020 | Refractory pain reported in the cervical spine | Dr. R. Gomez | Pain in cervical spine | \N | {"intensity":"severe","location":"cervical spi... | Patient reports severe ongoing pain and limite... |
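The preview also hints at why quasi-identifiers matter even once names are redacted: a combination of otherwise harmless columns can still single out a record. The sketch below counts how many of the five preview rows are unique on `race` and `height`. The choice of columns and the helper name `unique_combinations` are assumptions for illustration; this is not how Safe Synthetics measures privacy.

```python
from collections import Counter

# Values copied from the race and height columns of the five preview rows.
preview_rows = [
    {"race": "Asian", "height": 61.0},
    {"race": "Hispanic", "height": 70.0},
    {"race": "Asian", "height": 61.0},
    {"race": "Black or African American", "height": 64.0},
    {"race": "Hispanic", "height": 65.0},
]


def unique_combinations(rows, columns):
    """Return the rows whose combination of values in `columns` occurs exactly once."""
    def key(row):
        return tuple(row[c] for c in columns)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


singled_out = unique_combinations(preview_rows, ["race", "height"])
print(len(singled_out))  # 3 of the 5 preview rows are unique on (race, height)
```

Adding more columns, such as `date_of_birth`, generally makes more rows unique, which is the motivation for synthesizing the data rather than only redacting names.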


## 🏃 Run Safe Synthetics

In [4]:
synthetic_dataset = (
    gretel.safe_synthetic_dataset
    .from_data_source(ds)
    .transform("transform/hipaa")  # redact true identifiers
    .synthesize(
        "tabular_ft",
        {"train": {"params": {"num_input_records_to_sample": 10000}}},
        num_records=1000,  # number of synthetic records to generate
    )
    .create()
)

INFO:gretel_client.safe_synthetics.dataset:Configuring generator for data source: https://gretel-datasets.s3.us-west-2.amazonaws.com/hipaa_patients.csv
INFO:gretel_client.safe_synthetics.dataset:Configuring holdout: 0.05
INFO:gretel_client.safe_synthetics.dataset:Configuring transform step
INFO:gretel_client.safe_synthetics.dataset:Configuring synthetic data generation model: tabular_ft
INFO:gretel_client.workflows.builder:▶️ Creating Workflow: w_2zp9BfVen4935xNK8DFgpR3sexW
INFO:gretel_client.workflows.builder:▶️ Created Workflow Run: wr_2zp9BfV0iedWvoHUutJJokyVleF
INFO:gretel_client.workflows.builder:🔗 Workflow Run console link: https://console.gretel.ai/workflows/w_2zp9BfVen4935xNK8DFgpR3sexW/runs/wr_2zp9BfV0iedWvoHUutJJokyVleF
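The nested dictionary passed to `.synthesize(...)` can be easier to read and adjust when it is built separately. A minimal sketch follows, assuming a hypothetical helper named `tabular_ft_config`; only the key path `train` → `params` → `num_input_records_to_sample` is taken from the cell above, and no other configuration keys are implied.

```python
def tabular_ft_config(num_input_records_to_sample):
    """Build the nested training config used in the cell above.

    Hypothetical convenience helper; not part of gretel-client.
    """
    return {
        "train": {
            "params": {
                "num_input_records_to_sample": num_input_records_to_sample,
            }
        }
    }


config = tabular_ft_config(10000)
print(config)
```

The resulting `config` is identical to the dictionary literal in the cell above, so it can be passed as the second argument to `.synthesize("tabular_ft", ...)`.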


In [5]:
synthetic_dataset.wait_until_done()

INFO:gretel_client.workflows.workflow:Fetching task logs for workflow run wr_2zp9BfV0iedWvoHUutJJokyVleF
INFO:gretel_client.workflows.workflow:Got task wt_2zp9BkPK28bOg9kyN29yby8V5vh
INFO:gretel_client.workflows.workflow:Workflow run is now in status: RUN_STATUS_ACTIVE
INFO:gretel_client.workflows.workflow:[read-data-source] 2025-07-13 14:30:31.524201+00:00 Preparing step 'read-data-source'
INFO:gretel_client.workflows.workflow:[read-data-source] 2025-07-13 14:30:39.259223+00:00 Starting 'data_source' task execution
INFO:gretel_client.workflows.workflow:[read-data-source] 2025-07-13 14:30:41.497083+00:00 Task 'data_source' executed successfully
INFO:gretel_client.workflows.workflow:[read-data-source] 2025-07-13 14:30:41.497509+00:00 Task execution completed. Saving task outputs.
INFO:gretel_client.workflows.workflow:[read-data-source] 2025-07-13 14:30:42.149866+00:00 Task outputs saved.
INFO:gretel_client.workflows.workflow:[read-data-source] Task Status is now: RUN_STATUS_COMPLETED

## 🔬 Preview output data

In [None]:
synthetic_dataset.dataset.df.head()

## 📊 Evaluate quality & privacy results

In [None]:
synthetic_dataset.report.table

In [None]:
synthetic_dataset.report.display_in_notebook()