<a href="https://colab.research.google.com/github/Hoilap/Lab_demo/blob/main/docs/notebooks/safe-synthetics/hipaa-transform-synthesize.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ⚕ Using Safe Synthetics to support HIPAA compliance

This notebook uses purpose-built Safe Synthetics configurations to support HIPAA compliance. You can run it on the sample dataset or on your own data.

After you specify a dataset, the notebook holds out 5% of the records to calculate quality and privacy metrics at the end. It then redacts direct identifiers, such as names and addresses, and synthesizes the data to obfuscate quasi-identifiers. Finally, it generates a report that measures the quality and privacy of the synthetic data.
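
To make the 5% holdout concrete, here is a minimal sketch of a random holdout split in plain Python. It only illustrates the idea and is not Gretel's implementation; the helper name `holdout_split`, the seed, and the made-up rows are assumptions for this example.

```python
import random


def holdout_split(rows, holdout_fraction=0.05, seed=0):
    """Split rows into (train, holdout), with about holdout_fraction held out.

    Hypothetical helper for illustration; Safe Synthetics performs its own
    holdout internally.
    """
    indices = list(range(len(rows)))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_holdout = round(len(rows) * holdout_fraction)
    holdout_idx = set(indices[:n_holdout])
    train = [row for i, row in enumerate(rows) if i not in holdout_idx]
    holdout = [row for i, row in enumerate(rows) if i in holdout_idx]
    return train, holdout


# Made-up records standing in for a patient table.
rows = [{"patient_id": f"p-{i}"} for i in range(1000)]
train, holdout = holdout_split(rows)
print(len(train), len(holdout))  # 950 50
```

The held-out rows never reach the synthesis step, so they can serve as an unseen reference when the quality and privacy metrics are computed.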

## 💾 Install Gretel SDK

In [None]:
%%capture

%pip install -U gretel-client

## 🌐 Configure your Gretel Session

In [None]:
from gretel_client.navigator_client import Gretel

gretel = Gretel(api_key="prompt")
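
For non-interactive runs, you may prefer to read the key from the environment and fall back to the `"prompt"` value used above. The sketch below assumes the key is stored in an environment variable named `GRETEL_API_KEY`; both that name and the helper `resolve_api_key` are assumptions for illustration, not part of the notebook.

```python
import os


def resolve_api_key(env_var="GRETEL_API_KEY"):
    """Return the key stored in env_var, or "prompt" if it is unset or empty.

    Hypothetical helper; the variable name is an assumption.
    """
    return os.environ.get(env_var) or "prompt"


api_key = resolve_api_key()
# Pass the result to the session exactly as in the cell above:
# gretel = Gretel(api_key=api_key)
```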

## 🔬 Preview input data

In [None]:
import pandas as pd

# Sample dataset; replace with the location of your own CSV to use your own data.
ds = "https://gretel-datasets.s3.us-west-2.amazonaws.com/hipaa_patients.csv"
df = pd.read_csv(ds)

print(f"Number of rows: {len(df)}")
df.head()

## 🏃 Run Safe Synthetics

In [None]:
# Redact identifiers with the HIPAA transform configuration, then synthesize 1,000 records.
synthetic_dataset = (
    gretel.safe_synthetic_dataset
    .from_data_source(ds)
    .transform("transform/hipaa")
    .synthesize("tabular_ft", {"train": {"params": {"num_input_records_to_sample": 10000}}}, num_records=1000)
    .create()
)

In [None]:
synthetic_dataset.wait_until_done()

## 🔬 Preview output data

In [6]:
synthetic_dataset.dataset.df.head()

| | patient_id | first_name | last_name | date_of_birth | race | weight | height | event_type | event_date | event_name | provider_name | reason | result | details | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pmc-6283709-1 | Debra | Parker | 05/02/1947 | Asian | 180.0 | 63.0 | Admission | 01/20/2023 | \N | South General Hospital | Hyperthermia and symptoms of heatstroke detected | Stable condition upon admission | {} | Patient presented with severe heat exhaustion,... |
| 1 | pmc-6415376-2 | Jennifer | Ellis | 11/21/1949 | White | 150.0 | 62.0 | Symptom | 04/12/2023 | Abdominal pain assessment | Dr. Imani Patel | Reporting abdominal pain in left upper quadran... | Non-tender, non-bloody | {"intensity":"moderate", "location":"left uppe... | The patient showed moderate tenderness during ... |
| 2 | pmc-6433976-1 | Tracy | Wallace | 11/22/1947 | Black or African American | 185.0 | 70.0 | Diagnosis Test | 03/15/2023 | HIV test | Central Clinic | Routine checkup | Positive | {"dosage":null,"frequency":null,"referral":null} | The HIV test showed positive results confirmin... |
| 3 | pmc-6431957-1 | Mary | Bradley | 12/08/2013 | Other | 50.0 | 45.0 | Medical Examination | 07/01/2022 | Initial Evaluation | University Medical Center | Exposure to smoke | Normal vitals except increased heart rate | {"intensity": null, "locaton": null, "dosage":... | Patient shows signs of distress after exposure... |
| 4 | pmc-6163617-1 | Allen | Carrillo | 01/23/1964 | Hispanic | 178.0 | 70.0 | Admission | 10/12/2023 | Null | Saint Mary's Healthcare | Discharge from hospital due to COVID pandemic | Patients were transferred to ICU upon arrival ... | {} | Our patients met no immediate complications ot... |
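
As a quick spot check on the preview, you could look for exact combinations of direct identifiers that appear in both the input and the synthetic output. The sketch below is a simple illustration under stated assumptions, not a privacy guarantee and not a substitute for the report: the helper `exact_identifier_overlap` and the sample rows are made up, and the column names are taken from the preview above.

```python
def exact_identifier_overlap(original_rows, synthetic_rows, keys):
    """Return the set of key tuples that appear in both datasets.

    Hypothetical helper for illustration; each row is a dict keyed by column name.
    """
    original = {tuple(row[k] for k in keys) for row in original_rows}
    synthetic = {tuple(row[k] for k in keys) for row in synthetic_rows}
    return original & synthetic


# Made-up rows for illustration only.
original_rows = [
    {"first_name": "Ana", "last_name": "Lopez", "date_of_birth": "01/02/1980"},
    {"first_name": "Ben", "last_name": "Stone", "date_of_birth": "03/04/1975"},
]
synthetic_rows = [
    {"first_name": "Cara", "last_name": "Ng", "date_of_birth": "05/06/1990"},
    {"first_name": "Ben", "last_name": "Stone", "date_of_birth": "03/04/1975"},
]

keys = ["first_name", "last_name", "date_of_birth"]
overlap = exact_identifier_overlap(original_rows, synthetic_rows, keys)
print(overlap)  # {('Ben', 'Stone', '03/04/1975')}
```

With the notebook's DataFrames, the row lists could be obtained with `df.to_dict("records")` and `synthetic_dataset.dataset.df.to_dict("records")`. An empty result only shows that no exact matches exist on the chosen columns.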


## 📊 Evaluate quality & privacy results

In [None]:
synthetic_dataset.report.table

In [None]:
synthetic_dataset.report.display_in_notebook()