diff --git a/nemo/NeMo-Safe-Synthesizer/README.md b/nemo/NeMo-Safe-Synthesizer/README.md
new file mode 100644
index 00000000..805d45b2
--- /dev/null
+++ b/nemo/NeMo-Safe-Synthesizer/README.md
@@ -0,0 +1,32 @@
+# NeMo Safe Synthesizer Example Notebooks
+
+
+This directory contains the tutorial notebooks for getting started with NeMo Safe Synthesizer.
+
+## 📦 Set Up the Environment
+
+We will use `uv`, a Python package and project manager, to set up the environment and install the necessary dependencies. If you don't have `uv` installed, follow the installation instructions in the [uv documentation](https://docs.astral.sh/uv/getting-started/installation/).
+
+Install the SDK as follows:
+
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install "nemo-microservices[safe-synthesizer]"
+```
+
+
+Be sure to select this virtual environment as your kernel when running the notebooks.
+
+## 🚀 Deploying the NeMo Safe Synthesizer Microservice
+
+To run these notebooks, you'll need access to a deployment of the NeMo Safe Synthesizer microservice. You have two deployment options:
+
+
+### 🐳 Deploy the NeMo Safe Synthesizer Microservice Locally
+
+Follow our quickstart guide to deploy the NeMo Safe Synthesizer microservice locally via Docker Compose.
+
+### 🚀 Deploy NeMo Microservices Platform with Helm
+
+Follow the Helm installation guide to deploy the NeMo Microservices Platform.
diff --git a/nemo/NeMo-Safe-Synthesizer/advanced/advanced_privacy.ipynb b/nemo/NeMo-Safe-Synthesizer/advanced/advanced_privacy.ipynb
new file mode 100644
index 00000000..f8560a21
--- /dev/null
+++ b/nemo/NeMo-Safe-Synthesizer/advanced/advanced_privacy.ipynb
@@ -0,0 +1,304 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "630e3e17",
+ "metadata": {},
+ "source": [
+ "# 🔐 NeMo Safe Synthesizer: Advanced Privacy (Differential Privacy)\n",
+ "\n",
+ "> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.\n",
+ "\n",
+ "\n",
+ "\n",
+ "In this notebook, we create synthetic tabular data using the NeMo Microservices Python SDK with differential privacy enabled. The notebook should take about 1.5 hours to run.\n",
+ "\n",
+ "After completing this notebook, you'll be able to:\n",
+ "- **Use the NeMo Microservices SDK** to interact with Safe Synthesizer\n",
+ "- **Enable differential privacy** to provide additional privacy protection\n",
+ "- **Access an evaluation report** on the quality and privacy of the synthetic data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a538526a",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8be84f5d",
+ "metadata": {},
+ "source": [
+ "#### 💾 Install dependencies\n",
+ "\n",
+ "Ensure you have a NeMo Microservices Platform deployment available. If you're using a managed or remote deployment, have the correct base URLs and tokens ready."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9f5d6f5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "from nemo_microservices import NeMoMicroservices\n",
+ "from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder\n",
+ "\n",
+ "import logging\n",
+ "\n",
+ "logging.basicConfig(level=logging.WARNING)\n",
+ "logging.getLogger(\"httpx\").setLevel(logging.WARNING)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7395f0c8",
+ "metadata": {},
+ "source": [
+ "### ⚙️ Initialize the NeMo Safe Synthesizer Client\n",
+ "\n",
+ "- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.\n",
+ "- `http://localhost:8080` is the default URL for `base_url` in the quickstart.\n",
+ "- If using a managed or remote deployment, ensure you use the correct base URLs and tokens."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8c15ab93",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "client = NeMoMicroservices(\n",
+ " base_url=\"http://localhost:8080\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f1cfb12",
+ "metadata": {},
+ "source": [
+ "NeMo DataStore is launched as one of the services. We'll use it to manage storage, so set the following:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "426186a3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datastore_config = {\n",
+ " \"endpoint\": \"http://localhost:3000/v1/hf\",\n",
+ " \"token\": \"\",\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d66c819",
+ "metadata": {},
+ "source": [
+ "## 📥 Load input data\n",
+ "\n",
+ "Safe Synthesizer learns the patterns and correlations in an input dataset to produce synthetic data with similar properties. Use the sample dataset provided, or change the following cell to try it with your own data.\n",
+ "\n",
+ "The sample dataset is a set of records on customer credit card default payments. It includes columns containing Personally Identifiable Information (PII), such as sex, education level, marital status, and age. In addition, it contains several billing and payment columns and a binary indicator of whether the customer defaulted on the next month's payment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9c989a42",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install ucimlrepo || uv pip install ucimlrepo"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7204f213",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from ucimlrepo import fetch_ucirepo\n",
+ "\n",
+ "# Fetch the dataset (UCI ML Repository id 350)\n",
+ "default_of_credit_card_clients = fetch_ucirepo(id=350)\n",
+ "df = default_of_credit_card_clients.data.original\n",
+ "\n",
+ "\n",
+ "# Display the first few rows of the DataFrame\n",
+ "print(df.head())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d8ca3a11",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87d72c68",
+ "metadata": {},
+ "source": [
+ "## 🏗️ Create a Safe Synthesizer job\n",
+ "\n",
+ "The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.\n",
+ "\n",
+ "This job will:\n",
+ "- Initialize the builder with the NeMo Microservices client.\n",
+ "- Use the loaded DataFrame as the input data source.\n",
+ "- Configure the job to use the specified datastore for model storage.\n",
+ "- Enable automatic replacement of personally identifiable information (PII).\n",
+ "- Enable differential privacy (DP) with a configurable epsilon.\n",
+ "- Use structured generation to enforce the schema during data generation.\n",
+ "- Submit the job to the microservices platform."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85d9de56",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "job = (\n",
+ " SafeSynthesizerBuilder(client)\n",
+ " .from_data_source(df)\n",
+ " .with_datastore(datastore_config)\n",
+ " .with_replace_pii()\n",
+ " .with_differential_privacy(dp_enabled=True, epsilon=8.0)\n",
+ " .with_generate(use_structured_generation=True)\n",
+ " .create_job()\n",
+ ")\n",
+ "\n",
+ "print(f\"job_id = {job.job_id}\")\n",
+ "job.wait_for_completion()\n",
+ "\n",
+ "print(f\"Job finished with status {job.fetch_status()}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fa2eacb2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# If your notebook shuts down, that's okay: your job keeps running on the microservices platform.\n",
+ "# You can get the same job object and interact with it again by uncommenting the following code\n",
+ "# snippet and filling in the job ID from the previous cell's output.\n",
+ "\n",
+ "# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob\n",
+ "# job = SafeSynthesizerJob(job_id=\"\", client=client)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "285d4a9d",
+ "metadata": {},
+ "source": [
+ "## 👀 View synthetic data\n",
+ "\n",
+ "After the job completes, fetch the generated synthetic dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7f25574a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch the synthetic data created by the job\n",
+ "synthetic_df = job.fetch_data()\n",
+ "synthetic_df\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "472b4f38",
+ "metadata": {},
+ "source": [
+ "## 📊 View evaluation report\n",
+ "\n",
+ "An evaluation comparing the synthetic data to the input data is performed automatically.\n",
+ "\n",
+ "- Programmatically access key scores (quality and privacy).\n",
+ "- Download the full HTML report with charts and detailed metrics.\n",
+ "- Display the report inline below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b691127",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Print selected information from the job summary\n",
+ "summary = job.fetch_summary()\n",
+ "print(\n",
+ " f\"Synthetic data quality score (0-10, higher is better): {summary.synthetic_data_quality_score}\"\n",
+ ")\n",
+ "print(f\"Data privacy score (0-10, higher is better): {summary.data_privacy_score}\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d5b1030a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Download the full evaluation report to your local machine\n",
+ "job.save_report(\"evaluation_report.html\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "45f7e22b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch and display the full evaluation report inline\n",
+ "job.display_report_in_notebook()"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "kendrickb-notebooks",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
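Note on `epsilon` in `advanced_privacy.ipynb`: the notebook sets `epsilon=8.0` without showing what the parameter controls. The sketch below is a generic textbook illustration using the Laplace mechanism on a counting query. It is not how Safe Synthesizer applies differential privacy, which the notebook does not describe. The helper names (`laplace_noise`, `private_count`, `mean_abs_error`) and the count of 1000 are hypothetical. It only shows the general trade-off: a smaller epsilon means more noise and stronger privacy, and a larger epsilon means less noise and weaker privacy.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP; a counting query has sensitivity 1."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon: float, rng: random.Random, trials: int = 10_000) -> float:
    """Average distance between the noisy count and the true count."""
    true_count = 1000  # made-up value for illustration
    total = sum(
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for epsilon in (0.5, 8.0):
        print(f"epsilon={epsilon}: mean absolute error ~ {mean_abs_error(epsilon, rng):.3f}")
```

The expected absolute error of Laplace noise equals its scale, `1 / epsilon` here, so the run with `epsilon=0.5` should show a noticeably larger error than the run with `epsilon=8.0`.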
diff --git a/nemo/NeMo-Safe-Synthesizer/advanced/replace_pii_only.ipynb b/nemo/NeMo-Safe-Synthesizer/advanced/replace_pii_only.ipynb
new file mode 100644
index 00000000..6025a901
--- /dev/null
+++ b/nemo/NeMo-Safe-Synthesizer/advanced/replace_pii_only.ipynb
@@ -0,0 +1,250 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "630e3e17",
+ "metadata": {},
+ "source": [
+ "# 🔒 NeMo Safe Synthesizer: PII Replacement Only\n",
+ "\n",
+ "> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.\n",
+ "\n",
+ "\n",
+ "\n",
+ "In this notebook, we demonstrate how to use the NeMo Microservices Python SDK to replace PII in a tabular dataset. The notebook should take about 15 minutes to run.\n",
+ "\n",
+ "After completing this notebook, you'll be able to:\n",
+ "- **Use the NeMo Microservices SDK** to interact with Safe Synthesizer\n",
+ "- **Run a job to perform PII replacement only** (no novel data generation)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8be84f5d",
+ "metadata": {},
+ "source": [
+ "#### 💾 Install dependencies\n",
+ "\n",
+ "Ensure you have a NeMo Microservices Platform deployment available. If you're using a managed or remote deployment, have the correct base URLs and tokens ready."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9f5d6f5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from nemo_microservices import NeMoMicroservices\n",
+ "from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder\n",
+ "\n",
+ "import logging\n",
+ "logging.basicConfig(level=logging.WARNING)\n",
+ "logging.getLogger(\"httpx\").setLevel(logging.WARNING)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "53bb2807",
+ "metadata": {},
+ "source": [
+ "### ⚙️ Initialize the NeMo Safe Synthesizer Client\n",
+ "\n",
+ "- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.\n",
+ "- `http://localhost:8080` is the default URL for `base_url` in the quickstart.\n",
+ "- If using a managed or remote deployment, ensure you use the correct base URLs and tokens."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8c15ab93",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "client = NeMoMicroservices(\n",
+ " base_url=\"http://localhost:8080\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3e1c5697",
+ "metadata": {},
+ "source": [
+ "NeMo DataStore is launched as one of the services. We'll use it to manage storage, so set the following:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "016213ab",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datastore_config = {\n",
+ " \"endpoint\": \"http://localhost:3000/v1/hf\",\n",
+ " \"token\": \"\",\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d66c819",
+ "metadata": {},
+ "source": [
+ "## 📥 Load input data\n",
+ "\n",
+ "Safe Synthesizer processes your input dataset and returns the same rows with PII replaced. For this tutorial we load a small public sample dataset. Replace it with your own data if desired.\n",
+ "\n",
+ "The Dolly dataset is an open-source dataset of instruction-following records. Each record contains (1) a free-text prompt that could be sent to an LLM, (2) a context description to help the LLM determine the answer, (3) a response that could come from the LLM, and (4) the instruction category, such as classification, open QA, closed QA, information extraction, or brainstorming. The text in each of the first three fields sometimes contains Personally Identifiable Information, such as names, birth dates, and locations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7204f213",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "df = pd.read_json(\n",
+ " \"hf://datasets/databricks/databricks-dolly-15k/databricks-dolly-15k.jsonl\",\n",
+ " lines=True,\n",
+ ")\n",
+ "print(df.head())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87d72c68",
+ "metadata": {},
+ "source": [
+ "## 🏗️ Create a Safe Synthesizer job\n",
+ "\n",
+ "The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.\n",
+ "\n",
+ "This job will:\n",
+ "- Initialize the builder with the NeMo Microservices client.\n",
+ "- Use the loaded DataFrame as the input data source.\n",
+ "- Configure the job to use the specified datastore for model storage.\n",
+ "- Enable automatic replacement of personally identifiable information (PII).\n",
+ "- Submit the job to the microservices platform."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85d9de56",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "job = (\n",
+ " SafeSynthesizerBuilder(client)\n",
+ " .from_data_source(df)\n",
+ " .with_datastore(datastore_config)\n",
+ " .with_replace_pii()\n",
+ " .create_job()\n",
+ ")\n",
+ "\n",
+ "print(f\"job_id = {job.job_id}\")\n",
+ "job.wait_for_completion()\n",
+ "\n",
+ "print(f\"Job finished with status {job.fetch_status()}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fa2eacb2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# If your notebook shuts down, that's okay: your job keeps running on the microservices platform.\n",
+ "# You can get the same job object and interact with it again by uncommenting the following code\n",
+ "# snippet and filling in the job ID from the previous cell's output.\n",
+ "\n",
+ "# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob\n",
+ "# job = SafeSynthesizerJob(job_id=\"\", client=client)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "285d4a9d",
+ "metadata": {},
+ "source": [
+ "## 👀 View output data\n",
+ "\n",
+ "After the job completes, fetch the output with PII replaced."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7f25574a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch the job output data with PII replaced\n",
+ "output_df = job.fetch_data()\n",
+ "output_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "571efc39",
+ "metadata": {},
+ "source": [
+ "## 📊 View PII report\n",
+ "\n",
+ "A report summarizing the PII replacement is created automatically for every job.\n",
+ "\n",
+ "You can download the full HTML report or display it inline below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bba96175",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Download the full evaluation report to your local machine\n",
+ "job.save_report(\"evaluation_report.html\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "45f7e22b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch and display the full evaluation report inline\n",
+ "job.display_report_in_notebook()"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "kendrickb-notebooks",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
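Note on inspecting the output of `replace_pii_only.ipynb`: the notebook says the job returns the same rows with PII replaced, so one way to see what changed is a cell-by-cell comparison of input and output. The sketch below is a minimal, standard-library version on made-up rows. `changed_cells` is a hypothetical helper, and it assumes the output keeps the input's row order and column names, which the notebook does not confirm. With DataFrames, the rows could be obtained via `df.to_dict("records")` and `output_df.to_dict("records")`.

```python
def changed_cells(original_rows, replaced_rows):
    """Return (row_index, column, before, after) for every cell whose value differs."""
    if len(original_rows) != len(replaced_rows):
        raise ValueError("row counts differ, so rows cannot be compared pairwise")
    changes = []
    for index, (before, after) in enumerate(zip(original_rows, replaced_rows)):
        for column, old_value in before.items():
            new_value = after.get(column)
            if old_value != new_value:
                changes.append((index, column, old_value, new_value))
    return changes


# Made-up rows shaped like the Dolly records (instruction, response, category).
original = [
    {"instruction": "Who wrote the letter?", "response": "Alice Smith wrote it.", "category": "closed_qa"},
    {"instruction": "Name a primary color.", "response": "Red.", "category": "open_qa"},
]
replaced = [
    {"instruction": "Who wrote the letter?", "response": "Dana Brown wrote it.", "category": "closed_qa"},
    {"instruction": "Name a primary color.", "response": "Red.", "category": "open_qa"},
]

for row_index, column, before, after in changed_cells(original, replaced):
    print(f"row {row_index}, column {column!r}: {before!r} -> {after!r}")
```

Only the first row's `response` differs in this toy data, so the loop reports a single change.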
diff --git a/nemo/NeMo-Safe-Synthesizer/intro/safe_synthesizer_101.ipynb b/nemo/NeMo-Safe-Synthesizer/intro/safe_synthesizer_101.ipynb
new file mode 100644
index 00000000..e52612d1
--- /dev/null
+++ b/nemo/NeMo-Safe-Synthesizer/intro/safe_synthesizer_101.ipynb
@@ -0,0 +1,281 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "630e3e17",
+ "metadata": {},
+ "source": [
+ "# 🎛️ NeMo Safe Synthesizer 101: The Basics\n",
+ "\n",
+ "> ⚠️ **Warning**: NeMo Safe Synthesizer is in Early Access and not recommended for production use.\n",
+ "\n",
+ "\n",
+ "\n",
+ "In this notebook, we demonstrate how to create a synthetic version of a tabular dataset using the NeMo Microservices Python SDK. The notebook should take about 20 minutes to run.\n",
+ "\n",
+ "After completing this notebook, you'll be able to:\n",
+ "- Use the NeMo Microservices SDK to interact with Safe Synthesizer\n",
+ "- Create novel synthetic data that follows the statistical properties of your input dataset\n",
+ "- Access an evaluation report on synthetic data quality and privacy\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8be84f5d",
+ "metadata": {},
+ "source": [
+ "#### 💾 Install dependencies\n",
+ "\n",
+ "**IMPORTANT** 👉 Ensure you have a NeMo Microservices Platform deployment available. Follow the quickstart or Helm chart instructions in your environment's setup guide. You may need to restart your kernel after installing dependencies.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9f5d6f5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "from nemo_microservices import NeMoMicroservices\n",
+ "from nemo_microservices.beta.safe_synthesizer.builder import SafeSynthesizerBuilder\n",
+ "\n",
+ "import logging\n",
+ "logging.basicConfig(level=logging.WARNING)\n",
+ "logging.getLogger(\"httpx\").setLevel(logging.WARNING)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "53bb2807",
+ "metadata": {},
+ "source": [
+ "### ⚙️ Initialize the NeMo Safe Synthesizer Client\n",
+ "\n",
+ "- The Python SDK provides a wrapper around the NeMo Microservices Platform APIs.\n",
+ "- `http://localhost:8080` is the default URL for the client's `base_url` in the quickstart.\n",
+ "- If using a managed or remote deployment, ensure you use the correct base URLs and tokens.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8c15ab93",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "client = NeMoMicroservices(\n",
+ " base_url=\"http://localhost:8080\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "74d72ef7",
+ "metadata": {},
+ "source": [
+ "NeMo DataStore is launched as one of the services. We'll use it to manage storage, so set the following:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ab037a3a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datastore_config = {\n",
+ " \"endpoint\": \"http://localhost:3000/v1/hf\",\n",
+ " \"token\": \"\",\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d66c819",
+ "metadata": {},
+ "source": [
+ "## 📥 Load input data\n",
+ "\n",
+ "Safe Synthesizer learns the patterns and correlations in your input dataset to produce synthetic data with similar properties. For this tutorial, we will use a small public sample dataset. Replace it with your own data if desired.\n",
+ "\n",
+ "The sample dataset used here is a set of women's clothing reviews, including age, product category, rating, and review text. Some of the reviews contain Personally Identifiable Information (PII), such as height, weight, age, and location."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "daa955b6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install kagglehub || uv pip install kagglehub"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7204f213",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import kagglehub\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Download the latest version of the dataset\n",
+ "path = kagglehub.dataset_download(\"nicapotato/womens-ecommerce-clothing-reviews\")\n",
+ "df = pd.read_csv(f\"{path}/Womens Clothing E-Commerce Reviews.csv\", index_col=0)\n",
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87d72c68",
+ "metadata": {},
+ "source": [
+ "## 🏗️ Create a Safe Synthesizer job\n",
+ "\n",
+ "The `SafeSynthesizerBuilder` provides a fluent interface to configure and submit jobs.\n",
+ "\n",
+ "The following code creates and submits a job:\n",
+ "- `SafeSynthesizerBuilder(client)`: initialize with the NeMo Microservices client.\n",
+ "- `.from_data_source(df)`: set the input data source.\n",
+ "- `.with_datastore(datastore_config)`: configure model artifact storage.\n",
+ "- `.with_replace_pii()`: enable automatic replacement of PII.\n",
+ "- `.synthesize()`: train and generate synthetic data.\n",
+ "- `.create_job()`: submit the job to the platform.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85d9de56",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "job = (\n",
+ " SafeSynthesizerBuilder(client)\n",
+ " .from_data_source(df)\n",
+ " .with_datastore(datastore_config)\n",
+ " .with_replace_pii()\n",
+ " .synthesize()\n",
+ " .create_job()\n",
+ ")\n",
+ "\n",
+ "print(f\"job_id = {job.job_id}\")\n",
+ "job.wait_for_completion()\n",
+ "\n",
+ "print(f\"Job finished with status {job.fetch_status()}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fa2eacb2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# If your notebook shuts down, that's okay: your job keeps running on the microservices platform.\n",
+ "# You can get the same job object and interact with it again by uncommenting the following code\n",
+ "# snippet and filling in the job ID from the previous cell's output.\n",
+ "\n",
+ "# from nemo_microservices.beta.safe_synthesizer.sdk.job import SafeSynthesizerJob\n",
+ "# job = SafeSynthesizerJob(job_id=\"\", client=client)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "285d4a9d",
+ "metadata": {},
+ "source": [
+ "## 👀 View synthetic data\n",
+ "\n",
+ "After the job completes, fetch the generated synthetic dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7f25574a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch the synthetic data created by the job\n",
+ "synthetic_df = job.fetch_data()\n",
+ "synthetic_df\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b25f152",
+ "metadata": {},
+ "source": [
+ "## 📊 View evaluation report\n",
+ "\n",
+ "An evaluation comparing the synthetic data to the input data is performed automatically. You can:\n",
+ "\n",
+ "- **Inspect key scores**: overall synthetic data quality and privacy.\n",
+ "- **Download the full HTML report**: includes charts and detailed metrics.\n",
+ "- **Display the report inline**: useful when viewing in notebook environments.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b691127",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Print selected information from the job summary\n",
+ "summary = job.fetch_summary()\n",
+ "print(\n",
+ " f\"Synthetic data quality score (0-10, higher is better): {summary.synthetic_data_quality_score}\"\n",
+ ")\n",
+ "print(f\"Data privacy score (0-10, higher is better): {summary.data_privacy_score}\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "39e62ea9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Download the full evaluation report to your local machine\n",
+ "job.save_report(\"evaluation_report.html\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "45f7e22b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Fetch and display the full evaluation report inline\n",
+ "job.display_report_in_notebook()"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "kendrickb-notebooks",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
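Note on using the summary scores from `safe_synthesizer_101.ipynb`: the notebook prints `synthetic_data_quality_score` and `data_privacy_score` (both 0-10, higher is better) but stops there. A possible next step is to gate downstream use of the synthetic data on minimum scores. The sketch below assumes only those two attribute names from the notebook. `meets_thresholds`, the `SimpleNamespace` stand-in for the result of `job.fetch_summary()`, the score values, and the thresholds are all hypothetical.

```python
from types import SimpleNamespace


def meets_thresholds(summary, min_quality: float, min_privacy: float) -> bool:
    """Check both scores (0-10, higher is better) against caller-chosen minimums."""
    return (
        summary.synthetic_data_quality_score >= min_quality
        and summary.data_privacy_score >= min_privacy
    )


# Stand-in for the object returned by job.fetch_summary(); the values are made up.
example_summary = SimpleNamespace(
    synthetic_data_quality_score=8.1,
    data_privacy_score=6.4,
)

# Privacy (6.4) is below the 7.0 minimum, so this prints False.
print(meets_thresholds(example_summary, min_quality=7.0, min_privacy=7.0))
```

Suitable thresholds depend on the use case; the values here are placeholders, not recommendations.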