Merged
Original file line number Diff line number Diff line change
@@ -6,11 +6,7 @@
"source": [
"# 🎨 NeMo Data Designer 101: The Basics\n",
"\n",
"> ⚠️ **Warning**: NeMo Data Designer is current in Early Release and is not recommended for production use.\n",
"\n",
"<br>\n",
"\n",
"In this notebook, we will demonstrate the basics of Data Designer by generating a simple product review dataset."
"In this notebook, we will demonstrate the basics of Data Designer by generating a simple product review dataset.\n"
]
},
{
@@ -24,7 +20,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -41,7 +37,7 @@
" SubcategorySamplerParams,\n",
" UniformSamplerParams,\n",
" ModelConfig,\n",
" InferenceParameters\n",
" InferenceParameters,\n",
")\n"
]
},
@@ -55,9 +51,9 @@
"- In this notebook, we connect to the [managed service of data designer](https://build.nvidia.com/nemo/data-designer). Alternatively, you can connect to your own instance of data designer by following the deployment instructions [here](https://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-microservices/data-designer/docker-compose.html).\n",
"- If you have an instance of data designer running locally, you can connect to it as follows\n",
"\n",
" ```python\n",
" data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url=\"http://localhost:8080\"))\n",
" ```\n"
" ```python\n",
" data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url=\"http://localhost:8080\"))\n",
" ```\n"
]
},
{
@@ -83,7 +79,7 @@
"source": [
"data_designer_client = NeMoDataDesignerClient(\n",
" base_url=\"https://ai.api.nvidia.com/v1/nemo/dd\",\n",
" default_headers={\"Authorization\": f\"Bearer {api_key}\"} # auto-generated API KEY\n",
" default_headers={\"Authorization\": f\"Bearer {api_key}\"}, # auto-generated API KEY\n",
")\n"
]
},
@@ -106,10 +102,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: \n",
"**Note**:\n",
"The NeMo Data Designer Managed service has models available for you to use as well. You can use these models by referencing the appropriate model_alias for them.\n",
"\n",
"Please visit https://build.nvidia.com/nemo/data-designer to see the full list of models and their model aliases."
"Please visit https://build.nvidia.com/nemo/data-designer to see the full list of models and their model aliases.\n"
]
},
{
@@ -138,7 +134,7 @@
" max_tokens=1024,\n",
" temperature=0.6,\n",
" top_p=0.95,\n",
" )\n",
" ),\n",
" ),\n",
" ]\n",
")\n"
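Every hunk in this notebook is a mechanical cleanup: `execution_count` reset from `1` to `null`, trailing spaces dropped (`**Note**: ` becomes `**Note**:`), a `\n` appended to the last line of each cell's `source`, and trailing commas added to multi-line Python calls. The sketch below reproduces the first three at the JSON level using only the standard library. It is a hypothetical helper (`normalize_notebook` and `normalize_file` are names chosen here), not the tool used to produce this PR; the trailing commas are a Python code-formatting change and are out of its scope.

```python
import json
from typing import Any, Dict


def normalize_notebook(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the cleanups seen in this diff to an in-memory notebook (hypothetical helper)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # "execution_count": 1 -> null, so re-running the notebook does not add diff noise.
            cell["execution_count"] = None
        source = cell.get("source", [])
        if isinstance(source, str):
            # nbformat also allows source as one string; split it, keeping the newlines.
            source = source.splitlines(keepends=True)
        cleaned = []
        for line in source:
            newline = "\n" if line.endswith("\n") else ""
            # rstrip() removes trailing spaces and the newline; put the newline back.
            cleaned.append(line.rstrip() + newline)
        if cleaned and not cleaned[-1].endswith("\n"):
            # Terminate the final line, as the new side of each hunk does.
            cleaned[-1] += "\n"
        cell["source"] = cleaned
    return nb


def normalize_file(path: str) -> None:
    """Rewrite an .ipynb file in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    normalize_notebook(nb)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 and raw Unicode (e.g. the emoji in the titles) roughly match Jupyter's own output.
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")


# Usage on a small in-memory notebook:
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["**Note**: \n", "See the model list."]},
        {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["x = 1"]},
    ]
}
normalize_notebook(nb)
print(nb["cells"][0]["source"])  # ['**Note**:\n', 'See the model list.\n']
```

Working on the parsed JSON rather than on raw text keeps the file valid and avoids touching string contents other than trailing whitespace.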
Original file line number Diff line number Diff line change
@@ -6,15 +6,13 @@
"source": [
"# 🎨 NeMo Data Designer 101: Structured Outputs and Jinja Expressions\n",
"\n",
"> ⚠️ **Warning**: NeMo Data Designer is current in Early Release and is not recommended for production use.\n",
">\n",
"> **Note**: In order to run this notebook, you must have the NeMo Data Designer microservice deployed locally via docker compose. See the [deployment guide](http://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-microservices/data-designer/docker-compose.html) for more details.\n",
"\n",
"<br>\n",
"\n",
"In this notebook, we will continue our exploration of Data Designer, demonstrating more advanced data generation using structured outputs and Jinja expressions.\n",
"\n",
"If this is your first time using Data Designer, we recommend starting with the [first notebook](./1-the-basics.ipynb) in this 101 series."
"If this is your first time using Data Designer, we recommend starting with the [first notebook](./1-the-basics.ipynb) in this 101 series.\n"
]
},
{
@@ -57,9 +55,9 @@
"- In this notebook, we connect to the [managed service of data designer](https://build.nvidia.com/nemo/data-designer). Alternatively, you can connect to your own instance of data designer by following the deployment instructions [here](https://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-microservices/data-designer/docker-compose.html).\n",
"- If you have an instance of data designer running locally, you can connect to it as follows\n",
"\n",
" ```python\n",
" data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url=\"http://localhost:8080\"))\n",
" ```\n"
" ```python\n",
" data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url=\"http://localhost:8080\"))\n",
" ```\n"
]
},
{
@@ -85,7 +83,7 @@
"source": [
"data_designer_client = NeMoDataDesignerClient(\n",
" base_url=\"https://ai.api.nvidia.com/v1/nemo/dd\",\n",
" default_headers={\"Authorization\": f\"Bearer {api_key}\"} # auto-generated API KEY\n",
" default_headers={\"Authorization\": f\"Bearer {api_key}\"}, # auto-generated API KEY\n",
")\n"
]
},
@@ -108,10 +106,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: \n",
"**Note**:\n",
"The NeMo Data Designer Managed service has models available for you to use as well. You can use these models by referencing the appropriate model_alias for them.\n",
"\n",
"Please visit https://build.nvidia.com/nemo/data-designer to see the full list of models and their model aliases."
"Please visit https://build.nvidia.com/nemo/data-designer to see the full list of models and their model aliases.\n"
]
},
{
@@ -282,11 +280,11 @@
" sampler_type=SamplerType.CATEGORY,\n",
" params=CategorySamplerParams(\n",
" values=[\"rambling\", \"brief\", \"detailed\", \"structured with bullet points\"],\n",
" weights=[1, 2, 2, 1]\n",
" weights=[1, 2, 2, 1],\n",
" ),\n",
" conditional_params={\n",
" \"target_age_range == '18-25'\": CategorySamplerParams(values=[\"rambling\"]),\n",
" }\n",
" },\n",
" )\n",
")\n",
"\n",
@@ -402,8 +400,7 @@
"\n",
"- [Seeding synthetic data generation with an external dataset](./3-seeding-with-a-dataset.ipynb)\n",
"\n",
"- [Using Custom Model Configs](./4-custom-model-configs.ipynb)\n",
"\n"
"- [Using Custom Model Configs](./4-custom-model-configs.ipynb)\n"
]
}
],
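The `conditional_params` hunk in this notebook only adds trailing commas, but the cell it touches shows a weighted category sampler whose parameters change for rows where `target_age_range == '18-25'`. To make that behavior concrete, here is a plain-Python simulation. It assumes, without confirmation from this diff, that the first matching condition replaces the base `CategorySamplerParams` entirely and that `weights` are relative (so `[1, 2, 2, 1]` gives "brief" and "detailed" one third each, "rambling" and "structured with bullet points" one sixth each). The row dicts, the age ranges other than `'18-25'`, and `sample_review_style` are hypothetical, and the condition is written as a Python lambda rather than Data Designer's expression string. It does not call Data Designer.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]

# Base parameters copied from the notebook cell.
BASE_VALUES = ["rambling", "brief", "detailed", "structured with bullet points"]
BASE_WEIGHTS = [1, 2, 2, 1]

# Assumption: the first matching condition replaces the base values/weights entirely.
# The notebook's conditional params set no weights, modeled here as uniform (None).
CONDITIONAL_PARAMS: List[Tuple[Callable[[Row], bool], List[str], Optional[List[float]]]] = [
    (lambda row: row["target_age_range"] == "18-25", ["rambling"], None),
]


def sample_category(values: Sequence[str], weights: Optional[Sequence[float]], rng: random.Random) -> str:
    # random.choices treats weights as relative; weights=None means uniform.
    return rng.choices(values, weights=weights, k=1)[0]


def sample_review_style(row: Row, rng: random.Random) -> str:
    """Hypothetical per-row evaluation of the category sampler."""
    for condition, values, weights in CONDITIONAL_PARAMS:
        if condition(row):
            return sample_category(values, weights, rng)
    return sample_category(BASE_VALUES, BASE_WEIGHTS, rng)


rng = random.Random(0)  # fixed seed so the run is repeatable
for age in ["18-25", "26-35", "36-45", "18-25"]:  # hypothetical age ranges
    print(age, "->", sample_review_style({"target_age_range": age}, rng))
```

Under these assumptions every `'18-25'` row prints `rambling`, while other rows draw from all four styles.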
Original file line number Diff line number Diff line change
@@ -6,15 +6,13 @@
"source": [
"# 🎨 NeMo Data Designer 101: Seeding Synthetic Data Generation with an External Dataset\n",
"\n",
"> ⚠️ **Warning**: NeMo Data Designer is current in Early Release and is not recommended for production use.\n",
">\n",
"> **Note**: In order to run this notebook, you must have the NeMo Data Designer microservice deployed locally via docker compose. See the [deployment guide](http://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-microservices/data-designer/docker-compose.html) for more details.\n",
"\n",
"<br>\n",
"\n",
"In this notebook, we will demonstrate how to seed synthetic data generation in Data Designer with an external dataset.\n",
"\n",
"If this is your first time using Data Designer, we recommend starting with the [first notebook](./1-the-basics.ipynb) in this 101 series."
"If this is your first time using Data Designer, we recommend starting with the [first notebook](./1-the-basics.ipynb) in this 101 series.\n"
]
},
{
@@ -51,9 +49,9 @@
"- In this notebook, we connect to the [managed service of data designer](https://build.nvidia.com/nemo/data-designer). Alternatively, you can connect to your own instance of data designer by following the deployment instructions [here](https://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-microservices/data-designer/docker-compose.html).\n",
"- If you have an instance of data designer running locally, you can connect to it as follows\n",
"\n",
" ```python\n",
" data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url=\"http://localhost:8080\"))\n",
" ```\n"
" ```python\n",
" data_designer_client = DataDesignerClient(client=NeMoMicroservices(base_url=\"http://localhost:8080\"))\n",
" ```\n"
]
},
{
@@ -79,7 +77,7 @@
"source": [
"data_designer_client = NeMoDataDesignerClient(\n",
" base_url=\"https://ai.api.nvidia.com/v1/nemo/dd\",\n",
" default_headers={\"Authorization\": f\"Bearer {api_key}\"} # auto-generated API KEY\n",
" default_headers={\"Authorization\": f\"Bearer {api_key}\"}, # auto-generated API KEY\n",
")\n"
]
},
@@ -102,10 +100,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: \n",
"**Note**:\n",
"The NeMo Data Designer Managed service has models available for you to use as well. You can use these models by referencing the appropriate model_alias for them.\n",
"\n",
"Please visit https://build.nvidia.com/nemo/data-designer to see the full list of models and their model aliases."
"Please visit https://build.nvidia.com/nemo/data-designer to see the full list of models and their model aliases.\n"
]
},
{
@@ -138,7 +136,7 @@
"\n",
"- In this dataset, the `input_text` represents the `patient_summary` and the `output_text` represents the `diagnosis`\n",
"\n",
"**Note**: At this time, we only support using a single file as the seed. If you have multiple files you would like to use as seeds, it is recommended you consolidate these into a single file. \n"
"**Note**: At this time, we only support using a single file as the seed. If you have multiple files you would like to use as seeds, it is recommended you consolidate these into a single file.\n"
]
},
{
@@ -155,7 +153,7 @@
"config_builder.with_seed_dataset(\n",
" dataset_reference=SeedDatasetReference(\n",
" dataset=\"gretelai/symptom_to_diagnosis/train.jsonl\",\n",
" datastore_settings={\"endpoint\": \"https://huggingface.co\"}\n",
" datastore_settings={\"endpoint\": \"https://huggingface.co\"},\n",
" ),\n",
" sampling_strategy=\"shuffle\",\n",
")\n"
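The seeding notebook notes that only a single seed file is supported and recommends consolidating multiple files into one. A minimal sketch using the standard library, assuming every input is JSONL with one JSON object per line (like `train.jsonl`). `consolidate_jsonl` and the file names are hypothetical. The optional `rename` mapping echoes the notebook's explanation that `input_text` represents the `patient_summary` and `output_text` the `diagnosis`; renaming is not something the notebook itself does.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Optional


def consolidate_jsonl(
    inputs: Iterable[Path],
    output: Path,
    rename: Optional[Dict[str, str]] = None,
) -> int:
    """Concatenate JSONL files into a single seed file; returns the record count."""
    paths = [Path(p) for p in inputs]
    output = Path(output)
    if output.resolve() in {p.resolve() for p in paths}:
        # Opening the output for writing would truncate that input before it is read.
        raise ValueError("output must not be one of the inputs")
    rename = rename or {}
    count = 0
    with output.open("w", encoding="utf-8") as out:
        for path in paths:
            with path.open(encoding="utf-8") as f:
                for line in f:
                    if not line.strip():
                        continue  # tolerate blank lines between records
                    record = json.loads(line)
                    # Optional key renaming, e.g. {"input_text": "patient_summary"}.
                    record = {rename.get(k, k): v for k, v in record.items()}
                    out.write(json.dumps(record, ensure_ascii=False) + "\n")
                    count += 1
    return count


# Example call (hypothetical file names):
# consolidate_jsonl(
#     [Path("part-1.jsonl"), Path("part-2.jsonl")],
#     Path("seed.jsonl"),
#     rename={"input_text": "patient_summary", "output_text": "diagnosis"},
# )
```

Records are streamed line by line, so the inputs never need to fit in memory at once; the resulting single file is what you would then upload or reference as the seed.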