From d25c71512c8ef48314aa06c412221839bd2660c4 Mon Sep 17 00:00:00 2001 From: Gaurav Rajguru Date: Thu, 2 May 2024 11:52:29 +0530 Subject: [PATCH] Dreambooth text to image finetuning notebook examples (#3148) * db finetuning notebook examples * notebbok draft * notebbok draft * notebbok draft * sanity * rename example file; updates * SD DB FT Update --------- Co-authored-by: grajguru Co-authored-by: Rupal Jain --- ...ffusers-dreambooth-dog-text-to-image.ipynb | 825 ++++++++++++++++++ 1 file changed, 825 insertions(+) create mode 100644 sdk/python/foundation-models/system/finetune/text-to-image/diffusers-dreambooth-dog-text-to-image.ipynb diff --git a/sdk/python/foundation-models/system/finetune/text-to-image/diffusers-dreambooth-dog-text-to-image.ipynb b/sdk/python/foundation-models/system/finetune/text-to-image/diffusers-dreambooth-dog-text-to-image.ipynb new file mode 100644 index 0000000000..03ff4ceb55 --- /dev/null +++ b/sdk/python/foundation-models/system/finetune/text-to-image/diffusers-dreambooth-dog-text-to-image.ipynb @@ -0,0 +1,825 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text-to-image dreambooth finetuning using diffusers specific pipeline component\n", + "\n", + "DreamBooth is a method for personalizing text-to-image models. It fine-tunes these models using 5-10 images of a specific subject, allowing them to generate personalized images based on textual prompts.\n", + "\n", + "This sample shows how to use the `diffusers_text_to_image_dreambooth_finetuning_pipeline` component from the `azureml` system registry to fine tune a model for the text-to-image task using the dog dataset. We then deploy the fine tuned model to an online endpoint for real time inference.\n", + "\n", + "### Training data\n", + "We will use the [dog-example](https://huggingface.co/datasets/diffusers/dog-example) dataset.\n", + "\n", + "### Model\n", + "We will use the `stabilityai-stable-diffusion-2-1` model in this notebook. If you need to fine tune a model that is available on HuggingFace, but not available in `azureml` system registry, you can either register the model and use the registered model, or use the `model_name` parameter to instruct the components to pull the model directly from HuggingFace.\n", + "\n", + "### Outline\n", + "1. Install dependencies\n", + "2. Setup pre-requisites such as compute\n", + "3. Pick a model to fine tune\n", + "4. Prepare dataset for finetuning the model\n", + "5. Submit the fine tuning job using the diffusers specific text-to-image dreambooth fine tuning component\n", + "6. Review training metrics\n", + "7. Register the fine tuned model\n", + "8. Deploy the fine tuned model for real time inference\n", + "9. Test the deployed endpoint\n", + "10. Clean up resources" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Install dependencies\n", + "Before starting, install the following packages, whether you are running the notebook in Azure Machine Learning studio or locally for the first time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install azure-ai-ml==1.8.0\n", + "! pip install azure-identity==1.13.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Setup pre-requisites" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2.1 Connect to Azure Machine Learning workspace\n", + "\n", + "Before we dive into the code, you'll need to connect to your workspace.
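If `DefaultAzureCredential` cannot obtain a token in your environment, a minimal sketch like the one below, assuming interactive browser sign-in is permitted for your tenant, falls back to `InteractiveBrowserCredential` before the connection code that follows.

```python
# Minimal sketch: fall back to interactive browser login when
# DefaultAzureCredential cannot fetch a token (assumes a browser is available).
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

try:
    credential = DefaultAzureCredential()
    # Verify the credential can actually obtain an ARM token.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    credential = InteractiveBrowserCredential()
```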
The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.\n", + "\n", + "We are using `DefaultAzureCredential` to get access to workspace. `DefaultAzureCredential` should be capable of handling most scenarios. If you want to learn more about other available credentials, go to [set up authentication doc](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk), [azure-identity reference doc](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).\n", + "\n", + "Replace `AML_WORKSPACE_NAME`, `RESOURCE_GROUP` and `SUBSCRIPTION_ID` with their respective values in the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.ml import MLClient\n", + "from azure.identity import DefaultAzureCredential\n", + "\n", + "\n", + "experiment_name = \"AzureML-Train-Finetune-MultiModal-TextToImage-DreamBooth-Samples\" # can rename to any valid name\n", + "\n", + "credential = DefaultAzureCredential()\n", + "workspace_ml_client = None\n", + "try:\n", + " workspace_ml_client = MLClient.from_config(credential)\n", + " subscription_id = workspace_ml_client.subscription_id\n", + " resource_group = workspace_ml_client.resource_group_name\n", + " workspace_name = workspace_ml_client.workspace_name\n", + "except Exception as ex:\n", + " print(ex)\n", + " # Enter details of your AML workspace\n", + " subscription_id = \"SUBSCRIPTION_ID\"\n", + " resource_group = \"RESOURCE_GROUP\"\n", + " workspace_name = \"AML_WORKSPACE_NAME\"\n", + "\n", + "workspace_ml_client = MLClient(credential, subscription_id, resource_group, workspace_name)\n", + "registry_ml_client = MLClient(\n", + " credential,\n", + " subscription_id,\n", + " resource_group,\n", + " registry_name=\"azureml\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2.2 Create compute\n", + "\n", + "In order to finetune a model on Azure Machine Learning studio, you will need to create a compute resource first. **Creating a compute will take 3-4 minutes.** \n", + "\n", + "For additional references, see [Azure Machine Learning in a Day](https://github.com/Azure/azureml-examples/blob/main/tutorials/azureml-in-a-day/azureml-in-a-day.ipynb). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.ml.entities import AmlCompute\n", + "from azure.core.exceptions import ResourceNotFoundError\n", + "\n", + "cluster_name = \"sample-finetune-cluster-gpu\"\n", + "\n", + "try:\n", + " _ = workspace_ml_client.compute.get(cluster_name)\n", + " print(\"Found existing compute target.\")\n", + "except ResourceNotFoundError:\n", + " print(\"Creating a new compute target...\")\n", + " compute_config = AmlCompute(\n", + " name=cluster_name,\n", + " type=\"amlcompute\",\n", + " size=\"Standard_NC6s_v3\",\n", + " idle_time_before_scale_down=120,\n", + " min_instances=0,\n", + " max_instances=4,\n", + " )\n", + " workspace_ml_client.begin_create_or_update(compute_config).result()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Pick a foundation model to fine tune\n", + "\n", + "We will use the `stabilityai-stable-diffusion-2-1` model in this notebook. 
If you need to fine tune a model that is available on HuggingFace, but not available in `azureml` system registry, you can either register the model and use the registered model, or provide the HuggingFace model id in the `model_name` parameter to instruct the components to pull the model directly from HuggingFace.\n", + "\n", + "Currently, the following models are supported:\n", + "\n", + "| Model Name | Source |\n", + "| ------ | ---------- |\n", + "| [runwayml-stable-diffusion-v1-5](https://ml.azure.com/registries/azureml/models/runwayml-stable-diffusion-v1-5/version/8) | azureml registry |\n", + "| [stabilityai-stable-diffusion-2-1](https://ml.azure.com/registries/azureml/models/stabilityai-stable-diffusion-2-1/version/8) | azureml registry |\n", + "| [compvis-stable-diffusion-v1-4](https://ml.azure.com/registries/azureml/models/compvis-stable-diffusion-v1-4/version/8) | azureml registry |\n", + "| [deci-decidiffusion-v1-0](https://ml.azure.com/registries/azureml/models/deci-decidiffusion-v1-0/version/4) | azureml registry |\n", + "| [Text to Image models from Huggingface's Transformer library](https://huggingface.co/models?pipeline_tag=text-to-image&library=transformers)| HuggingFace |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "huggingface_model_name = \"stabilityai/stable-diffusion-2-1\"\n", + "\n", + "aml_registry_model_name = \"stabilityai-stable-diffusion-2-1\"\n", + "foundation_models = registry_ml_client.models.list(aml_registry_model_name)\n", + "foundation_model = max(foundation_models, key=lambda x: int(x.version))\n", + "print(\n", + " f\"\\n\\nUsing model name: {foundation_model.name}, version: {foundation_model.version}, id: {foundation_model.id} for fine tuning\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Prepare the dataset for fine-tuning the model\n", + "\n", + "We will use the [dog-example](https://huggingface.co/datasets/diffusers/dog-example) dataset. It consists of 5 dog images." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 4.1 Download the Data\n", + "\n", + "For DreamBooth training, we need a few images in a folder. We will download the dog-example dataset locally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "dataset_dir = \"dog-example\"\n", + "if not os.path.isdir(dataset_dir):\n", + " !git clone https://huggingface.co/datasets/diffusers/dog-example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from IPython.display import Image\n", + "\n", + "files = os.listdir(dataset_dir)\n", + "image_file = [file for file in files if file.endswith((\".jpg\", \".jpeg\", \".png\"))][0]\n", + "sample_image = os.path.join(dataset_dir, image_file)\n", + "Image(filename=sample_image, width=400, height=400)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 4.2 Upload the images to Datastore through an AML Data asset (URI Folder)\n", + "\n", + "In order to use the data for training in Azure ML, we upload it to the default Azure Blob Storage of our Azure ML workspace."
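Before creating the data asset in the next cell, a quick sanity check, a sketch assuming the folder layout produced by the `git clone` above, confirms that the local folder contains the 5-10 instance images DreamBooth expects.

```python
# Minimal sketch: count the instance images downloaded above before uploading.
import os

image_files = [
    f for f in os.listdir(dataset_dir) if f.lower().endswith((".jpg", ".jpeg", ".png"))
]
print(f"Found {len(image_files)} instance images in '{dataset_dir}'")
```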
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Uploading image files by creating a 'data asset URI FOLDER':\n", + "\n", + "from azure.ai.ml.entities import Data\n", + "from azure.ai.ml.constants import AssetTypes\n", + "\n", + "instance_data = Data(\n", + " path=dataset_dir,\n", + " type=AssetTypes.URI_FOLDER,\n", + " description=\"Dog images for text to image dreambooth training\",\n", + " name=\"dog-images-text-to-image\",\n", + ")\n", + "\n", + "instance_data_uri_folder = workspace_ml_client.data.create_or_update(instance_data)\n", + "\n", + "print(instance_data_uri_folder)\n", + "print(\"\")\n", + "print(\"Path to folder in Blob Storage:\")\n", + "print(instance_data_uri_folder.path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 5. Submit the fine tuning job using `diffusers_text_to_image_dreambooth_finetuning_pipeline` component\n", + " \n", + "Create the job that uses the `diffusers_text_to_image_dreambooth_finetuning_pipeline` component for the `stable-diffusion-text-to-image` task. See section 5.2 to learn more about all the parameters supported for fine tuning." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 5.1 Get pipeline component" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "FINETUNE_PIPELINE_COMPONENT_NAME = \"diffusers_text_to_image_dreambooth_finetuning_pipeline\"\n", + "\n", + "pipeline_component_transformers_func = registry_ml_client.components.get(\n", + " name=FINETUNE_PIPELINE_COMPONENT_NAME, label=\"latest\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 5.2 Create arguments to be passed to `diffusers_text_to_image_dreambooth_finetuning_pipeline` component\n", + "\n", + "The `diffusers_text_to_image_dreambooth_finetuning_pipeline` component consists of model selection and finetuning components.\n", + "\n", + "*Uncomment one or more parameters below to provide specific values, if you wish to override the autoselected default values. Please visit this pipeline component in Azure ML studio to learn more about the arguments and their allowed values.*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_component_args = {\n", + " # # Model import args\n", + " \"model_family\": \"HuggingFaceImage\",\n", + " \"download_from_source\": False, # True for downloading a model directly from HuggingFace\n", + " \"mlflow_model\": foundation_model.id, # Pass foundation_model.id; passing the foundation_model object itself gives UserErrorException: only path input is supported now but get: ...\n", + " # \"model_name\": huggingface_model_name, # specify the model_name instead of mlflow_model if you want to use a model from the huggingface hub\n", + "\n", + " # # Finetune_args\n", + " \"task_name\": \"stable-diffusion-text-to-image\",\n", + " # # Instance prompt - The instance_prompt argument is a text prompt that contains a unique identifier, such as sks, and the class the image belongs to.\n", + " \"instance_prompt\": \"\\\"A photo of a sks dog\\\"\", # Please note that we need to escape the double quotes.\n", + " \"resolution\": 512,\n", + "\n", + " # # Prior preservation loss\n", + " \"with_prior_preservation\": True,\n", + " # Class prompt - a prompt without the unique identifier.
This prompt is used for generating \"class images\" for prior preservation.\n", + " \"class_prompt\": \"\\\"a photo of dog\\\"\", # Please note that we need to escape the double quotes.\n", + " \"num_class_images\": 100, # Number of images to generate with the class prompt for prior preservation.\n", + " \"class_data_dir\": None, # Specify the Datastore URI of an existing uri_folder containing class images, if you have one; the training job will generate any additional images so that num_class_images are present in class_data_dir at training time.\n", + " \"prior_generation_precision\": \"fp32\",\n", + " \"prior_loss_weight\": 1.0,\n", + " \"sample_batch_size\": 2, # Batch size used when generating class images.\n", + "\n", + " # # Lora parameters\n", + " # # LoRA reduces the number of trainable parameters by learning pairs of rank-decomposition matrices while freezing the original weights. This vastly reduces the storage requirement for large models adapted to specific tasks and enables efficient task-switching during deployment, all without introducing inference latency. LoRA also outperforms several other adaptation methods including adapter, prefix-tuning, and fine-tuning.\n", + " \"apply_lora\": True,\n", + " # \"lora_alpha\": 128,\n", + " # \"lora_r\": 16,\n", + " # \"lora_dropout\": 0.0,\n", + " # \"tokenizer_max_length\": 77,\n", + "\n", + " # # Text Encoder\n", + " \"pre_compute_text_embeddings\": True,\n", + " \"train_text_encoder\": False,\n", + " # \"text_encoder_type\": \"CLIPTextModel\",\n", + " # \"text_encoder_name\": \"openai/clip-vit-base-patch32\", # Huggingface id of text encoder.\n", + " # \"text_encoder_use_attention_mask\": False,\n", + "\n", + " # # UNET related\n", + " # \"class_labels_conditioning\": \"timesteps\",\n", + "\n", + " # # Noise Scheduler\n", + " \"noise_scheduler_name\": \"DDPMScheduler\", # Optional; defaults to the base model's scheduler. If the following scheduler-related parameters are not provided, they are taken from the model's scheduler config.\n", + " # \"noise_scheduler_num_train_timesteps\": 1000,\n", + " # \"noise_scheduler_variance_type\": \"fixed_small\",\n", + " # \"noise_scheduler_prediction_type\": \"epsilon\",\n", + " # \"noise_scheduler_timestep_spacing\": \"leading\",\n", + " # \"extra_noise_scheduler_args\": \"clip_sample_range=1.0; clip_sample=True\", # Optional additional arguments that are supplied to the noise scheduler. The arguments should be semicolon-separated key value pairs and should be enclosed in double quotes.\n", + " # \"offset_noise\": False,\n", + "\n", + " # # Training related\n", + " \"num_validation_images\": 3, # Number of images to generate using instance_prompt. Images are stored in the output/checkpoint-* directories.
Please note that this will increase the training time.\n", + " \"number_of_workers\": 3,\n", + " \"number_of_epochs\": 15,\n", + " \"max_steps\": -1,\n", + " \"training_batch_size\": 3,\n", + " \"auto_find_batch_size\": False,\n", + " \"learning_rate\": 1e-4, # A lower learning rate is recommended when not fine-tuning with LoRA.\n", + " # \"learning_rate_scheduler\": \"warmup_linear\",\n", + " # \"warmup_steps\": 0,\n", + " # \"optimizer\": \"adamw_hf\",\n", + " # \"weight_decay\": 0.0,\n", + " # \"gradient_accumulation_step\": 1,\n", + " # \"max_grad_norm\": 1.0,\n", + " \"precision\": \"32\",\n", + " \"random_seed\": 42,\n", + " \"logging_strategy\": \"epoch\",\n", + " # \"logging_steps\": 500, # Number of update steps between two logs if logging_strategy='steps'.\n", + " \"save_total_limit\": -1, # If you face issues related to disk space, you can limit the number of checkpoints saved.\n", + " \"save_as_mlflow_model\": True,\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Ensure that the user provides only one of mlflow_model or model_name\n", + "if pipeline_component_args.get(\"mlflow_model\") is None and pipeline_component_args.get(\"model_name\") is None:\n", + " raise ValueError(\"You must specify either mlflow_model or model_name for the model to finetune\")\n", + "if pipeline_component_args.get(\"mlflow_model\") is not None and pipeline_component_args.get(\"model_name\") is not None:\n", + " raise ValueError(\"You must specify ONLY one of mlflow_model and model_name for the model to finetune\")\n", + "elif pipeline_component_args.get(\"mlflow_model\") is None and pipeline_component_args.get(\"model_name\") is not None:\n", + " use_model_name = huggingface_model_name\n", + "elif pipeline_component_args.get(\"mlflow_model\") is not None and pipeline_component_args.get(\"model_name\") is None:\n", + " use_model_name = aml_registry_model_name\n", + "print(f\"Finetuning model {use_model_name}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 5.3 Utility function to create pipeline using `diffusers_text_to_image_dreambooth_finetuning_pipeline` component" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.ml.dsl import pipeline\n", + "from azure.ai.ml.entities import PipelineComponent\n", + "from azure.ai.ml import Input\n", + "from azure.ai.ml.constants import AssetTypes\n", + "\n", + "\n", + "@pipeline()\n", + "def create_pipeline_transformers():\n", + " \"\"\"Create pipeline.\"\"\"\n", + "\n", + " diffusers_pipeline_component: PipelineComponent = pipeline_component_transformers_func(\n", + " compute_model_import=cluster_name,\n", + " compute_finetune=cluster_name,\n", + " instance_data_dir=Input(type=AssetTypes.URI_FOLDER, path=instance_data_uri_folder.path),\n", + " **pipeline_component_args,\n", + " )\n", + " return {\n", + " # Map the output of the fine tuning job to the output of pipeline job so that we can easily register the fine tuned model.
Registering the model is required to deploy the model to an online or batch endpoint.\n", + " \"trained_model\": diffusers_pipeline_component.outputs.mlflow_model_folder,\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 5.4 Run the fine tuning job" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "diffusers_pipeline_object = create_pipeline_transformers()\n", + "\n", + "# Don't use cached results from previous jobs\n", + "diffusers_pipeline_object.settings.force_rerun = True\n", + "\n", + "# Set continue on step failure to False\n", + "diffusers_pipeline_object.settings.continue_on_step_failure = False\n", + "\n", + "\n", + "diffusers_pipeline_object.display_name = (\n", + " use_model_name + \"_diffusers_pipeline_component_run_\" + \"text2image\"\n", + ")\n", + "\n", + "print(\"Submitting pipeline\")\n", + "\n", + "diffusers_pipeline_run = workspace_ml_client.jobs.create_or_update(\n", + " diffusers_pipeline_object, experiment_name=experiment_name\n", + ")\n", + "\n", + "print(f\"Pipeline created. URL: {diffusers_pipeline_run.studio_url}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "diffusers_pipeline_run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "workspace_ml_client.jobs.stream(diffusers_pipeline_run.name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 6. Get metrics from finetune component\n", + "\n", + "The model training happens as part of the finetune component. Please follow the steps below to extract validation metrics from the run." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### 6.1 Initialize MLFlow Client\n", + "\n", + "The models and artifacts produced by the fine-tuning pipeline can be accessed via the MLFlow interface.\n", + "Initialize the MLFlow client here, and set the backend as Azure ML, via
the MLFlow Client.\n", + "\n", + "IMPORTANT - You need to have installed the latest MLFlow packages with:\n", + "\n", + " pip install azureml-mlflow\n", + " pip install mlflow" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import mlflow\n", + "\n", + "# Obtain the tracking URL from MLClient\n", + "MLFLOW_TRACKING_URI = workspace_ml_client.workspaces.get(\n", + " name=workspace_ml_client.workspace_name\n", + ").mlflow_tracking_uri\n", + "\n", + "print(MLFLOW_TRACKING_URI)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set the MLFLOW TRACKING URI\n", + "mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)\n", + "print(f\"\\nCurrent tracking uri: {mlflow.get_tracking_uri()}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from mlflow.tracking.client import MlflowClient\n", + "\n", + "# Initialize MLFlow client\n", + "mlflow_client = MlflowClient()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.2 Get the training run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Concat 'tags.mlflow.rootRunId=' and pipeline_job.name in single quotes as filter variable\n", + "filter = \"tags.mlflow.rootRunId='\" + diffusers_pipeline_run.name + \"'\"\n", + "runs = mlflow.search_runs(\n", + " experiment_names=[experiment_name], filter_string=filter, output_format=\"list\"\n", + ")\n", + "\n", + "# Get the training and evaluation runs.\n", + "# Using a hacky way till 'Bug 2320997: not able to show eval metrics in FT notebooks - mlflow client now showing display names' is fixed\n", + "for run in runs:\n", + " # Check if run.data.metrics.epoch exists\n", + " if \"epoch\" in run.data.metrics:\n", + " training_run = run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 6.3 Get training metrics\n", + "\n", + "Access the results (such as Models, Artifacts, Metrics) of a previously completed run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "pd.DataFrame(training_run.data.metrics, index=[0]).T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 7. Register the fine tuned model with the workspace\n", + "\n", + "We will register the model from the output of the fine tuning job. This will track lineage between the fine tuned model and the fine tuning job. The fine tuning job, further, tracks lineage to the foundation model, data and training code." 
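Once the registration cell below has run, a sketch like the following lists the registered versions to confirm the model was created; the model name shown is illustrative, and the actual `finetuned_model_name` is constructed in that cell.

```python
# Minimal sketch: list versions of the registered model to confirm registration.
# The name below is an assumed example; use the finetuned_model_name value
# printed by the registration cell.
finetuned_model_name = "stabilityai-stable-diffusion-2-1-dog-text-to-image"  # assumed example
for model_version in workspace_ml_client.models.list(name=finetuned_model_name):
    print(model_version.name, model_version.version)
```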
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "\n", + "# Generate a unique timestamp that can be used for names and versions that need to be unique\n", + "timestamp = str(int(time.time()))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.ml.entities import Model\n", + "from azure.ai.ml.constants import AssetTypes\n", + "\n", + "# Check if the `trained_model` output is available\n", + "print(f\"Pipeline job outputs: {workspace_ml_client.jobs.get(diffusers_pipeline_run.name).outputs}\")\n", + "\n", + "model_path_from_job = f\"azureml://jobs/{diffusers_pipeline_run.name}/outputs/trained_model\"\n", + "print(f\"Path to register model: {model_path_from_job}\")\n", + "\n", + "finetuned_model_name = f\"{use_model_name.replace('/', '-')}-dog-text-to-image\"\n", + "finetuned_model_description = f\"{use_model_name.replace('/', '-')} fine tuned model for dog text-to-image generation\"\n", + "prepare_to_register_model = Model(\n", + " path=model_path_from_job,\n", + " type=AssetTypes.MLFLOW_MODEL,\n", + " name=finetuned_model_name,\n", + " version=timestamp, # use timestamp as version to avoid version conflict\n", + " description=finetuned_model_description,\n", + ")\n", + "print(f\"Prepare to register model: \\n{prepare_to_register_model}\")\n", + "\n", + "# Register the model from pipeline job output\n", + "registered_model = workspace_ml_client.models.create_or_update(prepare_to_register_model)\n", + "print(f\"Registered model: {registered_model}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 8. Deploy the fine tuned model to an online endpoint\n", + "Online endpoints provide a durable REST API that can be used to integrate the fine tuned model with applications that need to use it."
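Because the endpoint exposes a REST API, you may also want to retrieve its scoring URI and authentication key once the endpoint and deployment cells below have completed; this sketch assumes the key-based auth configured in the next cell.

```python
# Minimal sketch: fetch the scoring URI and key of the endpoint created below
# (run after the endpoint/deployment cells complete).
endpoint_details = workspace_ml_client.online_endpoints.get(name=online_endpoint_name)
endpoint_keys = workspace_ml_client.online_endpoints.get_keys(name=online_endpoint_name)
print("Scoring URI:", endpoint_details.scoring_uri)
print("Primary key retrieved:", bool(endpoint_keys.primary_key))
```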
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import datetime\n", + "from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment\n", + "\n", + "# Endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n", + "online_endpoint_name = \"text-to-image-dog-\" + datetime.datetime.now().strftime(\"%m%d%H%M\")\n", + "online_endpoint_description = (\n", + " f\"Online endpoint for {registered_model.name}, finetuned for dog text to image generation task.\"\n", + ")\n", + "# Create an online endpoint\n", + "endpoint = ManagedOnlineEndpoint(\n", + " name=online_endpoint_name,\n", + " description=online_endpoint_description,\n", + " auth_mode=\"key\",\n", + " tags={\"foo\": \"bar\"},\n", + ")\n", + "workspace_ml_client.begin_create_or_update(endpoint).result()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.ml.entities import OnlineRequestSettings, ProbeSettings\n", + "\n", + "deployment_name = \"text2img-dog-mlflow-deploy\"\n", + "\n", + "# Create a deployment\n", + "demo_deployment = ManagedOnlineDeployment(\n", + " name=deployment_name,\n", + " endpoint_name=online_endpoint_name,\n", + " model=registered_model.id,\n", + " instance_type=\"Standard_NC6s_v3\",\n", + " instance_count=1,\n", + " request_settings=OnlineRequestSettings(\n", + " max_concurrent_requests_per_instance=1,\n", + " request_timeout_ms=90000,\n", + " max_queue_wait_ms=500,\n", + " ),\n", + " liveness_probe=ProbeSettings(\n", + " failure_threshold=49,\n", + " success_threshold=1,\n", + " timeout=299,\n", + " period=180,\n", + " initial_delay=180,\n", + " ),\n", + " readiness_probe=ProbeSettings(\n", + " failure_threshold=10,\n", + " success_threshold=1,\n", + " timeout=10,\n", + " period=10,\n", + " initial_delay=2000,\n", + " ),\n", + ")\n", + "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n", + "endpoint.traffic = {deployment_name: 100}\n", + "workspace_ml_client.begin_create_or_update(endpoint).result()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 9. Test the endpoint with sample data\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create request json\n", + "import json\n", + "\n", + "request_json = {\n", + " \"input_data\": {\"columns\": [\"prompt\"], \"index\": [0], \"data\": [\"a photo of sks dog in a bucket\"]},\n", + " \"params\": {\n", + " \"height\": 512,\n", + " \"width\": 512,\n", + " \"num_inference_steps\": 50,\n", + " \"guidance_scale\": 7.5,\n", + " \"negative_prompt\": [\"blurry; three legs\"],\n", + " \"num_images_per_prompt\": 2,\n", + " },\n", + "}\n", + "\n", + "request_file_name = \"sample_request_data.json\"\n", + "with open(request_file_name, \"w\") as request_file:\n", + " json.dump(request_json, request_file)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "responses = workspace_ml_client.online_endpoints.invoke(\n", + " endpoint_name=online_endpoint_name,\n", + " deployment_name=deployment_name,\n", + " request_file=request_file_name,\n", + ")\n", + "responses = json.loads(responses)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Visualize generated images\n", + "Now that we have invoked the endpoint with a sample prompt, we can decode and display the generated images."
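In addition to displaying the images inline in the next cell, you may want to persist them; this sketch, assuming the `generated_image` field used in the cell below, writes each image to a local PNG file with illustrative file names.

```python
# Minimal sketch: save each base64-encoded generated image returned by the
# endpoint to a local PNG file (file names are illustrative).
import base64
import os

os.makedirs("generated_images", exist_ok=True)
for idx, response in enumerate(responses):
    output_path = os.path.join("generated_images", f"sks_dog_{idx}.png")
    with open(output_path, "wb") as image_file:
        image_file.write(base64.b64decode(response["generated_image"]))
    print(f"Saved {output_path}")
```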
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import base64\n", + "from io import BytesIO\n", + "from PIL import Image\n", + "\n", + "for response in responses:\n", + " base64_string = response[\"generated_image\"]\n", + " image_stream = BytesIO(base64.b64decode(base64_string))\n", + " image = Image.open(image_stream)\n", + " display(image)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 10. Clean up resources - delete the online endpoint\n", + "Don't forget to delete the online endpoint; otherwise, the billing meter will keep running for the compute used by the endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
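The clean-up step above deletes the endpoint; if you also want to remove the GPU cluster created in section 2.2 (it already scales to zero when idle because `min_instances=0`), a sketch like the following deletes it.

```python
# Optional sketch: delete the GPU cluster created earlier if it is no longer needed.
workspace_ml_client.compute.begin_delete(name=cluster_name).wait()
```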