Santiagxf/aml batch fixes (#1936)
* test fixes
santiagxf committed Dec 5, 2022
1 parent ec9ae58 commit c6b3ef0
Showing 25 changed files with 77,076 additions and 75,677 deletions.
1 change: 1 addition & 0 deletions sdk/python/dev-requirements.txt
@@ -8,3 +8,4 @@ matplotlib
tensorflow
tensorflow-hub
transformers
keras==2.9
13,269 changes: 13,269 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-0.csv

Large diffs are not rendered by default.

12,240 changes: 12,240 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-1.csv

Large diffs are not rendered by default.

12,463 changes: 12,463 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-2.csv

Large diffs are not rendered by default.

11,248 changes: 11,248 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-3.csv

Large diffs are not rendered by default.

11,552 changes: 11,552 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-4.csv

Large diffs are not rendered by default.

12,576 changes: 12,576 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-5.csv

Large diffs are not rendered by default.

1,952 changes: 1,952 additions & 0 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum-6.csv

Large diffs are not rendered by default.

75,294 changes: 0 additions & 75,294 deletions sdk/python/endpoints/batch/bart-text-summarization/data/billsum.csv

This file was deleted.

@@ -2,10 +2,11 @@ name: huggingface-env
channels:
- conda-forge
dependencies:
- python=3.7
- python=3.8
- pip
- pip:
- tensorflow
- keras==2.9
- transformers
- datasets
- azureml-core
182 changes: 156 additions & 26 deletions sdk/python/endpoints/batch/custom-output-batch.ipynb
@@ -10,11 +10,26 @@
}
},
"source": [
"# Batch deployments with a custom output\n",
"# Customize outputs in batch deployments\n",
"\n",
"Sometimes you need to execute inference having a higher control of what is being written as output of the batch job. Batch Deployments allow you to take control of the output of the jobs by allowing you to write directly to the output of the batch deployment job. In this tutorial, we'll see how to deploy a model to perform batch inference and writes the outputs in parquet format by appending the predictions to the original input data."
]
},
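{
"cell_type": "markdown",
"metadata": {},
"source": [
"The core idea is sketched below (an illustrative sketch, not the repository's exact scoring script): the deployment's scoring script writes its own files instead of returning predictions row by row. The `AZUREML_BI_OUTPUT_PATH` environment variable and the file layout are assumptions about how batch deployments expose the job output folder:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import glob\n",
"import mlflow\n",
"import pandas as pd\n",
"\n",
"\n",
"def init():\n",
"    global model\n",
"    # Batch deployments mount the registered model under AZUREML_MODEL_DIR\n",
"    model_path = glob.glob(os.environ[\"AZUREML_MODEL_DIR\"] + \"/*/\")[0]\n",
"    model = mlflow.pyfunc.load_model(model_path)\n",
"\n",
"\n",
"def run(mini_batch):\n",
"    for file_path in mini_batch:\n",
"        data = pd.read_csv(file_path)\n",
"        # Append the predictions to the original input data\n",
"        data[\"prediction\"] = model.predict(data)\n",
"        # Write parquet directly to the job output instead of returning rows\n",
"        output_path = os.environ[\"AZUREML_BI_OUTPUT_PATH\"]\n",
"        file_name = os.path.splitext(os.path.basename(file_path))[0]\n",
"        data.to_parquet(os.path.join(output_path, file_name + \".parquet\"))\n",
"    # Return one marker per processed file so the job can track progress\n",
"    return [os.path.basename(p) + \": OK\" for p in mini_batch]"
]
},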
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook requires:\n",
"\n",
"* `azure-ai-ml`\n",
"* `mlflow`\n",
"* `azureml-mlflow`\n",
"* `lightgbm==1.5.2`\n",
"* `numpy`\n",
"* `pandas`\n",
"* `pyarrow`"
]
},
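{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal setup sketch, assuming a pip-based kernel (the pins mirror the list above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install the packages listed above (sketch; adjust pins to your environment)\n",
"%pip install azure-ai-ml mlflow azureml-mlflow lightgbm==1.5.2 numpy pandas pyarrow"
]
},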
{
"cell_type": "markdown",
"metadata": {},
@@ -32,6 +47,7 @@
"metadata": {},
"outputs": [],
"source": [
"from time import sleep\n",
"from azure.ai.ml import MLClient, Input\n",
"from azure.ai.ml.entities import (\n",
" BatchEndpoint,\n",
@@ -72,6 +88,7 @@
},
"outputs": [],
"source": [
"# enter details of your AML workspace\n",
"subscription_id = \"<SUBSCRIPTION_ID>\"\n",
"resource_group = \"<RESOURCE_GROUP>\"\n",
"workspace = \"<AML_WORKSPACE_NAME>\""
@@ -209,7 +226,33 @@
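{
"cell_type": "markdown",
"metadata": {},
"source": [
"The collapsed cells connect to the workspace. For reference, a minimal sketch assuming `DefaultAzureCredential` from the `azure-identity` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.identity import DefaultAzureCredential\n",
"\n",
"# Connect to the workspace (sketch; any azure-identity credential works)\n",
"ml_client = MLClient(\n",
"    DefaultAzureCredential(),\n",
"    subscription_id=subscription_id,\n",
"    resource_group_name=resource_group,\n",
"    workspace_name=workspace,\n",
")"
]
},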
"\n",
"### 3.1 Configure the endpoint\n",
"\n",
"First, let's create the endpoint that is going to host the batch deployments. Remember that each endpoint can host multiple deployments at any time."
"First, let's create the endpoint that is going to host the batch deployments. To ensure that our endpoint name is unique, let's create a random suffix to append to it. \n",
"\n",
"> In general, you won't need to use this technique but you will use more meaningful names. Please skip the following cell if your case:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"import string\n",
"\n",
"# Creating a unique endpoint name by including a random suffix\n",
"allowed_chars = string.ascii_lowercase + string.digits\n",
"endpoint_suffix = \"\".join(random.choice(allowed_chars) for x in range(5))\n",
"endpoint_name = \"heart-classifier-\" + endpoint_suffix\n",
"\n",
"print(f\"Endpoint name: {endpoint_name}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's configure the endpoint:"
]
},
{
@@ -228,15 +271,8 @@
},
"outputs": [],
"source": [
"import random\n",
"import string\n",
"\n",
"# Creating a unique endpoint name by including a random suffix\n",
"allowed_chars = string.ascii_lowercase + string.digits\n",
"endpoint_suffix = \"\".join(random.choice(allowed_chars) for x in range(5))\n",
"\n",
"endpoint = BatchEndpoint(\n",
" name=\"heart-classifier-\" + endpoint_suffix,\n",
" name=endpoint_name,\n",
" description=\"A heart condition classifier for batch inference\",\n",
")"
]
@@ -402,7 +438,7 @@
"source": [
"from time import sleep\n",
"\n",
"print(\"Waiting for compute\", end=\"\")\n",
"print(f\"Waiting for compute {compute_name}\", end=\"\")\n",
"while ml_client.compute.get(name=compute_name).provisioning_state == \"Creating\":\n",
" sleep(1)\n",
" print(\".\", end=\"\")\n",
@@ -517,6 +553,40 @@
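{
"cell_type": "markdown",
"metadata": {},
"source": [
"The collapsed cells configure the deployment. The setting that enables custom outputs is `output_action`: with `SUMMARY_ONLY`, the job does not collect the rows returned by `run()`, so the scoring script writes its own files. A sketch, where `model`, `environment`, and the code paths are assumptions standing in for objects created in the collapsed cells:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.ml.entities import BatchDeployment, CodeConfiguration, BatchRetrySettings\n",
"from azure.ai.ml.constants import BatchDeploymentOutputAction\n",
"\n",
"# Sketch: a deployment that lets the scoring script control its own output\n",
"deployment = BatchDeployment(\n",
"    name=\"classifier-custom-output\",  # hypothetical name\n",
"    endpoint_name=endpoint_name,\n",
"    model=model,  # assumed: registered model object\n",
"    environment=environment,  # assumed: environment object\n",
"    code_configuration=CodeConfiguration(\n",
"        code=\"code\", scoring_script=\"batch_driver.py\"  # assumed paths\n",
"    ),\n",
"    compute=compute_name,\n",
"    instance_count=2,\n",
"    max_concurrency_per_instance=2,\n",
"    mini_batch_size=10,\n",
"    output_action=BatchDeploymentOutputAction.SUMMARY_ONLY,\n",
"    retry_settings=BatchRetrySettings(max_retries=3, timeout=300),\n",
")"
]
},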
"ml_client.batch_deployments.begin_create_or_update(deployment).result()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once created, let's configure this new deployment as the default one:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint = ml_client.batch_endpoints.get(endpoint_name)\n",
"endpoint.defaults.deployment_name = deployment.name\n",
"ml_client.batch_endpoints.begin_create_or_update(endpoint).result()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see the endpoint URL as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"The default deployment is {endpoint.defaults.deployment_name}\")"
]
},
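{
"cell_type": "markdown",
"metadata": {},
"source": [
"The endpoint URL can be read from the `scoring_uri` attribute of the retrieved endpoint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"The endpoint URL is {endpoint.scoring_uri}\")"
]
},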
{
"cell_type": "markdown",
"metadata": {
@@ -579,6 +649,20 @@
"Let's get a reference of the new data asset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Waiting for data asset {dataset_name}\", end=\"\")\n",
"while not any(filter(lambda m: m.name == dataset_name, ml_client.data.list())):\n",
" sleep(10)\n",
" print(\".\", end=\"\")\n",
"\n",
"print(\" [DONE]\")"
]
},
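{
"cell_type": "markdown",
"metadata": {},
"source": [
"The collapsed cell retrieves the asset. A minimal sketch, assuming the latest version is wanted (the variable name is hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get a reference to the data asset (sketch)\n",
"heart_dataset_unlabeled = ml_client.data.get(name=dataset_name, label=\"latest\")"
]
},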
{
"cell_type": "code",
"execution_count": null,
@@ -697,6 +781,27 @@
"ml_client.jobs.get(job.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can wait for the job to finish using the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Waiting for batch deployment job {job.name}\", end=\"\")\n",
"while ml_client.jobs.get(name=job.name).status not in [\"Completed\", \"Failed\"]:\n",
" sleep(10)\n",
" print(\".\", end=\"\")\n",
"\n",
"print(\" [DONE]\")"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -711,6 +816,22 @@
"\n",
"#### 4.7.1 Download the results\n",
"\n",
"The deployment creates a child job that executes the scoring. We can get the details of it using the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scoring_job = list(ml_client.jobs.list(parent_job_name=job.name))[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The outputs generated by the deployment job will be placed in an output named `score`:"
]
},
@@ -730,16 +851,7 @@
},
"outputs": [],
"source": [
"while ml_client.jobs.get(job.name).status not in [\n",
" \"Completed\",\n",
" \"Failed\",\n",
" \"Paused\",\n",
" \"NotResponding\",\n",
" \"Canceled\",\n",
"]:\n",
" sleep(10)\n",
" print(\".\", end=\"\")\n",
"ml_client.jobs.download(name=job.name, download_path=\".\", output_name=\"score\")"
"ml_client.jobs.download(name=scoring_job.name, download_path=\".\", output_name=\"score\")"
]
},
{
@@ -768,20 +880,38 @@
"import pandas as pd\n",
"import glob\n",
"\n",
"output_files = glob.glob(\"./*.parquet\")\n",
"output_files = glob.glob(\"named-outputs/score/*.parquet\")\n",
"score = pd.concat((pd.read_parquet(f) for f in output_files))\n",
"score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Clean up resources\n",
"\n",
"Clean-up the resources created. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ml_client.batch_endpoints.begin_delete(endpoint_name)"
]
}
],
"metadata": {
"kernel_info": {
"name": "amlv2"
},
"kernelspec": {
"display_name": "Python 3.10 - SDK V2",
"display_name": "Python 3.8.15 ('aml-py38')",
"language": "python",
"name": "python310-sdkv2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -793,14 +923,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
"version": "3.8.15"
},
"nteract": {
"version": "nteract-front-end@1.0.0"
},
"vscode": {
"interpreter": {
"hash": "97244a77a34cba5e2b1e39283dc2fb6fff847d184db3eb48522848cc6c29e077"
"hash": "8d732042e5e620df2ddb4aad7f460808ed1754fa045785bdb47941e58456b253"
}
}
},
12 changes: 6 additions & 6 deletions sdk/python/endpoints/batch/heart-classifier-mlflow/model/MLmodel
@@ -12,11 +12,11 @@ flavors:
model_uuid: 04bd660a1b8b4b1e84b9198c46cfd117
run_id: 22874b7e-b069-43f0-b90c-1c57793c7854
signature:
inputs: '[{"name": "age", "type": "long"}, {"name": "sex", "type": "long"}, {"name":
"cp", "type": "long"}, {"name": "trestbps", "type": "long"}, {"name": "chol",
"type": "long"}, {"name": "fbs", "type": "long"}, {"name": "restecg", "type":
"long"}, {"name": "thalach", "type": "long"}, {"name": "exang", "type": "long"},
{"name": "oldpeak", "type": "double"}, {"name": "slope", "type": "long"}, {"name":
"ca", "type": "long"}, {"name": "thal", "type": "string"}]'
inputs: '[{"name": "age", "type": "long"}, {"name": "sex", "type": "long"},
{"name": "cp", "type": "long"}, {"name": "trestbps", "type": "long"}, {"name":
"chol", "type": "long"}, {"name": "fbs", "type": "long"}, {"name": "restecg",
"type": "long"}, {"name": "thalach", "type": "long"}, {"name": "exang", "type":
"long"}, {"name": "oldpeak", "type": "double"}, {"name": "slope", "type": "long"},
{"name": "ca", "type": "long"}, {"name": "thal", "type": "string"}]'
outputs: '[{"type": "long"}]'
utc_time_created: '2022-10-13 00:55:57.543663'
@@ -11,5 +11,5 @@ tblib==1.7.0
toolz==0.11.2
typing-extensions==4.1.1
uuid==1.30
xgboost==1.3.3
xgboost==1.4.2
xxhash==3.0.0
