version 1.0.39

Azure · May 14, 2019 · 2d41c00
1 parent 8b1bffc
commit 2d41c00

Showing 76 changed files with 4,428 additions and 3,717 deletions.
diff --git a/NBSETUP.md b/NBSETUP.md
@@ -24,8 +24,8 @@ pip install azureml-sdk
 git clone https://github.com/Azure/MachineLearningNotebooks.git
 
 # below steps are optional
-# install the base SDK and a Jupyter notebook server
-pip install azureml-sdk[notebooks]
+# install the base SDK, Jupyter notebook server and tensorboard
+pip install azureml-sdk[notebooks,tensorboard]
 
 # install model explainability component
 pip install azureml-sdk[explain]

diff --git a/README.md b/README.md
@@ -11,8 +11,7 @@ pip install azureml-sdk
 Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
 
 ## How to navigate and use the example notebooks?
-If you are using an Azure Machine Learning Notebook VM, you are all set.  Otherwise, go through the [Configuration](./configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. 
-It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples. 
+If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples. 
 
 If you want to...
 
@@ -21,7 +20,7 @@ If you want to...
  * ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
  * ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
  * ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
- * ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
+ * ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
  * ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
 
 ## Tutorials
@@ -55,9 +54,5 @@ Visit following repos to see projects contributed by Azure ML users:
 
  - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
  - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
-
-
- ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
 
-
-
+ ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)
diff --git a/configuration.ipynb b/configuration.ipynb
@@ -32,7 +32,6 @@
         "    1. Workspace parameters\n",
         "    1. Access your workspace\n",
         "    1. Create a new workspace\n",
-        "    1. Create compute resources\n",
         "1. [Next steps](#Next%20steps)\n",
         "\n",
         "---\n",
@@ -235,97 +234,6 @@
         "ws.write_config()"
       ]
     },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Create compute resources for your training experiments\n",
-        "\n",
-        "Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
-        "\n",
-        "To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors.  Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
-        "\n",
-        "The cluster parameters are:\n",
-        "* vm_size - this describes the virtual machine type and size used in the cluster.  All machines in the cluster are the same type.  You can get the list of vm sizes available in your region by using the CLI command\n",
-        "\n",
-        "```shell\n",
-        "az vm list-skus -o tsv\n",
-        "```\n",
-        "* min_nodes - this sets the minimum size of the cluster.  If you set the minimum to 0 the cluster will shut down all nodes while not in use.  Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
-        "* max_nodes - this sets the maximum size of the cluster.  Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
-        "\n",
-        "\n",
-        "To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
-        "from azureml.core.compute_target import ComputeTargetException\n",
-        "\n",
-        "# Choose a name for your CPU cluster\n",
-        "cpu_cluster_name = \"cpucluster\"\n",
-        "\n",
-        "# Verify that cluster does not exist already\n",
-        "try:\n",
-        "    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
-        "    print(\"Found existing cpucluster\")\n",
-        "except ComputeTargetException:\n",
-        "    print(\"Creating new cpucluster\")\n",
-        "    \n",
-        "    # Specify the configuration for the new cluster\n",
-        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
-        "                                                           min_nodes=0,\n",
-        "                                                           max_nodes=4)\n",
-        "\n",
-        "    # Create the cluster with the specified name and configuration\n",
-        "    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
-        "    \n",
-        "    # Wait for the cluster to complete, show the output log\n",
-        "    cpu_cluster.wait_for_completion(show_output=True)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
-        "from azureml.core.compute_target import ComputeTargetException\n",
-        "\n",
-        "# Choose a name for your GPU cluster\n",
-        "gpu_cluster_name = \"gpucluster\"\n",
-        "\n",
-        "# Verify that cluster does not exist already\n",
-        "try:\n",
-        "    gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
-        "    print(\"Found existing gpu cluster\")\n",
-        "except ComputeTargetException:\n",
-        "    print(\"Creating new gpucluster\")\n",
-        "    \n",
-        "    # Specify the configuration for the new cluster\n",
-        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
-        "                                                           min_nodes=0,\n",
-        "                                                           max_nodes=4)\n",
-        "    # Create the cluster with the specified name and configuration\n",
-        "    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
-        "\n",
-        "    # Wait for the cluster to complete, show the output log\n",
-        "    gpu_cluster.wait_for_completion(show_output=True)"
-      ]
-    },
     {
       "cell_type": "markdown",
       "metadata": {},

diff --git a/...utomated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.ipynb b/...utomated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.ipynb
@@ -249,7 +249,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "from azureml.train.automl._vendor.automl.client.core.common.onnx_convert import OnnxConverter\n",
+        "from azureml.automl.core.onnx_convert import OnnxConverter\n",
         "onnx_fl_path = \"./best_model.onnx\"\n",
         "OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
       ]
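This hunk moves the `OnnxConverter` import from the vendored `azureml.train.automl._vendor.automl.client.core.common.onnx_convert` path to `azureml.automl.core.onnx_convert`. If a notebook has to run against SDK versions on both sides of that move, one option is to try the module paths in order. The sketch below is a small stdlib helper for that; `import_first` is a hypothetical name, not part of the Azure ML SDK, and whether a given older SDK still ships the vendored path is an assumption.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first(module_paths: Iterable[str]) -> ModuleType:
    """Return the first module in module_paths that imports successfully.

    Hypothetical helper: tries each dotted path in order and collects the
    ImportError (including ModuleNotFoundError) raised by each failed attempt.
    """
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:
            errors.append(f"{path}: {exc}")
    raise ImportError("none of the candidate modules could be imported:\n" + "\n".join(errors))


# Hypothetical usage: prefer the new location, fall back to the old vendored one.
# onnx_convert = import_first([
#     "azureml.automl.core.onnx_convert",
#     "azureml.train.automl._vendor.automl.client.core.common.onnx_convert",
# ])
# OnnxConverter = onnx_convert.OnnxConverter
```

Putting the new path first means the fallback is only exercised on older installs.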

diff --git a/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb b/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb
@@ -328,6 +328,12 @@
         "            print()\n",
         "            for estimator in step[1].estimators:\n",
         "                print_model(estimator[1], estimator[0]+ ' - ')\n",
+        "        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):\n",
+        "            print(\"\\nMeta Learner\")\n",
+        "            pprint(step[1]._meta_learner)\n",
+        "            print()\n",
+        "            for estimator in step[1]._base_learners:\n",
+        "                print_model(estimator[1], estimator[0]+ ' - ')\n",
         "        else:\n",
         "            pprint(step[1].get_params())\n",
         "            print()\n",

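The new `elif` branch handles a fitted step that exposes `_base_learners` and `_meta_learner` (a stacked ensemble): it prints the meta learner and then each base learner, the same way the preceding branch walks `estimators`. The sketch below isolates that attribute-based (duck-typed) dispatch. `VotingLike`, `StackLike`, and `describe` are hypothetical stand-ins, not AutoML classes; `describe` returns lines instead of printing so the recursion is easy to check, and because the first branch's full condition is not visible in this diff, checking only `estimators` there is an assumption.

```python
from pprint import pformat
from typing import Any, List


class VotingLike:
    """Hypothetical voting-style ensemble: (name, model) pairs plus weights."""

    def __init__(self, estimators, weights):
        self.estimators = estimators
        self.weights = weights


class StackLike:
    """Hypothetical stacked ensemble exposing the two attributes the new branch checks."""

    def __init__(self, base_learners, meta_learner):
        self._base_learners = base_learners
        self._meta_learner = meta_learner


def describe(model: Any, prefix: str = "") -> List[str]:
    """Describe a model, recursing into ensemble members via attribute checks."""
    if hasattr(model, "estimators"):
        lines = [prefix + "Weights: " + pformat(getattr(model, "weights", None))]
        members = model.estimators
    elif hasattr(model, "_base_learners") and hasattr(model, "_meta_learner"):
        lines = [prefix + "Meta Learner: " + pformat(model._meta_learner)]
        members = model._base_learners
    else:
        return [prefix + pformat(model)]  # leaf estimator
    for name, member in members:
        # Same prefixing idea as print_model(estimator[1], estimator[0] + ' - ')
        lines.extend(describe(member, prefix + name + " - "))
    return lines
```

Checking attributes rather than concrete types keeps the printing code working for any ensemble that follows the same shape, at the cost of relying on private names such as `_meta_learner` that may change between SDK versions.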
diff --git a/...omated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb b/...omated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb
@@ -117,21 +117,34 @@
       "outputs": [],
       "source": [
         "# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
-        "# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
-        "simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
-        "X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1)  # Remove the header row.\n",
-        "\n",
+        "# The data referenced here is a 1 MB simple random sample of the Chicago Crime data.\n",
         "# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
         "# and convert column types manually.\n",
-        "# Here we read a comma delimited file and convert all columns to integers.\n",
-        "y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
+        "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
+        "dflow = dprep.auto_read_file(example_data).skip(1)  # Remove the header row.\n",
+        "dflow.get_profile()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Because `Primary Type` is our label (y), we need to drop the rows whose value in this column is null.\n",
+        "dflow = dflow.drop_nulls('Primary Type')\n",
+        "dflow.head(5)"
       ]
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
-        "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
+        "### Review the Data Preparation Result\n",
+        "\n",
+        "You can peek at the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
+        "\n",
+        "`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
       ]
     },
     {
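The markdown cell in this hunk makes two claims about `Dataflow`: `skip(i)`/`head(j)` evaluate only the records they need, and every step yields a new immutable object that can be branched. The toy class below illustrates both ideas with Python generators. `MiniFlow` is hypothetical and is not how `azureml.dataprep` is implemented; in this toy, `skip(i).head(j)` pulls `i + j` rows through the chain rather than the whole source.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Tuple

Step = Callable[[Iterator[dict]], Iterator[dict]]


class MiniFlow:
    """Toy, immutable stand-in for a Dataflow: a source plus a tuple of lazy steps."""

    def __init__(self, source: Callable[[], Iterable[dict]], steps: Tuple[Step, ...] = ()):
        self._source = source
        self._steps = steps  # a tuple, so an existing MiniFlow never changes

    def _add(self, step: Step) -> "MiniFlow":
        # Returns a NEW object and leaves self untouched, so any MiniFlow can be branched.
        return MiniFlow(self._source, self._steps + (step,))

    def skip(self, i: int) -> "MiniFlow":
        return self._add(lambda rows: islice(rows, i, None))

    def map(self, fn: Callable[[dict], dict]) -> "MiniFlow":
        return self._add(lambda rows: (fn(r) for r in rows))

    def head(self, j: int) -> list:
        rows: Iterator[dict] = iter(self._source())
        for step in self._steps:
            rows = step(rows)
        # islice pulls only as many rows through the generator chain as it needs.
        return list(islice(rows, j))


if __name__ == "__main__":
    base = MiniFlow(lambda: ({"n": k} for k in range(10**9)))  # effectively unbounded source
    evens = base.map(lambda r: {**r, "even": r["n"] % 2 == 0})
    print(evens.skip(2).head(3))  # only rows 0..4 are ever generated
    print(base.head(1))           # base is unchanged by the branch built from it
```

Storing steps in a tuple and returning a fresh object from every method is what makes branching safe: two flows built from the same parent share its steps but can never modify each other.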
@@ -140,7 +153,8 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "X.skip(1).head(5)"
+        "X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
+        "y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
       ]
     },
     {
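Together, the `drop_nulls('Primary Type')` cell and this hunk's `drop_columns`/`keep_columns` calls remove rows that have no label, then fork one Dataflow into the features `X` (everything except `Primary Type` and `FBI Code`) and the label `y`. The plain-Python sketch below mirrors that logic on a list of dicts so it runs without `azureml.dataprep`. `prepare` is a hypothetical helper, treating `None` or an empty string as null is an assumption, and the rows in the usage example are made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[str]]

LABEL = "Primary Type"
DROPPED = ("Primary Type", "FBI Code")  # columns excluded from the features, as in the diff


def drop_nulls(rows: Iterable[Row], column: str) -> List[Row]:
    """Keep only rows whose `column` value is present (None or "" counts as null here)."""
    return [r for r in rows if r.get(column) not in (None, "")]


def prepare(rows: Iterable[Row]) -> Tuple[List[Row], List[Optional[str]]]:
    """Drop unlabeled rows, then split them into features X and label y."""
    labeled = drop_nulls(rows, LABEL)
    X = [{k: v for k, v in r.items() if k not in DROPPED} for r in labeled]
    y = [r[LABEL] for r in labeled]
    return X, y


if __name__ == "__main__":
    train = [  # made-up rows for illustration
        {"ID": "1", "Primary Type": "THEFT", "FBI Code": "a"},
        {"ID": "2", "Primary Type": None, "FBI Code": "b"},
        {"ID": "3", "Primary Type": "BATTERY", "FBI Code": "c"},
    ]
    test = [{"ID": "9", "Primary Type": "THEFT", "FBI Code": "d"}]
    X_train, y_train = prepare(train)
    X_test, y_test = prepare(test)  # identical steps for test and train
    print(X_train, y_train)
```

Running the same `prepare` on the test rows is the plain-Python counterpart of the notebook's advice that test data go through the same preparation steps as the training data, so the model sees the same columns at prediction time.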
@@ -162,9 +176,8 @@
         "    \"iteration_timeout_minutes\" : 10,\n",
         "    \"iterations\" : 2,\n",
         "    \"primary_metric\" : 'AUC_weighted',\n",
-        "    \"preprocess\" : False,\n",
-        "    \"verbosity\" : logging.INFO,\n",
-        "    \"n_cross_validations\": 3\n",
+        "    \"preprocess\" : True,\n",
+        "    \"verbosity\" : logging.INFO\n",
         "}"
       ]
     },
@@ -181,7 +194,7 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "dsvm_name = 'mydsvmc'\n",
+        "dsvm_name = 'mydsvmb'\n",
         "\n",
         "try:\n",
         "    while ws.compute_targets[dsvm_name].provisioning_state == 'Creating':\n",
@@ -257,6 +270,23 @@
         "remote_run"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Pre-process cache cleanup\n",
+        "The preprocessed data is cached in the user's default file store. When the run has completed, you can clean up the cache by running the cell below."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "remote_run.clean_preprocessor_cache()"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {},
@@ -376,7 +406,8 @@
       "source": [
         "## Test\n",
         "\n",
-        "#### Load Test Data"
+        "#### Load Test Data\n",
+        "The test data must go through the same preparation steps as the training data; otherwise, the model may fail at the preprocessing step."
       ]
     },
     {
@@ -385,20 +416,16 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "from sklearn import datasets\n",
-        "\n",
-        "digits = datasets.load_digits()\n",
-        "X_test = digits.data[:10, :]\n",
-        "y_test = digits.target[:10]\n",
-        "images = digits.images[:10]"
+        "dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
+        "dflow_test = dflow_test.drop_nulls('Primary Type')"
       ]
     },
     {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
         "#### Testing Our Best Fitted Model\n",
-        "We will try to predict 2 digits and see how our model works."
+        "We will use a confusion matrix to see how well our model works."
       ]
     },
     {
@@ -407,65 +434,19 @@
       "metadata": {},
       "outputs": [],
       "source": [
-        "#Randomly select digits and test\n",
-        "from matplotlib import pyplot as plt\n",
-        "import numpy as np\n",
+        "from pandas_ml import ConfusionMatrix\n",
         "\n",
-        "for index in np.random.choice(len(y_test), 2, replace = False):\n",
-        "    print(index)\n",
-        "    predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
-        "    label = y_test[index]\n",
-        "    title = \"Label value = %d  Predicted value = %d \" % (label, predicted)\n",
-        "    fig = plt.figure(1, figsize=(3,3))\n",
-        "    ax1 = fig.add_axes((0,0,.8,.8))\n",
-        "    ax1.set_title(title)\n",
-        "    plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
-        "    plt.show()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Appendix"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Capture the `Dataflow` Objects for Later Use in AutoML\n",
+        "y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
+        "X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
         "\n",
-        "`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# sklearn.digits.data + target\n",
-        "digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "print(digits_complete.to_pandas_dataframe().shape)\n",
-        "labels_column = 'Column64'\n",
-        "dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
-        "dflow_y = digits_complete.keep_columns(columns = [labels_column])"
+        "\n",
+        "ypred = fitted_model.predict(X_test)\n",
+        "\n",
+        "cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
+        "\n",
+        "print(cm)\n",
+        "\n",
+        "cm.plot()"
       ]
     }
   ],
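The final cell builds a `pandas_ml.ConfusionMatrix` from the actual `Primary Type` values and the output of `fitted_model.predict(X_test)`. To show what such a table counts, here is a dependency-free sketch; `confusion_matrix` and `accuracy` are hypothetical helpers, not the `pandas_ml` API, and the row/column orientation (rows are actual labels, columns are predicted labels) is this sketch's own convention.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def confusion_matrix(actual: Sequence[Hashable],
                     predicted: Sequence[Hashable]) -> Tuple[list, List[List[int]]]:
    """Count (actual, predicted) pairs; rows are actual labels, columns predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))  # assumes labels are mutually sortable
    pair_counts = Counter(zip(actual, predicted))
    matrix = [[pair_counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of predictions on the diagonal (actual == predicted)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which classes the model confuses with each other, which a single accuracy number hides; with many crime categories, that per-class view is usually the more useful part of the table.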