Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Automated Machine Learning
_**Text Classification Using Deep Learning**_

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Train)
1. [Evaluate](#Evaluate)

## Introduction
This notebook demonstrates classification with text data using deep learning in AutoML.

AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data. Depending on the compute cluster the user provides, AutoML tries Bidirectional Encoder Representations from Transformers (BERT) when GPU compute is used and a Bidirectional Long Short-Term Memory network (BiLSTM) when CPU compute is used, thereby matching the choice of DNN to the user's setup.

Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.

Notebook synopsis:

1. Creating an Experiment in an existing Workspace
2. Configuration and remote run of AutoML for a text dataset (20 Newsgroups dataset from scikit-learn) for classification
3. Registering the best model for future use
4. Evaluating the final model on a test set

## Setup

In [1]:
from azureml.core import Dataset

subscription_id = '381b38e9-9840-4719-a5a0-61d9585e1e91'
resource_group_name = 'gasi_rg_centraleuap'
workspace_name = "gasi_ws_centraleuap"

from azureml.core import Workspace as WorkspaceV1
ws = WorkspaceV1(workspace_name=workspace_name, resource_group=resource_group_name, subscription_id=subscription_id)

ds = Dataset.get(ws, name='beertrain')

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
"Dataset.get" is deprecated after version 1.0.69. Please use "Dataset.get_by_name" and "Dataset.get_by_id" to retrieve dataset. See Dataset API change notice at https://aka.ms/dataset-deprecation.


In [3]:
ds.to_pandas_dataframe()

Unnamed: 0,DATE,BeerProduction
0,1992-01-01,3459
1,1992-02-01,3458
2,1992-03-01,4002
3,1992-04-01,4564
4,1992-05-01,4221
...,...,...
235,2011-08-01,10469
236,2011-09-01,10085
237,2011-10-01,9612
238,2011-11-01,10328


In [2]:
%env AZURE_EXTENSION_DIR=/home/schrodinger/automl/sdk-cli-v2/src/cli/src
%env AZURE_ML_CLI_PRIVATE_FEATURES_ENABLED=true

env: AZURE_EXTENSION_DIR=/home/schrodinger/automl/sdk-cli-v2/src/cli/src
env: AZURE_ML_CLI_PRIVATE_FEATURES_ENABLED=true


In [3]:
import azure.ml
from azure.ml import MLClient

from azure.core.exceptions import ResourceExistsError

from azure.ml.entities import Workspace
from azure.ml.entities import AmlCompute
from azure.ml.entities import Data

import pandas as pd
import os
from sklearn.datasets import fetch_20newsgroups

This sample notebook may use features that are not available in previous versions of the Azure ML SDK.

In [4]:
print("This notebook was created using version 1.31.0 of the Azure ML SDK")
print("You are currently using SDK version", azure.ml.version.VERSION, "of the Azure ML SDK")

This notebook was created using version 1.31.0 of the Azure ML SDK
You are currently using SDK version 0.0.86 of the Azure ML SDK


As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem.

In [5]:
subscription_id = '381b38e9-9840-4719-a5a0-61d9585e1e91'
resource_group_name = 'gasi_rg_centraleuap'
workspace_name = "gasi_ws_centraleuap"
experiment_name = 'automl-classification-text-dnn'

client = MLClient(subscription_id, resource_group_name, default_workspace_name=workspace_name)

client

<azure.ml._ml_client.MLClient at 0x7fbc19e41f90>

### Initialize MLFlowClient

Point MLflow at the workspace tracking URI so that an MLflow client can interact with the resources that the AutoML job creates, such as models and metrics.

In [6]:
import mlflow

########
# TODO: The API to get tracking URI is not yet available on Workspace object.
from azureml.core import Workspace as WorkspaceV1
ws = WorkspaceV1(workspace_name=workspace_name, resource_group=resource_group_name, subscription_id=subscription_id)
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
########

# Not sure why this doesn't work w/o the double + single quotes
# mlflow.set_tracking_uri("azureml://northeurope.experiments.azureml.net/mlflow/v1.0/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_neu/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_neu?")
mlflow.set_experiment(experiment_name)

print("\nCurrent tracking uri: {}".format(mlflow.get_tracking_uri()))

If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication.
Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.



Current tracking uri: azureml://master.experiments.azureml-test.net/mlflow/v1.0/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_centraleuap/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_centraleuap?


## Set up a compute cluster
This section uses a user-provided compute cluster (named "gpu-cluster" in this example). If a cluster with this name does not exist in your workspace, the code below creates one. You can adjust the cluster parameters, such as the VM size and the number of nodes, in the code.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

Whether you provide a CPU or a GPU cluster, AutoML chooses the appropriate DNN for that setup: a BiLSTM text featurizer is included among the candidate featurizers on CPU, and a BERT text featurizer on GPU. If your goal is to obtain the most accurate model, we recommend GPU clusters, since BERT featurizers usually outperform BiLSTM featurizers.
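As a rough illustration of the CPU/GPU distinction above, the following sketch guesses which text featurizer to expect from a VM size string. The helper `expected_text_featurizer` and its prefix list are hypothetical and not part of AutoML, which makes this decision itself; the sketch assumes that Azure GPU sizes belong to the NC, ND, or NV families.

```python
# Hypothetical helper, not an AutoML API: guess the text featurizer family
# from the VM size of the cluster.
GPU_FAMILY_PREFIXES = ("NC", "ND", "NV")  # assumption: GPU VM families


def expected_text_featurizer(vm_size: str) -> str:
    """Return "BERT" for GPU-looking VM sizes and "BiLSTM" otherwise."""
    family = vm_size.upper()
    if family.startswith("STANDARD_"):
        family = family[len("STANDARD_"):]
    return "BERT" if family.startswith(GPU_FAMILY_PREFIXES) else "BiLSTM"


print(expected_text_featurizer("STANDARD_NC6"))
print(expected_text_featurizer("STANDARD_D2_V2"))
```

The cluster created below uses STANDARD_NC6, so under this assumption BERT would be among the candidate featurizers.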

In [7]:
# Set or create compute

num_nodes = 2
compute_name = "gpu-cluster"  # STANDARD_NC6 is a GPU VM size
compute = AmlCompute(
    name=compute_name, size="STANDARD_NC6",
    min_instances=0, max_instances=num_nodes,
    idle_time_before_scale_down=120
)

try:
    client.compute.create(compute)
except ResourceExistsError as re:
    # A cluster with this name already exists in the workspace
    print(re)
except Exception as e:
    print("Could not create compute.", str(e))
    # Reload an existing compute target
    compute = client.compute.get(compute_name)

compute

AmlCompute({'name': 'gpu-cluster', 'id': None, 'description': None, 'tags': {}, 'properties': {}, 'base_path': './', 'location': 'centraluseuap', 'type': 'amlcompute', 'enable_public_ip': False, 'resource_id': None, 'provisioning_state': None, 'provisioning_errors': None, 'created_on': None, 'size': 'STANDARD_NC6', 'min_instances': 0, 'max_instances': 2, 'idle_time_before_scale_down': 120, 'identity_type': None, 'user_assigned_identities': None, 'admin_username': 'azureuser', 'admin_password': None, 'ssh_key_value': None, 'vnet_name': None, 'subnet': None, 'priority': None})

### Get data
For this notebook we use the 20 Newsgroups data from scikit-learn. We filter the data to four classes and take a small sample as training data. Note that more data is needed to improve accuracy; this notebook deliberately uses a small sample so that you can reuse it as a template for your own, larger datasets.

In [8]:
data_dir = "text-dnn-data" # Local directory to store data
blobstore_datadir = data_dir # Blob store directory to store data in
target_column_name = 'y'
feature_column_name = 'X'

def get_20newsgroups_data():
    '''Fetches 20 Newsgroups data from scikit-learn
       Returns them in form of pandas dataframes
    '''
    remove = ('headers', 'footers', 'quotes')
    categories = [
        'rec.sport.baseball',
        'rec.sport.hockey',
        'comp.graphics',
        'sci.space',
        ]

    data = fetch_20newsgroups(subset = 'train', categories = categories,
                                    shuffle = True, random_state = 42,
                                    remove = remove)
    data = pd.DataFrame({feature_column_name: data.data, target_column_name: data.target})

    data_train = data[:200]
    data_test = data[200:300]    

    data_train = remove_blanks_20news(data_train, feature_column_name, target_column_name)
    data_test = remove_blanks_20news(data_test, feature_column_name, target_column_name)
    
    return data_train, data_test
    
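def remove_blanks_20news(data, feature_column_name, target_column_name):
    '''Replaces newlines with spaces, strips whitespace and drops rows with empty text'''
    # Work on a copy so that pandas does not warn about setting values on a slice
    data = data.copy()
    data[feature_column_name] = data[feature_column_name].replace(r'\n', ' ', regex=True).apply(lambda x: x.strip())
    data = data[data[feature_column_name] != '']
    
    return data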
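
The cleaning step performed by `remove_blanks_20news` can be illustrated on plain Python strings. This is only a sketch of the same idea without pandas; the function `clean_documents` is a hypothetical name introduced for this example.

```python
import re


def clean_documents(docs, labels):
    """Replace newlines with spaces, strip whitespace, and drop empty documents.

    Returns a list of (text, label) pairs so that labels stay aligned with text.
    """
    cleaned = []
    for text, label in zip(docs, labels):
        text = re.sub(r"\n", " ", text).strip()
        if text != "":
            cleaned.append((text, label))
    return cleaned


docs = ["first line\nsecond line", "\n\n", "  padded  "]
labels = [0, 1, 2]
print(clean_documents(docs, labels))
```

Documents that consist only of newlines or whitespace become empty after stripping and are removed together with their label.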

#### Fetch data and upload to datastore for use in training


In [9]:
data_train, data_test = get_20newsgroups_data()

if not os.path.isdir(data_dir):
    os.mkdir(data_dir)
    
train_data_fname = data_dir + '/train_data.csv'
test_data_fname = data_dir + '/test_data.csv'

data_train.to_csv(train_data_fname, index=False)
data_test.to_csv(test_data_fname, index=False)

datastore = ws.get_default_datastore()
datastore.upload(src_dir=data_dir, target_path=blobstore_datadir,
                    overwrite=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Uploading an estimated of 2 files
Uploading text-dnn-data/train_data.csv
Uploaded text-dnn-data/train_data.csv, 1 files out of an estimated total of 2
Uploading text-dnn-data/test_data.csv
Uploaded text-dnn-data/test_data.csv, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_c9fcbb8e1d6a445bb3487e2156229a7a

#### Once the data is uploaded to the blob store, please create the tabular dataset manually from the UI

The files should already be present in the default workspace storage account, under the `data_dir` path specified above.

In [17]:
dataset_name = "textdnn"    # use the name chosen on the UI
dataset_version = 1

try:
    train_dataset = client.data.get(dataset_name, dataset_version)
#     training_data = Data(name=dataset_name, version=dataset_version, local_path="./data")
#     training_data = client.data.create_or_update(training_data)
#     print("Uploaded to path  : ", data.path)
#     print("Datastore location: ", data.datastore)
except Exception as e:
    print("Could not get dataset. ", str(e))

train_dataset

Data({'is_anonymous': False, 'name': 'textdnn', 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_centraleuap/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_centraleuap/data/textdnn/versions/1', 'description': None, 'tags': {}, 'properties': {}, 'base_path': './', 'creation_context': <azure.ml._restclient.v2021_03_01_preview.models._models_py3.SystemData object at 0x7f1dc4395750>, 'version': 1, 'datastore': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_centraleuap/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_centraleuap/datastores/workspaceblobstore', 'path': 'text-dnn-data/train_data.csv', 'local_path': None})

### Prepare AutoML run

This notebook uses the blocked_models parameter to exclude some models that can take longer to train on some text datasets. You can remove models from the blocked_models list, but you may then need to increase the experiment timeout (experiment_timeout_minutes below) to get results.

In [None]:
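import logging

from azureml.train.automl import AutoMLConfig

# v1 SDK configuration, kept for reference.
# NOTE: compute_target is not defined in this notebook; it is assumed to be a v1 compute target.
automl_settings = {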
    "experiment_timeout_minutes": 30,
    "primary_metric": 'accuracy',
    "max_concurrent_iterations": num_nodes, 
    "max_cores_per_iteration": -1,
    "enable_dnn": True,
    "enable_early_stopping": True,
    "validation_size": 0.3,
    "verbosity": logging.INFO,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False,
}

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automl_errors.log',
                             compute_target=compute_target,
                             training_data=train_dataset,
                             label_column_name=target_column_name,
                             blocked_models = ['LightGBM', 'XGBoostClassifier'],
                             **automl_settings
                            )

In [26]:
# FeaturizationSettings is imported from azure.ml.entities below
from azure.ml._restclient.v2020_09_01_preview.models import (
    GeneralSettings,
    DataSettings,
    LimitSettings,
    TrainingDataSettings,
    ValidationDataSettings,
)

from azure.ml.entities._job.automl.training_settings import TrainingSettings
from azure.ml.entities._job.automl.featurization import FeaturizationSettings
from azure.ml.entities import AutoMLJob, ComputeConfiguration


compute_settings = ComputeConfiguration(target=compute.name)

general_settings = GeneralSettings(
    task_type="classification",
    primary_metric= "accuracy",
    log_verbosity="Info")

limit_settings = LimitSettings(
    timeout=30,
    trial_timeout=10,    # without a default trial timeout, job creation results in a validation error
    max_concurrent_trials=num_nodes,
    enable_early_termination=True)

training_data_settings = TrainingDataSettings(
    dataset_arm_id="{}:{}".format(train_dataset.name, train_dataset.version)
)
validation_data_settings = ValidationDataSettings(validation_data_size=0.3)

data_settings = DataSettings(
    training_data=training_data_settings,
    target_column_name=target_column_name,
    validation_data=validation_data_settings
)

featurization_settings = FeaturizationSettings(
    featurization_config="auto"
)

training_settings = TrainingSettings(
    block_list_models=['LightGBM', 'XGBoostClassifier'],
    enable_dnn_training=True,
    enable_vote_ensemble=False,
    enable_stack_ensemble=False
)

extra_automl_settings = {"save_mlflow": True}

automl_job = AutoMLJob(
    compute=compute_settings,
    general_settings=general_settings,
    limit_settings=limit_settings,
    data_settings=data_settings,
    training_settings=training_settings,
    featurization_settings=featurization_settings,
    properties=extra_automl_settings,
)

automl_job

AutoMLJob({'name': 'f0164207-1d4b-4c23-8e17-44ec43d773d3', 'id': None, 'description': None, 'tags': {}, 'properties': {'save_mlflow': True}, 'base_path': './', 'type': 'automl_job', 'creation_context': None, 'experiment_name': 'classification-text-dnn', 'status': None, 'interaction_endpoints': None, 'log_files': None, 'output': None, 'general_settings': <azure.ml._restclient.v2020_09_01_preview.models._models_py3.GeneralSettings object at 0x7f1dc3f39e10>, 'data_settings': <azure.ml._restclient.v2020_09_01_preview.models._models_py3.DataSettings object at 0x7f1dc3f39f50>, 'limit_settings': <azure.ml._restclient.v2020_09_01_preview.models._models_py3.LimitSettings object at 0x7f1dc3f39fd0>, 'forecasting_settings': None, 'training_settings': <azure.ml.entities._job.automl.training_settings.TrainingSettings object at 0x7f1dc3f39a10>, 'featurization_settings': <azure.ml.entities._job.automl.featurization.FeaturizationSettings object at 0x7f1dc3f39e90>, 'compute': {'instance_count': None, 't

In [22]:
automl_job.dump("./automl_dnn_job.yaml")

#### Submit AutoML Run

In [6]:
created_job = client.jobs.create_or_update(automl_job)
created_job

AutoMLJob({'name': 'f0164207-1d4b-4c23-8e17-44ec43d773d3', 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_centraleuap/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_centraleuap/jobs/f0164207-1d4b-4c23-8e17-44ec43d773d3', 'description': None, 'tags': {}, 'properties': {'save_mlflow': 'True', 'errors': 'Setup iteration failed: {"run_error": {"exception": {"message": "FitException:\\n\\tMessage: Encountered an internal AutoML error.                                      Exception raised while initializing PretrainedTextDnn model for text dnn.                                     Please try the experiment again, and contact support if the error persists.                                      Error Message/Code: {error_details}\\n\\tInnerException: AttributeError\\n\\tErrorResponse \\n{\\n    \\"error\\": {\\n        \\"code\\": \\"SystemError\\",\\n        \\"message\\": \\"Encountered an internal AutoML error.                                

In [28]:
print("Studio URL: ", created_job.interaction_endpoints["Studio"].endpoint)

Studio URL:  https://ml.azure.com/runs/f0164207-1d4b-4c23-8e17-44ec43d773d3?wsid=/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourcegroups/gasi_rg_centraleuap/workspaces/gasi_ws_centraleuap&tid=72f988bf-86f1-41af-91ab-2d7cd011db47


The Studio URL above links to the visual tools for this run in Azure Machine Learning studio. Go try them!

### Retrieve the Best Model
Below we select the best model pipeline from our iterations and use it to evaluate the test data on the same compute cluster.

You can test the model locally to get a feel for its input and output. When the model contains BERT, this step requires pytorch and pytorch-transformers to be installed in your local environment. The exact versions of these packages can be found in the **automl_env.yml** file located in the local copy of your MachineLearningNotebooks folder here:
MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/automl_env.yml
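
To make "best" concrete: for a metric that is maximized, such as accuracy, the best child run is the one with the highest value. The sketch below shows that selection on a plain dictionary. The run IDs and metric values are made up for illustration, and `pick_best_run` is a hypothetical helper; the notebook itself does not compute this and instead reads the `automl_best_child_run_id` tag from the parent run.

```python
def pick_best_run(child_metrics, primary_metric="accuracy"):
    """Return the run ID with the highest value of a maximized primary metric.

    child_metrics maps run ID -> dict of metric name -> value.
    Runs that did not log the primary metric are ignored.
    """
    scored = {
        run_id: metrics[primary_metric]
        for run_id, metrics in child_metrics.items()
        if primary_metric in metrics
    }
    if not scored:
        raise ValueError("No run logged the metric {!r}".format(primary_metric))
    return max(scored, key=scored.get)


# Made-up child runs for illustration only
example_runs = {
    "run_3": {"accuracy": 0.71},
    "run_12": {"accuracy": 0.76},
    "run_7": {"log_loss": 0.90},  # no accuracy logged, so it is skipped
}
print(pick_best_run(example_runs))
```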

In [8]:
from mlflow.tracking import MlflowClient

# TODO: Use this run, as it has MLFlow model stored on the run
job_name = "AutoML_51844331-1691-4c7f-b65f-4be28f2c68b5"
# job_name = created_job.name

mlflow_client = MlflowClient()
mlflow_parent_run = mlflow_client.get_run(job_name)

best_child_run_id = mlflow_parent_run.data.tags["automl_best_child_run_id"]
print("Found best child run id: ", best_child_run_id)

best_run_customized = mlflow_client.get_run(best_child_run_id)
best_run_customized

Found best child run id:  AutoML_51844331-1691-4c7f-b65f-4be28f2c68b5_12


<Run: data=<RunData: metrics={'AUC_macro': 0.9075892857142858,
 'AUC_micro': 0.9130003963535473,
 'AUC_weighted': 0.904351395730706,
 'accuracy': 0.7586206896551724,
 'average_precision_score_macro': 0.8230804014367542,
 'average_precision_score_micro': 0.8139691450897437,
 'average_precision_score_weighted': 0.8161927680357136,
 'balanced_accuracy': 0.765625,
 'f1_score_macro': 0.7685983397190295,
 'f1_score_micro': 0.7586206896551724,
 'f1_score_weighted': 0.7601400449200687,
 'log_loss': 0.754499859554939,
 'matthews_correlation': 0.6800317795420728,
 'norm_macro_recall': 0.6875,
 'precision_score_macro': 0.781547619047619,
 'precision_score_micro': 0.7586206896551724,
 'precision_score_weighted': 0.7722906403940887,
 'recall_score_macro': 0.765625,
 'recall_score_micro': 0.7586206896551724,
 'recall_score_weighted': 0.7586206896551724,
 'weighted_accuracy': 0.7529411764705882}, params={}, tags={'_aml_system_ComputeTargetStatus': '{"AllocationState":"steady","PreparingNodeCount":0,"

#### Loading the model locally requires Torch to be installed

In [33]:
%pip install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.9.0+cpu
  Downloading https://download.pytorch.org/whl/cpu/torch-1.9.0%2Bcpu-cp37-cp37m-linux_x86_64.whl (175.5 MB)
Collecting torchvision==0.10.0+cpu
  Downloading https://download.pytorch.org/whl/cpu/torchvision-0.10.0%2Bcpu-cp37-cp37m-linux_x86_64.whl (15.7 MB)
Collecting torchaudio==0.9.0
  Downloading torchaudio-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
Collecting pillow>=5.3.0
  Downloading Pillow-8.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB)

In [9]:
!pip install pytorch-transformers==1.0.0

Collecting pytorch-transformers==1.0.0
  Downloading pytorch_transformers-1.0.0-py3-none-any.whl (137 kB)
Collecting regex
  Downloading regex-2021.7.6-cp37-cp37m-manylinux2014_x86_64.whl (721 kB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: sentencepiece, regex, pytorch-transformers
Successfully installed pytorch-transformers-1.0.0 regex-2021.7.6 sentencepiece-0.1.96


In [9]:
import mlflow.sklearn

fitted_model = mlflow.sklearn.load_model("runs:/{}/outputs".format(best_run_customized.info.run_id))

You can now see what text transformations are used to convert text data to features for this dataset, including deep learning transformations based on BiLSTM or Transformer (BERT is one implementation of a Transformer) models.

In [10]:
text_transformations_used = []
for column_group in fitted_model.named_steps['datatransformer'].get_featurization_summary():
    text_transformations_used.extend(column_group['Transformations'])
text_transformations_used

['StringCast-CharGramTfIdf',
 'StringCast-WordGramTfIdf',
 'StringCast-StringConcatTransformer-PretrainedTextDNNTransformer']
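
To check programmatically whether a deep learning featurizer was used, you can search the transformation names for the DNN transformer. The sketch below works on a hand-written stand-in for the featurization summary that contains only the `Transformations` key used above; `text_dnn_transformations` is a hypothetical helper, and the only transformer name it looks for is the one that appears in the output above.

```python
def text_dnn_transformations(featurization_summary):
    """Return the transformation names that involve the pretrained text DNN."""
    found = []
    for column_group in featurization_summary:
        for name in column_group.get("Transformations", []):
            if "PretrainedTextDNNTransformer" in name:
                found.append(name)
    return found


# Hand-written stand-in for the summary returned by the fitted model
example_summary = [
    {
        "Transformations": [
            "StringCast-CharGramTfIdf",
            "StringCast-WordGramTfIdf",
            "StringCast-StringConcatTransformer-PretrainedTextDNNTransformer",
        ]
    },
]
print(text_dnn_transformations(example_summary))
```

An empty result would suggest that only the TF-IDF style featurizers were applied to the text column.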

### Registering the best model
We now register the best fitted model from the AutoML Run for use in future deployments.  

We retrieve the result statistics, extract the best model from the AutoML run, then download and register it.

In [14]:
# TODO: This is v1 Run dependencies, won't work on v2

# summary_df = get_result_df(automl_run)
# best_dnn_run_id = summary_df['run_id'].iloc[0]
# best_dnn_run = Run(experiment, best_dnn_run_id)

In [15]:
def download_outputs_via_mlflow_client(mlflow_client, run_id, path) -> str:
    """Download `path` (file or dir) from the run artifacts and return the local download path"""
    local_dir = "/tmp/artifact_downloads/{}".format(run_id)
    local_path = os.path.join(local_dir, path)
    if os.path.exists(local_path):
        print("Path {} already exists. Skipping download.".format(local_path))
    else:
        # download_artifacts places the artifact under the destination directory,
        # so pass local_dir (not local_path) to avoid a nested "<path>/<path>" folder
        os.makedirs(local_dir, exist_ok=True)

        local_path = mlflow_client.download_artifacts(run_id, path, local_dir)
        print("Artifacts downloaded to: {}".format(local_path))
        if os.path.isdir(local_path):
            print("Artifacts: {}".format(os.listdir(local_path)))
    return local_path

output_path = download_outputs_via_mlflow_client(mlflow_client, best_run_customized.info.run_id, "outputs")

Directory /tmp/artifact_downloads/AutoML_51844331-1691-4c7f-b65f-4be28f2c68b5_12/outputs/outputs already exists. Skipping download.


Register the model in your Azure Machine Learning Workspace. If you previously registered a model, please make sure to delete it so as to replace it with this new model.

In [20]:
# Register the model
from azure.ml.entities._assets import Model

model_name = 'textDNN-20News'

# Note: This is not using MLFlow's deployment mechanism at all (flavors, scoring script / examples etc.)
# Create / register the model
# TODO: This doesn't track the lineage (run id) from which the model is created. 
azure_model = Model(name=model_name, version=1, local_path=os.path.join(output_path, "model.pkl"))
azure_model = client.models.create_or_update(azure_model)
azure_model

Uploading model.pkl: 126MB [00:42, 2.99MB/s]                            


Model({'is_anonymous': False, 'name': 'textDNN-20News', 'id': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_centraleuap/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_centraleuap/models/textDNN-20News/versions/1', 'description': None, 'tags': {}, 'properties': {'azureml.modelFormat': 'CUSTOM'}, 'base_path': './', 'creation_context': <azure.ml._restclient.v2021_03_01_preview.models._models_py3.SystemData object at 0x7fbbb695ad50>, 'version': 1, 'datastore': '/subscriptions/381b38e9-9840-4719-a5a0-61d9585e1e91/resourceGroups/gasi_rg_centraleuap/providers/Microsoft.MachineLearningServices/workspaces/gasi_ws_centraleuap/datastores/workspaceblobstore', 'path': 'LocalUpload/d1bf7479845ea42773f0dcc4588ee132/model.pkl', 'local_path': None, 'utc_time_created': None, 'flavors': {}})

## Evaluate on Test Data

We now use the best fitted model from the AutoML Run to make predictions on the test set.  

Test set schema should match that of the training set.
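
One simple way to verify this is to compare the header rows of the two CSV files. The following is a minimal sketch using only the standard library; `schema_mismatch` is a hypothetical helper, and the in-memory CSV text stands in for `train_data.csv` and `test_data.csv`.

```python
import csv
import io


def read_header(csv_file):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))


def schema_mismatch(train_file, test_file):
    """Return (missing, extra): columns missing from / extra in the test file."""
    train_cols = read_header(train_file)
    test_cols = read_header(test_file)
    missing = [c for c in train_cols if c not in test_cols]
    extra = [c for c in test_cols if c not in train_cols]
    return missing, extra


# In-memory stand-ins for the uploaded CSV files
train_csv = io.StringIO("X,y\nsome training text,1\n")
test_csv = io.StringIO("X,y\nsome test text,3\n")
print(schema_mismatch(train_csv, test_csv))
```

With real files, pass the objects returned by `open(train_data_fname, newline="")` and `open(test_data_fname, newline="")` instead.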

In [21]:
test_dataset = pd.read_csv("./text-dnn-data/test_data.csv")

test_dataset.head()

Unnamed: 0,X,y
0,DFW was designed with the STS in mind (which r...,3
1,"Johnny Mize had six three-HR games, which is t...",1
2,Actually I admired the spirit of the fan at th...,1
3,"I don't know a whole lot on Proton, but given ...",3
4,SPECIFIC: Basically to be able to do the thing...,3


In [None]:
# TODO: Inference run / job is currently not possible

# test_experiment = Experiment(ws, experiment_name + "_test")

# script_folder = os.path.join(os.getcwd(), 'inference')
# os.makedirs(script_folder, exist_ok=True)
# shutil.copy('infer.py', script_folder)

# test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run,
#                          test_dataset, target_column_name, model_name)

Display computed metrics

In [22]:
# test_run

In [23]:
# RunDetails(test_run).show()

In [24]:
# test_run.wait_for_completion()

In [25]:
# pd.Series(test_run.get_metrics())
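
Once predictions for the test set are available, metrics such as accuracy can be computed from the true and predicted labels. The sketch below is a plain-Python illustration and not the metric code that AutoML uses; the label lists are made up, and `accuracy` and `recall_per_class` are hypothetical helper names.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only
y_true = [3, 1, 1, 3, 3]
y_pred = [3, 1, 0, 3, 1]
print("accuracy:", accuracy(y_true, y_pred))
print("recall per class:", recall_per_class(y_true, y_pred))
```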