Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# Automated Machine Learning
_**Multilabel Text Classification Using AutoML NLP**_

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Train)
1. [Inference](#Inference)

## Introduction
This notebook demonstrates text classification with AutoML NLP.

The AutoML highlight here is end-to-end deep learning for NLP tasks such as multilabel text classification.

Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.

Notebook synopsis:

1. Creating an Experiment in an existing Workspace
2. Configuring and remotely running AutoML on a multilabel text dataset of labelled comments loaded from a local CSV file
3. Evaluating the trained model on a test set

## Setup

In [6]:
# !pip install ipywidgets

In [1]:
import logging
import os

import pandas as pd
import numpy as np

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.run import Run
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.automlnlpconfig import AutoNLPConfig
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split

import re

This sample notebook may use features that are not available in previous versions of the Azure ML SDK.

In [3]:
print("This notebook was created using version 1.39.0 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

This notebook was created using version 1.39.0 of the Azure ML SDK
You are currently using version 0.1.0.66679059 of the Azure ML SDK


As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem.

In [3]:
ws = Workspace.from_config()

# Choose an experiment name.
experiment_name = "automl-nlp-text-multiclass-smartConcat"

experiment = Experiment(ws, experiment_name)

output = {}
output["Subscription ID"] = ws.subscription_id
output["Workspace Name"] = ws.name
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
output["Experiment Name"] = experiment.name
pd.set_option("display.max_colwidth", None)
outputDf = pd.DataFrame(data=output, index=[""])
outputDf.T

Unnamed: 0,Unnamed: 1
Subscription ID,2dbd7833-129e-4a48-b976-e6dd28a92c29
Workspace Name,sodexo-nlp
Resource Group,sodexo-nlp
Location,westeurope
Experiment Name,automl-nlp-text-multiclass-smartConcat


## Set up a compute cluster
This section uses a user-provided compute cluster (named "gpu-cluster" in this example). If a cluster with this name does not exist in the workspace, the code below creates a new one. You can adjust the cluster parameters, such as the VM size and the maximum number of nodes, in the call to `AmlCompute.provisioning_configuration`.

In [10]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

num_nodes = 1

# Choose a name for your cluster.
amlcompute_cluster_name = "gpu-cluster"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC6", max_nodes=num_nodes  # use GPU Nodes
    )
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [11]:
compute_target

AmlCompute(workspace=Workspace.create(name='sodexo-nlp', subscription_id='2dbd7833-129e-4a48-b976-e6dd28a92c29', resource_group='sodexo-nlp'), name=gpu-cluster, id=/subscriptions/2dbd7833-129e-4a48-b976-e6dd28a92c29/resourceGroups/sodexo-nlp/providers/Microsoft.MachineLearningServices/workspaces/sodexo-nlp/computes/gpu-cluster, type=AmlCompute, provisioning_state=Succeeded, location=westeurope, tags={})

## Registering Environment Using Build Index

In [8]:
import json
from azureml.core.environment import Environment
envdef_file_name = './envdef.json'
with open(envdef_file_name) as f:
    envdef = json.load(f)

# Register the environment in the workspace
env = Environment._deserialize_and_add_to_object(envdef)

env.name = "AutoML-DNN-Text-GPU-Candidate"
env.register(workspace = ws)

Environment version is set. Attempting to register desired version. To auto-version, reset version to None.


{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04",
        "baseImageRegistry": null,
        "enabled": true,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": "16g"
    },
    "environmentVariables": {},
    "inferencingStackVersion": null,
    "name": "AutoML-DNN-Text-GPU-Candidate",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "dependencies": [
                "python=3.7",
                {
                    "pip": [
                        "--extra-index-url https://azuremlsdktestpypi.azureedge.net/automl-nlp-tests/66679059",
                        

In [9]:
ws

Workspace.create(name='sodexo-nlp', subscription_id='2dbd7833-129e-4a48-b976-e6dd28a92c29', resource_group='sodexo-nlp')

# Data

In [22]:
sent_rel_df = pd.read_csv('./andreas-labelled-raw.csv')

In [23]:
# get unique list of labels and drop entries with multiple labels in one line (separated by comma)
labels = list(pd.unique(sent_rel_df['Label']))
labels = list(set([element for element in labels if ',' not in element]))

# combine 'good food quality' and 'good taste' to a single label 'good quality or taste'
gf = 'Good food quality'
gt = 'Good taste'
gqt = 'Good quality or taste'

labels.remove(gf)
labels = [gqt if item == gt else item for item in labels]

# build dict with label names according to the naming conventions of AutoML for NLP
# (lower case, spaces replaced by underscores)
replace_values = {label: label.lower().replace(' ', '_') for label in labels}

In [24]:
def adjust_labels(label_string):
    # a row may carry several labels, separated by commas
    label_list = label_string.split(',')

    # combine 'good food quality' and 'good taste' to a single label 'good quality or taste'
    if gf in label_list or gt in label_list:
        label_list.append(gqt)
        if gf in label_list:
            label_list.remove(gf)
        if gt in label_list:
            label_list.remove(gt)

    # rename labels according to AutoML for NLP convention
    shortnames = [replace_values[item] for item in label_list]

    # convert the list to its string representation,
    # e.g. "['good_services', 'good_quality_or_taste']"
    result = str(shortnames)

    return result

sent_rel_df['multilabel'] = sent_rel_df['Label'].apply(adjust_labels)
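To make the effect of `adjust_labels` concrete, here is a minimal, self-contained sketch of the same steps. The entries of `raw_labels` are hypothetical stand-ins for the values of the `Label` column, and reading the result back with `ast.literal_eval` is just one possible way to recover a list from the stored string; neither is taken from the original data.

```python
import ast

# Hypothetical raw label names; the real ones come from the 'Label' column.
GF, GT, GQT = "Good food quality", "Good taste", "Good quality or taste"
raw_labels = ["Good services", "Waited too long", GQT]

# Same naming convention as in the notebook: lower case, underscores for spaces.
replace_values = {label: label.lower().replace(" ", "_") for label in raw_labels}


def adjust_labels(label_string):
    items = label_string.split(",")
    # Merge 'Good food quality' and 'Good taste' into the combined label.
    if GF in items or GT in items:
        items = [item for item in items if item not in (GF, GT)]
        items.append(GQT)
    return str([replace_values[item] for item in items])


encoded = adjust_labels("Good services,Good taste")
decoded = ast.literal_eval(encoded)  # back to a Python list of label names
print(encoded)
print(decoded)
```

Note that a label that is missing from `replace_values` raises a `KeyError`, so the mapping has to cover every label that can occur in a multi-label row.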

In [25]:
# remove special characters from comments
def remove_special_chars(text):
    regex = '[^0-9a-zA-Z$£!.,:;/& ]+'
    result = re.sub(regex, '', text)
    return result

# remove_special_chars('a£b c$def!')

sent_rel_df['comments'] = sent_rel_df['comments'].apply(remove_special_chars)
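One side effect of this filter is worth knowing: `#` is not in the allowed set, but `&` and `;` are, so an HTML entity such as `&#8217;` is only partly removed. The sketch below reproduces this and shows a possible variant that decodes entities with `html.unescape` before filtering. The function `clean_comment` and the sample string are illustrative assumptions, not part of the original notebook.

```python
import html
import re

# Same character whitelist as in the cell above.
DISALLOWED = re.compile(r"[^0-9a-zA-Z$£!.,:;/& ]+")


def remove_special_chars(text):
    return DISALLOWED.sub("", text)


def clean_comment(text):
    # Hypothetical variant: turn HTML entities into characters first,
    # so that no entity fragments survive the filter.
    return remove_special_chars(html.unescape(text))


raw = "It&#8217;d be nice"
as_is = remove_special_chars(raw)
decoded_first = clean_comment(raw)
print(as_is)
print(decoded_first)
```

With the original function the entity leaves `&8217;` behind; with decoding first, the typographic apostrophe is dropped like any other character outside the whitelist.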


In [26]:
# Generate train and validation splits
cols_of_interest = ['comments', 'multilabel']

# train_df, val_df = train_test_split(sentiment_df[cols_of_interest], test_size=0.2, stratify=sentiment_df['sentiment'], random_state=123)
train_df, val_df = train_test_split(sent_rel_df[cols_of_interest], test_size=0.2, random_state=123)
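Because the split is not stratified, a rare label can end up in only one of the two splits. A quick way to check this is to count labels per split, as in the following sketch. The two small lists are hypothetical and only mimic the format of the `multilabel` column; `label_counts` is a helper introduced here for illustration.

```python
import ast
from collections import Counter


def label_counts(multilabel_column):
    """Count how often each label occurs in a column of stringified lists."""
    counts = Counter()
    for encoded in multilabel_column:
        counts.update(ast.literal_eval(encoded))
    return counts


# Hypothetical split contents, formatted like the 'multilabel' column.
train_labels = ["['good_services']", "['good_services', 'good_quality_or_taste']"]
val_labels = ["['good_services']", "['portion_too_small']"]

train_counts = label_counts(train_labels)
val_counts = label_counts(val_labels)

# Labels the model would be evaluated on without having seen them in training.
unseen_in_train = sorted(set(val_counts) - set(train_counts))
print(train_counts)
print(unseen_in_train)
```

On the real data the same function could be applied to `train_df['multilabel']` and `val_df['multilabel']`.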

In [27]:
train_df.sample(10)

Unnamed: 0,comments,multilabel
4390,Bread was chewy on the sandwich and soup was watery and lacked flavor.,['bad_quality_or_taste']
3168,Amazing coffee and service.,"['good_services', 'good_quality_or_taste']"
1038,Portion was about half the size of the previous orders,['portion_too_small']
1673,It&8217;d be nice ito have staff wearing masks that are preparing and serving food.,"['bad_service', 'global_negative_feedbacks']"
2806,It was great,['global_positive_feedbacks']
2778,Outstanding barista service from Callum again! :,['good_services']
1954,Oatmeal was watery; plastic spoons too small to hold in hand and scoop up oatmeal.,"['bad_quality_or_taste', 'global_negative_feedbacks']"
3918,Loose coffee grounds ruined both drinks,['bad_quality_or_taste']
3075,OMG these tacos are so good!,['good_quality_or_taste']
1747,"Hi, my order was not delivered as the team said there were not salads and it was a system fault. So a refund of this order should be donde to my bank account. Many thanks.","['product_not_available', 'wrong_or_missing_order']"


In [28]:
train_df.to_csv('./ms_sentiment_multilabel_train.csv', index = False)
val_df.to_csv('./ms_sentiment_multilabel_val.csv', index = False)

In [29]:
ee = pd.read_csv('ms_sentiment_multilabel_train.csv')
ee

Unnamed: 0,comments,multilabel
0,How about random surveys,['app_to_improve']
1,"Lovely Mocha, thank you",['good_quality_or_taste']
2,Got the food. But Im not sure the ticket process worked today. I had to ask if they got my order.,['app_to_improve']
3,Great like always,['global_positive_feedbacks']
4,Excellent food and service. I put in a special request and all accommodations were met. Thanks,"['good_services', 'good_quality_or_taste']"
...,...,...
3583,Great as always!!!,['global_positive_feedbacks']
3584,I didnt get what I exactly ordered and the wait was pretty long,"['wrong_or_missing_order', 'waited_too_long']"
3585,Liked the clam chowder a lot...great for this raw Monday.,['good_quality_or_taste']
3586,Need a salad with grilled chicken everyday.,['good_quality_or_taste']


In [6]:
train_dataset = Dataset.get_by_name(ws, name='sent-multilabel-ms-train')
train_dataset.to_pandas_dataframe()

Unnamed: 0,comments,multilabel
0,How about random surveys,['app_to_improve']
1,"Lovely Mocha, thank you",['good_quality_or_taste']
2,Got the food. But Im not sure the ticket process worked today. I had to ask if they got my order.,['app_to_improve']
3,Great like always,['global_positive_feedbacks']
4,Excellent food and service. I put in a special request and all accommodations were met. Thanks,"['good_services', 'good_quality_or_taste']"
...,...,...
3583,Great as always!!!,['global_positive_feedbacks']
3584,I didnt get what I exactly ordered and the wait was pretty long,"['wrong_or_missing_order', 'waited_too_long']"
3585,Liked the clam chowder a lot...great for this raw Monday.,['good_quality_or_taste']
3586,Need a salad with grilled chicken everyday.,['good_quality_or_taste']


In [7]:
val_dataset = Dataset.get_by_name(ws, name='sent-multilabel-ms-val')
val_dataset.to_pandas_dataframe()

Unnamed: 0,comments,multilabel
0,Great staff and the sandwich tasted,"['good_services', 'good_quality_or_taste']"
1,Great service from Adrian,['good_services']
2,Excellent experience. Thanks for the wonderful Meals every time I ordered. I really them.,"['global_positive_feedbacks', 'good_quality_or_taste']"
3,Food was just done as I got there and the cook was very friendly. The burger also tasted great!,"['good_services', 'arrived_on_time', 'good_quality_or_taste']"
4,Great service from Callum!,['good_services']
...,...,...
893,"Plz clean ketchup packets, they are sticky",['bad_service']
894,Great first experience. Took a bit long to set up the app but next time it will be easier. :,['global_positive_feedbacks']
895,"Please include the default toppings so in future orders I can ask them to hold these toppings i.e. lettuce, tomato, onions, etc.",['app_to_improve']
896,Food was amazing. You guys are awesome. Have a great day.,"['good_services', 'good_quality_or_taste']"


# Train

## Submit AutoML run

Now we can start the run with the prepared compute resource and datasets. This should only take a few minutes.

The configuration below sets `primary_metric` to `accuracy`. Because the run sweeps over several models, the primary metric is more than a reporting value here: it is the metric by which the trained models are compared.

In [12]:
from azureml.train.hyperdrive import BanditPolicy, RandomParameterSampling, GridParameterSampling
from azureml.train.hyperdrive import choice, uniform


# Example config with single non-BERT model and wide hyperparameter space.
# parameter_space = {
#     "model": choice(
#         {
#             "model_name": choice("roberta-base"),
#             "learning_rate": uniform(1e-6, 1e-4),
#             "weight_decay": uniform(0.0, 0.1),
#             "train_batch_size": choice(8, 16, 32),
#             "num_train_epochs": choice(3, 4, 5),
#             "warmup_ratio": uniform(0.0, 0.2),
#             "lr_scheduler_type": choice("linear", "cosine_with_restarts", "cosine", "polynomial", "constant", "constant_with_warmup")
#         },
#     ),
# }

# Example config with different conditional model parameter spaces.
# parameter_space = {
#     "model": choice(
#         {
#             "model_name": choice("bert-base-multilingual-cased"),
#             "train_batch_size": choice(16),
#             "gradient_accumulation_steps": choice(2)
#         },
#         {
#             "model_name": choice("xlm-roberta-base")
#         }
#     )
# }

# Example config for just model sweeping.
# parameter_space = {
#     "model_name": choice("distilbert-base-cased", "distilroberta-base")
# }
# Example config for sweeping over a larger set of base models.
# parameter_space = {
#     "model_name": choice("bert-base-cased", "roberta-base", "xlnet-base-cased", 
#                          "distilbert-base-cased", "xlm-roberta-base", "distilroberta-base")
# }

# Example config for sweeping over large models.
parameter_space = {
    "model_name": choice("bert-large-cased", "roberta-large", "xlnet-large-cased", "xlm-roberta-large")
}

# If early termination is desired, uncomment and set as necessary. Note that in the latest indices,
# the model only evaluates every 2000 steps, so a large dataset will be needed in order for this to be actually useful.
tuning_settings = {
    "iterations": 4,
#     "max_concurrent_iterations": 2,
    "hyperparameter_sampling": GridParameterSampling(parameter_space),
#     "early_termination_policy": BanditPolicy(
#         evaluation_interval=2, slack_amount=0.03, delay_evaluation=6
#     ),
}
label_column_name = "multilabel"
dataset_language = "eng"
autonlp_config = AutoNLPConfig(
    primary_metric="accuracy",
    task="text-classification-multilabel",
    compute_target=compute_target,
    training_data=train_dataset,
    validation_data=val_dataset,
    dataset_language=dataset_language,
    label_column_name=label_column_name,
    scenario="TextDNN-Candidate",
#     enable_distributed_dnn_training=True,
    **tuning_settings
)

# For now, tuning_settings["max_concurrent_iterations"] controls how many trials we run concurrently.
# The below setting is our original max_concurrent_iterations lever which affects multi-node distribution.
# Set as necessary for distributed training.
# autonlp_config.user_settings['max_concurrent_iterations'] = 2
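Grid sampling tries every combination of the listed `choice` values, so the four model names in `parameter_space` correspond to four trials, which is consistent with `"iterations": 4`. The following plain-Python sketch illustrates how the number of trials grows once a second parameter is added. It does not use the HyperDrive API; `expand_grid` is a hypothetical helper and the batch sizes are example values.

```python
from itertools import product


def expand_grid(space):
    """Return one dict per combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


model_only = {
    "model_name": ["bert-large-cased", "roberta-large", "xlnet-large-cased", "xlm-roberta-large"],
}
# Hypothetical extension with a second swept parameter.
model_and_batch = dict(model_only, train_batch_size=[16, 32])

trials_model_only = expand_grid(model_only)
trials_model_and_batch = expand_grid(model_and_batch)
print(len(trials_model_only), len(trials_model_and_batch))
```

If the space is extended like this, `"iterations"` would presumably need to be raised accordingly for every combination to be tried.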



#### Submit AutoML Run

In [13]:
automl_run = experiment.submit(autonlp_config, show_output=False)
_ = automl_run.wait_for_completion(show_output=False)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
automl-nlp-text-multiclass-smartConcat,AutoML_035ca838-51fb-4d00-bd6d-ac8ca4bad77e,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


## Download Metrics

These metrics, logged with the training run, are computed with the trained model on the validation dataset.

In [1]:
validation_metrics = automl_run.get_metrics()
pd.DataFrame(
    {"metric_name": validation_metrics.keys(), "value": validation_metrics.values()}
)


You can also get the best run and the best model with the `get_output` method.

In [13]:
# You might see a warning about "enable_distributed_dnn_training". Please simply ignore it.
best_run, best_model = automl_run.get_output()
best_run



Experiment,Id,Type,Status,Details Page,Docs Page
automl-nlp-text-multiclass-smartConcat,AutoML_7be6866d-c756-4427-88af-686f865bb7a3_HD_0,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


# Local inference

In [5]:
import pandas as pd
import numpy as np
import pickle
import os

from sklearn.metrics import classification_report

In [2]:
val_df = pd.read_csv('./ms_sentiment_multilabel_val.csv')

model_path = './models/xlm-roberta-large.pkl'

with open(model_path, 'rb') as f:
    model = pickle.load(f)

labels = list(model.y_transformer.classes_)

print(labels)

data = val_df.copy()

['app_to_improve', 'app_works_well', 'arrived_on_time', 'bad_food_temperature', 'bad_quality_or_taste', 'bad_service', 'global_negative_feedbacks', 'global_positive_feedbacks', 'good_food_temperature', 'good_quality_or_taste', 'good_services', 'inappropriate_packaging', 'misleading_images', 'not_enough_options', 'other', 'packaging_not_sustainable', 'portion_too_small', 'product_not_available', 'too_expensive', 'waited_too_long', 'wrong_or_missing_order']


In [16]:
outputs = model.predict(data, threshold = 0.142105)

In [17]:
y_true_series = val_df['multilabel']
y_pred_list = outputs

y_pred_arr = np.zeros([len(y_pred_list), len(labels)])
y_true_arr = np.zeros([len(y_true_series), len(labels)])

# build encoded matrix for y_pred_arr
for row_idx, observation in enumerate(y_pred_list):
    for label_idx, label in enumerate(labels):
        if label in observation:
            y_pred_arr[row_idx, label_idx] = 1

# build encoded matrix for y_true_arr
for row_idx, observation in enumerate(y_true_series):
    for label_idx, label in enumerate(labels):
        if observation.find(label) != -1:
            y_true_arr[row_idx, label_idx] = 1
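The true labels are matched with `str.find`, that is, by substring. This appears to be safe for the label set printed above, since no label name is contained in another, but it would silently break if such a pair were ever added. A sketch of an exact-match alternative follows. It uses plain lists instead of NumPy arrays to stay dependency-free, and the three labels and two rows are hypothetical examples.

```python
import ast


def encode(observations, labels):
    """Build a 0/1 indicator matrix with one row per observation."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in observations]
    for row, observation in enumerate(observations):
        # True labels are stored as stringified lists; predictions are tuples.
        if isinstance(observation, str):
            observation = ast.literal_eval(observation)
        for label in observation:
            matrix[row][index[label]] = 1  # KeyError on an unknown label
    return matrix


labels = ["bad_service", "good_services", "other"]
y_true = encode(["['good_services']", "['bad_service', 'other']"], labels)
y_pred = encode([("good_services",), ()], labels)
print(y_true)
print(y_pred)
```

A helper of this kind would also remove the duplicated encoding loops in the threshold sweep further down.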



In [18]:
# calculate percentage of samples where all labels are predicted correctly
np.sum(np.all(y_pred_arr == y_true_arr, axis=1)) / y_true_arr.shape[0]

0.643652561247216
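This number is the exact-match ratio (also called subset accuracy): a sample counts as correct only if every one of its labels is predicted correctly, so a single wrong label makes the whole row count as a miss. The toy example below, with made-up indicator rows, shows the computation without NumPy.

```python
def exact_match_ratio(y_true, y_pred):
    """Fraction of rows whose predicted label vector equals the true one."""
    matches = sum(1 for true_row, pred_row in zip(y_true, y_pred) if true_row == pred_row)
    return matches / len(y_true)


# Made-up indicator rows for four samples and three labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

ratio = exact_match_ratio(y_true, y_pred)
print(ratio)
```

Rows three and four each differ in one label, so only two of four rows match. For multilabel indicator arrays, `sklearn.metrics.accuracy_score` computes the same quantity.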

In [19]:
# y_pred_arr = np.random.randint(2, size=(len(y_pred_list), len(labels)))

print(classification_report(y_true_arr, y_pred_arr, target_names=labels, zero_division=1))

                           precision    recall  f1-score   support

           app_to_improve       0.65      0.82      0.72        56
           app_works_well       0.25      0.11      0.15         9
          arrived_on_time       0.58      1.00      0.73        33
     bad_food_temperature       0.82      1.00      0.90        32
     bad_quality_or_taste       0.85      0.99      0.92       136
              bad_service       0.71      0.29      0.42        17
global_negative_feedbacks       0.29      0.65      0.40        37
global_positive_feedbacks       0.53      0.93      0.68       107
    good_food_temperature       0.75      0.38      0.50         8
    good_quality_or_taste       0.79      1.00      0.88       229
            good_services       0.86      1.00      0.92       174
  inappropriate_packaging       0.74      0.87      0.80        23
        misleading_images       0.00      0.00      0.00         5
       not_enough_options       0.73      0.95      0.83     

In [9]:
val_df['predicted'] = outputs
val_df.sample(21)

Unnamed: 0,comments,multilabel,predicted
645,I luv coke,['global_positive_feedbacks'],"(global_positive_feedbacks, good_quality_or_ta..."
323,I appreciated a fully vegetarian meal. Every d...,"['not_enough_options', 'good_quality_or_taste']","(not_enough_options,)"
866,Phenomenal!,['global_positive_feedbacks'],"(global_positive_feedbacks,)"
209,Meat was too salty/dry,['bad_quality_or_taste'],"(bad_quality_or_taste,)"
830,"No salad bar, food is cold, no personal service.","['not_enough_options', 'bad_food_temperature',...","(bad_food_temperature, not_enough_options)"
595,Instead of large I got a medium we corrected a...,['wrong_or_missing_order'],"(wrong_or_missing_order,)"
708,Kitchen team needs to remove the shrimp shells...,['bad_quality_or_taste'],"(bad_quality_or_taste,)"
819,"Meal was made well, but it did not appeal to me.","['good_services', 'bad_quality_or_taste']","(bad_quality_or_taste, good_services)"
137,First time ordering and it was so easy and the...,"['global_positive_feedbacks', 'good_quality_or...","(global_positive_feedbacks, good_quality_or_ta..."
378,missing,['global_negative_feedbacks'],()


In [10]:
from sklearn.metrics import f1_score

f1_score(y_true_arr, y_pred_arr, average='macro')

0.472933636763225
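Macro-averaged F1 gives every label the same weight, so rare labels that are predicted poorly pull it down strongly; weighted F1 weights each label by its support, meaning its number of true occurrences. The following sketch computes both by hand on made-up data with one frequent and one rare label. `f1_per_label` is a helper written for this illustration, not a scikit-learn function.

```python
def f1_per_label(y_true, y_pred):
    """Return per-label F1 scores and supports for 0/1 indicator rows."""
    scores, supports = [], []
    for j in range(len(y_true[0])):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)
    return scores, supports


# Label 0 is frequent and predicted perfectly; label 1 is rare and always missed.
y_true = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 1], [0, 0]]

scores, supports = f1_per_label(y_true, y_pred)
macro_f1 = sum(scores) / len(scores)
weighted_f1 = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
print(scores, supports, macro_f1, weighted_f1)
```

Here the per-label scores are 1.0 and 0.0, giving a macro F1 of 0.5 but a weighted F1 of 0.8. The same effect explains why the two averages can differ widely on an imbalanced label set.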

In [12]:
thresholds = np.linspace(0.1, 0.5, 20)
print(thresholds)

[0.1        0.12105263 0.14210526 0.16315789 0.18421053 0.20526316
 0.22631579 0.24736842 0.26842105 0.28947368 0.31052632 0.33157895
 0.35263158 0.37368421 0.39473684 0.41578947 0.43684211 0.45789474
 0.47894737 0.5       ]


In [15]:
for threshold in thresholds:
    outputs = model.predict(data, threshold = threshold)

    y_true_series = val_df['multilabel']
    y_pred_list = outputs

    y_pred_arr = np.zeros([len(y_pred_list), len(labels)])
    y_true_arr = np.zeros([len(y_true_series), len(labels)])

    # build encoded matrix for y_pred_arr
    for row_idx, observation in enumerate(y_pred_list):
        for label_idx, label in enumerate(labels):
            if label in observation:
                y_pred_arr[row_idx, label_idx] = 1

    # build encoded matrix for y_true_arr
    for row_idx, observation in enumerate(y_true_series):
        for label_idx, label in enumerate(labels):
            if observation.find(label) != -1:
                y_true_arr[row_idx, label_idx] = 1

    f1_macro = f1_score(y_true_arr, y_pred_arr, average='macro')
    f1_weighted = f1_score(y_true_arr, y_pred_arr, average='weighted')
    
    print(f'Threshold: {threshold:.6f} - f1-macro : {f1_macro:.4f} - f1-weighted : {f1_weighted:.4f}')


Threshold: 0.100000 - f1-macro : 0.6607 - f1-weighted : 0.7635
Threshold: 0.121053 - f1-macro : 0.6284 - f1-weighted : 0.7900
Threshold: 0.142105 - f1-macro : 0.6305 - f1-weighted : 0.8058
Threshold: 0.163158 - f1-macro : 0.6102 - f1-weighted : 0.8146
Threshold: 0.184211 - f1-macro : 0.5861 - f1-weighted : 0.8104
Threshold: 0.205263 - f1-macro : 0.5711 - f1-weighted : 0.8125
Threshold: 0.226316 - f1-macro : 0.5567 - f1-weighted : 0.8100
Threshold: 0.247368 - f1-macro : 0.5404 - f1-weighted : 0.8113
Threshold: 0.268421 - f1-macro : 0.5267 - f1-weighted : 0.8120
Threshold: 0.289474 - f1-macro : 0.5189 - f1-weighted : 0.8108
Threshold: 0.310526 - f1-macro : 0.5163 - f1-weighted : 0.8133
Threshold: 0.331579 - f1-macro : 0.5173 - f1-weighted : 0.8157
Threshold: 0.352632 - f1-macro : 0.5057 - f1-weighted : 0.8155
Threshold: 0.373684 - f1-macro : 0.5058 - f1-weighted : 0.8167
Threshold: 0.394737 - f1-macro : 0.4876 - f1-weighted : 0.8114
Threshold: 0.415789 - f1-macro : 0.4850 - f1-weighted :
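
Instead of reading the best threshold off the printed list, the results can be collected and compared programmatically. The sketch below does this for the rows of the output above that are printed in full, copied by hand; thresholds beyond those rows are not covered, so the picks are only valid within that range.

```python
# (threshold, f1_macro, f1_weighted), copied from the complete rows of the sweep output.
results = [
    (0.100000, 0.6607, 0.7635),
    (0.121053, 0.6284, 0.7900),
    (0.142105, 0.6305, 0.8058),
    (0.163158, 0.6102, 0.8146),
    (0.184211, 0.5861, 0.8104),
    (0.205263, 0.5711, 0.8125),
    (0.226316, 0.5567, 0.8100),
    (0.247368, 0.5404, 0.8113),
    (0.268421, 0.5267, 0.8120),
    (0.289474, 0.5189, 0.8108),
    (0.310526, 0.5163, 0.8133),
    (0.331579, 0.5173, 0.8157),
    (0.352632, 0.5057, 0.8155),
    (0.373684, 0.5058, 0.8167),
    (0.394737, 0.4876, 0.8114),
]

# Pick the row with the highest value of each metric.
best_macro = max(results, key=lambda row: row[1])
best_weighted = max(results, key=lambda row: row[2])
print("best f1-macro:", best_macro)
print("best f1-weighted:", best_weighted)
```

The two metrics favour different thresholds: macro F1 peaks at the lowest threshold, while weighted F1 stays nearly flat above roughly 0.16. Which one to optimise depends on how much the rare labels matter. Since the thresholds are tuned on the validation set, the resulting scores are likely somewhat optimistic.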