Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

### Intel NLP-Architect ABSA on AzureML 

### INSTRUCTOR VERSION

> **This instructor version of the notebook gives additional instructions as to which cells should be run in demo mode, and which should not. It assumes that before the demo you will execute the complete notebook, and then during the demo certain cells will be re-run to demonstrate the working process.**

This notebook contains an end-to-end walkthrough of using Azure Machine Learning Service to train, fine-tune and test [Aspect Based Sentiment Analysis Models using Intel's NLP Architect](http://nlp_architect.nervanasys.com/absa.html).

### Prerequisites

* Understand the architecture and terms introduced by Azure Machine Learning (AML)
* Have a working Jupyter Notebook environment. You can:
    - Install a Python environment locally, as described below in **Local Installation**
    - Use [Azure Notebooks](https://docs.microsoft.com/ru-ru/azure/notebooks/azure-notebooks-overview/?wt.mc_id=absa-notebook-abornst). In this case you should upload the `absa.ipynb` file to a new Azure Notebooks project, or just clone the [GitHub Repo](https://github.com/microsoft/ignite-learning-paths/tree/master/aiml/aiml40).
* Azure Machine Learning Workspace in your Azure Subscription

#### Local Installation

Install the Python SDK, making sure to include the `notebooks` and `contrib` extras:

```shell
conda create -n azureml -y Python=3.6
source activate azureml
pip install --upgrade azureml-sdk[notebooks,contrib] 
conda install ipywidgets
jupyter nbextension install --py --user azureml.widgets
jupyter nbextension enable azureml.widgets --user --py
```

You will need to restart Jupyter after this. Detailed instructions are [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python/?WT.mc_id=absa-notebook-abornst).

If you need a free trial account to get started you can get one [here](https://azure.microsoft.com/en-us/offers/ms-azr-0044p/?WT.mc_id=absa-notebook-abornst)

#### Creating Azure ML Workspace

An Azure ML Workspace can be created in one of the following ways:
* Manually through [Azure Portal](http://portal.azure.com/?WT.mc_id=absa-notebook-abornst) - [here is the complete walkthrough](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace/?wt.mc_id=absa-notebook-abornst)
* Using [Azure CLI](https://docs.microsoft.com/ru-ru/cli/azure/?view=azure-cli-latest&wt.mc_id=absa-notebook-abornst), using the following commands:

```shell
az extension add -n azure-cli-ml
az group create -n absa -l westus2
az ml workspace create -w absa_space -g absa
```

## Initialize workspace

To access an Azure ML Workspace, you will need to import the AML library and the following information:
* A name for your workspace (in our example - `absa_space`)
* Your subscription id (can be obtained by running `az account list`)
* The resource group name (in our case `absa`)

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace/?WT.mc_id=absa-notebook-abornst) object from the existing workspace you created in the Prerequisites step or create a new one. 

> **This cell can be run without problem, because it will just create a connection object for the workspace. Make sure to insert the correct `subscription_id` value before use, or have `config.json` file ready.**

In [3]:
from azureml.core import Workspace

#subscription_id = ''
#resource_group  = 'absa'
#workspace_name  = 'absa_space'
#ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
#ws.write_config()

try:
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, ws.location, sep='\t')
    print('Library configuration succeeded')
except Exception as e:
    print('Workspace not found:', e)

abla_space	westeurope	abla	westeurope
Library configuration succeeded
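
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file. As a minimal sketch, assuming the file holds the three keys `subscription_id`, `resource_group` and `workspace_name`, it can also be written by hand with the standard library. The helper name `write_workspace_config` and the placeholder subscription id are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def write_workspace_config(folder, subscription_id, resource_group, workspace_name):
    """Write a config.json holding the three workspace coordinates."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(folder) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


# Placeholder values: replace the subscription id with your own.
config_path = write_workspace_config(
    tempfile.mkdtemp(), "<your-subscription-id>", "absa", "absa_space"
)
print(config_path.read_text())
```

In practice the file would sit next to the notebook rather than in a temporary directory; the commented-out `ws.write_config()` call in the cell above should produce an equivalent file for you.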


## Compute

There are two compute options: run-once (preview) and persistent compute. For this demo we will use persistent compute; to learn more about run-once compute, check out the [docs](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute?WT.mc_id=absa-notebook-abornst).

> **This cell can be run because it will not re-create a cluster, although it does not make much sense to run it.**

In [4]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cluster_name = "absa-cluster"

# Verify that cluster does not exist already
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2',
                                                           vm_priority='lowpriority',
                                                           min_nodes=1,
                                                           max_nodes=1)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned


## Upload Data

The dataset we are using comes from TripAdvisor and is in the open domain. It can be replaced with any CSV file with rows of text, as the ABSA model is unsupervised.
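
Before uploading a replacement dataset, it can help to confirm that the file really holds one piece of text per row. The check below is a sketch only: it assumes the text sits in the first column, and it uses two made-up reviews in place of a real file. The helper name `load_reviews` is hypothetical.

```python
import csv
import io

# Made-up sample standing in for a real CSV file of reviews.
sample_csv = io.StringIO(
    '"The food was great, but the service was slow."\n'
    '"Lovely ambiance and friendly staff."\n'
    '\n'
)


def load_reviews(handle):
    """Return the non-empty text found in the first column of each row."""
    return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


reviews = load_reviews(sample_csv)
print(len(reviews), 'reviews loaded')
```

With a real file, `open(path, newline='', encoding='utf-8')` would replace the in-memory sample.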

> **You do not need to re-run this code during demo, as the file will be already downloaded**

In [4]:
!wget https://raw.githubusercontent.com/NervanaSystems/nlp-architect/master/datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv

--2019-09-20 10:26:47--  https://raw.githubusercontent.com/NervanaSystems/nlp-architect/master/datasets/absa/tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv
Resolving webproxy (webproxy)... 10.36.6.1
Connecting to webproxy (webproxy)|10.36.6.1|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 961388 (939K) [text/plain]
Saving to: ‘tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv’


2019-09-20 10:26:48 (2.39 MB/s) - ‘tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv’ saved [961388/961388]



> **You do not need to re-run this code during demo as the file will be already uploaded**

In [13]:
import os

# In a notebook __file__ is undefined, so use the current working directory
lib_root = os.getcwd()
ds = ws.get_default_datastore()
ds.upload_files([os.path.join(lib_root, 'tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv')],
                relative_root=lib_root)
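
The `relative_root` argument controls where the file lands in the datastore: to our understanding, the stored path is the local path with `relative_root` stripped off. The sketch below only imitates that path arithmetic with `posixpath`; it does not call Azure ML, and the folder `/home/user/absa` and the helper `blob_path` are hypothetical.

```python
import posixpath


def blob_path(local_file, relative_root):
    """Return the path of local_file relative to relative_root."""
    return posixpath.relpath(local_file, start=relative_root)


lib_root = '/home/user/absa'  # hypothetical notebook folder
local_file = posixpath.join(
    lib_root, 'tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv'
)

# With relative_root set to the notebook folder, only the file name remains.
print(blob_path(local_file, lib_root))
```

Under that assumption the file ends up at the root of the datastore, which is where `train.py` looks for it via `os.path.join(args.data_folder, ...)`.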

## Train File

> **It does not matter if you execute this cell or not, because it will just overwrite the file. You may execute it, just to make the demo more live**

In [9]:
%%writefile train.py
import argparse
import os 
from azureml.core import Run
from spacy.cli.download import download as spacy_download
from nlp_architect.models.absa.train.train import TrainSentiment
from nlp_architect.models.absa import TRAIN_OUT
from nlp_architect.utils.io import download_unzip

spacy_download('en')
EMBEDDING_URL = 'http://nlp.stanford.edu/data', 'glove.840B.300d.zip'
EMBEDDING_PATH = TRAIN_OUT / 'word_emb_unzipped' / 'glove.840B.300d.txt'
download_unzip(*EMBEDDING_URL, EMBEDDING_PATH)

parser = argparse.ArgumentParser(description='ABSA Train')
parser.add_argument('--data_folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--learning_rate', type=float, default=3e-5, help='learning rate')
parser.add_argument('--epochs', type=int, default=5)
args = parser.parse_args()


rerank_model = None # Path to rerank model .h5 file
parsed_data = None

tripadvisor_train = os.path.join(args.data_folder, 
                                 'tripadvisor_co_uk-travel_restaurant_reviews_sample_2000_train.csv')

os.makedirs('outputs', exist_ok=True)
    

train = TrainSentiment(parse=not parsed_data, rerank_model=rerank_model)


opinion_lex, aspect_lex = train.run(data=tripadvisor_train,
                                    out_dir = './outputs',
                                    parsed_data=parsed_data)

# get hold of the current run
run = Run.get_context()

run.log('Aspect Lexicon Size:', len(aspect_lex))
run.log('Opinion Lexicon Size:', len(opinion_lex))

Overwriting train.py
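
To check the argument handling of `train.py` without submitting a run, the same parser can be exercised locally with an explicit argument list. This is a sketch that repeats only the parser definition from the script; the wrapper `build_parser` and the mount point `/mnt/data` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Rebuild the argument parser used in train.py."""
    parser = argparse.ArgumentParser(description='ABSA Train')
    parser.add_argument('--data_folder', type=str, dest='data_folder',
                        help='data folder mounting point')
    parser.add_argument('--learning_rate', type=float, default=3e-5,
                        help='learning rate')
    parser.add_argument('--epochs', type=int, default=5)
    return parser


# '/mnt/data' is a hypothetical mount point, not a real datastore path.
args = build_parser().parse_args(['--data_folder', '/mnt/data'])
print(args.data_folder, args.learning_rate, args.epochs)
```

Note that `--learning_rate` and `--epochs` are parsed by the script but are not passed on to `TrainSentiment`, so changing them has no effect on training as written.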


## Create an Experiment

Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment/?WT.mc_id=absa-notebook-abornst) to track all the runs in your workspace for this ABSA tutorial.

> **In most cases, you will want to skip the following 3 cells during the demo, in order not to run the experiment again. However, you may also start another experiment if time permits, in which case you can run them.**

In [10]:
from azureml.core import Experiment
experiment_name = 'absa'

exp = Experiment(workspace=ws, name=experiment_name)

In [14]:
from azureml.train.estimator import Estimator

script_params = {
    '--data_folder': ds,
}

# find a way to integrate nlp architect 
nlp_est = Estimator(source_directory='.',
                   script_params=script_params,
                   compute_target=cluster,
                   environment_variables = {'NLP_ARCHITECT_BE':'CPU'},
                   entry_script='train.py',
                   pip_packages=['git+https://github.com/NervanaSystems/nlp-architect.git@absa'])


In [15]:
run = exp.submit(nlp_est)
run_id = run.id
print(run_id)

'absa_1568985331_df076c3c'

> **To retrieve the run, we use run id here. It can either be hard-coded from the previous pre-demo run, or you can rely on the jupyter kernel not restarting, in which case it will be saved in the `run_id` variable. So, if the jupyter engine has not been restarted, you may run cell 2, otherwise run cell 1** 

In [16]:
run = [r for r in exp.get_runs() if r.id == 'absa_1568985331_df076c3c'][0]

In [None]:
run = [r for r in exp.get_runs() if r.id == run_id][0]
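
The list comprehension above raises an `IndexError` when no run matches, which is easy to trip over after a kernel restart. A slightly more forgiving lookup can be sketched with `next()`; the helper `find_run` is hypothetical, and the objects below are plain stand-ins rather than real Azure ML runs.

```python
from types import SimpleNamespace


def find_run(runs, run_id):
    """Return the first run whose id matches, or None if there is no match."""
    return next((r for r in runs if r.id == run_id), None)


# Stand-in objects: real code would pass exp.get_runs() instead.
fake_runs = [
    SimpleNamespace(id='absa_0000000000_aaaaaaaa'),
    SimpleNamespace(id='absa_1568985331_df076c3c'),
]

match = find_run(fake_runs, 'absa_1568985331_df076c3c')
missing = find_run(fake_runs, 'no_such_run')
print(match.id, missing)
```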

> **Run this to show the result of the run, either in progress or completed**

In [17]:
from azureml.widgets import RunDetails

RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

## Fine-Tuning NLP Architect with AzureML HyperDrive
Although ABSA is an unsupervised method, it can be fine-tuned if provided with a small sample of labeled data.

> **It probably makes sense to skip the whole hyperdrive section, and just go through the code overview**

In [23]:
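from azureml.train.hyperdrive import (HyperDriveRunConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, choice)

# Each entry must be a parameter expression such as choice(...)
param_sampling = RandomParameterSampling( {
        'asp_thresh': choice(1, 2, 3, 4),
        'op_thresh': choice(2),
        'max_iter': choice(1, 2, 3, 4)
    }
)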

hyperdrive_run_config = HyperDriveRunConfig(estimator=nlp_est,
                                            hyperparameter_sampling=param_sampling, 
                                            primary_metric_name='f1', # This requires a modification of script to finetune on supervised data
                                            primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                            max_total_runs=16,
                                            max_concurrent_runs=4)

HyperDriveRunConfig is deprecated. Please use the new HyperDriveConfig class.
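
Conceptually, random sampling draws a value for each hyperparameter independently for every run. The local sketch below imitates that idea with the standard `random` module over the same discrete space. It is an illustration only and does not reflect how HyperDrive is implemented; the function name `sample_configs` and the seed are assumptions.

```python
import random

# Same discrete search space as in the cell above.
search_space = {
    'asp_thresh': [1, 2, 3, 4],
    'op_thresh': [2],
    'max_iter': [1, 2, 3, 4],
}


def sample_configs(space, n, seed=0):
    """Draw n configurations, picking each value uniformly at random."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n)
    ]


configs = sample_configs(search_space, n=16)
for cfg in configs[:3]:
    print(cfg)
```

Because the draws are independent, 16 samples from a space with 16 combinations will usually contain repeats, so random sampling here is not the same as an exhaustive grid.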


Finally, launch the hyperparameter tuning job.

In [24]:
experiment = Experiment(workspace=ws, name='hyperdrive')
hyperdrive_run = experiment.submit(hyperdrive_run_config)

### Monitor HyperDrive runs
We can monitor the progress of the runs with the following Jupyter widget. 

In [25]:
from azureml.widgets import RunDetails

RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

### Find and register the best model
Once all the runs complete, we can find the run that produced the model with the highest evaluation (METRIC TBD).

In [None]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
print(best_run)
print('Best Run is:\n  F1: {0:.5f} \n  Learning rate: {1:.8f}'.format(
        best_run_metrics['eval_f1'][-1],
        best_run_metrics['lr']
     ))

## Register Model Outputs

In [56]:
aspect_lex = run.register_model(model_name='aspect_lex', model_path='outputs/train_out/generated_aspect_lex.csv')
opinion_lex = run.register_model(model_name='opinion_lex', model_path='outputs/train_out/generated_opinion_lex_reranked.csv')

# Deploy as web service
Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/?WT.mc_id=bert-notebook-abornst).

To build the correct environment for ACI, provide the following:

* A scoring script to show how to use the model
* An environment file to show what packages need to be installed
* A configuration file to build the ACI
* The model you trained before

## Create scoring script
Create the scoring script, called score.py, used by the web service call to show how to use the model.

You must include two required functions in the scoring script:

* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started.
* The `run(input_data)` function, which uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.

In [113]:
%%writefile score.py
from azureml.core.model import Model
from nlp_architect.models.absa.inference.inference import SentimentInference
from spacy.cli.download import download as spacy_download


def init():
    """
    Set up the ABSA model for Inference  
    """
    global inference
    spacy_download('en')
    aspect_lex = Model.get_model_path('aspect_lex')
    opinion_lex = Model.get_model_path('opinion_lex')    
    inference = SentimentInference(aspect_lex, opinion_lex)

def run(raw_data):
    """
    Evaluate the model and return JSON string
    """
    sentiment_doc = inference.run(doc=raw_data)
    return sentiment_doc.json()

Overwriting score.py
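
The `init()`/`run()` contract can be tried out locally before deploying. The sketch below mirrors the shape of `score.py` but swaps the ABSA model for a made-up two-word lexicon, so it needs neither Azure ML nor NLP Architect. Everything in it is a stand-in for illustration.

```python
import json

lexicon = None  # populated once by init()


def init():
    """Load the 'model' a single time; here it is a tiny made-up lexicon."""
    global lexicon
    lexicon = {'charming': 'POS', 'dreadful': 'NEG'}


def run(raw_data):
    """Score the input text and return the result as a JSON string."""
    words = raw_data.lower().replace('.', ' ').replace(',', ' ').split()
    hits = [{'word': w, 'polarity': lexicon[w]} for w in words if w in lexicon]
    return json.dumps(hits)


# The service calls init() once at start-up, then run() for every request.
init()
print(run('The ambiance is charming.'))
print(run('Uncharacteristically, the service was DREADFUL.'))
```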


## Create configuration files


### ACI Config
Create an ACI configuration file and specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you find you need more later, you will have to recreate the image and redeploy the service.

In [99]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1,  
                                               tags={"data": "text",  "method" : "NLP Architect ABSA"}, 
                                               description='Predict ABSA with NLP Architect')

### Create Environment File
Create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs nlp-architect and the azureml-sdk.

In [100]:
from azureml.core.conda_dependencies import CondaDependencies 

pip = ["azureml-defaults", "azureml-monitoring", "git+https://github.com/NervanaSystems/nlp-architect.git@absa"]

myenv = CondaDependencies.create(pip_packages=pip)

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

### Create Environment Config
Create an Environment configuration and specify the environment and the environment variables required for the application.

In [101]:
from azureml.core import Environment
deploy_env = Environment.from_conda_specification('absa_env', "myenv.yml")
deploy_env.environment_variables={'NLP_ARCHITECT_BE': 'CPU'}

### Inference Config 
Create an inference configuration that receives the deployment environment and the entry script.

In [102]:
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(environment=deploy_env,
                                   entry_script="score.py")

# Deploy in ACI
Estimated time to complete: about 7-8 minutes

Configure the image and deploy. The following code goes through these steps:

* Build an image using:
    - The scoring file (score.py)
    - The environment file (myenv.yml)
    - The model file
* Register that image under the workspace.
* Send the image to the ACI container.
* Start up a container in ACI using the image.
* Get the web service HTTP endpoint.

For more details, see https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-azure-container-instance

In [114]:
%%time

from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import Model

aspect_lex = Model(ws, 'aspect_lex')
opinion_lex = Model(ws, 'opinion_lex')    

service = Model.deploy(workspace=ws,
                       name='absa-srvc', 
                       models=[aspect_lex, opinion_lex],
                       inference_config=inference_config, 
                       deployment_config=aciconfig)
service.wait_for_deployment(show_output = True)
print(service.state)

Creating service
Running.........................................
SucceededACI service creation operation finished, operation "Succeeded"
Healthy
CPU times: user 397 ms, sys: 69.6 ms, total: 466 ms
Wall time: 3min 45s


View the service logs, which are useful for debugging:

In [115]:
print(service.get_logs())


/bin/bash: /azureml-envs/azureml_e748c621598b8a5948a2f7276c7bb60c/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_e748c621598b8a5948a2f7276c7bb60c/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_e748c621598b8a5948a2f7276c7bb60c/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_e748c621598b8a5948a2f7276c7bb60c/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2019-08-26T13:07:50,262247639+00:00 - gunicorn/run 
2019-08-26T13:07:50,262797544+00:00 - iot-server/run 
2019-08-26T13:07:50,262247539+00:00 - rsyslog/run 
2019-08-26T13:07:50,263256549+00:00 - nginx/run 
bash: /azureml-envs/azureml_e748c621598b8a5948a2f7276c7bb60c/lib/libtinfo.so.5: no version information available (required by bash)
/usr/sbin/nginx: /azureml-envs/azureml_e748c621598b8a5948a2f7276c7bb60c/lib/libcrypto.so.1.0.0

In [116]:
service = ws.webservices['absa-srvc']

Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.


In [117]:
print(service.scoring_uri)

http://401d4329-3c4c-4187-97ec-3cad2d439708.eastus.azurecontainer.io/score


## Test Deployed ACI Service

In [121]:
import requests
import json
from nlp_architect.models.absa.inference.data_types import TermType

# send a sample review to the service for scoring
input_data = "The ambiance is charming. Uncharacteristically, the service was DREADFUL.\
              When we wanted to pay our bill at the end of the evening, our waitress was nowhere to be found..."

headers = {'Content-Type':'application/json'}

resp = requests.post(service.scoring_uri, input_data, headers=headers)
resp.json()

'{"_doc_text": "The ambiance is charming. Uncharacteristically, the service was DREADFUL.              When we wanted to pay our bill at the end of the evening, our waitress was nowhere to be found...", "_sentences": [{"_start": 0, "_end": 24, "_events": [[{"_text": "ambiance", "_type": "ASPECT", "_polarity": "POS", "_score": 1.0, "_start": 4, "_len": 8}, {"_text": "charming", "_type": "OPINION", "_polarity": "POS", "_score": 1.0, "_start": 16, "_len": 8}]]}, {"_start": 26, "_end": 72, "_events": [[{"_text": "service", "_type": "ASPECT", "_polarity": "NEG", "_score": -1.0, "_start": 52, "_len": 7}, {"_text": "DREADFUL", "_type": "OPINION", "_polarity": "NEG", "_score": -1.0, "_start": 64, "_len": 8}]]}, {"_start": 87, "_end": 183, "_events": [[{"_text": "waitress", "_type": "ASPECT", "_polarity": "NEG", "_score": -0.98065746, "_start": 149, "_len": 8}, {"_text": "waitress", "_type": "OPINION", "_polarity": "NEG", "_score": -0.98065746, "_start": 149, "_len": 8}]]}]}'
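
The response is a JSON string whose `_sentences` entries hold lists of `_events`, each a list of aspect and opinion terms. As a small sketch of how to read it, the snippet below extracts every aspect with its polarity from an abbreviated copy of the response above (first two sentences only); the helper name `aspect_polarities` is hypothetical.

```python
import json

# Abbreviated copy of the response shown above (first two sentences only).
response_text = json.dumps({
    "_doc_text": "The ambiance is charming. Uncharacteristically, the service was DREADFUL.",
    "_sentences": [
        {"_start": 0, "_end": 24, "_events": [[
            {"_text": "ambiance", "_type": "ASPECT", "_polarity": "POS",
             "_score": 1.0, "_start": 4, "_len": 8},
            {"_text": "charming", "_type": "OPINION", "_polarity": "POS",
             "_score": 1.0, "_start": 16, "_len": 8},
        ]]},
        {"_start": 26, "_end": 72, "_events": [[
            {"_text": "service", "_type": "ASPECT", "_polarity": "NEG",
             "_score": -1.0, "_start": 52, "_len": 7},
            {"_text": "DREADFUL", "_type": "OPINION", "_polarity": "NEG",
             "_score": -1.0, "_start": 64, "_len": 8},
        ]]},
    ],
})


def aspect_polarities(text):
    """Return (aspect, polarity) pairs found in a serialized sentiment document."""
    doc = json.loads(text)
    pairs = []
    for sentence in doc["_sentences"]:
        for event in sentence["_events"]:
            for term in event:
                if term["_type"] == "ASPECT":
                    pairs.append((term["_text"], term["_polarity"]))
    return pairs


print(aspect_polarities(response_text))
# → [('ambiance', 'POS'), ('service', 'NEG')]
```

Against the live service, `resp.json()` would take the place of `response_text`.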

### Render the response using [Displacy](https://spacy.io/usage/visualizers/)
Note: ```spaCy``` must be installed on the local machine for this to work. It can be installed with ```pip install spacy```.

In [122]:
import spacy
from spacy import displacy

if resp.text:
    doc = json.loads(resp.json()) # load response as dictionary
    doc_viz = {'text':doc["_doc_text"], 'ents':[]}
    for s in doc["_sentences"]:
        for e in s["_events"][0]:
            if e["_type"] == "ASPECT":
                doc_viz['ents'].append({'start': e["_start"], 'end': e["_start"] + e["_len"], 'label':str(e["_polarity"])})
    doc_viz['ents'].sort(key=lambda m: m["start"])
    displacy.render(doc_viz, style="ent", options={'colors':{'POS':'#7CFC00', 'NEG':'#FF0000'}}, manual=True)