# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
from azureml.core import Workspace, Experiment, Datastore, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy, MedianStoppingPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, normal, choice
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment

import os
import joblib
import pandas as pd
import sys
import numpy as np
import json

from sklearn.model_selection import train_test_split




## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'capstone-automl'

experiment=Experiment(ws, experiment_name)



### Data Preprocessing

In [3]:
df = pd.read_csv("Corona_NLP_train.csv")

In [4]:
df.head()

,Unnamed: 0,UserName,ScreenName,Location,TweetAt,OriginalTweet,Sentiment
0,0,3799,48751.0,London,16-03-2020,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,Neutral
1,1,3800,48752.0,UK,16-03-2020,advice Talk to your neighbours family to excha...,Positive
2,2,3801,48753.0,Vagabonds,16-03-2020,Coronavirus Australia: Woolworths to give elde...,Positive
3,3,3802,48754.0,,16-03-2020,My food stock is not the only one which is emp...,Positive
4,4,3803,48755.0,,16-03-2020,"Me, ready to go at supermarket during the #COV...",Extremely Negative


In [5]:
df.Sentiment.unique()

array(['Neutral', 'Positive', 'Extremely Negative', 'Negative',
       'Extremely Positive', nan], dtype=object)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41159 entries, 0 to 41158
Data columns (total 7 columns):
Unnamed: 0       41158 non-null object
UserName         41158 non-null object
ScreenName       41157 non-null float64
Location         32567 non-null object
TweetAt          41157 non-null object
OriginalTweet    41157 non-null object
Sentiment        41155 non-null object
dtypes: float64(1), object(6)
memory usage: 2.2+ MB


In [7]:
# Keep only the tweet text and its label
df_modified = df[["OriginalTweet", "Sentiment"]].copy()
# Drop rows with a missing label first, so NaN never turns into the string "nan"
df_modified = df_modified.dropna(subset=["Sentiment"])
df_modified["OriginalTweet"] = df_modified["OriginalTweet"].astype("str")

# Collapse the five sentiment classes into a binary target:
# 1 = Neutral / Positive / Extremely Positive, 0 = Negative / Extremely Negative
positive = df_modified["Sentiment"].isin(["Neutral", "Positive", "Extremely Positive"])
negative = df_modified["Sentiment"].isin(["Extremely Negative", "Negative"])

df_modified.loc[positive, "Sentiment"] = 1
df_modified.loc[negative, "Sentiment"] = 0

df_modified["Sentiment"] = df_modified["Sentiment"].astype("int")

In [8]:
df_modified["Sentiment"].unique()

array([1, 0])
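
The relabeling in the preprocessing cell can also be written as a small lookup table, which makes the decision to group `Neutral` with the positive class explicit. This is a minimal sketch in plain Python; `SENTIMENT_TO_BINARY` and `to_binary` are illustrative names that do not appear in the notebook.

```python
# Illustrative lookup mirroring the notebook's grouping:
# Neutral and both positive classes -> 1, both negative classes -> 0.
SENTIMENT_TO_BINARY = {
    "Neutral": 1,
    "Positive": 1,
    "Extremely Positive": 1,
    "Negative": 0,
    "Extremely Negative": 0,
}


def to_binary(label):
    """Return 1 or 0 for a known sentiment label; raise on anything else."""
    try:
        return SENTIMENT_TO_BINARY[label]
    except KeyError:
        raise ValueError(f"Unexpected sentiment label: {label!r}") from None


labels = ["Neutral", "Extremely Negative", "Positive"]
binary = [to_binary(label) for label in labels]
print(binary)
```

With pandas, the same table could be applied through `Series.map`, which leaves unmapped labels as NaN instead of raising.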

In [9]:
df_modified.head()

,OriginalTweet,Sentiment
0,@MeNyrbie @Phil_Gahan @Chrisitv https://t.co/i...,1
1,advice Talk to your neighbours family to excha...,1
2,Coronavirus Australia: Woolworths to give elde...,1
3,My food stock is not the only one which is emp...,1
4,"Me, ready to go at supermarket during the #COV...",0


In [10]:
df_modified.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 41155 entries, 0 to 41158
Data columns (total 2 columns):
OriginalTweet    41155 non-null object
Sentiment        41155 non-null int64
dtypes: int64(1), object(1)
memory usage: 964.6+ KB


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [11]:
# TODO: Put your automl settings here
automl_settings = {"experiment_timeout_minutes":30,
    "task":"classification",
    "primary_metric":"accuracy",
    "training_data":df_modified,
    "label_column_name":"Sentiment",
    "n_cross_validations":3}

# TODO: Put your automl config here
automl_config = AutoMLConfig(**automl_settings)
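
To illustrate what `n_cross_validations` set to 3 implies, here is a minimal sketch of a three-way split in plain Python. It uses contiguous, unshuffled folds for readability; AutoML's own splitting strategy may shuffle or stratify, so this shows the idea only, not the service's implementation. `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=7, n_splits=3))
for train, validation in folds:
    print(len(train), "train rows; validation rows:", validation)
```

Each row is used for validation exactly once and for training in the other two folds, so every model is scored on data it did not see during fitting.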

## Submit Experiment & Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [12]:
# TODO: Submit your experiment
automl_run = experiment.submit(automl_config, show_output=True)
RunDetails(automl_run).show()

Running on local machine
Parent Run ID: AutoML_2a9a1e55-b3a6-40e9-a09d-9463a869d0a9

Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

********************************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

*************************************************************************************************************
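
The class balancing guardrail can be cross-checked by hand. The sketch below computes each class's share of the samples and applies a simple ratio rule; the 0.2 threshold and both function names are assumptions made for illustration and are not taken from the AutoML output above.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the samples as a dict of floats."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels is empty")
    return {cls: count / total for cls, count in counts.items()}


def is_roughly_balanced(labels, min_minority_to_majority=0.2):
    """Hypothetical rule: the smallest class has at least this fraction of the largest class's count."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels is empty")
    return min(counts.values()) / max(counts.values()) >= min_minority_to_majority


# Toy labels, not the tweet data.
toy_labels = [1, 1, 1, 0, 0]
print(class_shares(toy_labels))
print(is_roughly_balanced(toy_labels))
```

On the notebook's data, the same functions could be called with `df_modified["Sentiment"].tolist()`.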




## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [13]:
# get_output() returns the best child run and the fitted pipeline it produced
best_run_automl, best_model_automl = automl_run.get_output()
best_run_metrics = best_run_automl.get_metrics()
run_details = best_run_automl.get_details()

print("Best Run Id: ", best_run_automl.id)
print("\n" * 3)
print("Accuracy: ", best_run_metrics["accuracy"])
print("\n" * 3)
print("Parameters: ", run_details)
print("\n" * 3)
# Reuse the metrics fetched above instead of querying the run a second time
print("Metrics for best run: ", best_run_metrics)



Best Run Id:  AutoML_2a9a1e55-b3a6-40e9-a09d-9463a869d0a9_8




Accuracy:  0.7237275985160293




Parameters:  {'runId': 'AutoML_2a9a1e55-b3a6-40e9-a09d-9463a869d0a9_8', 'status': 'Completed', 'startTimeUtc': '2020-11-02T17:29:32.51884Z', 'endTimeUtc': '2020-11-02T17:31:39.762616Z', 'properties': {'runTemplate': 'automl_child', 'pipeline_id': '__AutoML_Stack_Ensemble__', 'pipeline_spec': '{"pipeline_id":"__AutoML_Stack_Ensemble__","objects":[{"module":"azureml.train.automl.stack_ensemble","class_name":"StackEnsemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'capstone-automl\',\'compute_target\':\'local\',\'subscription_id\':\'30d182b7-c8c4-421c-8fa0-d3037ecfe6d2\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_2a9a1e55-b3a6-40e9-a09d-9463a869d0a9_8","experiment_name":"capstone-a
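
An accuracy of about 0.72 is easier to judge next to a trivial baseline. The sketch below, on toy labels rather than the tweet data, computes accuracy and the score obtained by always predicting the most frequent class. Both helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class."""
    most_common = max(set(y_true), key=y_true.count)
    return y_true.count(most_common) / len(y_true)


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))
print(majority_baseline(y_true))
```

A model is only adding value to the extent that its accuracy exceeds the majority baseline for the same labels.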

In [14]:
# Reference: https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.run.automlrun?view=azure-ml-py
print(best_model_automl.steps[-1])

('stackensembleclassifier', StackEnsembleClassifier(base_learners=[('0',
                                        Pipeline(memory=None,
                                                 steps=[('maxabsscaler',
                                                         MaxAbsScaler(copy=True)),
                                                        ('lightgbmclassifier',
                                                         LightGBMClassifier(boosting_type='gbdt',
                                                                            class_weight=None,
                                                                            colsample_bytree=1.0,
                                                                            importance_type='split',
                                                                            learning_rate=0.1,
                                                                            max_depth=-1,
                                                     

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.
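
The inference config in this section points at an entry script named `score.py`, which the notebook does not show. The following is a minimal sketch of what such a script might look like, not the author's actual file. It assumes the pickled pipeline is stored as `model.pkl` under the directory given by `AZUREML_MODEL_DIR`, and that the model expects a single text column named `OriginalTweet`, as in the training data. `parse_payload` is a hypothetical helper.

```python
import json
import os

model = None  # set by init(); the inference server calls init() once at startup


def parse_payload(raw_data):
    """Turn the request body into a list of tweet strings.

    Accepts {"data": "one tweet"} or {"data": ["tweet a", "tweet b"]}.
    """
    data = json.loads(raw_data)["data"]
    if isinstance(data, str):
        return [data]
    return [str(item) for item in data]


def init():
    """Load the registered model from the directory mounted for it."""
    global model
    import joblib  # imported lazily so the parsing helper stays dependency-free

    # Assumption: the pickled pipeline is stored as model.pkl under AZUREML_MODEL_DIR.
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR", "."), "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    """Score the tweets in the request and return predictions as a plain list."""
    import pandas as pd

    try:
        tweets = parse_payload(raw_data)
        # Assumption: the model was trained on a single text column named OriginalTweet.
        frame = pd.DataFrame({"OriginalTweet": tweets})
        return model.predict(frame).tolist()
    except Exception as exc:
        # Return the error as data so it shows up in the response and the logs.
        return {"error": str(exc)}
```

Returning the exception text from `run` makes failures visible in the service response, which helps when a deployment starts but every request fails.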

In [15]:
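# get_output() returns the fitted pipeline, not a registered Model, so the model is registered through the best run.
# Note: this rebinds best_model_automl from the fitted pipeline to the registered Model object.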
best_model_automl = best_run_automl.register_model(model_name="capstone-automl.pkl", model_path="./")

In [17]:
best_model_automl.download(exist_ok=True)

''

In [21]:
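from azureml.core.model import Model

# Clone the curated AutoML environment so it can be customized without touching the original
env = Environment.get(ws, "AzureML-AutoML").clone("automl-env")

# score.py must define init() and run()
inference_config = InferenceConfig(entry_script='score.py', environment=env)

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# Model.deploy is a static method, so call it on the class rather than on the registered model
service = Model.deploy(
    workspace=ws,
    name="automl-deployment",
    models=[best_model_automl],
    inference_config=inference_config,
    deployment_config=aci_config,
    overwrite=True
)
service.wait_for_deployment(show_output=True)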

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.............................................................................................................................................................................

KeyboardInterrupt: 
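
The deployment above ran until it was interrupted by hand. One alternative is to poll the service state with an explicit timeout, so a stuck deployment fails with a clear error. The sketch below is generic Python with a stand-in for the service; `wait_for_state` is a hypothetical helper, and the state names are assumptions about what a deployed service reports.

```python
import time


def wait_for_state(get_state, terminal_states, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout elapses.

    Returns the terminal state; raises TimeoutError if none was seen in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        sleep(interval_s)


# Stand-in for a real service: reports "Transitioning" twice, then "Healthy".
fake_states = iter(["Transitioning", "Transitioning", "Healthy"])
final = wait_for_state(lambda: next(fake_states), {"Healthy", "Failed"}, timeout_s=5, interval_s=0)
print(final)
```

Against a real deployment, `get_state` could be a small function that refreshes the service object and returns its `state`, assuming the SDK's `update_deployment_state()` method; on a failure or timeout, `service.get_logs()` is the next place to look.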

TODO: In the cell below, send a request to the web service you deployed to test it.

In [None]:
service.update(enable_app_insights=True)

In [None]:
input_tweet_json = json.dumps({'data':df_modified["OriginalTweet"][0]})
output = service.run(input_tweet_json)
print("Predicted Sentiment: ", output)
print("Actual Sentiment: ", df_modified["Sentiment"][0])
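
Besides `service.run`, a deployed service can be called over plain HTTP. The sketch below only builds the request, so it runs without a live endpoint. The list-shaped `{"data": [...]}` payload, the placeholder URI, and the helper name `build_scoring_request` are assumptions; the real payload shape is whatever the entry script expects.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, tweets, key=None):
    """Build (but do not send) a POST request carrying tweets as {"data": [...]}."""
    body = json.dumps({"data": list(tweets)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if key is not None:
        # Only needed when key-based authentication is enabled on the service.
        headers["Authorization"] = f"Bearer {key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder URI for illustration; a deployed service exposes its own scoring URI.
request = build_scoring_request("http://localhost:8000/score", ["Stocking up on pasta again"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With a running service, `service.scoring_uri` would replace the placeholder and `urllib.request.urlopen(request)` would send the request.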

TODO: In the cell below, print the logs of the web service and delete the service.

In [23]:
print(service.get_logs())

2020-11-02T18:35:24,983853750+00:00 - gunicorn/run 
2020-11-02T18:35:24,987252066+00:00 - iot-server/run 
/usr/sbin/nginx: /azureml-envs/azureml_0e3a8a6dba181476a2523c12c58dfc97/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_0e3a8a6dba181476a2523c12c58dfc97/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_0e3a8a6dba181476a2523c12c58dfc97/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_0e3a8a6dba181476a2523c12c58dfc97/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_0e3a8a6dba181476a2523c12c58dfc97/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
2020-11-02T18:35:24,996758091+00:00 - rsyslog/run 
2020-11-02T18:35:24,997432814+00:00 - nginx/run 
rsyslogd

In [None]:
# service.delete()