# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [2]:
from azureml.core import Workspace, Experiment

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig

from azureml.widgets import RunDetails

import os

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split


## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [3]:
# Load the workspace from the local config.json
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'mental-health-classification'

experiment = Experiment(ws, experiment_name)

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code ENUWEHLQM to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.


### Creating the compute

In [4]:
cpu_cluster_name = "cpu-cluster"

# Verify that the cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=7)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### Data Preparation
#### Access the data

In [5]:
import os

# Create the local data and project folders if they don't exist
os.makedirs('data', exist_ok=True)
os.makedirs('project_folder', exist_ok=True)



In [6]:
project_folder="./project_folder/"

In [7]:
df = pd.read_csv("Train.csv")
train_data, valid_data = train_test_split(df, test_size=0.1, random_state=42)
label = "label"
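With a float `test_size`, `train_test_split` reserves `ceil(test_size * n)` rows for the held-out set and keeps the rest for training. The stdlib sketch below mirrors that sizing rule; it shuffles with `random.Random`, so the exact rows it picks will differ from scikit-learn's permutation even with the same seed. `holdout_split` is a hypothetical helper, not part of any library.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], test_size: float, seed: int) -> Tuple[List[T], List[T]]:
    """Shuffle row positions and split off ceil(test_size * n) rows as a held-out set."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    n_test = math.ceil(test_size * len(rows))  # same rounding rule as scikit-learn for a float test_size
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # reproducible for a fixed seed
    held_out = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in held_out]
    test = [rows[i] for i in indices[:n_test]]
    return train, test


train_rows, valid_rows = holdout_split(list(range(95)), test_size=0.1, seed=42)
print(len(train_rows), len(valid_rows))  # 85 10
```

`train_test_split` also accepts `stratify=df[label]`, which keeps each class's share roughly equal in both splits; that may be worth considering given the class-imbalance alert the data guardrails raise later on.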

In [8]:
train_data.head()

|     | ID | text | label |
|-----|----|------|-------|
| 56 | 6T8HVMG3 | what can I do to minimize my alcoholism | Alcohol |
| 335 | 458NTYKE | Had a problem with my personal looks,feel sad | Depression |
| 439 | 43COYA4O | What is the benefit of alcohol | Alcohol |
| 292 | V0C7LKYN | I am very nervous and bothered, | Depression |
| 611 | BOHSNXCN | What should I do to stop alcoholism? | Alcohol |


In [9]:
# Drop the first column (ID) and keep only text and label
train = train_data.iloc[:, 1:]
valid = valid_data.iloc[:, 1:]


train.to_csv('data/train.csv', index=False)
valid.to_csv('data/valid.csv', index=False)

In [10]:
train.head()

|     | text | label |
|-----|------|-------|
| 56 | what can I do to minimize my alcoholism | Alcohol |
| 335 | Had a problem with my personal looks,feel sad | Depression |
| 439 | What is the benefit of alcohol | Alcohol |
| 292 | I am very nervous and bothered, | Depression |
| 611 | What should I do to stop alcoholism? | Alcohol |


In [11]:
datastore=ws.get_default_datastore()

In [12]:
datastore.upload(src_dir="./data", target_path="mental_health_clf", show_progress=True)

Uploading an estimated of 2 files
Uploading ./data/train.csv
Uploaded ./data/train.csv, 1 files out of an estimated total of 2
Uploading ./data/valid.csv
Uploaded ./data/valid.csv, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_8d9abfcf1a8b4f59a5618baf7b75dd05

In [13]:
# Register the uploaded training CSV as a TabularDataset
train_set = TabularDatasetFactory.from_delimited_files(path=datastore.path("mental_health_clf/train.csv"))

## AutoML Configuration

TODO: Explain why you chose the AutoML settings and configuration you used below.

In [14]:
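# TODO: Put your automl settings here
import time
automl_settings = {
    # Timestamp suffix gives each run a distinct name
    "name": "AutoML_mental_health_{}".format(int(time.time())),
    "enable_early_stopping": True,
    "experiment_timeout_minutes": 40,  # cap on the whole experiment
    "iteration_timeout_minutes": 10,   # cap on each child run
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
    # AmlCompute runs one iteration per node, so with max_nodes=7 at most 7 iterations run at once
    "max_concurrent_iterations": 10,
}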

# TODO: Put your automl config here
automl_config = AutoMLConfig(task="classification",
                             training_data=train_set,
                             label_column_name="label",
                             compute_target=cpu_cluster,
                             debug_log='automl_errors.log',
                             path=project_folder,
                             model_explainability=True,
                             **automl_settings,
                            )
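With `n_cross_validations=5`, each candidate model is trained and scored five times, each time holding out a different fifth of the training data, and the metric is aggregated across folds. The generic k-fold partitioning below illustrates the idea; AutoML's own split details (shuffling, stratification) are not shown in this notebook, so treat this as an assumption-level sketch. `kfold_indices` is a hypothetical helper.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs for contiguous, non-shuffled k-fold splits."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    # The first n_samples % n_folds folds get one extra sample.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


for train_idx, valid_idx in kfold_indices(12, 5):
    print(len(train_idx), len(valid_idx))
```

Every sample lands in exactly one validation fold, so each row contributes once to the cross-validated score.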

In [15]:
# TODO: Submit your experiment
remote_run = experiment.submit(config=automl_config, show_output=True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_2aadb2b0-9913-4585-ac5f-caa597a91ef1

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
+---------------------------------+-------------------------
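The guardrail flags class imbalance. A quick way to see it is to count labels per class. The sketch below uses, as example counts, the per-class row totals of the validation confusion matrix printed near the end of this notebook (13 Alcohol, 31 Depression, 9 Drugs, 9 Suicide). The helpers `class_distribution` and `imbalance_ratio` are illustrative only and do not reproduce AutoML's internal guardrail threshold.

```python
from collections import Counter
from typing import Dict, Iterable


def class_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of samples in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


def imbalance_ratio(labels: Iterable[str]) -> float:
    """Size of the smallest class divided by the size of the largest (1.0 = balanced)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


# Example labels with the class sizes of the 62-row validation split.
labels = ["Alcohol"] * 13 + ["Depression"] * 31 + ["Drugs"] * 9 + ["Suicide"] * 9
print(class_distribution(labels))
print(round(imbalance_ratio(labels), 2))  # 0.29
```

Depression makes up half of these rows, which is consistent with the confusion matrix: most misclassifications land in the Depression column.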

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [39]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [40]:
best_run, fitted_model = remote_run.get_output()

In [41]:
print(best_run)

Run(Experiment: mental-health-classification,
Id: AutoML_2aadb2b0-9913-4585-ac5f-caa597a91ef1_48,
Type: azureml.scriptrun,
Status: Completed)


In [19]:
print(fitted_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                intercept_scaling=1,
                                                                                                l1_ratio=None,
                                                                                                max_iter=100,
                                                      

In [42]:
fitted_model.named_steps["datatransformer"].get_engineered_feature_names()

['text_CharGramTfIdf_ , ',
 'text_CharGramTfIdf_ ,h',
 'text_CharGramTfIdf_ ,n',
 'text_CharGramTfIdf_ -d',
 'text_CharGramTfIdf_ -h',
 'text_CharGramTfIdf_ a ',
 'text_CharGramTfIdf_ ab',
 'text_CharGramTfIdf_ ac',
 'text_CharGramTfIdf_ ad',
 'text_CharGramTfIdf_ af',
 'text_CharGramTfIdf_ ag',
 'text_CharGramTfIdf_ ah',
 'text_CharGramTfIdf_ al',
 'text_CharGramTfIdf_ am',
 'text_CharGramTfIdf_ an',
 'text_CharGramTfIdf_ ap',
 'text_CharGramTfIdf_ ar',
 'text_CharGramTfIdf_ as',
 'text_CharGramTfIdf_ at',
 'text_CharGramTfIdf_ av',
 'text_CharGramTfIdf_ aw',
 'text_CharGramTfIdf_ ba',
 'text_CharGramTfIdf_ be',
 'text_CharGramTfIdf_ bh',
 'text_CharGramTfIdf_ bo',
 'text_CharGramTfIdf_ br',
 'text_CharGramTfIdf_ bu',
 'text_CharGramTfIdf_ by',
 'text_CharGramTfIdf_ ca',
 'text_CharGramTfIdf_ ce',
 'text_CharGramTfIdf_ ch',
 'text_CharGramTfIdf_ cl',
 'text_CharGramTfIdf_ co',
 'text_CharGramTfIdf_ cr',
 'text_CharGramTfIdf_ cu',
 'text_CharGramTfIdf_ d ',
 'text_CharGramTfIdf_ da',
 

In [43]:
fitted_model.named_steps["datatransformer"].get_featurization_summary()

[{'RawFeatureName': 'text',
  'TypeDetected': 'Text',
  'Dropped': 'No',
  'EngineeredFeatureCount': 5090,
  'Transformations': ['StringCast-CharGramTfIdf', 'StringCast-WordGramTfIdf']}]
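The featurization summary shows that the single `text` column was expanded into 5090 TF-IDF features built from character n-grams (`CharGramTfIdf`) and word n-grams (`WordGramTfIdf`). Names such as `text_CharGramTfIdf_ ab` appear to correspond to three-character windows (here: space, `a`, `b`). The sketch below shows character-trigram extraction plus TF-IDF weighting with scikit-learn's default smoothed formula, idf(t) = ln((1 + N) / (1 + df(t))) + 1, followed by L2 normalisation. The n-gram ranges and options AutoML actually uses are not shown in this notebook, so these are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Lowercase the text and return every contiguous n-character window."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs: List[str], n: int = 3) -> List[Dict[str, float]]:
    """Return one L2-normalised {ngram: weight} dict per document."""
    grams = [Counter(char_ngrams(d, n)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram occurs at least once.
    df = Counter(g for counts in grams for g in counts)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
    vectors = []
    for counts in grams:
        weights = {g: tf * idf[g] for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vectors


print(char_ngrams("I am sad"))
vecs = tfidf(["I am sad", "I am fine"])
print(vecs[0])
```

N-grams shared by every document (such as `" am"` here) get the minimum idf of 1, while rarer ones (such as `"sad"`) are weighted up.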

In [44]:
fitted_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=None,
                                estimators=[('12',
                                             Pipeline(memory=None,
                                                      steps=[('sparsenormalizer',
                                                              <azureml.automl.runtime.shared.model_wrappers.SparseNormalizer object at 0x7f5a52598390>),
                                                             ('xgboostclassifier',
                                                              XGBoostClassifier(base_score=0
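The final model is a `PreFittedSoftVotingClassifier`: an ensemble whose members (for example the XGBoost pipeline listed above) each output class probabilities, which are combined by a weighted average before the most likely class is picked. The plain-Python sketch below illustrates weighted soft voting; the member probabilities and weights are made-up numbers, not those of the fitted ensemble, and `soft_vote` is a hypothetical helper.

```python
from typing import List, Sequence


def soft_vote(member_probas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of per-class probabilities from several ensemble members."""
    if len(member_probas) != len(weights):
        raise ValueError("need one weight per member")
    total = sum(weights)
    n_classes = len(member_probas[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, member_probas)) / total
        for c in range(n_classes)
    ]


classes = ["Alcohol", "Depression", "Drugs", "Suicide"]
# Hypothetical outputs of two ensemble members for one input text.
averaged = soft_vote([[0.10, 0.60, 0.10, 0.20], [0.05, 0.30, 0.05, 0.60]], weights=[0.5, 0.5])
print(averaged)
print(classes[averaged.index(max(averaged))])
```

Here the second member alone would say Suicide, but the averaged vote favours Depression (0.45 vs 0.40).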

In [45]:
test=pd.read_csv("./Test.csv")
test.head()

|     | ID | text |
|-----|----|------|
| 0 | 02V56KMO | How to overcome bad feelings and emotions |
| 1 | 03BMGTOK | I feel like giving up in life |
| 2 | 03LZVFM6 | I was so depressed feel like got no strength t... |
| 3 | 0EPULUM5 | I feel so low especially since I had no one to... |
| 4 | 0GM4C5GD | can i be successful when I am a drug addict? |


In [46]:
test.columns

Index(['ID', 'text'], dtype='object')

In [93]:
fitted_model.predict(pd.DataFrame({"text": ["drugs"]}))

array(['Drugs'], dtype=object)

In [47]:
from sklearn.metrics import accuracy_score

# Predict on the held-out validation rows, passing only the text column
y_pred = fitted_model.predict(valid[["text"]])

print("Model accuracy: ", accuracy_score(valid['label'], y_pred))

Model accuracy:  0.7258064516129032


In [48]:
y_proba=fitted_model.predict_proba(test[["text"]])

In [49]:
y_proba

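|     | Alcohol | Depression | Drugs | Suicide |
|-----|---------|------------|-------|---------|
| 0 | 0.06 | 0.82 | 0.03 | 0.09 |
| 1 | 0.02 | 0.95 | 0.01 | 0.02 |
| 2 | 0.02 | 0.94 | 0.02 | 0.03 |
| 3 | 0.03 | 0.89 | 0.03 | 0.04 |
| 4 | 0.18 | 0.43 | 0.14 | 0.24 |
| ... | ... | ... | ... | ... |
| 304 | 0.07 | 0.81 | 0.05 | 0.07 |
| 305 | 0.07 | 0.74 | 0.05 | 0.14 |
| 306 | 0.21 | 0.62 | 0.06 | 0.10 |
| 307 | 0.03 | 0.05 | 0.88 | 0.04 |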
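`predict_proba` returns one probability per class for each row; for a soft-voting ensemble, `predict` returns the class with the highest averaged probability. The sketch below applies that rule to a few rows copied from the probability table above (values rounded to two decimals). `top_label` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

CLASSES = ["Alcohol", "Depression", "Drugs", "Suicide"]

# A few rows of the predicted-probability table (rounded values copied from the output).
PROBA_ROWS: Dict[int, List[float]] = {
    0: [0.06, 0.82, 0.03, 0.09],
    4: [0.18, 0.43, 0.14, 0.24],
    307: [0.03, 0.05, 0.88, 0.04],
}


def top_label(probs: List[float], classes: List[str] = CLASSES) -> Tuple[str, float]:
    """Return the class with the highest probability and that probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


for row, probs in PROBA_ROWS.items():
    print(row, *top_label(probs))
```

Row 4 is the least confident of these: Depression wins with only 0.43.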


In [50]:
sub=pd.DataFrame()

In [51]:
sub["ID"]=test.ID
sub["Depression"]=y_proba.Depression
sub["Alcohol"]=y_proba.Alcohol
sub["Suicide"]=y_proba.Suicide
sub["Drugs"]=y_proba.Drugs

In [52]:
sub.head()

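|     | ID | Depression | Alcohol | Suicide | Drugs |
|-----|----|------------|---------|---------|-------|
| 0 | 02V56KMO | 0.82 | 0.06 | 0.09 | 0.03 |
| 1 | 03BMGTOK | 0.95 | 0.02 | 0.02 | 0.01 |
| 2 | 03LZVFM6 | 0.94 | 0.02 | 0.03 | 0.02 |
| 3 | 0EPULUM5 | 0.89 | 0.03 | 0.04 | 0.03 |
| 4 | 0GM4C5GD | 0.43 | 0.18 | 0.24 | 0.14 |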


In [53]:
sub.to_csv("./automl_3.csv", index=False)

## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [78]:
description = "AutoML model. 0.85 training accuracy valid 0.70"
model = remote_run.register_model(model_name="automl-clf",
                                  description=description,
                                  tags={"area": "mental_health", "type": "classification"})

In [82]:
model

Model(workspace=Workspace.create(name='quick-starts-ws-133629', subscription_id='a24a24d5-8d87-4c8a-99b6-91ed2d2df51f', resource_group='aml-quickstarts-133629'), name=automl-clf, id=automl-clf:2, version=2, tags={'area': 'mental_health', 'type': 'classification'}, properties={})

In [83]:
print(remote_run.model_id)

automl-clf


In [80]:
model_id=remote_run.model_id

In [81]:
print(model.name, model.id, model.version, sep='\t')

automl-clf	automl-clf:2	2


### Create scoring file

In [96]:
%%writefile score.py
import json
import os

import joblib
import numpy as np
import pandas as pd

def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    try:
        # Expect a JSON body of the form {"data": ["text 1", "text 2", ...]}
        data = pd.DataFrame({"text": np.array(json.loads(raw_data)['data'])})
        # Make predictions
        y_hat = model.predict(data)
        # You can return any data type as long as it is JSON-serializable
        return y_hat.tolist()
    except Exception as e:
        # Return the error message to the caller instead of failing the request
        return str(e)

Overwriting score.py
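Before deploying, the `init()`/`run()` contract of `score.py` can be exercised locally. The sketch below reproduces the request parsing and catch-all error handling of `run()`, with a hypothetical `KeywordStubModel` standing in for the AutoML pipeline. It passes a plain list instead of a `pandas.DataFrame` so it runs with the standard library alone. Because `run()` returns the exception message instead of raising, a caller can receive a string where it expected a list of labels.

```python
import json
from typing import List, Union


class KeywordStubModel:
    """Hypothetical stand-in for the fitted AutoML pipeline: labels a text by keyword."""

    KEYWORDS = {"alcohol": "Alcohol", "drug": "Drugs", "suicid": "Suicide"}

    def predict(self, texts: List[str]) -> List[str]:
        labels = []
        for text in texts:
            lowered = text.lower()
            match = next((lbl for kw, lbl in self.KEYWORDS.items() if kw in lowered), "Depression")
            labels.append(match)
        return labels


model = KeywordStubModel()


def run(raw_data: str) -> Union[List[str], str]:
    """Mirror of score.py's run(): parse {"data": [...]} and return predictions or an error string."""
    try:
        texts = list(json.loads(raw_data)["data"])
        return list(model.predict(texts))
    except Exception as e:  # same catch-all as score.py: errors come back as a string
        return str(e)


print(run(json.dumps({"data": ["drugs", "I feel low"]})))  # ['Drugs', 'Depression']
print(run('{"text": ["missing data key"]}'))  # 'data' (message of the KeyError)
```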


### Inference Config

In [85]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "mental health",  "type" : "classification"}, 
                                               description='classify mental health responses')


In [103]:
%%time
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model(ws, name=model_id)  # latest registered version of the "automl-clf" model


myenv = Environment.get(workspace=ws, name="AzureML-AutoML")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

service = Model.deploy(workspace=ws, 
                       name='mental-health-classification', 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running.................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 4.57 s, sys: 194 ms, total: 4.77 s
Wall time: 4min 37s


In [77]:
model


Model(workspace=Workspace.create(name='quick-starts-ws-133629', subscription_id='a24a24d5-8d87-4c8a-99b6-91ed2d2df51f', resource_group='aml-quickstarts-133629'), name=automl-clf, id=automl-clf:1, version=1, tags={'area': 'mental_health', 'type': 'classification'}, properties={})

TODO: In the cell below, send a request to the web service you deployed to test it.

In [124]:
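import json

# Send the whole validation set in one request, in the {"data": [...]} format score.py expects
payload = json.dumps({"data": list(valid["text"])})
payload = bytes(payload, encoding='utf8')
y_hat = service.run(input_data=payload)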

In [125]:
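from sklearn.metrics import accuracy_score, confusion_matrix

# Rows: true labels, columns: predicted labels (both in sorted order: Alcohol, Depression, Drugs, Suicide)
conf_mx = confusion_matrix(valid["label"], y_hat)
print(conf_mx)
print('Overall accuracy:', accuracy_score(valid["label"], y_hat))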

[[11  2  0  0]
 [ 0 29  0  2]
 [ 1  4  4  0]
 [ 0  8  0  1]]
Overall accuracy: 0.7258064516129032
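With scikit-learn's default ordering, rows of the confusion matrix are true labels and columns are predicted labels, both sorted alphabetically (Alcohol, Depression, Drugs, Suicide). Overall accuracy is the diagonal sum divided by the total, and per-class recall is each diagonal entry divided by its row sum. The worked computation below uses the matrix printed above; `accuracy` and `recall_per_class` are illustrative helpers.

```python
from typing import Dict, List

LABELS = ["Alcohol", "Depression", "Drugs", "Suicide"]
CONF_MX = [
    [11, 2, 0, 0],
    [0, 29, 0, 2],
    [1, 4, 4, 0],
    [0, 8, 0, 1],
]


def accuracy(mx: List[List[int]]) -> float:
    """Correct predictions (diagonal) over all predictions."""
    correct = sum(mx[i][i] for i in range(len(mx)))
    return correct / sum(map(sum, mx))


def recall_per_class(mx: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """Fraction of each true class predicted correctly (diagonal / row sum)."""
    return {lbl: mx[i][i] / sum(mx[i]) for i, lbl in enumerate(labels)}


print(accuracy(CONF_MX))  # 0.7258064516129032 (45 / 62)
print(recall_per_class(CONF_MX, LABELS))
```

The per-class view shows what overall accuracy hides: Depression is recalled well (29 of 31), while only 1 of the 9 Suicide examples is recognised and 8 of them are predicted as Depression.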


In [111]:
valid.head()

|     | text | label |
|-----|------|-------|
| 78 | How can I overcome alcoholism? | Alcohol |
| 208 | what are ways of dealing with depression | Depression |
| 570 | Is it right to smoke weed in the name of doing... | Drugs |
| 181 | How to avoid the thoughts? | Suicide |
| 101 | I am thinking about school expenses | Depression |


#### HTTP request

In [120]:
random_index = 181
json.dumps({"data": [str(valid.loc[random_index, "text"])]})

'{"data": ["How to avoid the thoughts?"]}'

In [123]:
import json
import requests

# Send one fixed row (index 181) from the validation set to the scoring endpoint
random_index = 181
input_data = json.dumps({"data": [str(valid.loc[random_index, "text"])]})

headers = {'Content-Type': 'application/json'}

# For an AKS deployment (or ACI with key authentication enabled) you'd also need to pass the service key in the header
# api_key, _ = service.get_keys()  # (primary, secondary)
# headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + api_key)}

resp = requests.post(service.scoring_uri, data=input_data, headers=headers)

print("POST to url", service.scoring_uri)
print("label:", valid.loc[random_index, "label"])
print("prediction:", resp.text)

POST to url http://74cf312d-3e19-4e00-b426-df26289fc3af.southcentralus.azurecontainer.io/score
label: Suicide
prediction: ["Depression"]
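The `requests` call above can also be made with only the standard library. The sketch below builds the same JSON POST with `urllib.request`; `SCORING_URI` is a placeholder for `service.scoring_uri`, and `build_scoring_request` is a hypothetical helper. The optional bearer-token header is only needed when key authentication is enabled on the service, as the comment above notes for AKS. Only the request object is constructed here; `urllib.request.urlopen(req)` would send it.

```python
import json
import urllib.request
from typing import List, Optional

SCORING_URI = "http://localhost:8080/score"  # placeholder: use service.scoring_uri


def build_scoring_request(texts: List[str], api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request whose JSON body matches what score.py's run() expects."""
    body = json.dumps({"data": texts}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # key-authenticated services expect a bearer token
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(SCORING_URI, data=body, headers=headers, method="POST")


req = build_scoring_request(["How to avoid the thoughts?"])
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read()))
```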


TODO: In the cell below, print the logs of the web service and delete the service.

In [122]:
print(service.get_logs())

2021-01-06T13:12:26,832465100+00:00 - gunicorn/run 
2021-01-06T13:12:26,840431600+00:00 - rsyslog/run 
2021-01-06T13:12:26,849077200+00:00 - iot-server/run 
2021-01-06T13:12:26,878760100+00:00 - nginx/run 
rsyslogd: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libuuid.so.1: no version information available (required by rsyslogd)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8eff28b157f42edcd2424a5aae6c8074/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml

In [126]:
# Delete the web service
service.delete()

In [128]:
# Delete the compute cluster created earlier
cpu_cluster.delete()
