# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Experiment

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig

from azureml.widgets import RunDetails

import os

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split


## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'mental-health-classification'

experiment = Experiment(ws, experiment_name)

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code EYPDDGZU8 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.


### Creating the compute

In [3]:
cpu_cluster_name = "cpu-cluster"

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=7)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### Data Preparation
#### Access the data

In [4]:
# Create the local data and project folders if they don't exist yet
os.makedirs('data', exist_ok=True)
os.makedirs('project_folder', exist_ok=True)



In [5]:
project_folder="./project_folder/"

In [6]:
df = pd.read_csv("Train.csv")
train_data, valid_data = train_test_split(df, test_size=0.1, random_state=42)
label = "label"
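The split above is purely random, so with imbalanced classes (the data guardrail further down flags exactly this) the validation set's class mix can drift from the training set's. Stratifying the split keeps each class's share the same in both parts. The stdlib sketch below shows the idea on toy rows; `stratified_split` is a hypothetical helper for illustration, not part of the notebook or of any library.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_size=0.1, seed=42):
    """Hypothetical helper: split dict rows so every class keeps roughly the
    same share in both parts (what train_test_split(..., stratify=...) does)."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    rng = random.Random(seed)
    train, valid = [], []
    for label in sorted(by_class):  # fixed class order keeps the split reproducible
        group = list(by_class[label])
        rng.shuffle(group)
        # Hold out test_size of each class, at least one row if the class has more than one
        n_valid = max(1, round(len(group) * test_size)) if len(group) > 1 else 0
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid


# Toy data: 20 "Depression" rows and 10 "Alcohol" rows
toy_rows = ([{"text": f"d{i}", "label": "Depression"} for i in range(20)]
            + [{"text": f"a{i}", "label": "Alcohol"} for i in range(10)])
toy_train, toy_valid = stratified_split(toy_rows, "label", test_size=0.1)
print(len(toy_train), len(toy_valid), sorted(r["label"] for r in toy_valid))
```

In the notebook itself the equivalent is a single argument: `train_test_split(df, test_size=0.1, random_state=42, stratify=df[label])`.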

In [7]:
train_data.head()

|     | ID       | text                                          | label      |
|-----|----------|-----------------------------------------------|------------|
| 56  | 6T8HVMG3 | what can I do to minimize my alcoholism       | Alcohol    |
| 335 | 458NTYKE | Had a problem with my personal looks,feel sad | Depression |
| 439 | 43COYA4O | What is the benefit of alcohol                | Alcohol    |
| 292 | V0C7LKYN | I am very nervous and bothered,               | Depression |
| 611 | BOHSNXCN | What should I do to stop alcoholism?          | Alcohol    |


In [13]:
# Drop the ID column; only the text and its label are used for training
train = train_data.iloc[:, 1:]
valid = valid_data.iloc[:, 1:]

train.to_csv('data/train.csv', index=False)
valid.to_csv('data/valid.csv', index=False)

In [10]:
train.head()

|     | text                                          | label      |
|-----|-----------------------------------------------|------------|
| 56  | what can I do to minimize my alcoholism       | Alcohol    |
| 335 | Had a problem with my personal looks,feel sad | Depression |
| 439 | What is the benefit of alcohol                | Alcohol    |
| 292 | I am very nervous and bothered,               | Depression |
| 611 | What should I do to stop alcoholism?          | Alcohol    |


In [11]:
datastore=ws.get_default_datastore()

In [14]:
datastore.upload(src_dir="./data", target_path="mental_health_clf", show_progress=True)

Uploading an estimated of 3 files
Target already exists. Skipping upload for mental_health_clf/test.csv
Target already exists. Skipping upload for mental_health_clf/train.csv
Uploading ./data/valid.csv
Uploaded ./data/valid.csv, 1 files out of an estimated total of 1
Uploaded 1 files


$AZUREML_DATAREFERENCE_ddd459687f7246e486be4cee2bb34abe

In [15]:
train_set = TabularDatasetFactory.from_delimited_files(path=datastore.path("mental_health_clf/train.csv"))

## AutoML Configuration

TODO: Explain why you chose the AutoML settings and configuration used below.

In [16]:
# TODO: Put your automl settings here
import time
automl_settings = {
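    # Append a timestamp so repeated runs get distinct names
    "name": "AutoML_mental_health_{}".format(int(time.time())),
    "enable_early_stopping": True,
    "experiment_timeout_minutes": 40,
    "iteration_timeout_minutes": 10,
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
    # One iteration runs per node, so keep this at or below max_nodes (7) of cpu-cluster
    "max_concurrent_iterations": 7,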
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(task="classification",
                             training_data=train_set,
                             label_column_name="label",
                             compute_target=cpu_cluster,
                             debug_log='automl_errors.log',
                             path=project_folder,
                             model_explainability=True,
                             **automl_settings,
                            )

In [17]:
# TODO: Submit your experiment
remote_run = experiment.submit(config=automl_config, show_output=True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_1e2f15f6-8fc8-4d3d-9050-a36fd6bdb60c

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetBalancing. Performing class balancing sweeping
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       ALERTED
DESCRIPTION:  To decrease model bias, please cancel the current run and fix balancing problem.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData
DETAILS:      Imbalanced data can lead to a falsely perceived positive effect of a model's accuracy because the input data has bias towards one class.
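The class-balancing guardrail above fires when some labels are much rarer than others. Before cancelling the run as it suggests, it helps to look at the label counts directly. A minimal stdlib sketch: the "imbalance ratio" (largest class count divided by smallest) is a common summary statistic, not necessarily the exact rule AutoML's guardrail applies, and the labels in the example are made up.

```python
from collections import Counter


def class_balance(labels):
    """Per-class counts and shares, plus the largest/smallest class-count ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return counts, shares, ratio


# Made-up labels for illustration; not the real training distribution
toy_labels = ["Depression"] * 6 + ["Alcohol"] * 2 + ["Drugs"] + ["Suicide"]
counts, shares, ratio = class_balance(toy_labels)
for name, n in counts.most_common():
    print(f"{name:<11}{n:>3}  {shares[name]:.0%}")
print(f"imbalance ratio: {ratio:.1f}")
```

With the real data, `train["label"].value_counts()` gives the same counts in one call.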

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [18]:
RunDetails(remote_run).show()


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [19]:
best_run, fitted_model = remote_run.get_output()

In [20]:
print(best_run)

Run(Experiment: mental-health-classification,
Id: AutoML_1e2f15f6-8fc8-4d3d-9050-a36fd6bdb60c_47,
Type: azureml.scriptrun,
Status: Completed)


In [21]:
print(fitted_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                    min_samples_split=0.2442105263157895,
                                                                                                    min_weight_fraction_leaf=0.0,
                                                                                                    n_estimators=10,
       

In [22]:
fitted_model.named_steps["datatransformer"].get_engineered_feature_names()

['text_CharGramTfIdf_ , ',
 'text_CharGramTfIdf_ ,h',
 'text_CharGramTfIdf_ ,n',
 'text_CharGramTfIdf_ -d',
 'text_CharGramTfIdf_ -h',
 'text_CharGramTfIdf_ a ',
 'text_CharGramTfIdf_ ab',
 'text_CharGramTfIdf_ ac',
 'text_CharGramTfIdf_ ad',
 'text_CharGramTfIdf_ af',
 'text_CharGramTfIdf_ ag',
 'text_CharGramTfIdf_ ah',
 'text_CharGramTfIdf_ al',
 'text_CharGramTfIdf_ am',
 'text_CharGramTfIdf_ an',
 'text_CharGramTfIdf_ ap',
 'text_CharGramTfIdf_ ar',
 'text_CharGramTfIdf_ as',
 'text_CharGramTfIdf_ at',
 'text_CharGramTfIdf_ av',
 'text_CharGramTfIdf_ aw',
 'text_CharGramTfIdf_ ba',
 'text_CharGramTfIdf_ be',
 'text_CharGramTfIdf_ bh',
 'text_CharGramTfIdf_ bo',
 'text_CharGramTfIdf_ br',
 'text_CharGramTfIdf_ bu',
 'text_CharGramTfIdf_ by',
 'text_CharGramTfIdf_ ca',
 'text_CharGramTfIdf_ ce',
 'text_CharGramTfIdf_ ch',
 'text_CharGramTfIdf_ cl',
 'text_CharGramTfIdf_ co',
 'text_CharGramTfIdf_ cr',
 'text_CharGramTfIdf_ cu',
 'text_CharGramTfIdf_ d ',
 'text_CharGramTfIdf_ da',
 

In [23]:
fitted_model.named_steps["datatransformer"].get_featurization_summary()

[{'RawFeatureName': 'text',
  'TypeDetected': 'Text',
  'Dropped': 'No',
  'EngineeredFeatureCount': 5090,
  'Transformations': ['StringCast-CharGramTfIdf', 'StringCast-WordGramTfIdf']}]
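The featurization summary shows AutoML turning the single `text` column into 5,090 engineered features via character n-gram and word n-gram TF-IDF; the engineered names listed earlier (for example `text_CharGramTfIdf_ ab`) are three-character grams, spaces included. The sketch below computes character-trigram TF-IDF for a toy corpus to show what one such feature value means. It uses the smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is scikit-learn's default; whether AutoML's featurizer uses exactly these settings (n-gram range, normalization) is an assumption.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of a lower-cased string (spaces included)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_tfidf(docs, n=3):
    """Character n-gram TF-IDF with smoothed IDF and L2-normalised rows.

    idf(t) = ln((1 + N) / (1 + df(t))) + 1, scikit-learn's default form.
    Each document must be at least n characters long.
    """
    grams = [Counter(char_ngrams(d, n)) for d in docs]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each gram once per document
    doc_freq = Counter(g for counts in grams for g in counts)
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}

    rows = []
    for counts in grams:
        weights = {g: tf * idf[g] for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({g: w / norm for g, w in weights.items()})
    return rows


toy_docs = ["stop alcohol", "feel sad", "alcohol abuse"]
toy_rows = char_tfidf(toy_docs)
# "alc" occurs in two documents and "sto" in one, so within the first
# document "alc" gets the smaller weight even though both occur once
print(round(toy_rows[0]["alc"], 3), round(toy_rows[0]["sto"], 3))
```

Grams shared by many sentences (such as "alc" across all the alcohol questions) are down-weighted, which is why rarer, more specific fragments tend to carry more signal.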

In [24]:
fitted_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                  feature_sweeping_config=None, feature_sweeping_timeout=None,
                  featurization_config=None, force_text_dnn=None,
                  is_cross_validation=None, is_onnx_compatible=None, logger=None,
                  observer=None, task=None, working_dir=None)),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=None,
                                estimators=[('17',
                                             Pipeline(memory=None,
                                                      steps=[('sparsenormalizer',
                                                              <azureml.automl.runtime.shared.model_wrappers.SparseNormalizer object at 0x7f6d725f7048>),
                                                             ('xgboostclassifier',
                                                              XGBoostClassifier(base_score=0
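The best model is a `PreFittedSoftVotingClassifier`: an ensemble whose members (such as the XGBoost pipeline shown above) each output class probabilities that are then combined. Soft voting takes a weighted average of those probability vectors and predicts the class with the highest average. The sketch below shows only that combination step; the member outputs and weights are made up, since the real weights are chosen by AutoML and cut off in the output above.

```python
def soft_vote(probas, weights, classes):
    """Weighted average of per-model class-probability vectors.

    probas  -- one probability list per model, all in the same class order
    weights -- one non-negative weight per model
    classes -- class labels matching the probability order
    Returns (averaged probabilities, predicted label).
    """
    total = sum(weights)
    avg = [sum(w * p[k] for w, p in zip(weights, probas)) / total
           for k in range(len(classes))]
    best = max(range(len(classes)), key=avg.__getitem__)
    return avg, classes[best]


classes = ["Alcohol", "Depression", "Drugs", "Suicide"]
# Hypothetical outputs of two ensemble members for one sentence
toy_probas = [[0.10, 0.60, 0.10, 0.20],
              [0.05, 0.30, 0.05, 0.60]]
toy_weights = [0.7, 0.3]
avg, predicted = soft_vote(toy_probas, toy_weights, classes)
print([round(p, 3) for p in avg], predicted)
```

Here the second member strongly favours "Suicide", but the higher-weighted first member pulls the average towards "Depression".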

In [25]:
test=pd.read_csv("./Test.csv")
test.head()

|   | ID       | text                                              |
|---|----------|---------------------------------------------------|
| 0 | 02V56KMO | How to overcome bad feelings and emotions         |
| 1 | 03BMGTOK | I feel like giving up in life                     |
| 2 | 03LZVFM6 | I was so depressed feel like got no strength t... |
| 3 | 0EPULUM5 | I feel so low especially since I had no one to... |
| 4 | 0GM4C5GD | can i be successful when I am a drug addict?      |


In [26]:
test.columns

Index(['ID', 'text'], dtype='object')

In [30]:
from sklearn.metrics import accuracy_score

# Predict on the held-out validation split, passing only the text column
y_pred = fitted_model.predict(valid[["text"]])

print("Model accuracy: ", accuracy_score(valid['label'], y_pred))

Model accuracy:  0.7096774193548387
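Overall accuracy is about 71% on the validation split. Given the class-imbalance alert, a single accuracy figure can hide a minority class that is almost never predicted, so per-class recall is worth checking; in the notebook, `sklearn.metrics.classification_report(valid['label'], y_pred)` prints it directly. The stdlib sketch below shows the computation on made-up labels.

```python
from collections import Counter, defaultdict


def per_class_recall(y_true, y_pred):
    """Recall per class: share of each true class that was predicted correctly."""
    support = Counter(y_true)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return {c: correct[c] / n for c, n in sorted(support.items())}


# Made-up labels: high overall accuracy, yet the rare class is always missed
toy_true = ["Depression"] * 7 + ["Alcohol"] * 2 + ["Suicide"]
toy_pred = ["Depression"] * 7 + ["Alcohol"] * 2 + ["Depression"]

accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
print(f"accuracy: {accuracy:.2f}")
for name, recall in per_class_recall(toy_true, toy_pred).items():
    print(f"{name:<11} recall {recall:.2f}")
```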


In [31]:
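# Class probabilities for the test set, one named column per class label
y_proba = pd.DataFrame(fitted_model.predict_proba(test[["text"]]),
                       columns=fitted_model.classes_)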

In [32]:
y_proba

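|     | Alcohol | Depression | Drugs | Suicide |
|-----|---------|------------|-------|---------|
| 0   | 0.09    | 0.71       | 0.08  | 0.12    |
| 1   | 0.05    | 0.83       | 0.05  | 0.08    |
| 2   | 0.05    | 0.79       | 0.06  | 0.10    |
| 3   | 0.09    | 0.71       | 0.06  | 0.14    |
| 4   | 0.13    | 0.39       | 0.20  | 0.28    |
| ... | ...     | ...        | ...   | ...     |
| 304 | 0.13    | 0.58       | 0.13  | 0.16    |
| 305 | 0.11    | 0.57       | 0.10  | 0.22    |
| 306 | 0.20    | 0.52       | 0.12  | 0.16    |
| 307 | 0.07    | 0.11       | 0.72  | 0.10    |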


In [33]:
sub=pd.DataFrame()

In [34]:
sub["ID"]=test.ID
sub["Depression"]=y_proba.Depression
sub["Alcohol"]=y_proba.Alcohol
sub["Suicide"]=y_proba.Suicide
sub["Drugs"]=y_proba.Drugs

In [35]:
sub.head()

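|   | ID       | Depression | Alcohol | Suicide | Drugs |
|---|----------|------------|---------|---------|-------|
| 0 | 02V56KMO | 0.71       | 0.09    | 0.12    | 0.08  |
| 1 | 03BMGTOK | 0.83       | 0.05    | 0.08    | 0.05  |
| 2 | 03LZVFM6 | 0.79       | 0.05    | 0.1     | 0.06  |
| 3 | 0EPULUM5 | 0.71       | 0.09    | 0.14    | 0.06  |
| 4 | 0GM4C5GD | 0.39       | 0.13    | 0.28    | 0.2   |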


In [36]:
sub.to_csv("./automl_2.csv", index=False)

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [37]:
description = "AutoML model: 0.85 training accuracy, 0.71 validation accuracy"
model=remote_run.register_model(description=description,tags={"area": "mental_health", "type":"classification"})

In [38]:
print(remote_run.model_id)

AutoML1e2f15f6847


### Inference Config

In [40]:
from azureml.core.model import InferenceConfig

# entry_script must point to a file that exists locally. AutoML saves a
# generated scoring script in the best run's outputs, so download it first.
script_file_name = "score.py"
best_run.download_file("outputs/scoring_file_v_1_0_0.py", script_file_name)

# Reuse the environment the best run was trained in; a minimal environment
# with only scikit-learn lacks the AutoML runtime packages needed to load the model.
env = best_run.get_environment()

inference_config = InferenceConfig(entry_script=script_file_name,
                                   environment=env)

TODO: In the cell below, send a request to the web service you deployed to test it.
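Testing a deployed service comes down to an HTTP POST of JSON to its scoring URI. The stdlib sketch below builds and sends such a request. The payload shape `{"data": [{"text": ...}]}` is an assumption based on the input schema that AutoML-generated scoring scripts typically declare, so confirm it against the entry script you deploy. The URI is a placeholder; in the SDK the real one is `service.scoring_uri`, and with key authentication enabled the key comes from `service.get_keys()`.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, texts, api_key=None):
    """Build a POST request carrying the sentences to classify.

    Assumes the entry script expects {"data": [{"text": ...}, ...]};
    adjust the key names if your scoring script declares a different schema.
    """
    body = json.dumps({"data": [{"text": t} for t in texts]}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when key-based auth is enabled on the service
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, texts, api_key=None, timeout=30):
    """Send the request and decode the JSON response."""
    request = build_scoring_request(scoring_uri, texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.loads(response.read().decode("utf-8"))
    # Some scoring scripts return an already JSON-encoded string; decode it once more
    if isinstance(result, str):
        result = json.loads(result)
    return result


# Placeholder URI; replace with service.scoring_uri after deployment
req = build_scoring_request("http://localhost:8890/score",
                            ["I feel like giving up in life"])
print(req.get_method(), req.full_url, req.data.decode())
```

Once the service is up, `score(service.scoring_uri, ["I feel like giving up in life"])` would send a real request.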

TODO: In the cell below, print the logs of the web service and delete the service