# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
from azureml.core.workspace import Workspace
from azureml.core.datastore import Datastore
from azureml.core.compute import ComputeTarget
from azureml.core.compute.amlcompute import AmlCompute
from azureml.exceptions import ComputeTargetException
from azureml.core.experiment import Experiment
from azureml.core.run import Run
from azureml.core.dataset import Dataset
from azureml.core.model import Model

from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


from azureml.core.webservice import Webservice
from azureml.core.authentication import InteractiveLoginAuthentication

import pandas as pd

from azureml.pipeline.core.pipeline import Pipeline
from azureml.pipeline.core import PipelineData
from azureml.pipeline.core import TrainingOutput
from azureml.pipeline.core.run import PipelineRun
from azureml.pipeline.steps.automl_step import AutoMLStep

from azureml.train.automl.automlconfig import AutoMLConfig
from azureml.data import TabularDataset
from azureml.widgets.run_details import RunDetails

from azureml.automl.core.shared import constants

import json
import os
import pickle
import requests

from pprint import pprint

import logging
import joblib

from train import clean_data, get_dataset
import capstone_constants as c_constants



In [None]:
TABULAR_BREAST_CANCER_DATA_URI = 'https://github.com/dntrply/nd00333-capstone/blob/master/dataset/Breast_cancer_data.csv'

## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.

Data: [Breast Cancer Prediction Dataset](https://www.kaggle.com/merishnasuwal/breast-cancer-prediction-dataset)

The task is binary classification: predict the presence (or absence) of breast cancer from data on physical characteristics.
The data is described further at https://www.kaggle.com/merishnasuwal/breast-cancer-prediction-dataset/discussion/66975#509394


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

The dataset is external. It was manually downloaded as a CSV file and then uploaded to a publicly accessible GitHub repository:
'https://github.com/dntrply/nd00333-capstone/blob/master/dataset/Breast_cancer_data.csv'
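One caveat with GitHub links: a `/blob/` URL serves the HTML page that wraps the file, not the CSV itself. The sketch below rewrites such a URL to its `raw.githubusercontent.com` form. The helper name `github_blob_to_raw` is hypothetical, the rewrite assumes a branch name without slashes, and whether the resulting URL is reachable for this repository has not been checked here.

```python
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/blob/' URL to the raw.githubusercontent.com form.

    Hypothetical helper for illustration; URLs that do not look like
    GitHub blob links are returned unchanged.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip('/').split('/')
    # Expected layout: <user>/<repo>/blob/<branch>/<path...>
    if parsed.netloc != 'github.com' or len(parts) < 5 or parts[2] != 'blob':
        return url
    user, repo, _, branch = parts[:4]
    file_path = '/'.join(parts[4:])
    return f'https://raw.githubusercontent.com/{user}/{repo}/{branch}/{file_path}'


blob_url = 'https://github.com/dntrply/nd00333-capstone/blob/master/dataset/Breast_cancer_data.csv'
print(github_blob_to_raw(blob_url))
```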

In [None]:
# Dataset.Tabular exposes the TabularDatasetFactory methods
ds = Dataset.Tabular.from_delimited_files(path=TABULAR_BREAST_CANCER_DATA_URI)
df = ds.to_pandas_dataframe()
df

In [None]:
df.describe()

In [None]:
# Split the dataset so that a small fraction may be held out for prediction
# percentage is given as a fraction between 0.0 and 1.0
train_ds, _ = ds.random_split(percentage=0.99, seed=42)
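The held-out rows are recovered later in the notebook by splitting again with the same seed, which relies on a seeded split being reproducible. The sketch below illustrates that idea with the standard library only. `seeded_split` is a hypothetical stand-in and not how `random_split` is implemented; as in Azure ML, the resulting sizes are only approximately the requested fraction.

```python
import random


def seeded_split(rows, fraction, seed):
    """Assign each row to the first or second part using a seeded RNG."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        # Each row lands in the first part with probability `fraction`
        (first if rng.random() < fraction else second).append(row)
    return first, second


rows = list(range(1000))
train_a, holdout_a = seeded_split(rows, fraction=0.99, seed=42)
train_b, holdout_b = seeded_split(rows, fraction=0.99, seed=42)

# Same seed, same partition: the second call recovers the same hold-out rows
print(holdout_a == holdout_b)
print(len(train_a) + len(holdout_a) == len(rows))
```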

In [None]:
# Use the compute cluster for AutoML if it exists; otherwise create it

# The workspace is needed to look up or create the compute cluster
ws = Workspace.from_config()

# Access the compute cluster. If it exists, we will have the compute object.
# If it does not exist, an exception is thrown, upon which the compute cluster is created
try:
    cc = ComputeTarget(workspace=ws, name='COMPUTE-CLUSTER-AUTOML')
except ComputeTargetException:
    # Failed to obtain the compute cluster object
    # In all likelihood, a compute cluster of that name has not been created
    # Attempt to create the compute cluster
    # First set up the configuration

    # Specify the configuration of the compute cluster
    cc_cfg = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_v2', min_nodes=1, max_nodes=6)
    cc = ComputeTarget.create(workspace=ws, name='COMPUTE-CLUSTER-AUTOML', provisioning_configuration=cc_cfg)

# At this point we have access to the compute cluster object. Wait for the compute target to complete provisioning
cc.wait_for_completion(show_output=True)

In [None]:
ws = Workspace.from_config()

# Choose a name for the experiment
experiment = Experiment(ws, 'experiment-capstone-automl')  # Experiment name in Azure ML

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

This project is a classification problem. More specifically, it is a binary classification problem, as the outcome is whether breast cancer is present or not.

AUC_weighted is an appropriate metric to target for binary classification.
[Set up AutoML training with Python](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train)
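For intuition, the AUC of a binary classifier equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes that quantity directly from pairs. The labels and scores are made-up illustration data, and `binary_auc` is a hypothetical helper, not the routine AutoML uses internally.

```python
def binary_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('AUC needs at least one example of each class')
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities, for illustration only
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
print(binary_auc(labels, scores))
```

Unlike accuracy, this measure depends only on how the examples are ranked, not on a particular decision threshold.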

Enabling early stopping is generally recommended, since after a number of iterations the score may stop improving and further iterations only consume compute.

There is generally limited to no benefit to using a large number of cross-validation folds. In this instance, we have set it to 3.
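With three folds, each candidate model is fitted three times, each time validated on a different third of the training data. The sketch below shows how row indices could be divided for that. `k_fold_indices` is a hypothetical helper that uses contiguous, unshuffled folds for simplicity; AutoML performs its own splitting internally.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, validation_indices) for each of n_folds folds."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(n_rows=10, n_folds=3):
    print(len(train), validation)
```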

In [None]:
# TODO: Put your automl settings here

automl_settings = {
    "iterations" : 20,
    "experiment_timeout_minutes" : 30,
    "enable_early_stopping" : True,
    "iteration_timeout_minutes" : 5,
    "max_concurrent_iterations" : 5,
    "max_cores_per_iteration" : -1,
    "n_cross_validations" : 3,
    "primary_metric" : 'AUC_weighted',
    "verbosity" : logging.INFO,
}

# Provide the remainder of the settings/configuration.
# Note that we are not providing a validation data set; n_cross_validations is used instead


# TODO: Put your automl config here
automl_config = AutoMLConfig(
    compute_target = cc,
    task='classification',
    training_data=train_ds,
    label_column_name='diagnosis',
    featurization='auto',
    model_explainability=True,
    debug_log='capstone_automl.log',
    **automl_settings)
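The settings above interact with each other and with the size of the compute cluster. The sketch below is a hypothetical pre-submit check, not part of the Azure ML SDK. It assumes that concurrency should not exceed the cluster's maximum node count, and its time estimate is a rough upper bound that ignores setup, featurization, and ensembling.

```python
import math


def check_automl_settings(settings, cluster_max_nodes):
    """Return a list of warnings from a hypothetical pre-submit sanity check."""
    warnings = []
    concurrent = settings['max_concurrent_iterations']
    if concurrent > cluster_max_nodes:
        warnings.append(
            f'max_concurrent_iterations ({concurrent}) exceeds cluster max_nodes ({cluster_max_nodes})')
    # Rough worst case: iterations run in batches of `concurrent`,
    # and every iteration in a batch runs until its timeout
    batches = math.ceil(settings['iterations'] / concurrent)
    worst_case = batches * settings['iteration_timeout_minutes']
    if worst_case > settings['experiment_timeout_minutes']:
        warnings.append(
            f'worst-case run time ({worst_case} min) exceeds experiment_timeout_minutes')
    return warnings


# Values copied from the settings and cluster configuration in this notebook
demo_settings = {
    'iterations': 20,
    'experiment_timeout_minutes': 30,
    'iteration_timeout_minutes': 5,
    'max_concurrent_iterations': 5,
}
print(check_automl_settings(demo_settings, cluster_max_nodes=6))
```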

In [None]:
# TODO: Submit your experiment
automl_run = experiment.submit(automl_config)

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [None]:
RunDetails(automl_run).show()

In [None]:
automl_run.wait_for_completion(show_output=True)

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [None]:
def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0]+ ' - ')
        elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):
            print("\nMeta Learner")
            pprint(step[1]._meta_learner)
            print()
            for estimator in step[1]._base_learners:
                print_model(estimator[1], estimator[0]+ ' - ')
        else:
            pprint(step[1].get_params())
            print()

In [None]:
automl_best_run, automl_best_model = automl_run.get_output()

automl_best_run_metrics = automl_best_run.get_metrics()

print(f'********** Best AutoML accuracy: {automl_best_run_metrics.get("accuracy")}')
print(f'********** printing Best AutoML run:\n{automl_best_run}\n\nPrinting model:')

print_model(automl_best_model)

In [None]:
print(automl_run.get_metrics())

In [None]:
# Create the outputs directory if it does not already exist
import os

os.makedirs('outputs', exist_ok=True)

In [None]:
#TODO: Save the best model
joblib.dump(automl_best_model, os.path.join('outputs','best_automl.pkl'))

In [None]:
# Download the scoring file and the environment file

automl_best_run.download_file(constants.SCORING_FILE_PATH, os.path.join('outputs', 'scoring.py'))
automl_best_run.download_file(constants.CONDA_ENV_FILE_PATH, os.path.join('outputs', 'best_run_environment.yml'))

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
# Refer - https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python

# Tutorial: Deploy an image classification model in Azure Container Instances -
# https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-deploy-models-with-aml

# Register the model
registered_model = automl_best_run.register_model(model_path=constants.MODEL_PATH, 
                                                model_name='breast-cancer-automl', 
                                                description='Breast Cancer detection using Azure AutoML',
                                                tags={'Method of execution':'AutoML'},
                                                properties={'Accuracy': str(automl_best_run_metrics['accuracy'])})
print(f'{automl_run.model_id}')
print(f'{registered_model.name}  {registered_model.id}  {registered_model.version}')


In [None]:
# Anytime as necessary, access the registered model
retrieved_model = Model(workspace=ws, name='breast-cancer-automl')

In [None]:
# Create an inference config

inference_config = InferenceConfig(
    # Use the environment file and scoring script downloaded from the best run
    environment=Environment.from_conda_specification(name='myenv', file_path=os.path.join('outputs', 'best_run_environment.yml')),
    source_directory='outputs',
    entry_script='scoring.py',
)

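# Enable key-based authentication so that get_keys() can be used when testing the service
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, auth_enabled=True)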
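Deployment fails late if the scoring script or environment file is missing from the source directory, so it can help to check for them first. The sketch below is a hypothetical helper that assumes the file names used when the files were downloaded from the best run. It is demonstrated on a temporary directory rather than the real `outputs` folder.

```python
import os
import tempfile


def missing_deployment_files(source_directory,
                             required=('scoring.py', 'best_run_environment.yml')):
    """Return the required file names that are absent from source_directory."""
    return [name for name in required
            if not os.path.isfile(os.path.join(source_directory, name))]


# Demonstrate on a throwaway directory that contains only the scoring script
with tempfile.TemporaryDirectory() as tmp_dir:
    open(os.path.join(tmp_dir, 'scoring.py'), 'w').close()
    demo_missing = missing_deployment_files(tmp_dir)
    print(demo_missing)
```

In the notebook, the call would be `missing_deployment_files('outputs')`, and an empty list means both files are in place.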


In [None]:

service = Model.deploy(workspace=ws,
                       name='breast-cancer-service',
                       models=[retrieved_model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)

In [None]:
logs = service.get_logs()

for line in logs.split('\n'):
    print(line)


TODO: In the cell below, send a request to the web service you deployed to test it.

In [None]:
# To enable ApplicationInsights on the service (webservice), 
# * first access the endpoint using the name assigned at the time of deployment
# * next update webservice parameters such as enabling application insights (enable_app_insights)

webservice = Webservice(
    workspace = ws,
    name='breast-cancer-service'
)

webservice.update(
    enable_app_insights=True
)

# At this point Application Insights (logging) is enabled and can be
# checked in Azure Machine Learning studio

In [None]:
# URL for the web service, should be similar to:
# 'http://8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io/score'

# From the tail end of the code at
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python
# - Deploy machine learning models to Azure


scoring_uri = webservice.scoring_uri

# If the service is authenticated, set the key or token
key, _ = webservice.get_keys()

# Set the appropriate headers
headers = {"Content-Type": "application/json"}
headers["Authorization"] = f"Bearer {key}"

# Retrieve the data for predictions
# percentage is given as a fraction between 0.0 and 1.0
all_ds = Dataset.Tabular.from_delimited_files(path=TABULAR_BREAST_CANCER_DATA_URI)
_, predict_ds = all_ds.random_split(percentage=0.99, seed=42)

predict_data = predict_ds.to_pandas_dataframe()
predict_label = predict_data.pop('diagnosis')


# Convert to JSON string
score_data = json.dumps({'data': predict_data.to_dict(orient='records')})

# Make the request and display the predictions
resp = requests.post(scoring_uri, score_data, headers=headers)
print(f'{resp.json()}')

# Print the actual diagnosis
print(f'{predict_label}')
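Printing predictions and labels side by side works for a handful of rows; a small comparison makes the result easier to read. The sketch below assumes the predictions have already been extracted from the service response into a plain list, since the exact response format is not shown here. `compare_predictions` is a hypothetical helper and the values are made up.

```python
def compare_predictions(predicted, actual):
    """Return (accuracy, mismatches) for two equally long label sequences."""
    if len(predicted) != len(actual):
        raise ValueError('predicted and actual must have the same length')
    if not actual:
        raise ValueError('nothing to compare')
    # Each mismatch is recorded as (row position, predicted label, actual label)
    mismatches = [(i, p, a) for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]
    accuracy = 1 - len(mismatches) / len(actual)
    return accuracy, mismatches


# Made-up values for illustration; in the notebook these would come from
# the parsed service response and from predict_label
predicted = [1, 0, 1, 1, 0]
actual = [1, 0, 0, 1, 0]
accuracy, mismatches = compare_predictions(predicted, actual)
print(accuracy, mismatches)
```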

TODO: In the cell below, print the logs of the web service and delete the service

In [None]:
logs = webservice.get_logs()

for line in logs.split('\n'):
    print(line)



In [None]:
# Clean up any resources
# Delete the Webservice
# delete the compute cluster

webservice.delete()
cc.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
