# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [10]:
from azureml.core import Workspace, Experiment, Dataset, Environment, Webservice
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.model import InferenceConfig,Model
from azureml.core.webservice import AciWebservice

from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig

import requests
import json

In [11]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'Medical-Insurance-Premium-Prediction'

experiment=Experiment(ws, experiment_name)

In [12]:
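from azureml.core.compute_target import ComputeTargetException

try:
    # Reuse the cluster if it already exists in the workspace
    compute_cluster = ComputeTarget(ws, "ml-clstr")
except ComputeTargetException:
    # Otherwise provision a low-priority cluster that can scale down to zero nodes
    compute_cluster_config = AmlCompute.provisioning_configuration(
        vm_size="Standard_DS12_V2",
        vm_priority='lowpriority',
        min_nodes=0,
        max_nodes=4,
        idle_seconds_before_scaledown=240
    )
    compute_cluster = ComputeTarget.create(ws, "ml-clstr", compute_cluster_config)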

In [13]:
dataset = Dataset.get_by_name(ws, name='medical_insurance')

In [19]:
df = dataset.to_pandas_dataframe()

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

### AutoML Parameters
1. `n_cross_validations`: Number of cross-validation folds used during training (set to 4).
2. `primary_metric`: The metric AutoML optimizes. It is set to `normalized_root_mean_squared_error`, a standard choice for a regression problem.
3. `experiment_timeout_hours`: Maximum time the experiment, including all child runs, may run before it is terminated (set to 1 hour).
4. `max_concurrent_iterations`: Maximum number of iterations allowed to run in parallel (set to 4, matching the cluster's `max_nodes`).
5. `task`: Set to `regression` because the target, insurance charges, is a continuous value.
6. `compute_target`: The compute cluster used to train the model.
7. `training_data`: The medical insurance dataset used to train the model.
8. `label_column_name`: The target column in the dataset (`charges`).
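
To make the optimization target concrete: the settings cell that follows passes `normalized_root_mean_squared_error` as `primary_metric`. The sketch below computes that metric by hand, assuming the normalization divides RMSE by the range (max minus min) of the true values, which is how the Azure AutoML metric documentation describes it. The function name `normalized_rmse` and the sample numbers are hypothetical and are not taken from this run.

```python
import math

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("true values have zero range")
    return math.sqrt(mse) / value_range

# Hypothetical charges and predictions, for illustration only
y_true = [1000.0, 3000.0, 5000.0]
y_pred = [1200.0, 2800.0, 5200.0]
nrmse = normalized_rmse(y_true, y_pred)
print(round(nrmse, 4))
```

Lower values are better for this metric, and because it is scaled by the target range it can be compared across datasets with different units.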

In [15]:
# TODO: Put your automl settings here
automl_settings = {
    'n_cross_validations':4,
    'primary_metric':'normalized_root_mean_squared_error',
    'experiment_timeout_hours':1,
    "max_concurrent_iterations":4
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    task='regression',
    compute_target=compute_cluster,
    training_data=dataset,
    label_column_name='charges',
    **automl_settings
)

In [16]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| Medical-Insurance-Premium-Prediction | AutoML_a922a712-2a99-47cf-919b-0cb27f2dea0a | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [17]:
RunDetails(remote_run).show()

remote_run.wait_for_completion(show_output=True)




Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more about high cardinality feature handling: https://aka.ms/AutomatedMLFeaturization

********************************************

{'runId': 'AutoML_a922a712-2a99-47cf-919b-0cb27f2dea0a',
 'target': 'ml-clstr',
 'status': 'Completed',
 'startTimeUtc': '2022-01-02T09:32:05.059338Z',
 'endTimeUtc': '2022-01-02T09:52:03.956938Z',
 'services': {},
   'message': 'No scores improved over last 20 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'normalized_root_mean_squared_error',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '4',
  'target': 'ml-clstr',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"4fb420df-9866-49df-808b-f6898a80d86e\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'regression',
  'dependencies_versions': '{"a
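
The timestamps in the output above suggest the run ended well inside the one-hour `experiment_timeout_hours` budget, which is consistent with the early-stopping message. As a small check, using only the two timestamp strings shown in the output (the variable names are illustrative):

```python
from datetime import datetime, timedelta

# Format of the startTimeUtc / endTimeUtc strings in the run details
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

start = datetime.strptime("2022-01-02T09:32:05.059338Z", TIME_FORMAT)
end = datetime.strptime("2022-01-02T09:52:03.956938Z", TIME_FORMAT)

elapsed = end - start
timeout = timedelta(hours=1)  # experiment_timeout_hours = 1

print(f"elapsed: {elapsed}")
print(f"finished before timeout: {elapsed < timeout}")
```

The run took just under 20 minutes, so the timeout was not the reason it stopped.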

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [20]:
automl_best_run, automl_best_model = remote_run.get_output()
for key,val in automl_best_run.properties.items():
    print(f"{key}: {val}\n")

runTemplate: automl_child

pipeline_id: __AutoML_Stack_Ensemble__

pipeline_spec: {"pipeline_id":"__AutoML_Stack_Ensemble__","objects":[{"module":"azureml.train.automl.stack_ensemble","class_name":"StackEnsemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{'task_type':'regression','primary_metric':'normalized_root_mean_squared_error','verbosity':20,'ensemble_iterations':15,'is_timeseries':False,'name':'Medical-Insurance-Premium-Prediction','compute_target':'ml-clstr','subscription_id':'e8e8da26-0fa0-48b5-a094-ad8115fd47b3','region':'eastus2','spark_service':None}","ensemble_run_id":"AutoML_a922a712-2a99-47cf-919b-0cb27f2dea0a_37","experiment_name":"Medical-Insurance-Premium-Prediction","workspace_name":"workspaceone","subscription_id":"e8e8da26-0fa0-48b5-a094-ad8115fd47b3","resource_group_name":"udacityamlnd"}}]}

training_percent: 100

predicted_cost: None

iteration: 37

_aml_system_scenario_identification: Remote.Child

_azureml.ComputeTargetType: amlc

In [21]:
print(automl_best_model._final_estimator)

StackEnsembleRegressor(
    base_learners=[('1', Pipeline(
        memory=None,
        steps=[('maxabsscaler', MaxAbsScaler(
            copy=True
        )), ('xgboostregressor', XGBoostRegressor(
            random_state=0,
            n_jobs=1,
            problem_info=ProblemInfo(
                gpu_training_param_dict={'processing_unit_type': 'cpu'}
            ),
            tree_method='auto'
        ))],
        verbose=False
    )), ('30', Pipeline(
        memory=None,
        steps=[('standardscalerwrapper', StandardScalerWrapper(
            copy=True,
            with_mean=False,
            with_std=False
        )), ('xgboostregressor', XGBoostRegressor(
            random_state=0,
            n_jobs=1,
            problem_info=ProblemInfo(
                gpu_training_param_dict={'processing_unit_type': 'cpu'}
            ),
            booster='gbtree',
            colsample_bytree=0.9,
            eta=0.5,
            gamma=0.01,
            max_depth=8,
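
The printed estimator is a `StackEnsembleRegressor` whose base learners are scaler-plus-XGBoost pipelines. The general idea of stacking is that a meta-learner is fit on the base models' predictions. The following is a deliberately simplified sketch of that idea, not AutoML's actual meta-learner: it blends two sets of (assumed out-of-fold) predictions with a single least-squares weight. All names and numbers are hypothetical.

```python
def fit_blend_weight(y_true, preds_a, preds_b):
    """Least-squares weight w for the blend w * a + (1 - w) * b."""
    num = sum((y - b) * (a - b) for y, a, b in zip(y_true, preds_a, preds_b))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    if den == 0:
        return 0.5  # the base models agree everywhere, so any weight works
    return num / den

def blend(weight, preds_a, preds_b):
    """Combine the two base predictions with the fitted weight."""
    return [weight * a + (1 - weight) * b for a, b in zip(preds_a, preds_b)]

# Hypothetical targets and out-of-fold predictions from two base models
y_true = [10.0, 20.0, 30.0]
preds_a = [12.0, 22.0, 32.0]  # consistently too high
preds_b = [8.0, 18.0, 28.0]   # consistently too low

w = fit_blend_weight(y_true, preds_a, preds_b)
stacked = blend(w, preds_a, preds_b)
print(w, stacked)
```

In this toy case the two models' errors cancel, which illustrates why an ensemble can outperform each of its members.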
           

In [22]:
automl_best_model_name = automl_best_run.properties['model_name']
automl_best_model_name

'AutoMLa922a712237'

In [23]:
#TODO: Save the best model
model = automl_best_run.register_model(
    model_name=automl_best_model_name,
    description='AutoML Best Model For Premium Prediction',
    model_path = 'outputs/model.pkl'
    )

In [24]:
automl_best_run.download_file('outputs/scoring_file_v_2_0_0.py','script.py')

In [30]:
automl_best_run.download_file('outputs/conda_env_v_1_0_0.yml','env.yml')

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [25]:
env = automl_best_run.get_environment()
automl_inf_conf = InferenceConfig(
    environment=env,
    entry_script="./script.py",
)

automl_deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=2, memory_gb=4, auth_enabled=True
)

service = Model.deploy(ws,"udacity-capstone-automl-deploy",[model],automl_inf_conf,automl_deployment_config)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-01-02 11:06:08+00:00 Creating Container Registry if not exists.
2022-01-02 11:06:08+00:00 Registering the environment.
2022-01-02 11:06:09+00:00 Use the existing image.
2022-01-02 11:06:09+00:00 Generating deployment configuration.
2022-01-02 11:06:10+00:00 Submitting deployment to compute.
2022-01-02 11:06:13+00:00 Checking the status of deployment udacity-capstone-automl-deploy..
2022-01-02 11:08:44+00:00 Checking the status of inference endpoint udacity-capstone-automl-deploy.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [26]:
deployed_service= Webservice(ws,'udacity-capstone-automl-deploy')

TODO: In the cell below, send a request to the web service you deployed to test it.

In [27]:
scoring_uri = deployed_service.scoring_uri

# get_keys() returns the primary and secondary keys; use the primary one
key, _ = deployed_service.get_keys()

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'
headers


{'Content-Type': 'application/json',
 'Authorization': 'Bearer iRVwakuRXWmGa3jQBlV5UCZmJPeIVwZH'}
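
The header logic above could be wrapped in a small helper so the same code works whether or not `auth_enabled` was set at deployment. This is only a sketch: `build_headers` is a hypothetical name and the key shown is a placeholder, not a real credential.

```python
def build_headers(key=None):
    """Return request headers, adding a Bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers

# Placeholder key for illustration; in the notebook it would come from get_keys()
with_auth = build_headers("<primary-key>")
without_auth = build_headers()
print(with_auth)
print(without_auth)
```

Keeping the key out of printed cell output is also worth considering, since notebook outputs are often shared.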

In [28]:
data = {
    "Inputs": {
        "data":[
                {
                    'age': 20,
                    'sex': "female",
                    'bmi': 33.770,
                    'children': 1,
                    'smoker': "no",
                    'region': "southeast",
                },
                {
                    'age': 30,
                    'sex': "male",
                    'bmi': 22.700,
                    'children': 0,
                    'smoker': "no",
                    'region': "northwest",
                }
            ],
    },
    "GlobalParameters": {
    }
}
data = json.dumps(data)
data

'{"Inputs": {"data": [{"age": 20, "sex": "female", "bmi": 33.77, "children": 1, "smoker": "no", "region": "southeast"}, {"age": 30, "sex": "male", "bmi": 22.7, "children": 0, "smoker": "no", "region": "northwest"}]}, "GlobalParameters": {}}'
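
The request body above follows the shape `{"Inputs": {"data": [...]}, "GlobalParameters": {}}`, with one dictionary per record. A minimal sketch of a helper that builds this body and rejects incomplete records before they reach the endpoint is shown below. `build_payload` and `REQUIRED_FIELDS` are hypothetical names, and the field list is assumed from the sample records in this notebook rather than read from the service schema.

```python
import json

# Feature names taken from the sample records in this notebook (assumption)
REQUIRED_FIELDS = ("age", "sex", "bmi", "children", "smoker", "region")

def build_payload(records):
    """Serialize records into the request body shape used in this notebook."""
    for i, record in enumerate(records):
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
    return json.dumps({"Inputs": {"data": records}, "GlobalParameters": {}})

records = [
    {"age": 20, "sex": "female", "bmi": 33.77, "children": 1,
     "smoker": "no", "region": "southeast"},
    {"age": 30, "sex": "male", "bmi": 22.7, "children": 0,
     "smoker": "no", "region": "northwest"},
]
payload = build_payload(records)
print(payload)
```

Validating locally gives a clearer error than a failed scoring request would.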

In [29]:
res = requests.post(scoring_uri, data=data, headers=headers)
res.text

'{"Results": [6532.612979776796, 5467.61439779567]}'
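
The response is a JSON string with one prediction per input record under the `Results` key. As a short sketch, the text returned above can be decoded into Python floats like this (the variable names are illustrative, and the string is copied from the output above):

```python
import json

# Response text copied from the cell output above
response_text = '{"Results": [6532.612979776796, 5467.61439779567]}'

predictions = json.loads(response_text)["Results"]
for i, charge in enumerate(predictions):
    print(f"record {i}: predicted charges = {charge:.2f}")
```

In a live session, `res.json()` should give the same dictionary directly, and calling `res.raise_for_status()` first surfaces HTTP errors before parsing.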

TODO: In the cell below, print the logs of the web service and delete the service.

In [31]:
print(deployed_service.get_logs())



2022-01-02T11:08:31,581948200+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
2022-01-02T11:08:31,585528200+00:00 - iot-server/run 
2022-01-02T11:08:31,582336800+00:00 - rsyslog/run 
2022-01-02T11:08:31,654670500+00:00 - nginx/run 
rsyslogd: /azureml-envs/azureml_84c85d362f11658b9008714e1aa4657b/lib/libuuid.so.1: no version information available (required by rsyslogd)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-02T11:08:31,875339800+00:00 - iot-server/finish 1 0
2022-01-02T11:08:31,879519700+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (80)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 108
SPARK_HOME not set. Skipping PySpark Initialization.
Generating new fontManager, this may take some time...
Initializing logger
2022-01-02 11:08:34,072 | root | INFO | Starting up app insights client
logging socket was

In [32]:
deployed_service.delete()

In [33]:
model.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
