# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
import argparse
import os

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Azure ML SDK: workspace, experiment, compute, AutoML, and deployment classes
from azureml.core import Experiment, Workspace
from azureml.core.run import Run
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.model import InferenceConfig, Model
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails

## Dataset

### Overview

In this project, we classify progressive muscle weakness and disease using a scikit-learn classifier and AutoML.

The dataset is custom data curated from a health institution.

The dataset contains 3 categories:
1. Control
    
    Indicates whether muscle weakness is in control or not.
    
    
2. SPG4
    
    **Spastic paraplegia 4 (SPG4)** is the most common type of hereditary spastic paraplegia (HSP) inherited in an autosomal dominant manner. Disease onset ranges from infancy to older adulthood. SPG4 is characterized by slowly progressive muscle weakness and spasticity (stiff or rigid muscles) in the lower half of the body. In rare cases, individuals may have a more complex form with seizures, ataxia, and dementia. SPG4 is caused by mutations in the SPAST gene. Severity of symptoms usually worsens over time; however, some individuals remain mildly affected throughout their lives. Medications, such as antispastic drugs, and physical therapy may aid in stretching spastic muscles and preventing contractures (fixed tightening of muscles).
    
    Source : https://rarediseases.info.nih.gov/diseases/4925/spastic-paraplegia-4
    
    
3. Disease
   
   Indicates that the muscle weakness is due to some disease.
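
Since the task is a three-way classification, it can help to check how evenly the categories are represented before training. The sketch below is a minimal illustration using only the standard library. The `class_distribution` helper and the toy label list are hypothetical and are not taken from the real dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of samples that fall into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Toy labels for illustration only; the real labels live in the dataset.
toy_labels = ["Control", "SPG4", "Disease", "Control"]
print(class_distribution(toy_labels))
```

A strongly skewed distribution would suggest looking at per-class metrics in addition to overall accuracy.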


In [None]:
path = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv"

ds = TabularDatasetFactory.from_delimited_files(path,
                                                validate=True,
                                                include_path=False,
                                                infer_column_types=True,
                                                separator=',',
                                                header=True,
                                                support_multi_line=False,
                                                empty_as_string=False)


In [None]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'mwd-automl'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

In [None]:

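from azureml.core.compute_target import ComputeTargetException

try:
    cpu_cluster = ComputeTarget(workspace=ws, name="mwd-compute")
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # The cluster does not exist yet, so provision a new one.
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, "mwd-compute", compute_config)

cpu_cluster.wait_for_completion(show_output=True)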

## AutoML Configuration


1. experiment_timeout_minutes: The maximum duration of the experiment, in minutes, after which it is stopped.
2. primary_metric: The metric that AutoML tries to improve.
3. enable_early_stopping: Stops the experiment early if the primary_metric is not improving.
4. enable_voting_ensemble: Disabled, because we don't want a VotingClassifier, which combines the results of multiple classifiers.
5. enable_stack_ensemble: Also disabled, because it produces a stacked result of multiple classifiers.
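
To make the idea behind early stopping concrete, here is a conceptual sketch of a "no recent improvement" check. It is not Azure AutoML's actual stopping rule. The function name and the `patience` and `min_delta` parameters are hypothetical and chosen only for illustration.

```python
def should_stop_early(scores, patience=3, min_delta=0.0):
    """Return True if the last `patience` scores did not beat the earlier best.

    `scores` is the primary metric per iteration, where higher is better.
    """
    if len(scores) <= patience:
        # Not enough history yet to judge whether progress has stalled.
        return False
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before + min_delta


# Made-up accuracy history for illustration only.
history = [0.71, 0.78, 0.80, 0.80, 0.79, 0.80]
print(should_stop_early(history))
```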

In [1]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 30,
    "primary_metric": 'accuracy',
    "n_cross_validations": 5,
    "max_concurrent_iterations": 4,
    "enable_early_stopping": True,
    "enable_voting_ensemble": False,
    "enable_stack_ensemble": False
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(task="classification",
                            compute_target=cpu_cluster,
                            training_data=ds,
                            label_column_name="Output",
                            **automl_settings)



In [None]:
auto_ml_run = experiment.submit(automl_config)

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

In [None]:
RunDetails(auto_ml_run).show()
auto_ml_run.wait_for_completion(show_output=True)
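
To compare the models trained during the run, one option is to rank the child runs by the primary metric. The sketch below assumes each run object exposes an `id` attribute and a `get_metrics()` method, as Azure ML `Run` objects do. The `rank_runs_by_metric` helper and the `FakeRun` stand-in are hypothetical and exist only so the example runs without a workspace.

```python
def rank_runs_by_metric(runs, metric="accuracy"):
    """Return (run_id, score) pairs sorted from best to worst score."""
    scored = []
    for run in runs:
        metrics = run.get_metrics()
        # Skip runs that failed or never logged the metric.
        if metric in metrics:
            scored.append((run.id, metrics[metric]))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


class FakeRun:
    """Stand-in for a run object, used here for illustration only."""

    def __init__(self, run_id, metrics):
        self.id = run_id
        self._metrics = metrics

    def get_metrics(self):
        return self._metrics


# Made-up runs and scores, not results from the actual experiment.
demo_runs = [
    FakeRun("run_0", {"accuracy": 0.81}),
    FakeRun("run_1", {"accuracy": 0.88}),
    FakeRun("run_2", {}),
]
print(rank_runs_by_metric(demo_runs))
```

In the notebook, the same helper could presumably be called as `rank_runs_by_metric(auto_ml_run.get_children())` once the run has completed.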

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [None]:
# Retrieve and save your best automl model.

best_run, best_model = auto_ml_run.get_output()

best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)


In [None]:
best_model.steps
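
The raw `steps` output can be long. Assuming `best_model` is a scikit-learn style pipeline whose `steps` attribute is a list of `(name, estimator)` pairs, a compact summary might look like the sketch below. The `describe_steps` helper and the toy pipeline are hypothetical.

```python
from types import SimpleNamespace


def describe_steps(pipeline):
    """Return (step name, estimator class name) pairs for a pipeline-like object."""
    return [(name, type(step).__name__) for name, step in pipeline.steps]


class Scaler:
    """Placeholder preprocessing step, for illustration only."""


class Classifier:
    """Placeholder final estimator, for illustration only."""


# Toy object that mimics the `steps` attribute of a pipeline.
toy_pipeline = SimpleNamespace(steps=[("scaler", Scaler()), ("classifier", Classifier())])
print(describe_steps(toy_pipeline))
```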

In [None]:
best_run.download_file("outputs/model.pkl", "automl_job_classification-best-model.pkl")

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
model = best_run.register_model(experiment_name+"-model", model_path="outputs/model.pkl")


In [None]:
best_run.download_file('outputs/model.pkl', './model.pkl')

In [None]:
best_run.download_file('outputs/scoring_file_v_1_0_0.py', './score.py')

In [None]:
inference_config = InferenceConfig(entry_script='./score.py')

service = Model.deploy(ws, experiment_name+"-service", [model], inference_config, overwrite=True)
service.wait_for_deployment(show_output = True)
print(service.state)

TODO: In the cell below, send a request to the web service you deployed to test it.

In [None]:
import json

# Assumption: the AutoML-generated score.py expects a JSON string shaped like
# {"data": [record, ...]}; check the downloaded score.py if the request fails.
sample = {
  'Phenotype3': 1526.0,
  'Phenotype12': 0.48177299999999995,
  'Phenotype9': 0.09160499999999999,
  'Phenotype14': 0.632513,
  'Phenotype20': 0.938418,
  'Phenotype25': 0.0590057,
  'Phenotype7': 0.0928148,
  'Phenotype27': 0.0203235,
  'Phenotype6': 0.16384300000000002,
  'Phenotype13': 0.36327,
  'Phenotype16': 14.8264,
  'Phenotype5': 0.06250259999999999,
  'Phenotype21': 0.9901559999999999,
  'Phenotype2': 340.452,
  'Phenotype17': 0.813646,
  'Phenotype23': 1.06998,
  'Phenotype18': 15.3981,
  'Phenotype1': 1540.12,
  'Phenotype26': 0.0134209,
  'Phenotype19': 0.404702,
  'Phenotype15': 2.04665,
  'Phenotype4': 0.0825003,
  'Phenotype10': 0.23433099999999998,
  'Phenotype8': 0.14336400000000002,
  'Phenotype22': 1.07428,
  'Phenotype11': 0.147474,
  'Phenotype24': 1.001
}
print(service.run(json.dumps({"data": [sample]})))
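
The deployed service can also be tested over plain HTTP, which is how a client outside the notebook would call it. The sketch below only builds the request and does not send it. It assumes the scoring script accepts a JSON body of the form `{"data": [record, ...]}`. The `build_scoring_request` helper, the placeholder URL, and the two-field record are hypothetical. In the notebook, the URL would come from `service.scoring_uri`.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, records, api_key=None):
    """Build a POST request carrying the records as a JSON body."""
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when key-based authentication is enabled on the service.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder URL and a shortened record, for illustration only.
request = build_scoring_request(
    "http://localhost:8080/score",
    [{"Phenotype1": 1540.12, "Phenotype2": 340.452}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request is then a matter of passing it to `urllib.request.urlopen(request)` and reading the response body.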

TODO: In the cell below, print the logs of the web service and delete the service

In [None]:
service.update(enable_app_insights=True)
service.get_logs()

In [None]:
service.delete()