# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [20]:
from azureml.core import Workspace, Dataset
from azureml.core.experiment import Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig

from sklearn.model_selection import StratifiedKFold

## Dataset

### Overview

This project uses the **Mobile Price Classification** dataset ([data source](https://www.kaggle.com/iabhishekofficial/mobile-price-classification?select=train.csv))
from Kaggle. Kaggle describes it as follows:

```
Bob has started his own mobile company. He wants to give tough fight to big companies like Apple,Samsung etc.

He does not know how to estimate price of mobiles his company creates. In this competitive mobile phone market you cannot simply assume things. To solve this problem he collects sales data of mobile phones of various companies.

Bob wants to find out some relation between features of a mobile phone(eg:- RAM,Internal Memory etc) and its selling price. But he is not so good at Machine Learning. So he needs your help to solve this problem.

In this problem you do not have to predict actual price but a price range indicating how high the price is.
```

We are using the *train.csv* file.

### Task
*TODO*: Explain the task you are going to be solving with this dataset and the features you will be using for it.

As described above, we use technical characteristics of mobile phones
to classify each phone into one of four price ranges, labeled 0 to 3. Because
every phone belongs to exactly one range, this is a multi-class classification problem.

The features available are the following:

* **battery_power**: Total energy the battery can store at one time, measured in mAh.

* **blue**: Has bluetooth or not.

* **clock_speed**: Speed at which the microprocessor executes instructions.

* **dual_sim**: Has dual sim support or not.

* **fc**: Front camera megapixels.

* **four_g**: Has 4G or not.

* **int_memory**: Internal Memory in Gigabytes.

* **m_dep**: Mobile Depth in cm.

* **mobile_wt**: Weight of mobile phone.

* **n_cores**: Number of cores of processor.

* **pc**: Primary camera megapixels.

* **px_height**: Pixel Resolution Height.

* **px_width**: Pixel Resolution Width.

* **ram**: Random access memory in megabytes.

* **sc_h**: Screen Height of mobile in cm.

* **sc_w**: Screen Width of mobile in cm.

* **talk_time**: Longest time that a single battery charge lasts while talking.

* **three_g**: Has 3G or not.

* **touch_screen**: Has touch screen or not.

* **wifi**: Has wifi or not.

* **price_range**: This is the target variable with value of 0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost).


The target is balanced in the training set, i.e., each class has almost the same representation. This is important because it makes it easier to build a model that generalizes well with classical machine learning methods, and it makes accuracy a meaningful metric.
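
The balance claim is easy to verify. The following is a minimal sketch, assuming an even split of 2000 rows across the four classes purely for illustration; the helper `class_shares` and the `labels` list are hypothetical stand-ins, not part of the notebook.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that fall in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


# Hypothetical labels standing in for df['price_range']:
# 2000 rows split evenly across the four price ranges (an assumption).
labels = [0, 1, 2, 3] * 500
shares = class_shares(labels)
print(shares)

# Always predicting the most frequent class gives the baseline accuracy
# that any trained model should beat.
baseline_accuracy = max(shares.values())
print(baseline_accuracy)
```

On the real data, something like `class_shares(df['price_range'].tolist())` should give the actual shares once `df` has been loaded.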

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [7]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl-mobile'
project_folder = './automl-mobile-udacity'

experiment=Experiment(ws, experiment_name)

The following cell loads the data with the snippet from the *Consume* tab of the Datasets section in ML Studio.

In [27]:
# azureml-core of version 1.0.72 or higher is required
# azureml-dataprep[pandas] of version 1.1.34 or higher is required

dataset = Dataset.get_by_name(ws, name='mobile_prices')
df = dataset.to_pandas_dataframe()
df.head(5)

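| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | ... | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | ... | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | ... | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 | 2 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | ... | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 | 2 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | ... | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 | 2 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | ... | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 | 1 |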


In [29]:
df.shape

(2000, 21)

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [15]:
cpu_cluster_name='automl-mobiles'

# Reuse the cluster if it already exists; otherwise create it
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                            max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)

Creating
Succeeded................................................................................................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


In [43]:
#skf = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)
#skf.get_n_splits(df, df['price_range'])
#print(list(skf.split(df, df['price_range'])))

#cv = skf.split(df, df['price_range'])
#for idx_train, idx_val in cv:
#    print(len(idx_train), len(idx_val))
#    df['t_{}'.format(i)]
#    df.loc[idx_train, 't_{}'.format(i)]
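
The commented-out cell above experiments with stratified folds. As a minimal sketch of the idea, the pure-Python helper below shuffles each class's indices and deals them round-robin into folds, so every fold keeps roughly the same class proportions. It is an illustration only: it is not scikit-learn's `StratifiedKFold` implementation, it is not necessarily how AutoML builds its `n_cross_validations` splits, and the `labels` list is a hypothetical stand-in for `df['price_range']`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign sample indices to n_splits folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Hypothetical labels: 2000 rows, four evenly represented classes.
labels = [0, 1, 2, 3] * 500
folds = stratified_folds(labels)

for fold_number, val_idx in enumerate(folds):
    val_set = set(val_idx)
    train_idx = [i for i in range(len(labels)) if i not in val_set]
    print(fold_number, len(train_idx), len(val_idx))
```

Each fold serves once as the validation set while the remaining folds form the training set.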

In [46]:
# Limits for the experiment: total time budget, parallelism, and the
# metric used to rank models.
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric": "accuracy",
}
project_folder = './automl-mobile-udacity'
automl_config = AutoMLConfig(
    compute_target=cpu_cluster,
    task="classification",
    training_data=dataset,
    label_column_name="price_range",
    path=project_folder,
    n_cross_validations=5,
    enable_early_stopping=True,
    featurization='auto',
    debug_log="automl_errors.log",
    **automl_settings,
)
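
A quick consistency check on these settings, as a sketch: on an AmlCompute cluster each node runs one iteration at a time, so `max_concurrent_iterations` should not exceed the cluster's `max_nodes`. The helper `check_automl_settings` below is hypothetical (it is not part of the Azure ML SDK) and only encodes that rule plus basic fold arithmetic, using the values from this notebook.

```python
def check_automl_settings(settings, max_nodes, n_rows, n_cross_validations):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    concurrent = settings.get("max_concurrent_iterations", 1)
    if concurrent > max_nodes:
        warnings.append(
            f"max_concurrent_iterations={concurrent} exceeds max_nodes={max_nodes}"
        )
    if n_cross_validations < 2:
        warnings.append("n_cross_validations should be at least 2")
    elif n_rows // n_cross_validations == 0:
        warnings.append("more folds than rows")
    return warnings


automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric": "accuracy",
}

# Values taken from the notebook: max_nodes=4, 2000 rows, 5 folds.
problems = check_automl_settings(
    automl_settings, max_nodes=4, n_rows=2000, n_cross_validations=5
)
rows_per_validation_fold = 2000 // 5
print(problems, rows_per_validation_fold)
```

With four nodes and four concurrent iterations nothing is flagged, and each of the five validation folds holds 400 of the 2000 rows.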

In [47]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [48]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [49]:
remote_run.wait_for_completion(show_output=True)


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.m

{'runId': 'AutoML_de6e6ea8-ed39-42f4-995d-74460746415c',
 'target': 'automl-mobiles',
 'status': 'Completed',
 'startTimeUtc': '2021-03-07T19:17:41.767142Z',
 'endTimeUtc': '2021-03-07T19:48:43.167267Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'automl-mobiles',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"80aca8ea-7efa-451b-a180-25f2a308055d\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/03-07-2021_045241_UTC/train.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-140008\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"6971f5ac-8af1-446e-8034-05acea24681

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [61]:
data_store = ws.get_default_datastore()
data_store.upload(src_dir='.',target_path=project_folder)

Uploading an estimated of 9 files
Uploading ./automl.ipynb.amltmp
Uploaded ./automl.ipynb.amltmp, 1 files out of an estimated total of 9
Uploading ./automl.log
Uploaded ./automl.log, 2 files out of an estimated total of 9
Uploading ./.amlignore
Uploaded ./.amlignore, 3 files out of an estimated total of 9
Uploading ./.amlignore.amltmp
Uploaded ./.amlignore.amltmp, 4 files out of an estimated total of 9
Uploading ./automl.ipynb
Uploaded ./automl.ipynb, 5 files out of an estimated total of 9
Uploading ./automl_errors.log
Uploaded ./automl_errors.log, 6 files out of an estimated total of 9
Uploading ./azureml_automl.log
Uploaded ./azureml_automl.log, 7 files out of an estimated total of 9
Uploading ./config.json
Uploaded ./config.json, 8 files out of an estimated total of 9
Uploading ./.ipynb_aml_checkpoints/automl-checkpoint2021-2-7-17-52-8.ipynb
Uploaded ./.ipynb_aml_checkpoints/automl-checkpoint2021-2-7-17-52-8.ipynb, 9 files out of an estimated total of 9
Uploaded 9 files


$AZUREML_DATAREFERENCE_20c3f8b55e5a47df9f070ed799554a80

In [74]:
# Retrieve the best AutoML run and register its model.

best_automl_run = remote_run.get_best_child()
best_automl_run_metrics = best_automl_run.get_metrics()

print('Best Run Id: ', best_automl_run.id)
print('\n Accuracy: ', best_automl_run_metrics['accuracy'])

# Save model
print('\n SAVE MODEL...')
final_automl_model = best_automl_run.register_model(model_name = 'automl-mobile', model_path = '/outputs/model.pkl', description='Best Model AutoML for mobile classification dataset')
print('\n SAVE MODEL...')

Best Run Id:  AutoML_de6e6ea8-ed39-42f4-995d-74460746415c_52

 Accuracy:  0.9469999999999998

 SAVE MODEL...

 SAVE MODEL...
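
The accuracy printed above is the primary metric chosen in the AutoML settings: the fraction of phones whose predicted price range matches the true one, so 0.947 means roughly 5.3% of predictions were wrong. As a minimal sketch of how that number and the underlying confusion matrix are computed, the example below uses made-up labels (`y_true` and `y_pred` are hypothetical, not results from this run).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical labels for eight phones (price ranges 0 to 3).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 2]

score = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)
print(score)
for row in matrix:
    print(row)
```

Because the price ranges are ordered, the confusion matrix also shows whether mistakes fall in neighbouring ranges, which a single accuracy value hides.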


In [76]:
best_automl_run.get_details()

{'runId': 'AutoML_de6e6ea8-ed39-42f4-995d-74460746415c_52',
 'target': 'automl-mobiles',
 'status': 'Completed',
 'startTimeUtc': '2021-03-07T19:46:46.929639Z',
 'endTimeUtc': '2021-03-07T19:48:04.419689Z',
 'properties': {'runTemplate': 'automl_child',
  'pipeline_id': '__AutoML_Ensemble__',
  'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'automl-mobile\',\'compute_target\':\'automl-mobiles\',\'subscription_id\':\'6971f5ac-8af1-446e-8034-05acea24681f\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_de6e6ea8-ed39-42f4-995d-74460746415c_52","experiment_name":"automl-mobile","workspace_name":"quick-starts-ws-140008","subscription_id":"6971f5ac-8af1-446e-80

## Model Deployment

Remember you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service
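
If this model were deployed, the test request would need a JSON body containing the 20 feature columns and no target. The sketch below only builds such a body. The `{"data": [...]}` layout is an assumption about the scoring script (it is the shape commonly used by AutoML-generated scoring scripts, but the actual script defines it), `build_payload` is a hypothetical helper, and the `pc` value is a placeholder because that column is elided in the preview above.

```python
import json

# Feature columns listed in the Task section, without the target price_range.
FEATURES = [
    "battery_power", "blue", "clock_speed", "dual_sim", "fc", "four_g",
    "int_memory", "m_dep", "mobile_wt", "n_cores", "pc", "px_height",
    "px_width", "ram", "sc_h", "sc_w", "talk_time", "three_g",
    "touch_screen", "wifi",
]


def build_payload(rows):
    """Serialize feature rows to JSON, rejecting rows with missing features."""
    for row in rows:
        missing = [name for name in FEATURES if name not in row]
        if missing:
            raise ValueError(f"missing features: {missing}")
    records = [{name: row[name] for name in FEATURES} for row in rows]
    return json.dumps({"data": records})


# First row of the preview; 'pc' is a placeholder value, and the target
# column is included only to show that it gets dropped.
sample = {
    "battery_power": 842, "blue": 0, "clock_speed": 2.2, "dual_sim": 0,
    "fc": 1, "four_g": 0, "int_memory": 7, "m_dep": 0.6, "mobile_wt": 188,
    "n_cores": 2, "pc": 0, "px_height": 20, "px_width": 756, "ram": 2549,
    "sc_h": 9, "sc_w": 7, "talk_time": 19, "three_g": 0, "touch_screen": 0,
    "wifi": 1, "price_range": 1,
}

payload = build_payload([sample])
print(payload)
```

The resulting string could then be sent to the service's scoring endpoint once a service exists.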