# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.core import Workspace, Dataset
from azureml.core.experiment import Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig

from sklearn.model_selection import StratifiedKFold

## Dataset

### Overview

For this project we use the **Mobile Price Classification** dataset ([data source](https://www.kaggle.com/iabhishekofficial/mobile-price-classification?select=train.csv))
from Kaggle. Kaggle describes it as follows:

```
Bob has started his own mobile company. He wants to give tough fight to big companies like Apple,Samsung etc.

He does not know how to estimate price of mobiles his company creates. In this competitive mobile phone market you cannot simply assume things. To solve this problem he collects sales data of mobile phones of various companies.

Bob wants to find out some relation between features of a mobile phone(eg:- RAM,Internal Memory etc) and its selling price. But he is not so good at Machine Learning. So he needs your help to solve this problem.

In this problem you do not have to predict actual price but a price range indicating how high the price is.
```

We are using the *train.csv* file.

### Task
*TODO*: Explain the task you are going to be solving with this dataset and the features you will be using for it.

As described above, we use technical characteristics of mobile phones
to classify each phone into a price range from 0 to 3. Each phone belongs to
exactly one range, so this is a multi-class classification problem.

The features available are the following:

* **battery_power**: Total energy the battery can store at one time, measured in mAh.

* **blue**: Has bluetooth or not.

* **clock_speed**: Speed at which the microprocessor executes instructions.

* **dual_sim**: Has dual sim support or not.

* **fc**: Front Camera mega pixels.

* **four_g**: Has 4G or not.

* **int_memory**: Internal Memory in Gigabytes.

* **m_dep**: Mobile Depth in cm.

* **mobile_wt**: Weight of mobile phone.

* **n_cores**: Number of cores of processor.

* **pc**: Primary Camera mega pixels.

* **px_height**: Pixel Resolution Height.

* **px_width**: Pixel Resolution Width.

* **ram**: Random Access Memory in Mega Bytes.

* **sc_h**: Screen Height of mobile in cm.

* **sc_w**: Screen Width of mobile in cm.

* **talk_time**: Longest time that a single battery charge will last during calls.

* **three_g**: Has 3G or not.

* **touch_screen**: Has touch screen or not.

* **wifi**: Has wifi or not.

* **price_range**: This is the target variable with value of 0 (low cost), 1 (medium cost), 2 (high cost) and 3 (very high cost).


The target is balanced in the training set, i.e., each class has almost the same representation. This is important because it makes it easier to build a model that generalizes well using classical methods.
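A quick way to check the balance claim is to count how often each `price_range` value occurs. The sketch below runs on a small made-up label list, not on the real dataset. `class_shares` and `is_balanced` are hypothetical helper names, and the 5% tolerance is an arbitrary assumption.

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}

def is_balanced(labels, tolerance=0.05):
    """True if every class share is within `tolerance` of a uniform split."""
    shares = class_shares(labels)
    uniform = 1 / len(shares)
    return all(abs(share - uniform) <= tolerance for share in shares.values())

# Toy labels for illustration only (not taken from train.csv).
toy_labels = [0, 1, 2, 3] * 5
print(class_shares(toy_labels))
print(is_balanced(toy_labels))
```

Once the DataFrame has been loaded, the same helpers could be called as `class_shares(df['price_range'])`, since they accept any iterable of labels.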

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl-mobile'
project_folder = './automl-mobile-udacity'

experiment=Experiment(ws, experiment_name)

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FW8VZDG54 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.


In the following cell, the data is loaded with the snippet from the *Consume* tab of the Datasets section in ML Studio.

In [3]:
# azureml-core of version 1.0.72 or higher is required
# azureml-dataprep[pandas] of version 1.1.34 or higher is required

dataset = Dataset.get_by_name(ws, name='mobile_prices')
df = dataset.to_pandas_dataframe()
df.head(5)

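| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | ... | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi | price_range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842 | 0 | 2.2 | 0 | 1 | 0 | 7 | 0.6 | 188 | 2 | ... | 20 | 756 | 2549 | 9 | 7 | 19 | 0 | 0 | 1 | 1 |
| 1 | 1021 | 1 | 0.5 | 1 | 0 | 1 | 53 | 0.7 | 136 | 3 | ... | 905 | 1988 | 2631 | 17 | 3 | 7 | 1 | 1 | 0 | 2 |
| 2 | 563 | 1 | 0.5 | 1 | 2 | 1 | 41 | 0.9 | 145 | 5 | ... | 1263 | 1716 | 2603 | 11 | 2 | 9 | 1 | 1 | 0 | 2 |
| 3 | 615 | 1 | 2.5 | 0 | 0 | 0 | 10 | 0.8 | 131 | 6 | ... | 1216 | 1786 | 2769 | 16 | 8 | 11 | 1 | 0 | 0 | 2 |
| 4 | 1821 | 1 | 1.2 | 0 | 13 | 1 | 44 | 0.6 | 141 | 2 | ... | 1208 | 1212 | 1411 | 8 | 2 | 15 | 1 | 1 | 0 | 1 |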


In [4]:
df.shape

(2000, 21)

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [5]:
cpu_cluster_name='automl-mobiles'

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                            max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)

Creating....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded.....................................................................................................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


In [6]:
# Optional manual check of a stratified 5-fold split (left disabled)
#skf = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)
#for i, (idx_train, idx_val) in enumerate(skf.split(df, df['price_range'])):
#    print(i, len(idx_train), len(idx_val))

In [7]:
# Experiment-level settings: time budget, parallelism and the metric to optimize
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,  # same value as max_nodes of the cluster
    "primary_metric": 'accuracy'
}
project_folder = './automl-mobile-udacity'
automl_config = AutoMLConfig(compute_target=cpu_cluster,
                             task="classification",
                             training_data=dataset,
                             label_column_name="price_range",
                             path=project_folder,
                             n_cross_validations=5,
                             enable_early_stopping=True,
                             featurization='auto',
                             debug_log="automl_errors.log",
                             **automl_settings)
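The `n_cross_validations=5` setting asks AutoML to evaluate each candidate model on five train/validation splits. How AutoML builds its folds internally is not shown in this notebook, so the following is only a sketch of the general idea of a stratified split, written with the standard library and toy labels. The function names are hypothetical, and the round-robin assignment is a simplification rather than what AutoML or scikit-learn do.

```python
from collections import defaultdict

def stratified_folds(labels, n_splits=5):
    """Assign each sample index to a fold so that classes are spread evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    # Deal the indices of each class round-robin across the folds.
    for label in sorted(by_class):
        for position, idx in enumerate(by_class[label]):
            folds[position % n_splits].append(idx)
    return folds

def train_validation_splits(labels, n_splits=5):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, n_splits)
    for k, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield sorted(train), sorted(validation)

# Toy labels: 4 classes with 10 samples each (illustrative, not the real data).
toy_labels = [i % 4 for i in range(40)]
for train_idx, val_idx in train_validation_splits(toy_labels):
    print(len(train_idx), len(val_idx))
```

Each validation fold in this toy run holds two samples of every class. For real use, the `StratifiedKFold` class imported at the top of the notebook offers the same idea with shuffling and a random seed.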

In [8]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [9]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [10]:
remote_run.wait_for_completion(show_output=True)


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

******************************************************************

{'runId': 'AutoML_1ef788bd-c529-476a-b087-01fb3a2ac202',
 'target': 'automl-mobiles',
 'status': 'Completed',
 'startTimeUtc': '2021-03-20T16:42:16.907586Z',
 'endTimeUtc': '2021-03-20T17:09:53.941108Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'automl-mobiles',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"f36cfd6e-1cb7-490f-abfc-4f6453a14881\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/03-20-2021_042907_UTC/train.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-140972\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"a0a76bad-11a1-4a2d-9887-97a29122c8e

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [11]:
data_store = ws.get_default_datastore()
data_store.upload(src_dir='.',target_path=project_folder)

Uploading an estimated of 10 files
Uploading ./.amlignore
Uploaded ./.amlignore, 1 files out of an estimated total of 10
Uploading ./.amlignore.amltmp
Uploaded ./.amlignore.amltmp, 2 files out of an estimated total of 10
Uploading ./automl.ipynb
Uploaded ./automl.ipynb, 3 files out of an estimated total of 10
Uploading ./automl.ipynb.amltmp
Uploaded ./automl.ipynb.amltmp, 4 files out of an estimated total of 10
Uploading ./automl.log
Uploaded ./automl.log, 5 files out of an estimated total of 10
Uploading ./automl_errors.log
Uploaded ./automl_errors.log, 6 files out of an estimated total of 10
Uploading ./azureml_automl.log
Uploaded ./azureml_automl.log, 7 files out of an estimated total of 10
Uploading ./hyperparameter_tuning.ipynb
Uploaded ./hyperparameter_tuning.ipynb, 8 files out of an estimated total of 10
Uploading ./.ipynb_aml_checkpoints/automl-checkpoint2021-2-20-16-26-41.ipynb
Uploaded ./.ipynb_aml_checkpoints/automl-checkpoint2021-2-20-16-26-41.ipynb, 9 files out of an estim

$AZUREML_DATAREFERENCE_47b1e8f29796492182824c7303cb788c

In [36]:
# Retrieve the best AutoML run and register its model

best_automl_run = remote_run.get_best_child()
best_automl_run_metrics = best_automl_run.get_metrics()

print('Best Run Id: ', best_automl_run.id)
print('\n Accuracy: ', best_automl_run_metrics['accuracy'])

# Save model
print('\n SAVE MODEL...')
final_automl_model = best_automl_run.register_model(model_name = 'automl-mobile', model_path = '/outputs/model.pkl', description='Best Model AutoML for mobile classification dataset')
print('\n SAVE MODEL...')

Best Run Id:  AutoML_1ef788bd-c529-476a-b087-01fb3a2ac202_44

 Accuracy:  0.9440000000000002

 SAVE MODEL...

 SAVE MODEL...
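For reference, accuracy on a multi-class problem is the share of samples whose predicted class equals the true class, which is a reasonable metric here because the classes are balanced. A minimal sketch with made-up labels follows. The helper name `accuracy` is illustrative, and scikit-learn's `accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration only: 6 of the 8 predictions are correct.
y_true = [0, 1, 2, 3, 3, 2, 1, 0]
y_pred = [0, 1, 2, 3, 2, 2, 1, 1]
print(accuracy(y_true, y_pred))  # 0.75
```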


In [13]:
best_automl_run.get_details()

{'runId': 'AutoML_1ef788bd-c529-476a-b087-01fb3a2ac202_44',
 'target': 'automl-mobiles',
 'status': 'Completed',
 'startTimeUtc': '2021-03-20T17:08:01.489029Z',
 'endTimeUtc': '2021-03-20T17:09:25.413054Z',
 'properties': {'runTemplate': 'automl_child',
  'pipeline_id': '__AutoML_Ensemble__',
  'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'automl-mobile\',\'compute_target\':\'automl-mobiles\',\'subscription_id\':\'a0a76bad-11a1-4a2d-9887-97a29122c8ed\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_1ef788bd-c529-476a-b087-01fb3a2ac202_44","experiment_name":"automl-mobile","workspace_name":"quick-starts-ws-140972","subscription_id":"a0a76bad-11a1-4a2d-98

## Model Deployment

Remember you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [38]:
model

Model(workspace=Workspace.create(name='quick-starts-ws-140972', subscription_id='a0a76bad-11a1-4a2d-9887-97a29122c8ed', resource_group='aml-quickstarts-140972'), name=automl-mobile, id=automl-mobile:2, version=2, tags={}, properties={})

In [40]:
import os
print(os.getenv('AZUREML_MODEL_DIR'))

None


In [42]:
%%time
from azureml.core import Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# ws = Workspace.from_config()
# Look up the registered model by name
model = Model(ws, 'automl-mobile')
# Entry script that handles scoring requests; no custom environment is passed
inference_config = InferenceConfig(entry_script="scoring_file_v_1_0_0.py")#, environment=myenv)

# Small Azure Container Instance: 1 CPU core and 1 GB of memory
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               description='Predict mobile prices')
service = Model.deploy(workspace=ws,
                       name='automl-mobile-sdk-2',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-03-20 19:09:44+00:00 Creating Container Registry if not exists.
2021-03-20 19:09:45+00:00 Use the existing image.
2021-03-20 19:09:45+00:00 Generating deployment configuration.
2021-03-20 19:09:49+00:00 Submitting deployment to compute..
2021-03-20 19:09:51+00:00 Checking the status of deployment automl-mobile-sdk-2..
2021-03-20 19:13:54+00:00 Checking the status of inference endpoint automl-mobile-sdk-2.
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 2.74 s, sys: 122 ms, total: 2.86 s
Wall time: 4min 33s
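The deployment relies on `scoring_file_v_1_0_0.py`, which is not shown in this notebook. Azure ML entry scripts expose two functions, `init()` and `run(raw_data)`. The sketch below only illustrates that contract so it can run without Azure ML. `_StubModel` and its RAM threshold are invented placeholders, not the registered model or the actual scoring file.

```python
import json

model = None  # populated by init()

class _StubModel:
    """Stand-in for the real model so the sketch runs without Azure ML."""
    def predict(self, rows):
        # Hypothetical rule for illustration: more than 3000 MB of RAM
        # maps to class 3, everything else to class 0.
        return [3 if row["ram"] > 3000 else 0 for row in rows]

def init():
    """Called once when the service starts; loads the model into memory."""
    global model
    # A real entry script would deserialize the registered model here.
    model = _StubModel()

def run(raw_data):
    """Called per request; takes the JSON request body, returns predictions."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": list(model.predict(rows))})
    except Exception as exc:
        # Return the error as JSON so the caller can see what went wrong.
        return json.dumps({"error": str(exc)})

init()
print(run(json.dumps({"data": [{"ram": 3476}, {"ram": 2396}]})))
```

The request body has the same `{"data": [...]}` shape that the test cell further down sends to the service.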


TODO: In the cell below, send a request to the web service you deployed to test it.

In [24]:
import pandas as pd
df_test = pd.read_csv('test.csv')
df_test.drop(columns='id', inplace=True)
df_test.head()

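| | battery_power | blue | clock_speed | dual_sim | fc | four_g | int_memory | m_dep | mobile_wt | n_cores | pc | px_height | px_width | ram | sc_h | sc_w | talk_time | three_g | touch_screen | wifi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1043 | 1 | 1.8 | 1 | 14 | 0 | 5 | 0.1 | 193 | 3 | 16 | 226 | 1412 | 3476 | 12 | 7 | 2 | 0 | 1 | 0 |
| 1 | 841 | 1 | 0.5 | 1 | 4 | 1 | 61 | 0.8 | 191 | 5 | 12 | 746 | 857 | 3895 | 6 | 0 | 7 | 1 | 0 | 0 |
| 2 | 1807 | 1 | 2.8 | 0 | 1 | 0 | 27 | 0.9 | 186 | 3 | 4 | 1270 | 1366 | 2396 | 17 | 10 | 10 | 0 | 1 | 1 |
| 3 | 1546 | 0 | 0.5 | 1 | 18 | 1 | 25 | 0.5 | 96 | 8 | 20 | 295 | 1752 | 3893 | 10 | 0 | 7 | 1 | 1 | 0 |
| 4 | 1434 | 0 | 1.4 | 0 | 11 | 1 | 49 | 0.5 | 108 | 6 | 18 | 749 | 810 | 1773 | 15 | 8 | 7 | 1 | 0 | 1 |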


In [45]:
sample = list(df_test.iloc[0:10, :].to_dict('index').values())
sample

[{'battery_power': 1043,
  'blue': 1,
  'clock_speed': 1.8,
  'dual_sim': 1,
  'fc': 14,
  'four_g': 0,
  'int_memory': 5,
  'm_dep': 0.1,
  'mobile_wt': 193,
  'n_cores': 3,
  'pc': 16,
  'px_height': 226,
  'px_width': 1412,
  'ram': 3476,
  'sc_h': 12,
  'sc_w': 7,
  'talk_time': 2,
  'three_g': 0,
  'touch_screen': 1,
  'wifi': 0},
 {'battery_power': 841,
  'blue': 1,
  'clock_speed': 0.5,
  'dual_sim': 1,
  'fc': 4,
  'four_g': 1,
  'int_memory': 61,
  'm_dep': 0.8,
  'mobile_wt': 191,
  'n_cores': 5,
  'pc': 12,
  'px_height': 746,
  'px_width': 857,
  'ram': 3895,
  'sc_h': 6,
  'sc_w': 0,
  'talk_time': 7,
  'three_g': 1,
  'touch_screen': 0,
  'wifi': 0},
 {'battery_power': 1807,
  'blue': 1,
  'clock_speed': 2.8,
  'dual_sim': 0,
  'fc': 1,
  'four_g': 0,
  'int_memory': 27,
  'm_dep': 0.9,
  'mobile_wt': 186,
  'n_cores': 3,
  'pc': 4,
  'px_height': 1270,
  'px_width': 1366,
  'ram': 2396,
  'sc_h': 17,
  'sc_w': 10,
  'talk_time': 10,
  'three_g': 0,
  'touch_screen': 1,
 

In [46]:
import requests
import json

# URL for the web service, should be similar to:
# 'http://8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io/score'
scoring_uri = 'http://8973fcc6-b528-4f64-87ec-8cc7c8384ddc.southcentralus.azurecontainer.io/score'
# If the service is authenticated, set the key or token
# key = 'zcdL9IVlIn5Gb6yCEAZ0NrBapBkOQvbw'

# Ten records to score, so we get ten results back
data = {"data": sample}

# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
# headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())

{"result": [3, 3, 2, 3, 1, 3, 3, 1, 3, 0]}
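The response holds one predicted `price_range` per submitted record. The double quotes in the printed output suggest that `resp.json()` may have returned a JSON string rather than a dict, so the sketch below accepts both forms. `parse_predictions`, `describe` and `PRICE_LABELS` are hypothetical names, and the label texts come from the feature list earlier in this notebook.

```python
import json

PRICE_LABELS = {0: "low cost", 1: "medium cost", 2: "high cost", 3: "very high cost"}

def parse_predictions(body):
    """Return the list of predicted classes from a scoring response.

    Accepts either a dict or a JSON string, since the service may return
    its result JSON-encoded a second time.
    """
    if isinstance(body, str):
        body = json.loads(body)
    if "result" not in body:
        raise ValueError(f"unexpected response: {body}")
    return [int(p) for p in body["result"]]

def describe(predictions):
    """Map numeric price ranges to the labels from the dataset description."""
    return [PRICE_LABELS[p] for p in predictions]

# Response text copied from the cell output above.
response_body = '{"result": [3, 3, 2, 3, 1, 3, 3, 1, 3, 0]}'
predictions = parse_predictions(response_body)
print(describe(predictions)[:3])
```

In the notebook itself this could be called as `parse_predictions(resp.json())`.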


TODO: In the cell below, print the logs of the web service and delete the service

In [48]:
print(service.get_logs())

2021-03-20T19:13:50,926065000+00:00 - iot-server/run 
2021-03-20T19:13:50,933904600+00:00 - gunicorn/run 
2021-03-20T19:13:50,936763000+00:00 - rsyslog/run 
2021-03-20T19:13:50,970808000+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_661474bbe74e96b5d8added5888dfc85/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [49]:
cpu_cluster.delete()
service.delete()