# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core import Workspace, Experiment, Dataset, Model, Environment
from azureml.train.automl import AutoMLConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


from sklearn.model_selection import train_test_split
import pandas as pd
import os
import json
import joblib
import sklearn

## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.

### Kaggle - Housing Prices Competition for Kaggle Learn Users
We will be using the "Housing Prices Competition for Kaggle Learn Users" training and test datasets for this capstone project.

This is a regression competition in which competitors try to predict the price of the houses in the test dataset using the training dataset.

The original dataset was first published by Dean De Cock in his paper [Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project](https://www.researchgate.net/publication/267976209_Ames_Iowa_Alternative_to_the_Boston_Housing_Data_as_an_End_of_Semester_Regression_Project) in the Journal of Statistics Education (November 2011).

For competition purposes, nearly all of the data has been divided into two parts: a "training dataset" and a "test dataset". We will use the training dataset for training and the test dataset for submission to the competition.

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl_experiment'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code F569PC9L5 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
Workspace name: quick-starts-ws-137659
Azure region: southcentralus
Subscription id: 6b4af8be-9931-443e-90f6-c4c34a1f9737
Resource group: aml-quickstarts-137659


In [3]:
# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute(class)?view=azure-ml-py#provisioning-configuration-vm-size-----vm-priority--dedicated---min-nodes-0--max-nodes-none--idle-seconds-before-scaledown-none--admin-username-none--admin-user-password-none--admin-user-ssh-key-none--vnet-resourcegroup-name-none--vnet-name-none--subnet-name-none--tags-none--description-none--remote-login-port-public-access--notspecified--

# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
# https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/ml-frameworks/scikit-learn/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb


# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                              max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [67]:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

#global X_train, X_test, y_train, y_test

def clean_data(data, test):

    # Convert dataset to pandas dataframe
    X = data.to_pandas_dataframe()
    X_test = test.to_pandas_dataframe()
    # Set Id to index
    X.set_index('Id',inplace=True)
    X_test.set_index('Id',inplace=True)
    # Remove rows with missing target, separate target from predictors
    X.dropna(axis=0, subset=['SalePrice'], inplace=True)
    y = X.SalePrice 
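    # Remove target and 'Utilities'
    X.drop(['SalePrice', 'Utilities'], axis=1, inplace=True)
    # errors='ignore' lets this also work for a test set that has no 'SalePrice' column
    X_test.drop(['SalePrice', 'Utilities'], axis=1, inplace=True, errors='ignore')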
    # Split the data
    X_train, X_valid, y_train, y_valid = train_test_split(X,y)
    # Select object columns
    categorical_cols = [cname for cname in X_train.columns if X_train[cname].dtype == "object"]
    # Select numeric columns
    numerical_cols = [cname for cname in X_train.columns if X_train[cname].dtype in ['int64','float64']]

    # Imputation lists
    # numerical columns whose missing values are filled with a constant
    constant_num_cols = ['GarageYrBlt', 'MasVnrArea']
    # numerical columns whose missing values are filled with the column mean
    mean_num_cols = list(set(numerical_cols).difference(set(constant_num_cols)))
    # categorical columns whose missing values are filled with a constant
    constant_categorical_cols = ['Alley', 'MasVnrType', 'BsmtQual', 'BsmtCond','BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']
    # categorical columns whose missing values are filled with the most frequent value
    mf_categorical_cols = list(set(categorical_cols).difference(set(constant_categorical_cols)))

    my_cols = constant_num_cols + mean_num_cols + constant_categorical_cols + mf_categorical_cols

    # Define transformers
    # Preprocessing for numerical data - mean
    numerical_transformer_m = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')),('scaler', StandardScaler())])
    # Preprocessing for numerical data - constant
    numerical_transformer_c = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value=0)),('scaler', StandardScaler())])

    # Preprocessing for categorical data for most frequent
    categorical_transformer_mf = Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')), ('onehot', OneHotEncoder(handle_unknown = 'ignore', sparse = False))])
    # Preprocessing for categorical data for constant
    categorical_transformer_c = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value='NA')), ('onehot', OneHotEncoder(handle_unknown = 'ignore', sparse = False))])

    # Bundle preprocessing for numerical and categorical data
    preprocessor = ColumnTransformer(transformers=[
        ('num_mean', numerical_transformer_m, mean_num_cols),
        ('num_constant', numerical_transformer_c, constant_num_cols),
        ('cat_mf', categorical_transformer_mf, mf_categorical_cols),
        ('cat_c', categorical_transformer_c, constant_categorical_cols)])

    # Transform data
    X_train = preprocessor.fit_transform(X_train)
    X_valid = preprocessor.transform(X_valid)
    X_test = preprocessor.transform(X_test)
    
    
    # Concat datasets
    # https://stackoverflow.com/questions/41989950/numpy-array-concatenate-valueerror-all-the-input-arrays-must-have-same-number
    # Convert the target Series to NumPy before adding an axis
    train_data = np.concatenate([X_train, y_train.to_numpy()[:, None]], axis=1)
    valid_data = np.concatenate([X_valid, y_valid.to_numpy()[:, None]], axis=1)
    
    
    # Return data
    return train_data, valid_data, X_test
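
`clean_data` builds `mean_num_cols` and `mf_categorical_cols` with `set(...).difference(...)`. Sets do not preserve order, and for strings the iteration order can differ between Python processes, so the column order of the transformed arrays may change from one kernel session to the next. A minimal sketch of an order-preserving alternative follows. `partition_columns` is a hypothetical helper, not part of the notebook, and the short column list is only a sample of Ames column names.

```python
def partition_columns(columns, constant_cols):
    """Split `columns` into (constant-imputed, other), keeping the input order."""
    constant_set = set(constant_cols)
    constant = [c for c in columns if c in constant_set]
    other = [c for c in columns if c not in constant_set]
    return constant, other


# Sample of numeric column names (assumed subset, for illustration only)
numerical_cols = ['LotFrontage', 'LotArea', 'MasVnrArea', 'GarageYrBlt', 'YrSold']

constant, mean = partition_columns(numerical_cols, ['GarageYrBlt', 'MasVnrArea'])
print(constant)  # ['MasVnrArea', 'GarageYrBlt']
print(mean)      # ['LotFrontage', 'LotArea', 'YrSold']
```

Because both lists follow the order of the input columns, the position of each feature in the preprocessed array stays the same across sessions.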

In [68]:
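# Get the datasets
# NOTE: both calls use the same registered name, so the "test" frame is a copy
# of the training data (which is why it has 1460 rows in the output below)
ds_train = Dataset.get_by_name(ws, name='Housing Prices Dataset')
ds_test = Dataset.get_by_name(ws, name='Housing Prices Dataset')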

# Use the clean_data function to clean your data.
train_data, valid_data, test_data = clean_data(ds_train, ds_test)
print (train_data.shape)
print (valid_data.shape)
print(test_data.shape)

(1095, 399)
(365, 399)
(1460, 398)
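
The first two shapes follow from the default `test_size` of 0.25 in `train_test_split`, applied to the 1460 training rows. The sketch below reproduces that arithmetic in plain Python. `default_split_sizes` is a hypothetical helper, and it assumes scikit-learn's rule of rounding the test share up and giving the remainder to the training set.

```python
import math


def default_split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(default_split_sizes(1460))  # (1095, 365)
```

Since `clean_data` calls `train_test_split(X, y)` without a `random_state`, the rows that land in each part differ from run to run. Passing a fixed `random_state` would make the split repeatable.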


In [71]:
# AutoMLConfig requires a TabularDataset, so we write the cleaned arrays to CSV
# files and create tabular datasets from them
# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets#create-a-filedataset
print(type(train_data))

# create data folder if it does not exist
os.makedirs("data", exist_ok=True)

# convert train array to a dataframe with integer column names
# https://stackoverflow.com/questions/11106536/adding-row-column-headers-to-numpy-arrays
number_of_columns = train_data.shape[1]
names = list(range(number_of_columns))
test_ds_names = list(range(number_of_columns - 1))
train_path = 'data/train_cleaned.csv'
cleaned_train_data = pd.DataFrame(train_data, columns=names)
cleaned_train_data.to_csv(train_path, index=False, header=True, sep=',')

# convert valid dataframe
valid_path = 'data/valid_cleaned.csv'
cleaned_valid_data = pd.DataFrame(valid_data, columns=names)
cleaned_valid_data.to_csv(valid_path, index=False, header=True, sep=',')

# convert test dataframe
test_path = 'data/test_cleaned.csv'
cleaned_test_data = pd.DataFrame(test_data, columns=test_ds_names)
cleaned_test_data.to_csv(test_path, index=False, header=True, sep=',')

# get the datastore to upload prepared data
datastore = ws.get_default_datastore()

# upload the local file from src_dir to the target_path in datastore
datastore.upload(src_dir='data', target_path='data')

# create a dataset referencing the cloud location
train_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, ('data/train_cleaned.csv'))])

# create a dataset referencing the cloud location
valid_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, ('data/valid_cleaned.csv'))])

# create a dataset referencing the cloud location
test_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, ('data/test_cleaned.csv'))])

<class 'numpy.ndarray'>
Uploading an estimated of 8 files
Target already exists. Skipping upload for data/.gitkeep
Target already exists. Skipping upload for data/data_description.txt
Target already exists. Skipping upload for data/sample_submission.csv
Target already exists. Skipping upload for data/test.csv
Target already exists. Skipping upload for data/train.csv
Target already exists. Skipping upload for data/train_cleaned.csv
Target already exists. Skipping upload for data/valid_cleaned.csv
Uploading data/test_cleaned.csv
Uploaded data/test_cleaned.csv, 1 files out of an estimated total of 4
Uploaded 1 files
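
The headers written to the CSV files are the integers 0 to n-1, but a CSV file stores them as text, so they come back as the strings `"0"`, `"1"`, and so on. That is why the label column is later referred to by a string name. The sketch below shows the round trip with the standard library only. The three-column row is a stand-in, not the real data.

```python
import csv
import io

# Stand-in for one cleaned row: two features followed by the label
rows = [[-0.08, 0.13, 187000.0]]
names = list(range(3))  # integer headers, as in the notebook

# Write the header and the row to an in-memory CSV
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(names)
writer.writerows(rows)

# Read the header back: every name is now a string
buffer.seek(0)
header = next(csv.reader(buffer))
print(header)      # ['0', '1', '2']
print(header[-1])  # '2' is the label column name in this stand-in
```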


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [11]:
print(valid_dataset.to_pandas_dataframe().head(1))

          0         1        2         3         4         5         6  \
0 -0.085953 -0.363545  0.13545 -0.279699  1.251437 -0.094764 -0.129057   

          7         8         9  ...  390  391  392  393  394  395  396  397  \
0  0.607661 -1.162877 -0.121786  ...  0.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0   

   398       399  
0  0.0  187000.0  

[1 rows x 400 columns]


In [12]:
project_folder = './'
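# TODO: Put your automl settings here
# NOTE: this dictionary is defined but not passed to AutoMLConfig below,
# so these options do not affect the submitted run.
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_cores_per_iteration": -1,
    "max_concurrent_iterations": 4,  # matches max_nodes=4 on the cluster
    "n_cross_validations": 5,
    "enable_early_stopping": True,
}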

# TODO: Put your automl config here
automl_config = AutoMLConfig(compute_target = ws.compute_targets['cpu-cluster'],
                             task = "regression",
                             primary_metric = 'normalized_root_mean_squared_error',
                             training_data=train_dataset,
                             validation_data=valid_dataset,
                             label_column_name="399",   
                             path = project_folder
                            )
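
The primary metric chosen above, `normalized_root_mean_squared_error`, is, as we understand the Azure ML metric documentation, the RMSE divided by the range of the target values, so lower is better and scores are comparable across targets of different scale. A minimal sketch in plain Python follows. The function name and the three prices are made up for illustration.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return rmse / (max(y_true) - min(y_true))


# Made-up sale prices and predictions, each off by 10,000
y_true = [100000.0, 200000.0, 300000.0]
y_pred = [110000.0, 190000.0, 310000.0]

print(normalized_rmse(y_true, y_pred))  # 0.05
```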

In [13]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [14]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)
assert(remote_run.get_status()=="Completed")

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…


Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more about high cardinality feature handling: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

*********************************

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [15]:
# Retrieve and save your best automl model.

# https://github.com/MicrosoftLearning/DP100/blob/master/08B%20-%20Using%20Automated%20Machine%20Learning.ipynb
# Get the best run object
best_run, fitted_model = remote_run.get_output()
print("Summary:")
print(remote_run.summary())
print("********************\n")
print("Best run:")
print(best_run)
print("********************\n")
print("Estimator:")
print(fitted_model.steps[-1])
print("********************\n")
print("Model:")
print(fitted_model)
print("********************\n")
best_run_metrics = best_run.get_metrics()
print('MAE:', best_run_metrics['mean_absolute_error'])
print("********************\n")

for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


Summary:
[['StackEnsemble', 1, 0.036372330032146245], ['VotingEnsemble', 1, 0.031617227267149134], ['RandomForest', 4, 0.04300449959084138], ['ElasticNet', 4, 0.03783185834646587], ['XGBoostRegressor', 2, 0.03512004809294422], ['GradientBoosting', 1, 0.03686071239391687], ['DecisionTree', 16, 0.04878785859976609], ['LassoLars', 2, 0.0430270635549879], ['LightGBM', 1, 0.03703172420413354]]
********************

Best run:
Run(Experiment: automl_experiment,
Id: AutoML_2f5e6322-44fd-45be-9353-0662a207e870_30,
Type: azureml.scriptrun,
Status: Completed)
********************

Estimator:
('prefittedsoftvotingregressor', PreFittedSoftVotingRegressor(estimators=[('1',
                                          Pipeline(memory=None,
                                                   steps=[('maxabsscaler',
                                                           MaxAbsScaler(copy=True)),
                                                          ('xgboostregressor',
                             

In [16]:
#TODO: Save the best model
# https://knowledge.udacity.com/questions/357007
os.makedirs('outputs', exist_ok=True)
joblib.dump(fitted_model, 'automl_model.pkl')


['automl_model.pkl']
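
The list printed by `remote_run.summary()` can be ranked directly. The sketch below copies the rows from the output above and sorts them by score. It assumes each row reads as [algorithm, number of iterations, best score] and that a lower normalized RMSE is better.

```python
# Rows copied from the remote_run.summary() output
summary = [
    ['StackEnsemble', 1, 0.036372330032146245],
    ['VotingEnsemble', 1, 0.031617227267149134],
    ['RandomForest', 4, 0.04300449959084138],
    ['ElasticNet', 4, 0.03783185834646587],
    ['XGBoostRegressor', 2, 0.03512004809294422],
    ['GradientBoosting', 1, 0.03686071239391687],
    ['DecisionTree', 16, 0.04878785859976609],
    ['LassoLars', 2, 0.0430270635549879],
    ['LightGBM', 1, 0.03703172420413354],
]

# Sort ascending by score (lower is better for an error metric)
ranked = sorted(summary, key=lambda row: row[2])

for name, iterations, score in ranked:
    print(f"{name:<18}{iterations:>3}  {score:.4f}")

best_name = ranked[0][0]
print("Best:", best_name)  # Best: VotingEnsemble
```

The top entry agrees with the estimator printed for the best run, a `PreFittedSoftVotingRegressor`.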

## Model Deployment

Remember you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [17]:
# Register model
model = Model.register(workspace=ws, model_path='automl_model.pkl', model_name='best_automl_run')
# Check registered models (separate loop variable so `model` is not overwritten)
for registered_model in Model.list(ws):
    print("Model Name: {}\n".format(registered_model.name))
    print(registered_model)
    print("********************\n")

Registering model best_automl_run
Model Name: best_automl_run

Model(workspace=Workspace.create(name='quick-starts-ws-137659', subscription_id='6b4af8be-9931-443e-90f6-c4c34a1f9737', resource_group='aml-quickstarts-137659'), name=best_automl_run, id=best_automl_run:1, version=1, tags={}, properties={})
********************



In [66]:
%%writefile conda_dependencies.yml

dependencies:
- python=3.6.2
- scikit-learn
- joblib
- numpy
- pip:
  - azureml-defaults
  - inference-schema[numpy-support]
  - azureml-train-automl
  - xgboost

Overwriting conda_dependencies.yml
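
The version messages printed when the best model was retrieved report training packages at 1.21.0 while the notebook kernel has 1.20.0. One way to reduce the chance of a similar mismatch in the deployed container is to pin the pip packages to the training versions. The sketch below parses a few of those messages into pin strings. `pins_from_warnings` is a hypothetical helper, and only three of the printed lines are copied here.

```python
import re

# Three of the version messages printed earlier in the notebook
WARNING_LINES = """\
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0
"""

PATTERN = re.compile(
    r"Package:(?P<name>[^,]+), "
    r"training version:(?P<train>[^,]+), "
    r"current version:(?P<current>\S+)"
)


def pins_from_warnings(text):
    """Return 'name==training_version' for every package whose versions differ."""
    pins = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match and match.group("train") != match.group("current"):
            pins.append(f"{match.group('name')}=={match.group('train')}")
    return pins


for pin in pins_from_warnings(WARNING_LINES):
    print("  - " + pin)
```

The printed lines can be pasted under the `pip:` entry of a conda specification.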


In [26]:
env = Environment.get(workspace=ws, name="AzureML-AutoML")

In [27]:
print("packages", env.python.conda_dependencies.serialize_to_string())

packages channels:
- anaconda
- conda-forge
- pytorch
dependencies:
- python=3.6.2
- pip=20.2.4
- pip:
  - azureml-core==1.21.0.post1
  - azureml-pipeline-core==1.21.0
  - azureml-telemetry==1.21.0
  - azureml-defaults==1.21.0
  - azureml-interpret==1.21.0
  - azureml-automl-core==1.21.0
  - azureml-automl-runtime==1.21.0
  - azureml-train-automl-client==1.21.0
  - azureml-train-automl-runtime==1.21.0.post1
  - azureml-dataset-runtime==1.21.0
  - inference-schema
  - py-cpuinfo==5.0.0
  - boto3==1.15.18
  - botocore==1.18.18
- numpy~=1.18.0
- scikit-learn==0.22.1
- pandas~=0.25.0
- py-xgboost<=0.90
- fbprophet==0.5
- holidays==0.9.11
- setuptools-git
- psutil>5.0.0,<6.0.0
name: azureml_7ade26eb614f97df8030bc480da59236



In [25]:
from azureml.core import Environment

my_env = Environment.from_conda_specification(name = 'my-env', file_path = './my-env.yml')

In [28]:
with open('score.py') as f:
    print(f.read())

import joblib
import numpy as np
import os

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


# The init() method is called once, when the web service starts up.
#
# Typically you would deserialize the model file, as shown here using joblib,
# and store it in a global variable so your run() method can access it later.
def init():
    global model

    # The AZUREML_MODEL_DIR environment variable indicates
    # a directory containing the model file you registered.
    model_filename = 'automl_model.pkl'
    model_path = os.path.join(os.environ['AZUREML_MODEL_DIR'], model_filename)

    model = joblib.load(model_path)


# The run() method is called each time a request is made to the scoring API.
#
# Shown here are the optional input_schema and output_schema decorators
# from the inference-schema pip package. Using these decorators on your
# run() method parses and validates th

In [48]:
service_name = 'my-automl-service'
#my_model = Model(ws, 'best_automl_run', version=1)
my_model = Model(ws, 'best_automl_run')
inference_config = InferenceConfig(entry_script='score.py', environment=my_env)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(workspace=ws,
                       name=service_name,
                       models=[my_model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running......................................
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [49]:
# Enable application insights
service.update(enable_app_insights=True)

TODO: In the cell below, send a request to the web service you deployed to test it.

In [56]:
import json

test_sample = json.dumps({'data': [
    [-0.0859528613529813,-0.36354521171325305,-0.09848942592492069,-0.2796986280674258,-0.7578351616129093,1.0756335865128208,-0.12905680819293966,1.339259538474412,-0.26835014340179053,3.933471146038076,-1.0033203374478632,-0.07333340857320904,-1.2623323347145659,-1.554030619560426,0.2925864878985621,-0.7682554384315856,-1.0374764641885903,-0.2095969950992871,-2.297476708819124,-0.5060972419158338,-0.7947788299373657,-0.26093173135248043,0.5539305114736647,1.2487482387683566,1.0417709664249128,-0.45567727750211645,-0.7955454757135164,-0.24546682664767563,0.31229303496618394,1.1188800229146632,-0.07919625766046033,-1.3559196129494377,2.08580400027604,0.22052234535891524,-0.5768383662665258,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0
,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0],
    [0.6325080789639239,-0.36354521171325305,-0.09848942592492069,-0.2796986280674258,1.2514372402421408,-0.09476368905402521,-0.12905680819293966,0.24186178744212383,0.2247306177125543,-0.12178590729178182,-0.6715229655518978,-0.07333340857320904,1.3339481043618608,-0.3231709245017159,0.2925864878985621,-0.7682554384315856,-1.0374764641885903,-0.2095969950992871,0.16772367937974014,-0.5060972419158338,-0.7947788299373657,-0.26093173135248043,-0.886244977399298,-0.6984241198286087,-0.9614929599695236,-0.013297559699212676,-0.7955454757135164,-0.24546682664767563,0.2553201296340031,-0.8258400169132036,-0.07919625766046033,-1.3559196129494377,-0.9655725933141587,0.22052234535891524,0.33377147359196874,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0]
]})

#test_sample = bytes(test_sample, encoding='utf8')

prediction = service.run(input_data=test_sample)
print(prediction)

ERROR:azureml.core.webservice.aci:Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 502
Headers: {'Connection': 'keep-alive', 'Content-Length': '481', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 06 Feb 2021 21:32:07 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': 'df372b1c-09c0-4403-8eb3-c0bdf4bcac1f', 'X-Ms-Run-Function-Failed': 'True'}
Content: b'DataException:\n\tMessage: Expected column(s) 0 not found in fitted data.\n\tInnerException: None\n\tErrorResponse \n{\n    "error": {\n        "code": "UserError",\n        "message": "Expected column(s) 0 not found in fitted data.",\n        "target": "X",\n        "inner_error": {\n            "code": "BadArgument",\n            "inner_error": {\n                "code": "MissingColumnsInData"\n            }\n        },\n        "reference_code": "17049f70-3bbe-4060-a63f-f06590e784e5"\n    }\n}'



WebserviceException: WebserviceException:
	Message: Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 502
Headers: {'Connection': 'keep-alive', 'Content-Length': '481', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 06 Feb 2021 21:32:07 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': 'df372b1c-09c0-4403-8eb3-c0bdf4bcac1f', 'X-Ms-Run-Function-Failed': 'True'}
Content: b'DataException:\n\tMessage: Expected column(s) 0 not found in fitted data.\n\tInnerException: None\n\tErrorResponse \n{\n    "error": {\n        "code": "UserError",\n        "message": "Expected column(s) 0 not found in fitted data.",\n        "target": "X",\n        "inner_error": {\n            "code": "BadArgument",\n            "inner_error": {\n                "code": "MissingColumnsInData"\n            }\n        },\n        "reference_code": "17049f70-3bbe-4060-a63f-f06590e784e5"\n    }\n}'
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.\nResponse Code: 502\nHeaders: {'Connection': 'keep-alive', 'Content-Length': '481', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 06 Feb 2021 21:32:07 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': 'df372b1c-09c0-4403-8eb3-c0bdf4bcac1f', 'X-Ms-Run-Function-Failed': 'True'}\nContent: b'DataException:\\n\\tMessage: Expected column(s) 0 not found in fitted data.\\n\\tInnerException: None\\n\\tErrorResponse \\n{\\n    \"error\": {\\n        \"code\": \"UserError\",\\n        \"message\": \"Expected column(s) 0 not found in fitted data.\",\\n        \"target\": \"X\",\\n        \"inner_error\": {\\n            \"code\": \"BadArgument\",\\n            \"inner_error\": {\\n                \"code\": \"MissingColumnsInData\"\\n            }\\n        },\\n        \"reference_code\": \"17049f70-3bbe-4060-a63f-f06590e784e5\"\\n    }\\n}'"
    }
}
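
The error message `Expected column(s) 0 not found in fitted data` suggests that the model expects named columns (the CSV headers `0`, `1`, ...) rather than bare lists of numbers. A possible payload shape is sketched below: one record per row, keyed by column name, with numeric values and without the label. `rows_to_payload` is a hypothetical helper, and whether the service accepts a list of records depends on how `score.py` parses its input, which is not fully shown in this notebook.

```python
import json


def rows_to_payload(rows, n_features):
    """Build a JSON body whose records map column names '0'..'n-1' to floats."""
    records = []
    for row in rows:
        if len(row) < n_features:
            raise ValueError("row has fewer values than n_features")
        # Keep only the feature columns; anything after them (the label) is dropped
        records.append({str(i): float(row[i]) for i in range(n_features)})
    return json.dumps({"data": records})


# Stand-in row: three features followed by a label value
payload = rows_to_payload([[0.5, -1.0, 0.0, 108959.0]], n_features=3)
print(payload)  # {"data": [{"0": 0.5, "1": -1.0, "2": 0.0}]}
```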

In [57]:
import requests

# Set the content type
headers = {'Content-Type': 'application/json'}

# Make the request and display the response
resp = requests.post(service.scoring_uri, test_sample, headers=headers)

print(resp)

<Response [502]>


In [79]:
np.array(cleaned_valid_data[0:1]).tolist()

[[-1.532903851757167,
  -0.36126774733521677,
  -0.09774430387870965,
  -0.2702325452608195,
  3.2369284190971674,
  0.23908882685494287,
  -0.1136169830059204,
  -1.9810882825541094,
  0.29046082985193455,
  -0.10617347647748634,
  -0.48396009291707687,
  -0.05966529825366463,
  -1.2915627647593246,
  -0.30200602885812705,
  0.3058665419333633,
  0.22248443124770598,
  -2.857264942084541,
  4.775595643115285,
  -3.5518323816393966,
  -1.4201917063238745,
  -0.7747431343415216,
  -0.28238898293092624,
  0.8214382742789895,
  -0.6965182266455099,
  1.6338762530088817,
  0.2307995471438437,
  -0.7760859852984908,
  -0.22388540000528934,
  -0.3511129842214174,
  3.0099982922274386,
  -0.09426384930206698,
  0.14520561677937774,
  -0.9547336020313503,
  0.2211725243858734,
  -0.5732053108022752,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
 

In [80]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge


dataset_x, dataset_y = load_diabetes(return_X_y=True)

In [95]:
dataset_x[0:1]

array([[ 0.03807591,  0.05068012,  0.06169621,  0.02187235, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990842, -0.01764613]])

In [105]:
# Avoid naming the variable `dict`, which would shadow the built-in type
record_dicts = cleaned_valid_data[0:1].to_dict(orient="records")

In [117]:
my_list= [-1.532903851757167,-0.36126774733521677,-0.09774430387870965,-0.2702325452608195,3.2369284190971674,0.23908882685494287,-0.1136169830059204,-1.9810882825541094,0.29046082985193455,-0.10617347647748634,-0.48396009291707687,-0.05966529825366463,-1.2915627647593246,-0.30200602885812705,0.3058665419333633,0.22248443124770598,-2.857264942084541,4.775595643115285,-3.5518323816393966,-1.4201917063238745,-0.7747431343415216,-0.28238898293092624,0.8214382742789895,-0.6965182266455099,1.6338762530088817,0.2307995471438437,-0.7760859852984908,-0.22388540000528934,-0.3511129842214174,3.0099982922274386,-0.09426384930206698,0.14520561677937774,-0.9547336020313503,0.2211725243858734,-0.5732053108022752,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,
1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,108959.0]
# Map each column index to its value; keys and values are both strings
my_data = {str(index): str(value) for index, value in enumerate(my_list)}
print(len(my_list))
print(my_data)

399
{'0': '-1.532903851757167', '1': '-0.36126774733521677', '2': '-0.09774430387870965', '3': '-0.2702325452608195', '4': '3.2369284190971674', '5': '0.23908882685494287', '6': '-0.1136169830059204', '7': '-1.9810882825541094', '8': '0.29046082985193455', '9': '-0.10617347647748634', '10': '-0.48396009291707687', '11': '-0.05966529825366463', '12': '-1.2915627647593246', '13': '-0.30200602885812705', '14': '0.3058665419333633', '15': '0.22248443124770598', '16': '-2.857264942084541', '17': '4.775595643115285', '18': '-3.5518323816393966', '19': '-1.4201917063238745', '20': '-0.7747431343415216', '21': '-0.28238898293092624', '22': '0.8214382742789895', '23': '-0.6965182266455099', '24': '1.6338762530088817', '25': '0.2307995471438437', '26': '-0.7760859852984908', '27': '-0.22388540000528934', '28': '-0.3511129842214174', '29': '3.0099982922274386', '30': '-0.09426384930206698', '31': '0.14520561677937774', '32': '-0.9547336020313503', '33': '0.2211725243858734', '34': '-0.57320531080

In [118]:
data = {"data": [my_data]}

In [119]:
data

{'data': [{'0': '-1.532903851757167',
   '1': '-0.36126774733521677',
   '2': '-0.09774430387870965',
   '3': '-0.2702325452608195',
   '4': '3.2369284190971674',
   '5': '0.23908882685494287',
   '6': '-0.1136169830059204',
   '7': '-1.9810882825541094',
   '8': '0.29046082985193455',
   '9': '-0.10617347647748634',
   '10': '-0.48396009291707687',
   '11': '-0.05966529825366463',
   '12': '-1.2915627647593246',
   '13': '-0.30200602885812705',
   '14': '0.3058665419333633',
   '15': '0.22248443124770598',
   '16': '-2.857264942084541',
   '17': '4.775595643115285',
   '18': '-3.5518323816393966',
   '19': '-1.4201917063238745',
   '20': '-0.7747431343415216',
   '21': '-0.28238898293092624',
   '22': '0.8214382742789895',
   '23': '-0.6965182266455099',
   '24': '1.6338762530088817',
   '25': '0.2307995471438437',
   '26': '-0.7760859852984908',
   '27': '-0.22388540000528934',
   '28': '-0.3511129842214174',
   '29': '3.0099982922274386',
   '30': '-0.09426384930206698',
   '31': '0

In [101]:
dict

<bound method DataFrame.to_dict of     0     1     2     3    4    5     6     7    8     9    ...  389  390  \
0 -1.53 -0.36 -0.10 -0.27 3.24 0.24 -0.11 -1.98 0.29 -0.11  ... 0.00 0.00   

   391  392  393  394  395  396  397       398  
0 0.00 1.00 0.00 1.00 0.00 0.00 0.00 108959.00  

[1 rows x 399 columns]>

In [120]:

import json

# Serialize the payload and send it to the deployed web service
input_payload = json.dumps(data)
output = service.run(input_payload)
print(output)

ERROR:azureml.core.webservice.aci:Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 502
Headers: {'Connection': 'keep-alive', 'Content-Length': '57', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 06 Feb 2021 23:32:06 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': '0c80803a-6030-44fb-bce3-3e4e3a16ee39', 'X-Ms-Run-Function-Failed': 'True'}
Content: b"float() argument must be a string or a number, not 'dict'"



WebserviceException: WebserviceException:
	Message: Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.
Response Code: 502
Headers: {'Connection': 'keep-alive', 'Content-Length': '57', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 06 Feb 2021 23:32:06 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': '0c80803a-6030-44fb-bce3-3e4e3a16ee39', 'X-Ms-Run-Function-Failed': 'True'}
Content: b"float() argument must be a string or a number, not 'dict'"
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Received bad response from service. More information can be found by calling `.get_logs()` on the webservice object.\nResponse Code: 502\nHeaders: {'Connection': 'keep-alive', 'Content-Length': '57', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 06 Feb 2021 23:32:06 GMT', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Ms-Request-Id': '0c80803a-6030-44fb-bce3-3e4e3a16ee39', 'X-Ms-Run-Function-Failed': 'True'}\nContent: b\"float() argument must be a string or a number, not 'dict'\""
    }
}
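The service rejects the request with `float() argument must be a string or a number, not 'dict'`, which suggests that each row in `data` reached a `float()` conversion as a dictionary. A minimal sketch of an alternative payload follows. It assumes the entry script (not shown in this notebook) expects each row to be a plain list of numbers; `build_payload` and `sample_row` are hypothetical names introduced for illustration. It may also be worth checking whether the last value, 108959.0, is a feature or the target column, since it is on a very different scale from the other values.

```python
import json


def build_payload(rows):
    """Serialize rows of numeric features as {"data": [[...], ...]}.

    Assumption: the scoring script accepts a list of numeric lists.
    """
    return json.dumps({"data": [[float(value) for value in row] for row in rows]})


# A shortened, illustrative row; the real row would hold every feature value.
sample_row = [-1.532903851757167, -0.36126774733521677, 0.0, 1.0]
payload = build_payload([sample_row])

# Converting a dictionary with float() fails in the same way the service did.
try:
    float({"0": "-1.532903851757167"})
except TypeError as exc:
    error_text = str(exc)
```

If that assumption holds, `service.run(payload)` would be the call to retry; the exact input format still has to be confirmed against the entry script.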

TODO: In the cell below, print the logs of the web service and delete the service

In [109]:
print(service.get_logs())

2021-02-06T21:15:12,425631597+00:00 - iot-server/run 
2021-02-06T21:15:12,426584313+00:00 - nginx/run 
2021-02-06T21:15:12,426673315+00:00 - gunicorn/run 
/usr/sbin/nginx: /azureml-envs/azureml_7ade26eb614f97df8030bc480da59236/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7ade26eb614f97df8030bc480da59236/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7ade26eb614f97df8030bc480da59236/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7ade26eb614f97df8030bc480da59236/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7ade26eb614f97df8030bc480da59236/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
2021-02-06T21:15:12,428706650+00:00 - rsyslog/run 
rsyslogd
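The container logs are long and mostly startup noise, so it can help to filter them for the lines that explain a failed request. The sketch below is one possible way to do that with a hypothetical helper, `find_error_lines`; the sample log text is illustrative and is not output captured from this service.

```python
def find_error_lines(log_text, keywords=("error", "exception", "traceback")):
    """Return (line_number, line) pairs whose text contains any keyword."""
    matches = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            matches.append((number, line))
    return matches


# Illustrative log text only; lines 2 and 3 are made up for this example.
sample_logs = "\n".join([
    "2021-02-06T21:15:12,426584313+00:00 - nginx/run",
    "Traceback (most recent call last):",
    "TypeError: float() argument must be a string or a number, not 'dict'",
])
hits = find_error_lines(sample_logs)
```

With a live service, the same helper could be applied to the string returned by `service.get_logs()`.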

In [47]:
# Delete the service
service.delete()

In [42]:
# Delete compute cluster
cpu_cluster.delete()

# References
- De Cock, Dean. (2011). Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project. Journal of Statistics Education, 19(3). doi:10.1080/10691898.2011.11889627.
- [Deployment to Cloud Example](https://github.com/ErkanHatipoglu/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment/deploy-to-cloud)