# Invoke a model in Power BI using Azure ML service

Power BI can invoke models deployed with either Azure ML service or Azure ML Studio. Azure ML service offers more flexibility in how the model is built than Azure ML Studio, although Studio is much easier to use. Follow this [link](https://docs.microsoft.com/en-us/learn/modules/intro-to-azure-machine-learning-service/2-azure-ml-service-vs-ml-studio) to learn more about Azure ML service and Studio.

In this notebook, we are going to see how to train and deploy a model using Azure ML service and call it from Power BI. As I particularly struggled with this task because of the small number of references in the community, I decided to write this notebook to clearly explain _every_ step needed to call a Machine Learning model in Power BI.

To do so, we will divide this notebook into three sections:
- Model training
- Model deployment
- Invoke the deployed model in Power BI

The requirements are: 
- an Azure subscription to train & deploy your model 
- a Power BI Pro subscription to invoke your model in Power BI Dataflows
- the azureml-sdk library installed. [Link](https://docs.microsoft.com/en-gb/python/api/overview/azure/ml/install?view=azure-ml-py) to the installation guide

Here are some resources I followed:
- https://docs.microsoft.com/en-us/power-bi/service-machine-learning-integration
- https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#example-script-with-dictionary-input-support-consumption-from-power-bi
- https://community.powerbi.com/t5/Community-Blog/Azure-Machine-Learning-in-Power-BI-Dataflows/ba-p/709744

## 1 - Model Training
To set up a project on AML service, we are going to train the model locally first and then deploy it once it has been properly constructed.

In this example, we are going to use the [diabetes](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) dataset from the sklearn library. We are going to perform a simple Ridge Regression to predict the disease progression of the patients in the dataset.
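For reference, scikit-learn's `Ridge` estimates the coefficients $w$ and the intercept $b$ by minimizing the squared error plus an L2 penalty on $w$ (the intercept is not penalized). `alpha` is the regularization strength, set to 0.1 in the cells below:

$$
\min_{w,\,b}\; \lVert y - Xw - b\mathbf{1} \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

A larger `alpha` shrinks the coefficients more strongly towards zero.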

In order to train a model in the cloud, we are going to follow these steps:
- __Train a model locally__ <br>
This lets you make sure your model works properly before spending Azure credits
- __Create a workspace in Azure__ <br>
Here is where we are going to store our Experiments
- __Create an experiment__ <br>
Here is where our runs, and the models they produce, are going to be tracked
- __Create a compute target__ <br>
Specify the type of machine that is going to execute your code
- __Import the data in the cloud__ <br>
The data used to train your model has to be stored in the cloud.

### Train a model locally

In [9]:
# Libraries
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Data preprocessing
X, y = load_diabetes(return_X_y=True)
X = pd.DataFrame(X)
y = pd.DataFrame(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Ridge Regression
alpha = 0.1
reg = Ridge(alpha=alpha)
reg.fit(X_train, y_train)

# Prediction and evaluation
preds = reg.predict(X_test)
mse = mean_squared_error(y_test, preds)  # signature is (y_true, y_pred)
print("The Mean Squared Error on the test set is %.2f" % mse)

The Mean Squared Error on the test set is 3372.65


### Create a workspace

Now we are going to move our model to the cloud. The model will be stored in an experiment inside an Azure Machine Learning service workspace. A workspace is the place where all experiments, and thereby models, are stored. It serves as a hub for building and deploying models.

A workspace can be created manually via the Azure portal or by running the following cell.

In [10]:
import azureml.core # azureml-sdk library

# Create the workspace
from azureml.core import Workspace

ws = Workspace.create(name = "create_a_workspace_name", # Workspace name, choose the name you like
                      subscription_id = "<your-subscription-id>", # Your Azure subscription ID
                      resource_group = "create_a_resource_group", # Resource group name, choose the name you like
                      create_resource_group = True,
                      location = "eastus2", # Place where your workspace will be located
                      exist_ok = True)
ws.get_details()
ws.write_config()



Deploying StorageAccount with name createawstoragec3dffa9ae.
Deploying AppInsights with name createawinsights83eda8d1.
Deployed AppInsights with name createawinsights83eda8d1. Took 9.75 seconds.
Deploying KeyVault with name createawkeyvaulta68cf964.
Deployed KeyVault with name createawkeyvaulta68cf964. Took 24.86 seconds.
Deployed StorageAccount with name createawstoragec3dffa9ae. Took 26.5 seconds.
Deploying Workspace with name create_a_workspace_name.
Deployed Workspace with name create_a_workspace_name. Took 34.76 seconds.


Wait a few minutes until the workspace has been created. To check the workspace visually, go to the [Azure Portal](https://portal.azure.com), search for "Machine Learning service workspaces" and click on the workspace you just created.

The next step is to connect to the workspace.

In [11]:
# Connect to the workspace
ws = Workspace.from_config()
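`Workspace.from_config()` reads the `config.json` file that `ws.write_config()` saved earlier (by default in a `.azureml` folder under the current directory). The sketch below writes and reads back a file with the same three keys, using placeholder values, to show what the connection relies on; it uses a temporary folder so it does not overwrite your real configuration.

```python
import json
import os
import tempfile

# Placeholder values: write_config() fills these in from your real workspace
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "create_a_resource_group",
    "workspace_name": "create_a_workspace_name",
}

# Mimic the default location .azureml/config.json inside a throwaway project folder
with tempfile.TemporaryDirectory() as project_dir:
    config_path = os.path.join(project_dir, ".azureml", "config.json")
    os.makedirs(os.path.dirname(config_path), exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)

    # Reading the file back shows the keys used to locate the workspace
    with open(config_path) as f:
        loaded = json.load(f)

print(sorted(loaded))
```

Because the file holds no secrets (authentication happens interactively or through a service principal), it can be copied to other machines that need to reach the same workspace.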

### Create an experiment

An experiment is a collection of runs. A run is an execution of Python code that does a specific task, such as training a model.

An experiment can also be created via the Azure portal or with the following cell.

In [12]:
# Create an experiment in the workspace
from azureml.core import Experiment
exp = Experiment(workspace = ws, # Workspace to store our Experiment. Here, it's the previously created variable ws
                 name = 'create_an_experiment_name') # Experiment name, choose the name you like
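An experiment is easiest to picture with several runs that differ by one hyperparameter. The sketch below is a purely local analogue with no Azure calls: each loop iteration stands in for one run and records the same two values (`alpha` and `mse`) that the training script later logs with `run.log`. The alpha grid is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def sweep_alphas(alphas):
    """Train one Ridge model per alpha and return the metrics of each 'run'."""
    X, y = load_diabetes(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    runs = []
    for alpha in alphas:
        model = Ridge(alpha=alpha).fit(X_tr, y_tr)
        runs.append({"alpha": alpha, "mse": mean_squared_error(y_te, model.predict(X_te))})
    return runs


runs = sweep_alphas([0.01, 0.1, 1.0])
for r in runs:
    print(r)
```

In Azure ML, each of these iterations would be a separate submitted run, and the experiment page would let you compare their logged metrics side by side.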

### Create a compute target

A compute target is the compute resource to run a training script or to host a service deployment. It is attached to a workspace. It specifies the type of machine on which your experiment is going to be executed.

In [13]:
# Create a compute target
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpucluster")
compute_min_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0)
compute_max_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4)

# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                min_nodes=compute_min_nodes,
                                                                max_nodes=compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(
        ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(
        show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

creating a new compute target...
Creating
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
{'currentNodeCount': 0, 'targetNodeCount': 0, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-09-10T13:07:31.918000+00:00', 'errors': None, 'creationTime': '2019-09-10T13:06:57.368799+00:00', 'modifiedTime': '2019-09-10T13:07:39.703918+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}
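One caveat with the cell above: `os.environ.get` returns a string whenever the variable is set, so `AML_COMPUTE_CLUSTER_MIN_NODES` and `AML_COMPUTE_CLUSTER_MAX_NODES` would be passed as text rather than integers. A small helper such as the hypothetical `env_int` below casts the value while keeping the same defaults; the example environment is a plain dictionary so the sketch does not modify your real environment variables.

```python
import os


def env_int(name, default, env=os.environ):
    """Read an integer setting from an environment mapping, falling back to a default."""
    value = env.get(name)
    return default if value is None else int(value)


# Hypothetical environment where only the maximum node count was set by the user
example_env = {"AML_COMPUTE_CLUSTER_MAX_NODES": "2"}

print(env_int("AML_COMPUTE_CLUSTER_MIN_NODES", 0, example_env))  # 0
print(env_int("AML_COMPUTE_CLUSTER_MAX_NODES", 4, example_env))  # 2
```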


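### Import the data in the cloud

The training data has to be uploaded to a datastore attached to the workspace. `ds.upload` expects the path of a local folder, not a DataFrame, so we first save the data to a local `data` folder and then upload that folder to the default datastore (a blob container in the workspace's storage account).

In [15]:
import os
ds = ws.get_default_datastore()

# Save the training data locally in a 'data' folder
os.makedirs('data', exist_ok=True)
X_train.to_csv(os.path.join('data', 'diabetes.csv'), index=False)

# Upload the local 'data' folder to the blob storage behind the default datastore
ds.upload(src_dir=os.path.join(os.getcwd(), 'data'), target_path='data', overwrite=True)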

In [6]:
os.chdir('path/to/your/project/folder')  # replace with the folder that contains utils.py

In [7]:
# Create a local folder for the training script (it is sent to the cluster as the source directory)
import os
script_folder = os.path.join(os.getcwd(), "income_classifier")
os.makedirs(script_folder, exist_ok=True)
os.chdir(script_folder)

In [8]:
%%writefile train2.py

# Libraries
import argparse
import os

import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core import Run
from utils import load_data  # helper module copied into the script folder in the next cell

try:
    import joblib  # standalone package (recent scikit-learn versions)
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn versions

parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str)
args = parser.parse_args()
data_folder = args.data_folder
print('Data folder:', data_folder)

# Data preprocessing
X, y = load_diabetes(return_X_y=True)
X = pd.DataFrame(X)
y = pd.DataFrame(y)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)

# Get hold of the current run to log metrics
run = Run.get_context()

# Ridge Regression
alpha = 0.1
reg = Ridge(alpha=alpha)
reg.fit(X_train, y_train)

# Prediction and evaluation
preds = reg.predict(X_test)
mse = mean_squared_error(y_test, preds)
run.log('alpha', alpha)
run.log('mse', mse)

# Files written to ./outputs are uploaded to the run history automatically
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=reg, filename='outputs/income_classifier_model2.pkl')

Overwriting train2.py


In [9]:
# Go back to the project folder and copy utils.py into the script folder just created
os.chdir("..")
import shutil
shutil.copy('utils.py', script_folder)

'C:\\Users\\a.nogue.sanchez\\OneDrive - Avanade\\Documents\\Projects\\Projets internes Avanade\\Tribu Analytics\\Machine Learning in Power BI\\Azure ML train and deploy\\Income prediction use case\\income_classifier\\utils.py'

In [10]:
# Create an estimator
from azureml.train.sklearn import SKLearn

script_params = {
    '--data-folder': ds.path('data').as_mount()
}

est = SKLearn(source_directory=script_folder,
              script_params=script_params,
              compute_target=compute_target,
              entry_script='train2.py')
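Each key of `script_params` is passed to `train2.py` as a command-line flag followed by its value; when the job runs, the `as_mount()` reference is replaced by the path where the datastore is mounted on the cluster. `argparse` turns the dashes in `--data-folder` into the attribute name `data_folder`. The sketch below reproduces that parsing locally with a made-up mount path.

```python
import argparse

# Same argument definition as in train2.py
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str)

# Hypothetical mount path standing in for ds.path('data').as_mount() at run time
args = parser.parse_args(['--data-folder', '/mnt/azureml/data'])
print(args.data_folder)  # /mnt/azureml/data
```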

In [11]:
# Submit the job to the cluster
run = exp.submit(config=est)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| income_prediction_exp2 | income_prediction_exp2_1568031646_e5cd3b1d | azureml.scriptrun | Queued | Link to Azure Portal | Link to Documentation |


In [12]:
# Wait for the run to complete, then get the run metrics
run.wait_for_completion(show_output=False)
print(run.get_metrics())

{'alpha': 0.1, 'mse': 3372.649627810032}


In [13]:
# Register the model
print(run.get_file_names())
model = run.register_model(model_name='income_classifier',
                           model_path='outputs/income_classifier_model2.pkl')
print(model.name, model.id, model.version, sep='\t')

['azureml-logs/55_azureml-execution-tvmps_3c25d8d0273bb62276981122979b0f56d8d01726c68b52f8f298ddb598ab88d0_d.txt', 'azureml-logs/65_job_prep-tvmps_3c25d8d0273bb62276981122979b0f56d8d01726c68b52f8f298ddb598ab88d0_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_3c25d8d0273bb62276981122979b0f56d8d01726c68b52f8f298ddb598ab88d0_d.txt', 'logs/azureml/124_azureml.log', 'logs/azureml/azureml.log', 'outputs/income_classifier_model2.pkl']
income_classifier	income_classifier:1	1


In [14]:
# Retrieve the model from the workspace
from azureml.core import Workspace
from azureml.core.model import Model
ws = Workspace.from_config()
model = Model(ws, 'income_classifier')

model.download(target_dir=os.getcwd(), exist_ok=True)

# verify the downloaded model file
file_path = os.path.join(os.getcwd(), "income_classifier_model2.pkl")

os.stat(file_path)

os.stat_result(st_mode=33206, st_ino=4222124651202485, st_dev=270394477, st_nlink=1, st_uid=0, st_gid=0, st_size=637, st_atime=1567782412, st_mtime=1568032105, st_ctime=1567754604)

In [15]:
# Test the model locally
# Import the data
import os
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

try:
    import joblib  # standalone package (recent scikit-learn versions)
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn versions

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=0)

# Predict on the test set with the downloaded model
downloaded_reg = joblib.load(os.path.join(os.getcwd(), 'income_classifier_model2.pkl'))
y_hat = downloaded_reg.predict(X_test)
mse = mean_squared_error(y_test, y_hat)

print(mse)

3372.6496278100326


In [21]:
y_hat[1]

array([241.27592573])
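The prediction above is a one-element array rather than a scalar because the model was fitted on `y = pd.DataFrame(y)`, a one-column 2-D target, so `predict` returns an array of shape `(n_samples, 1)`. This is worth keeping in mind when choosing the output sample of the scoring script. A quick local check comparing both target shapes:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X_demo, y_demo = load_diabetes(return_X_y=True)

# Target passed as a one-column DataFrame, as in train2.py
reg_2d = Ridge(alpha=0.1).fit(X_demo, pd.DataFrame(y_demo))
# Target passed as a 1-D array
reg_1d = Ridge(alpha=0.1).fit(X_demo, y_demo)

preds_2d = reg_2d.predict(X_demo[:3])
preds_1d = reg_1d.predict(X_demo[:3])
print(preds_2d.shape, preds_1d.shape)           # (3, 1) (3,)
print(np.allclose(preds_2d.ravel(), preds_1d))  # True
```

Both models make the same predictions; only the shape differs, and `.ravel()` flattens the 2-D version if a flat list is preferred.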

### Deploy as a web service

In [17]:
%%writefile score2.py

# Create a scoring script
import numpy as np
import pandas as pd
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
from azureml.core.model import Model

try:
    import joblib  # standalone package (recent scikit-learn versions)
except ImportError:
    from sklearn.externals import joblib  # older scikit-learn versions

def init():
    # Called once when the service starts: load the registered model into memory
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('income_classifier')
    model = joblib.load(model_path)

# Sample input and output used to generate the service's schema
input_sample = pd.DataFrame(data=[{
    "0": 0.1,
    "1": 0.1,
    "2": 0.1,
    "3": 0.1,
    "4": 0.1,
    "5": 0.1,
    "6": 0.1,
    "7": 0.1,
    "8": 0.1,
    "9": 0.1
}])
output_sample = np.array([241.1])

@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))  # be careful: the output sample should match the type run() returns
def run(data):
    # Called for every request: 'data' is built from the request body
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()

Overwriting score2.py
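With `PandasParameterType`, the request body is expected to carry a list of records under the `data` key (the name given in `@input_schema`), assuming the type's default records orientation. The sketch below builds such a payload from two hypothetical rows and parses it back into a DataFrame, roughly what the decorator hands to `run()`.

```python
import json

import pandas as pd

# Two hypothetical input rows with the same column names as input_sample in score2.py
rows = pd.DataFrame([[0.1] * 10, [0.0] * 10], columns=[str(i) for i in range(10)])

# Request body: a list of records under the 'data' key
payload = json.dumps({"data": rows.to_dict(orient="records")})

# Parsing the body back gives one DataFrame row per record, with the same columns
parsed = pd.DataFrame(json.loads(payload)["data"])
print(parsed.shape)  # (2, 10)
```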


In [18]:
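# Create the environment file used by the service container
from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
# Packages needed by the web service and by the inference_schema decorators in score2.py
myenv.add_pip_package("azureml-defaults")
myenv.add_pip_package("inference-schema[numpy-support,pandas-support]")

with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())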

In [19]:
# Create a configuration file
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "diabetes",
                                                     "method": "sklearn"},
                                               description='Predict diabetes progression with sklearn')

In [20]:
%%time

# Deploy in container instances
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(runtime= "python", 
                                   entry_script="score2.py",
                                   conda_file="myenv.yml")

service = Model.deploy(workspace=ws, 
                       name='diabetes-regression2',
                       models=[model], 
                       inference_config=inference_config,
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)

Creating service
Running......................................................................
SucceededACI service creation operation finished, operation "Succeeded"
Wall time: 6min 8s


In [84]:
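# Delete the service (and, if you wish, the compute cluster) only once you are done with it, to avoid extra costs
#compute_target.delete()
#service.delete()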

In [84]:
# Save the training features as a CSV file (for example, as a data source for Power BI)
os.makedirs('data', exist_ok=True)
pd.DataFrame(X_train).to_csv(os.path.join('data', 'diabetes.csv'), index=False)

In [26]:
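# Test the deployed web service on the test set
import json
test_df = pd.DataFrame(np.asarray(X_test), columns=[str(i) for i in range(10)])
# The request body is a JSON string whose 'data' key matches the input_schema name in score2.py
input_json = json.dumps({"data": test_df.to_dict(orient="records")})
result = service.run(input_data=input_json)
result[1]  # should match the local prediction y_hat[1]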

In [30]:
data2 = pd.read_csv(os.path.join(os.getcwd(), 'data', 'census_income.csv'), sep=',', skipinitialspace=True)
data2.head()


| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


In [31]:
data2.dtypes

age                int64
workclass         object
fnlwgt             int64
education         object
education-num      int64
marital-status    object
occupation        object
relationship      object
race              object
sex               object
capital-gain       int64
capital-loss       int64
hours-per-week     int64
native-country    object
income            object
dtype: object