## Loading Data

In [4]:
import pandas as pd

df = pd.read_csv('traintest.csv')
df.drop(columns='Unnamed: 0', inplace=True)
df.head(1)

Unnamed: 0,Is Action,Is Adventure,Is Animation,Is Comedy,Is Crime,Is Documentary,Is Drama,Is Family,Is Fantasy,Is Foreign,...,overview,release_date,release_month,release_quarter,release_year,runtime,tagline,title,Is Christmas Movie,keywords
0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,...,Three ghosts try to help two young lovers whom...,1940-01-01,1.0,1.0,1940.0,84.0,Is there a better time to fall in love?,Beyond Tomorrow,0,"nurse,seduction,radio program,ghost,"


## Get a Reference to Azure
To run machine learning jobs on Azure, we need an Azure Machine Learning workspace. We'll load its details from a config file.

In [2]:
from azureml.core import Workspace, Experiment, Dataset, Model

# Load the workspace information from config.json using the Azure ML SDK
ws = Workspace.from_config()
ws.name

'2022-data-science-talks'
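`Workspace.from_config()` reads the workspace details from a `config.json` file, which normally holds a subscription ID, a resource group and a workspace name. As a rough sketch, here's a small stdlib check for those three keys before connecting. The helper name and the placeholder values are made up for illustration.

```python
import json

# The three keys a workspace config.json normally carries
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in a config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values only (hypothetical); substitute your own workspace details
example_config = json.loads("""
{
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace"
}
""")

print(missing_config_keys(example_config))
```

An empty list means the file has everything `from_config()` needs to locate the workspace.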

## Registering a Dataset on Azure

In [5]:
# Get the storage account associated with this ML workspace
datastore = ws.get_default_datastore()
datastore.name

'workspaceblobstore'

In [6]:

ds = Dataset.Tabular.register_pandas_dataframe(dataframe=df, 
        name='ChristmasMovies', 
        description='Movies broken down by Christmas movies and non-Christmas movies', 
        target=datastore)
ds.name

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/7d85751d-9da7-455a-969b-9b68fdfe66b0/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.


'ChristmasMovies'

## Create a Compute Resource
We'll need a compute cluster to run the experiment on.

In [11]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Now let's make sure we have a compute resource
cluster_name = "Low-End-Compute-Cluster"
max_nodes = 4

# Fetch or create the compute resource
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name) # This will throw a ComputeTargetException if this doesn't exist
    print('Using existing compute: ' + cluster_name)
except ComputeTargetException:
    # Create the cluster
    print('Provisioning cluster...')
    compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2DS_V4", 
                                                           min_nodes=0, 
                                                           max_nodes=max_nodes, 
                                                           vm_priority='lowpriority')
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

# Ensure the cluster is ready to go
cpu_cluster.wait_for_completion(show_output=True)

Provisioning cluster...
InProgress....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Create the Machine Learning Experiment
This will hold runs of our experiment so we can track progress over time

In [9]:
from azureml.core.experiment import Experiment

# Create a Machine Learning Experiment
experiment_name = 'DieHard-AutoML'

experiment=Experiment(ws, experiment_name)
experiment.name

'DieHard-AutoML'

## Submit the Experiment
This asks Azure to run the experiment and waits for it to complete

In [20]:
from azureml.train.automl import AutoMLConfig

# Set up the experiment
automl_config = AutoMLConfig(
    task='classification',                  # The machine learning task we're trying to accomplish
    primary_metric='AUC_weighted',          # How we judge one model as better than another. AUC tends to be fairly balanced
    training_data=ds,                       # Our dataset of movies
    enable_dnn=True,                        # Enable Deep Learning
    compute_target=cpu_cluster,             # The compute resource to use
    max_concurrent_iterations=max_nodes,    # Don't want more concurrent iterations than CPU nodes
    iteration_timeout_minutes=5,            # The maximum number of minutes per individual run
    label_column_name='Is Christmas Movie') # The column we want to predict for future movies
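For some intuition about the primary metric: the AUC for a class can be read as the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes that directly by comparing every pair. It is a plain-Python illustration with made-up labels and scores, not the implementation Azure uses.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 = Christmas movie, scores are predicted probabilities
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
print(binary_auc(labels, scores))
```

A model that ranks every Christmas movie above every other movie scores 1.0, while random guessing lands near 0.5.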

In [21]:
from azureml.widgets import RunDetails

# Submit the experiment
run = experiment.submit(automl_config)

# Wait for the experiment to complete
RunDetails(run).show()
run.wait_for_completion(show_output=False)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
DieHard-AutoML,AutoML_05af2ba7-1474-4411-86bb-cddaec43b952,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

{'runId': 'AutoML_05af2ba7-1474-4411-86bb-cddaec43b952',
 'target': 'Low-End-Compute-Cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-08-30T02:49:51.910878Z',
 'endTimeUtc': '2022-08-30T04:41:49.378718Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'Low-End-Compute-Cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"74098b45-79fc-42cc-abce-6557f01df36b\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_ve

INFO:interpret_community.common.explanation_utils:Using default datastore for uploads
INFO:interpret_community.common.explanation_utils:Using default datastore for uploads
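The run details above include `startTimeUtc` and `endTimeUtc`, so we can work out how long the experiment took. Here's a small sketch using only the standard library. It assumes both timestamps always follow the format shown in the output, with fractional seconds and a trailing `Z`.

```python
from datetime import datetime

# Format of the timestamps in the run details shown above (assumed stable)
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration(details):
    """Return the elapsed time between a run's start and end timestamps."""
    start = datetime.strptime(details["startTimeUtc"], TIMESTAMP_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], TIMESTAMP_FORMAT)
    return end - start


# Values copied from the run details above
details = {
    "startTimeUtc": "2022-08-30T02:49:51.910878Z",
    "endTimeUtc": "2022-08-30T04:41:49.378718Z",
}
duration = run_duration(details)
print(duration)
```

For this run that comes to a little under two hours, even with early stopping.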


## Working with the Resulting Model
The run produced a number of models. Let's grab the best-performing one and save it locally for later deployment.

In [23]:
# Grab the resulting model and best run
best_auto_run, automl_model = run.get_output()

# Display details about the best run
RunDetails(best_auto_run).show()

Package:azureml-automl-runtime, training version:1.44.0, current version:1.40.0
Package:azureml-core, training version:1.44.0, current version:1.40.0
Package:azureml-dataprep, training version:4.2.2, current version:3.0.2
Package:azureml-dataprep-rslex, training version:2.8.1, current version:2.4.2
Package:azureml-dataset-runtime, training version:1.44.0, current version:1.40.0
Package:azureml-defaults, training version:1.44.0, current version:1.40.0
Package:azureml-inference-server-http, training version:0.7.4, current version:0.4.13
Package:azureml-interpret, training version:1.44.0, current version:1.40.0
Package:azureml-mlflow, training version:1.44.0, current version:1.40.0
Package:azureml-pipeline-core, training version:1.44.0, current version:1.40.0
Package:azureml-telemetry, training version:1.44.0, current version:1.40.0
Package:azureml-train-automl-client, training version:1.44.0, current version:1.40.0
Package:azureml-train-automl-runtime, training version:1.44.0, current ve

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…
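The `Package:` lines above report packages where the version used for training differs from the one installed locally. Differences like these can cause trouble when loading the model, so it may be worth collecting them in one place. The following is a sketch that parses lines of that shape with a regular expression. The helper name is made up, and the sample text is copied from the output above.

```python
import re

# Matches lines like:
# Package:azureml-core, training version:1.44.0, current version:1.40.0
LINE_PATTERN = re.compile(
    r"Package:(?P<name>[\w.-]+), "
    r"training version:(?P<training>[\w.]+), "
    r"current version:(?P<current>[\w.]+)"
)


def find_version_mismatches(log_text):
    """Map each package name to (training version, current version) where they differ."""
    mismatches = {}
    for match in LINE_PATTERN.finditer(log_text):
        if match["training"] != match["current"]:
            mismatches[match["name"]] = (match["training"], match["current"])
    return mismatches


sample = """Package:azureml-core, training version:1.44.0, current version:1.40.0
Package:azureml-dataprep, training version:4.2.2, current version:3.0.2"""

for name, (training, current) in find_version_mismatches(sample).items():
    print(name, "trained with", training, "but local is", current)
```

Truncated lines that lack a full version number simply don't match and are skipped.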

In [24]:
# Save the best model locally
best_auto_run.download_files(output_directory='automl-output')

INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


In [35]:
# Register the model in Azure
model = best_auto_run.register_model(model_name='ChristmasMovie-AutoML', 
                                     model_path='outputs/model.pkl', 
                                     description='Predict whether or not a movie is a Christmas movie')

INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


## Deploying the Model
We can deploy the model to Azure as an Azure Container Instance (ACI) directly from code

In [37]:
from azureml.core import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

# Create an inference config from the best model's files
env = Environment.from_conda_specification("AutoML-env", "automl-output/outputs/conda_env_v_1_0_0.yml")
inference_config = InferenceConfig(environment=env, 
                                   source_directory='./automl-output/outputs', 
                                   entry_script='./scoring_file_v_2_0_0.py')

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1, enable_app_insights=True)

# Deploy the model
service = Model.deploy(ws, "christmas-movie-predictor", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)

INFO:interpret_community.common.explanation_utils:Using default datastore for uploads


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-08-30 01:05:19-04:00 Creating Container Registry if not exists.


.
2022-08-30 01:15:19-04:00 Registering the environment..
2022-08-30 01:15:20-04:00 Building image.


.
2022-08-30 01:35:36-04:00 Generating deployment configuration.
2022-08-30 01:35:37-04:00 Submitting deployment to compute..
2022-08-30 01:35:53-04:00 Checking the status of deployment christmas-movie-predictor.


KeyboardInterrupt: 
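The wait above was interrupted, so before calling the endpoint it's worth checking whether the service actually came up. As a sketch, the helper below reports on any object exposing `state`, `scoring_uri` and `get_logs()`, which the SDK's web service objects do. `FakeService` is a hypothetical stand-in so the example runs without an Azure connection.

```python
def summarize_service(service):
    """Return a short status message for a deployed web service."""
    if service.state == "Healthy":
        return "Endpoint active at " + service.scoring_uri
    # Anything else (e.g. Transitioning, Unhealthy, Failed) deserves a look at the logs
    return "Service state is {}; recent logs:\n{}".format(service.state, service.get_logs())


class FakeService:
    """Hypothetical stand-in for a web service object (illustration only)."""

    def __init__(self, state, scoring_uri=None, logs=""):
        self.state = state
        self.scoring_uri = scoring_uri
        self._logs = logs

    def get_logs(self):
        return self._logs


print(summarize_service(FakeService("Transitioning", logs="Building image.")))

# In the notebook (not run here), we could re-attach to the service by name:
# from azureml.core.webservice import Webservice
# service = Webservice(workspace=ws, name="christmas-movie-predictor")
# print(summarize_service(service))
```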

## Calling our Endpoint
Now that we have our endpoint, let's see what it thinks of Die Hard

In [None]:
# Grab our scoring endpoint for testing
scoring_uri = service.scoring_uri
print('Endpoint active at ' + scoring_uri)

In [28]:
# Load Die Hard. The movie data was saved to a separate CSV with the same format
df_dieHard = pd.read_csv('DieHard.csv')
df_dieHard.drop(columns='id', inplace=True)
df_dieHard.head()

Unnamed: 0,Is Action,Is Adventure,Is Animation,Is Comedy,Is Crime,Is Documentary,Is Drama,Is Family,Is Fantasy,Is Foreign,...,overview,release_date,release_month,release_quarter,release_year,runtime,tagline,title,Is Christmas Movie,keywords
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,"NYPD cop, John McClane's plan to reconcile wit...",1988-07-15,7.0,3.0,1988.0,131.0,40 Stories. Twelve Terrorists. One Cop.,Die Hard,1,"helicopter,journalist,based on novel,terrorist..."


In [31]:
# Isolate Die Hard into a variable that we can pass along to our endpoint
dieHard = df_dieHard.iloc[0]
dieHard

Is Action                                                          1.00
Is Adventure                                                       0.00
Is Animation                                                       0.00
Is Comedy                                                          0.00
Is Crime                                                           0.00
Is Documentary                                                     0.00
Is Drama                                                           0.00
Is Family                                                          0.00
Is Fantasy                                                         0.00
Is Foreign                                                         0.00
Is History                                                         0.00
Is Horror                                                          0.00
Is Music                                                           0.00
Is Mystery                                                      



In [None]:
import requests
import json

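# Create an object that looks like what our ACI endpoint expects.
# A pandas Series is not JSON serializable, so convert the row to plain records first
# (to_json also turns NaN into null). A list of records is assumed to be the expected shape.
records = json.loads(df_dieHard.iloc[[0]].to_json(orient='records'))
data = {
  "Inputs": {
    "data": records
  },
  "GlobalParameters": 1.0
}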

# Convert to JSON string
input_data = json.dumps(data)

# Set the content type
headers = {'Content-Type': 'application/json'}

# NOTE: with auth enabled we'd also need an Authorization header carrying the service key

# Make the request and look at details of the response
resp = requests.post(scoring_uri, input_data, headers=headers)
resp_json = resp.json()

resp_json
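One thing to watch when building the request body is that `json.dumps` only handles plain Python types, and missing values (NaN) don't translate to valid JSON. Here's a rough sketch of a payload builder that works on a plain dictionary, such as one produced by `dieHard.to_dict()`. The helper name and the sample row are made up, and wrapping the record in a list is an assumption about what the scoring script expects.

```python
import json
import math


def build_payload(row):
    """Serialize one record into the request body, replacing NaN with null."""
    cleaned = {
        key: (None if isinstance(value, float) and math.isnan(value) else value)
        for key, value in row.items()
    }
    return json.dumps({"Inputs": {"data": [cleaned]}, "GlobalParameters": 1.0})


# Made-up sample row with one missing value for illustration
sample_row = {
    "Is Action": 1.0,
    "runtime": 131.0,
    "title": "Die Hard",
    "tagline": float("nan"),
}
body = build_payload(sample_row)
print(body)
```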

In [None]:
# Find out if Die Hard is a Christmas movie
results = resp_json['Results']
results
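If the request fails, the response may not contain a `Results` key at all, so a small guard gives a clearer error than a bare `KeyError`. The sketch below assumes `Results` is a list with one prediction per input row, encoded like the `Is Christmas Movie` label (0 or 1). That shape is an assumption, so check it against the actual response.

```python
def is_christmas_movie(resp_json):
    """Interpret the first prediction in a scoring response as a yes/no answer."""
    results = resp_json.get("Results")
    if not results:
        raise ValueError("No 'Results' in response: {!r}".format(resp_json))
    # The label is assumed to be 1 for Christmas movies and 0 otherwise
    return float(results[0]) == 1.0


# Made-up responses for illustration
print(is_christmas_movie({"Results": [1]}))
print(is_christmas_movie({"Results": [0]}))
```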

## Cleaning up Resources
The cluster scales down to zero nodes when idle, so it won't cost us anything, but we need to delete our ACI endpoint to avoid being billed for it.

In [None]:
# Delete the ACI endpoint
service.delete()