# Azure Machine Learning Pipeline with AutoMLStep 

This notebook is part of [Project 2: Operationalizing Machine Learning with Azure](https://github.com/dpbac/Operationalizing-Machine-Learning-with-Azure) of the `Udacity Nanodegree Program: Machine Learning Engineer with Azure`.

In this project, you continue working with the Bank Marketing dataset introduced in [Project 1: Optimizing an ML Pipeline in Azure](https://github.com/dpbac/Optimizing-an-ML-Pipeline-in-Azure). We use Azure to configure a cloud-based machine learning production model, deploy it, and consume it.


This notebook demonstrates the use of `AutoMLStep` in an Azure Machine Learning pipeline.

A classification model is generated with AutoML using the dataset available at https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv
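Before configuring AutoML, it can be worth checking how balanced the target classes are, since AutoML's data guardrails include a class-imbalance check for classification tasks. The helper below is a standard-library sketch; `label_distribution` is a hypothetical name, and it assumes the target column is named `y`, as in the Azure AutoML bank-marketing samples. The three sample rows are made up purely to show the call.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def label_distribution(csv_file: TextIO, label_column: str = "y") -> Dict[str, float]:
    """Return the fraction of rows for each value of `label_column`.

    Assumes the CSV has a header row; the default column name "y" is an
    assumption based on the Azure AutoML bank-marketing samples.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    # An empty file yields an empty dict, so no division by zero occurs.
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical three-row sample, only to show the call shape.
    sample = io.StringIO("age,job,y\n30,admin.,no\n45,technician,no\n52,retired,yes\n")
    print(label_distribution(sample))
```

To run it on the real file, open the URL with `urllib.request.urlopen` and wrap the response in `io.TextIOWrapper(response, encoding="utf-8")` before passing it in.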

## Introduction

In this example, we show how to use an Azure ML `Dataset` to load data for AutoML through an Azure ML pipeline.

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.

In this notebook, the following steps are performed:

1. Create (or reuse) an `Experiment` in an existing `Workspace`.
2. Create or attach an existing `AmlCompute` cluster to the workspace.
3. Define data loading in a `TabularDataset`.
4. Configure AutoML using `AutoMLConfig`.
5. Wrap the AutoML configuration in an `AutoMLStep`.
6. Train the model using `AmlCompute`.
7. Explore the results.
8. Test the best fitted model.



# Automated ML Experiment

In [None]:
# Azure Machine Learning and Pipeline SDK-specific imports

import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.pipeline.steps import AutoMLStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

## Initialize Workspace
Initialize a workspace object from the persisted configuration. Make sure the config file is present at `./config.json`.

The `config.json` file can be downloaded from the workspace's **Overview** page in the Azure portal.
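If `Workspace.from_config()` cannot find the file, it helps to know roughly where it looks. As we understand the SDK, it starts in the current directory (or the directory given as `path`) and walks up through parent directories, checking for `config.json` and `.azureml/config.json`. The sketch below imitates that search with the standard library so you can check your folder layout locally. `find_workspace_config` and `load_workspace_config` are hypothetical helpers, not part of the Azure ML SDK, and the required keys are assumed to be the three in the portal download.

```python
import json
from pathlib import Path
from typing import Optional

# Keys assumed to be present in the config.json downloaded from the portal.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start: Path) -> Optional[Path]:
    """Hypothetical helper: walk from `start` up to the filesystem root and
    return the first config.json or .azureml/config.json found."""
    start = start.resolve()
    for directory in (start, *start.parents):
        for candidate in (directory / "config.json",
                          directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(start: Path) -> dict:
    """Load the nearest config file and check that the expected keys exist."""
    path = find_workspace_config(start)
    if path is None:
        raise FileNotFoundError(f"No config.json found from {start} upwards")
    config = json.loads(path.read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return config
```

For example, `load_workspace_config(Path.cwd())` run from the notebook folder should return the same subscription, resource group, and workspace name that `Workspace.from_config()` prints.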

In [None]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

In [None]:
# If this does not work, get the workspace by name as in the previous project:

# from azureml.core import Workspace, Experiment

# # Initialize a workspace object for an existing Azure Machine Learning Workspace
# ws = Workspace.get("quick-starts-ws-127549")

## Create an Azure ML experiment


Let's get the experiment named `automl-experiment` and define a folder, `./pipeline-project`, to hold the pipeline scripts. The script runs are recorded under this experiment in Azure.

The best practice is to use a separate folder for each step's scripts and their dependent files, and to specify that folder as the step's `source_directory`. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Since a change to any file in the `source_directory` triggers a re-upload of the snapshot, separate folders also let a step be reused when nothing in its own `source_directory` has changed.
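To see why per-step folders matter, here is a standard-library illustration. It is not Azure ML's actual snapshot or reuse logic (which also considers inputs and parameters); it simply assumes that each step's `source_directory` is reduced to a content fingerprint and that a step is reusable when its fingerprint is unchanged. The folder names `train_step` and `score_step` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def directory_fingerprint(source_directory: Path) -> str:
    """Hash each file's relative path and content hash, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in source_directory.rglob("*") if p.is_file()):
        relative = path.relative_to(source_directory).as_posix()
        digest.update(relative.encode() + b"\0")
        # Hashing file bytes separately keeps path/content boundaries unambiguous.
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def changed_steps(step_dirs: Dict[str, Path], previous: Dict[str, str]) -> List[str]:
    """Return the names of steps whose folder fingerprint differs from `previous`."""
    return [name for name, folder in step_dirs.items()
            if directory_fingerprint(folder) != previous.get(name)]
```

With one shared folder for every step, editing the training script would change the single fingerprint that all steps depend on, so none of them could be reused.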

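*Udacity Note:* There is no need to create a new Azure ML experiment; reuse the experiment that was already created. `Experiment(ws, experiment_name)` returns the existing experiment when one with that name is already in the workspace.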

In [None]:
# Choose a name for the run history container in the workspace.
# NOTE: update these to match your existing experiment name
experiment_name = 'automl-experiment'
project_folder = './pipeline-project'

experiment = Experiment(ws, experiment_name)
experiment

In [None]:
# # Create an experiment
# exp = Experiment(workspace=ws, name="udacity-project2")

# run = exp.start_logging()

In [None]:
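# Summarize the workspace and experiment details in a one-column table.
# `experiment` is defined above; `exp` only exists in the commented-out cell.
dic_data = {'Workspace name': ws.name,
            'Azure region': ws.location,
            'Subscription id': ws.subscription_id,
            'Resource group': ws.resource_group,
            'Experiment Name': experiment.name}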

df_data = pd.DataFrame.from_dict(data = dic_data, orient='index')

df_data.rename(columns={0:''}, inplace = True)
df_data