# Azure Machine Learning Series Notebook - Ep1
- Prepared by Vivek Raja P S
- In this notebook, we will learn how to use the Azure Machine Learning Python SDK to perform machine learning tasks.


## About the Author

Vivek Raja P S works as a Data Scientist at NexStem and is an organiser of the Azure Developer Community groups in Tamil Nadu. He is also an AWS Community Builder for Machine Learning and a Microsoft Certified Azure Data Scientist, AI Engineer and Data Engineer. He enjoys mentoring hackathon teams, and he blogs and speaks about AI and cloud at developer communities such as AWS User Group India, TensorFlow User Group, Google Developer Group and the Tamil FOSS Community.

### Social Handles:
- Email: vivekraja98@gmail.com
- LinkedIn: https://linkedin.com/in/Vivek0712
- Twitter: https://twitter.com/VivekRaja007

### Repos:
GitHub: https://github.com/Vivek0712



# Before we get started...


## Pre-requisites
 - Basic knowledge of the Python programming language
 - Understanding of machine learning workflows
 
## Setup

 - Azure Account with Subscription
 - Create a Machine Learning Resource.
 - Provide a name for the workspace and the Container Registry
 - Launch the Machine Learning Studio
 - Create Dataset
 - Create Compute Resource
 - Launch a Notebook instance

## Preparing the Environment

 - Retrieve the necessary details (subscription ID, resource group, workspace name and region)
 - Make sure all imports succeed
 - Create the Workspace (using the SDK or the Portal)

In [4]:
from azureml.core import Workspace

# ws = Workspace.create(name='myworkspace',
#                subscription_id='<azure-subscription-id>',
#                resource_group='myresourcegroup',
#                create_resource_group=True,
#                location='eastus2'
#                )

In [5]:
import json

# keys.txt is a JSON file that holds the Azure details used below
with open('keys.txt') as f:
    keys = json.load(f)
subscription_id = keys["SUBSCRIPTION_ID"]
resource_group = keys["RESOURCE_GROUP"]
workspace_name = keys["WORKSPACE_NAME"]
workspace_region = keys["WORKSPACE_REGION"]

# Check that the SDK imports correctly
import azureml.core
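
The cell above expects `keys.txt` to be a JSON file. A minimal sketch of that layout with an early sanity check, assuming the four key names used above. The placeholder values, the temporary file and the `load_keys` helper are illustrative and not part of the original notebook:

```python
import json
import os
import tempfile

REQUIRED_KEYS = ["SUBSCRIPTION_ID", "RESOURCE_GROUP", "WORKSPACE_NAME", "WORKSPACE_REGION"]


def load_keys(path):
    """Load the JSON key file and fail early if an expected entry is missing."""
    with open(path) as f:
        loaded = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in loaded]
    if missing:
        raise KeyError("keys file is missing: " + ", ".join(missing))
    return loaded


# Placeholder values only; replace them with your own Azure details.
sample = {
    "SUBSCRIPTION_ID": "<azure-subscription-id>",
    "RESOURCE_GROUP": "myresourcegroup",
    "WORKSPACE_NAME": "myworkspace",
    "WORKSPACE_REGION": "eastus2",
}

# Write the sample to a temporary file so a real keys.txt is not touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "keys.txt")
    with open(sample_path, "w") as f:
        json.dump(sample, f, indent=2)
    sample_keys = load_keys(sample_path)

print(sorted(sample_keys))
```

Keeping these values in a separate file, outside version control, avoids hard-coding the subscription ID in the notebook.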


# Workspace

- An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models.

## Create / Access the Workspace

- Using Constructor
- Using config.json file

In [6]:
from azureml.core import Workspace

try:
    # Connect to an existing workspace using the values loaded from keys.txt
    ws = Workspace(subscription_id=subscription_id,
                   resource_group=resource_group,
                   workspace_name=workspace_name)
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception:
    print("Workspace not accessible. Create the workspace")
    ws = Workspace.create(name=workspace_name,
                          subscription_id=subscription_id,
                          resource_group=resource_group,
                          create_resource_group=True,
                          location=workspace_region)

# Save the connection details to .azureml/config.json so from_config() can find them
ws.write_config()

# Fetch and display the workspace
ws = Workspace.from_config()

# Display the details
# ws.get_details()

Workspace configuration succeeded. Skip the workspace creation steps below
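
`ws.write_config()` saves the connection details to `.azureml/config.json`, and `Workspace.from_config()` reads them back. The sketch below only illustrates the shape of that file using the standard library. The placeholder values and the temporary directory are assumptions for illustration; in practice the SDK writes this file for you.

```python
import json
import os
import tempfile

# Placeholder values; a real file holds your own subscription and workspace details.
config = {
    "subscription_id": "<azure-subscription-id>",
    "resource_group": "myresourcegroup",
    "workspace_name": "myworkspace",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # By default the SDK uses an .azureml folder in the working directory
    config_dir = os.path.join(tmp_dir, ".azureml")
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.json")

    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)

    with open(config_path) as f:
        loaded = json.load(f)

print(loaded["workspace_name"])
```

Because the file contains only identifiers and no secrets, authentication still happens when the workspace object is created.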


# Compute 

- Every ML experiment requires a compute resource to execute.

## Create / Access the Compute Resource

- Using ComputeTarget Class

In [7]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpu-cluster")
except ComputeTargetException:
    print("Creating new cpu-cluster")
    
    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           min_nodes=0,
                                                           max_nodes=4)

    # Create the cluster with the specified name and configuration
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    
    # Wait for the cluster to complete, show the output log
    cpu_cluster.wait_for_completion(show_output=True)

Found existing cpu-cluster


# Data

- Every machine learning problem involves working with data.
- The data is imported from its data source.
- The dataset is registered and maintained in a datastore.
- The dataset is versioned as it changes.


In [8]:
#Check and List the datasets attached to our Workspace
from azureml.core import Dataset

print("\nData Stores:")
# Get the default datastore
default_ds = ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)
    
    
print("\nDatasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)
    


Data Stores:
azureml_globaldatasets - Default = False
workspacefilestore - Default = False
workspaceblobstore - Default = True

Datasets:
	 diabetes dataset version 1
	 Sample: Diabetes version 1


In [9]:
# Using the data

# Note: dataset_name still holds the last name listed by the loop above
tab_data_set = Dataset.get_by_name(ws, dataset_name)

# Take the first 20 rows and convert them to a pandas DataFrame
tab_data_set.take(20).to_pandas_dataframe()


Unnamed: 0,AGE,SEX,BMI,BP,S1,S2,S3,S4,S5,S6,Y
0,59,2,32.1,101.0,157,93.2,38.0,4.0,4.8598,87,151
1,48,1,21.6,87.0,183,103.2,70.0,3.0,3.8918,69,75
2,72,2,30.5,93.0,156,93.6,41.0,4.0,4.6728,85,141
3,24,1,25.3,84.0,198,131.4,40.0,5.0,4.8903,89,206
4,50,1,23.0,101.0,192,125.4,52.0,4.0,4.2905,80,135
5,23,1,22.6,89.0,139,64.8,61.0,2.0,4.1897,68,97
6,36,2,22.0,90.0,160,99.6,50.0,3.0,3.9512,82,138
7,66,2,26.2,114.0,255,185.0,56.0,4.55,4.2485,92,63
8,60,2,32.1,83.0,179,119.4,42.0,4.0,4.4773,94,110
9,29,1,30.0,85.0,180,93.4,43.0,4.0,5.3845,88,310


In [24]:
#Upload your own data

# default_ds.upload_files(files=['./data/diabetes.csv'], # Upload the diabetes csv files in /data
#                        target_path='diabetes-data/', # Put it in a folder path in the datastore
#                        overwrite=True, # Replace existing files of the same name
#                        show_progress=True)

In [10]:
# Registering the Dataset with the workspace

try:
    tab_data_set = tab_data_set.register(workspace=ws,
                                         name='diabetes dataset',
                                         description='diabetes data',
                                         tags={'format': 'CSV'},
                                         create_new_version=True)
    # Report success only when registration did not raise
    print('Datasets registered')
except Exception as ex:
    print(ex)

Datasets registered


In [11]:
print("Datasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)

Datasets:
	 diabetes dataset version 1
	 Sample: Diabetes version 1


# Experiment

- In Azure Machine Learning, an experiment is a named process, usually the running of a script or a pipeline, that can generate metrics and outputs and be tracked in the Azure Machine Learning workspace.

## Create Experiment

- An experiment can be run multiple times, with different data, code, or settings; and Azure Machine Learning tracks each run, enabling you to view run history and compare results for each run.

## The Experiment Run Context

- When you submit an experiment, you use its run context to initialize and end the experiment run that is tracked in Azure Machine Learning

- You can log and monitor every run in the experiment.


In [12]:
from azureml.core import Experiment
import pandas as pd

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace = ws, name = 'my-experiment')

# Start logging data from the experiment
run = experiment.start_logging()

# All your experiment code goes here!!!


### BLAH! BLAH! BLAH! ML STUFFF

print("Hello ML World!!")


# Complete the experiment
run.complete()

Hello ML World!!
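
A run started with `start_logging()` stays open until it is completed or failed. One way to make sure it is always closed is sketched below. `track` and `RecordingRun` are hypothetical names introduced for illustration: `RecordingRun` is a local stand-in that mimics only the `log`, `complete` and `fail` methods of an Azure ML run, so the sketch executes without a workspace. In the notebook you would pass the object returned by `experiment.start_logging()` instead.

```python
class RecordingRun:
    """Hypothetical stand-in for an Azure ML run; it only records calls."""

    def __init__(self):
        self.metrics = {}
        self.status = "Running"

    def log(self, name, value):
        self.metrics[name] = value

    def complete(self):
        self.status = "Completed"

    def fail(self, error_details=None):
        self.status = "Failed"


def track(run, work):
    """Call work(run), then mark the run completed, or failed if work raises."""
    try:
        result = work(run)
    except Exception as exc:
        run.fail(error_details=str(exc))
        raise
    run.complete()
    return result


def hello(run):
    # Experiment code goes here; log any metric you want to compare across runs.
    run.log("greeting_length", len("Hello ML World!!"))
    return "Hello ML World!!"


demo_run = RecordingRun()
print(track(demo_run, hello))
print(demo_run.status, demo_run.metrics)
```

Marking a failed run explicitly keeps the run history accurate, since an abandoned run would otherwise remain in the running state.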


# Training Your Model

In [19]:
import os

import joblib
import numpy as np
from azureml.core import Dataset, Experiment
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace=ws, name='my-mlseries-ep1')

# Start logging data from the experiment. In a notebook, keep using this run
# object; Run.get_context() is intended for submitted training scripts.
run = experiment.start_logging()

# dataset_name still holds the last name listed by the earlier loop
dataset = Dataset.get_by_name(ws, dataset_name)

# Get the training dataset
print("Loading Data...")
diabetes = dataset.to_pandas_dataframe()

# Separate features and labels
feature_columns = ['AGE', 'SEX', 'BMI', 'BP', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6']
X, y = diabetes[feature_columns].values, diabetes['Y'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Y is a numeric target, so LinearRegression is the natural fit; the
# LogisticRegression below treats every distinct Y value as a separate class.
# model = LinearRegression().fit(X_train, y_train)
model = LogisticRegression(solver="liblinear").fit(X_train, y_train)

print('Coefficients: \n', model.coef_)
# The mean squared error
mse = np.mean((model.predict(X_test) - y_test) ** 2)
print("Mean squared error: %.2f" % mse)
# score() is mean accuracy for LogisticRegression and R^2 for LinearRegression (1 is perfect)
print('Variance score: %.2f' % model.score(X_test, y_test))

run.log('MSE', mse)

os.makedirs('outputs_ep1', exist_ok=True)
# Files in ./outputs are uploaded automatically; this folder is not, so attach the file explicitly
joblib.dump(value=model, filename='outputs_ep1/diabetes_model.pkl')
run.upload_file(name='outputs_ep1/diabetes_model.pkl', path_or_stream='outputs_ep1/diabetes_model.pkl')

run.complete()

Loading Data...
Coefficients: 
 [[-1.30812930e-03 -3.49677716e-01  1.92253207e-02 ... -2.42169882e-01
   6.90486870e-02 -5.47646928e-02]
 [ 9.07457882e-02 -1.53463757e-01  2.67652342e-01 ... -2.31621581e-01
   1.66079678e-01 -1.55542186e-02]
 [-4.57926170e-02 -7.96261592e-02  5.10937059e-01 ...  1.25280503e-01
   1.50677663e-01 -1.86929097e-01]
 ...
 [-7.55766174e-06  1.22519490e-01  4.34938667e-01 ... -5.60217704e-01
  -3.63281208e-01 -7.12092734e-03]
 [-1.38965324e-03 -8.45701155e-03  3.94295968e-01 ...  5.45584314e-02
  -1.25744107e-02  8.95632505e-02]
 [-1.79064130e-01 -2.58587522e-02  6.56715977e-01 ...  2.89150695e-02
   2.28028895e-02  1.00368271e-01]]
Mean squared error: 6030.80
Variance score: 0.01
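
The mean squared error and the score printed above measure different things. As a worked example in plain Python, using made-up values rather than the diabetes data, the mean squared error and R² (the coefficient of determination) can be computed by hand. Note that `score()` returns R² only for regressors such as `LinearRegression`; for a classifier such as `LogisticRegression` it returns mean accuracy.

```python
# Made-up targets and predictions, purely for illustration
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

n = len(y_true)

# Mean squared error: average of the squared residuals
toy_mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n

# R^2 = 1 - SS_res / SS_tot, where SS_tot measures the spread around the mean target
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for p, t in zip(y_pred, y_true))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
toy_r2 = 1 - ss_res / ss_tot

print("MSE: %.4f" % toy_mse)  # MSE: 1.6667
print("R^2: %.3f" % toy_r2)   # R^2: 0.375
```

An R² close to 1 means the predictions explain most of the variation in the target, while a value near 0 means they do little better than always predicting the mean.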


# Clean up resources


In [None]:
# Delete the Resource Group to delete all ml related resources in it
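
The resource group can be deleted from the Azure portal. As a sketch of cleaning up from the notebook instead, assuming the `ws` object and the `cpu-cluster` name from the earlier cells: `clean_up` is a hypothetical helper written for this example, not part of the SDK. It relies on `ws.compute_targets`, the `delete()` method of a compute target, and `Workspace.delete(delete_dependent_resources=...)`. It does not delete the resource group itself.

```python
def clean_up(ws, cluster_name, delete_workspace=False):
    """Delete a compute cluster and, optionally, the workspace it belongs to.

    ws is expected to be an azureml.core.Workspace; clean_up is a hypothetical
    helper written for this sketch.
    """
    deleted = []

    # ws.compute_targets maps compute names to compute target objects
    cluster = ws.compute_targets.get(cluster_name)
    if cluster is not None:
        cluster.delete()
        deleted.append(cluster_name)

    if delete_workspace:
        # Also removes dependent resources such as the storage account and key vault
        ws.delete(delete_dependent_resources=True, no_wait=False)
        deleted.append("workspace")

    return deleted


# Example call, commented out because it permanently removes resources:
# clean_up(ws, "cpu-cluster", delete_workspace=True)
```

Deleting the cluster first is optional when the whole workspace is removed, but it is useful on its own when you want to keep the workspace and stop paying for compute.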


# Summary of the notebook

Learn how to use Azure Machine Learning services for experimentation and model management.

As a pre-requisite, run the [configuration notebook](../configuration.ipynb) first to set up your Azure ML Workspace. Then run the notebooks in the following recommended order.

* [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as web service to Azure Container Instance.
* [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed 
 
Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).


# Get Certified as Microsoft Azure Data Scientist Associate!

## DP-100: Designing and Implementing a Data Science Solution on Azure

- Candidates for the Azure Data Scientist Associate certification should have subject matter expertise applying data science and machine learning to implement and run machine learning workloads on Azure. Responsibilities for this role include planning and creating a suitable working environment for data science workloads on Azure, running data experiments, training predictive models, and managing, optimizing and deploying machine learning models into production. A candidate for this certification should have knowledge and experience in data science and in using Azure Machine Learning and Azure Databricks.

- More info :  https://docs.microsoft.com/en-us/learn/certifications/exams/dp-100


