# How to Communicate with Azure ML from R

## 1 Introduction

This notebook demonstrates how the AzureML package can be used for the following tasks:
  * List, download, upload, and delete datasets in an Azure Machine Learning (Azure ML) workspace
  * Read intermediate data from an Azure ML experiment
  * Consume web services in a straightforward way

Readers should have a basic understanding of [Azure Machine Learning Studio][studio link]. Specifically, you should:
 * Know how to get the workspace ID and authorization token
 * Understand how to set up web services on Azure
 
If you are completely new to Azure ML, the [Tutorial for Data Scientists][tutorial link] can help you get started.

[studio link]: https://studio.azureml.net/
[tutorial link]: https://gallery.cortanaanalytics.com/Experiment/Tutorial-for-Data-Scientists-3

## 2 Work with Azure ML workspace

The AzureML package allows users to work with workspaces directly. Specifically, with this package users can list, download, upload, and delete datasets in an Azure ML workspace.

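### 2.1 Connect to an Azure ML workspace
We'll start by loading the library and setting up a connection to an Azure ML workspace. Called without arguments, workspace() looks up the workspace ID and authorization token in a settings file (by default ~/.azureml/settings.json); both can also be passed explicitly through the id and auth arguments.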

In [1]:
# load the library
require(AzureML)

# workspace information
ws <- workspace()

Loading required package: AzureML
Warning message: package 'AzureML' was built under R version 3.2.3

### 2.2 List datasets
The *datasets* element of a workspace object holds information about all the datasets in the workspace, including the default datasets from Microsoft.

In [2]:
# list first several datasets in my workspace
head(cbind(Name = ws$datasets$Name, DataType = ws$datasets$DataTypeId))

Name,DataType
Bill Gates RGB mod,Dataset
Office365 Faked Training Data.csv,GenericCSV
Office365 Training Data.csv,GenericCSV
text.preprocessing.zip,Zip
fraudTemplateUtil.zip,Zip
Sample Named Entity Recognition Articles,GenericTSVNoHeader
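
Because the listing is an ordinary data frame, it can be filtered with base R. The sketch below is only an illustration: `datasets_demo` is a small hand-made data frame (hypothetical, standing in for `ws$datasets`) with the same `Name` and `DataTypeId` columns, and the filter keeps only the CSV datasets.

```r
# Stand-in for ws$datasets (hypothetical demo data with the same two columns)
datasets_demo <- data.frame(
  Name = c("Office365 Training Data.csv",
           "text.preprocessing.zip",
           "Sample Named Entity Recognition Articles"),
  DataTypeId = c("GenericCSV", "Zip", "GenericTSVNoHeader"),
  stringsAsFactors = FALSE
)

# keep only the datasets stored as CSV
csv_datasets <- datasets_demo[datasets_demo$DataTypeId == "GenericCSV", ]
print(csv_datasets$Name)
```

With a live connection, replacing `datasets_demo` with `ws$datasets` should give the same kind of result.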


### 2.3 Download a dataset
To download a dataset, we can pass its name to the download.datasets() function.

In [3]:
# download a dataset by name (the result holds pixel coordinates and RGB values)
rgb_image <- download.datasets(ws, name = "Bill Gates RGB Image", quote = "\"")
head(rgb_image)

,X,Y,R,G,B
1,0,0,123,167,214
2,0,1,103,146,189
3,0,2,63,101,140
4,0,3,18,51,82
5,0,4,0,20,44
6,0,5,0,13,29


### 2.4 Upload a dataset
We'll use the airquality dataset that comes with base R to show how a dataset can be uploaded. Note that if a dataset with the same name already exists in the workspace, an error is reported.

In [4]:
airquality[1:10,]

,Ozone,Solar.R,Wind,Temp,Month,Day
1,41.0,190.0,7.4,67,5,1
2,36.0,118.0,8.0,72,5,2
3,12.0,149.0,12.6,74,5,3
4,18.0,313.0,11.5,62,5,4
5,,,14.3,56,5,5
6,28.0,,14.9,66,5,6
7,23.0,299.0,8.6,65,5,7
8,19.0,99.0,13.8,59,5,8
9,8.0,19.0,20.1,61,5,9
10,,194.0,8.6,69,5,10


In [5]:
# uploading R data frame to Azure ML workspace
mydata <- airquality[1:10,]
# information about the uploaded dataset in the workspace will be returned
upload.dataset(mydata, ws, name = "my air quality") 

,VisualizeEndPoint,SchemaEndPoint,SchemaStatus,Id,DataTypeId,Name,Description,FamilyId,ResourceUploadId,SourceOrigin,ellip.h,PromotedFrom,UploadedFromFilename,ServiceVersion,IsLatest,Category,DownloadLocation,IsDeprecated,Culture,Batch,CreatedDateTicks
1,NANA,NANANA,Pending,b97064acf29d45d0a1c8b951c9dfca73.1c29d4b6c5e34ee9b9590e990d144a57.v1-default-8,GenericTSV,my air quality,,1c29d4b6c5e34ee9b9590e990d144a57,57ab2e61d56b4828ba7c3033ec4facf4,FromResourceUpload,<8b>,,,0,True,,https://esprodussouthsus.blob.core.windows.net/uploadedresources/1FD65_b97064acf29d45d0a1c8b951c9dfca73_57ab2e61d56b4828ba7c3033ec4facf4.tsv?sv=2015-02-21&sr=b&sig=cReCfTMpWS3Yd54kTu6HmvVxG8nGZB%2FWoOtEVdZ8Pg4%3D&st=2016-03-11T21%3A48%3A52Z&se=2016-03-12T21%3A53%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22my%20air%20quality.tsv%22,False,default,8,6.359333e+17


In [6]:
# download to check its content
head(download.datasets(ws, name = "my air quality"))

,Ozone,Solar.R,Wind,Temp,Month,Day
1,41.0,190.0,7.4,67,5,1
2,36.0,118.0,8.0,72,5,2
3,12.0,149.0,12.6,74,5,3
4,18.0,313.0,11.5,62,5,4
5,,,14.3,56,5,5
6,28.0,,14.9,66,5,6
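
Since uploading under an existing name reports an error, it can help to check the name first. The following is a minimal sketch, not part of the AzureML package: `dataset_exists()` is a hypothetical helper, and `existing` is made-up demo data standing in for `ws$datasets$Name`.

```r
# TRUE if `name` already appears in a character vector of dataset names
dataset_exists <- function(dataset_names, name) {
  name %in% dataset_names
}

# demo with hypothetical names; with a live workspace, pass ws$datasets$Name
existing <- c("my air quality", "Office365 Training Data.csv")
print(dataset_exists(existing, "my air quality"))
print(dataset_exists(existing, "my new dataset"))

# intended use (not run here, needs a workspace connection):
# if (!dataset_exists(ws$datasets$Name, "my air quality")) {
#   upload.dataset(mydata, ws, name = "my air quality")
# }
```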


### 2.5 Delete a dataset
To delete a dataset, pass its name to delete.datasets(). If the deletion succeeds, the Deleted column of the returned data frame is *TRUE*.

In [7]:
# delete dataset
delete.datasets(ws, name = "my air quality")

,Name,Deleted,status_code
1,my air quality,True,204
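
The returned data frame can also be checked programmatically. The sketch below assumes the result has one row per dataset with the `Name`, `Deleted`, and `status_code` columns shown above; `result` is hypothetical demo data (including the failed row and its status code), not real output.

```r
# Stand-in for the data frame returned by delete.datasets()
# (hypothetical demo values; the second row simulates a failed deletion)
result <- data.frame(
  Name = c("my air quality", "another dataset"),
  Deleted = c(TRUE, FALSE),
  status_code = c(204, 404),
  stringsAsFactors = FALSE
)

# names of the datasets that were not deleted
failed <- result$Name[!result$Deleted]
if (length(failed) > 0) {
  message("Not deleted: ", paste(failed, collapse = ", "))
}
```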


## 3 Work with experiments
The AzureML package allows users to get a summary of the existing experiments and to download the intermediate datasets.

### 3.1 List existing experiments
The *experiments* element of the workspace holds information about all experiments in the workspace, including the default ones from Microsoft.

In [8]:
# experiments
exps <- ws$experiments
head(cbind(Description = exps$Description, ExperimentId = exps$ExperimentId))

Description,ExperimentId
Office365 Service Request Routing [Predictive Exp.],b97064acf29d45d0a1c8b951c9dfca73.f-id.02d178f6468847479dce7d068c0bba24
Auto-featurization: Churn Prediction on KDDCup2015 Dataset,b97064acf29d45d0a1c8b951c9dfca73.f-id.2b2af538302949eb83c0168fc8512fd5
Office365 Service Request Routing,b97064acf29d45d0a1c8b951c9dfca73.f-id.9bd7cb09edfe4e18842d9a0cc02c3667
My 1st Experiment - Copy,b97064acf29d45d0a1c8b951c9dfca73.f-id.be27989c76354b048cd3c1716a0915a1
My 1st Experiment,b97064acf29d45d0a1c8b951c9dfca73.f-id.d9a71c1a24fb4e61ae52a54f7bd338f6
My 1st Experiment [Scoring Exp.],b97064acf29d45d0a1c8b951c9dfca73.f-id.e91c150cda25431f9125e4e2db4f8f1b


You can also filter the list by calling the experiments() function with the "filter" argument; for example, filter = "samples" returns only the sample experiments.

In [9]:
# check sample experiments
e <- experiments(ws, filter = "samples")
head(e$Creator)
head(cbind(e$Description, e$ExperimentId))

0,1
"Sample 6: Train, Test, Evaluate for Regression: Auto Imports Dataset",506153734175476c4f62416c57734963.f-id.080a00ea09564d1d9aa40761a3ad2bc6
"Text Classification: Step 2 of 5, text preprocessing",506153734175476c4f62416c57734963.f-id.081f01e00eeb4eb6b817054d855cb7e9
Quantile Regression: Car price prediction,506153734175476c4f62416c57734963.f-id.2475eba8bba24cc1b41275d0dc933f7e
Multiclass Classification: News categorization,506153734175476c4f62416c57734963.f-id.25f9e9bec227445aaedeb29f791b4f32
Neural Network: Basic convolution,506153734175476c4f62416c57734963.f-id.27751df494e443779d9a1168543a5734
"Text Classification: Step 3B of 5, unigrams TF-IDF feature extraction",506153734175476c4f62416c57734963.f-id.2ab14cb54ca24ae8aef4ea3e6b93871c


### 3.2 Download intermediate data
We can also download intermediate data from an experiment. To do this we need four pieces of information: the experiment ID, node_id, port_name, and data_type_id. After creating an Azure ML experiment, we can use the "Convert to CSV" module to convert the data. Then we can right-click the output port of the module and select "Generate Data Access Code..." (as shown in [Figure 1][figure1 link]). [Figure 2][figure2 link] shows the automatically generated code.

[![Figure 1][figure1 link]][figure1 link] Figure 1

[![Figure 2][figure2 link]][figure2 link] Figure 2

[figure1 link]: https://cloud.githubusercontent.com/assets/9322661/11898668/91a91c00-a567-11e5-9f78-dcd386344187.PNG
[figure2 link]: https://cloud.githubusercontent.com/assets/9322661/13715845/6e6ecaac-e7a5-11e5-8553-1703ab97614e.PNG
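
For readers who cannot open the figures, the generated code is roughly of the following shape. This is a hedged sketch rather than the exact generated text: `fetch_intermediate()` is a hypothetical wrapper, the IDs in the usage comment are placeholders, and the port name and data type shown are assumptions that match a "Convert to CSV" output. The code generated by Studio gives the actual values.

```r
# Sketch only: wraps download.intermediate.dataset() from the AzureML package.
# The port name and data type are assumed values for a "Convert to CSV" output.
fetch_intermediate <- function(ws, experiment_id, node_id) {
  AzureML::download.intermediate.dataset(
    ws = ws,
    experiment = experiment_id,
    node_id = node_id,
    port_name = "Results dataset",
    data_type_id = "GenericCSV"
  )
}

# intended use (not run here, needs the AzureML package and a workspace):
# ws <- AzureML::workspace()
# ds <- fetch_intermediate(ws, "<experiment id>", "<node id>")
# head(ds)
```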

## 4 A concise way of consuming a web service
The AzureML package also offers a concise way to consume a web service. All you need to provide is the web service ID and the workspace information. Then consume() can be used to call the service from any R terminal (as long as you have internet access).

For illustration, we'll fit a linear model and deploy a web service based on the model.

In [10]:
# load the library
require(MASS)

# fit a model using all variables except medv as predictors
lm1 <- lm(medv~., data = Boston)

# define predict function
mypredict <- function(newdata)
{
  res <- predict(lm1, newdata)
  res
}

# test the prediction function
newdata <- Boston[1, 1:13]
print(mypredict(newdata))

# Publish the service
ep <- publishWebService(ws = ws, fun = mypredict, name = "HousePricePrediction", inputSchema = newdata)

Loading required package: MASS


       1 
30.00384 


After deploying a web service, we can retrieve the web service ID and save it for future use.

In [11]:
# save workspace ID and authorization token
ws_id <- ws$id
ws_auth <- ws$.auth
# save web service ID
service_id <- ep$WebServiceId
# define the workspace; this is necessary if you are running outside of the service deployment session
ws <- workspace(
   id = ws_id,
   auth = ws_auth
)
# define the endpoint based on the workspace and service ID information
ep_price_pred <- endpoints(ws, service_id)
# consume
consume(ep_price_pred, newdata)

,ans
1,30.00384


Once the workspace ID, authorization token, and service_id have been saved, the workspace(), endpoints(), and consume() calls above can be run from any R terminal to consume the web service.
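
A standalone session might be sketched as follows. It reuses only the calls from the cell above; the environment variable names (`AZUREML_WS_ID`, `AZUREML_WS_AUTH`, `AZUREML_SERVICE_ID`) are hypothetical and simply stand in for wherever the saved values are kept. The guarded block does nothing unless all three values are set and the required packages are installed.

```r
# Sketch only: the environment variable names below are hypothetical.
ws_id      <- Sys.getenv("AZUREML_WS_ID")
ws_auth    <- Sys.getenv("AZUREML_WS_AUTH")
service_id <- Sys.getenv("AZUREML_SERVICE_ID")

# TRUE only when all three saved values are available
have_settings <- all(nzchar(c(ws_id, ws_auth, service_id)))

if (have_settings &&
    requireNamespace("AzureML", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(AzureML)
  # rebuild the workspace and endpoint from the saved values
  ws <- workspace(id = ws_id, auth = ws_auth)
  ep_price_pred <- endpoints(ws, service_id)
  # same single-row input used when the service was published
  newdata <- MASS::Boston[1, 1:13]
  print(consume(ep_price_pred, newdata))
} else {
  message("Saved settings or required packages are missing; nothing was called.")
}
```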

## 5 Conclusion
The AzureML package makes it possible to communicate with Azure ML workspaces and experiments from R. This notebook demonstrated how you can accomplish some of the most common tasks.

---  
Created by a Microsoft Employee.  
Copyright (C) Microsoft. All Rights Reserved.