# A Review on the Upcoming AzureML R Package
Lixun Zhang  
Dec 21, 2015

A new version of the AzureML R package, currently hosted on [GitHub][AzureML link], will soon replace version 0.1.1 hosted on [CRAN][CRAN link]. The update brings many improvements. This notebook demonstrates three new capabilities of the updated package:
  * List, download, upload, and delete datasets in an Azure ML workspace
  * Read intermediate data from an Azure ML experiment
  * Consume web services in a concise way

The target audience should have a basic understanding of the [Azure Machine Learning studio][studio link]. Specifically, you should
 * Know how to get the workspace ID and authorization token 
 * Understand setting up web services on Azure
 
If you are completely new to Azure ML, the [Tutorial for Data Scientists][tutorial link] can help you get started.
All results shown here are from my own Azure ML workspace. You should enter your own workspace ID and authorization token when running the code.

[AzureML link]: https://github.com/RevolutionAnalytics/AzureML
[CRAN link]: https://cran.r-project.org/web/packages/AzureML/index.html
[studio link]: https://studio.azureml.net/
[tutorial link]: https://gallery.cortanaanalytics.com/Experiment/Tutorial-for-Data-Scientists-3

## 0 Install the package
Since the new version of the AzureML R package is not on CRAN yet, we need to install it directly from GitHub with the following commands.

In [1]:
if(!require("devtools")) install.packages("devtools")
devtools::install_github("RevolutionAnalytics/azureml")

Loading required package: devtools
Downloading GitHub repo RevolutionAnalytics/azureml@master
Installing AzureML
"C:/PROGRA~1/R/R-32~1.2/bin/x64/R" --no-site-file --no-environ --no-save  \
  --no-restore CMD INSTALL  \
  "C:/Users/lixzhan/AppData/Local/Temp/Rtmpw1NIe8/devtools2438580831ef/RevolutionAnalytics-AzureML-5f25624"  \
  --library="C:/Users/lixzhan/Documents/R/win-library/3.2" --install-tests 



## 1 Work with Workspace

The new version of the AzureML R package allows users to work with workspaces directly. Specifically, users can list, download, upload, and delete datasets in an Azure ML workspace.

### 1.1 Connect with AzureML workspace
We'll start by loading the library and setting up a connection to an AzureML workspace.

In [2]:
# load the library
library(AzureML)

paste("AzureML package version:", packageVersion("AzureML"))

# workspace information
ws <- workspace(
  id = "b2bbeb56a1d04e1599d2510a06c59d87",
  auth = "a3978d933cd84e64ab583a616366d160", 
  api_endpoint = "https://studio.azureml.net", 
  management_endpoint = "https://management.azureml.net"
)
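
The cell above passes the workspace ID and authorization token as literal strings. As a minimal sketch of an alternative, the values could be read from environment variables so that they stay out of the notebook. The helper `read_credentials()` and the variable names `AZUREML_WORKSPACE_ID` and `AZUREML_AUTH_TOKEN` are assumptions made for this illustration; they are not part of the AzureML package.

```r
# Hypothetical helper: read workspace credentials from environment variables.
# The variable names are assumptions; use whatever fits your setup.
read_credentials <- function(id_var = "AZUREML_WORKSPACE_ID",
                             auth_var = "AZUREML_AUTH_TOKEN") {
  creds <- list(id = Sys.getenv(id_var), auth = Sys.getenv(auth_var))
  # Sys.getenv() returns "" for unset variables, so flag the empty ones
  missing <- names(creds)[!nzchar(unlist(creds))]
  if (length(missing) > 0) {
    stop("Missing credential(s): ", paste(missing, collapse = ", "))
  }
  creds
}

# Usage (not run here, since it needs a real workspace):
# creds <- read_credentials()
# ws <- workspace(
#   id = creds$id,
#   auth = creds$auth,
#   api_endpoint = "https://studio.azureml.net",
#   management_endpoint = "https://management.azureml.net"
# )
```

The helper stops with an error that names the missing value, which is usually easier to diagnose than a failed request later on.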

### 1.2 List datasets
The *datasets* attribute of a workspace object contains information about all the datasets in the workspace, including the default datasets from Microsoft.

In [3]:
# list first several datasets in my workspace
head(ws$datasets) # alternatively: head(datasets(ws))

Unnamed: 0,VisualizeEndPoint,SchemaEndPoint,SchemaStatus,Id,DataTypeId,Name,Description,FamilyId,ResourceUploadId,SourceOrigin,ellip.h,PromotedFrom,UploadedFromFilename,ServiceVersion,IsLatest,Category,DownloadLocation,IsDeprecated,Culture,Batch,CreatedDateTicks
1,NANA,NANANA,NotSupported,b2bbeb56a1d04e1599d2510a06c59d87.1ee9283bcb0e4affa293bf5230c69c33.v1-default-3,Zip,mypickfile.zip,,1ee9283bcb0e4affa293bf5230c69c33,b6a37b4f72ba4a86b682088b6f518311,FromResourceUpload,<8b>,,mypickfile.zip,0,True,,https://gallerystorage.blob.core.windows.net/uploadedresources/b6a37b4f72ba4a86b682088b6f518311.zip?sv=2014-02-14&sr=b&sig=Pdsja4dwoS%2BQ1ADXVpLOPjh%2BJNWfqnjYuywNDVFUP8o%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22mypickfile.zip%22,False,default,3,6.357968e+17
2,NANA,NANANA,NotSupported,b2bbeb56a1d04e1599d2510a06c59d87.5f53199900c54e81b2ef16587eb15138.v1-default-2,Zip,Hello.zip,,5f53199900c54e81b2ef16587eb15138,673a72ee8bd54478b0fa95346f78bbd0,FromResourceUpload,<8b>,,Hello.zip,0,True,,https://gallerystorage.blob.core.windows.net/uploadedresources/673a72ee8bd54478b0fa95346f78bbd0.zip?sv=2014-02-14&sr=b&sig=%2FhIe%2F7v4m76gpGlwrGAOB98IMoQqnlYPoOKtxK5%2Bn7c%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22Hello.zip%22,False,default,2,6.357968e+17
3,NANA,NANANA,NotSupported,b2bbeb56a1d04e1599d2510a06c59d87.e81303a54c204ad7a226820e7fd66b2f.v1-default-1,Zip,MyHello.zip,,e81303a54c204ad7a226820e7fd66b2f,35064df5e02e4038b062fb1f0d7e6579,FromResourceUpload,<8b>,,MyHello.zip,0,True,,https://gallerystorage.blob.core.windows.net/uploadedresources/35064df5e02e4038b062fb1f0d7e6579.zip?sv=2014-02-14&sr=b&sig=M2onKFvu7oobpbIetLgB1oUbjPOYpOnF%2FbDU%2Brt1IkY%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22MyHello.zip%22,False,default,1,6.357968e+17
4,https://gallerystorage.blob.core.windows.net/?sv=2014-02-14&sr=b&sig=emIqhHIKh1nzuYJswVlVx%2FrY75aYQidBGAyN%2FFldiSM%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r,https://gallerystorage.blob.core.windows.net/experimentoutput/aa8a3bab-b4d4-4bf3-9612-6e9cd1bf978d/be46851c-401c-44f3-9324-c23ef89ec886?sv=2014-02-14&sr=b&sig=lTprbo4xez3RSsOYZz9TmRjtnaYHmiYBbVJt4pb%2FMcA%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r,Complete,b2bbeb56a1d04e1599d2510a06c59d87.1d7fa753dd3c44638e17e84b24775b79.v1-default-32,GenericTSV,a.zip,,1d7fa753dd3c44638e17e84b24775b79,ef852bb3f8df42d980e27718f031e18b,FromResourceUpload,<8b>,,,0,False,,https://gallerystorage.blob.core.windows.net/uploadedresources/ef852bb3f8df42d980e27718f031e18b.tsv?sv=2014-02-14&sr=b&sig=KwC9H8piIpMQS0G%2BZCNqrrKkQTZhHJTcGvxIfSZ4O%2Fw%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22a.zip.tsv%22,False,default,32,6.35864e+17
5,https://gallerystorage.blob.core.windows.net/?sv=2014-02-14&sr=b&sig=AgotAfif812iMjFuPcwA1%2FyCrdIUqpM7bCXGMR3Pras%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r,https://gallerystorage.blob.core.windows.net/experimentoutput/392626fe-e6a6-4696-aad4-f861ff97cca1/b838f061-06d9-47be-9fe6-6024e4e25ea0?sv=2014-02-14&sr=b&sig=o8LroajRrPV5uSZ3cyFC33hnykTsiYlYap4M5EWZtxY%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r,Complete,b2bbeb56a1d04e1599d2510a06c59d87.3ebf31deebc74946b0958f9399b64500.v1-default-33,GenericTSV,myairquality,,3ebf31deebc74946b0958f9399b64500,60b9855784ab4de5a75fcfc79a11572d,FromResourceUpload,<8b>,,,0,False,,https://gallerystorage.blob.core.windows.net/uploadedresources/60b9855784ab4de5a75fcfc79a11572d.tsv?sv=2014-02-14&sr=b&sig=E3EWsEIxdkZEzEa5ibOLWWgkc%2BR%2BqpRDwT7o0L%2Bhzzs%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22myairquality.tsv%22,False,default,33,6.35864e+17
6,NANA,NANANA,NotSupported,506153734175476c4f62416c57734963.7bbb260f62084435bc0a39d085242d8a.v1-default-653,Zip,text.preprocessing.zip,Utility R script for text preprocessing to use with text classification template,7bbb260f62084435bc0a39d085242d8a,78f7b04e9c5043b187eb9ace35b660b5,FromResourceUpload,<8b>,,,2,True,,https://esprodussouth001.blob.core.windows.net/uploadedresources/78f7b04e9c5043b187eb9ace35b660b5.zip?sv=2014-02-14&sr=b&sig=acKTc191DspB1gfL7BhgTD4p2HMLbwk1hKpmAJ%2B%2FQxU%3D&st=2015-12-22T20%3A23%3A52Z&se=2015-12-23T20%3A28%3A52Z&sp=r&rscd=attachment%3B%20filename%3D%22text.preprocessing.zip%22,False,default,653,6.356413e+17


### 1.3 Download a dataset
To download a dataset we can use the download.datasets() function.

In [4]:
# download datasets
rgb_image <- download.datasets(ws, name = "Bill Gates RGB Image", quote = "\"")
head(rgb_image)

Unnamed: 0,X,Y,R,G,B
1,0,0,123,167,214
2,0,1,103,146,189
3,0,2,63,101,140
4,0,3,18,51,82
5,0,4,0,20,44
6,0,5,0,13,29


### 1.4 Upload a dataset
We'll use the air quality dataset that comes with base R to show how a dataset can be uploaded. Note that if a dataset with the same name already exists in the workspace, an error will be reported.

In [5]:
airquality[1:10,]

Unnamed: 0,Ozone,Solar.R,Wind,Temp,Month,Day
1,41.0,190.0,7.4,67,5,1
2,36.0,118.0,8.0,72,5,2
3,12.0,149.0,12.6,74,5,3
4,18.0,313.0,11.5,62,5,4
5,,,14.3,56,5,5
6,28.0,,14.9,66,5,6
7,23.0,299.0,8.6,65,5,7
8,19.0,99.0,13.8,59,5,8
9,8.0,19.0,20.1,61,5,9
10,,194.0,8.6,69,5,10


In [6]:
# uploading R data frame to Azure ML workspace
mydata <- airquality[1:10,]
# information about the uploaded dataset in the workspace will be returned
upload.dataset(mydata, ws, name = "my air quality") 

Unnamed: 0,VisualizeEndPoint,SchemaEndPoint,SchemaStatus,Id,DataTypeId,Name,Description,FamilyId,ResourceUploadId,SourceOrigin,ellip.h,PromotedFrom,UploadedFromFilename,ServiceVersion,IsLatest,Category,DownloadLocation,IsDeprecated,Culture,Batch,CreatedDateTicks
1,NANA,NANANA,Pending,b2bbeb56a1d04e1599d2510a06c59d87.f505ad7ad4384008b1aeffd8c78bc648.v1-default-34,GenericTSV,my air quality,,f505ad7ad4384008b1aeffd8c78bc648,e29e15c1c89c40308be4ff92fa60d51f,FromResourceUpload,<8b>,,,0,True,,https://gallerystorage.blob.core.windows.net/uploadedresources/e29e15c1c89c40308be4ff92fa60d51f.tsv?sv=2014-02-14&sr=b&sig=d4v1fB4Jz143lWHqDfPpnXg9hZ7fakw%2FDzYqUhssmpE%3D&st=2015-12-22T20%3A23%3A57Z&se=2015-12-23T20%3A28%3A57Z&sp=r&rscd=attachment%3B%20filename%3D%22my%20air%20quality.tsv%22,False,default,34,6.358641e+17
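
Because uploading under an existing name reports an error, it can help to check the name first. The sketch below uses a mock vector of names copied from the listing in section 1.2; `name_is_free()` is a hypothetical helper, and whether the service compares names case-sensitively is not something this notebook establishes.

```r
# Mock of the Name column from a datasets listing
# (values copied from the output in section 1.2).
existing_names <- c("mypickfile.zip", "Hello.zip", "MyHello.zip",
                    "a.zip", "myairquality")

# Hypothetical helper: TRUE when no existing dataset uses this exact name.
name_is_free <- function(name, existing) !(name %in% existing)

name_is_free("my air quality", existing_names)
name_is_free("myairquality", existing_names)

# With a live workspace the check could guard the upload (not run here):
# if (name_is_free("my air quality", ws$datasets$Name)) {
#   upload.dataset(mydata, ws, name = "my air quality")
# }
```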


In [7]:
# download to check its content
head(download.datasets(ws, name = "my air quality"))

Unnamed: 0,Ozone,Solar.R,Wind,Temp,Month,Day
1,41.0,190.0,7.4,67,5,1
2,36.0,118.0,8.0,72,5,2
3,12.0,149.0,12.6,74,5,3
4,18.0,313.0,11.5,62,5,4
5,,,14.3,56,5,5
6,28.0,,14.9,66,5,6


### 1.5 Delete a dataset
If the delete action is successful the returned status value for Deleted should be *TRUE*.

In [8]:
# delete dataset
delete.datasets(ws, name = "my air quality")

Request failed with status 400. Waiting 1 seconds before retry
.

Request failed with status 400. Waiting 3 seconds before retry
...



Unnamed: 0,Name,Deleted,status_code
1,my air quality,True,204


The "Airport Codes Dataset" is one of the default datasets in Azure ML. This example shows that the default datasets cannot be deleted.

In [9]:
# delete Azure sample dataset: not allowed
delete.datasets(ws, name = "Airport Codes Dataset")

Request failed with status 400. Waiting 1 seconds before retry
.

Request failed with status 400. Waiting 3 seconds before retry
...

Request failed with status 400. Waiting 2 seconds before retry
..

Request failed with status 400. Waiting 6 seconds before retry
......

Request failed with status 400. Waiting 18 seconds before retry
..................



Unnamed: 0,Name,Deleted,status_code
1,Airport Codes Dataset,False,400
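
Since the outcome is reported in the returned data frame rather than raised as an error, a script may want to inspect the `Deleted` column explicitly. The sketch below works on a mock data frame built from the two results shown above; the column types and the helper `failed_deletes()` are assumptions for illustration.

```r
# Mock of the two results shown above; column types are assumed.
res <- data.frame(
  Name = c("my air quality", "Airport Codes Dataset"),
  Deleted = c(TRUE, FALSE),
  status_code = c(204L, 400L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the datasets that were not deleted.
failed_deletes <- function(res) res$Name[!res$Deleted]

failed <- failed_deletes(res)
if (length(failed) > 0) {
  message("Not deleted: ", paste(failed, collapse = ", "))
}
```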


## 2 Work with experiments
The new version of the AzureML package allows users to get a summary of the existing experiments and to download the intermediate datasets.

### 2.1 List existing experiments
Information for all experiments in the workspace, including the default ones from Microsoft, can be returned. 

In [10]:
# experiments
exps <- ws$experiments
head(cbind(Description = exps$Description, ExperimentId = exps$ExperimentId, Creator = exps$Creator))

Description,ExperimentId,Creator
Python Code Web Service - This version doesn't work,b2bbeb56a1d04e1599d2510a06c59d87.f-id.0121548866a645d9bb43b0bec5977a09,lxzhang
Check sklearn version,b2bbeb56a1d04e1599d2510a06c59d87.f-id.5406f11d120649b48f7644ec19c22b88,lxzhang
Python Call Script,b2bbeb56a1d04e1599d2510a06c59d87.f-id.75772735de634c1fb353e636ba170ee9,lxzhang
Data for Jupyter Notebooks,b2bbeb56a1d04e1599d2510a06c59d87.f-id.911630d13cbe4407b9fe408b5bb6ddef,lxzhang
Python Code Web Service - This version works,b2bbeb56a1d04e1599d2510a06c59d87.f-id.9ccc3e3b1942476ca1b7d75e90837861,lxzhang
Experiment created on 12/22/2015,b2bbeb56a1d04e1599d2510a06c59d87.f-id.d6045275482d4b3a81215ed023125d96,lxzhang
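
In this output every ExperimentId starts with my workspace ID, followed by `.f-id.` and a second identifier. Assuming that pattern holds in general (it is inferred from the output in this notebook, not from documented behavior), the prefix can be used to tell my own experiments from those owned by another workspace, such as the Microsoft samples. The second ID below is taken from the sample experiments listed later in this section.

```r
ids <- c(
  "b2bbeb56a1d04e1599d2510a06c59d87.f-id.0121548866a645d9bb43b0bec5977a09",
  "506153734175476c4f62416c57734963.f-id.080a00ea09564d1d9aa40761a3ad2bc6"
)

# Split each ID on the literal separator ".f-id." and keep the first part.
parts <- strsplit(ids, ".f-id.", fixed = TRUE)
owner <- vapply(parts, `[`, character(1), 1)

my_workspace_id <- "b2bbeb56a1d04e1599d2510a06c59d87"
data.frame(owner = owner, mine = owner == my_workspace_id)
```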


You can also filter by using the experiments() function with the "filter" argument.

In [11]:
# check sample experiments
e <- experiments(ws, filter = "samples")
head(e$Creator)
head(cbind(e$Description, e$ExperimentId))

0,1
"Sample 6: Train, Test, Evaluate for Regression: Auto Imports Dataset",506153734175476c4f62416c57734963.f-id.080a00ea09564d1d9aa40761a3ad2bc6
"Text Classification: Step 2 of 5, text preprocessing",506153734175476c4f62416c57734963.f-id.081f01e00eeb4eb6b817054d855cb7e9
Quantile Regression: Car price prediction,506153734175476c4f62416c57734963.f-id.2475eba8bba24cc1b41275d0dc933f7e
Multiclass Classification: News categorization,506153734175476c4f62416c57734963.f-id.25f9e9bec227445aaedeb29f791b4f32
Neural Network: Basic convolution,506153734175476c4f62416c57734963.f-id.27751df494e443779d9a1168543a5734
"Text Classification: Step 3B of 5, unigrams TF-IDF feature extraction",506153734175476c4f62416c57734963.f-id.2ab14cb54ca24ae8aef4ea3e6b93871c


### 2.2 Download intermediate data
We can also download intermediate data from an experiment. To do this we need four pieces of information: experiment, node_id, port_name, and data_type_id. [Figure 1][figure1 link] shows the menu that appears when I right-click the output port of the "Convert to CSV" module in my experiment, and [Figure 2][figure2 link] shows the information about the dataset after I click "Generate Data Access Code..." in Figure 1. From the several tests I did, it seems that the node_id remains unchanged after changes are made to the experiment, as long as the node is never deleted.

[![Figure 1][figure1 link]][figure1 link] Figure 1

[![Figure 2][figure2 link]][figure2 link] Figure 2

[figure1 link]: https://cloud.githubusercontent.com/assets/9322661/11898668/91a91c00-a567-11e5-9f78-dcd386344187.PNG
[figure2 link]: https://cloud.githubusercontent.com/assets/9322661/11898669/91acd70a-a567-11e5-8e7c-1ed3c31572be.png

In [12]:
# download intermediate data
exp_data <- download.intermediate.dataset(ws = ws, 
            experiment  = "b2bbeb56a1d04e1599d2510a06c59d87.f-id.e3ffb92ffcce4bd7bd5cac45af47ca1c",
            node_id = "a2b5bd15-bd30-483a-aa92-c1ce8f848a07-108",
            port_name = "Results dataset",
            data_type_id = "GenericCSV")

head(exp_data)

Unnamed: 0,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
1,0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
2,0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
3,0.02729,0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
4,0.03237,0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
5,0.06905,0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2
6,0.02985,0,2.18,0,0.458,6.43,58.7,6.0622,3,222,18.7,394.12,5.21,28.7


## 3 A concise way of consuming web services
The new version also allows a very concise way of consuming the web service. All you need is to provide the web service ID and the workspace information. Then consume() can be used to consume the service from any R terminal (as long as you have internet access).

For illustration purposes, we'll fit a linear model and deploy a web service based on the model.

If you encounter the error "Requires external zip utility. Please install zip, ensure it's on your path and try again" while running this on Windows, you can install [RTools][rtools link] and add the install directory to the system path. For example, if it's installed in *C:\Tools*, you should add *C:\Tools\bin* to your system path and then restart R.
 
[rtools link]: https://cran.r-project.org/bin/windows/Rtools/

In [13]:
# load the library
library(MASS)

# fit a model using all variables except medv as predictors
lm1 <- lm(medv~., data = Boston)

# define predict function
mypredict <- function(newdata)
{
  res <- predict(lm1, newdata)
  res
}

# test the prediction function
newdata <- Boston[1, 1:13]
print(mypredict(newdata))

# Publish the service
ep <- publishWebService(ws = ws, fun = mypredict, name = "HousePricePrediction", inputSchema = newdata)

       1 
30.00384 
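
Before publishing, it can be worth checking locally that the input schema covers every predictor the model expects and that the prediction function reproduces the fitted values. The sketch below uses only base R and MASS; it is one possible sanity check, not something the AzureML package requires.

```r
library(MASS)

lm1 <- lm(medv ~ ., data = Boston)
mypredict <- function(newdata) predict(lm1, newdata)

# Use a few rows of predictors as a stand-in for incoming requests.
newdata <- Boston[1:3, 1:13]

# Predictor names the model expects (the "." in the formula is expanded).
predictors <- all.vars(delete.response(terms(lm1)))
missing_cols <- setdiff(predictors, names(newdata))
stopifnot(length(missing_cols) == 0)

# Predictions on training rows should agree with the fitted values.
all.equal(unname(mypredict(newdata)), unname(fitted(lm1)[1:3]))
```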


After deploying a web service, we can retrieve the web service ID and save it for future use.

In [14]:
# retrieve web service ID
service_id <- ep$WebServiceId
print(service_id)

[1] "bb2ff52ca8ea11e5a4dddd64db11e2ca"
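
One simple way to keep the ID for later sessions is to write it to a file. The sketch below uses `saveRDS()` and `readRDS()`; the file name is hypothetical, and `tempdir()` is used only so the example runs anywhere. In practice a permanent location would be needed, because the temporary directory is removed when the R session ends.

```r
service_id <- "bb2ff52ca8ea11e5a4dddd64db11e2ca"  # value printed above

# Hypothetical location; replace tempdir() with a permanent directory.
id_file <- file.path(tempdir(), "house_price_service_id.rds")
saveRDS(service_id, id_file)

# In a later session, restore the ID before requesting the endpoints.
restored_id <- readRDS(id_file)
identical(restored_id, service_id)
```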


After saving the workspace access information and the above service_id, we can run the following code from any R terminal to consume the web service.

In [15]:
# obtain endpoint information based on workspace information and service ID
ep_price_pred <- endpoints(ws, service_id)
# consume
consume(ep_price_pred, newdata)

Request failed with status 401. Waiting 3 seconds before retry
...



Unnamed: 0,ans
1,30.00384


## 4 Conclusion
There are many major improvements in the new version of the AzureML R package. In this notebook we saw three new capabilities that are not present in version 0.1.1. Another important improvement is that functions such as publishWebService() now have more helpful descriptions. As another example, the argument "wsID" for the web service ID in updateWebService() has been renamed "serviceId", so that it is no longer mistaken for the workspace ID.

## 5 Additional resources
The authors of the package have written [Getting Started with the AzureML Package](https://htmlpreview.github.io/?https://github.com/RevolutionAnalytics/AzureML/blob/master/vignettes/getting_started.html), which covers a wider range of examples. The [Bug bash instructions][bugbash link] have some helpful information as well.

[bugbash link]: https://github.com/RevolutionAnalytics/AzureML/wiki/Bug-bash-instructions

---  
Created by a Microsoft Employee.  
Copyright © Microsoft. All Rights Reserved.