# Using the AzureML R package to connect Azure ML Studio and R

This notebook demonstrates some of the capabilities of the `AzureML` package:
  * Read, upload, download, and delete datasets in an Azure ML workspace
  * Read intermediate data from an Azure ML experiment
  * Publish and consume web services concisely

This notebook assumes a basic understanding of the [Azure Machine Learning studio][studio link]. Specifically, you should:
* Know how to get the **workspace ID** and **authorization token**
     - Note that this step is not necessary in Azure ML Jupyter notebooks, because the notebook service stores your credentials in the file system
* Understand setting up web services on Azure
 
If you are completely new to Azure ML, the [Tutorial for Data Scientists][tutorial link] can help you get started.
All results shown here are from my own Azure ML workspace.

Note that you don't have to specify your own workspace ID and authorization token to run the code in an Azure ML Jupyter notebook.

[AzureML link]: https://github.com/RevolutionAnalytics/AzureML
[CRAN link]: https://cran.r-project.org/web/packages/AzureML/index.html
[studio link]: https://studio.azureml.net/
[tutorial link]: https://gallery.cortanaanalytics.com/Experiment/Tutorial-for-Data-Scientists-3

## Load the package

The package is preinstalled in the Jupyter service on Azure ML, so you only need to load it.

In [None]:
library(AzureML)

## Work with workspaces

The `AzureML` R package allows you to work with workspaces directly. Specifically, you can read, upload, download, and delete datasets in an Azure ML workspace.

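### Connect to an Azure ML workspace
The package is already loaded, so the next step is to set up a connection to an Azure ML workspace. In an Azure ML Jupyter notebook, `workspace()` needs no arguments because the notebook service stores your credentials.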

In [None]:
# Connect to the workspace
ws <- workspace()
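
Outside the Azure ML notebook service there are no stored credentials to pick up. As far as the package documentation describes it, `workspace()` can then take the workspace ID and authorization token either as arguments or from a JSON settings file. The sketch below writes such a file to a temporary directory. The placeholder values, the temporary path, and the exact field names are assumptions to verify against the package documentation.

```r
# Assumption: the settings file is a JSON object with a "workspace" entry
# holding "id" and "authorization_token". The values below are placeholders.
settings_path <- file.path(tempdir(), "settings.json")

settings <- c(
  '{"workspace": {',
  '  "id": "<your workspace ID>",',
  '  "authorization_token": "<your authorization token>"',
  '}}'
)
writeLines(settings, settings_path)

# Show what was written
cat(readLines(settings_path), sep = "\n")

# With real credentials in the file, the connection would then be:
# ws <- workspace(config = settings_path)
# or, passing the credentials directly:
# ws <- workspace(id = "<your workspace ID>", auth = "<your authorization token>")
```

The `workspace()` calls are left commented out because they only succeed with valid credentials.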

### List datasets
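A workspace object holds information about all the datasets in the workspace, including the sample datasets provided by Microsoft. The `datasets()` function returns this information, and its `filter` argument restricts the result, here to the sample datasets.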

In [None]:
# list the names of the first few sample datasets in the workspace
head(datasets(ws, filter = "sample")$Name)



### Download a dataset
To download a dataset, use the `download.datasets()` function with the name of the dataset.

In [None]:
# download datasets
movies <- download.datasets(ws, name = "Movie Ratings")
head(movies)

In [None]:
options(repr.plot.width = 6, repr.plot.height = 4)
hist(movies$Rating, main = "Rating", xlab = NULL)

### Upload a dataset
We'll use the air quality dataset that comes with base R to show how a dataset can be uploaded. Note that if a dataset with the same name already exists in the workspace, an error is reported.

In [None]:
airquality[1:10,]

In [None]:
# uploading R data frame to Azure ML workspace
mydata <- airquality[1:10,]
# information about the uploaded dataset in the workspace will be returned
upload.dataset(mydata, ws, name = "my air quality") 
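
Because uploading under an existing name fails, it can help to check the name first. The following sketch uses a hypothetical helper, `name_is_free()`, and a placeholder vector of names. In a live session the names would come from `datasets(ws)$Name`.

```r
# Placeholder for the names already in the workspace.
# In a live session: existing_names <- datasets(ws)$Name
existing_names <- c("Movie Ratings", "my air quality")

# Hypothetical helper: TRUE when no dataset with this name exists yet
name_is_free <- function(name, existing) {
  !(name %in% existing)
}

taken <- name_is_free("my air quality", existing_names)     # FALSE: name is in use
free  <- name_is_free("my air quality v2", existing_names)  # TRUE: name is unused
print(c(taken = taken, free = free))

# Guarded upload in a live session (commented out, needs a workspace):
# if (name_is_free("my air quality", datasets(ws)$Name)) {
#   upload.dataset(mydata, ws, name = "my air quality")
# }
```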

In [None]:
# download to check its content
head(download.datasets(ws, name = "my air quality"))

### Delete a dataset
If the deletion succeeds, the `Deleted` value in the returned status is *TRUE*.

In [None]:
# delete dataset
delete.datasets(ws, name = "my air quality")

The "Airport Codes Dataset" is one of the default datasets in Azure ML. This example shows that the default datasets cannot be deleted.

In [None]:
# delete Azure sample dataset: not allowed
# Uncomment the following line to see the failure report

# delete.datasets(ws, name = "Airport Codes Dataset")

## Work with experiments
The `AzureML` package allows you to get a summary of the existing experiments and to download the intermediate datasets.

### List existing experiments
Information for all experiments in the workspace, including the default ones from Microsoft, can be returned. 

In [None]:
# experiments
exps <- experiments(ws)
head(
    with(exps, data.frame(Description, ExperimentId, Creator, stringsAsFactors = FALSE))
    )
#head(cbind(Description = exps$Description, ExperimentId = exps$ExperimentId, Creator = exps$Creator))

You can also restrict the result by passing the `filter` argument to `experiments()`.

In [None]:
# check sample experiments
e <- experiments(ws, filter = "samples")
head(e$Creator)
head(cbind(e$Description, e$ExperimentId))

### Download intermediate data

You can also download intermediate data from an experiment. To do this, you need four values:
* `experiment`
* `node_id`
* `port_name`
* `data_type_id`

To obtain these values, follow these steps:

1. In an experiment, click the output port of the "Convert to CSV" module ([Figure 1][figure1 link]).
2. Click "Generate Data Access Code..." to display the values ([Figure 2][figure2 link]).
3. Copy the code from the "R" tab and paste it into your R session.



#### Step 1: Click on Convert to CSV module output

[![Figure 1][figure1 link]][figure1 link]

[figure1 link]: https://raw.githubusercontent.com/andrie/jupyter-notebook-samples/master/images/data%20access/6-download-intermediate-dataset.PNG


#### Step 2: Click Generate data access code, and "R" tab

[![Figure 2][figure2 link]][figure2 link]

[figure2 link]: https://raw.githubusercontent.com/andrie/jupyter-notebook-samples/master/images/data%20access/6-generate-data-access-code.PNG

#### Step 3: Copy the code to your local R session

Copy the code from the "R" tab and paste it into a local R session. When you evaluate the code, the data is downloaded from Azure ML Studio to your local session.

In [None]:
# download intermediate data

# Replace the code below with the snippet provided by Azure ML Studio
#exp_data <- download.intermediate.dataset(ws = ws, 
#            experiment  = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
#            node_id = "xxxxxxxx-xxx-xxx-xxxx-xxxxxxxxxxxx-xxx",
#            port_name = "Results dataset",
#            data_type_id = "GenericCSV")

#head(exp_data)

## A concise way of publishing and consuming web services
The `AzureML` package also offers a concise way to consume a web service. All you need is the web service ID and the workspace information. You can then call `consume()` from any R session that has internet access.

For illustration, fit a linear model and publish a web service based on it.

If you encounter the error `Requires external zip utility. Please install zip, ensure it's on your path and try again` while running this on Windows, you can install [RTools][rtools link] and add the install directory to the system path. For example, if it's installed in `C:\Tools`, you should add `C:\Tools\bin` to your system path and then restart R.
 
[rtools link]: https://cran.r-project.org/bin/windows/Rtools/
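
A quick way to see whether this error is likely is to look for a `zip` executable on the search path before publishing. This is a minimal sketch that assumes the package looks for a program named `zip` on the path; the variable names are illustrative.

```r
# Sys.which() returns the full path of an executable, or "" if it is not found
zip_path <- unname(Sys.which("zip"))
zip_available <- nzchar(zip_path)

if (zip_available) {
  message("zip found at: ", zip_path)
} else {
  message("zip not found; install it (for example via RTools on Windows) ",
          "and add it to the system path, then restart R")
}
```

If the check reports that `zip` is missing after you have installed RTools, the install directory is probably not on the system path yet.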

In [None]:
# load the library
library(MASS)

# fit a model using all variables except medv as predictors
lm1 <- lm(medv ~ ., data = Boston)

# define predict function
mypredict <- function(newdata){
  predict(lm1, newdata)
}

# example input: the 13 predictor columns of the first row,
# also used below as the input schema of the web service
newdata <- Boston[1, 1:13]

# Publish the service
ep <- publishWebService(ws = ws, fun = mypredict, name = "HousePricePrediction", inputSchema = newdata)
str(ep)
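
It is worth checking that the prediction function works locally, ideally before publishing it. The sketch below repeats the model fit so that it runs on its own, calls `mypredict()` on the example row, and compares the result with the fitted value the model stores for that row. The names `local_pred` and `same_as_fitted` are illustrative.

```r
library(MASS)  # provides the Boston housing data

# Same model and prediction function as above
lm1 <- lm(medv ~ ., data = Boston)
mypredict <- function(newdata) {
  predict(lm1, newdata)
}

# Predict for the first row, using only the 13 predictor columns
newdata <- Boston[1, 1:13]
local_pred <- mypredict(newdata)
print(local_pred)

# The prediction for a training row should match the model's fitted value
same_as_fitted <- isTRUE(all.equal(unname(local_pred), unname(fitted(lm1)[1])))
print(same_as_fitted)
```

The value returned by the web service for the same input can then be compared against `local_pred`.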


Now you are ready to consume the web service.

In [None]:
# consume
consume(ep, newdata)

## Additional resources
The authors of the package have written [Getting Started with the AzureML Package](https://htmlpreview.github.io/?https://github.com/RevolutionAnalytics/AzureML/blob/master/vignettes/getting_started.html), which covers a wider range of examples.

---  
Created by a Microsoft Employee.  
Copyright © Microsoft. All Rights Reserved.