# A Tutorial on R Notebooks in Azure ML

## 1 Introduction
The purpose of this notebook is to demonstrate how to use Jupyter notebooks on the Azure Machine Learning (ML) platform to develop a model in R and publish a web service based on the model.

### 1.1 Why Azure ML R notebooks
The answer to this question depends on how much you know about Azure ML. For data scientists who are new to Azure ML and accustomed to doing all their analytical work in R on local computers, Azure ML makes it possible to write R notebooks in the cloud, so anyone with internet access can work with R from a web browser.

If you use R and understand the basics of Azure ML, Azure ML's R notebooks make it possible to develop your models in R and then operationalize them easily.

For data scientists who are comfortable with both R and Azure ML Experiments, R notebooks can be used together with Azure ML Experiments in different ways:
  * To explore data from an Azure ML Experiment. For example, you can use an R notebook to visualize your data in different ways.
  * To fit models and use techniques that are not yet available in Azure ML Experiments. For example, R offers more variable selection techniques and a wider variety of GBM models. You can also use it for time series analysis.
  * To test code before it is used in the "Execute R Script" module of Azure ML Experiments.

### 1.2 Target audience
The target audience of this notebook is R users who have a basic understanding of Azure ML. If you are new to Azure ML, Section 2 of the [Data Scientists' Guide][guide link] provides enough information for you to follow this tutorial.

[guide link]: https://gallery.cortanaanalytics.com/Experiment/Tutorial-for-Data-Scientists-3


## 2 Data
In this example, we'll use the housing data from the R package "MASS." There are 506 rows and 14 columns in the dataset. Available information includes median home price, average number of rooms per dwelling, crime rate by town, etc. More information about this dataset can be found at [UCI][uci link] or by running "help(Boston)" in an R terminal.
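
A quick look at the data can confirm the dimensions quoted above. The following is a minimal sketch using only base R and the MASS package; the variable names `n_rows` and `n_cols` are introduced here for illustration.

```r
# load the package that ships the Boston housing data
library(MASS)

# confirm the size of the dataset
n_rows <- nrow(Boston)
n_cols <- ncol(Boston)
print(c(rows = n_rows, columns = n_cols))

# list the column names and types
str(Boston)

# summarize the response variable used later in this tutorial
summary(Boston$medv)
```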

## 3 A linear regression model
For illustration purposes, we'll use "medv" - median home price - as the response variable and the remaining variables as predictors.

[uci link]: https://archive.ics.uci.edu/ml/datasets/Housing

In [1]:
# load the library to use the Boston dataset
library(MASS)

# fit a model using all variables except medv as predictors
lm1 <- lm(medv~., data = Boston)

# check model performance
summary(lm1)

pred <- predict(lm1)
mae <- mean(abs(pred-Boston$medv))
rmse <- sqrt(mean((pred-Boston$medv)^2))
rae <- mean(abs(pred-Boston$medv))/mean(abs(Boston$medv-mean(Boston$medv)))
rse <- mean((pred-Boston$medv)^2)/mean((Boston$medv-mean(Boston$medv))^2)

print(paste("Mean Absolute Error: ", as.character(round(mae, digits = 6)), sep = ""))
print(paste("Root Mean Squared Error: ", as.character(round(rmse, digits = 6)), sep = ""))
print(paste("Relative Absolute Error: ", as.character(round(rae, digits = 6)), sep = ""))
print(paste("Relative Squared Error: ", as.character(round(rse, digits = 6)), sep = ""))


Call:
lm(formula = medv ~ ., data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
zn           4.642e-02  1.373e-02   3.382 0.000778 ***
indus        2.056e-02  6.150e-02   0.334 0.738288    
chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
age          6.922e-04  1.321e-02   0.052 0.958229    
dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
black        9.312e-03  2.686e-03   3.467 0.000573 ***
lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

[1] "Mean Absolute Error: 3.270863"
[1] "Root Mean Squared Error: 4.679191"
[1] "Relative Absolute Error: 0.492066"
[1] "Relative Squared Error: 0.259357"
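
The four metrics above are computed inline. As an illustrative sketch (the helper name `regression_metrics` is not part of the original notebook), they can be wrapped into a function. Note that these are in-sample figures, since the model is evaluated on the same data it was fitted to. For a least-squares model with an intercept, the in-sample relative squared error equals one minus R-squared, which gives a quick consistency check.

```r
library(MASS)

# hypothetical helper: returns the four error metrics used in this section
regression_metrics <- function(actual, predicted) {
  err <- predicted - actual
  dev <- actual - mean(actual)   # deviations from the mean-only baseline
  c(
    mae  = mean(abs(err)),
    rmse = sqrt(mean(err^2)),
    rae  = sum(abs(err)) / sum(abs(dev)),
    rse  = sum(err^2) / sum(dev^2)
  )
}

lm1 <- lm(medv ~ ., data = Boston)
metrics <- regression_metrics(Boston$medv, predict(lm1))
print(round(metrics, digits = 6))

# consistency check: in-sample RSE should match 1 - R-squared
r_squared <- summary(lm1)$r.squared
print(all.equal(unname(metrics["rse"]), 1 - r_squared))
```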


## 4 Web service
### 4.1 Deploy a web service
With the model developed, we can deploy a web service so that others can use it to make predictions. The "AzureML" package will be used for this purpose. You'll need to provide the workspace ID and authorization token for an Azure ML workspace. The two screenshots below show where you can find them in your workspace. Either the primary or the secondary authorization token can be used.

[![Figure 1][pic 1]][pic 1] Where to find workspace ID

[![Figure 2][pic 2]][pic 2] Where to find authorization token

[pic 1]: https://cloud.githubusercontent.com/assets/9322661/11348789/952a5db4-91f6-11e5-9710-8805451194dd.PNG
[pic 2]: https://cloud.githubusercontent.com/assets/9322661/11348692/09746e40-91f6-11e5-9dfa-6ac897e3c426.PNG

The code below sets up a web service. 

In [2]:
# load the library
library(AzureML)

# retrieve workspace information
ws <- workspace()
# define predict function
mypredict <- function(newdata)
{
  res <- predict(lm1, newdata)
  res
}

# a sample with predictor information
newdata <- Boston[1, 1:13]

# test the prediction function
print(mypredict(newdata))

# publish the service
ep <- publishWebService(ws = ws, fun = mypredict, name = "HousePricePrediction", inputSchema = newdata)

Warning message: package 'AzureML' was built under R version 3.2.3

       1 
30.00384 
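
Because the published service runs `mypredict` remotely, it can help to check the function locally on more than one record before publishing. The sketch below uses only base R and MASS; the checks chosen are an assumption about what is worth verifying, not a requirement of the AzureML package.

```r
library(MASS)

lm1 <- lm(medv ~ ., data = Boston)

mypredict <- function(newdata) {
  predict(lm1, newdata)
}

# several records containing only the 13 predictor columns
sample_rows <- Boston[1:5, names(Boston) != "medv"]
local_pred <- mypredict(sample_rows)

# one numeric prediction per input row, with no missing values
stopifnot(is.numeric(local_pred),
          length(local_pred) == nrow(sample_rows),
          !anyNA(local_pred))

# predictions on training rows should equal the fitted values
print(all.equal(unname(local_pred), unname(fitted(lm1)[1:5])))
print(round(local_pred, 5))
```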


### 4.2 Consume a web service
After setting up a web service, you can use R scripts to consume it in three ways: one is in-session and two are out-of-session.

#### 4.2.1 In-session consumption
If you are consuming the web service in the same session in which it was set up, you can refer to the endpoint directly.

In [3]:
pred <- consume(ep, newdata)
pred

Request failed with status 401. Waiting 7.7 seconds before retry
........Request failed with status 401. Waiting 12.8 seconds before retry
.............

       ans
1 30.00384


#### 4.2.2 Out-of-session consumption
If you consume the web service in a new session, you can do it in two ways. In the first approach, you need to save the workspace information (workspace ID and authorization token) and the web service ID. This information can then be used by the consume\(\) function as shown below.

In [4]:
# save workspace ID and authorization token
ws_id <- ws$id
ws_auth <- ws$.auth
# save web service ID
service_id <- ep$WebServiceId
# define the workspace; this is necessary when running outside the service deployment session
ws <- workspace(
   id = ws_id,
   auth = ws_auth
)
# define the endpoint based on the workspace and service ID information
ep_price_pred <- endpoints(ws, service_id)
# consume
consume(ep_price_pred, newdata)

       ans
1 30.00384
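
The text above says the workspace and service identifiers need to be saved, but the cell keeps them only in memory. One possible way to carry them into a new R session is to write them to a file with base R's `saveRDS()` and read them back with `readRDS()`. This is a sketch under assumptions: the placeholder strings stand in for `ws$id`, `ws$.auth`, and `ep$WebServiceId`, and the temporary file path is only for illustration. Since the authorization token is a secret, any real file should be stored somewhere private.

```r
# placeholder values; in the deployment session these would come from
# ws$id, ws$.auth and ep$WebServiceId
service_info <- list(
  ws_id      = "<workspace-id>",
  ws_auth    = "<authorization-token>",
  service_id = "<web-service-id>"
)

# write the values to disk (a temporary file is used here for illustration)
info_path <- tempfile(fileext = ".rds")
saveRDS(service_info, info_path)

# in a new session: read the values back
restored <- readRDS(info_path)
print(names(restored))

# the restored values would then be passed to workspace() and endpoints(),
# as in the cell above
```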


Alternatively, you can save the API information and call it directly. 

In [5]:
# save the API info
url <- ep$ApiLocation
api_key <- ep$PrimaryKey
help_url <- ep$HelpLocation

# clean up the url string
url_split <- strsplit(url, "&")
url_s <- paste(url_split[[1]][1],url_split[[1]][2], sep = "&")
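
The cleanup step splits the URL on `&` and keeps only the first two pieces, which drops any trailing query parameters. The sketch below runs the same logic on a made-up URL; the host, path, and query parameters are hypothetical and only illustrate the shape of the string, not the actual value of `ep$ApiLocation`.

```r
# hypothetical URL used only to illustrate the string handling
example_url <- paste0(
  "https://example.invalid/workspaces/WS_ID/services/SERVICE_ID/execute",
  "?api-version=2.0&details=true&format=swagger"
)

# split on "&" and keep the first two pieces
parts <- strsplit(example_url, "&", fixed = TRUE)[[1]]
example_url_s <- paste(parts[1], parts[2], sep = "&")
print(example_url_s)
```

This approach assumes the URL contains at least one `&`; otherwise `parts[2]` would be `NA` and the pasted string would not be a usable URL.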

The code below accesses the API with the saved information.

In [6]:
library("RCurl")
library("rjson")

# Accept SSL certificates issued by public Certificate Authorities
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

h = basicTextGatherer()
hdr = basicHeaderGatherer()


req = list(

        Inputs = list(

 
            "input1" = list(
                "ColumnNames" = list("crim", "zn", "indus", "chas", "nox", "rm", "age", 
                                     "dis", "rad", "tax", "ptratio", "black", "lstat"),
                "Values" = list( list( "0.00632", "18", "2.31", "0", "0.538", "6.575", "65.2", 
                                      "4.0900", "1", "296", "15.3", "396.9", "4.98" ),  
                                list( "0.02731", "0", "7.07", "0", "0.469", "6.421", "78.9", 
                                     "4.9671", "2", "242", "17.8", "396.9", "9.14" )  )
            )),
        GlobalParameters = setNames(fromJSON('{}'), character(0))
)

body = enc2utf8(toJSON(req))
api_key = api_key # Replace this with the API key for the web service
authz_hdr = paste('Bearer', api_key, sep=' ')

h$reset()
curlPerform(url = url_s,
            httpheader=c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
            postfields=body,
            writefunction = h$update,
            headerfunction = hdr$update,
            verbose = TRUE
            )

headers = hdr$value()
# convert the status to a number so the comparison below is numeric, not string-based
httpStatus = as.integer(headers["status"])
if (httpStatus >= 400)
{
    print(paste("The request failed with status code:", httpStatus, sep=" "))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(headers)
}

print("Result:")
result = h$value()
print(fromJSON(result))

Loading required package: bitops


[1] "Result:"
$Results
$Results$output1
$Results$output1$type
[1] "table"

$Results$output1$value
$Results$output1$value$ColumnNames
[1] "ans"

$Results$output1$value$ColumnTypes
[1] "Double"

$Results$output1$value$Values
[1] "30.0038433770169" "25.0255623790532"



$Results$output2
$Results$output2$type
[1] "table"

$Results$output2$value
$Results$output2$value$ColumnNames
[1] "R Output JSON"

$Results$output2$value$ColumnTypes
[1] "String"

$Results$output2$value$Values
[1] "{\"Standard Output\":\"RWorker pushed \\\"port1\\\" to R workspace.\\r\\n\",\"Standard Error\":\"R reported no errors.\",\"visualizationType\":\"rOutput\",\"Graphics Device\":[\"iVBORw0KGgo... (base64-encoded PNG output truncated)

The service consumption code above is based on the sample code at help_url, which includes samples in C#, Python, and R for consuming the web service. To see the code for yourself, paste the help_url value into a new browser tab and click "Request Response." Scroll down the newly opened page until you see the "Sample Code" section, as in [Figure 3][pic 3]. Click the R tab to see the sample code. Two changes were made in the code above:
  1) fill in the url and api\_key with the values returned for the current web service, and 
  2) enter the values for the first two records. 
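
The second change, typing in record values by hand, can also be done programmatically. The sketch below builds the `ColumnNames` and `Values` parts of the request from the first two rows of `Boston`, using base R and MASS only. The helper name `df_to_input` is hypothetical, and converting every value with `as.character()` is an assumption that mirrors the string values in the request above.

```r
library(MASS)

# hypothetical helper: convert a data frame into the ColumnNames/Values
# structure used in the request body above
df_to_input <- function(df) {
  list(
    ColumnNames = as.list(names(df)),
    Values = lapply(seq_len(nrow(df)), function(i) {
      # one inner list of strings per record
      as.list(as.character(unlist(df[i, ], use.names = FALSE)))
    })
  )
}

records <- Boston[1:2, 1:13]
input1 <- df_to_input(records)

# inspect the first record
print(unlist(input1$Values[[1]]))
```

The resulting list could take the place of the hand-written `"input1"` entry in `req`.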

[![Figure 3][pic 3]][pic 3] Figure 3

[pic 3]: https://cloud.githubusercontent.com/assets/9322661/11215051/560aa424-8d12-11e5-844a-f1911f988192.PNG
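
The parsed response is a nested list, so the predictions can be pulled out by name. The sketch below rebuilds only the `output1` part of the structure printed earlier as a plain R list, so that it runs without calling the service; with a live response, `parsed` would instead be `fromJSON(result)`.

```r
# stand-in for fromJSON(result), mirroring the output1 structure printed earlier
parsed <- list(
  Results = list(
    output1 = list(
      type = "table",
      value = list(
        ColumnNames = "ans",
        ColumnTypes = "Double",
        Values = c("30.0038433770169", "25.0255623790532")
      )
    )
  )
)

# the values arrive as strings, so convert them to numbers
values <- parsed$Results$output1$value$Values
predictions <- as.numeric(unlist(values))
print(predictions)
```

Calling `unlist()` before `as.numeric()` is a defensive choice so the same lines work whether the parser returns a character vector or a list of strings.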

### 4.3 Update a web service
After making improvements to a model, you can update the existing web service. For this purpose you can use the updateWebService\(\) function by specifying the web service ID. 

In [7]:
# define test function
mypredictnew <- function(newdata)
{
  res <- predict(lm1, newdata) + 100
  res
}

# update service with the new function
ep_update <- updateWebService(
  ws = ws,
  fun = mypredictnew, 
  name = "newpredict", # this does not matter since serviceId is provided
  inputSchema = newdata, 
  serviceId = service_id   
)
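
The updated function simply adds 100 to each prediction, which makes it easy to tell the two versions apart. Before calling updateWebService\(\), the difference can be confirmed locally, as in this small sketch that uses only base R and MASS.

```r
library(MASS)

lm1 <- lm(medv ~ ., data = Boston)
newdata <- Boston[1, 1:13]

# original and updated prediction functions from this tutorial
mypredict <- function(newdata) predict(lm1, newdata)
mypredictnew <- function(newdata) predict(lm1, newdata) + 100

old_value <- mypredict(newdata)
new_value <- mypredictnew(newdata)

# the updated function should differ from the original by 100
print(unname(new_value - old_value))
```

Once the service has been updated, consuming it with the same record would be expected to return a value of about 130 instead of about 30.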

## 5 Conclusion
Through this example, you've learned how to fit a model, deploy the model on Azure, and consume the service, all using Azure ML's Jupyter notebook.

The R package AzureML also allows you to read data from an Azure ML workspace or from experiments, giving users a seamless experience between Azure ML experiments and notebooks. Details can be found in the package's documentation.

---  
Created by a Microsoft Employee.  
Copyright (C) Microsoft. All Rights Reserved.