![title](media/DataRobot.png)

### DataRobot provides R and Python packages for accessing the API's functionality
1 - Project   
2 - Model             
3 - Retraining    
4 - Predicting

Full documentation of the R package can be found here: https://cran.r-project.org/web/packages/datarobot/index.html

The dataset we will be using today is the well-known "readmissions dataset". It is available online, and it is also included when you download this notebook.

## Getting started
You can install the datarobot package with the `install.packages` command from any computer with internet access.

In [None]:
install.packages('datarobot')

#To install a specific version instead, uncomment the two lines below
#require(devtools)
#install_version("datarobot", version = "2.16.0", repos = "http://cran.us.r-project.org")

### Loading the libraries

In [None]:
library(datarobot)
library(ggplot2)
library(reshape2)
library(MLmetrics)

### Credentials
To access the DataRobot API, you first need to connect to it. To ensure that only authorized users access the API, you must authenticate with either your username and password or an API token.
You also need to ensure your "API Access" configuration is ON (please ask your administrator if not).

To find your API Token, visit <code>YOUR_API_HOST</code>, log in and follow the instructions below:

![title](media/credentials_1.png)

![title](media/credentials_2.png)

![title](media/credentials_3.png)

In [None]:
ConnectToDataRobot(endpoint = "YOUR_DATAROBOT_HOST", 
                   token = "YOUR_API_KEY")
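
Typing a token directly into a notebook makes it easy to share by accident. One alternative is to read it from environment variables. The sketch below is a minimal illustration under stated assumptions: the helper `read_credentials` and the variable names `DATAROBOT_API_TOKEN` and `DATAROBOT_API_ENDPOINT` are chosen here for the example and must be set by you before running it.

```r
# Hypothetical helper: read credentials from environment variables
# instead of typing them into the notebook.
read_credentials <- function(token_var = "DATAROBOT_API_TOKEN",
                             endpoint_var = "DATAROBOT_API_ENDPOINT") {
  token <- Sys.getenv(token_var, unset = "")
  endpoint <- Sys.getenv(endpoint_var, unset = "")
  # Fail early with a readable message if either value is missing
  if (!nzchar(token) || !nzchar(endpoint)) {
    stop("Set ", token_var, " and ", endpoint_var, " before connecting.")
  }
  list(endpoint = endpoint, token = token)
}

# Usage (commented out because it needs a live DataRobot account):
# creds <- read_credentials()
# ConnectToDataRobot(endpoint = creds$endpoint, token = creds$token)
```

The helper only returns the two values; the connection itself is still made with `ConnectToDataRobot` as in the cell above.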


In [None]:
readmissions = read.csv("data/10k_diabetes_training.csv")

In [None]:
head(readmissions)

In [None]:
project <- StartProject(dataSource = readmissions,
                        projectName = "Readmission",
                        target = "readmitted",
                        workerCount = -1, #-1 = max worker count
                        wait = TRUE)

### Interacting with autopilot

In [None]:
PauseQueue(project) #Pause project
UnpauseQueue(project) #Unpause project
WaitForAutopilot(project, checkInterval = 20, timeout = NULL, verbosity = 1) #Wait for autopilot to complete
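
The `checkInterval` and `timeout` arguments above describe a polling pattern: check whether the work is done, sleep, and repeat until it finishes or time runs out. The sketch below illustrates that general idea in base R. `poll_until` and `is_done` are hypothetical names for this example, and this is not the package's actual implementation.

```r
# Hypothetical polling helper illustrating the checkInterval / timeout idea.
# is_done is any zero-argument function that returns TRUE once work has finished.
poll_until <- function(is_done, check_interval = 20, timeout = NULL) {
  started <- Sys.time()
  repeat {
    if (isTRUE(is_done())) return(TRUE)
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    # timeout = NULL means wait indefinitely
    if (!is.null(timeout) && elapsed >= timeout) return(FALSE)
    Sys.sleep(check_interval)
  }
}

# Toy example: the "job" reports completion on the third check.
checks <- 0
finished <- poll_until(function() { checks <<- checks + 1; checks >= 3 },
                       check_interval = 0.01, timeout = 5)
```

A short interval gives faster feedback at the cost of more requests; a longer one is gentler on the server.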

In [None]:
initialJobs <- ListModelJobs(project) #Get the jobs that are currently in progress or queued

### Pick another project

### Where to find the project ID?
![title](media/model_id.png)

### What if I don't want to use my browser?

In [None]:
#Print IDs and names of the first 5 projects
projects <- ListProjects() #Call the API once and reuse the result
projects$projectId[1:5]
projects$projectName[1:5]

In [None]:
#Choose another project
another_project <- GetProject("YOUR_PROJECT_ID")

### Take a look at finished models

In [None]:
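#Get names of the top 15 models
models <- ListModels(project) #Fetch the model list once instead of on every iteration
for(i in seq_len(min(15, length(models)))){
    print(models[[i]]$modelType)
}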

In [None]:
#Pick best model
best_model <- GetRecommendedModel(project, type = "Recommended for Deployment")

#Print accuracy metrics for the best model
print(best_model$metrics$AUC)
print(best_model$metrics$`Gini Norm`)

In [None]:
#Plot ROC Curve
roc_data <- GetRocCurve(best_model, source = DataPartition$VALIDATION)

ggplot(roc_data$rocPoints, aes(x = falsePositiveRate, y = truePositiveRate)) + 
  geom_point(color = "green") + xlab("False Positive Rate (Fallout)") + ylab("True Positive Rate (Sensitivity)") + 
  theme_dark() + 
  annotate("text", x = .75, y = .25, color = "white", 
           label = paste("AUC =", round(Area_Under_Curve(roc_data$rocPoints$falsePositiveRate, 
                                                         roc_data$rocPoints$truePositiveRate), 4)))
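
The AUC shown in the plot annotation is the area under the ROC points. As a rough illustration of how such an area can be computed, the sketch below applies the trapezoidal rule in base R. The function `trapezoid_auc` and the ROC points are made up for this example. It also shows the relation Gini Norm = 2 * AUC - 1, which is commonly used for binary classifiers and is assumed here to match the metric printed earlier.

```r
# Trapezoidal-rule AUC from ROC points (base R only).
trapezoid_auc <- function(fpr, tpr) {
  ord <- order(fpr, tpr)              # sort points from left to right
  x <- fpr[ord]
  y <- tpr[ord]
  n <- length(x)
  # Sum the areas of the trapezoids between consecutive points
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Toy ROC points, made up for illustration.
fpr <- c(0, 0.1, 0.4, 1)
tpr <- c(0, 0.6, 0.9, 1)

auc <- trapezoid_auc(fpr, tpr)
gini_norm <- 2 * auc - 1              # assumed relation for a binary classifier
```

With real data you would pass `roc_data$rocPoints$falsePositiveRate` and `roc_data$rocPoints$truePositiveRate` instead of the toy vectors.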

### Plotting Feature Impact

In [None]:
feature_impact <- GetFeatureImpact(best_model)

ggplot(data = feature_impact, aes(x = reorder(featureName, impactNormalized), y = impactNormalized)) + 
  geom_bar(stat = "identity") + coord_flip() + ylab("Effect") + xlab("") +
  scale_y_continuous(labels = function(x){ paste0(x*100, "%") })
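
The chart plots `impactNormalized`, which puts every feature on a scale relative to the most impactful one. The sketch below shows that kind of scaling on made-up numbers, assuming the normalization divides each score by the largest score so that the top feature equals 1 (100%). The feature names and values are illustrative only.

```r
# Made-up unnormalized impact scores, for illustration only.
toy_impact <- data.frame(
  featureName = c("feature_a", "feature_b", "feature_c"),
  impactUnnormalized = c(0.08, 0.02, 0.04),
  stringsAsFactors = FALSE
)

# Scale so that the most impactful feature has a value of 1
toy_impact$impactNormalized <-
  toy_impact$impactUnnormalized / max(toy_impact$impactUnnormalized)

# Sort from most to least impactful, as in the bar chart
toy_impact <- toy_impact[order(-toy_impact$impactNormalized), ]
top_feature <- toy_impact$featureName[1]
```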

### Train on 100% of Data

In [None]:
#Unlock holdout
UpdateProject(project, holdoutUnlocked = TRUE)

#Get blueprint of the best model
blueprint <- GetBlueprint(project, best_model$blueprintId)

#Request retraining on 100% of the data. This command returns the model job ID.
jobId <- RequestNewModel(project, blueprint, samplePct = 100)

#Wait for the job to finish, then retrieve the retrained model
best_retrained_model <- GetModelFromJobId(project, jobId)

## Predictions
#### Modelling API
If you work in Python or R, you can request predictions through the modelling API, and there are multiple ways to interact with it.
#### Prediction API
Any project can be scored through the Prediction API if you have prediction servers. This is a simple REST API. Click on a model in the UI, then "Deploy Model" and "Activate now". You will then have access to a Python code snippet that helps you interact with it. You can also deploy the model through the Python API.

### Using the Modelling API

In [None]:
test_df <- read.csv("data/10k_diabetes_test.csv") #Load testing data

# Uploading the testing dataset
scoring <- UploadPredictionDataset(project, dataSource = test_df)

# Requesting prediction
predict_job_id <- RequestPredictions(project, modelId = best_model$modelId, datasetId = scoring$id)

# Grabbing predictions
predictions_prob <- GetPredictions(project, 
                                   predictId = predict_job_id, 
                                   type = "probability")

# Output
head(data.frame(True_Class = test_df$readmitted, Probability = predictions_prob))
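
The output above pairs each true class with a predicted probability. To turn probabilities into class labels you need a cut-off. The sketch below uses made-up values and an assumed threshold of 0.5 to build a confusion table and compute accuracy. None of these numbers come from the model.

```r
# Toy values, made up for illustration; not output from the model.
actual <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
probability <- c(0.81, 0.35, 0.42, 0.67, 0.10)

threshold <- 0.5                        # assumed cut-off, tune for your use case
predicted <- probability >= threshold   # TRUE means "predicted readmitted"

# Cross-tabulate true classes against predicted classes
confusion <- table(Actual = actual, Predicted = predicted)

# Share of rows where the predicted class matches the true class
accuracy <- mean(predicted == actual)
```

With real results you would substitute `test_df$readmitted` and `predictions_prob` for the toy vectors, after checking that the true class is stored as a logical value.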