# Why Expose A Model As An API?

R is widely used for data science and for deploying models into production. The end users are typically IT teams and enterprise applications that need solutions they can easily integrate with their own software.

You could re-implement the model in another programming language that software developers use more widely, but here lies the problem: many of those languages lack mature libraries for machine learning.

So one good solution is to expose the model as an API. The most common type of API is a REST (RESTful) API. REST APIs allow two applications to communicate with each other over the Internet using HTTP as their protocol. In simpler terms, the client sends a request to the server named in the URL, using an HTTP method such as GET, PUT or POST to say what it wants, and the server sends a response back to the client.

By exposing the R model as an API, we let other applications use the model for the purpose it was designed for without worrying about the underlying environment/architecture.

# Using REST APIs to expose a model as a service

Do we have a package in R to help with this? The answer is yes: "plumber".

Plumber is straightforward and easy to use. It is an open source package that lets you create APIs by decorating R functions with special annotations/comments. Comments can be prefixed with "#*" or "#'"; "#*" is recommended because "#'" is also used by roxygen2. To read more, refer to [rplumber](https://www.rplumber.io/).
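To make the annotation idea concrete, here is a minimal sketch of what a decorated function can look like. The file name `plumber.R`, the helper `echo_message` and the `/echo` route are assumptions made up for illustration; they are not part of the Titanic example.

```r
# Contents of a hypothetical file called plumber.R

# Plain R helper holding the endpoint logic (illustrative name)
echo_message <- function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Echo back the message that was sent
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  # plumber turns the function that follows the annotations into the handler
  echo_message(msg)
}
```

Assuming the file is saved as `plumber.R`, it could be served with `plumber::plumb("plumber.R")$run(port = 8000)` and then queried at `http://localhost:8000/echo?msg=hello`. The port number is also just an assumption.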

Install the plumber package by typing:


In [None]:
install.packages("plumber") 

To start using it, type:

In [1]:
library("plumber")

Before we can start using Plumber to create APIs, we need a trained ML model.
Let's begin by training a simple model for the [classic Titanic problem from Kaggle](https://www.kaggle.com/c/titanic/data), which predicts whether a passenger survived the sinking of the Titanic.




In [9]:
#Read the dataset
titanic_data <- read.csv("/home/ashwini/my-progs/train.csv")

#View the data types of variables
str(titanic_data)

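#Preprocess the data: give missing ages an explicit "unknown" level, convert all
#variables to factor variables and include only the significant variables in the model.
changeTitanicData <- function(input_titanic_data) {
  cleaned_titanic_data <- data.frame(
    #as.factor() keeps Sex a factor even if read.csv() returns character (R >= 4.0)
    Sex = as.factor(input_titanic_data$Sex),
    Pclass = as.factor(input_titanic_data$Pclass),
    #The value used for missing ages must be one of the declared levels, else it becomes NA
    Age = factor(dplyr::if_else(input_titanic_data$Age < 18, "child", "adult", "unknown"),
                 levels = c("child", "adult", "unknown")),
    Survived = as.factor(input_titanic_data$Survived)
  )
  #Return the cleaned data frame explicitly
  cleaned_titanic_data
}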

processed_titanic_data <- changeTitanicData(titanic_data) 


'data.frame':	891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
 $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
 $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
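The age bucketing used in the preprocessing step can be tried on a handful of values before running it on the whole dataset. This is only a sketch: it uses base R `ifelse()` rather than `dplyr::if_else()`, and the helper name `bucket_age` and the sample vector `ages` are made up for illustration.

```r
# Hypothetical helper: map numeric ages to the three levels used by the model
bucket_age <- function(age) {
  # Missing ages get their own "unknown" bucket instead of staying NA
  label <- ifelse(is.na(age), "unknown", ifelse(age < 18, "child", "adult"))
  factor(label, levels = c("child", "adult", "unknown"))
}

# Made-up sample ages, including one missing value
ages <- c(22, 2, NA, 14, 54)
age_group <- bucket_age(ages)
print(table(age_group))
```

Keeping "unknown" as a real factor level means no row contains NA in the `Age` column, which matters because `randomForest` does not accept missing values in the predictors.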


In [6]:
#Split the data into 2 sets - train (70%) and test (30%)
#Set the seed before sampling so that the split is reproducible
set.seed(415)
split_set <- sample(seq_len(nrow(processed_titanic_data)), size = floor(0.7 * nrow(processed_titanic_data)))
train_set <- processed_titanic_data[split_set, ]
test_set <- processed_titanic_data[-split_set, ]
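As a sanity check on the split logic, the same indexing can be run on a stand-in data frame with 891 rows, the row count reported by `str()` above. The data frame `toy` and its `id` column are assumptions for illustration only.

```r
# Stand-in data frame with the same number of rows as the Titanic training file
toy <- data.frame(id = seq_len(891))

set.seed(415)
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))

# Positive indices select the training rows, negative indices select the rest
toy_train <- toy[train_idx, , drop = FALSE]
toy_test <- toy[-train_idx, , drop = FALSE]

print(c(train = nrow(toy_train), test = nrow(toy_test)))
# No row may appear in both sets
print(length(intersect(toy_train$id, toy_test$id)))
```

With 891 rows, `floor(0.7 * 891)` gives 623 training rows and leaves 268 rows for testing.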

In [13]:
#Train the model using the random forest algorithm
#Install the 'randomForest' package (only needed the first time)
#install.packages('randomForest')
library(randomForest)

set.seed(415)
titanic_rf <- randomForest(Survived ~ Sex + Pclass + Age, data = train_set, importance = TRUE, na.action = NULL)

In [14]:
#Evaluate the model
test_predict_titanic <- predict(titanic_rf, newdata = test_set, type = "response") 
test_actual <- test_set$Survived == 1

#Build the confusion matrix (rows = predicted, columns = actual) and calculate the model accuracy
model_accuracy <- table(test_predict_titanic, test_actual)
print(model_accuracy)
print(paste0("Accuracy: ", round(100 * sum(diag(model_accuracy))/sum(model_accuracy), 2), "%"))


                    test_actual
test_predict_titanic FALSE TRUE
                   0   113   32
                   1    11   50
[1] "Accuracy: 79.13%"
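The accuracy figure can be checked by hand from the table above: the diagonal holds the correct predictions (113 + 50 = 163) out of 206 test rows in the table. A small sketch that re-enters the printed counts, with `confusion` being just an illustrative name:

```r
# Counts copied from the printed table (rows = predicted, columns = actual)
confusion <- matrix(c(113, 11, 32, 50), nrow = 2,
                    dimnames = list(predicted = c("0", "1"),
                                    actual = c("FALSE", "TRUE")))

correct <- sum(diag(confusion))  # predictions that match the actual outcome
total <- sum(confusion)          # all rows in the table
accuracy <- round(100 * correct / total, 2)
print(paste0("Accuracy: ", accuracy, "%"))
```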


In [None]:
#Save the model
save(titanic_rf, file = "random_forest_titanic_problem.RData")
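A saved `.RData` file is read back with `load()`, which restores the object under the name it was saved with. The sketch below shows the round trip with a stand-in model so that it runs without the Titanic data; `toy_model`, the `mtcars` regression and the temporary file path are assumptions for illustration, not part of the notebook.

```r
# Stand-in for titanic_rf: a small linear model on a built-in dataset
toy_model <- lm(mpg ~ wt, data = mtcars)

# Save it the same way the notebook saves the random forest
model_file <- file.path(tempdir(), "toy_model.RData")
save(toy_model, file = model_file)

# Load into a separate environment so nothing in the workspace is overwritten;
# load() returns the names of the objects it restored
model_env <- new.env()
loaded_names <- load(model_file, envir = model_env)
print(loaded_names)

# The restored object can be used for prediction straight away
restored <- model_env$toy_model
print(predict(restored, newdata = data.frame(wt = 3)))
```

An API process would do the same thing once at start-up with `random_forest_titanic_problem.RData` and then reuse the loaded model for every request.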