# Understanding PMML

PMML stands for **Predictive Model Markup Language**. It is an XML-based standard for describing machine learning and data mining models and for exchanging them between PMML-compliant systems.

PMML also has several built-in functions for data pre-processing and post-processing. A single PMML file can therefore capture an entire predictive solution, from raw data to business decisions.

PMML is developed by the Data Mining Group, a consortium responsible for developing standards for data science. To read more, see the **[PMML documentation](http://dmg.org/).**

### **Why use PMML?**

A PMML file contains all the information needed for deployment, such as the time of model creation, the software used to create the model, the list of fields used in the model, data transformations, inputs, targets, and outputs.
For details on the PMML components, refer to **[PMML Components](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language).**
Exporting and importing this file makes it easy to share models among different tools without worrying about differences in software, operating systems, or other incompatibilities.
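
To make the list of components more concrete, here is a minimal sketch in R that parses a hand-written PMML skeleton and lists its top-level elements and declared fields. The skeleton is an assumption made up for this example: it is deliberately incomplete (it contains no model element), the application name and timestamp are placeholders, and it was not generated by any tool. It relies on the `XML` package.

```r
library(XML)

# Hand-written, illustrative PMML skeleton (placeholder values, no model element)
pmml_text <- '<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">
  <Header description="Illustrative skeleton">
    <Application name="example-app" version="0.0"/>
    <Timestamp>2018-01-01 00:00:00</Timestamp>
  </Header>
  <DataDictionary numberOfFields="2">
    <DataField name="Survived" optype="categorical" dataType="string"/>
    <DataField name="Sex" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>'

doc <- xmlParse(pmml_text, asText = TRUE)
root <- xmlRoot(doc)

# Top-level components of the document (whitespace text nodes, if any, are dropped)
component_names <- setdiff(names(xmlChildren(root)), "text")
print(component_names)

# Names of the fields declared in the data dictionary
field_names <- xpathSApply(doc, "//*[local-name()='DataField']", xmlGetAttr, "name")
print(field_names)
```

A file exported from a real model has the same outer shape, with the model definition and its mining schema added as further children of the root element.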

# How to convert an R model to PMML

Before we convert a model to PMML, let's build a simple random forest model on the **[classic Titanic dataset from Kaggle](https://www.kaggle.com/c/titanic/data)** to predict the survival probability of passengers on board the Titanic.

### Model Building:

```r
# titanic.R

titanic_data <- read.csv("dataset/train.csv")

# View the data types of the variables
str(titanic_data)

# Preprocess the data: recode missing Age values as an explicit "NA" level,
# convert all variables to factors, and keep only the significant variables.
changeTitanicData <- function(input_titanic_data) {
  cleaned_titanic_data <- data.frame(
    Survived = factor(input_titanic_data$Survived, levels = c(0, 1)),
    Sex = factor(input_titanic_data$Sex, levels = c("male", "female")),
    Pclass = factor(input_titanic_data$Pclass, levels = c("1", "2", "3")),
    Age = factor(dplyr::if_else(input_titanic_data$Age < 18, "child", "adult", missing = "NA"),
                 levels = c("child", "adult", "NA"))
  )
  # Return the cleaned data frame explicitly
  cleaned_titanic_data
}

processed_titanic_data <- changeTitanicData(titanic_data)

# Set the seed before any random step so that both the split and the model are reproducible
set.seed(415)

# Split the data into two sets: train (70%) and test (30%)
split_set <- sample(seq_len(nrow(processed_titanic_data)), size = floor(0.7 * nrow(processed_titanic_data)))
train_set <- processed_titanic_data[split_set, ]
test_set <- processed_titanic_data[-split_set, ]

# Train the model using the random forest algorithm
# Install the 'randomForest' package the first time only
# install.packages('randomForest')
library(randomForest)

titanic_rf <- randomForest(Survived ~ Sex + Pclass + Age, data = train_set, importance = TRUE, na.action = NULL)
```


```
'data.frame':	891 obs. of  12 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
 $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
 $ Embarked   : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...

randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
```
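
The split above holds out 30% of the rows in `test_set`, so a natural check before exporting is to score the model on those rows. The sketch below shows one way to do that. To stay self-contained it trains on a small synthetic data frame (`toy`, invented for this example) with the same column names and factor levels, so the numbers it prints say nothing about the real Titanic data. With the real objects, the corresponding calls would be `predict(titanic_rf, newdata = test_set)` followed by `table(...)`.

```r
library(randomForest)

set.seed(415)
n <- 200

# Synthetic stand-in for the processed Titanic data (NOT real data)
toy <- data.frame(
  Sex = factor(sample(c("male", "female"), n, replace = TRUE), levels = c("male", "female")),
  Pclass = factor(sample(c("1", "2", "3"), n, replace = TRUE), levels = c("1", "2", "3")),
  Age = factor(sample(c("child", "adult", "NA"), n, replace = TRUE), levels = c("child", "adult", "NA"))
)

# Synthetic label: an arbitrary rule that makes survival more likely for "female" rows
p_survive <- ifelse(toy$Sex == "female", 0.75, 0.25)
toy$Survived <- factor(rbinom(n, size = 1, prob = p_survive), levels = c(0, 1))

# Same 70/30 split pattern as in the walkthrough
idx <- sample(seq_len(n), size = floor(0.7 * n))
toy_train <- toy[idx, ]
toy_test <- toy[-idx, ]

toy_rf <- randomForest(Survived ~ Sex + Pclass + Age, data = toy_train, ntree = 50)

# Predict on the held-out rows and compare with the known labels
pred <- predict(toy_rf, newdata = toy_test)
confusion <- table(actual = toy_test$Survived, predicted = pred)
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Rows of the confusion table are the actual classes and columns are the predicted ones, so the diagonal holds the correctly classified rows.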


Once the model is trained, install the `pmml` package (first time only), create a PMML object from the model, and save it to a file for future use.
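
Two small practical points apply before the conversion: the package has to be installed once, and the target directory should exist before a file is written into it. The sketch below covers both, assuming the relative `model/` directory used in this walkthrough and assuming that the save step does not create missing directories itself. The variable names `out_dir` and `pmml_path` are introduced here for illustration only.

```r
# Check for the pmml package without installing anything automatically
if (!requireNamespace("pmml", quietly = TRUE)) {
  message("Package 'pmml' not found; run install.packages('pmml') once.")
}

# Create the output directory if it does not exist yet
out_dir <- "model"
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Build the path the PMML file will be written to
pmml_path <- file.path(out_dir, "titanicRF.pmml")
print(pmml_path)
```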


```r
# Use the pmml package to create a PMML object and save it to a file
library("pmml")
pmmlObj <- pmml(titanic_rf, model.name = "randomForest_Model", description = "Random Forest Tree model for Titanic dataset")
savePMML(pmmlObj, name = "model/titanicRF.pmml")
```

Loading required package: XML


[1] "Now converting tree  1  to PMML"
[1] "Now converting tree  2  to PMML"
[1] "Now converting tree  3  to PMML"
[1] "Now converting tree  4  to PMML"
[1] "Now converting tree  5  to PMML"
[1] "Now converting tree  6  to PMML"
[1] "Now converting tree  7  to PMML"
[1] "Now converting tree  8  to PMML"
[1] "Now converting tree  9  to PMML"
[1] "Now converting tree  10  to PMML"
[1] "Now converting tree  11  to PMML"
[1] "Now converting tree  12  to PMML"
[1] "Now converting tree  13  to PMML"
[1] "Now converting tree  14  to PMML"
[1] "Now converting tree  15  to PMML"
[1] "Now converting tree  16  to PMML"
[1] "Now converting tree  17  to PMML"
[1] "Now converting tree  18  to PMML"
[1] "Now converting tree  19  to PMML"
[1] "Now converting tree  20  to PMML"
[1] "Now converting tree  21  to PMML"
[1] "Now converting tree  22  to PMML"
[1] "Now converting tree  23  to PMML"
[1] "Now converting tree  24  to PMML"
[1] "Now converting tree  25  to PMML"
[1] "Now converting tree  26  to P