<h2> Decision Forest Example</h2>
This R notebook shows how to train a Decision Forest model and use it for prediction with tdplyr. The example uses the "fgl" dataset (measurements of forensic glass fragments) from the R package "MASS".

<i>NOTE: You must have a connection to Teradata Vantage that has the Teradata analytic functions installed.</i>

In [None]:
help(package=tdplyr,td_decision_forest_mle)

In [None]:
help(package=tdplyr,td_decision_forest_predict_sqle)

<h3> Include libraries and create a connection using the native driver </h3>

In [None]:
library(tdplyr)
library(dplyr)
library(dbplyr)
library(DBI)
library(MASS)

# Replace the placeholders below with your cluster details for user, passwd and host
user = "xxxxxx"
passwd = "xxxxxx"
host = "xxxxxx"
con <- td_create_context(host = host, uid = user, pwd = passwd, dType = "native")
con
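
Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the connection details could instead be read from environment variables. The helper `get_required_env()` and the variable names `TD_HOST`, `TD_USER` and `TD_PASSWORD` are hypothetical and are not part of tdplyr.

```r
# Hypothetical helper: return the value of an environment variable,
# or stop with a clear error if it is missing or empty.
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

# Possible usage (variable names are assumptions; set them outside the notebook):
# host   <- get_required_env("TD_HOST")
# user   <- get_required_env("TD_USER")
# passwd <- get_required_env("TD_PASSWORD")
```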



<h3> Add the row names of "fgl" as a "rowID" column and rename the columns </h3>

In [None]:
fgl_with_rowids <- cbind(rownames(fgl), fgl)
newColNames <- c("rowID", "RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "type")

colnames(fgl_with_rowids) <- newColNames
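
Because the IDs come from rownames(), the "rowID" column holds text rather than numbers. If a numeric identifier is preferred, one possible alternative is sketched below. It assumes the row names of "fgl" are the integers 1 to 214 stored as text, and the name `fgl_with_int_ids` is used only for illustration.

```r
library(MASS)  # provides the fgl data set

# Build the ID column as integers instead of text
fgl_with_int_ids <- data.frame(rowID = as.integer(rownames(fgl)), fgl)

# Inspect the type of the new ID column
str(fgl_with_int_ids$rowID)
```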

<h3>Group the observations by glass type using the split() function.  </h3>

In [None]:
glass_types <- split(fgl_with_rowids, fgl_with_rowids$type)
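
split() returns a named list with one data frame per level of the grouping factor. A quick way to confirm this is sketched below on the original "fgl" data frame, so that the snippet runs on its own; the name `by_type` is illustrative.

```r
library(MASS)  # provides the fgl data set

# One data frame per glass type, named after the factor levels
by_type <- split(fgl, fgl$type)

names(by_type)         # list names follow levels(fgl$type)
sapply(by_type, nrow)  # number of observations per type
```

Accessing an element by name, for example `by_type[["WinF"]]`, avoids depending on the order of the factor levels.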

<h3>Use "glass_types" to extract the observations of each glass type into separate data frames.</h3>

In [None]:
WinF <- glass_types[[1]]
WinNF <- glass_types[[2]]
Veh <- glass_types[[3]]
Con <- glass_types[[4]]
Tabl <- glass_types[[5]]
Head <- glass_types[[6]]

<h3>Divide the observations for each type into training and test subsets. In this example, use 70% of the observations as training data and the remaining 30% as test data. </h3>

In [None]:
WinF_train_indices <- sample(1:nrow(WinF), 0.7*nrow(WinF))
WinF.test <- WinF[-WinF_train_indices,]
WinF.train <- WinF[WinF_train_indices,]

WinNF_train_indices <- sample(1:nrow(WinNF), 0.7*nrow(WinNF))
WinNF.test <- WinNF[-WinNF_train_indices,]
WinNF.train <- WinNF[WinNF_train_indices,]

Veh_train_indices <- sample(1:nrow(Veh), 0.7*nrow(Veh))
Veh.test <- Veh[-Veh_train_indices,]
Veh.train <- Veh[Veh_train_indices,]

Con_train_indices <- sample(1:nrow(Con), 0.7*nrow(Con))
Con.test <- Con[-Con_train_indices,]
Con.train <- Con[Con_train_indices,]

Tabl_train_indices <- sample(1:nrow(Tabl), 0.7*nrow(Tabl))
Tabl.test <- Tabl[-Tabl_train_indices,]
Tabl.train <- Tabl[Tabl_train_indices,]

Head_train_indices <- sample(1:nrow(Head), 0.7*nrow(Head))
Head.test <- Head[-Head_train_indices,]
Head.train <- Head[Head_train_indices,]
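
The six blocks above repeat the same steps for each glass type, and the result changes on every run because no random seed is set. As a sketch, the same stratified 70/30 split can be written as one helper applied to the list returned by split(). The helper name `stratified_split()`, the variable names and the seed value are illustrative choices, not part of the original example.

```r
library(MASS)  # provides the fgl data set

# Hypothetical helper: sample a fraction of the rows within each group
# for training, keep the remaining rows for testing, then recombine.
stratified_split <- function(data, group, train_fraction = 0.7) {
  groups <- split(data, data[[group]])
  parts <- lapply(groups, function(g) {
    all_idx <- seq_len(nrow(g))
    train_idx <- sample(all_idx, floor(train_fraction * nrow(g)))
    test_idx <- setdiff(all_idx, train_idx)
    list(train = g[train_idx, , drop = FALSE],
         test  = g[test_idx, , drop = FALSE])
  })
  list(train = do.call(rbind, lapply(parts, `[[`, "train")),
       test  = do.call(rbind, lapply(parts, `[[`, "test")))
}

set.seed(42)  # arbitrary seed so the random split is reproducible
fgl_ids <- data.frame(rowID = rownames(fgl), fgl)
fgl_split <- stratified_split(fgl_ids, "type")

nrow(fgl_split$train)  # size of the training set
nrow(fgl_split$test)   # size of the test set
```

Using setdiff() for the test rows, rather than negative indexing, keeps the helper correct even when a group is so small that no rows are sampled for training.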

<h3> Combine the per-type training subsets into the training dataset "fgl.tr" and the per-type test subsets into the test dataset "fgl.te". </h3>

In [None]:
fgl.tr <- rbind(WinNF.train, Con.train, Tabl.train, Veh.train, WinF.train, Head.train)

fgl.te <- rbind(WinNF.test, Con.test, Tabl.test, Veh.test, WinF.test, Head.test)

<h3> Save the training and test datasets into the Teradata Database using the copy_to() function. </h3>

In [None]:
copy_to(con, fgl.tr, name="fgl_train", overwrite=FALSE)

copy_to(con, fgl.te, name="fgl_test", overwrite=FALSE)

<h3> Create R tbl objects that reference the Teradata Database tables using the tbl() function. </h3>

In [None]:
tddf_fgl.tr <- tbl(con, "fgl_train")

tddf_fgl.te <- tbl(con, "fgl_test")

<h3>Create two different Decision Forest models from the training dataset using the td_decision_forest_mle tdplyr analytic function. The second model uses one more tree and also sets mtry. </h3>

In [None]:
glass_rf_list_1 <- td_decision_forest_mle(
  formula = (type ~ RI + Na + Mg + Al + Si + K + Ca + Ba + Fe),
  tree.type = "classification",
  data = tddf_fgl.tr,
  ntree = 5)

glass_rf_list_2 <- td_decision_forest_mle(
  formula = (type ~ RI + Na + Mg + Al + Si + K + Ca + Ba + Fe),
  tree.type = "classification",
  data = tddf_fgl.tr,
  ntree = 6,
  mtry = 3)

<h3> Predict on the test dataset for each model using the td_decision_forest_predict_sqle tdplyr analytic function. </h3>

In [None]:
td_decision_forest_predict_sqle(
  object = glass_rf_list_1,
  newdata = tddf_fgl.te,
  id.column = "rowID"
)

td_decision_forest_predict_sqle(
  object = glass_rf_list_2,
  newdata = tddf_fgl.te,
  id.column = "rowID"
)
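
The calls above only display the predictions. To compare the two models, the predicted types can be checked against the actual types in the test data. The sketch below uses small local data frames in place of the database tables. The column name `prediction` is an assumption about the predict output and should be checked against the function's help page, and the sample values are made up purely for illustration.

```r
# Toy stand-ins for the test labels and the model output (illustrative values only)
actual <- data.frame(rowID = c("1", "2", "3", "4"),
                     type  = c("WinF", "WinNF", "Veh", "WinF"),
                     stringsAsFactors = FALSE)
predicted <- data.frame(rowID      = c("1", "2", "3", "4"),
                        prediction = c("WinF", "WinF", "Veh", "WinF"),
                        stringsAsFactors = FALSE)

# Join on the ID column, then compare predicted and actual labels
scored <- merge(actual, predicted, by = "rowID")
accuracy <- mean(scored$type == scored$prediction)

# Confusion matrix: rows are actual types, columns are predicted types
confusion <- table(actual = scored$type, predicted = scored$prediction)
```

With real results, the prediction table would first have to be brought into R as a local data frame (for example with dplyr's collect()), which is reasonable here because the test set is small.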

<h4> Remove tables created by this example </h4>

In [None]:
dbRemoveTable(con,"fgl_test")

In [None]:
dbRemoveTable(con,"fgl_train")

In [None]:
td_remove_context()

<span style="font-size:16px;">For more information on the Teradata analytic functions, refer to the [Teradata Documentation](https://docs.teradata.com/) and search for Teradata R Package.</span>

Copyright 2019 Teradata. All rights reserved.