RandomForest
===============

Random forests are an ensemble learning method for classification and regression. They construct a multitude of decision trees at training time and, for classification, output the class that is the mode of the classes predicted by the individual trees; for regression, they output the mean of the trees' predictions.

The algorithm for inducing a random forest was developed by Leo Breiman and Adele Cutler, and "Random Forests" is their trademark. The term comes from random decision forests, which were first proposed by Tin Kam Ho of Bell Labs in 1995.
 
The method combines Breiman's "bagging" idea with the random selection of features, introduced independently by Ho and by Amit and Geman, to construct a collection of decision trees with controlled variance.
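The two sources of randomness and the final vote can be illustrated with a toy sketch in base R. It does not grow any trees; the row count, the vote vector, and the helper `majority_vote` are hypothetical and exist only for illustration. The choice of `floor(sqrt(p))` candidate features mirrors the usual default for classification forests.

```r
# Toy illustration only: no trees are grown here.
set.seed(1)  # arbitrary seed

n_obs    <- 10                        # hypothetical number of training rows
features <- c("PClass", "Age", "Sex")

# 1. Bagging: each tree is trained on a bootstrap sample,
#    i.e. rows drawn with replacement from the training set.
boot_rows <- sample(n_obs, size = n_obs, replace = TRUE)

# 2. Random feature selection: each split considers only a random subset
#    of the features (here floor(sqrt(p)) of them).
mtry       <- floor(sqrt(length(features)))
candidates <- sample(features, size = mtry)

# 3. Aggregation: the forest predicts the most common class across trees.
majority_vote <- function(votes) {
  counts <- table(votes)
  names(counts)[which.max(counts)]
}

votes <- c("yes", "no", "yes", "yes", "no")  # made-up votes from five trees
majority_vote(votes)                         # "yes" (3 votes against 2)
```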


In [12]:
library(randomForest)

In [13]:
# download Titanic Survivors data
data <- read.table("http://math.ucdenver.edu/RTutorial/titanic.txt", header=TRUE, sep="\t")
# recode Survived as a yes/no factor (1 becomes "yes", anything else "no")
data$Survived <- as.factor(ifelse(data$Survived==1, "yes", "no"))
 

In [14]:
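# split into a training set (roughly 75% of rows) and a test set (the rest)
# Handy little trick when you don't want to load additional packages like caret:
# runif() draws one uniform number per row, so each row is kept for training with probability 0.75
idx <- runif(nrow(data)) <= 0.75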
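Because the split is random, it changes on every run unless a seed is set. A minimal sketch of that behaviour follows; the seed value and the row count are arbitrary assumptions and have nothing to do with the Titanic data.

```r
set.seed(42)             # arbitrary seed so the split can be reproduced
n   <- 1000              # hypothetical number of rows
idx <- runif(n) <= 0.75  # TRUE = training row

mean(idx)                # share of training rows: close to 0.75, but not exact

# Re-seeding reproduces exactly the same split
set.seed(42)
idx2 <- runif(n) <= 0.75
identical(idx, idx2)     # TRUE
```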



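In [15]:
# idx is a logical vector, so the test set is its complement, !idx.
# (-idx would be coerced to 0/-1 and drop only the first row.)
data.train <- data[idx,]
data.test <- data[!idx,]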
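A logical index has to be complemented with `!`; unary minus only works for integer positions. The toy data frame below (hypothetical, six rows) shows the difference: negating a logical vector with `-` turns `TRUE` into `-1` and `FALSE` into `0`, which removes the first row and nothing else.

```r
toy <- data.frame(id = 1:6, x = letters[1:6])     # hypothetical data
idx <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

train <- toy[idx, ]    # rows where idx is TRUE
test  <- toy[!idx, ]   # logical complement: the remaining rows
wrong <- toy[-idx, ]   # -idx is c(-1, 0, -1, -1, 0, -1): drops row 1 only

nrow(train)  # 4
nrow(test)   # 2
nrow(wrong)  # 5, and it still contains training rows
```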


In [11]:
### Train a random forest

rf <- randomForest(Survived ~ PClass + Age + Sex, 
             data=data.train, importance=TRUE, na.action=na.omit)



In [5]:
### How important is each variable in the model?
imp <- importance(rf)
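# sort variables by mean decrease in accuracy (the third column), most important first
o <- order(imp[, "MeanDecreaseAccuracy"], decreasing=TRUE)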
imp[o,]


             no      yes MeanDecreaseAccuracy MeanDecreaseGini
Sex    48.36676 52.74725              53.1847          73.4501
PClass 22.62161 21.91741             25.86533         22.85662
Age    22.08949 14.09322             25.24734         19.97112
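The MeanDecreaseGini column is based on how much each split on a variable reduces Gini impurity, summed over the splits that use that variable and averaged over trees. The sketch below computes the impurity decrease for a single hypothetical split on six made-up passengers; the function `gini` and the data are illustrative assumptions, and the package's own scaling of the measure may differ.

```r
# Gini impurity of a vector of class labels: 1 - sum of squared class shares
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Made-up node with six passengers (not taken from the Titanic data)
survived <- c("yes", "yes", "no", "no", "no", "yes")
sex      <- c("f",   "f",   "f",  "m",  "m",  "m")

left  <- survived[sex == "f"]
right <- survived[sex == "m"]

# Impurity decrease = parent impurity minus size-weighted child impurities
decrease <- gini(survived) -
  (length(left)  / length(survived)) * gini(left) -
  (length(right) / length(survived)) * gini(right)
decrease
```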


### Display the confusion matrix



In [6]:
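# confusion matrix: rows are actual classes, columns are predicted classes,
# i.e. [[True Neg, False Pos], [False Neg, True Pos]] with "yes" as the positive class
# (rows with a missing Age get an NA prediction and are dropped by table())
table(data.test$Survived, predict(rf, data.test),
  dnn=list("actual", "predicted"))
# Output from an earlier run, kept for comparison (counts vary between runs because no seed is set).
# Both this and the printed output total 755 observations, far more than a 25% hold-out,
# so they were evidently computed on a test set that overlapped the training rows; treat them as optimistic.
#      predicted
#actual  no yes
#   no  427  16
#   yes 117 195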

      predicted
actual  no yes
   no  422  21
   yes 112 200
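Summary metrics follow directly from the four cells of a confusion matrix. As a sketch, the block below re-enters the printed counts by hand (taking them as given) and treats "yes" as the positive class; the object name `cm` is an assumption.

```r
# Counts copied from the printed matrix; matrix() fills column by column
cm <- matrix(c(422, 112, 21, 200), nrow = 2,
             dimnames = list(actual    = c("no", "yes"),
                             predicted = c("no", "yes")))

accuracy    <- sum(diag(cm)) / sum(cm)               # correct / all
sensitivity <- cm["yes", "yes"] / sum(cm["yes", ])   # true positive rate
specificity <- cm["no", "no"]   / sum(cm["no", ])    # true negative rate
precision   <- cm["yes", "yes"] / sum(cm[, "yes"])   # correct among predicted "yes"

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 3)
# accuracy 0.824, sensitivity 0.641, specificity 0.953, precision 0.905
```

With these counts the model is much better at recognising non-survivors than survivors, which a single accuracy figure would hide.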