# Evaluating Monte Carlo Error of Random Forest Algorithms for Bullet Land Matching

## Methods

### Random Forests

- Take many bootstrap samples from the training set and train a decision tree on each one
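- When growing each decision tree, choose every split by minimizing the size-weighted Gini impurity of the two child nodes (equivalently, maximizing the decrease in impurity) over all possible splits on a randomly chosen subset of features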
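
A minimal pure-Python sketch of the two sources of randomness above, assuming numeric features stored as a list of rows. `bootstrap_sample` draws the training rows with replacement, and `best_split` tries every threshold on `mtry` randomly chosen features and keeps the split with the smallest size-weighted Gini impurity. The function names and the parameter name `mtry` are illustrative choices, not taken from any particular implementation. Growing a full tree would apply `best_split` recursively to each child until a stopping rule is met.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(X, y, feature, threshold):
    """Size-weighted Gini impurity of the children of the split x[feature] <= threshold."""
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(X, y, mtry, rng):
    """Exhaustive search over thresholds of `mtry` random features.

    Returns (impurity, feature, threshold) with the smallest weighted Gini impurity,
    or None if no feature has two distinct values.
    """
    candidates = []
    for f in rng.sample(range(len(X[0])), mtry):
        values = sorted(set(xi[f] for xi in X))
        # Midpoints between consecutive distinct values are the only distinct splits.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            candidates.append((weighted_gini(X, y, f, t), f, t))
    return min(candidates) if candidates else None

def bootstrap_sample(X, y, rng):
    """Standard bootstrap: n rows drawn uniformly with replacement."""
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    return [X[i] for i in idx], [y[i] for i in idx]

rng = random.Random(0)
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 4.0], [0.9, 1.0]]
y = [0, 0, 1, 1]
Xb, yb = bootstrap_sample(X, y, rng)  # one bootstrap replicate
print(best_split(X, y, mtry=2, rng=rng))  # feature 0 separates the classes perfectly
```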

### Bayesian Forests

- The standard bootstrap can be viewed as a single size-$n$ draw from a multinomial distribution with one category per training observation $\mathbf{x}_i$, each with probability $\frac{1}{n}$. The resulting counts give the number of times each observation appears in the bootstrap sample.
- The Bayesian bootstrap adds a Dirichlet level to this hierarchy. First, a vector of $n$ probabilities (summing to 1) is drawn from a symmetric Dirichlet distribution, so each probability has mean $\frac{1}{n}$. These probabilities then replace the equal $\frac{1}{n}$ category probabilities in the multinomial draw.
- Use the Bayesian bootstrap in place of the standard bootstrap in the random forest algorithm to obtain "Bayesian forests".
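
The two sampling schemes can be compared with a small standard-library sketch. It assumes a flat Dirichlet$(1, \dots, 1)$ distribution (`alpha=1.0`), which has the required mean of $\frac{1}{n}$ per component; the concentration parameter is an assumption, since the description above fixes only the mean. The Dirichlet draw is built from normalized Gamma variates, and both samplers return how many times each of the $n$ rows appears in the resampled training set. Function names are hypothetical.

```python
import random
from collections import Counter

def standard_bootstrap_counts(n, rng):
    """Counts ~ Multinomial(n; 1/n, ..., 1/n): copies of each row in the sample."""
    counts = Counter(rng.choices(range(n), k=n))
    return [counts[i] for i in range(n)]

def dirichlet(alpha, n, rng):
    """Symmetric Dirichlet(alpha, ..., alpha) via normalized Gamma(alpha, 1) draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(g)
    return [gi / total for gi in g]

def bayesian_bootstrap_counts(n, rng, alpha=1.0):
    """Dirichlet level first, then a multinomial draw using those probabilities."""
    probs = dirichlet(alpha, n, rng)
    counts = Counter(rng.choices(range(n), weights=probs, k=n))
    return [counts[i] for i in range(n)]

rng = random.Random(1)
print(standard_bootstrap_counts(10, rng))
print(bayesian_bootstrap_counts(10, rng))
```

A common alternative form of the Bayesian bootstrap skips the multinomial step and uses the Dirichlet probabilities directly as observation weights when fitting each tree.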

### Extra-Trees

- Use the standard bootstrap.
- At each split (with its randomly chosen subset of features), do not search over all possible splits. Instead, draw a random cut-point for each candidate feature and choose the best of these randomly generated splits.
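
A sketch of one Extra-Trees split, assuming numeric features and the usual randomization in which a single cut-point is drawn uniformly between the node's minimum and maximum of each candidate feature. The parameter `k` (number of candidate features) and the function names are illustrative. Unlike the exhaustive search in a standard random forest, no thresholds are enumerated: each candidate feature contributes exactly one random split, and the split with the lowest weighted Gini impurity wins.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def extra_trees_split(X, y, k, rng):
    """Best of k random splits; returns (impurity, feature, threshold) or None."""
    n = len(y)
    best = None
    for f in rng.sample(range(len(X[0])), k):
        col = [xi[f] for xi in X]
        lo, hi = min(col), max(col)
        if lo == hi:  # feature is constant at this node, so it cannot split
            continue
        t = rng.uniform(lo, hi)  # one random cut-point, not a search over thresholds
        left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
        right = [yi for xi, yi in zip(X, y) if xi[f] > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best

rng = random.Random(2)
X = [[0.1, 1.0], [0.2, 1.0], [0.8, 1.0], [0.9, 1.0]]
y = [0, 0, 1, 1]
print(extra_trees_split(X, y, k=2, rng=rng))  # feature 1 is constant, so feature 0 is used
```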

### Random Rotation Forests

- Do everything as in a standard random forest algorithm, except...
- Also randomly rotate the input space before training each decision tree.
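
A standard-library sketch of drawing a uniformly distributed (Haar) random rotation and applying it to the feature vectors. It orthonormalizes Gaussian random vectors with Gram-Schmidt, which yields a uniformly random orthogonal matrix, then flips the sign of one column if needed so the determinant is $+1$ (a proper rotation). The helper names are hypothetical. Because a rotation mixes features measured on different scales, features are often standardized first; that step is omitted here.

```python
import math
import random

def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-15:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            det = -det  # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= factor * M[c][j]
    return det

def random_rotation(d, rng):
    """Uniformly distributed d x d rotation matrix (orthogonal, determinant +1)."""
    cols = []
    while len(cols) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in cols:  # modified Gram-Schmidt: remove components along earlier columns
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip numerically dependent draws
            cols.append([a / norm for a in v])
    R = [[cols[j][i] for j in range(d)] for i in range(d)]  # column j of R is cols[j]
    if determinant(R) < 0:
        for row in R:
            row[0] = -row[0]  # reflect one column: still orthogonal, determinant now +1
    return R

def rotate(X, R):
    """Rotate every feature vector x (a row of X) to R x."""
    return [[sum(R[i][j] * x[j] for j in range(len(x))) for i in range(len(R))] for x in X]

rng = random.Random(3)
R = random_rotation(3, rng)
X_rotated = rotate([[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]], R)
```

Each tree would get its own `random_rotation(d, rng)` draw, and the same matrix must be applied to any new observation before that tree scores it.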

### Bayesian Additive Regression Trees (BART)

Fit the model:
$$ y \sim \text{Bernoulli}(p) $$
with
$$ p = \Phi \left[ \sum_{i = 1}^m h_i(x) \right]$$

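where each $h_i(x)$ is a regression tree (a piecewise-constant function of $x$ whose terminal nodes hold real values), $m$ is the number of trees, and $\Phi$ is the standard normal CDF.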
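
To make the link function concrete, here is a sketch that evaluates $p = \Phi \left[ \sum_{i = 1}^m h_i(x) \right]$ for a toy ensemble of $m = 2$ hand-built one-split trees. The stumps and their leaf values are hypothetical; in BART they would be one posterior draw of all $m$ trees, and averaging $p$ over many draws gives the posterior mean probability. $\Phi$ is computed from `math.erf` using $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z / \sqrt{2})\right]$.

```python
import math

def probit(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def stump(feature, threshold, left_value, right_value):
    """A one-split regression tree: left_value if x[feature] <= threshold, else right_value."""
    def h(x):
        return left_value if x[feature] <= threshold else right_value
    return h

def bart_probability(trees, x):
    """p = Phi(sum_i h_i(x)) for a sum-of-trees probit model."""
    return probit(sum(h(x) for h in trees))

trees = [stump(0, 0.5, -0.3, 0.4), stump(1, 2.0, -0.1, 0.2)]
print(bart_probability(trees, [0.7, 1.0]))  # Phi(0.4 - 0.1) = Phi(0.3)
```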

- Place a prior on each tree by specifying:
  - the probability that a given node is terminal
  - the probability of a particular split at a node
  - the probability of a particular terminal node value
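- Use a Gibbs sampler to draw from the posterior distribution, updating one tree at a time given all the others.
- The full conditional of each tree's structure has no closed form, so it is sampled with a Metropolis-Hastings step whose proposals modify the tree (for example, growing or pruning a terminal node).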
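
The prior and the data-augmentation step can be sketched as follows. The specific choices are assumptions, since the list above does not give them: the commonly used BART defaults, in which a node at depth $d$ is nonterminal with probability $\alpha(1 + d)^{-\beta}$ (so terminal with probability $1 - \alpha(1 + d)^{-\beta}$, with $\alpha = 0.95$, $\beta = 2$) and terminal-node values are $N(0, \sigma_\mu^2)$ with $\sigma_\mu = 3 / (k\sqrt{m})$ and $k = 2$. For a binary response, the Gibbs sampler typically introduces a latent $z \sim N\left(\sum_i h_i(x), 1\right)$ truncated to $z > 0$ when $y = 1$ and $z \le 0$ when $y = 0$ (Albert-Chib augmentation), so the trees can be updated as if $z$ were a continuous response. The Metropolis-Hastings tree proposals are not shown, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def p_nonterminal(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at this depth splits (assumed default alpha, beta)."""
    return alpha * (1.0 + depth) ** (-beta)

def sample_tree_shape(rng, depth=0, alpha=0.95, beta=2.0):
    """Draw a tree shape from the depth prior: nested pairs, with 'leaf' for terminal nodes."""
    if rng.random() < p_nonterminal(depth, alpha, beta):
        return (sample_tree_shape(rng, depth + 1, alpha, beta),
                sample_tree_shape(rng, depth + 1, alpha, beta))
    return "leaf"

def leaf_sd(m, k=2.0):
    """Prior sd of terminal-node values for probit BART: 3 / (k * sqrt(m))."""
    return 3.0 / (k * math.sqrt(m))

def draw_latent(y, fx, rng):
    """Albert-Chib step: z ~ N(fx, 1) truncated to z > 0 if y == 1, z <= 0 if y == 0."""
    nd = NormalDist()
    cut = nd.cdf(-fx)  # P(z <= 0) when z ~ N(fx, 1)
    lo, hi = (cut, 1.0) if y == 1 else (0.0, cut)
    u = rng.uniform(lo, hi)  # inverse-CDF sampling of the truncated normal
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf needs 0 < u < 1; not robust in far tails
    return fx + nd.inv_cdf(u)

rng = random.Random(4)
shape = sample_tree_shape(rng)
z = draw_latent(1, 0.3, rng)  # positive, since y == 1
```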

## Plots

### ROC Curves, Faceted by Method and Dataset

![](images/roc_plot_12.png)

### Interval Width Distributions

![](images/big_plot.png)

### Interval Widths by Predicted Score

![](images/scatterplot_widths.png)