# Random Forest Classification

[Random Forest](https://medium.com/machine-learning-101/chapter-5-random-forest-classifier-56dc7425c3e1) is an ensemble model. An ensemble model is one that runs a number of different models to determine a sample's classification, and then takes a vote among the models' predicted classes. In the case of Random Forest, the training data is randomly sampled with replacement many times, and each sample is used to build a simple Decision Tree. The predicted classifications of the individual Decision Trees are then aggregated to produce the final classification.
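The sample-then-vote idea can be sketched by hand with plain Decision Trees. This is a minimal illustration, not how `RandomForestClassifier` is implemented: the tree count of 25 and the variable names are arbitrary choices, and scikit-learn's forest averages the trees' predicted class probabilities rather than counting hard votes as done here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Train each tree on a bootstrap sample (rows drawn with replacement)
trees = []
for seed in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    trees.append(tree.fit(X[idx], y[idx]))

# Every tree votes on every sample; the most common class wins
votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
majority = np.array([np.bincount(col).argmax() for col in votes.T])

print("Agreement with true labels:", (majority == y).mean())
```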


#### Some of the benefits of Random Forests:

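- Robust against overfitting because each decision tree is built from a different random sample of the training data, and combining many trees averages out their individual errors
- Not highly influenced by outliers, because splits are based on thresholds that effectively bin feature values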
- Fast to train (training processes can be parallelized)
- Works for both classification and regression (for regression, use `RandomForestRegressor`)
- Requires very little preprocessing (no feature scaling) and can handle numerical, categorical, and binary features, although scikit-learn expects categorical features to be encoded as numbers
- Works well with high dimensionality because each split considers only a random subset of the features
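Two of the points above map directly onto scikit-learn options. The sketch below shows parallel training through `n_jobs` and the regression counterpart `RandomForestRegressor`. Predicting petal width from the other three measurements is only an illustrative task chosen here, not something the rest of this notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

X, y = load_iris(return_X_y=True)

# n_jobs=-1 asks scikit-learn to build the trees on all available CPU cores
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X, y)

# Regression uses the same interface; here petal width (column 3) is
# predicted from the other three columns, purely as an illustration
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:, :3], X[:, 3])

print(clf.predict(X[:2]))
print(reg.predict(X[:2, :3]))
```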

#### Some of the drawbacks of Random Forests:

- Low interpretability: a single tree can be drawn and read, but a forest of hundreds of randomized trees cannot be visualized in a meaningful way
- For large datasets, the trees can take up a lot of memory
- Model can generate different results upon each execution, due to the random sampling (unless `random_state` is fixed)
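The last drawback can be controlled with the `random_state` parameter. The following sketch fits the same forest several times and compares the resulting feature importances. The helper name `fitted_importances` and the seed values are arbitrary choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

def fitted_importances(seed):
    """Fit a small forest with the given seed and return its feature importances."""
    rf = RandomForestClassifier(n_estimators=50, random_state=seed)
    return rf.fit(X, y).feature_importances_

a = fitted_importances(42)
b = fitted_importances(42)  # same seed as a
c = fitted_importances(7)   # different seed

print("Same seed, identical importances:     ", (a == b).all())
print("Different seed, identical importances:", (a == c).all())
```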

### Import dependencies

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load the Iris Dataset
iris = load_iris()

### Split data into Train/Test sets

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=1, stratify=iris.target)

### Fit RandomForestClassifier to training set and score with test

In [None]:
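# Train 200 trees; random_state is fixed so the score is reproducible
rf = RandomForestClassifier(n_estimators=200, random_state=1)
rf = rf.fit(X_train, y_train)
# Mean accuracy on the held-out test set
rf.score(X_test, y_test)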
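Because every tree is trained on a random sample drawn with replacement, some training rows are left out of each tree and can act as a built-in validation set. The sketch below, assuming the same split as above, turns on scikit-learn's `oob_score` option and compares the out-of-bag estimate with the test score. The variable name `rf_oob` is introduced here only to avoid overwriting `rf`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=1, stratify=iris.target
)

# oob_score=True scores each training row using only the trees that never saw it
rf_oob = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=1)
rf_oob.fit(X_train, y_train)

print("OOB accuracy: ", rf_oob.oob_score_)
print("Test accuracy:", rf_oob.score(X_test, y_test))
```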

### Random Forests in sklearn will automatically calculate feature importance

In [None]:
importances = rf.feature_importances_
importances
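The importances are normalized so that they sum to 1, and the forest-level value is the average over the individual trees. As a rough sketch of how much the trees disagree, the example below also computes the standard deviation across `rf.estimators_`. It fits on the full Iris data for brevity, which is an assumption of this example rather than what the cells above do.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(iris.data, iris.target)

importances = rf.feature_importances_
# One row of importances per tree in the forest
per_tree = np.array([tree.feature_importances_ for tree in rf.estimators_])
spread = per_tree.std(axis=0)

print("Sum of importances:", importances.sum())
for name, mean, std in zip(iris.feature_names, importances, spread):
    print(f"{name:20s} {mean:.3f} +/- {std:.3f}")
```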

### We can sort the features by their importance

In [None]:
sorted(zip(rf.feature_importances_, iris.feature_names), reverse=True)