# Random Forest Classifier
More information on the algorithm can be found in the [scikit-learn documentation](https://scikit-learn.org/stable/modules/ensemble.html#forest) or in [this DataCamp tutorial](https://www.datacamp.com/tutorial/random-forests-classifier-python).

In [10]:
from util import get_wpm_train_test, fit_predict_print_wp
from sklearn.ensemble import RandomForestClassifier

train_x, train_y, test_x, test_y = get_wpm_train_test()
model = RandomForestClassifier(random_state=42)

fit_predict_print_wp(model, train_x, train_y, test_x, test_y)

Accuracy: 52.75% (96/182)


## Hyperparameter tuning
The poor result may simply be due to the default hyperparameters, so let's search over some alternatives with a grid search. The code is partly based on [this Kaggle notebook](https://www.kaggle.com/code/sociopath00/random-forest-using-gridsearchcv/notebook); some of the extra parameters come from [this Towards Data Science article](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74).

In [25]:
from sklearn.model_selection import GridSearchCV
from util import get_manually_labeled_features

param_grid = {
    'n_estimators': [100, 200, 500, 1000, 1500],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 6, 8, 10, 50, 100, None],
    'criterion': ['gini', 'entropy'],
}

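# Exhaustive search with 5-fold cross-validation; error_score="raise" makes a
# failing fit raise immediately instead of being recorded as a NaN score.
grid_model = GridSearchCV(model, param_grid, cv=5, error_score="raise", verbose=1)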

grid_model.fit(get_manually_labeled_features(train_x), train_y["Winner"])

grid_model.best_params_

Fitting 5 folds for each of 140 candidates, totalling 700 fits


{'criterion': 'gini',
 'max_depth': 6,
 'max_features': 'sqrt',
 'n_estimators': 1500}
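
The candidate count in the log above can be checked by hand: a grid search tries every combination of the listed values, so the number of candidates is the product of the list lengths, and each candidate is fitted once per fold. Here is a minimal sketch using only the standard library. The grid is copied from the cell above, and `cv_folds` is a name introduced here purely for illustration.

```python
from itertools import product

# Same grid as in the cell above.
param_grid = {
    'n_estimators': [100, 200, 500, 1000, 1500],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [4, 6, 8, 10, 50, 100, None],
    'criterion': ['gini', 'entropy'],
}
cv_folds = 5  # illustrative name; matches cv=5 passed to GridSearchCV

# Every combination of values is one candidate.
candidates = list(product(*param_grid.values()))
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold.
n_fits = n_candidates * cv_folds

print(f"{n_candidates} candidates, {n_fits} fits")
```

Because the cost grows multiplicatively, adding one more value to any list noticeably increases the runtime of the search.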

Let's test those parameters:

In [26]:
model = RandomForestClassifier(random_state=42, **grid_model.best_params_)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y)

Accuracy: 53.85% (98/182)


Tuning yields only a slight increase in accuracy (+1.10 percentage points, i.e. two more correct predictions out of 182). This is still worse than our RandomClassifier, which reached an accuracy of 54.14%.
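
To make the comparison concrete, the figures can be recomputed from the raw counts. This is a small sketch and not part of the original pipeline. The `accuracy` helper is hypothetical, and the 54.14% baseline is taken as reported, since its raw counts are not shown in this notebook.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


default_acc = accuracy(96, 182)  # default hyperparameters
tuned_acc = accuracy(98, 182)    # best hyperparameters from the grid search
baseline_acc = 54.14             # baseline accuracy as reported (assumed, not recomputed)

gain = tuned_acc - default_acc   # improvement from tuning, in percentage points
gap = baseline_acc - tuned_acc   # remaining distance to the baseline

print(f"Default: {default_acc:.2f}%")
print(f"Tuned:   {tuned_acc:.2f}%")
print(f"Gain from tuning: {gain:.2f} percentage points")
print(f"Gap to baseline:  {gap:.2f} percentage points")
```

With only 182 test samples, a difference of two predictions is small, so the gain should be read with caution.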