In [1]:
# import libraries and models
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, BaggingClassifier

In [6]:
# import data from csv
# clean data
df = (pd
      .read_csv("data.csv")
      .drop(columns=["Timestamp", "Name", "Type", "Damage"])
      )

# set features and target for models to train on and compare
features = df.drop(columns=["Rarity"])
target = df["Rarity"]

# compare models and see which one performs the best
models = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    GradientBoostingClassifier(n_estimators=100, random_state=42),
    BaggingClassifier(n_estimators=100, random_state=42),
]

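for model in models:
    model.fit(features, target)

    # keep only the first row's prediction (made on the same data the model was trained on)
    prediction, *_ = model.predict(features)

    # class probabilities for that same first row, one value per class
    confidence, *_ = model.predict_proba(features)

    # lowest and highest class probability for the first row
    min_max_confid = [min(confidence), max(confidence)]

    print(f"{model} prediction: {prediction}")
    print(f"{model} confidence: {confidence}")
    print(f"{model}'s minimum to maximum confidence value: {min_max_confid}")
    print("--------------------------------")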


RandomForestClassifier(random_state=42) prediction: Rank 1
RandomForestClassifier(random_state=42) confidence: [0.   0.97 0.03 0.   0.   0.  ]
RandomForestClassifier(random_state=42)'s minimum to maximum confidence value: [0.0, 0.97]
--------------------------------
GradientBoostingClassifier(random_state=42) prediction: Rank 1
GradientBoostingClassifier(random_state=42) confidence: [2.85822227e-04 9.66735828e-01 3.13237365e-02 6.53863698e-04
 9.36403671e-04 6.43462847e-05]
GradientBoostingClassifier(random_state=42)'s minimum to maximum confidence value: [6.434628471815968e-05, 0.966735827630308]
--------------------------------
BaggingClassifier(n_estimators=100, random_state=42) prediction: Rank 1
BaggingClassifier(n_estimators=100, random_state=42) confidence: [0. 1. 0. 0. 0. 0.]
BaggingClassifier(n_estimators=100, random_state=42)'s minimum to maximum confidence value: [0.0, 1.0]
--------------------------------


While the Random Forest and Bagging Classifiers report slightly higher top confidence for their predictions (0.97 and
1.0, against roughly 0.967 for Gradient Boosting), this confidence is misleading. Comparing each classifier's range of
confidence values, from lowest to highest, suggests that the Gradient Boosting Classifier is generalizing to the data
better than the Bagging and Random Forest Classifiers. The same pattern appears when each classifier's individual
confidence values are compared. Bagging's values are all exactly 0 or 1, and Random Forest's are nearly so (0.97, 0.03,
and four zeros), which means each model is either very confident in a class or rules it out entirely, a major sign of
overfitting. The Gradient Boosting values, by contrast, are more graded: every class receives a small nonzero
probability. All of these values come from the first row of the training data, the only row the loop prints.
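
The comparison above looks at a single row. A minimal sketch of the same idea applied to every row of a held-out split
is given below. It is an illustration under stated assumptions, not the notebook's own analysis: the data is a
synthetic six-class stand-in generated with `make_classification` (the real `data.csv` is not used), and the helper
`confidence_summary` and its dictionary keys are hypothetical names introduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the monster data, with six classes to
# mirror the six probabilities printed in the output above.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=6,
    n_redundant=0,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)


def confidence_summary(model, X_fit, y_fit, X_eval):
    """Fit the model, then summarize its top-class probability on X_eval."""
    model.fit(X_fit, y_fit)
    top = model.predict_proba(X_eval).max(axis=1)  # highest class probability per row
    return {
        "mean_top": float(top.mean()),
        "min_top": float(top.min()),
        # share of rows predicted with a probability of exactly 1
        "share_certain": float(np.mean(top == 1.0)),
    }


models = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    GradientBoostingClassifier(n_estimators=100, random_state=42),
    BaggingClassifier(n_estimators=100, random_state=42),
]

summaries = {}
for model in models:
    name = type(model).__name__
    summaries[name] = confidence_summary(model, X_train, y_train, X_test)
    print(name, summaries[name])
```

Summarizing over all held-out rows shows whether the all-or-nothing probabilities seen for one row are typical of a
model or an accident of that row.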

Therefore, of the three models compared, the Gradient Boosting Classifier is the best candidate to serve as the base
model for the app's predictions on the monster data.
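
One way to back a model choice like this with held-out numbers is stratified cross-validation. The sketch below is a
hedged illustration rather than part of the original comparison: it again uses a synthetic six-class dataset in place
of `data.csv`, and the `results` dictionary is a hypothetical name. It scores each model on accuracy and on log loss,
which penalizes confident wrong probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: synthetic stand-in for the monster data (six classes).
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=6,
    n_redundant=0,
    n_classes=6,
    n_clusters_per_class=1,
    random_state=42,
)

models = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    GradientBoostingClassifier(n_estimators=100, random_state=42),
    BaggingClassifier(n_estimators=100, random_state=42),
]

# Five stratified folds so every class appears in each training and test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for model in models:
    scores = cross_validate(
        model, X, y, cv=cv, scoring=["accuracy", "neg_log_loss"]
    )
    results[type(model).__name__] = {
        "accuracy": float(scores["test_accuracy"].mean()),
        # scikit-learn reports negated log loss, so flip the sign back
        "log_loss": float(-scores["test_neg_log_loss"].mean()),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, log loss={metrics['log_loss']:.3f}")
```

Higher accuracy and lower log loss on the held-out folds would support choosing a model; on the real data the ranking
may differ from what this synthetic example produces.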