You want a simple baseline classifier to compare against your model.

Use scikit-learn’s DummyClassifier.



In [1]:
# Load libraries
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier

In [2]:
from sklearn.model_selection import train_test_split
# Load data
iris = load_iris()
# Create target vector and feature matrix
features, target = iris.data, iris.target
# Split into training and test set
features_train, features_test, target_train, target_test = train_test_split(
    features, target, random_state=0)
# Create dummy classifier
dummy = DummyClassifier(strategy='uniform', random_state=1)
# "Train" model
dummy.fit(features_train, target_train)
# Get accuracy score
dummy.score(features_test, target_test)

0.42105263157894735

Comparing the baseline classifier to our trained classifier shows how much
the trained model improves on random guessing:

In [3]:
# Load library
from sklearn.ensemble import RandomForestClassifier
# Create classifier
classifier = RandomForestClassifier()
# Train model
classifier.fit(features_train, target_train)
# Get accuracy score
classifier.score(features_test, target_test)


0.9736842105263158

A common measure of a classifier’s performance is how much better it is than
random guessing. scikit-learn’s DummyClassifier makes this comparison easy.
The strategy parameter gives us a number of options for generating predictions.
There are two particularly useful strategies. First, stratified makes predictions
in proportion to the class frequencies in the training set’s target vector (i.e., if
20% of the observations in the training data are women, then DummyClassifier
will predict women 20% of the time). Second, uniform will generate predictions
uniformly at random between the different classes. For example, if 20% of
observations are women and 80% are men, uniform will produce predictions
that are 50% women and 50% men.
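
To see the difference between the two strategies, here is a minimal sketch on a made-up target vector that mirrors the example above. The data is an assumption for illustration only (200 observations of class 1 standing in for women, 800 of class 0 standing in for men), and predicted_share is a hypothetical helper, not part of scikit-learn. Because DummyClassifier ignores the feature values, a column of zeros is enough.

```python
# Load libraries
import numpy as np
from sklearn.dummy import DummyClassifier

# Create a hypothetical imbalanced target vector: 20% class 1, 80% class 0
target = np.array([1] * 200 + [0] * 800)
# Create a placeholder feature matrix (DummyClassifier ignores feature values)
features = np.zeros((len(target), 1))

# Hypothetical helper: fraction of predictions that are class 1
def predicted_share(strategy):
    dummy = DummyClassifier(strategy=strategy, random_state=0)
    # "Train" model (only the class frequencies are learned)
    dummy.fit(features, target)
    # Predict on 10,000 placeholder observations
    predictions = dummy.predict(np.zeros((10000, 1)))
    return np.mean(predictions == 1)

# Compare the two strategies
stratified_share = predicted_share("stratified")
uniform_share = predicted_share("uniform")
print(f"stratified: {stratified_share:.2f}")
print(f"uniform: {uniform_share:.2f}")
```

The stratified share should land close to 0.20 and the uniform share close to 0.50. The exact values depend on the random seed.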
