A naive classifier for recognizing handwritten digits from the MNIST data set. The program classifies digits by how dark they are: digits like `1` tend to be less dark than digits like `8`, simply because the latter has a more complex shape. When shown an image, the classifier returns whichever digit in the training data has the closest average darkness.

The program first trains the classifier, then applies it to the MNIST test data to see how many digits it classifies correctly.
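To make the idea concrete before loading MNIST, here is a minimal sketch on made-up data. The four-pixel "images", their labels, and the names `toy_avg_darkness` and `toy_guess` are illustrative assumptions, not part of the program; only the technique (sum the pixels, compare against per-digit averages) is the same.

```python
from collections import defaultdict


def toy_avg_darkness(images, labels):
    """Map each label to the mean total darkness of its images."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for image, label in zip(images, labels):
        totals[label] += sum(image)  # darkness = sum of pixel values
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}


def toy_guess(image, avgs):
    """Return the label whose average darkness is closest to the image's."""
    darkness = sum(image)
    return min(avgs, key=lambda label: abs(avgs[label] - darkness))


# Hypothetical 4-pixel "images": the 1s are lighter than the 8s.
train_images = [
    [0.0, 1.0, 0.0, 0.0],  # label 1, darkness 1
    [0.0, 1.0, 1.0, 0.0],  # label 1, darkness 2
    [1.0, 1.0, 1.0, 0.0],  # label 8, darkness 3
    [1.0, 1.0, 1.0, 1.0],  # label 8, darkness 4
]
train_labels = [1, 1, 8, 8]

toy_avgs = toy_avg_darkness(train_images, train_labels)
# Averages: 1.5 for label 1, 3.5 for label 8.

# An unseen image with darkness 3 is closer to 3.5 than to 1.5.
toy_prediction = toy_guess([1.0, 0.0, 1.0, 1.0], toy_avgs)
print(toy_avgs, toy_prediction)
```

Note that the classifier ignores where the dark pixels are; two images with the same total darkness always receive the same guess.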

In [2]:
from collections import defaultdict
import mnist_loader

In [21]:
def main():
    training_data, validation_data, test_data = mnist_loader.load_data()
    # training phase: compute average darkness for each digit,
    # based on the training data
    avgs = avg_darkness(training_data)
    # testing phase: count how many test images are classified correctly
    num_correct = sum(int(guess_digit(image, avgs) == digit)
                      for image, digit in zip(test_data[0], test_data[1]))
    print("Baseline classifier using average darkness of image.")
    print("{} of {} values correct.".format(num_correct, len(test_data[1])))

In [22]:
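def avg_darkness(training_data):
    """Return a defaultdict mapping each digit in the training data to
    the average darkness of the training images containing that digit.
    The darkness of an image is the sum of its pixel values."""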
    digits_counts = defaultdict(int)
    darkness = defaultdict(float)
    for image, digit in zip(training_data[0], training_data[1]):
        digits_counts[digit] += 1
        darkness[digit] += sum(image)
    avgs = defaultdict(float)
    for digit, n in digits_counts.items():
        avgs[digit] = darkness[digit]/n
    return avgs

In [25]:
def guess_digit(image, avgs):
    """Return the digit whose average darkness in the training data is
    closest to the darkness of ``image``.  Note that ``avgs`` is
    assumed to be a defaultdict whose keys are 0...9, and whose values
    are the corresponding average darknesses across the training data."""
    darkness = sum(image)
    distances = {k: abs(v-darkness) for k, v in avgs.items()}
    return min(distances, key=distances.get)

In [26]:
if __name__ == "__main__":
    main()

Baseline classifier using average darkness of image.
2225 of 10000 values correct.
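
To put that count in context, the following sketch converts it to an accuracy and compares it with uniform random guessing over ten digit classes, which is expected to be right one time in ten. The variable names are illustrative; the counts are copied from the output above.

```python
num_correct = 2225   # from the run above
num_test = 10000     # size of the MNIST test set used above
num_classes = 10     # digits 0 through 9

accuracy = num_correct / num_test
chance = 1 / num_classes  # expected accuracy of uniform random guessing

print("Accuracy: {:.2%}".format(accuracy))
print("Uniform random guessing: {:.2%}".format(chance))
```

So average darkness does roughly twice as well as guessing at random, which is why it serves only as a baseline.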
