# Prediction

This notebook demonstrates the prediction methods we use in this project, along with the evaluation scheme.

In [10]:
import sys, os

sys.path.append('..')
import planet.predict, planet.util

import sklearn.metrics
import numpy

data_dir = '../data'

First, we load all the label data.

In [2]:
all_tags = planet.util.read_tags(os.path.join(data_dir, 'train_v2.csv'))
tag_indices = planet.util.get_tag_indices(all_tags)
all_labels = planet.util.tags_to_labels(all_tags, tag_indices)
(num_all, num_labels) = all_labels.shape

## Random Classifier

In order to establish a baseline for performance, we first use a classifier that assigns labels at random by flipping an unbiased coin for each label.

In [11]:
pred_labels_rand = planet.predict.random(num_all, num_labels)

planet.predict.plot_scores(pred_labels_rand, all_labels, tag_indices.keys(),
                           'Random', os.path.join(data_dir, 'rand_scores.html'))

[0.50022804526483367, 0.16919655721433582, 0.35954051255661018]

First, note that the recall (`tp / (tp + fn)`) of this classifier is roughly `0.5`, because a fair coin flags about half of each label's positive samples: the true positives (`tp`) and the false negatives (`fn`) should each be about half the number of positive labels (`p/2`). Also note that there is a bit more fluctuation for rarer labels like *conventional_mine*, which have fewer positive samples to average over. The average used here, and elsewhere in these analyses, pools the counts: it sums `tp` and `fn` across all samples and labels before computing the ratio, rather than averaging the per-label scores.
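To make the pooled average concrete, here is a small sketch on made-up labels (the arrays are illustrative, not project data). It compares the pooled recall, often called a micro-average, with the unweighted mean of the per-label recalls.

```python
import numpy

# Toy data (hypothetical): 4 samples, 2 labels; label 0 is common, label 1 is rare.
true_labels = numpy.array([[1, 0], [1, 0], [1, 1], [1, 0]], dtype=bool)
pred_labels = numpy.array([[1, 1], [1, 0], [0, 0], [1, 1]], dtype=bool)

# Per-label counts of true positives and false negatives.
tp = numpy.sum(pred_labels & true_labels, axis=0)
fn = numpy.sum(~pred_labels & true_labels, axis=0)

# Recall per label: 0.75 for label 0, 0.0 for label 1.
per_label_recall = tp / (tp + fn)

# Pooled (micro) recall sums the counts first, so the common label dominates.
pooled_recall = tp.sum() / (tp.sum() + fn.sum())

# Unweighted mean of per-label recalls treats both labels equally.
mean_recall = per_label_recall.mean()

print(pooled_recall)  # 0.6
print(mean_recall)    # 0.375
```

Because the counts are pooled, labels with many positive samples contribute most of the total, which matters when comparing classifiers that treat frequent and rare labels differently.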

Next, note that the precision (`tp / (tp + fp)`) of this classifier roughly follows the empirical distribution of the labels (see the `Data Exploration` notebook for comparison). That's because the number of false positives should be roughly half the number of negative occurrences (`n/2`), leading to a precision of `p/2 / (p/2 + n/2) = p / (p + n)`, which is the empirical probability of the label.
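Both expectations can be checked with a quick simulation. This is only a sketch on synthetic data: the label frequencies, sample count, and seed are made up, and the coin flip is written directly in NumPy rather than calling `planet.predict.random`.

```python
import numpy

rng = numpy.random.default_rng(0)

num_samples = 200_000
# Hypothetical label frequencies: one common, one moderate, one rare label.
true_probs = numpy.array([0.9, 0.3, 0.01])

# Synthetic ground truth: each label is present independently with its own frequency.
true_labels = rng.random((num_samples, true_probs.size)) < true_probs
# Random classifier: an unbiased coin flip for every sample and label.
pred_labels = rng.random(true_labels.shape) < 0.5

tp = numpy.sum(pred_labels & true_labels, axis=0)
fp = numpy.sum(pred_labels & ~true_labels, axis=0)
fn = numpy.sum(~pred_labels & true_labels, axis=0)

recall = tp / (tp + fn)          # close to 0.5 for every label
precision = tp / (tp + fp)       # close to the label's empirical frequency
empirical = true_labels.mean(axis=0)

print(recall)
print(precision)
print(empirical)
```

The rare label has the fewest positive samples, so its recall deviates from `0.5` the most from run to run.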

Finally, note that the F2 score of this classifier is a little closer to the recall than to the precision, which is expected because it's a weighted harmonic mean of recall and precision that weights recall more heavily than precision.
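For reference, the standard F-beta score is `(1 + beta^2) * precision * recall / (beta^2 * precision + recall)`, with `beta = 2` for F2. The sketch below plugs in the averaged scores printed above, assuming the output list is ordered as recall, precision, F2. The helper `f_beta` is a hypothetical name, not part of the `planet` package.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score: a weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when both scores are zero
    b2 = beta ** 2
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Averaged scores copied from the random classifier's output above
# (assumed order: recall, precision, F2).
recall, precision = 0.50022804526483367, 0.16919655721433582

f2 = f_beta(precision, recall)
print(round(f2, 4))  # 0.3595
```

The result agrees with the third number in the output, which suggests the reported F2 is computed from the pooled recall and precision.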

Instead of using a probability of 0.5 for every label, we can predict each label with its empirical probability.

In [18]:
label_probs = numpy.mean(all_labels, axis=0, keepdims=True)
pred_labels_emp_rand = planet.predict.empirical_random(num_all, label_probs)

planet.predict.plot_scores(pred_labels_emp_rand, all_labels, tag_indices.keys(),
                           'Empirical Random', os.path.join(data_dir, 'emp_rand_scores.html'))

[0.55007960070564954, 0.54901186110228373, 0.54986572066112582]

Introducing these probabilities increases the average recall a little (from about 0.50 to 0.55), increases the average precision a lot (from about 0.17 to 0.55), and balances the two scores. Note that even though recall decreased for many labels, it increased overall because some labels occur much more frequently than others, so predicting those labels more often boosts the overall score.
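The balance between the two scores has a simple explanation. If a label with empirical probability `q` is predicted independently with that same probability, the expected false positive rate `q * (1 - q)` equals the expected false negative rate `(1 - q) * q`, so the pooled precision and recall both approach `sum(q^2) / sum(q)`. The sketch below checks this on synthetic data. `empirical_random_sketch` is a hypothetical stand-in, and it is an assumption that `planet.predict.empirical_random` samples labels this way.

```python
import numpy


def empirical_random_sketch(num_samples, label_probs, rng):
    """Hypothetical stand-in: predict each label independently with its own probability."""
    return rng.random((num_samples, label_probs.shape[1])) < label_probs


rng = numpy.random.default_rng(0)

# Synthetic ground truth with made-up label frequencies.
true_probs = numpy.array([0.9, 0.3, 0.01])
true_labels = rng.random((200_000, true_probs.size)) < true_probs

# Same call as in the notebook: per-label frequencies with shape (1, num_labels).
label_probs = numpy.mean(true_labels, axis=0, keepdims=True)
pred_labels = empirical_random_sketch(len(true_labels), label_probs, rng)

# Pooled counts across all samples and labels.
tp = numpy.sum(pred_labels & true_labels)
fp = numpy.sum(pred_labels & ~true_labels)
fn = numpy.sum(~pred_labels & true_labels)

recall = tp / (tp + fn)
precision = tp / (tp + fp)
expected = numpy.sum(label_probs ** 2) / numpy.sum(label_probs)

print(recall, precision, expected)
```

The expected value is a frequency-weighted average of the label probabilities, which is why frequent labels pull the pooled scores up.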