To know whether our model is any good, we need some way of measuring its accuracy. When you're working with only a couple of features, looking at a graph can be very helpful, but once you get into higher dimensions, human intuition breaks down. Thankfully, since we created this data ourselves, we have the "ground truth" labels available to us, and we can use them to determine how correct our learning was.

In [1]:
# first, read in the data

import os
import csv

os.chdir('../data/')

# load every row (header included) into a list of lists
with open('solutions.csv') as f:
    records = list(csv.reader(f))

print(records[0]) # print the header
records = records[1:] # remove the header
print(records[0]) # print an example record

['person', 'number', 'switch_time']
['fc0ce6c8-1f2a-46e9-8870-e6bd048512e6', '974-703-1399', '0']


In [2]:
def generate_labels(cluster):
    """ given a list of phone numbers (as strings), return a list of category labels (integers)
    that correspond to those numbers; each label identifies the person who owns the number """
    # sort so that the person -> label mapping is the same on every run
    all_people = sorted(set(r[0] for r in records))
    person_index = {person: i for i, person in enumerate(all_people)}
    labels = []
    for number in cluster:
        owners = [r[0] for r in records if r[1] == number]
        if len(owners) != 1:
            raise ValueError("expected exactly one person for number %s, found %d" % (number, len(owners)))
        labels.append(person_index[owners[0]])
    return labels

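There are a few different ways to measure the accuracy of your model, but for this example we'll use the Adjusted Rand Index, which measures how often two labelings agree on whether a pair of points belongs in the same cluster, corrected for the agreement you would expect by chance. You can read more about this score, and other alternatives, in the [scikit-learn documentation on clustering performance evaluation](http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation).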

In [3]:
from sklearn import metrics

# retrieve our clustered data from "3. Training"
%store -r all_numbers
%store -r labels

labels_true = generate_labels(all_numbers)

metrics.adjusted_rand_score(labels_true, labels)

-5.6097617415991172e-05

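Just to give you an idea of the scale: a score close to 0 (it can dip slightly negative, as it does here) means the clusters agree with the true labels about as often as a random assignment would, while a perfect match (which we don't want! beware [overfitting](https://en.wikipedia.org/wiki/Overfitting)!) would be 1.0.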
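
To make the score's scale more concrete, here is a minimal pure-Python sketch of the pair-counting formula behind the Adjusted Rand Index. It is not scikit-learn's implementation, and the function name `adjusted_rand_index` is our own; it assumes Python 3.8+ (for `math.comb`) and two label lists of equal length. The toy label lists are made up for illustration and have nothing to do with the phone-number data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Illustrative Adjusted Rand Index (hypothetical helper, not sklearn's code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same points")

    # number of point pairs that share a cluster in both labelings,
    # in the true labeling only, and in the predicted labeling only
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(len(labels_true), 2)

    # (index - expected) / (max_index - expected), with both numerator and
    # denominator scaled by 2 * total_pairs so the arithmetic stays in integers
    numerator = 2 * (pairs_both * total_pairs - pairs_true * pairs_pred)
    denominator = (pairs_true + pairs_pred) * total_pairs - 2 * pairs_true * pairs_pred
    if denominator == 0:
        # degenerate case: both labelings are trivial, so they agree completely
        return 1.0
    return numerator / denominator


print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # identical labelings: 1.0
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same groups, renamed: 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # every pair disagrees: -0.5
```

The second call shows why it doesn't matter which integer `generate_labels` assigns to which person: the score only looks at which points are grouped together, not at what the groups are called.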