In [23]:
import numpy as np
import pandas as pd

### Show assigned class counts for each true class.

During the cross-validation phase of classifier construction, each example clip serves exactly once as test data for a classifier that did not see that clip during training. The table below tallies the resulting classifications. It has one row for each true class and one column for each possible assigned class, including an `Unclassified` column for clips that the classifier declined to classify. Each number is the count of clips of the row's true class that were assigned to the column's class.
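
As a rough illustration of how a table of this shape can be tallied from per-clip results, the sketch below cross-tabulates a handful of made-up labels with `pandas.crosstab`. The label lists and the name `tally` are hypothetical and are not taken from the actual cross-validation run, whose per-clip results do not appear in this notebook.

```python
import pandas as pd

# Hypothetical per-clip labels, for illustration only.
true_classes = ['Call.CHSP', 'Call.CHSP', 'Call.WIWA', 'Other', 'Other', 'Call.CHSP']
assigned_classes = ['Call.CHSP', 'Unclassified', 'Call.WIWA', 'Unclassified', 'Call.WIWA', 'Call.CHSP']

# Rows are true classes, columns are assigned classes, cells are clip counts.
tally = pd.crosstab(
    pd.Series(true_classes, name='True class'),
    pd.Series(assigned_classes, name='Assigned class'))
print(tally)
```

Each clip contributes exactly one count, so the cells of `tally` sum to the number of clips.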

In [24]:
counts = np.array([
    [467, 0, 2, 4, 314],
    [2, 42, 0, 2, 99],
    [24, 0, 83, 0, 185],
    [5, 1, 0, 866, 297],
    [90, 3, 14, 138, 2729]
])
names = ['Call.CHSP', 'Call.SAVS', 'Call.WCSP', 'Call.WIWA']
true_names = names + ['Other']
assigned_names = names + ['Unclassified']
counts = pd.DataFrame(counts, index=true_names, columns=assigned_names)

In [25]:
counts

True class,Call.CHSP,Call.SAVS,Call.WCSP,Call.WIWA,Unclassified
Call.CHSP,467,0,2,4,314
Call.SAVS,2,42,0,2,99
Call.WCSP,24,0,83,0,185
Call.WIWA,5,1,0,866,297
Other,90,3,14,138,2729


In [26]:
c = np.array(counts.values, dtype='float')

In [27]:
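def _get_rounded_percents(counts, totals):
    # Divide each row of `counts` by its total. The transposes let this work for both 1-D and 2-D `counts`.
    percents = 100. * (counts.T / totals).T
    # Round to the nearest tenth of a percent.
    return np.round(percents * 10) / 10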

### Show percentages of unclassified clips for each true class.

This classifier is called *conservative* because it leaves many clips unclassified. The table below shows, for each true class, the percentage of clips that the cross-validation classifiers declined to classify.

In [28]:
unclassified_counts = c[:, -1]
totals = c.sum(axis=1)
percents = _get_rounded_percents(unclassified_counts, totals)
percents = pd.DataFrame(percents, index=true_names, columns=['Unclassified'])

In [29]:
percents

True class,Unclassified
Call.CHSP,39.9
Call.SAVS,68.3
Call.WCSP,63.4
Call.WIWA,25.4
Other,91.8
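
As a sanity check on the arithmetic, the sketch below recomputes one entry by hand, using the `Call.CHSP` row of the counts table. The variable names are only illustrative.

```python
# Call.CHSP row of the counts table: four assigned call classes, then Unclassified.
row = [467, 0, 2, 4, 314]

unclassified = row[-1]  # clips the classifier declined to classify
total = sum(row)        # all Call.CHSP clips: 787
percent = round(100 * unclassified / total, 1)

print(total, percent)   # 787 39.9
```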


### Disregarding unclassified clips, show assigned class percentages for each true class.

There is one row in the table below for each true class and one column for each assigned class. Each number in the table is the percentage of clips of the true class to the left that were assigned to the class above. The percentages in each row sum to 100 (or about 100, since the percentages are rounded to the nearest tenth).

In [30]:
assigned_counts = c[:, :-1]
totals = assigned_counts.sum(axis=1)
percents = _get_rounded_percents(assigned_counts, totals)
percents = pd.DataFrame(percents, index=true_names, columns=assigned_names[:-1])

In [31]:
percents

True class,Call.CHSP,Call.SAVS,Call.WCSP,Call.WIWA
Call.CHSP,98.7,0.0,0.4,0.8
Call.SAVS,4.3,91.3,0.0,4.3
Call.WCSP,22.4,0.0,77.6,0.0
Call.WIWA,0.6,0.1,0.0,99.3
Other,36.7,1.2,5.7,56.3
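
The claim that each row sums to about 100 can be checked directly. The sketch below repeats the counts so that it runs on its own, and it uses `keepdims=True` for the row-wise division instead of the transpose trick in `_get_rounded_percents`. That substitution is an assumption of this sketch, not the notebook's method.

```python
import numpy as np

counts = np.array([
    [467, 0, 2, 4, 314],
    [2, 42, 0, 2, 99],
    [24, 0, 83, 0, 185],
    [5, 1, 0, 866, 297],
    [90, 3, 14, 138, 2729]
], dtype=float)

# Drop the Unclassified column, then normalize each row by its own total.
assigned = counts[:, :-1]
percents = np.round(100 * assigned / assigned.sum(axis=1, keepdims=True), 1)

# Rounding to tenths can move a row sum slightly away from 100.
row_sums = percents.sum(axis=1)
print(row_sums)
```

With these counts every row sum lands within a tenth or so of 100, and the small deviations come only from rounding.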


### Show true class percentages for each assigned class.

There is one row in the table below for each assigned class and one column for each true class. Each number in the table is the percentage of clips assigned to the class to the left that are in fact of the true class above. The percentages in each row sum to 100 (or about 100, since the percentages are rounded to the nearest tenth).

This table is probably more telling than the previous one about the potential utility of the classifier. For each of the four call classes, the diagonal entry is the percentage of clips assigned to that class by the cross-validation classifiers that were truly of that class, and the other entries in the same row are the percentages of clips assigned to that class that were actually of other classes. The `Unclassified` row shows the true-class mix of the clips that were left unclassified. The call-class rows give the approximate correct and incorrect classification rates that one might expect to see when browsing classified clips, assuming that the classifier generalizes well.

In [32]:
true_counts = c.T
totals = true_counts.sum(axis=1)
percents = _get_rounded_percents(true_counts, totals)
percents = pd.DataFrame(percents, index=assigned_names, columns=true_names)

In [33]:
percents

Assigned class,Call.CHSP,Call.SAVS,Call.WCSP,Call.WIWA,Other
Call.CHSP,79.4,0.3,4.1,0.9,15.3
Call.SAVS,0.0,91.3,0.0,2.2,6.5
Call.WCSP,2.0,0.0,83.8,0.0,14.1
Call.WIWA,0.4,0.2,0.0,85.7,13.7
Unclassified,8.7,2.7,5.1,8.2,75.3
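
The diagonal entries for the four call classes are what is usually called per-class precision. The sketch below pulls them out directly from the counts as a compact summary. It repeats the data so that it runs on its own, and the name `precision` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

counts = np.array([
    [467, 0, 2, 4, 314],
    [2, 42, 0, 2, 99],
    [24, 0, 83, 0, 185],
    [5, 1, 0, 866, 297],
    [90, 3, 14, 138, 2729]
], dtype=float)
names = ['Call.CHSP', 'Call.SAVS', 'Call.WCSP', 'Call.WIWA']

# Correctly assigned clips sit on the diagonal of the call-class block.
correct = np.diag(counts[:4, :4])

# Column sums count every clip assigned to each call class, including 'Other' clips.
assigned_totals = counts[:, :4].sum(axis=0)

precision = pd.Series(np.round(100 * correct / assigned_totals, 1), index=names)
print(precision)
```

The values should match the diagonal of the table above: 79.4, 91.3, 83.8, and 85.7.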
