## Evaluating Classifiers

Today we are going to discuss metrics for comparing classifiers. We will bring back the rule-based classifier that was introduced yesterday (the classifier that bases its decision on the proportion of word overlap between a headline and its article), and we will compare its performance against a classifier that labels every article-headline pair as 'unrelated'. We will see that accuracy is only one way to measure performance, and that it may not be the best metric when the examples are unevenly distributed across classes.

### 1. Load the data 

We have copied the functions that we created yesterday into a separate file (named 'initial_classifier_utils.py'), and we now import them here, along with the other Python packages we need:

In [1]:
import initial_classifier_utils as utils
import numpy as np
import pandas as pd
import sys
import os.path
# Adjust settings so that we can see the full text of each column below
# (None removes the width limit; older pandas versions used -1 for this)
pd.set_option('display.max_colwidth', None)

In [2]:
# Create the training data, if it doesn't already exist:
if not os.path.isfile('train_data.csv'):
    utils.merge_data("train_stances.csv", "train_bodies.csv", "train_data.csv")

In [3]:
# Create the test data, if it doesn't already exist:
if not os.path.isfile('test_data.csv'):
    utils.merge_data("competition_test_stances.csv", "competition_test_bodies.csv", "test_data.csv")

In [4]:
# Read in the training and test data and separate out the training data according to the value of the Stance variable:
train_data = pd.read_csv("train_data.csv", encoding = "utf-8")
unrelated_train = train_data[train_data['Stance'] == 'unrelated']
discuss_train = train_data[train_data['Stance'] == 'discuss']
agree_train = train_data[train_data['Stance'] == 'agree']
disagree_train = train_data[train_data['Stance'] == 'disagree']

test_data = pd.read_csv("test_data.csv", encoding = "utf-8")

### 2. Load the classifier and make predictions on the test data

In [5]:
# We are going to compute the headline-article word overlap for each of the examples in the unrelated,
# discuss, agree and disagree categories in the training set.  
# Warning: this may take a minute - it is iterating through almost 50,000 examples!
proportions_train = utils.compute_proportions(unrelated_train, discuss_train, agree_train, disagree_train)
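
The implementation of `utils.compute_proportions` is not shown in this notebook. As a rough illustration of what a headline-article word overlap proportion could look like, here is a minimal sketch. The function name `word_overlap_proportion` and the choice to split on whitespace and lowercase the text are assumptions made for this example, not necessarily what the utils file does.

```python
def word_overlap_proportion(headline, body):
    """Fraction of distinct headline words that also appear in the body.

    Hypothetical helper for illustration only: it lowercases the text and
    splits on whitespace, which may differ from the utils implementation.
    """
    headline_words = set(headline.lower().split())
    body_words = set(body.lower().split())
    if not headline_words:
        # An empty headline shares no words with anything
        return 0.0
    return len(headline_words & body_words) / len(headline_words)


example_overlap = word_overlap_proportion(
    "Apple to keep gold Watch Editions in safes",
    "Apple will keep the gold Watch in store safes",
)
print(example_overlap)  # 6 of the 8 distinct headline words appear in the body
```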

In [6]:
# We are going to use a lot of the functions that were created yesterday, but we are also going to define a new one:
# The function below takes the test data set and iterates through each article-headline pair. For each article-headline pair
# it makes a prediction by calculating the headline-article word overlap and comparing this to the mean overlap values 
# for each of the categories in the training set 
def make_predictions(test_data, proportions_train):
    predictions_list = []
    for i in range(test_data.shape[0]):
        example = test_data.iloc[i]
        # Predict the stance for this pair and store it - this will allow us to compare
        # predictions and true values later on
        predictions_list.append(utils.make_prediction(example, proportions_train))
    return predictions_list
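
The same row-by-row pattern can also be written with `DataFrame.apply`. The sketch below runs on a tiny made-up dataframe, and `toy_predict` is a hypothetical stand-in for `utils.make_prediction` (it is not the rule used by our classifier); it only shows the shape of the loop.

```python
import pandas as pd


def toy_predict(example):
    # Hypothetical stand-in for utils.make_prediction: predicts 'discuss' when the
    # headline shares at least one word with the article, otherwise 'unrelated'
    headline_words = set(example['Headline'].lower().split())
    body_words = set(example['articleBody'].lower().split())
    return 'discuss' if headline_words & body_words else 'unrelated'


# Two made-up article-headline pairs
toy_test = pd.DataFrame({
    'Headline': ['Robot vacuum eats hair', 'Apple installs safes'],
    'articleBody': ['A robot vacuum cleaner attacked a sleeping woman',
                    'Al-Sisi has denied Israeli reports'],
})

# axis=1 passes each row to toy_predict, one article-headline pair at a time
toy_predictions = toy_test.apply(toy_predict, axis=1).tolist()
print(toy_predictions)
```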

In [7]:
# Call the make_predictions function (it may take a minute, the test_data set contains 25,413 examples!) 
# and check that it is the right size (one way to check that the make_predictions function is performing as expected)
predictions = make_predictions(test_data, proportions_train)
len(predictions)

25413

### 3. Compare predictions and ground truth

In [8]:
# We are going to append the predictions list we just obtained to the test_data dataframe as an additional column,
# along with one more column that predicts 'unrelated' for every example:
test_data['prediction_1'] = predictions
test_data['prediction_2'] = 'unrelated'

In [9]:
# Let's look at the modified test_data dataframe by viewing the first few examples:
test_data.head()

Unnamed: 0,Body ID,articleBody,Headline,Stance,prediction_1,prediction_2
0,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,Apple installing safes in-store to protect gold Watch Edition,unrelated,unrelated,unrelated
1,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,El-Sisi denies claims he'll give Sinai land to Palestinians,agree,unrelated,unrelated
2,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,Apple to keep gold Watch Editions in special in-store safes,unrelated,unrelated,unrelated
3,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,Apple Stores to Keep Gold “Edition” Apple Watch in Custom Safes,unrelated,unrelated,unrelated
4,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,South Korean woman's hair 'eaten' by robot vacuum cleaner as she slept,unrelated,unrelated,unrelated


In [10]:
# Let's look at a few more examples (this time at the end of the dataset):
test_data[-3:]

Unnamed: 0,Body ID,articleBody,Headline,Stance,prediction_1,prediction_2
25410,2586,"Remember how much Republicans wanted to repeal Obamacare?\n\nThe Republican majority in the House of Representatives has voted more than 50 times to repeal the law. Conservatives have twice brought challenges to the Supreme Court — a court with powerful voices that often lean in their direction — only to be largely rebuffed both times. The last government shutdown was driven by Republicans who insisted on defunding Obamacare (not to be confused with what may be the next government shutdown, driven by Republicans insisting on defunding Planned Parenthood).\n\nSome suggest that the calls for repealing Obamacare are fading. Sarah Kliff argued that the “near-complete absence of Obama’s health overhaul” in last week’s Republican presidential debate was “remarkable.”\n\nMaybe, but don’t count on it. In GOP presidential candidate Jeb Bush’s white paper on how he would get to 4 percent growth through supply-side tax cuts, his team of economists stresses that repealing the Affordable Care Act will be an “important means of enhancing economic growth.” Front-runner Donald Trump said just last week that he was going to replace Obamacare with “DonaldCare,” which would be both “absolutely great” and “really spectacular.” Repealing health-care reform remains a prominent talking point for Wisconsin Gov. Scott Walker and Sen. Ted Cruz, both GOP presidential candidates.\n\nWell, since we don’t know what’s in it, I can’t comment on DonaldCare.\n\nBut I can tell you this about Obamacare: When it comes to meeting one of its most important goals — providing coverage to the uninsured — it is working extremely well. 
It’s posting historical gains on this front and, in so doing, both insulating itself from repeal and creating a daunting political challenge for its opponents.\n\nThe facts of the case are thoroughly drawn out in this new analysis by my colleagues Matt Broaddus and Edwin Park (B&P) from the Center on Budget and Policy Priorities:\n\n— As the figure above shows, newly released census data show that the share of those without health coverage fell from 13.3 percent to 10.4 last year.\n\n— That’s the largest single-year drop on record based on data going back to 1987.\n\nhi2\nSource: B&P (see text)\n— In this type of work, the strength of your findings are much bolstered when you see them across multiple sources. As B&P point out, the census findings are “consistent with the historic coverage gains measured in the Centers for Disease Control (CDC) and Prevention’s National Health Interview Survey and several private surveys…at 9.2 percent, the CDC’s estimated uninsured rate for the first quarter of 2015 was the lowest since the CDC began collecting these data in 1997 and more than 40 percent below the peak in 2010.”\n\n— The ACA takes a two-pronged attack on covering the uninsured, subsidizing private coverage through the exchanges and expanding Medicaid in the 25 states (as well as D.C.) that accepted that part of the deal. Both private and public coverage are making significant gains.\n\n— Speaking of anti-Obamacare ideology and its effect on people, B&P provide this revealing calculation: “If the uninsured rate had fallen in non-expansion states at the same rate as in expansion states, an additional 2.6 million uninsured Americans would have gained coverage last year.”\n\n— Coverage last year grew most quickly among households with income below $50,000; their uninsured rate fell from about 20 percent to about 15 percent — in one year! 
Households above $50,000 had lower uninsured rates to start with, so we would expect smaller changes, but they, too, went from 9 percent to 7.4 percent.\n\nSuch truths are more than inconvenient for sworn enemies of the law. They can bob and weave in their primaries, but once they reach the general election, their Democratic opponent will slam them on these points. And unless they’re willing to massively flip — pretty unimaginable, given the heat generated by the issue for conservatives — people represented in the numbers and charts above will rightfully fear that their health security is at stake.\n\nI cannot overemphasize the importance of these facts, and, yes, I know better than most that our political discourse too often exists in a fact-free zone. But the success of the ACA strikes at the heart of the dysfunction strategy employed extremely effectively by anti-government conservatives. This is the self-fulfilling-prophecy strategy that campaigns on: “Washington is broken! Send me there and I’ll make sure it stays that way!”\n\nRemember, when government messes up, when it shuts down, when it fails to address the things that matter to most people — economic opportunity, wage stagnation, affordable college — and, instead, squabbles about Planned Parenthood, the winners are those who can say: “See? We told you. Government’s broken!” Never mind that those making that case are the ones doing the breaking.\n\nBut with Obamacare, they’ve been failing, and the more we elevate that case, the closer we get to the road back to Factville.",CBO’s Alternate Facts Show Obamacare is Unsustainable,disagree,unrelated,unrelated
25411,2586,"Remember how much Republicans wanted to repeal Obamacare?\n\nThe Republican majority in the House of Representatives has voted more than 50 times to repeal the law. Conservatives have twice brought challenges to the Supreme Court — a court with powerful voices that often lean in their direction — only to be largely rebuffed both times. The last government shutdown was driven by Republicans who insisted on defunding Obamacare (not to be confused with what may be the next government shutdown, driven by Republicans insisting on defunding Planned Parenthood).\n\nSome suggest that the calls for repealing Obamacare are fading. Sarah Kliff argued that the “near-complete absence of Obama’s health overhaul” in last week’s Republican presidential debate was “remarkable.”\n\nMaybe, but don’t count on it. In GOP presidential candidate Jeb Bush’s white paper on how he would get to 4 percent growth through supply-side tax cuts, his team of economists stresses that repealing the Affordable Care Act will be an “important means of enhancing economic growth.” Front-runner Donald Trump said just last week that he was going to replace Obamacare with “DonaldCare,” which would be both “absolutely great” and “really spectacular.” Repealing health-care reform remains a prominent talking point for Wisconsin Gov. Scott Walker and Sen. Ted Cruz, both GOP presidential candidates.\n\nWell, since we don’t know what’s in it, I can’t comment on DonaldCare.\n\nBut I can tell you this about Obamacare: When it comes to meeting one of its most important goals — providing coverage to the uninsured — it is working extremely well. 
It’s posting historical gains on this front and, in so doing, both insulating itself from repeal and creating a daunting political challenge for its opponents.\n\nThe facts of the case are thoroughly drawn out in this new analysis by my colleagues Matt Broaddus and Edwin Park (B&P) from the Center on Budget and Policy Priorities:\n\n— As the figure above shows, newly released census data show that the share of those without health coverage fell from 13.3 percent to 10.4 last year.\n\n— That’s the largest single-year drop on record based on data going back to 1987.\n\nhi2\nSource: B&P (see text)\n— In this type of work, the strength of your findings are much bolstered when you see them across multiple sources. As B&P point out, the census findings are “consistent with the historic coverage gains measured in the Centers for Disease Control (CDC) and Prevention’s National Health Interview Survey and several private surveys…at 9.2 percent, the CDC’s estimated uninsured rate for the first quarter of 2015 was the lowest since the CDC began collecting these data in 1997 and more than 40 percent below the peak in 2010.”\n\n— The ACA takes a two-pronged attack on covering the uninsured, subsidizing private coverage through the exchanges and expanding Medicaid in the 25 states (as well as D.C.) that accepted that part of the deal. Both private and public coverage are making significant gains.\n\n— Speaking of anti-Obamacare ideology and its effect on people, B&P provide this revealing calculation: “If the uninsured rate had fallen in non-expansion states at the same rate as in expansion states, an additional 2.6 million uninsured Americans would have gained coverage last year.”\n\n— Coverage last year grew most quickly among households with income below $50,000; their uninsured rate fell from about 20 percent to about 15 percent — in one year! 
Households above $50,000 had lower uninsured rates to start with, so we would expect smaller changes, but they, too, went from 9 percent to 7.4 percent.\n\nSuch truths are more than inconvenient for sworn enemies of the law. They can bob and weave in their primaries, but once they reach the general election, their Democratic opponent will slam them on these points. And unless they’re willing to massively flip — pretty unimaginable, given the heat generated by the issue for conservatives — people represented in the numbers and charts above will rightfully fear that their health security is at stake.\n\nI cannot overemphasize the importance of these facts, and, yes, I know better than most that our political discourse too often exists in a fact-free zone. But the success of the ACA strikes at the heart of the dysfunction strategy employed extremely effectively by anti-government conservatives. This is the self-fulfilling-prophecy strategy that campaigns on: “Washington is broken! Send me there and I’ll make sure it stays that way!”\n\nRemember, when government messes up, when it shuts down, when it fails to address the things that matter to most people — economic opportunity, wage stagnation, affordable college — and, instead, squabbles about Planned Parenthood, the winners are those who can say: “See? We told you. Government’s broken!” Never mind that those making that case are the ones doing the breaking.\n\nBut with Obamacare, they’ve been failing, and the more we elevate that case, the closer we get to the road back to Factville.",Why Obamacare failed,disagree,disagree,unrelated
25412,2586,"Remember how much Republicans wanted to repeal Obamacare?\n\nThe Republican majority in the House of Representatives has voted more than 50 times to repeal the law. Conservatives have twice brought challenges to the Supreme Court — a court with powerful voices that often lean in their direction — only to be largely rebuffed both times. The last government shutdown was driven by Republicans who insisted on defunding Obamacare (not to be confused with what may be the next government shutdown, driven by Republicans insisting on defunding Planned Parenthood).\n\nSome suggest that the calls for repealing Obamacare are fading. Sarah Kliff argued that the “near-complete absence of Obama’s health overhaul” in last week’s Republican presidential debate was “remarkable.”\n\nMaybe, but don’t count on it. In GOP presidential candidate Jeb Bush’s white paper on how he would get to 4 percent growth through supply-side tax cuts, his team of economists stresses that repealing the Affordable Care Act will be an “important means of enhancing economic growth.” Front-runner Donald Trump said just last week that he was going to replace Obamacare with “DonaldCare,” which would be both “absolutely great” and “really spectacular.” Repealing health-care reform remains a prominent talking point for Wisconsin Gov. Scott Walker and Sen. Ted Cruz, both GOP presidential candidates.\n\nWell, since we don’t know what’s in it, I can’t comment on DonaldCare.\n\nBut I can tell you this about Obamacare: When it comes to meeting one of its most important goals — providing coverage to the uninsured — it is working extremely well. 
It’s posting historical gains on this front and, in so doing, both insulating itself from repeal and creating a daunting political challenge for its opponents.\n\nThe facts of the case are thoroughly drawn out in this new analysis by my colleagues Matt Broaddus and Edwin Park (B&P) from the Center on Budget and Policy Priorities:\n\n— As the figure above shows, newly released census data show that the share of those without health coverage fell from 13.3 percent to 10.4 last year.\n\n— That’s the largest single-year drop on record based on data going back to 1987.\n\nhi2\nSource: B&P (see text)\n— In this type of work, the strength of your findings are much bolstered when you see them across multiple sources. As B&P point out, the census findings are “consistent with the historic coverage gains measured in the Centers for Disease Control (CDC) and Prevention’s National Health Interview Survey and several private surveys…at 9.2 percent, the CDC’s estimated uninsured rate for the first quarter of 2015 was the lowest since the CDC began collecting these data in 1997 and more than 40 percent below the peak in 2010.”\n\n— The ACA takes a two-pronged attack on covering the uninsured, subsidizing private coverage through the exchanges and expanding Medicaid in the 25 states (as well as D.C.) that accepted that part of the deal. Both private and public coverage are making significant gains.\n\n— Speaking of anti-Obamacare ideology and its effect on people, B&P provide this revealing calculation: “If the uninsured rate had fallen in non-expansion states at the same rate as in expansion states, an additional 2.6 million uninsured Americans would have gained coverage last year.”\n\n— Coverage last year grew most quickly among households with income below $50,000; their uninsured rate fell from about 20 percent to about 15 percent — in one year! 
Households above $50,000 had lower uninsured rates to start with, so we would expect smaller changes, but they, too, went from 9 percent to 7.4 percent.\n\nSuch truths are more than inconvenient for sworn enemies of the law. They can bob and weave in their primaries, but once they reach the general election, their Democratic opponent will slam them on these points. And unless they’re willing to massively flip — pretty unimaginable, given the heat generated by the issue for conservatives — people represented in the numbers and charts above will rightfully fear that their health security is at stake.\n\nI cannot overemphasize the importance of these facts, and, yes, I know better than most that our political discourse too often exists in a fact-free zone. But the success of the ACA strikes at the heart of the dysfunction strategy employed extremely effectively by anti-government conservatives. This is the self-fulfilling-prophecy strategy that campaigns on: “Washington is broken! Send me there and I’ll make sure it stays that way!”\n\nRemember, when government messes up, when it shuts down, when it fails to address the things that matter to most people — economic opportunity, wage stagnation, affordable college — and, instead, squabbles about Planned Parenthood, the winners are those who can say: “See? We told you. Government’s broken!” Never mind that those making that case are the ones doing the breaking.\n\nBut with Obamacare, they’ve been failing, and the more we elevate that case, the closer we get to the road back to Factville.",The success of the Affordable Care Act is a hugely inconvenient truth for its opponents,agree,agree,unrelated


### 4. Compare classifier performance

We are now going to introduce some tools that we can use for evaluating classifier performance. These are:
- Accuracy 
- Confusion Matrices
- Precision and Recall
- F1 score

#### 4a. Accuracy

Accuracy is defined as the ratio of the number of correctly predicted examples to the total number of examples in the test dataset:
$\text{accuracy} = \dfrac{\text{number correct}}{\text{number of examples}}$
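
To make the formula concrete, here is a small worked example on five made-up labels (the lists below are invented for illustration and are not taken from the test data):

```python
# Five made-up true labels and the corresponding made-up predictions
truth = ['unrelated', 'agree', 'unrelated', 'discuss', 'unrelated']
predicted = ['unrelated', 'unrelated', 'unrelated', 'discuss', 'agree']

# Count the positions where the prediction matches the true label
number_correct = sum(t == p for t, p in zip(truth, predicted))
toy_accuracy = number_correct / len(truth)

print(number_correct, toy_accuracy)  # 3 correct out of 5 examples
```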

Let's calculate accuracy values for the classifier we saw yesterday, and compare this to the classifier which predicts 'unrelated' for all examples:

In [11]:
def calculate_accuracy(test_data, truth_col_name, prediction_col_name):
    # Count the rows where the predicted label matches the true label
    number_correct = (test_data[truth_col_name] == test_data[prediction_col_name]).sum()
    number_examples = test_data.shape[0]
    # Fraction of examples that were predicted correctly
    accuracy = number_correct/number_examples
    return accuracy

In [12]:
# Calculate accuracy for the classifier we created yesterday:
calculate_accuracy(test_data, 'Stance', 'prediction_1')

0.691575178058474

In [13]:
# Calculate accuracy for the classifier that classifies all examples as 'unrelated':
calculate_accuracy(test_data, 'Stance', 'prediction_2')

0.7220320308503522

#### 4b. Confusion Matrices

We have just seen that the classifier that predicts 'unrelated' for everything has a higher accuracy (about 0.72) than the one we created yesterday (about 0.69)! But there is something unsatisfying about this second classifier: if we were using it to flag articles that might be Fake News (because the headline and article are unrelated), every article would be marked as Fake News! This doesn't seem helpful!

Perhaps accuracy is not the right tool to evaluate our classifier, and we should be using something else...

We are now going to calculate something called a 'confusion matrix' for each of our classifiers:

In [14]:
def calculate_confusion_matrix(test_data, truth_col_name, prediction_col_name):
    return pd.crosstab(test_data[truth_col_name], test_data[prediction_col_name])

In [15]:
calculate_confusion_matrix(test_data, 'Stance', 'prediction_1')

| Stance (rows) / prediction_1 (columns) | agree | disagree | discuss | related | unrelated |
|---|---|---|---|---|---|
| agree | 895 | 342 | 8 | 1 | 657 |
| disagree | 239 | 149 | 5 | 0 | 304 |
| discuss | 1977 | 862 | 16 | 1 | 1608 |
| unrelated | 211 | 1600 | 20 | 3 | 16515 |


What does the confusion matrix tell us? 

If we look at the first entry of the confusion matrix, with value 895, this tells us that 895 examples in the test dataset had label 'agree' (row value) and were correctly classified by our classifier as 'agree' (column value).

If we look at the entry in the second row and first column (value 239), this tells us that 239 examples in the test dataset had label 'disagree', but were classified as 'agree' by our classifier. 
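
The same entries can be looked up programmatically. The sketch below rebuilds the confusion matrix from the counts shown above (so that it runs on its own) and reads individual cells with `.loc[row, column]`. It also checks that the correctly classified cells, where the row and column labels agree, add up to the accuracy we calculated earlier. The variable names are illustrative.

```python
import pandas as pd

true_labels = ['agree', 'disagree', 'discuss', 'unrelated']
predicted_labels = ['agree', 'disagree', 'discuss', 'related', 'unrelated']

# Counts copied from the confusion matrix for prediction_1
confusion = pd.DataFrame(
    [[895, 342, 8, 1, 657],
     [239, 149, 5, 0, 304],
     [1977, 862, 16, 1, 1608],
     [211, 1600, 20, 3, 16515]],
    index=true_labels,
    columns=predicted_labels,
)

# Row = true label, column = predicted label
agree_as_agree = confusion.loc['agree', 'agree']
disagree_as_agree = confusion.loc['disagree', 'agree']

# Correct predictions are the cells where row label and column label match
correct_total = sum(confusion.loc[label, label] for label in true_labels)
accuracy_from_matrix = correct_total / confusion.values.sum()

print(agree_as_agree, disagree_as_agree, accuracy_from_matrix)
```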

In [16]:
# Let's also look at the confusion matrix, where the entries are given as a proportion of the size of the
#  test dataset
calculate_confusion_matrix(test_data, 'Stance', 'prediction_1')/test_data.shape[0]

| Stance (rows) / prediction_1 (columns) | agree | disagree | discuss | related | unrelated |
|---|---|---|---|---|---|
| agree | 0.035218 | 0.013458 | 0.000315 | 0.000039 | 0.025853 |
| disagree | 0.009405 | 0.005863 | 0.000197 | 0.000000 | 0.011962 |
| discuss | 0.077795 | 0.033920 | 0.000630 | 0.000039 | 0.063275 |
| unrelated | 0.008303 | 0.062960 | 0.000787 | 0.000118 | 0.649864 |
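
Dividing by the size of the test set shows each cell as a share of all examples. A complementary view, sketched here with the counts from the first confusion matrix, divides each row by its own total, so that each row shows how the examples of one true class were spread across the predicted labels. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the confusion matrix for prediction_1
confusion = pd.DataFrame(
    [[895, 342, 8, 1, 657],
     [239, 149, 5, 0, 304],
     [1977, 862, 16, 1, 1608],
     [211, 1600, 20, 3, 16515]],
    index=['agree', 'disagree', 'discuss', 'unrelated'],
    columns=['agree', 'disagree', 'discuss', 'related', 'unrelated'],
)

# Number of test examples in each true class
row_totals = confusion.sum(axis=1)

# axis=0 aligns row_totals with the row index, so every row is divided by its own total
row_normalised = confusion.div(row_totals, axis=0)

print(row_normalised.round(3))
```

Read along the diagonal, this table gives the fraction of each true class that was classified correctly, which makes it easy to see that most 'unrelated' examples are handled well while very few 'discuss' examples are.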


In [17]:
# Let's calculate the confusion matrix for the second classifier:
calculate_confusion_matrix(test_data, 'Stance', 'prediction_2')

| Stance (rows) / prediction_2 (columns) | unrelated |
|---|---|
| agree | 1903 |
| disagree | 697 |
| discuss | 4464 |
| unrelated | 18349 |


In [18]:
# And again, let's make the entries proportions of the full test dataset:
calculate_confusion_matrix(test_data, 'Stance', 'prediction_2')/test_data.shape[0]

| Stance (rows) / prediction_2 (columns) | unrelated |
|---|---|
| agree | 0.074883 |
| disagree | 0.027427 |
| discuss | 0.175658 |
| unrelated | 0.722032 |
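
These numbers explain where the accuracy of the second classifier comes from: a classifier that always predicts one label is correct exactly as often as that label occurs. A minimal sketch, using the class counts from the confusion matrix for prediction_2 (the variable names are illustrative):

```python
# Number of test examples per true class, copied from the confusion matrix for prediction_2
stance_counts = {'agree': 1903, 'disagree': 697, 'discuss': 4464, 'unrelated': 18349}

total_examples = sum(stance_counts.values())

# The most frequent class is the best single label to predict for everything
majority_label = max(stance_counts, key=stance_counts.get)
baseline_accuracy = stance_counts[majority_label] / total_examples

print(majority_label, baseline_accuracy)
```

Any classifier we build should therefore be compared against this share of the largest class, not against zero.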


TODO:
- Discussion of confusion matrices (and division by row percentage to look at accuracy for each category)
- F1 and precision/recall discussion.
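
As a starting point for the precision/recall item, here is a minimal sketch using the standard definitions: precision is the share of examples predicted as a class that truly belong to it, recall is the share of examples of a class that were predicted as that class, and F1 is the harmonic mean of the two. The helper name `precision_recall_f1` and the convention of reporting 0 when a denominator is 0 are assumptions made for illustration. The counts come from the 'agree' row and column of the first confusion matrix.

```python
def precision_recall_f1(true_positives, predicted_total, actual_total):
    """Return (precision, recall, F1) for a single class.

    Assumption for this sketch: a ratio with a zero denominator is reported as 0.0.
    """
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / actual_total if actual_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Classifier 1, class 'agree'
agree_correct = 895                          # true 'agree', predicted 'agree'
predicted_agree = 895 + 239 + 1977 + 211     # sum of the 'agree' column
actual_agree = 895 + 342 + 8 + 1 + 657       # sum of the 'agree' row
agree_scores = precision_recall_f1(agree_correct, predicted_agree, actual_agree)

# Classifier 2 never predicts 'agree', so it has no true positives for this class
baseline_agree_scores = precision_recall_f1(0, 0, actual_agree)

print(agree_scores)
print(baseline_agree_scores)
```

Unlike accuracy, these per-class scores separate the two classifiers clearly: the all-'unrelated' classifier scores zero on 'agree', while the overlap classifier recovers a sizeable share of the 'agree' examples.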

### 5. Extra Challenge

1. Take a look at some of the examples that were misclassified by our proportion overlap classifier. Why do you think our proportion overlap model failed? Can you think of a rule that would properly classify these examples?  Why not test it out?
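
One possible way to get started is to filter for the rows where the prediction differs from the true label. The sketch below uses a three-row toy dataframe with the same column names as `test_data` so that it runs on its own; applying the same filter to `test_data` itself is left to you.

```python
import pandas as pd

# Toy stand-in for test_data, using three headlines seen earlier in this notebook
toy_results = pd.DataFrame({
    'Headline': ['Apple installing safes in-store to protect gold Watch Edition',
                 "El-Sisi denies claims he'll give Sinai land to Palestinians",
                 'Why Obamacare failed'],
    'Stance': ['unrelated', 'agree', 'disagree'],
    'prediction_1': ['unrelated', 'unrelated', 'disagree'],
})

# Keep only the rows where the prediction does not match the true label
misclassified = toy_results[toy_results['Stance'] != toy_results['prediction_1']]

# Count how often each (true label, predicted label) mistake occurs
error_counts = misclassified.groupby(['Stance', 'prediction_1']).size()

print(misclassified)
print(error_counts)
```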