## Evaluating Classifiers

Today we are going to discuss metrics for comparing classifiers. We will bring back the rule-based classifier that was introduced yesterday, which uses the word overlap proportion between headlines and articles to make its decision, but we will restrict it so that it can only predict 'unrelated' or 'related'. We will compare the performance of this classifier against a baseline classifier that labels every article-headline pair as 'unrelated'. We will see that accuracy is only one way to measure performance, and that it may not be the best metric when the examples are unevenly distributed across classes.

Anytime you see ``______ # TODO: FILL IN HERE.`` in the code, you should replace the ``______`` with your own code.

As always, ask your neighbors or an instructor if you have any questions!

### 1. Load the data 

We have copied the functions that we created yesterday into a separate file (named `initial_classifier_utils.py`). We now import those functions here, along with the other Python packages we need:

In [1]:
import initial_classifier_utils as utils
import numpy as np
import pandas as pd
import sys
import os.path
# Adjust settings so that we can fully see the dataset below
pd.set_option('display.max_colwidth', None)  # None means no truncation (pandas >= 1.0; older versions used -1)

In [2]:
# Create the training data, if it doesn't already exist:
if not os.path.isfile('train_data.csv'):
    utils.merge_data("train_stances.csv", "train_bodies.csv", "train_data.csv")

In [3]:
# Create the test data, if it doesn't already exist:
if not os.path.isfile('test_data.csv'):
    utils.merge_data("competition_test_stances.csv", "competition_test_bodies.csv", "test_data.csv")

In [4]:
# Read in the training and test data and separate out the training data according to the value of the Stance variable:
train_data = pd.read_csv("train_data.csv", encoding = "utf-8")
unrelated_train = train_data[train_data['Stance'] == 'unrelated']
discuss_train = train_data[train_data['Stance'] == 'discuss']
agree_train = train_data[train_data['Stance'] == 'agree']
disagree_train = train_data[train_data['Stance'] == 'disagree']

test_data = pd.read_csv("test_data.csv", encoding = "utf-8")

### 2. Load the classifier and make predictions on the test data

In [5]:
# We are going to compute the headline-article word overlap for each of the examples in the unrelated,
# discuss, agree and disagree categories in the training set.  
# Warning: this may take a minute - it is iterating through almost 50,000 examples!
proportions_train = utils.compute_proportions(unrelated_train, discuss_train, agree_train, disagree_train)

In [6]:
# We are going to modify the make_prediction function that we created yesterday, so that it only predicts 'unrelated'
# and 'related'. It will use the same method as our classifier yesterday used: the proportion overlap for an example
# will be compared with the overlap proportions for 'related' and 'unrelated' before a decision is made
def make_prediction(example, proportions_train):
    # Keep only the 'unrelated' and 'related' proportions:
    keys = ['unrelated', 'related']
    new_proportions = { key: proportions_train[key] for key in keys }
    proportions_stances = list(new_proportions.keys())
    proportion = utils.find_headline_in_article_proportion(example)
    predicted_stance = proportions_stances[np.argmin(np.abs(np.array(list(new_proportions.values())) - proportion))]
    return predicted_stance
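
To see the decision rule in isolation, here is a minimal sketch with made-up mean overlap values (the numbers 0.02 and 0.25 and the helper name `nearest_stance` are illustrative assumptions, not values computed from the training data). It uses plain Python's `min` with a key function, which picks the same label as the `np.argmin` expression above: whichever class mean lies closest to the example's overlap proportion.

```python
# Hypothetical mean overlap proportions per class (illustrative values only)
toy_proportions = {'unrelated': 0.02, 'related': 0.25}

def nearest_stance(proportion, proportions):
    """Return the label whose mean overlap is closest to `proportion`."""
    return min(proportions, key=lambda label: abs(proportions[label] - proportion))

print(nearest_stance(0.05, toy_proportions))  # 0.05 is closer to 0.02 than to 0.25
print(nearest_stance(0.20, toy_proportions))  # 0.20 is closer to 0.25 than to 0.02
```

With these toy values, an overlap of 0.05 is labelled 'unrelated' and an overlap of 0.20 is labelled 'related'.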

In [7]:
# We are going to use a lot of the functions that were created yesterday, but we are also going to define a new one:
# The function below takes the test data set and iterates through each article-headline pair. For each article-headline pair
# it makes a prediction by calculating the headline-article word overlap and comparing this to the mean overlap values 
# for each of the categories in the training set. The function returns a list of predictions for every example in the
# test data set
def make_predictions(test_data, proportions_train):
    predictions_list = []
    for i in range(test_data.shape[0]):
        example = test_data.iloc[i]
        predicted_stance = make_prediction(example, proportions_train)
        # Append the predicted stance to the predictions_list - this will allow us to compare predictions and true 
        # values later on
        predictions_list.append(predicted_stance)
    return predictions_list
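
The explicit loop in `make_predictions` can also be written with `DataFrame.apply`, which calls a function once per row when `axis=1`. The sketch below uses a toy dataframe and a stand-in rule (`toy_predict`, the `overlap` column and the 0.1 threshold are hypothetical, not part of `initial_classifier_utils`). It is not necessarily faster than the loop, just more compact.

```python
import pandas as pd

def toy_predict(row):
    # Stand-in for make_prediction: a hypothetical threshold on a precomputed overlap column
    return 'related' if row['overlap'] > 0.1 else 'unrelated'

# Toy test set with one overlap value per article-headline pair (illustrative only)
toy_test = pd.DataFrame({'overlap': [0.0, 0.3, 0.05]})

# axis=1 passes each row to toy_predict; tolist() gives a plain list of predictions
toy_predictions = toy_test.apply(toy_predict, axis=1).tolist()
print(toy_predictions)
```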

In [8]:
# Call the make_predictions function (it may take a minute, the test_data set contains 25,413 examples!) 
# and check that it is the right size (one way to check that the make_predictions function is performing as expected)
predictions = make_predictions(test_data, proportions_train)
len(predictions)

25413

### 3. Compare predictions and ground truth

In [9]:
# We are going to create a new column (called 'new_stance') in the test_data dataframe so that any Stance value that is 
# 'agree', 'discuss', or 'disagree' is represented as 'related', while a value of 'unrelated' remains 'unrelated'
test_data['new_stance'] = test_data['Stance']
test_data.loc[test_data['Stance'].isin(['agree', 'disagree', 'discuss']), 'new_stance'] = 'related'

In [10]:
# We are going to append the predictions list we just obtained to the test_data dataframe as an additional column,
# along with one more column that predicts 'unrelated' for every example:
test_data['prediction_1'] = predictions
test_data['prediction_2'] = 'unrelated'

In [11]:
# Let's look at the modified test_data dataframe by viewing the first few examples:
test_data.head()

Unnamed: 0,Body ID,articleBody,Headline,Stance,new_stance,prediction_1,prediction_2
0,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,Apple installing safes in-store to protect gold Watch Edition,unrelated,unrelated,unrelated,unrelated
1,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,El-Sisi denies claims he'll give Sinai land to Palestinians,agree,related,unrelated,unrelated
2,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,Apple to keep gold Watch Editions in special in-store safes,unrelated,unrelated,unrelated,unrelated
3,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,Apple Stores to Keep Gold “Edition” Apple Watch in Custom Safes,unrelated,unrelated,unrelated,unrelated
4,1,Al-Sisi has denied Israeli reports stating that he offered to extend the Gaza Strip.,South Korean woman's hair 'eaten' by robot vacuum cleaner as she slept,unrelated,unrelated,unrelated,unrelated


In [12]:
# Let's look at a few more examples (this time at the end of the dataset):
test_data[-3:]

Unnamed: 0,Body ID,articleBody,Headline,Stance,new_stance,prediction_1,prediction_2
25410,2586,"Remember how much Republicans wanted to repeal Obamacare?\n\nThe Republican majority in the House of Representatives has voted more than 50 times to repeal the law. Conservatives have twice brought challenges to the Supreme Court — a court with powerful voices that often lean in their direction — only to be largely rebuffed both times. The last government shutdown was driven by Republicans who insisted on defunding Obamacare (not to be confused with what may be the next government shutdown, driven by Republicans insisting on defunding Planned Parenthood).\n\nSome suggest that the calls for repealing Obamacare are fading. Sarah Kliff argued that the “near-complete absence of Obama’s health overhaul” in last week’s Republican presidential debate was “remarkable.”\n\nMaybe, but don’t count on it. In GOP presidential candidate Jeb Bush’s white paper on how he would get to 4 percent growth through supply-side tax cuts, his team of economists stresses that repealing the Affordable Care Act will be an “important means of enhancing economic growth.” Front-runner Donald Trump said just last week that he was going to replace Obamacare with “DonaldCare,” which would be both “absolutely great” and “really spectacular.” Repealing health-care reform remains a prominent talking point for Wisconsin Gov. Scott Walker and Sen. Ted Cruz, both GOP presidential candidates.\n\nWell, since we don’t know what’s in it, I can’t comment on DonaldCare.\n\nBut I can tell you this about Obamacare: When it comes to meeting one of its most important goals — providing coverage to the uninsured — it is working extremely well. 
It’s posting historical gains on this front and, in so doing, both insulating itself from repeal and creating a daunting political challenge for its opponents.\n\nThe facts of the case are thoroughly drawn out in this new analysis by my colleagues Matt Broaddus and Edwin Park (B&P) from the Center on Budget and Policy Priorities:\n\n— As the figure above shows, newly released census data show that the share of those without health coverage fell from 13.3 percent to 10.4 last year.\n\n— That’s the largest single-year drop on record based on data going back to 1987.\n\nhi2\nSource: B&P (see text)\n— In this type of work, the strength of your findings are much bolstered when you see them across multiple sources. As B&P point out, the census findings are “consistent with the historic coverage gains measured in the Centers for Disease Control (CDC) and Prevention’s National Health Interview Survey and several private surveys…at 9.2 percent, the CDC’s estimated uninsured rate for the first quarter of 2015 was the lowest since the CDC began collecting these data in 1997 and more than 40 percent below the peak in 2010.”\n\n— The ACA takes a two-pronged attack on covering the uninsured, subsidizing private coverage through the exchanges and expanding Medicaid in the 25 states (as well as D.C.) that accepted that part of the deal. Both private and public coverage are making significant gains.\n\n— Speaking of anti-Obamacare ideology and its effect on people, B&P provide this revealing calculation: “If the uninsured rate had fallen in non-expansion states at the same rate as in expansion states, an additional 2.6 million uninsured Americans would have gained coverage last year.”\n\n— Coverage last year grew most quickly among households with income below $50,000; their uninsured rate fell from about 20 percent to about 15 percent — in one year! 
Households above $50,000 had lower uninsured rates to start with, so we would expect smaller changes, but they, too, went from 9 percent to 7.4 percent.\n\nSuch truths are more than inconvenient for sworn enemies of the law. They can bob and weave in their primaries, but once they reach the general election, their Democratic opponent will slam them on these points. And unless they’re willing to massively flip — pretty unimaginable, given the heat generated by the issue for conservatives — people represented in the numbers and charts above will rightfully fear that their health security is at stake.\n\nI cannot overemphasize the importance of these facts, and, yes, I know better than most that our political discourse too often exists in a fact-free zone. But the success of the ACA strikes at the heart of the dysfunction strategy employed extremely effectively by anti-government conservatives. This is the self-fulfilling-prophecy strategy that campaigns on: “Washington is broken! Send me there and I’ll make sure it stays that way!”\n\nRemember, when government messes up, when it shuts down, when it fails to address the things that matter to most people — economic opportunity, wage stagnation, affordable college — and, instead, squabbles about Planned Parenthood, the winners are those who can say: “See? We told you. Government’s broken!” Never mind that those making that case are the ones doing the breaking.\n\nBut with Obamacare, they’ve been failing, and the more we elevate that case, the closer we get to the road back to Factville.",CBO’s Alternate Facts Show Obamacare is Unsustainable,disagree,related,unrelated,unrelated
25411,2586,"Remember how much Republicans wanted to repeal Obamacare?\n\nThe Republican majority in the House of Representatives has voted more than 50 times to repeal the law. Conservatives have twice brought challenges to the Supreme Court — a court with powerful voices that often lean in their direction — only to be largely rebuffed both times. The last government shutdown was driven by Republicans who insisted on defunding Obamacare (not to be confused with what may be the next government shutdown, driven by Republicans insisting on defunding Planned Parenthood).\n\nSome suggest that the calls for repealing Obamacare are fading. Sarah Kliff argued that the “near-complete absence of Obama’s health overhaul” in last week’s Republican presidential debate was “remarkable.”\n\nMaybe, but don’t count on it. In GOP presidential candidate Jeb Bush’s white paper on how he would get to 4 percent growth through supply-side tax cuts, his team of economists stresses that repealing the Affordable Care Act will be an “important means of enhancing economic growth.” Front-runner Donald Trump said just last week that he was going to replace Obamacare with “DonaldCare,” which would be both “absolutely great” and “really spectacular.” Repealing health-care reform remains a prominent talking point for Wisconsin Gov. Scott Walker and Sen. Ted Cruz, both GOP presidential candidates.\n\nWell, since we don’t know what’s in it, I can’t comment on DonaldCare.\n\nBut I can tell you this about Obamacare: When it comes to meeting one of its most important goals — providing coverage to the uninsured — it is working extremely well. 
It’s posting historical gains on this front and, in so doing, both insulating itself from repeal and creating a daunting political challenge for its opponents.\n\nThe facts of the case are thoroughly drawn out in this new analysis by my colleagues Matt Broaddus and Edwin Park (B&P) from the Center on Budget and Policy Priorities:\n\n— As the figure above shows, newly released census data show that the share of those without health coverage fell from 13.3 percent to 10.4 last year.\n\n— That’s the largest single-year drop on record based on data going back to 1987.\n\nhi2\nSource: B&P (see text)\n— In this type of work, the strength of your findings are much bolstered when you see them across multiple sources. As B&P point out, the census findings are “consistent with the historic coverage gains measured in the Centers for Disease Control (CDC) and Prevention’s National Health Interview Survey and several private surveys…at 9.2 percent, the CDC’s estimated uninsured rate for the first quarter of 2015 was the lowest since the CDC began collecting these data in 1997 and more than 40 percent below the peak in 2010.”\n\n— The ACA takes a two-pronged attack on covering the uninsured, subsidizing private coverage through the exchanges and expanding Medicaid in the 25 states (as well as D.C.) that accepted that part of the deal. Both private and public coverage are making significant gains.\n\n— Speaking of anti-Obamacare ideology and its effect on people, B&P provide this revealing calculation: “If the uninsured rate had fallen in non-expansion states at the same rate as in expansion states, an additional 2.6 million uninsured Americans would have gained coverage last year.”\n\n— Coverage last year grew most quickly among households with income below $50,000; their uninsured rate fell from about 20 percent to about 15 percent — in one year! 
Households above $50,000 had lower uninsured rates to start with, so we would expect smaller changes, but they, too, went from 9 percent to 7.4 percent.\n\nSuch truths are more than inconvenient for sworn enemies of the law. They can bob and weave in their primaries, but once they reach the general election, their Democratic opponent will slam them on these points. And unless they’re willing to massively flip — pretty unimaginable, given the heat generated by the issue for conservatives — people represented in the numbers and charts above will rightfully fear that their health security is at stake.\n\nI cannot overemphasize the importance of these facts, and, yes, I know better than most that our political discourse too often exists in a fact-free zone. But the success of the ACA strikes at the heart of the dysfunction strategy employed extremely effectively by anti-government conservatives. This is the self-fulfilling-prophecy strategy that campaigns on: “Washington is broken! Send me there and I’ll make sure it stays that way!”\n\nRemember, when government messes up, when it shuts down, when it fails to address the things that matter to most people — economic opportunity, wage stagnation, affordable college — and, instead, squabbles about Planned Parenthood, the winners are those who can say: “See? We told you. Government’s broken!” Never mind that those making that case are the ones doing the breaking.\n\nBut with Obamacare, they’ve been failing, and the more we elevate that case, the closer we get to the road back to Factville.",Why Obamacare failed,disagree,related,related,unrelated
25412,2586,"Remember how much Republicans wanted to repeal Obamacare?\n\nThe Republican majority in the House of Representatives has voted more than 50 times to repeal the law. Conservatives have twice brought challenges to the Supreme Court — a court with powerful voices that often lean in their direction — only to be largely rebuffed both times. The last government shutdown was driven by Republicans who insisted on defunding Obamacare (not to be confused with what may be the next government shutdown, driven by Republicans insisting on defunding Planned Parenthood).\n\nSome suggest that the calls for repealing Obamacare are fading. Sarah Kliff argued that the “near-complete absence of Obama’s health overhaul” in last week’s Republican presidential debate was “remarkable.”\n\nMaybe, but don’t count on it. In GOP presidential candidate Jeb Bush’s white paper on how he would get to 4 percent growth through supply-side tax cuts, his team of economists stresses that repealing the Affordable Care Act will be an “important means of enhancing economic growth.” Front-runner Donald Trump said just last week that he was going to replace Obamacare with “DonaldCare,” which would be both “absolutely great” and “really spectacular.” Repealing health-care reform remains a prominent talking point for Wisconsin Gov. Scott Walker and Sen. Ted Cruz, both GOP presidential candidates.\n\nWell, since we don’t know what’s in it, I can’t comment on DonaldCare.\n\nBut I can tell you this about Obamacare: When it comes to meeting one of its most important goals — providing coverage to the uninsured — it is working extremely well. 
It’s posting historical gains on this front and, in so doing, both insulating itself from repeal and creating a daunting political challenge for its opponents.\n\nThe facts of the case are thoroughly drawn out in this new analysis by my colleagues Matt Broaddus and Edwin Park (B&P) from the Center on Budget and Policy Priorities:\n\n— As the figure above shows, newly released census data show that the share of those without health coverage fell from 13.3 percent to 10.4 last year.\n\n— That’s the largest single-year drop on record based on data going back to 1987.\n\nhi2\nSource: B&P (see text)\n— In this type of work, the strength of your findings are much bolstered when you see them across multiple sources. As B&P point out, the census findings are “consistent with the historic coverage gains measured in the Centers for Disease Control (CDC) and Prevention’s National Health Interview Survey and several private surveys…at 9.2 percent, the CDC’s estimated uninsured rate for the first quarter of 2015 was the lowest since the CDC began collecting these data in 1997 and more than 40 percent below the peak in 2010.”\n\n— The ACA takes a two-pronged attack on covering the uninsured, subsidizing private coverage through the exchanges and expanding Medicaid in the 25 states (as well as D.C.) that accepted that part of the deal. Both private and public coverage are making significant gains.\n\n— Speaking of anti-Obamacare ideology and its effect on people, B&P provide this revealing calculation: “If the uninsured rate had fallen in non-expansion states at the same rate as in expansion states, an additional 2.6 million uninsured Americans would have gained coverage last year.”\n\n— Coverage last year grew most quickly among households with income below $50,000; their uninsured rate fell from about 20 percent to about 15 percent — in one year! 
Households above $50,000 had lower uninsured rates to start with, so we would expect smaller changes, but they, too, went from 9 percent to 7.4 percent.\n\nSuch truths are more than inconvenient for sworn enemies of the law. They can bob and weave in their primaries, but once they reach the general election, their Democratic opponent will slam them on these points. And unless they’re willing to massively flip — pretty unimaginable, given the heat generated by the issue for conservatives — people represented in the numbers and charts above will rightfully fear that their health security is at stake.\n\nI cannot overemphasize the importance of these facts, and, yes, I know better than most that our political discourse too often exists in a fact-free zone. But the success of the ACA strikes at the heart of the dysfunction strategy employed extremely effectively by anti-government conservatives. This is the self-fulfilling-prophecy strategy that campaigns on: “Washington is broken! Send me there and I’ll make sure it stays that way!”\n\nRemember, when government messes up, when it shuts down, when it fails to address the things that matter to most people — economic opportunity, wage stagnation, affordable college — and, instead, squabbles about Planned Parenthood, the winners are those who can say: “See? We told you. Government’s broken!” Never mind that those making that case are the ones doing the breaking.\n\nBut with Obamacare, they’ve been failing, and the more we elevate that case, the closer we get to the road back to Factville.",The success of the Affordable Care Act is a hugely inconvenient truth for its opponents,agree,related,related,unrelated


### 4. Compare classifier performance

We are now going to introduce some tools that we can use for evaluating classifier performance. These are:
- Accuracy 
- Confusion Matrices
- Precision and Recall
- F1 score

#### 4a. Accuracy

Accuracy is defined as the ratio of the number of examples that were correctly predicted to the total number of examples in the test dataset:
$\text{accuracy} = \dfrac{\text{number correct}}{\text{number of examples}}$

Let's calculate the accuracy of the classifier we saw yesterday, and compare it to the accuracy of the classifier that predicts 'unrelated' for all examples:

In [13]:
def calculate_accuracy(test_data, truth_col_name, prediction_col_name):
    number_correct = sum(test_data[truth_col_name] == test_data[prediction_col_name])
    number_examples = test_data.shape[0]
    accuracy = number_correct/number_examples
    return accuracy

In [14]:
# Calculate accuracy for the classifier we created yesterday:
calculate_accuracy(test_data, 'new_stance', 'prediction_1')

0.8271750678786448

In [15]:
# Calculate accuracy for the classifier that classifies all examples as 'unrelated':
calculate_accuracy(test_data, 'new_stance', 'prediction_2')

0.7220320308503522
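
The accuracy of a classifier that always predicts the same label is simply the fraction of examples that carry that label. The sketch below shows this on a small made-up list of labels (the 7-to-3 split is an illustrative assumption, not the real test set).

```python
from collections import Counter

# Toy labels (illustrative only): 7 'unrelated' and 3 'related' examples
true_labels = ['unrelated'] * 7 + ['related'] * 3

# A baseline that always predicts the most common label
majority_label, majority_count = Counter(true_labels).most_common(1)[0]
baseline_predictions = [majority_label] * len(true_labels)

# Accuracy of the baseline, computed the usual way
baseline_accuracy = sum(
    t == p for t, p in zip(true_labels, baseline_predictions)
) / len(true_labels)

# Share of the majority class among all examples
class_share = majority_count / len(true_labels)

print(majority_label, baseline_accuracy, class_share)
```

The two numbers are identical by construction, which is why a high accuracy on an unbalanced dataset says little on its own.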

#### 4b. Confusion Matrices

We have just seen that the classifier that predicts 'unrelated' for everything still obtains an accuracy of about 72%, simply because most test examples are 'unrelated'. But there is something unsatisfying about this second classifier: if we were using it to flag articles that might be Fake News (because the headline and article are unrelated), every article would be flagged as Fake News. This doesn't seem helpful!

Perhaps we should be using other tools, in addition to accuracy, to evaluate our classifier...

We are now going to calculate something called a 'confusion matrix' for each of our classifiers:

In [16]:
def calculate_confusion_matrix(test_data, truth_col_name, prediction_col_name):
    cross_tab = pd.crosstab(test_data[truth_col_name], test_data[prediction_col_name])
    column_names = cross_tab.columns.values
    row_names = cross_tab.index
    # Check each row name has an equivalent column; if not, add column and fill with 0s
    for row in row_names:
        if row not in column_names:
            cross_tab[row] = 0
    # reorder columns so that order matches that of rows and return this
    return cross_tab[row_names] 

In [17]:
cf_mat_1 = calculate_confusion_matrix(test_data, 'new_stance', 'prediction_1')
cf_mat_1

| new_stance \ prediction_1 | related | unrelated |
|---|---|---|
| related | 4491 | 2573 |
| unrelated | 1819 | 16530 |


What does the confusion matrix tell us? 

If we look at the first entry of the confusion matrix, with value 4491, this tells us that 4491 examples in the test dataset had label 'related' (row value) and were correctly classified by our classifier as 'related' (column value).

If we look at the entry in the second row and first column (value 1819), this tells us that 1819 examples in the test dataset had label 'unrelated', but were classified as 'related' by our classifier. Similarly, 2573 examples had true label 'related' but were classified as 'unrelated' by our classifier; and 16,530 examples had true label 'unrelated' and were correctly predicted.
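
The accuracy from section 4a can be recovered from these four counts: the correct predictions sit on the diagonal of the matrix, and the sum of all entries is the size of the test set. A minimal sketch, rebuilding the matrix by hand from the numbers above (the variable names are illustrative):

```python
import pandas as pd

# Counts copied from the confusion matrix above
# (rows = true label, columns = predicted label)
cf = pd.DataFrame(
    [[4491, 2573],
     [1819, 16530]],
    index=['related', 'unrelated'],
    columns=['related', 'unrelated'],
)

# Diagonal entries: examples whose predicted label equals the true label
correct = sum(cf.loc[label, label] for label in cf.index)
# Sum of all entries: total number of test examples
total = cf.values.sum()
accuracy_from_matrix = correct / total

print(correct, total, accuracy_from_matrix)
```

This gives 21021 correct predictions out of 25413 examples, which matches the accuracy of about 0.827 computed earlier.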

In [18]:
# Let's also look at the confusion matrix for the second classifier:
cf_mat_2 = calculate_confusion_matrix(test_data, 'new_stance', 'prediction_2')
cf_mat_2

| new_stance \ prediction_2 | related | unrelated |
|---|---|---|
| related | 0 | 7064 |
| unrelated | 0 | 18349 |


Sometimes, when trying to interpret the confusion matrix, it helps to divide the entries in the confusion matrix by either the number of test examples altogether, or by the number of examples in each class (i.e. divide each entry in the matrix by the sum of the elements in its row). Let's do the latter:

In [19]:
def calculate_class_accuracies(confusion_matrix):
    # Calculate row sum for confusion matrix (total number of examples in a particular class):
    row_sums = confusion_matrix.sum(1)
    # Divide each element by its row sum to calculate class accuracies:
    class_acc_mat = confusion_matrix.divide(row_sums, axis = 0)
    # Add column which sums row proportions
    class_acc_mat['row_sum'] = class_acc_mat.sum(1)
    return class_acc_mat

In [20]:
class_acc_mat_1 = calculate_class_accuracies(cf_mat_1)
class_acc_mat_1

| new_stance \ prediction_1 | related | unrelated | row_sum |
|---|---|---|---|
| related | 0.635759 | 0.364241 | 1.0 |
| unrelated | 0.099133 | 0.900867 | 1.0 |


What does the scaled confusion matrix above tell us? If we look at the first entry (0.635759), this tells us that about 64% of the examples in our test dataset that were labelled 'related' were correctly classified by our classifier, while about 36% were incorrectly classified (entry in first row, second column). In comparison, about 90% of the 'unrelated' examples in the test dataset were correctly classified, while about 10% were incorrectly classified. This means that our classifier is better at identifying 'unrelated' examples than 'related' examples, and that we should think about other features we can use to distinguish 'related' examples from 'unrelated' examples.
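
pandas can also do this row scaling directly: `pd.crosstab` accepts a `normalize` argument, and `normalize='index'` divides each count by its row total. The sketch below uses a small made-up set of labels and predictions (the `toy` dataframe is an illustrative assumption, not the real test data).

```python
import pandas as pd

# Toy true labels and predictions (illustrative only)
toy = pd.DataFrame({
    'truth':      ['related', 'related', 'related', 'related', 'unrelated', 'unrelated'],
    'prediction': ['related', 'related', 'related', 'unrelated', 'unrelated', 'related'],
})

# normalize='index' divides every count by the total of its row (its true class)
row_normalized = pd.crosstab(toy['truth'], toy['prediction'], normalize='index')
print(row_normalized)
```

Here three of the four 'related' examples are predicted correctly, so the 'related' row reads 0.75 and 0.25. As with the unscaled matrix, a class that is never predicted would be missing from the columns.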

What happens when we calculate the class accuracies for the second classifier?

In [21]:
class_acc_mat_2 = calculate_class_accuracies(cf_mat_2)
class_acc_mat_2

| new_stance \ prediction_2 | related | unrelated | row_sum |
|---|---|---|---|
| related | 0.0 | 1.0 | 1.0 |
| unrelated | 0.0 | 1.0 | 1.0 |


When we calculate the scaled confusion matrix here, we see that, as expected, 100% of 'unrelated' examples are correctly classified, while 100% of 'related' examples are incorrectly classified.

#### 4c. Precision and Recall

From the confusion matrix, we can calculate some additional quantities. In particular, we can calculate **precision** and **recall** scores for our classifiers. In our case, precision is defined as:

$\text{precision} = \dfrac{\text{Number of unrelated examples correctly classified}}{\text{Number of examples classified as 'unrelated'}}$

In [22]:
def calculate_precision(confusion_matrix):
    # Examples with true label 'unrelated' that were also predicted as 'unrelated':
    unrelated_correct = confusion_matrix.loc['unrelated', 'unrelated']
    # Column sum: all examples predicted as 'unrelated':
    classified_as_unrelated = confusion_matrix.sum(axis=0)['unrelated']
    precision = unrelated_correct/classified_as_unrelated
    return precision

In [23]:
# Let's now calculate the precision value for the first classifier with its confusion matrix:
calculate_precision(cf_mat_1)

0.8653091137517668

In [24]:
# Let's do the same for our second classifier:
calculate_precision(cf_mat_2)

0.7220320308503522

Recall is defined as:
$\text{recall} = \dfrac{\text{Number of unrelated examples correctly classified}}{\text{Number of unrelated examples}}$

In [25]:
def calculate_recall(confusion_matrix):
    # Examples with true label 'unrelated' that were also predicted as 'unrelated':
    unrelated_correct = confusion_matrix.loc['unrelated', 'unrelated']
    # Row sum: all examples whose true label is 'unrelated':
    number_unrelated = confusion_matrix.sum(axis=1)['unrelated']
    recall = unrelated_correct/number_unrelated
    return recall

In [26]:
# Let's now calculate the recall value for the first classifier with its confusion matrix:
calculate_recall(cf_mat_1)

0.9008665322360891

In [27]:
# Let's now calculate the recall value for the second classifier with its confusion matrix:
calculate_recall(cf_mat_2)

1.0
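
The list at the start of section 4 also mentions the F1 score, which combines precision and recall into a single number by taking their harmonic mean: $\text{F1} = \dfrac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$. Here is a minimal sketch that computes it for both classifiers from the confusion matrix counts above, with 'unrelated' as the class of interest (the function name `f1_score` and the variable names are illustrative choices, not part of the notebook's utilities).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (returns 0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Classifier 1: 16530 'unrelated' examples correctly predicted,
# 2573 'related' examples wrongly predicted as 'unrelated',
# 1819 'unrelated' examples wrongly predicted as 'related'
p1 = 16530 / (16530 + 2573)
r1 = 16530 / (16530 + 1819)

# Classifier 2: predicts 'unrelated' for all 18349 + 7064 examples
p2 = 18349 / (18349 + 7064)
r2 = 18349 / (18349 + 0)

f1_1 = f1_score(p1, r1)
f1_2 = f1_score(p2, r2)
print(round(f1_1, 3), round(f1_2, 3))
```

On these counts the first classifier gets the higher F1 score (roughly 0.88 against roughly 0.84), even though its recall is lower, because F1 penalises the second classifier's weaker precision.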

In this notebook, we have introduced **accuracy**, **confusion matrices**, **precision** and **recall** as tools for evaluating classifier performance. We have compared two possible classifiers for the Fake News Challenge, and have seen that a classifier may score higher on one metric but lower on another: our classifier from yesterday has higher accuracy and precision than the classifier that labels everything as 'unrelated', but lower recall. The right metric for evaluating a classifier often depends on the nature of the problem being studied, and in practice several metrics are usually reported together.

### 5. Extra Challenge

1. Take a look at some of the examples that were misclassified by our proportion overlap classifier. Why do you think our proportion overlap model failed? Can you think of a rule that would properly classify these examples?  Why not test it out?
2. Yesterday we created a four-way classifier that was able to predict 'agree', 'disagree', 'discuss' and 'unrelated'. Calculate the confusion matrix for this classifier. In this case, how many rows and columns would the matrix have? Compare the confusion matrix for this classifier to the classifier that predicts 'unrelated' for everything. What does the confusion matrix tell us about each of these models? What are the strengths and weaknesses of both of these models?