CSCI 3832: Lecture 8, investigating classifiers, precision, recall
===========
1/31/2020, Spring 2020, Muzny

Relevant textbook sections: 4.1, 4.3, 4.7

Today, we'll be spending our time investigating some classifiers that we've trained for you.

All three of these classifiers are Naïve Bayes classifiers. For a given new, unlabeled document, they calculate:

$$ P(\text{feature}_1, \text{feature}_2, \text{feature}_3, \ldots, \text{feature}_n \mid c)\,P(c) $$

where $c$ is a candidate class. They then choose the class with the highest resulting probability as the label of the new document.
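Naïve Bayes gets its name from the simplifying assumption that the features are conditionally independent given the class, so the joint likelihood above factors into one term per feature. The resulting decision rule is shown below. The second form sums log probabilities instead of multiplying probabilities; because log is monotonic it picks the same class, but it avoids floating-point underflow when many small numbers are multiplied:

$$ \hat{c} = \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(\text{feature}_i \mid c) = \arg\max_{c}\; \Big[ \log P(c) + \sum_{i=1}^{n} \log P(\text{feature}_i \mid c) \Big] $$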


Task 1: Which Classifier is Which?
-------------------------
We have given you 3 Naïve Bayes classifiers. All three are binary classifiers that choose between the labels '0' and '1' (these are strings).

- one of these classifiers is an authorship attributor
- one of these classifiers is a language identifier
- one of these classifiers is a sentiment analyser

Your first job is to conduct experiments to determine two things:
1. Which classifier is which?
2. What specific classes do you believe that they are choosing between? (what are better labels for each classifier than '0' and '1'?)
    1. Note: this is a difficult task. It is of utmost importance that you consider the particular data set that they were trained on. I will tell you that they were trained using some of [nltk's available corpora](http://www.nltk.org/nltk_data/).

In [14]:
#TODO: Chakrya Ros
# Feel free to work in groups of 2 - 3/talk to your neighbors

# You'll be turning this notebook in at the end of lecture today 
# as a pdf
# File -> Download As -> .html -> open in a browser -> print to pdf
# (one submission per group)
# Please make a comment on your submission with your name and the name(s)
# of your partners as well!

In [4]:
# load your trained classifiers from pickled files
# (we've already trained your classifiers for you)
import pickle
import matplotlib.pyplot as plt # for graphing
#import nltk  # not necessary, but you can uncomment if you want

# add more imports here as you would like

In [5]:
# This function converts a list of words so that they are featurized
# for nltk's format for bag-of-words
# params:
# words - list of words where each element is a single word 
# return: dict mapping every word to True
def word_feats(words):
    return {word: True for word in words}

# load each pickled classifier; "with" closes the file automatically
with open('classifier1.pickle', 'rb') as f:
    classifier1 = pickle.load(f)

with open('classifier2.pickle', 'rb') as f:
    classifier2 = pickle.load(f)

with open('classifier3.pickle', 'rb') as f:
    classifier3 = pickle.load(f)

# in a list, if you find that helpful
classifiers = [classifier1, classifier2, classifier3]
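To see exactly what the classifiers receive, here is `word_feats` applied to a whitespace-split sentence. `str.split()` leaves punctuation attached (and keeps case), so `'fair:'`, `'fair'` and `'Fair'` are all different features. The regex tokenizer below is only a rough, hypothetical stand-in used for comparison; it is not nltk's `word_tokenize` and will not match its output in general.

```python
import re

def word_feats(words):
    # same bag-of-words featurizer as above: every word maps to True
    return {word: True for word in words}

sentence = "Fair is foul, and foul is fair:"

# whitespace split keeps punctuation glued to the words
split_feats = word_feats(sentence.split())
print(split_feats)

# rough stand-in tokenizer (hypothetical, NOT nltk.word_tokenize):
# runs of word characters, or single punctuation marks
regex_tokens = re.findall(r"\w+|[^\w\s]", sentence)
regex_feats = word_feats(regex_tokens)
print(regex_feats)
```

With whitespace splitting, `'foul,'` and `'foul'` end up as two unrelated features, which can matter for small training sets.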

In [7]:
classifier3.show_most_informative_features(n=20)

TypeError: 'NoneType' object is not callable

In [17]:
# Here's an example of how to run a test sentence through the classifiers
# edit at your leisure
test = "All that glitters is not gold Fair is foul, and foul is fair: Hover through the fog and filthy air."
# you can either split on whitespace or use nltk's word_tokenize
featurized = word_feats(test.split()) 
for classifier in classifiers:
    dist = classifier.prob_classify(featurized)  # probability distribution over the labels
    print(dist.samples())  # will tell you what samples are available
    print(dist.prob('0'))  # get the probability for class '0'
    print(dist.prob('1'))  # get the probability for class '1'
    print(classifier.classify(featurized))  # just get the label that it wants to assign

dict_keys(['0', '1'])
0.4366186617451996
0.5633813382548005
1
dict_keys(['0', '1'])
1.8810499402744543e-18
0.9999999999999951
1
dict_keys(['0', '1'])
0.46625312782388734
0.5337468721761123
1


In [18]:
# TODO: put in as many experiments as you'd like here (and feel free to add more cells as needed)
# we recommend testing a variety of sentences. You can make these up or get them from sources
# on the web
# test = "សួស្តីខ្ញុំជានិស្សិត"  # Khmer for "Hello, I am a student"
test = " RT @JohnGGalt: Amazing—after years of attacking Donald Trump the media managedto turn #InaugurationDay into all about themselves."
# you can either split on whitespace or use nltk's word_tokenize
featurized = word_feats(test.split()) 
for classifier in classifiers:
    dist = classifier.prob_classify(featurized)  # probability distribution over the labels
    print(dist.samples())  # will tell you what samples are available
    print(dist.prob('0'))  # get the probability for class '0'
    print(dist.prob('1'))  # get the probability for class '1'
    print(classifier.classify(featurized))  # just get the label that it wants to assign

dict_keys(['0', '1'])
0.2115767801050558
0.7884232198949435
1
dict_keys(['0', '1'])
1.0847049928484364e-18
0.9999999999999926
1
dict_keys(['0', '1'])
0.9905583523654526
0.009441647634547445
0
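Instead of copying the experiment cell for every new sentence, a small helper can run a list of sentences through every classifier and print one line per sentence. This is a sketch: `run_experiments` and `print_experiments` are hypothetical names, and the only thing assumed about a classifier is the `prob_classify(...).prob(label)` interface used above. The stub classes exist only so the cell runs on its own; in the notebook, pass the real `classifiers` list.

```python
def word_feats(words):
    # same bag-of-words featurizer as in the notebook
    return {word: True for word in words}

def run_experiments(classifiers, sentences, label='1'):
    """Return (sentence, [P(label) under each classifier]) for every sentence."""
    rows = []
    for sentence in sentences:
        featurized = word_feats(sentence.split())
        probs = [c.prob_classify(featurized).prob(label) for c in classifiers]
        rows.append((sentence, probs))
    return rows

def print_experiments(rows):
    # one line per sentence: P(label) for classifier1, classifier2, ... then the text
    for sentence, probs in rows:
        print("  ".join(f"{p:.3f}" for p in probs), "|", sentence[:60])

# --- placeholders so this cell runs on its own (hypothetical, not nltk classes) ---
class StubDist:
    def __init__(self, p1):
        self.p1 = p1

    def prob(self, label):
        return self.p1 if label == '1' else 1 - self.p1

class StubClassifier:
    def __init__(self, p1):
        self.p1 = p1

    def prob_classify(self, featurized):
        return StubDist(self.p1)

sentences = ["a first test sentence", "a second test sentence"]
rows = run_experiments([StubClassifier(0.25), StubClassifier(0.9)], sentences)
print_experiments(rows)
# in the notebook: print_experiments(run_experiments(classifiers, sentences))
```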


TODO: Answer the questions outlined at the beginning of this task here (please keep __bold__ formatting in this notebook):

1. Which classifier is which?
    1. classifier1 is __Sentiment Analysis__
    1. classifier2 is __language identifier__
    1. classifier3 is __authorship attributor__
2. What specific classes do you believe that they are choosing between?
    1. classifier1's '0' label should be __Negative__ and its '1' label should be __Positive__
    1. classifier2's '0' label should be __not English__ and its '1' label should be __English__
    1. classifier3's '0' label should be __not written by Shakespeare__ and its '1' label should be __written by Shakespeare__

Task 2: Investigating Accuracy, Precision, and Recall
---------------------------------------------
Textbook: 4.7

When we are determining how well a classifier is doing, we can look at overall accuracy:

$$ accuracy = \frac{true_{pos} + true_{neg}}{true_{pos} + false_{pos} + true_{neg} + false_{neg}} $$



In [None]:
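# TODO: implement this accuracy function, 
# then test the accuracy of two of the three classifiers from task 1.

# Params: gold_labels, a list of labels assigned by hand ("truth")
# predicted_labels, a corresponding list of labels predicted by the system
# return: double accuracy (a number from 0 to 1)
def accuracy(gold_labels, predicted_labels):
    # a prediction is correct when it matches the gold label; this counts
    # true positives and true negatives together (the numerator above)
    correct = 0
    for gold, predicted in zip(gold_labels, predicted_labels):
        if gold == predicted:
            correct += 1
    # the denominator (tp + fp + tn + fn) is just the number of examples
    return correct / len(gold_labels)


# test the accuracy of two of your classifiers.
# Note: this requires knowing what labels your test data should have!
# (toy labels below; the classifiers use the strings '0' and '1')
gold_labels = ['1', '0', '0', '1']
predicted = ['1', '0', '1', '1']
print(accuracy(gold_labels, predicted))  # 3 of 4 correct -> 0.75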
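To test one of the real classifiers, the gold labels must come from sentences whose correct answer you already know. A minimal sketch, assuming a hypothetical list `labeled_sentences` of `(text, gold_label)` pairs that you write by hand: featurize each text exactly as in Task 1, collect the classifier's predictions with `classify`, and score them. `evaluate_accuracy` and `KeywordStub` are illustrative names; the stub only stands in for a pickled classifier so the cell runs on its own.

```python
def word_feats(words):
    # same bag-of-words featurizer as in the notebook
    return {word: True for word in words}

def accuracy(gold_labels, predicted_labels):
    # fraction of positions where the prediction equals the gold label
    correct = sum(1 for gold, pred in zip(gold_labels, predicted_labels) if gold == pred)
    return correct / len(gold_labels)

def evaluate_accuracy(classifier, labeled_sentences):
    """labeled_sentences: list of (text, gold_label) pairs written by hand."""
    gold_labels = [label for _, label in labeled_sentences]
    # featurize each text as in Task 1, then ask the classifier for a label
    predicted_labels = [classifier.classify(word_feats(text.split()))
                        for text, _ in labeled_sentences]
    return accuracy(gold_labels, predicted_labels)

class KeywordStub:
    # stand-in for a pickled classifier: says '1' whenever "good" appears
    def classify(self, featurized):
        return '1' if 'good' in featurized else '0'

labeled_sentences = [  # hypothetical hand-labeled examples
    ("a good film", '1'),
    ("a bad film", '0'),
    ("not good at all", '0'),
]
print(evaluate_accuracy(KeywordStub(), labeled_sentences))
# in the notebook, pass a real classifier, e.g. evaluate_accuracy(classifier1, labeled_sentences)
```

The keyword stub gets the third example wrong, which is the kind of error (negation) worth probing with the real sentiment classifier.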

Next (if you get this far):

Often, however, it is more useful to look at __precision__ and __recall__ to determine how well a classifier is doing. This is especially important if we're dealing with imbalanced classes (one class occurs more frequently than another).

$$ precision = \frac{true_{pos}}{true_{pos} + false_{pos}} $$



$$ recall = \frac{true_{pos}}{true_{pos} + false_{neg}} $$

To make this calculation, we'll need to choose which label is associated with "positive" and which is associated with "negative". For our purposes, we'll choose the label '1' to be our "positive" label.
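With '1' chosen as the positive label, all four counts in the formulas above can be tallied from paired gold and predicted label lists. A minimal sketch (the name `confusion_counts` is hypothetical), using the same toy labels as the accuracy cell, written as strings:

```python
def confusion_counts(gold_labels, predicted_labels, target_label='1'):
    """Return (true_pos, false_pos, true_neg, false_neg) with target_label as positive."""
    tp = fp = tn = fn = 0
    for gold, pred in zip(gold_labels, predicted_labels):
        if pred == target_label and gold == target_label:
            tp += 1
        elif pred == target_label:
            fp += 1  # predicted positive, actually negative
        elif gold == target_label:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, tn, fn

gold = ['1', '0', '0', '1']
pred = ['1', '0', '1', '1']
tp, fp, tn, fn = confusion_counts(gold, pred)
print(tp, fp, tn, fn)
print("precision =", tp / (tp + fp))
print("recall =", tp / (tp + fn))
```

Here the one false positive lowers precision to 2/3, while recall stays at 1.0 because both gold positives were found.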

Answer the following questions:

1. Suppose you wanted a very precise system, but didn't care about recall. How would you achieve this?
    1. __YOUR ANSWER HERE__

2. Suppose you wanted a system with the best recall, but didn't care about precision. How would you achieve this?
    1. __YOUR ANSWER HERE__
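One way to explore these two questions empirically: instead of always taking the most probable label, call a document positive only when its probability of '1' (for example from `prob_classify(...).prob('1')`) reaches a threshold, and watch precision and recall move as the threshold changes. The scores and gold labels below are made-up toy values, not output from the notebook's classifiers, and `precision_recall_at` is a hypothetical helper.

```python
def precision_recall_at(scores, gold_labels, threshold, target_label='1'):
    # a document counts as predicted-positive only if its score reaches the threshold
    predicted_pos = [score >= threshold for score in scores]
    actual_pos = [gold == target_label for gold in gold_labels]
    pairs = list(zip(predicted_pos, actual_pos))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# made-up P('1') scores and gold labels, for illustration only
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
gold = ['1', '1', '0', '1', '0', '0']

for threshold in [0.0, 0.5, 0.9]:
    p, r = precision_recall_at(scores, gold, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

With these toy numbers, a threshold of 0.0 labels every document positive (recall 1.00, precision 0.50), while 0.9 labels only the single most confident document positive (precision 1.00, recall 0.33).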


In [None]:
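# TODO: implement the precision and recall functions, 
# then test the precision/recall of two of the three classifiers from task 1.

# Params: gold_labels, a list of labels assigned by hand ("truth")
# predicted_labels, a corresponding list of labels predicted by the system
# target_label (default value '1') - the label associated with "positives"
# return: double precision (a number from 0 to 1)
def precision(gold_labels, predicted_labels, target_label = '1'):
    pairs = list(zip(gold_labels, predicted_labels))
    # true positives: system said target_label and the gold label agrees
    true_pos = sum(1 for gold, pred in pairs if pred == target_label and gold == target_label)
    # false positives: system said target_label but the gold label is different
    false_pos = sum(1 for gold, pred in pairs if pred == target_label and gold != target_label)
    if true_pos + false_pos == 0:
        return 0.0  # nothing predicted positive: undefined, reported as 0 (a common convention)
    return true_pos / (true_pos + false_pos)

# Params: gold_labels, a list of labels assigned by hand ("truth")
# predicted_labels, a corresponding list of labels predicted by the system
# target_label (default value '1') - the label associated with "positives"
# return: double recall (a number from 0 to 1)
def recall(gold_labels, predicted_labels, target_label = '1'):
    pairs = list(zip(gold_labels, predicted_labels))
    true_pos = sum(1 for gold, pred in pairs if pred == target_label and gold == target_label)
    # false negatives: gold label is target_label but the system said something else
    false_neg = sum(1 for gold, pred in pairs if gold == target_label and pred != target_label)
    if true_pos + false_neg == 0:
        return 0.0  # no gold positives: undefined, reported as 0 (a common convention)
    return true_pos / (true_pos + false_neg)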
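Because `precision` and `recall` take a `target_label`, each class can take a turn as the "positive" one. Averaging the per-class scores (macro-averaging) weights both classes equally, which is one way to summarize performance when the classes are imbalanced. The sketch below re-defines compact versions of the two functions so it runs on its own; `macro_average` is a hypothetical helper name.

```python
def precision(gold_labels, predicted_labels, target_label='1'):
    pairs = list(zip(gold_labels, predicted_labels))
    tp = sum(1 for g, p in pairs if p == target_label and g == target_label)
    predicted_pos = sum(1 for _, p in pairs if p == target_label)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(gold_labels, predicted_labels, target_label='1'):
    pairs = list(zip(gold_labels, predicted_labels))
    tp = sum(1 for g, p in pairs if p == target_label and g == target_label)
    actual_pos = sum(1 for g, _ in pairs if g == target_label)
    return tp / actual_pos if actual_pos else 0.0

def macro_average(metric, gold_labels, predicted_labels, labels=('0', '1')):
    # compute the metric with each label treated as positive, then average
    scores = [metric(gold_labels, predicted_labels, target_label=lab) for lab in labels]
    return sum(scores) / len(scores)

gold = ['1', '0', '0', '1']
pred = ['1', '0', '1', '1']
print(precision(gold, pred), recall(gold, pred))            # '1' as positive
print(precision(gold, pred, '0'), recall(gold, pred, '0'))  # '0' as positive
print(macro_average(precision, gold, pred), macro_average(recall, gold, pred))
```

On these toy labels, treating '0' as positive flips the picture: precision is perfect but recall drops to 0.5, which a single '1'-only score would hide.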

In [8]:
pre = (10/(10+10))
print(pre)
reca= (10/(10+5))
print(reca)

0.5
0.6666666666666666


In [1]:
#accuracy = (true_pos + true_neg) / (true_pos + false_pos + true_neg + false_neg)
# here: true_pos = 30, false_pos = 10, true_neg = 40, false_neg = 20

acc = (30+40)/(30+10+40+20)
print(acc)

0.7


In [4]:
#pre = true_pos / (true_pos + false_pos)
#recall = true_pos / (true_pos + false_neg)
# the recall value is stored as rec so it does not overwrite the recall() function
pre = 30/(30+10)
rec = 30/(30+20)
print("pre = ", pre)
print("recall = ", rec)

pre =  0.75
recall =  0.6


In [5]:
#f1 = 2 * (prec*recall)/(prec + recall)
f1 = 2 * ((pre*rec)/(pre+rec))
print("f1 = ", f1)

f1 =  0.6666666666666665
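The F1 cell above is the $\beta = 1$ case of the general F-measure, $F_\beta = \frac{(\beta^2 + 1)PR}{\beta^2 P + R}$, where $\beta > 1$ weights recall more heavily and $\beta < 1$ weights precision more heavily. A small sketch (the name `f_measure` is hypothetical) using the same $P = 0.75$ and $R = 0.6$:

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid 0/0 when both metrics are zero
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

f1 = f_measure(0.75, 0.6)             # same value as the F1 cell
f2 = f_measure(0.75, 0.6, beta=2.0)   # leans toward the lower recall
f05 = f_measure(0.75, 0.6, beta=0.5)  # leans toward the higher precision
print(f1, f2, f05)
```

Here $F_2 = 0.625$ sits closer to recall (0.6), and $F_{0.5} \approx 0.714$ sits closer to precision (0.75).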


In [6]:
# weighted sum (dot product) of weights w and feature values x
w = [1,2,0.5]
x = [10,20,30]

print(sum(i*j for i,j in zip(w,x)))

65.0


In [7]:
# the same sum by hand: 1*10 + 2*20 + 0.5*30
print(10+(20*2)+(0.5*30))

65.0


In [11]:
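# product of per-word likelihoods for each class; no prior term is included,
# which is equivalent to assuming equal class priors
pos = (.09*.07*.29*.04*.08)
neg = (.16*.06*.06*.15*.11)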
if pos > neg:
    print("pos")
else:
    print("neg")

neg
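Products of many small probabilities shrink toward floating-point underflow as documents get longer, so Naïve Bayes scores are typically computed in log space: add log probabilities instead of multiplying probabilities. Since log is monotonic, the comparison picks the same class. A sketch using the same numbers as the cell above:

```python
import math

pos_likelihoods = [.09, .07, .29, .04, .08]
neg_likelihoods = [.16, .06, .06, .15, .11]

# sum of logs instead of product of probabilities
log_pos = sum(math.log(p) for p in pos_likelihoods)
log_neg = sum(math.log(p) for p in neg_likelihoods)

print(log_pos, log_neg)
print("pos" if log_pos > log_neg else "neg")  # same decision as the product version
```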


In [5]:
# per-word likelihoods times a trailing 0.5, which appears to be the class prior P(c)
spam = (0.27*0.01*0.16*0.20*0.11*0.5)
not_spam = (.10*.01*.27*.29*.21*.5)
print(spam)
print(not_spam)
if spam > not_spam:
    print("spam ", spam)
else:
    print("Not spam ", not_spam)

4.752e-06
8.221499999999999e-06
Not spam  8.221499999999999e-06
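The two products above are joint scores $P(\text{words}, c)$, so they do not sum to 1. Dividing each by their sum turns them into posterior probabilities $P(c \mid \text{words})$; this is the kind of normalized distribution that `prob_classify` reports in Task 1, where the '0' and '1' probabilities add up to 1. A sketch with the spam numbers above:

```python
spam = 0.27 * 0.01 * 0.16 * 0.20 * 0.11 * 0.5
not_spam = .10 * .01 * .27 * .29 * .21 * .5

# normalize the joint scores so they form a probability distribution
total = spam + not_spam
p_spam = spam / total
p_not_spam = not_spam / total
print(f"P(spam | doc) = {p_spam:.3f}")
print(f"P(not spam | doc) = {p_not_spam:.3f}")
```

This gives roughly 0.366 for spam and 0.634 for not spam: the same decision as comparing the raw products, but now on a 0 to 1 scale.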


In [2]:
p_good_pos = ((3.0/2.2)*(4.2/5.0))/(2.0/5.0)
print(p_good_pos)  # note: 3.0/2.2 > 1, so this comes out near 2.86, which cannot be a probability

In [3]:
100.0/9

11.11111111111111