# Mini-Project - Evaluation Metrics

##### Student Tags

Author: Anderson Hitoshi Uyekita    
Mini-Project: Evaluation Metrics
Course: Data Science - Foundations II  
COD: ND111  
Date: 24/01/2019    

***

## Table of Contents
- [Introduction](#intro)
- [Exercise 1](#part_i_1)
- [Exercise 2](#part_i_2)
- [Exercise 3](#part_i_3)
- [Exercise 4](#part_i_4)
- [Exercise 5](#part_i_5)
- [Exercise 6](#part_i_6)
- [Exercise 7](#part_i_7)
- [Exercise 8](#part_i_8)
- [Exercise 9](#part_i_9)
- [Exercise 10](#part_i_10)
- [Exercise 11](#part_i_11)
- [Exercise 12](#part_i_12)
- [Exercise 13](#part_i_13)
- [Exercise 14](#part_i_14)
- [Exercise 15](#part_i_15)
***

In [1]:
# Importing Libraries.
import numpy as np
import pandas as pd

## General Information

This Jupyter Notebook (in Python 2) aims to create a reproducible archive.

## Introduction <a id='intro'></a>

Go back to your code from the last lesson, where you built a simple first iteration of a POI identifier using a decision tree and one feature. Copy the POI identifier that you built into the skeleton code in evaluation/evaluate_poi_identifier.py. Recall that at the end of that project, your identifier had an accuracy (on the test set) of 0.724. Not too bad, right? Let’s dig into your predictions a little more carefully.  

<br>

<em>
From Python 3.3 onward, the order in which dictionary keys are processed is randomized each time the code is run. This causes compatibility problems with the graders and project code, which were run under Python 2.7. To correct for this, add the following argument to the featureFormat call on line 25 of evaluate_poi_identifier.py:

sort_keys = '../tools/python2_lesson14_keys.pkl'

This will open up a file in the tools folder with the Python 2 key order.
</em>

## Exercise 1 - Number of POIs in Test Set <a id='part_i_1'></a>

>**How many POIs are predicted for the test set for your POI identifier?**

(Note that we said test set! We are not looking for the number of POIs in the whole dataset.)

In [2]:
#!/usr/bin/python


"""
    Starter code for the validation mini-project.
    The first step toward building your POI identifier!

    Start by loading/formatting the data

    After that, it's not our code anymore--it's yours!
"""

import pickle
import sys
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit

data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r") )

### first element is our labels, any added elements are predictor
### features. Keep this the same for the mini-project, but you'll
### have a different feature list when you do the final project.
features_list = ["poi", "salary"]

data = featureFormat(data_dict, features_list)
labels, features = targetFeatureSplit(data)

In [3]:
# Importing the model selection module.
from sklearn.model_selection import train_test_split

# Splitting the dataset into train and test.
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.30, random_state=42)

### it's all yours from here forward!  

# Importing the Scikit Learn Decision Tree Module.
from sklearn import tree

# Creating the Classifier.
clf = tree.DecisionTreeClassifier()

# Fitting/Training with the training set only.
clf = clf.fit(features_train, labels_train)

# Predicting on the held-out test set.
pred = clf.predict(features_test)

# Importing the Accuracy module.
from sklearn.metrics import accuracy_score

# Calculating the accuracy on the test set.
acc = accuracy_score(labels_test, pred)

In [4]:
# Printing the number of POIs predicted for the test set.
print "Number of POIs:", sum(pred)

Number of POIs: 4.0


>**How many POIs are predicted for the test set for your POI identifier?**

Number of POIs: 4.0

## Exercise 2 - Accuracy of a Biased Identifier <a id='part_i_2'></a>

>**If your identifier predicted 0. (not POI) for everyone in the test set, what would its accuracy be?**

86.21%

In [5]:
# Replacing pred with a vector of zeros of length 29 (len(labels_test)).
accuracy_score(labels_test, np.zeros((len(labels_test),), dtype=int))

0.8620689655172413
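
The 86.21% figure is simply the share of non-POIs in the test set: an identifier that always answers 0 is right exactly as often as the true label is 0. A minimal sketch in plain Python, assuming a label vector of 25 non-POIs and 4 POIs (a hypothetical vector chosen to be consistent with the 29 test points and the score above; the helper `accuracy` is illustrative and not part of the project code):

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / float(len(true_labels))

# Assumed label vector: 25 non-POIs (0) and 4 POIs (1), 29 people in total.
labels_example = [0] * 25 + [1] * 4

# A "biased" identifier that predicts non-POI for everyone.
all_zero_pred = [0] * len(labels_example)

baseline_accuracy = accuracy(labels_example, all_zero_pred)
print("Baseline accuracy: %.4f" % baseline_accuracy)  # 25 / 29
```

Any identifier that scores below this baseline on accuracy is doing worse than not looking at the features at all.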

## Exercise 3 - Number of True Positives <a id='part_i_3'></a>

Look at the predictions of your model and compare them to the true test labels. Do you get any true positives? (In this case, we define a true positive as a case where both the actual label and the predicted label are 1)

- [ ] Yes, many
- [ ] Yes, only one
- [x] Nope

In [6]:
# Creating a dataframe to work with.
comparison = pd.DataFrame(labels_test, columns = ['correct'], dtype = int )

# Adding the predicted as a column.
comparison['predicted'] = pred.astype(int)

# Creating a comparison column.
comparison['comparison'] = (comparison.correct == comparison.predicted)

# Printing the dataframe of comparison.
comparison.head()

| | correct | predicted | comparison |
|---|---|---|---|
| 0 | 0 | 0 | True |
| 1 | 0 | 0 | True |
| 2 | 0 | 0 | True |
| 3 | 0 | 0 | True |
| 4 | 0 | 1 | False |


In [7]:
# True positive = both correct and predicted are equal to 1.
true_positive = comparison[(comparison.correct == 1) & (comparison.predicted == 1)]

# Number of rows/observations.
print "Number of True Positives: ", true_positive.shape[0]

Number of True Positives:  0


>**Does your identifier have any true positives?**

- [ ] Yes, many
- [ ] Yes, only one
- [x] Nope

## Exercise 4 - Unpacking Into Precision and Recall<a id='part_i_4'></a>

As you may now see, having imbalanced classes like we have in the Enron dataset (many more non-POIs than POIs) introduces some special challenges, namely that you can just guess the more common class label for every point, not a very insightful strategy, and still get pretty good accuracy!

Precision and recall can help illuminate your performance better. Use the precision_score and recall_score available in sklearn.metrics to compute those quantities.

What’s the precision?

In [8]:
# Importing the metrics module from Scikit Learn package to calculate precision_score.
from sklearn.metrics import precision_score

# Using the function to calculate the precision.
precision_score(labels_test, pred)  

0.0

>**What is the precision of your POI identifier?**

0

## Exercise 5 - Recall of Your POI Identifier <a id='part_i_5'></a>

What’s the recall? 

(Note: you may see a message like UserWarning: The precision and recall are equal to zero for some labels. Just like the message says, there can be problems in computing other metrics (like the F1 score) when precision and/or recall are zero, and it wants to warn you when that happens.) 

Obviously this isn’t a very optimized machine learning strategy (we haven’t tried any algorithms besides the decision tree, or tuned any parameters, or done any feature selection), and now seeing the precision and recall should make that much more apparent than the accuracy did.
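
The warning mentioned above comes from a division by zero: when precision and recall are both zero, the F1 score is 0/0. A minimal sketch of how these ratios can be guarded, assuming the counts implied by the earlier outputs (4 people flagged, none of them a real POI, and 4 real POIs in the test set). Returning 0.0 for an undefined ratio is a convention chosen here for illustration, and the helper names are hypothetical:

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    if denominator == 0:
        return 0.0
    return numerator / float(denominator)

def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    # F1 is the harmonic mean; it is undefined when precision + recall == 0.
    f1 = safe_ratio(2 * precision * recall, precision + recall)
    return precision, recall, f1

# Assumed counts: 0 true positives, 4 false positives, 4 false negatives.
zero_case = precision_recall_f1(tp=0, fp=4, fn=4)
print("precision=%.1f recall=%.1f f1=%.1f" % zero_case)
```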

In [9]:
# Importing the metrics module from Scikit Learn package to calculate recall score.
from sklearn.metrics import recall_score

# Using the function to calculate the recall.
recall_score(labels_test, pred)

0.0

>**What is the recall of your POI identifier?**

0

## Exercise 6 - How Many True Positives? <a id='part_i_6'></a>

In the final project you’ll work on optimizing your POI identifier, using many of the tools learned in this course. Hopefully one result will be that your precision and/or recall will go up, but then you’ll have to be able to interpret them. 

Here are some made-up predictions and true labels for a hypothetical test set; fill in the following boxes to practice identifying true positives, false positives, true negatives, and false negatives. Let’s use the convention that “1” signifies a positive result, and “0” a negative. 

predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]   
true labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]  

How many true positives are there?

In [10]:
# Creating a function to count the True Positives.
def is_true_positive(predictions, true_labels):
    """
    Returns the number of occurrences of true positives.
    Inputs are the predictions and the true_labels, as lists of 0s and 1s.
    """
    # Creating a dataframe.
    predictions = pd.DataFrame(predictions, columns = ['predictions'])
    predictions['true_labels'] = true_labels

    # True Positives
    true_positive = predictions[(predictions.predictions == 1) & (predictions.true_labels == 1)]
    
    return true_positive.shape[0]

In [11]:
# Given results.
predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1] 
true_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

# Using the given inputs.
print "Number of True Positives:", is_true_positive(predictions, true_labels)

Number of True Positives: 6


>**How many true positives are there?**

Number of True Positives: 6

## Exercise 7 - How Many True Negatives? <a id='part_i_7'></a>

Suppose our data looks like this:

predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]   
true labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

(this is fabricated data, just to give you some practice)

In [12]:
# Given results.
predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
true_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

predictions = pd.DataFrame(predictions, columns = ['predictions'])
predictions['true_labels'] = true_labels

# True Negative = true label and prediction both equal to zero.
# The columns hold integers, so compare against 0 rather than the string "0".
true_negative = predictions.query('true_labels == 0').query('predictions == 0')

# Using the given inputs.
print "Number of True Negatives:", true_negative.shape[0]

Number of True Negatives: 9


>**How many true negatives are there in this example?**

Number of True Negatives: 9

## Exercise 8 - False Positives? <a id='part_i_8'></a>

>**How many false positives are there?**

In [13]:
# False Positives = truly negative but predicted as positive.
false_positive = predictions.query('true_labels == 0').query('predictions == 1')

# Using the given inputs.
print "Number of False Positives:", false_positive.shape[0]

Number of False Positives: 3


## Exercise 9 - False Negatives? <a id='part_i_9'></a>

>**How many false negatives are there?**

In [14]:
# False Negatives = truly positive but predicted as negative.
false_negative = predictions.query('true_labels == 1').query('predictions == 0')

# Using the given inputs.
print "Number of False Negatives:", false_negative.shape[0]

Number of False Negatives: 2
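
As a cross-check, the four outcomes can be tallied in a single pass without pandas. This is a minimal sketch in plain Python; the names `outcome`, `example_predictions` and `example_true_labels` are illustrative and hold the same made-up data as above:

```python
from collections import Counter

example_predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
example_true_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

def outcome(true_label, predicted_label):
    """Name the confusion-matrix cell for one (true, predicted) pair."""
    if predicted_label == 1:
        return "TP" if true_label == 1 else "FP"
    return "FN" if true_label == 1 else "TN"

counts = Counter(outcome(t, p)
                 for t, p in zip(example_true_labels, example_predictions))

print("TP=%d FP=%d TN=%d FN=%d"
      % (counts["TP"], counts["FP"], counts["TN"], counts["FN"]))
# TP=6 FP=3 TN=9 FN=2
```

The four counts add up to 20, the length of the lists, which is a quick sanity check that no pair was missed.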


## Exercise 10 - Precision <a id='part_i_10'></a>

>**What's the precision of this classifier?**

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive + False Positive}} = \frac{6}{6 + 3} = \frac{2}{3} \approx 0.6667$$

In [15]:
# Given results.
predictions = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
true_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

# Using the function to calculate the precision.
print "Precision: ", precision_score(true_labels, np.array(predictions))  

Precision:  0.6666666666666666


## Exercise 11 - Recall <a id='part_i_11'></a>

>**What's the recall of this classifier?**

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive + False Negative}} = \frac{6}{6+2} = \frac{3}{4} = 0.75$$

In [16]:
# Using the function to calculate the recall.
print "Recall: ", recall_score(true_labels, np.array(predictions))  

Recall:  0.75


## Exercise 12 - Making Sense of Metrics 1 <a id='part_i_12'></a>


Fill in the blank:

“My true positive rate is high, which means that when a _ _ _ _ is present in the test data, I am good at flagging him or her.”

- [x] POI
- [ ] non-POI
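
The true positive rate is another name for recall: the share of actual POIs that the identifier flags. In the notation used for the other formulas in this notebook:

$$\text{True Positive Rate} = \text{Recall} = \frac{\text{True Positive}}{\text{True Positive + False Negative}}$$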

## Exercise 13 - Making Sense of Metrics 2 <a id='part_i_13'></a>

Fill in the blanks.

“My identifier doesn’t have great _ _, but it does have good _ _ _ _. That means that, nearly every time a POI shows up in my test set, I am able to identify him or her. The cost of this is that I sometimes get some false positives, where non-POIs get flagged.”

- [ ] recall/precision
- [x] precision/recall
- [ ] F1 score/recall
- [ ] precision/F1 score

## Exercise 14 - Making Sense of Metrics 3 <a id='part_i_14'></a>

“My identifier doesn’t have great _ _, but it does have good _ _ _ _. That means that whenever a POI gets flagged in my test set, I know with a lot of confidence that it’s very likely to be a real POI and not a false alarm. On the other hand, the price I pay for this is that I sometimes miss real POIs, since I’m effectively reluctant to pull the trigger on edge cases.”

- [x] recall/precision
- [ ] precision/recall
- [ ] F1 score/recall
- [ ] precision/F1 score
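
The trade-off described in these statements can be made concrete with a classifier that outputs a score and flags everyone above a threshold. The sketch below is hypothetical: the decision tree in this notebook returns hard labels, and the `scores`, `labels` and thresholds are invented for illustration only.

```python
# Hypothetical classifier scores (higher = more POI-like) and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

def precision_recall_at(threshold):
    """Flag everyone whose score is >= threshold, then score the flags.

    Assumes at least one person is flagged and at least one label is 1.
    """
    flagged = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f == 1 and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f == 1 and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if f == 0 and y == 1)
    return tp / float(tp + fp), tp / float(tp + fn)

cautious = precision_recall_at(0.85)  # flags only the two top scores
eager = precision_recall_at(0.45)     # flags the six top scores

print("cautious: precision=%.2f recall=%.2f" % cautious)
print("eager:    precision=%.2f recall=%.2f" % eager)
```

With the high threshold every flag is a real POI but half of the POIs are missed; with the low threshold every POI is caught at the cost of two false alarms.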


## Exercise 15 - Making Sense of Metrics 4 <a id='part_i_15'></a>

“My identifier has a really great _ _.

This is the best of both worlds. Both my false positive and false negative rates are _ _, which means that I can identify POIs reliably and accurately. If my identifier finds a POI then the person is almost certainly a POI, and if the identifier does not flag someone, then they are almost certainly not a POI.”

- [ ] recall/precision
- [ ] precision/recall
- [x] F1 score/low
- [ ] precision/F1 score
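
The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. A minimal sketch using the counts from the made-up example in Exercises 6 to 9 (6 true positives, 3 false positives, 2 false negatives); exact fractions are used here only to make the arithmetic easy to follow:

```python
from fractions import Fraction

# Counts from the made-up example: TP = 6, FP = 3, FN = 2.
tp, fp, fn = 6, 3, 2

precision = Fraction(tp, tp + fp)  # 2/3
recall = Fraction(tp, tp + fn)     # 3/4

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print("F1 = %s (about %.4f)" % (f1, float(f1)))
# F1 = 12/17 (about 0.7059)
```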

#### Copying file

In [17]:
# Importing shutil to deal with copy
from shutil import copyfile

# File name
filename = 'evaluate_poi_identifier.ipynb'

# Lesson
lesson = '15-Lesson_15'

# Destination path for the copy
dir_copy = '../../' + lesson + '/00-Mini Project/' + filename

# Copying file.
copyfile(filename, dir_copy)