# Create F1 score calculator

This competition is scored with the mean F1 score. F1 is the harmonic mean of precision and recall:

```
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```

where

```
Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)
```
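
As a quick check of these formulas, take a single order with placeholder product names: the prediction is {A, B, C} and the actual products are {B, C, D, E}. Then TP = 2, FP = 1 (A) and FN = 2 (D, E), so precision = 2/3, recall = 1/2 and F1 = 4/7 ≈ 0.571. The short sketch below reproduces that arithmetic with Python set operations. The product names are made up for illustration.

```python
# Hypothetical single order: product names are placeholders
predicted = {"A", "B", "C"}
actual = {"B", "C", "D", "E"}

true_positives = len(predicted & actual)    # {"B", "C"} -> 2
false_positives = len(predicted - actual)   # {"A"} -> 1
false_negatives = len(actual - predicted)   # {"D", "E"} -> 2

precision = true_positives / (true_positives + false_positives)  # 2/3
recall = true_positives / (true_positives + false_negatives)     # 2/4 = 0.5
f1_score = 2 * (precision * recall) / (precision + recall)       # 4/7, about 0.571

print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1_score:.3f}")
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN) = 4 / (4 + 1 + 2) = 4/7, which gives the same value without computing precision and recall separately.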

```python
# f1 takes two dictionaries, pred and actual, each mapping an order ID to a set
# of product IDs. It compares them and returns the F1 score computed from counts
# pooled over all orders, along with those counts, or raises an error if the
# order IDs don't line up.
def f1(pred, actual, printBool=True):
    # Error checking: do both dictionaries contain the same number of orders?
    if len(pred) != len(actual):
        raise AssertionError('Prediction set is not the same size as actual set')

    # Aggregate counts across all orders
    true_positives, false_positives, false_negatives = 0, 0, 0

    for order in actual:
        if order not in pred:
            raise KeyError('order in actual not in prediction')
        # True positives: products in both the prediction and the actual order
        true_positives += len(actual[order] & pred[order])
        # False positives: products in the prediction but not in the actual order
        false_positives += len(pred[order] - actual[order])
        # False negatives: products in the actual order but not in the prediction
        false_negatives += len(actual[order] - pred[order])

    # Precision and recall; fall back to 0.0 when a denominator is zero
    # (e.g. every prediction is empty) instead of raising ZeroDivisionError
    predicted_total = true_positives + false_positives
    actual_total = true_positives + false_negatives
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / actual_total if actual_total else 0.0

    # With zero true positives, precision + recall is 0, so define F1 as 0
    F1 = 0.0 if true_positives == 0 else 2 * (precision * recall) / (precision + recall)

    if printBool:
        print("True Positives:  " + str(true_positives))
        print("False Positives: " + str(false_positives))
        print("False Negatives: " + str(false_negatives))
        print("Precision:       " + str(precision))
        print("Recall:          " + str(recall))
        print("----------------------------")
        print("F1: " + str(F1))

    return F1, true_positives, false_positives, false_negatives
```
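
The f1 function pools true positives, false positives and false negatives over every order before computing precision and recall, which is a micro-averaged F1. If the competition's mean F1 is instead the average of per-order F1 scores, the two numbers can differ, because orders with many products contribute more to the pooled counts. A minimal sketch of the per-order average, assuming the same dict-of-sets inputs as f1; `order_f1` and `mean_f1` are hypothetical helper names, not part of the original notebook:

```python
def order_f1(pred_set, actual_set):
    """F1 for a single order; 0.0 when the prediction shares nothing with the actual set."""
    true_positives = len(pred_set & actual_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)   # TP / (TP + FP)
    recall = true_positives / len(actual_set)    # TP / (TP + FN)
    return 2 * (precision * recall) / (precision + recall)


def mean_f1(pred, actual):
    """Hypothetical helper: average of the per-order F1 scores."""
    if set(pred) != set(actual):
        raise KeyError('prediction and actual order IDs differ')
    return sum(order_f1(pred[order], actual[order]) for order in actual) / len(actual)


# Two orders: one predicted perfectly, one with two extra (wrong) products
example_pred = {1: {"a"}, 2: {"b", "c", "d"}}
example_actual = {1: {"a"}, 2: {"b"}}
print(mean_f1(example_pred, example_actual))  # per-order scores 1.0 and 0.5, mean about 0.75
```

For the same two orders, the pooled computation in f1 gives TP = 2, FP = 2 and FN = 0, so precision = 0.5, recall = 1.0 and F1 = 2/3, noticeably lower than the per-order mean.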
    

# Test F1

## Perfect Match 

Let's check this implementation of F1. First, if I submit the exact same dictionary for both `pred` and `actual`, F1 should be exactly 1.0.

```python
import numpy as np

# Create orders 0 through 99. Each order gets up to 99 random product IDs
# drawn from 0-9999 (duplicates collapse, since each basket is a set),
# or the placeholder "None" when the drawn basket size is 0.
random_products = {}

for i in range(100):
    basket_size = np.random.randint(0, 100)  # 0 to 99 inclusive
    if basket_size == 0:
        random_products[i] = {"None"}
    else:
        random_products[i] = {str(np.random.randint(0, 10000)) for _ in range(basket_size)}

# Calculate F1 on random_products against itself and confirm it is exactly 1.0
test1 = f1(random_products, random_products, False)
assert test1[0] == 1.0
```

## One extra product

For every order in random_products, add one product that cannot be in the actual basket, then confirm that there are exactly 100 false positives (one per order).

```python
import copy

# Deep-copy random_products so adding products doesn't modify the original sets
random_products_plus_one_wrong = copy.deepcopy(random_products)

# Add product "10001" to every order; the random IDs only go up to 9999,
# so it can never be a true positive
for i in range(100):
    random_products_plus_one_wrong[i].add("10001")

# Confirm that false positives (the third entry in the returned tuple) == 100
test2 = f1(random_products_plus_one_wrong, random_products, False)
assert test2[2] == 100

# If random_products_plus_one_wrong is treated as the actual results and
# random_products as the predictions, the false positives become false negatives.
# Assert that there are 100 false negatives here
test3 = f1(random_products, random_products_plus_one_wrong, False)
assert test3[3] == 100

# Swapping pred and actual swaps precision and recall, and F1 is symmetric
# in precision and recall, so the score is unchanged. Assert this:
assert test2[0] == test3[0]
```
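
This test could also check the exact F1 value rather than only the counts. Writing N for the total number of products across the 100 actual orders (a symbol introduced here, not in the original), adding one wrong product to every order gives TP = N, FP = 100 and FN = 0, so:

$$
P = \frac{N}{N + 100}, \qquad R = \frac{N}{N + 0} = 1, \qquad F_1 = \frac{2PR}{P + R} = \frac{2N}{2N + 100} = \frac{N}{N + 50}
$$

Since N is `test2[1]` in the cell above, a check along the lines of `abs(test2[0] - test2[1] / (test2[1] + 50)) < 1e-9` should pass.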

## Totally Wrong

Create a new set of orders that gets every single order wrong, and assert that F1 == 0.0.

```python
# Every order predicts only product "10001", which is not in any actual order
all_wrong = {i: {"10001"} for i in range(100)}

# With zero true positives, f1 defines the score as 0
test4 = f1(all_wrong, random_products, False)
assert test4[0] == 0
```

## Missing Orders

Make sure f1 raises an AssertionError if the prediction and actual dictionaries contain different numbers of orders.

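```python
# 99 orders in the prediction versus 100 in random_products
wrong_size = {i: set() for i in range(99)}

# f1 should raise an AssertionError; fail loudly if it doesn't
try:
    f1(wrong_size, random_products, False)
except AssertionError:
    pass
else:
    raise RuntimeError('f1 did not raise AssertionError for mismatched sizes')
```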

## Not same order ids

Ensure f1 raises a KeyError if the order IDs don't match, even when the number of orders does.

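```python
# Same number of orders (100), but IDs 1-100 instead of 0-99
wrong_orders = {i: set() for i in range(1, 101)}

# f1 should raise a KeyError; fail loudly if it doesn't
try:
    f1(wrong_orders, random_products, False)
except KeyError:
    pass
else:
    raise RuntimeError('f1 did not raise KeyError for mismatched order IDs')
```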
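
An exception test only proves something if it also fails when nothing is raised. The standard library's `unittest` module packages that pattern as `assertRaises`, which fails the test when the expected exception is missing. To keep the sketch self-contained it calls `validate_orders`, a hypothetical stand-in that mirrors the two input checks at the top of f1; in the notebook you would call f1 directly.

```python
import unittest


def validate_orders(pred, actual):
    """Hypothetical stand-in for the input checks at the top of f1."""
    if len(pred) != len(actual):
        raise AssertionError('Prediction set is not the same size as actual set')
    for order in actual:
        if order not in pred:
            raise KeyError('order in actual not in prediction')


class TestOrderValidation(unittest.TestCase):
    def setUp(self):
        # 100 orders with IDs 0-99, each holding one placeholder product
        self.actual = {i: {"1"} for i in range(100)}

    def test_wrong_size_raises(self):
        wrong_size = {i: set() for i in range(99)}
        with self.assertRaises(AssertionError):
            validate_orders(wrong_size, self.actual)

    def test_mismatched_ids_raise(self):
        wrong_orders = {i: set() for i in range(1, 101)}
        with self.assertRaises(KeyError):
            validate_orders(wrong_orders, self.actual)


def run_validation_tests():
    """Run the suite inside a notebook without calling sys.exit."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOrderValidation)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_validation_tests()
```

Outside a notebook, `unittest.main()` would discover and run the same test class.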