## Load Truth and Predictions

In [1]:
import pandas as pd
from tqdm import tqdm

**Ground Truth**

In [2]:
df_truth = pd.read_csv('labeled_sameMovie_truth.txt', sep='\t', header=None)
print(f'df_truth.shape: {df_truth.shape}')
df_truth.head(3)

df_truth.shape: (598, 3)


| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 7012 | 1050 | 0 |
| 1 | 46 | 20 | 0 |
| 2 | 26 | 72 | 0 |


In [3]:
# Convert to a dict keyed by the (idx_a, idx_b) pair
dict_truth = {}
for idx, (idx_a, idx_b, truth) in tqdm(df_truth.iterrows(), total=df_truth.shape[0]):
    dict_truth[(int(idx_a), int(idx_b))] = int(truth)
len(dict_truth)

100%|██████████████████████████████████████████████████████████████████████████████| 598/598 [00:00<00:00, 7571.79it/s]


598
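
As an aside, the same lookup can be built without `iterrows`, which tends to be slow on larger frames. A minimal sketch, assuming the three columns are the first index, the second index, and the label, in that order; `toy_truth` and its values are a made-up stand-in for `df_truth`:

```python
import pandas as pd

# Hypothetical stand-in for df_truth: columns 0 and 1 form the pair, column 2 is the label
toy_truth = pd.DataFrame([[7012, 1050, 0], [46, 20, 0], [26, 72, 1]])

# Zip over the columns instead of iterating row by row
toy_dict_truth = {
    (int(a), int(b)): int(label)
    for a, b, label in zip(toy_truth[0], toy_truth[1], toy_truth[2])
}
print(toy_dict_truth)
```

The resulting dict has the same shape as `dict_truth`, so the rest of the notebook would work unchanged with it.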

In [4]:
# Group the labeled pairs by label: 0 = non-match, 1 = match
dict_truth_inverse = {0:[], 1:[]}
for idx, (idx_a, idx_b, truth) in tqdm(df_truth.iterrows(), total=df_truth.shape[0]):
    if truth >= 0.5:
        dict_truth_inverse[1].append((int(idx_a), int(idx_b)))
    else:
        dict_truth_inverse[0].append((int(idx_a), int(idx_b)))
len(dict_truth_inverse[0]) + len(dict_truth_inverse[1])

100%|██████████████████████████████████████████████████████████████████████████████| 598/598 [00:00<00:00, 9646.01it/s]


598

**Predictions**

In [5]:
df_pred = pd.read_csv('./inferred-predicates_hw8/SAMEMOVIE.txt', sep='\t', header=None)
print(f'df_pred.shape: {df_pred.shape}')
df_pred.head(3)

df_pred.shape: (109876, 3)


| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 103 | 1115 | 0.009894 |
| 1 | 13 | 4453 | 0.000428 |
| 2 | 79 | 2012 | 0.00125 |


The predictions DataFrame does not cover every pair in the truth DataFrame. The truth DataFrame contains many true non-matches that my code never generated: nearly all of those pairs did not fall within the same block, so I could not predict anything for them, not even a non-match.
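
To see how many labeled pairs the blocking step never produced, one could compare the keys of the truth and prediction lookups. A minimal sketch on made-up data (the `toy_*` names and values are hypothetical, not taken from the homework files):

```python
# Hypothetical toy lookups: keys are (idx_a, idx_b) pairs, values are 0/1
toy_truth = {(1, 2): 1, (3, 4): 0, (5, 6): 0}
toy_pred = {(1, 2): 1, (7, 8): 0}  # (3, 4) and (5, 6) were never generated by blocking

# Labeled pairs that have no prediction at all
missing = [pair for pair in toy_truth if pair not in toy_pred]
# Of those, the ones labeled as true non-matches
missing_non_matches = [pair for pair in missing if toy_truth[pair] == 0]

print(f'{len(missing)} of {len(toy_truth)} labeled pairs have no prediction')
print(f'{len(missing_non_matches)} of those are true non-matches')
```

Running the same comparison on `dict_truth` and `dict_pred` (once both exist) would show how much of the labeled set falls outside the blocks.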

In [6]:
# Convert to a dict keyed by the (idx_a, idx_b) pair, thresholding the score at 0.5
dict_pred = {}
for idx, (idx_a, idx_b, call) in tqdm(df_pred.iterrows(), total=df_pred.shape[0]):
    dict_pred[(int(idx_a), int(idx_b))] = 1 if call >= 0.5 else 0
len(dict_pred)

100%|████████████████████████████████████████████████████████████████████████| 109876/109876 [00:11<00:00, 9581.69it/s]


109876

In [7]:
# Group the predicted pairs by call: 0 = non-match, 1 = match
dict_pred_inverse = {0:[], 1:[]}
for idx, (idx_a, idx_b, call) in tqdm(df_pred.iterrows(), total=df_pred.shape[0]):
    if call >= 0.5:
        dict_pred_inverse[1].append((int(idx_a), int(idx_b)))
    else:
        dict_pred_inverse[0].append((int(idx_a), int(idx_b)))
len(dict_pred_inverse[0]) + len(dict_pred_inverse[1])

100%|████████████████████████████████████████████████████████████████████████| 109876/109876 [00:11<00:00, 9952.73it/s]


109876

### Precision

"Of all calls I made, how many were correct?"

In [8]:
TP = 0
FP = 0

# dict_pred_inverse[1] = all the calls I made
for match in dict_pred_inverse[1]:
    # Consider only the samples in the labeled dataset
    if match in dict_truth:
        if dict_truth[match] == 1:
            TP += 1
        else:
            FP += 1
        
precision = TP / (TP + FP)

print(f'TP: {TP:>4}')
print(f'FP: {FP:>4}')
print(f'Precision: {precision:.5f}')

TP:  173
FP:    2
Precision: 0.98857


### Recall

"Of all calls I should have made, how many did I make?"

In [9]:
TP = 0
FN = 0

# dict_truth_inverse[1] = all the calls I should have made
for match in dict_truth_inverse[1]:
    # A pair with no prediction returns None here and is counted as a false negative
    if dict_pred.get(match) == 1:
        TP += 1
    else:
        FN += 1
        
recall = TP / (TP + FN)

print(f'TP: {TP:>4}')
print(f'FN: {FN:>4}')
print(f'Recall: {recall:.5f}')

TP:  173
FN:   16
Recall: 0.91534
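
The two loops above can also be folded into one helper that counts only over the labeled pairs. This is a sketch under the assumption that labels and calls are strictly 0 or 1; `precision_recall` and the toy data are hypothetical names introduced for illustration:

```python
def precision_recall(truth, pred):
    """truth and pred map (idx_a, idx_b) -> 0/1. Returns (precision, recall).

    Only pairs present in `truth` are counted. A labeled match with no
    prediction counts as a false negative.
    """
    tp = sum(1 for pair, label in truth.items() if label == 1 and pred.get(pair) == 1)
    fp = sum(1 for pair, label in truth.items() if label == 0 and pred.get(pair) == 1)
    fn = sum(1 for pair, label in truth.items() if label == 1 and pred.get(pair) != 1)
    # Raises ZeroDivisionError if there are no predicted or no true matches
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical toy data
toy_truth = {(1, 2): 1, (3, 4): 1, (5, 6): 0, (7, 8): 0}
toy_pred = {(1, 2): 1, (5, 6): 1, (9, 9): 1}  # (9, 9) is unlabeled and ignored

toy_precision, toy_recall = precision_recall(toy_truth, toy_pred)
print(f'Precision: {toy_precision:.2f}, Recall: {toy_recall:.2f}')
```

In the toy data there is one true positive, one false positive, and one false negative, so both metrics come out to 0.5.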


### F1-Score

In [10]:
F1 = (2 * precision * recall) / (precision + recall)
print(f'F1 Score: {F1:.5f}')

F1 Score: 0.95055
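
As a sanity check, the F1 score can be written directly in terms of the counts, which reproduces the value above from the TP, FP, and FN reported in the Precision and Recall cells:

$$
F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{2 \cdot 173}{2 \cdot 173 + 2 + 16} = \frac{346}{364} \approx 0.95055
$$

Here $P = TP / (TP + FP) = 173 / 175 \approx 0.98857$ and $R = TP / (TP + FN) = 173 / 189 \approx 0.91534$.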


Matheus Schmitz

USC ID: 5039286453