# Evaluation of Blocking

For each entity we select its largest cluster (block), i.e. the cluster that contains the most items of that entity. True positives are the entity's items inside that cluster; false positives are items in the same cluster that belong to other entities; false negatives are the entity's items that ended up elsewhere, either in other clusters or in the noise cluster (id `-1`).
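A minimal sketch of this counting rule on made-up data may make it concrete. The names `toy_clusters`, `toy_gold` and `largest_block_counts` are hypothetical and not part of the project's `evaluation` module; the sketch assumes ties between equally large clusters are broken by taking the first one encountered.

```python
# Hypothetical toy data: cluster id -> items produced by the clustering algorithm
toy_clusters = {
    "0": ["a1", "a2", "a3", "b1"],
    "1": ["a4", "b2"],
    "-1": ["a5"],  # noise label used by HDBSCAN
}
# Hypothetical gold labels: item -> entity
toy_gold = {"a1": "A", "a2": "A", "a3": "A", "a4": "A", "a5": "A", "b1": "B", "b2": "B"}


def largest_block_counts(entity: str) -> tuple[int, int, int]:
    """Return (TP, FP, FN) for one entity under the largest-block rule."""
    # Items of this entity in each non-noise cluster that contains at least one of them
    per_cluster = {}
    for cid, items in toy_clusters.items():
        own = [item for item in items if toy_gold[item] == entity]
        if cid != "-1" and own:
            per_cluster[cid] = own
    total = sum(1 for e in toy_gold.values() if e == entity)
    if not per_cluster:
        return 0, 0, total  # every item of the entity is noise (or there are none)
    # Largest block; on ties, max() keeps the first cluster encountered
    best_id = max(per_cluster, key=lambda cid: len(per_cluster[cid]))
    tp = len(per_cluster[best_id])
    fp = len(toy_clusters[best_id]) - tp  # other entities' items in the block
    fn = total - tp  # items in other clusters or in the noise cluster
    return tp, fp, fn


print(largest_block_counts("A"))  # → (3, 1, 2)
print(largest_block_counts("B"))  # → (1, 3, 1)
```

For entity `A`, cluster `0` is the block: `b1` is a false positive, while `a4` (other cluster) and `a5` (noise) are false negatives.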

Denoting true positives by $TP$, false positives by $FP$ and false negatives by $FN$,
the $precision$, $recall$ and $f{\text -}measure$ are:

$$ precision = \frac{TP}{TP + FP} $$
$$ recall = \frac{TP}{TP + FN} $$
$$ f{\text -}measure = \frac{2 \cdot precision \cdot recall}{precision + recall} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} $$
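The two forms of the F-measure are equivalent: substituting the definitions of precision and recall into the harmonic mean and cancelling the common factors gives the count form. A small helper (the name `prf` is hypothetical) computes all three values from raw counts and returns 0.0 when a denominator is zero, the same zero-on-empty convention the evaluation cells use.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Count form of the F-measure: 2TP / (2TP + FP + FN)
    f_measure = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f_measure


print(prf(8, 2, 2))  # → (0.8, 0.8, 0.8)

# The count form and the harmonic-mean form agree up to floating-point rounding
p, r, f = prf(3, 1, 2)
print(f, 2 * p * r / (p + r))
```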


```python
import paths
import json
import evaluation
```

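```python
algorithm = "hdbscan"

# Generate the entity -> cluster id -> items mapping for this algorithm (read back below)
evaluation.generate_entity2clusters(algorithm)

with open(f"{paths.RESULTS_DIR}/evaluation/{algorithm}_entity2clusters.json", "r", encoding="utf-8") as file:
    entity2clusters: dict[str, dict[str, list[str]]] = json.load(file)

# Clusters produced by the algorithm: cluster id -> items
with open(f"{paths.RESULTS_DIR}/clustering/{algorithm}/{algorithm}_clusters.json", "r", encoding="utf-8") as file:
    alg_clusters: dict[str, list[str]] = json.load(file)
```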
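`evaluation.generate_entity2clusters` is not shown in this notebook. Judging from the type annotation on the loaded file, it produces a mapping entity → cluster id → items. A hypothetical sketch of how such a mapping could be derived from the algorithm's clusters and a gold item → entity map (`build_entity2clusters` and `item2entity` are illustrative names, not the module's API):

```python
from collections import defaultdict


def build_entity2clusters(
    clusters: dict[str, list[str]], item2entity: dict[str, str]
) -> dict[str, dict[str, list[str]]]:
    """Group each cluster's items by their gold entity: entity -> cluster id -> items."""
    grouped: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for cluster_id, items in clusters.items():
        for item in items:
            entity = item2entity.get(item)
            if entity is None:
                continue  # item without a gold label: ignored in this sketch
            grouped[entity].setdefault(cluster_id, []).append(item)
    return dict(grouped)


example = build_entity2clusters(
    {"0": ["a1", "b1"], "-1": ["a2"]},
    {"a1": "ENTITY#A", "a2": "ENTITY#A", "b1": "ENTITY#B"},
)
print(example)
# → {'ENTITY#A': {'0': ['a1'], '-1': ['a2']}, 'ENTITY#B': {'0': ['b1']}}
```

Noise items keep their `-1` cluster id, which is why the next cells can pick them out and count them as false negatives.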

```python
# Counters for the blocking evaluation
TP = 0
FP = 0
FN = 0
```

```python
# Drop the noise cluster (id "-1") of every entity and count its items as false negatives
for clusters in entity2clusters.values():
    FN += len(clusters.pop("-1", []))
```

```python
for entity, clusters in entity2clusters.items():
    if not clusters:
        continue  # every item of this entity was noise (already counted in FN)

    # Largest block for this entity: the cluster holding most of its items.
    # Taking the id and the items from the same max() call keeps them consistent on ties.
    max_cluster_id, max_cluster = max(clusters.items(), key=lambda kv: len(kv[1]))
    entity_items = set(max_cluster)

    # False Positives (FP): items of that cluster that belong to other entities
    FP += sum(1 for item in alg_clusters[max_cluster_id] if item not in entity_items)

    # True Positives (TP): this entity's items in its largest cluster
    TP += len(max_cluster)

    # False Negatives (FN): this entity's items in all its other clusters
    FN += sum(len(items) for cluster_id, items in clusters.items() if cluster_id != max_cluster_id)

precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
f_measure = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F-measure: {f_measure}")
```

```text
Precision: 0.0019691966869804177
Recall: 0.8970652650021901
F-measure: 0.003929766929130837
```


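## Post-processing of LLM query results

The raw LLM answers for the candidate pairs of cluster 397 are mapped to `MATCH` when the answer contains "same" and to `NO_MATCH` when it contains "different"; answers containing neither word are dropped.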

```python
# Raw LLM answers for cluster 397: {"397": [[item1, item2, answer], ...]}
llm_results_file = paths.RESULTS_DIR + "/pairwise_matching/RAW_matching_results_cluster_397.json"

with open(llm_results_file, "r", encoding="utf-8") as file:
    llm_results_cluster_397: dict[str, list[list[str]]] = json.load(file)

result_cluster_397 = llm_results_cluster_397["397"]

# Map each free-text answer to a label; answers with neither keyword are skipped
post_processed_results: list[tuple[str, str, str]] = []

for item1, item2, result in result_cluster_397:
    if "same" in result:
        post_processed_results.append((item1, item2, "MATCH"))
    elif "different" in result:
        post_processed_results.append((item1, item2, "NO_MATCH"))

post_processed_results_file = paths.RESULTS_DIR + "/pairwise_matching/matching_results_cluster_397_post_processed.json"

with open(post_processed_results_file, "w", encoding="utf-8") as file:
    json.dump(post_processed_results, file, indent=4)
```
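The keyword test above is case-sensitive, an answer containing both words (e.g. "not the same ... different") is labelled `MATCH` because "same" is checked first, and answers with neither word disappear without a trace. A more defensive variant, sketched with a hypothetical `normalize_verdict` helper, lowercases the answer, treats answers mentioning both keywords as ambiguous, and returns `None` so unrecognized answers can be counted. The keywords come from the cell above; the answer strings are made up.

```python
from collections import Counter
from typing import Optional


def normalize_verdict(answer: str) -> Optional[str]:
    """Map a free-text LLM answer to "MATCH"/"NO_MATCH"; None if absent or ambiguous."""
    text = answer.lower()  # case-insensitive keyword search
    says_same = "same" in text
    says_different = "different" in text
    if says_same and not says_different:
        return "MATCH"
    if says_different and not says_same:
        return "NO_MATCH"
    return None  # neither keyword, or both (e.g. "not the same, they are different")


answers = ["They are the SAME product.", "Different items.",
           "Not the same; different brands.", "Unclear."]
# Counting the None verdicts shows how many pairs would be left unlabelled
print(Counter(normalize_verdict(a) for a in answers))
```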




```python
# Pairwise evaluation of the LLM verdicts for cluster 397.
# Gold positives: pairs whose two items both belong to ENTITY#011 within cluster 397.
TP = 0
FP = 0
FN = 0

entity2cluster397: set[str] = set(entity2clusters["ENTITY#011"]["397"])

for item1, item2, result in post_processed_results:
    same_entity = item1 in entity2cluster397 and item2 in entity2cluster397
    if result == "MATCH":
        if same_entity:
            TP += 1  # correctly matched pair
        else:
            FP += 1  # matched pair that is not a gold pair
    elif same_entity:
        FN += 1  # gold pair labelled NO_MATCH
# NO_MATCH pairs outside the gold set are true negatives and do not enter P/R/F

precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
f_measure = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F-measure: {f_measure}")
```

```text
Precision: 0.7
Recall: 1.0
F-measure: 0.8235294117647058
```
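The pairwise evaluation above treats ENTITY#011 as the only positive class: if cluster 397 also contains items of other entities, a correct `MATCH` between two of those items is counted as a false positive. Whether that is intended depends on the goal of the experiment. The sketch below contrasts that rule with a variant in which any two items sharing a gold entity form a true match; it assumes a gold item → entity map is available (this notebook does not load one), and `pair_counts`, `same_gold_entity` and the toy items are hypothetical.

```python
from typing import Callable

# Hypothetical toy data: (item1, item2, verdict) triples and a gold item -> entity map
predictions = [
    ("a1", "a2", "MATCH"),     # same entity A, predicted match
    ("a1", "b1", "MATCH"),     # different entities, predicted match
    ("a2", "a3", "NO_MATCH"),  # same entity A, predicted no match
    ("b1", "b2", "MATCH"),     # same entity B, predicted match
]
item2entity = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
entity_items = {item for item, entity in item2entity.items() if entity == "A"}


def pair_counts(
    triples: list[tuple[str, str, str]],
    is_gold_match: Callable[[str, str], bool],
) -> tuple[int, int, int]:
    """Count TP/FP/FN over labelled pairs; true negatives are not needed for P/R/F."""
    tp = fp = fn = 0
    for item1, item2, verdict in triples:
        gold = is_gold_match(item1, item2)
        if verdict == "MATCH":
            if gold:
                tp += 1
            else:
                fp += 1
        elif gold:
            fn += 1
    return tp, fp, fn


def same_gold_entity(x: str, y: str) -> bool:
    """Variant rule: two items match whenever they share a gold entity."""
    ex = item2entity.get(x)
    return ex is not None and ex == item2entity.get(y)


# Single-entity rule, with entity "A" as the only positive class
print(pair_counts(predictions, lambda x, y: x in entity_items and y in entity_items))  # → (1, 2, 1)
# Any-entity rule: the (b1, b2) match becomes a true positive instead of a false positive
print(pair_counts(predictions, same_gold_entity))  # → (2, 1, 1)
```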
