## Tutorial 4: Evaluation

This brief tutorial explains HoloClean's built-in performance evaluation. For now, the process is simple: we supply a 'ground truth' dataset that contains the correct value for every cell. You need to create this ground truth for each dataset on which you want to evaluate HoloClean.

HoloClean then compares its inferred values to the correct values in the ground truth and reports precision and recall. Let's see an example.

Here's the code again for running HoloClean on the hospital dataset. See our [Complete Pipeline Tutorial](Tutorial_2.ipynb) if you haven't already!

In [1]:
import warnings; warnings.simplefilter('ignore')

from holoclean.holoclean import HoloClean, Session
from holoclean.errordetection.sql_dcerrordetector import SqlDCErrorDetection

holo = HoloClean(holoclean_path="..")
session = Session(holo)

data_path = "data/hospital.csv"
data = session.load_data(data_path)

dc_path = "data/hospital_constraints.txt"
dcs = session.load_denial_constraints(dc_path)


detector = SqlDCErrorDetection(session)

error_detector_list = [detector]
clean, dirty = session.detect_errors(error_detector_list)

repaired = session.repair()

100%|██████████| 20/20 [00:10<00:00,  1.94it/s]


And here is the evaluation described above:

In [2]:
session.compare_to_truth("data/hospital_clean.csv")

We have detected 185 errors out of 509 total
The top-1 precision  is : 0.999
The top-1 recall is : 0.978 over the 185 errors found during error detection
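
As a rough illustration of what these two numbers measure, here is a minimal sketch using the usual definitions: precision is the fraction of repairs that match the ground truth, and recall is the fraction of detected errors that were repaired correctly. The function name `precision_recall`, the `(row, attribute)` cell keys, and the toy values are all hypothetical; this is not HoloClean's internal code, and its exact computation may differ.

```python
def precision_recall(inferred, truth, detected_errors):
    """Sketch of precision/recall for cell repairs (assumed definitions).

    inferred:        dict mapping cell -> repaired value (repaired cells only)
    truth:           dict mapping cell -> correct value
    detected_errors: set of cells flagged during error detection
    """
    # A repair counts as correct when it equals the ground-truth value.
    correct = sum(1 for cell, value in inferred.items()
                  if truth.get(cell) == value)
    precision = correct / len(inferred) if inferred else 0.0
    recall = correct / len(detected_errors) if detected_errors else 0.0
    return precision, recall


# Toy data, made up for illustration: cells are (row, attribute) pairs.
inferred = {(1, "city"): "BIRMINGHAM", (2, "state"): "AL", (3, "zip"): "35233"}
truth = {(1, "city"): "BIRMINGHAM", (2, "state"): "AL",
         (3, "zip"): "35235", (4, "city"): "DOTHAN"}
detected = {(1, "city"), (2, "state"), (3, "zip"), (4, "city")}

precision, recall = precision_recall(inferred, truth, detected)
print(precision, recall)
```

In this toy case two of the three repairs are correct, and four cells were flagged as errors, so precision is 2/3 and recall is 2/4.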


We can also perform a slightly more sophisticated version of this evaluation. Instead of checking only HoloClean's most likely value for each cell, we can check whether any of the k most likely values matches the true value.

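To do this, specify the `k_inferred` parameter when initializing HoloClean. The cell below first stops the existing Spark session, then reruns the pipeline with `k_inferred` set to 5: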

In [3]:
holo.spark_session.stop()

holo_2 = HoloClean(holoclean_path="..", k_inferred=5)
session = Session(holo_2)

data = session.load_data(data_path)
dcs = session.load_denial_constraints(dc_path)

detector = SqlDCErrorDetection(session)

error_detector_list = [detector]
clean, dirty = session.detect_errors(error_detector_list)

repaired = session.repair()

session.compare_to_truth("data/hospital_clean.csv")

100%|██████████| 20/20 [00:10<00:00,  1.89it/s]


We have detected 185 errors out of 509 total
The top-5 precision  is : 1.000
The top-5 recall is : 1.000 over the 185 errors found during error detection
The  MAP precision  is : 0.999
The MAP recall is : 0.978 over the 185 errors found during error detection
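
To make the top-k check concrete, here is a minimal sketch of the idea: a cell counts as correct if its true value appears among the k highest-probability candidates. The function `top_k_hit` and the candidate list are hypothetical and only illustrate the comparison; they are not part of HoloClean's API.

```python
def top_k_hit(candidates, true_value, k):
    """Return True if true_value is among the k most probable candidates.

    candidates: list of (value, probability) pairs for a single cell
    """
    # Rank candidates from most to least probable and keep the first k.
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    top_values = [value for value, _ in ranked[:k]]
    return true_value in top_values


# Made-up candidates for one cell; the true value is ranked second.
candidates = [("35233", 0.6), ("35235", 0.3), ("35333", 0.1)]

hit_top1 = top_k_hit(candidates, "35235", k=1)
hit_top2 = top_k_hit(candidates, "35235", k=2)
print(hit_top1, hit_top2)
```

Here the true value is missed when only the single most likely candidate is checked, but it is found once the second candidate is also considered. That is why the top-k scores can only stay the same or improve as k grows.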


We're working on more sophisticated evaluation techniques, including an interactive evaluator that enables users to self-examine a sample of the inferred values, so stay tuned!