## This document contains the functions from the [Experiments notebook](https://github.com/irlabamsterdam/TPDLTextRedaction/blob/main/notebooks/Experiments.ipynb) of the [TPDLTextRedaction repo](https://github.com/irlabamsterdam/TPDLTextRedaction).

In [1]:
# load the functions of the OCR model
%run OCR-functions.ipynb

## Make sure that we have the train and test split

In [2]:
# load the functions to create the train and test split
%run Data-split-functions.ipynb

# create the data; set force_new_split to False so that this phase is skipped
# if it has been done before, and extended to False to skip the
# 'no_annotation' pages
split_data(TRAIN_SPLIT, False, False)

## An extra function to run the algorithm

We also provide a small function that runs the algorithm and plots all steps, together with a complete function that runs the algorithm and exposes all the parameters used in the individual steps, so that you can experiment with different settings yourself.

In [3]:
def run_algorithm(input_image_path: str,
                  text_pre_closing_kernel_size: tuple = (2, 2),
                  text_pre_guassian_blur_size: tuple = (3, 3),
                  box_pre_horizontal_closing_size: tuple = (1, 3),
                  box_pre_vertical_closing_size: tuple = (3, 1),
                  box_pre_bilat_filter_size: int = 5,
                  box_pre_filter_sigma_color: int = 75,
                  box_pre_filter_sigma_space: int = 75,
                  tesseract_confidence: int = 65,
                  contour_opening_kernel_size: tuple = (5, 5)):
    """
    This function implements the complete redaction detection algorithm and exposes the
    parameters of each step, so that different settings can be tried out.
    :param input_image_path: string specifying the path to the input image
    :param text_pre_closing_kernel_size: size of the closing kernel for the text preprocessing step
    :param text_pre_guassian_blur_size: size of the kernel for the Gaussian blur for the text
    preprocessing step
    :param box_pre_horizontal_closing_size: size of the horizontal closing operation for the redaction
    box preprocessing step
    :param box_pre_vertical_closing_size: size of the vertical closing operation for the redaction
    box preprocessing step
    :param box_pre_bilat_filter_size: size of the bilateral filter kernel for the redaction box
    preprocessing step
    :param box_pre_filter_sigma_color: color sigma for the bilateral filter of the redaction box
    preprocessing step
    :param box_pre_filter_sigma_space: space sigma for the bilateral filter of the redaction box
    preprocessing step
    :param tesseract_confidence: integer specifying the confidence level for Tesseract to
    consider something to be text
    :param contour_opening_kernel_size: kernel size of the opening operation in the contour detection step
    """
    
    input_image = load_image(input_image_path)
    # Do the preprocessing
    # NOTE: text_pre_guassian_blur_size is accepted above but is not passed on here
    image_text_pre = text_preprocessing(input_image, text_pre_closing_kernel_size)
    
    image_box_pre = redaction_box_preprocessing(input_image, 
                                                box_pre_horizontal_closing_size,
                                                box_pre_vertical_closing_size,
                                                box_pre_bilat_filter_size,
                                                box_pre_filter_sigma_color,
                                                box_pre_filter_sigma_space)
    # Remove the text
    image_without_text, total_words_area, text_boundaries = remove_text(image_text_pre, image_box_pre,
                                                                       tesseract_confidence)
    # First contour detection step
    image_with_contours, contours = determine_contours(image_without_text, contour_opening_kernel_size)
    # Final contour filtering step
    final_image_with_contours, final_contour_image, final_contours, total_contour_area, total_text_area  = filter_contours(input_image, contours, text_boundaries)
    
    # Automatically calculate some statistics on the number of redacted boxes, and the total percentage of 
    # the page that is redacted.
    # Check how much of the text area is redacted (%)
    percentage_redacted_textarea = ((total_contour_area / total_text_area) * 100) if total_contour_area and total_text_area else 0

    # Check how much of character area is redacted (%)
    total_area = total_contour_area + total_words_area
    percentage_redacted_words = ((total_contour_area / total_area) * 100) if total_contour_area else 0
    num_of_redacted_regions = len(final_contours)
    
    return final_contours, percentage_redacted_words, num_of_redacted_regions
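
The two percentages computed at the end of `run_algorithm` are easy to check by hand. The sketch below isolates that arithmetic in a hypothetical helper, `redaction_percentages` (it is not part of the repository), and feeds it made-up pixel areas.

```python
def redaction_percentages(total_contour_area, total_text_area, total_words_area):
    """Hypothetical helper mirroring the arithmetic at the end of run_algorithm."""
    # Share of the text area that is covered by redaction contours
    pct_textarea = ((total_contour_area / total_text_area) * 100
                    if total_contour_area and total_text_area else 0)
    # Share of the character area (redacted regions plus recognised words) that is redacted
    total_area = total_contour_area + total_words_area
    pct_words = ((total_contour_area / total_area) * 100
                 if total_contour_area else 0)
    return pct_textarea, pct_words


# Made-up areas in pixels, purely for illustration
pct_textarea, pct_words = redaction_percentages(2500, 10000, 2500)
print(pct_textarea, pct_words)  # 25.0 50.0
```

Note that the two numbers answer different questions: the first relates the redacted area to the whole text area, the second only to the area that is either redacted or recognised as words.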

## Measuring model performance

In this part of the notebook we run the algorithm on the test set and evaluate it using Panoptic Quality (PQ). For this, we use the gold standard JSON file (`gold_standard_complete.json`), which contains the annotated contours of the redacted text in each image. After this, we also time the different parts of the algorithm to get a clear understanding of which parts are the most time consuming.

In [4]:
import json
import os
def read_json(file_name):
    with open(file_name, 'r') as json_file:
        return json.load(json_file)

In [5]:
# Load in the files containing all the gold standard regions 
gold_standard = read_json(os.path.join('..', 'datasets', 'gold_standard_complete.json'))

In [6]:
import numpy as np
from shapely.geometry import Polygon

# TODO: rewrite this to work with only the output polygons
def evaluate_detection(input_image_filename):
    gold_standard_contours = gold_standard[os.path.split(input_image_filename)[-1]]
    predicted_contours, _, _ = run_algorithm(input_image_filename)
    
    # Get the attributes of all the boxes in the gold standard annotation
    polygons = [r['shape_attributes'] for r in gold_standard_contours['regions']]

    # set the total sum of the IOU
    sum_IoU = 0
    TP = []
    FN = []
    FP = []

    predicted_polygons = [Polygon(np.squeeze(contour)) for contour in predicted_contours]
    ground_truth_polygons = []
    
    for polygon in polygons:
        if polygon['name'] == 'rect':
            bottom_left = [polygon['x'], polygon['y']]
            bottom_right = [polygon['x']+polygon['width'], polygon['y']]
            top_right = [polygon['x']+polygon['width'], polygon['y']+polygon['height']]
            top_left = [polygon['x'], polygon['y']+polygon['height']]
            
            gold_standard_polygon_xy = [bottom_left, bottom_right, top_right, top_left]
        else:
            # If not a rectangle we have a more complex shape, so we pair up all of its x and y points
            gold_standard_polygon_xy = [[x, y] for x, y in zip(polygon['all_points_x'], polygon['all_points_y'])]

        ground_truth_polygons.append(Polygon(gold_standard_polygon_xy))
        
    # Repair self-intersecting ground truth polygons once, up front, so that the
    # same objects are used for the matching and for the FN count below
    ground_truth_polygons = [polygon if polygon.is_valid else polygon.buffer(0)
                             for polygon in ground_truth_polygons]

    # Loop over all of the predicted and ground truth polygons and check if there
    # is enough overlap to be a True Positive
    for predicted_polygon in predicted_polygons:
        for ground_truth_polygon in ground_truth_polygons:
            polygon_intersection = predicted_polygon.intersection(ground_truth_polygon).area
            polygon_union = predicted_polygon.area + ground_truth_polygon.area - polygon_intersection

            # Skip degenerate pairs without any area to avoid dividing by zero
            if polygon_union == 0:
                continue

            iou = polygon_intersection / polygon_union
            if iou >= 0.5:
                sum_IoU += iou
                TP.append([ground_truth_polygon, predicted_polygon])

    # Every predicted polygon without a match is a false positive, every
    # ground truth polygon without a match is a false negative
    FP = [polygon for polygon in predicted_polygons if polygon not in [item[1] for item in TP]]
    FN = [polygon for polygon in ground_truth_polygons if polygon not in [item[0] for item in TP]]
    
    return {'TP': len(TP), 'FP': len(FP), 'FN': len(FN), 'IOU': sum_IoU}
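
To make the matching rule concrete, here is a minimal sketch of the IoU test for the simplest case, two axis-aligned rectangles in the `x`, `y`, `width`, `height` form used by the `rect` annotations. It deliberately avoids Shapely and does not handle general polygons; `rect_iou` and the example boxes are illustrative assumptions, not code from the repository.

```python
def rect_iou(a, b):
    """IoU of two axis-aligned rectangles given as dicts with x, y, width, height."""
    # Width and height of the overlapping region (zero if the boxes are disjoint)
    overlap_w = max(0, min(a['x'] + a['width'], b['x'] + b['width']) - max(a['x'], b['x']))
    overlap_h = max(0, min(a['y'] + a['height'], b['y'] + b['height']) - max(a['y'], b['y']))
    intersection = overlap_w * overlap_h
    union = a['width'] * a['height'] + b['width'] * b['height'] - intersection
    return intersection / union if union else 0.0


gold = {'x': 0, 'y': 0, 'width': 10, 'height': 10}
close_prediction = {'x': 2, 'y': 0, 'width': 10, 'height': 10}  # overlaps 8 x 10
loose_prediction = {'x': 5, 'y': 0, 'width': 10, 'height': 10}  # overlaps 5 x 10

print(rect_iou(gold, close_prediction) >= 0.5)  # True: 80 / 120, counted as a TP
print(rect_iou(gold, loose_prediction) >= 0.5)  # False: 50 / 150, one FP and one FN
```

The second case shows why a near miss is penalised twice: the prediction is counted as a false positive and the gold standard region as a false negative.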

## Evaluation Using PQ

To evaluate on all the images, we first load the redaction type of each page (the types are also present in the JSON file). We then count the TP, FN, and FP for all images and use these counts to calculate the final scores. We work with dataframes here, as this is the easiest way to assign labels to the different pages. For the definition of the PQ metric, please see the original paper by Kirillov et al.
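
For quick reference, the quantities computed in `metric_calculation` correspond to the definitions below. Here $TP$ is the set of matched (predicted, gold standard) pairs, which in this notebook means an IoU of at least 0.5, and $FP$ and $FN$ are the unmatched predicted and gold standard regions.

$$
\mathrm{SQ} = \frac{\sum_{(p, g) \in TP} \mathrm{IoU}(p, g)}{|TP|}, \qquad
\mathrm{RQ} = \frac{|TP|}{|TP| + \tfrac{1}{2}|FP| + \tfrac{1}{2}|FN|}, \qquad
\mathrm{PQ} = \mathrm{SQ} \cdot \mathrm{RQ}
$$

$$
P = \frac{|TP|}{|TP| + |FP|}, \qquad
R = \frac{|TP|}{|TP| + |FN|}
$$

RQ is the F1 score of $P$ and $R$, so PQ can be read as the average IoU of the matched regions scaled by how well the regions are detected.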

In [7]:
import pandas as pd

data_csv = pd.read_csv(os.path.join('..', 'datasets', 'data_complete.csv')).set_index('File')
test_dir = os.path.join('..', 'datasets', 'test')

In [8]:
from tqdm import tqdm

def evaluate_dataframe(images_dir):
    all_scores = {}
    for filename in tqdm(os.listdir(images_dir)):
        path = os.path.join(images_dir, filename)
        image_scores = evaluate_detection(path)
        all_scores[filename] = image_scores
    return all_scores

In [9]:
all_scores = evaluate_dataframe(test_dir)

100%|██████████| 284/284 [06:49<00:00,  1.44s/it]


In [10]:
results = pd.DataFrame(all_scores).T
# add the scores and statistics to the dataframe
complete_dataframe = data_csv.join(results, how='right')

In [11]:
# save the dataframe, creating the results directory if it does not exist yet
os.makedirs('results', exist_ok=True)
complete_dataframe.to_csv(os.path.join('results', 'ocr_results.csv'))

In [12]:
def metric_calculation(dataframe):
    '''
    The metric calculations as done in https://github.com/irlabamsterdam/TPDLTextRedaction/blob/main/notebooks/Experiments.ipynb
    @param  pd.DataFrame    The dataframe for one class with the following columns { IOU, TP, FN, FP }
                            where the IOU is the sum of IOU scores and the others a total count.
    @return tuple           SQ, RQ, PQ, P and R, each rounded to two decimals, followed by a copy of
                            the dataframe with an added 'num_segments' column
    '''
    # Work on a copy so that the caller's dataframe (e.g. a groupby slice) is not modified
    dataframe = dataframe.copy()

    SQ = dataframe['IOU'].sum() / dataframe['TP'].sum()
    RQ = dataframe['TP'].sum() / (dataframe['TP'].sum() + 0.5*dataframe['FN'].sum() + 0.5*dataframe['FP'].sum())
    PQ = SQ*RQ
    P = dataframe['TP'].sum() / (dataframe['TP'].sum() + dataframe['FP'].sum())
    R = dataframe['TP'].sum() / (dataframe['TP'].sum() + dataframe['FN'].sum())

    # The number of gold standard segments per page
    dataframe['num_segments'] = dataframe['TP'] + dataframe['FN']
    return round(SQ, 2), round(RQ, 2), round(PQ, 2), round(P, 2), round(R, 2), dataframe
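
As a worked example of these formulas, the sketch below applies them to plain counts instead of a dataframe. The helper `pq_metrics` and the counts are made up for illustration and are not taken from the experiments.

```python
def pq_metrics(tp, fp, fn, sum_iou):
    """Hypothetical helper: the formulas of metric_calculation applied to plain counts."""
    sq = sum_iou / tp                      # mean IoU over the matched pairs
    rq = tp / (tp + 0.5 * fn + 0.5 * fp)   # identical to the F1 score of P and R
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {'PQ': sq * rq, 'SQ': sq, 'RQ': rq, 'P': precision, 'R': recall}


# Made-up counts: 8 matched regions with a summed IoU of 6.8,
# 2 spurious predictions and 4 missed gold standard regions
scores = pq_metrics(tp=8, fp=2, fn=4, sum_iou=6.8)
print({name: round(value, 2) for name, value in scores.items()})
# {'PQ': 0.62, 'SQ': 0.85, 'RQ': 0.73, 'P': 0.8, 'R': 0.67}
```

Even though the matched regions overlap well (SQ of 0.85), the missed and spurious regions pull PQ down to 0.62.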

In [13]:
# Make the results table, where we group based on redaction type.
type_results_table = {}
for label, df in complete_dataframe.groupby('type'):
    
    # skip the 'no_annotation' type
    if label == 'no_annotation': continue
        
    SQ, RQ, PQ, P, R, dataframe = metric_calculation(df)
    redacted_type = df['type'].tolist()[0]
    type_results_table[redacted_type] = {'PQ': PQ, 'SQ': SQ, 'RQ': RQ, 'P': P, 'R': R}

SQ, RQ, PQ, P, R, dataframe = metric_calculation(complete_dataframe)
type_results_table['total'] = {'PQ': PQ, 'SQ': SQ, 'RQ': RQ, 'P': P, 'R': R}

In [14]:
# show the PQ results
pd.DataFrame(type_results_table).T

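| Type | PQ | SQ | RQ | P | R |
|---|---|---|---|---|---|
| black | 0.79 | 0.85 | 0.93 | 0.94 | 0.92 |
| border | 0.52 | 0.88 | 0.59 | 0.83 | 0.45 |
| color | 0.75 | 0.87 | 0.86 | 0.84 | 0.88 |
| gray | 0.59 | 0.84 | 0.71 | 0.78 | 0.65 |
| total | 0.68 | 0.86 | 0.79 | 0.87 | 0.73 |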


## Time the model

In [15]:
import time

def time_algorithm(input_image_path):
    '''
    Time the image loading and model prediction
    @param  string    The path to the image
    @return dict      The times of the individual parts and total time
    '''
    
    load_start = time.time()
    input_image = load_image(input_image_path)
    load_end = time.time()
    # Do the preprocessing
    pre_start = time.time()
    image_text_pre, image_box_pre = text_preprocessing(input_image), redaction_box_preprocessing(input_image)
    pre_end = time.time()
    # Remove the text
    remove_start = time.time()
    image_without_text, _, text_boundaries = remove_text(image_text_pre, image_box_pre)
    remove_end = time.time()
    # First contour detection step
    contour_det_start = time.time()
    image_with_contours, contours = determine_contours(image_without_text)
    contour_det_end = time.time()
    # Final contour filtering step
    final_step_start = time.time()
    final_image_with_contours, final_contour_image, final_contours, _, total_text_area  = filter_contours(input_image, contours, text_boundaries)
    final_step_end = time.time()
    
    times = {'loading': load_end-load_start,
            'preprocessing': pre_end-pre_start,
           'text_removal': remove_end-remove_start,
           'contour detection': contour_det_end-contour_det_start,
           'final filtering': final_step_end-final_step_start}
    
    times['total'] = sum(times.values())
    
    return times
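
The repeated start/end bookkeeping can also be wrapped in a context manager. The following is a minimal sketch of that alternative, not the approach used in the notebook: `stage_timer` is a hypothetical helper, the two stages are stand-ins for the real pipeline calls, and `time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage_timer(times, name):
    """Hypothetical helper: store the elapsed seconds of a code block in times[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        times[name] = time.perf_counter() - start


times = {}
with stage_timer(times, 'loading'):
    data = list(range(100_000))             # stand-in for load_image
with stage_timer(times, 'preprocessing'):
    data = [value * 2 for value in data]    # stand-in for the preprocessing calls

# Same convention as time_algorithm: the total is the sum of the stages
times['total'] = sum(times.values())
print(times)
```

Each stage then needs a single `with` line, and the timing is recorded even if the stage raises an exception.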

In [17]:
# do this over all the images and average
load_times = []
preprocessing_times = []
text_removal_times = []
contour_detection_times = []
final_filtering_times = []
total_times = []

# time the model for all test images
for filename in tqdm(os.listdir(test_dir)):
    image_path = os.path.join(test_dir, filename)
    times = time_algorithm(image_path)
    load_times.append(times['loading'])
    preprocessing_times.append(times['preprocessing'])
    text_removal_times.append(times['text_removal'])
    contour_detection_times.append(times['contour detection'])
    final_filtering_times.append(times['final filtering'])
    total_times.append(times['total'])

100%|██████████| 284/284 [06:37<00:00,  1.40s/it]


In [18]:
# print the average times
print("Average loading time is %.3f seconds" % np.mean(load_times))
print("Average preprocessing time is %.3f seconds" % np.mean(preprocessing_times))
print("Average text removal time is %.3f seconds" % np.mean(text_removal_times))
print("Average contour detection time is %.3f seconds" % np.mean(contour_detection_times))
print("Average final filtering time is %.3f seconds" % np.mean(final_filtering_times))
average_predicting_time = np.mean(preprocessing_times) + np.mean(text_removal_times) + np.mean(contour_detection_times) + np.mean(final_filtering_times)
print("Average predicting time is %.3f seconds" % average_predicting_time)
print("Average total time is %.3f seconds" % np.mean(total_times))

Average loading time is 0.064 seconds
Average preprocessing time is 0.047 seconds
Average text removal time is 1.258 seconds
Average contour detection time is 0.020 seconds
Average final filtering time is 0.006 seconds
Average predicting time is 1.330 seconds
Average total time is 1.395 seconds
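
To see which part dominates, the averages printed above can be expressed as shares of the total. The sketch below simply copies those printed numbers into a dictionary; no new measurements are involved.

```python
# Average times in seconds, copied from the output above
average_times = {
    'loading': 0.064,
    'preprocessing': 0.047,
    'text_removal': 1.258,
    'contour detection': 0.020,
    'final filtering': 0.006,
}
total = sum(average_times.values())

# Print each stage as a percentage of the total, slowest first
for stage, seconds in sorted(average_times.items(), key=lambda item: item[1], reverse=True):
    print("%-18s %5.1f%%" % (stage, seconds / total * 100))
```

Text removal, the step that takes the Tesseract confidence parameter, accounts for roughly 90% of the average time per page, so it is the natural place to look when the algorithm needs to run faster.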
