# Consensus
Consensus is supported for:  
- Classification (Label IoU)
- Box (IoU)
- Polygon (IoU)
- Semantic Segmentation (IoU)
- Point (distance scoring)

## IoU

<p align="center">
  <img src="https://storage.googleapis.com/kaggle-media/competitions/rsna/IoU.jpg" width="350" title="IoU">
</p>
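
As a rough illustration of the figure above, IoU for two axis-aligned boxes can be sketched in a few lines. This is a minimal standalone example, not the dtlpy implementation; the function name `box_iou` and the `(left, top, right, bottom)` box layout are assumptions made for this sketch.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (left, top, right, bottom)."""
    # Corners of the overlap rectangle (may be empty)
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes that share a 5x5 corner: 25 / (100 + 100 - 25)
iou_example = box_iou((0, 0, 10, 10), (5, 5, 15, 15))
print(round(iou_example, 3))
```

Identical boxes score 1.0 and disjoint boxes score 0.0, which is the range the heatmaps later in this page display.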

Let's begin with some necessary imports:

In [1]:
import dtlpy as dl
from dtlpy.ml import metrics, predictions_utils
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

## Classification Consensus and Majority Vote

To get the item's IoU score, we first score each annotator's annotation set against every other annotator's set. In this example, five annotators annotated the same item.

But first we'll use the dtlpy SDK to get all items and all annotations:

In [4]:
first_item = dl.items.get(item_id='6215d3f73750a54742c4d33d')
second_item = dl.items.get(item_id='6215d3fee2b78c63e4ae501f')
third_item = dl.items.get(item_id='6215d4053750a5fe47c4d343')
fourth_item = dl.items.get(item_id='6215d40ce2b78c3b85ae5022')
fifth_item = dl.items.get(item_id='6215d415e2b78c36b7ae5028')

first_annotations = first_item.annotations.list()
second_annotations = second_item.annotations.list()
third_annotations = third_item.annotations.list()
fourth_annotations = fourth_item.annotations.list()
fifth_annotations = fifth_item.annotations.list()

Let's see which labels each annotator tagged:

In [3]:
print(f'{first_item.name} with list of annotations: {[annotation.label for annotation in first_annotations]}')
print(f'{second_item.name} with list of annotations: {[annotation.label for annotation in second_annotations]}')
print(f'{third_item.name} with list of annotations: {[annotation.label for annotation in third_annotations]}')
print(f'{fourth_item.name} with list of annotations: {[annotation.label for annotation in fourth_annotations]}')
print(f'{fifth_item.name} with list of annotations: {[annotation.label for annotation in fifth_annotations]}')

To build the annotator scores, we loop over every pair of annotators, calculate the IoU, and store the result in a matrix:

In [None]:
items_list = [first_item, second_item, third_item, fourth_item, fifth_item]
n_annotators = len(items_list)
items_scores = np.zeros((n_annotators, n_annotators))
for i_item in range(n_annotators):
    for j_item in range(n_annotators):
        # note: the results matrix is symmetric, so the calculation could be done on one side of the diagonal only
        # we do both sides to show that the score is the same: measure_item(x, y) == measure_item(y, x)
        success, results = predictions_utils.measure_item(items_list[i_item], items_list[j_item], ignore_labels=False)
        items_scores[i_item, j_item] = results['total_mean_score']
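
The symmetry mentioned in the comment can be used to halve the work. The sketch below fills only the upper triangle and mirrors it. It is independent of dtlpy: `pairwise_scores` and `jaccard` are hypothetical helpers, and the label sets are made-up stand-ins for annotators.

```python
from itertools import combinations_with_replacement

import numpy as np


def pairwise_scores(annotation_sets, score_fn):
    """Build a symmetric score matrix, calling score_fn once per unordered pair."""
    n = len(annotation_sets)
    scores = np.zeros((n, n))
    for i, j in combinations_with_replacement(range(n), 2):
        # score_fn is assumed to be symmetric, so one call fills both cells
        scores[i, j] = scores[j, i] = score_fn(annotation_sets[i], annotation_sets[j])
    return scores


def jaccard(a, b):
    """Intersection over union of two non-empty sets."""
    return len(a & b) / len(a | b)


# Made-up label sets, one per annotator
label_sets = [{'A', 'B'}, {'C', 'B'}, {'B', 'C'}]
matrix = pairwise_scores(label_sets, jaccard)
print(matrix)
```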

The returned results object can be converted to a pandas DataFrame with all the matches and their scores:

In [None]:
success, results = predictions_utils.measure_item(first_item, second_item, ignore_labels=False)
results[dl.AnnotationType.CLASSIFICATION].to_df()

We'll use the seaborn package to plot the matrix:

In [None]:
import seaborn as sns
sns.heatmap(items_scores, 
            annot=True, 
            cmap='Blues',
            xticklabels=['Annotator A','Annotator B','Annotator C', 'Annotator D', 'Annotator E'],
            yticklabels=['Annotator A','Annotator B','Annotator C', 'Annotator D', 'Annotator E'])

Annotator A tagged ['A', 'B'] and annotator B tagged ['C', 'B']. The union is ['A', 'B', 'C'] and the intersection is only ['B'], which gives a 33% match (1/3), as we can see in the matrix.
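
The arithmetic for annotators A and B can be checked with plain Python sets. This is a minimal sketch of label IoU, not the dtlpy code path; `label_iou` is a hypothetical helper, and returning 0.0 for two empty label lists is a convention chosen only for this example.

```python
def label_iou(labels_a, labels_b):
    """Intersection over union of two label lists, treated as sets."""
    set_a, set_b = set(labels_a), set(labels_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption for this sketch: two empty sets score 0
    return len(set_a & set_b) / len(union)


score = label_iou(['A', 'B'], ['C', 'B'])
print(f'{score:.2f}')  # 0.33
```
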
Counting the appearances of each label gives the following:

In [None]:
# count per label
from collections import Counter
all_annotations = [first_annotations, second_annotations,third_annotations,fourth_annotations,fifth_annotations]
all_labels = [annotation.label for annotations in all_annotations for annotation in annotations]
counter = Counter(all_labels)
for label, count in counter.items():
    print('{}: {}'.format(label, count))

If we output only the majority labels (those that three or more annotators gave), we get ['B', 'C'].
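
A majority filter of this kind could be sketched as follows. The helper `majority_labels` is hypothetical, and only the first two label lists come from the text above; the other three are invented so that the example runs without access to the items.

```python
from collections import Counter


def majority_labels(labels_per_annotator, min_votes):
    """Return the labels given by at least min_votes annotators, sorted."""
    votes = Counter()
    for labels in labels_per_annotator:
        # set() so an annotator who repeats a label still counts once
        votes.update(set(labels))
    return sorted(label for label, count in votes.items() if count >= min_votes)


labels_per_annotator = [
    ['A', 'B'],  # annotator A (from the text)
    ['C', 'B'],  # annotator B (from the text)
    ['B', 'C'],  # invented for illustration
    ['C'],       # invented for illustration
    ['A', 'B'],  # invented for illustration
]
majority = majority_labels(labels_per_annotator, min_votes=3)
print(majority)  # ['B', 'C']
```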

## Box IoU Matching

Box matching is basically the same. We'll get the items and annotations and use "show()" to display the annotations of each item:

In [None]:
first_item = dl.items.get(item_id='6214bc0d3750a50f50c44841')
second_item = dl.items.get(item_id='6214be90fed92a9f043ba217')
first_annotations = first_item.annotations.list()
second_annotations = second_item.annotations.list()

In [None]:
plt.figure()
plt.imshow(first_annotations.show())
plt.title('first')
plt.figure()
plt.imshow(second_annotations.show())
plt.title('second')

Now let's overlay the annotations on top of each other to see the matching:

In [None]:
plt.imshow(first_annotations.show())
plt.imshow(second_annotations.show())

Running the comparison over the two items gives the results DataFrame:

In [None]:
success, results = predictions_utils.measure_item(first_item, second_item, ignore_labels=True)
results[dl.AnnotationType.BOX].to_df()

We used the "ignore_labels=True" flag so the matching ignores the label. This means the yellow and the red annotations at the top right are a match. If we run the same function with "ignore_labels=False", we get the following:

In [None]:
success, results = predictions_utils.measure_item(first_item, second_item, ignore_labels=False)
results[dl.AnnotationType.BOX].to_df()

Now we get only two matches (two greens and two blues) and three unmatched annotations (one of the blues, the red, and the yellow).
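
To see why the label flag changes the number of matches, here is a simplified greedy matcher. It is an illustrative sketch only and should not be read as how `measure_item` works internally: the helper names, the `(label, box)` tuples, the coordinates, and the 0.5 default threshold are all assumptions.

```python
def box_iou(a, b):
    """IoU of two boxes given as (left, top, right, bottom)."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = width * height
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def greedy_match(first, second, match_threshold=0.5, ignore_labels=False):
    """Pair annotations from two sets, best IoU first; each one is used at most once."""
    candidates = []
    for i, (label_a, box_a) in enumerate(first):
        for j, (label_b, box_b) in enumerate(second):
            if not ignore_labels and label_a != label_b:
                continue  # labels must agree unless they are ignored
            score = box_iou(box_a, box_b)
            if score >= match_threshold:
                candidates.append((score, i, j))

    matches, used_first, used_second = [], set(), set()
    for score, i, j in sorted(candidates, reverse=True):
        if i not in used_first and j not in used_second:
            matches.append((i, j, score))
            used_first.add(i)
            used_second.add(j)
    return matches


# Invented annotations: the second pair overlaps well but has different labels
first = [('green', (0, 0, 10, 10)), ('red', (20, 0, 30, 10))]
second = [('green', (1, 0, 11, 10)), ('yellow', (20, 1, 30, 11))]

matches_any_label = greedy_match(first, second, ignore_labels=True)
matches_same_label = greedy_match(first, second, ignore_labels=False)
print(len(matches_any_label), len(matches_same_label))
```

With labels ignored, the red and yellow boxes pair up because they overlap well; with labels enforced, that pair is dropped and both boxes are left unmatched.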

DEBUG: View the annotation comparison matrix for the two items:

In [None]:
results['box'].matches._annotations_raw_df[0]

List of all the matching scores and the mean over the item:

In [None]:
print(results['box'].to_df()['annotation_score'])
print(results['total_mean_score'])

## Polygon and Segmentation

Same as above: IoU scoring for two example images and their annotations:

In [None]:
first_item = dl.items.get(item_id='6214d07599cb175c9cd73d8f')
second_item = dl.items.get(item_id='6214d07c9d80b05b8310ba9b')
first_annotations = first_item.annotations.list()
second_annotations = second_item.annotations.list()

In [None]:
plt.figure()
plt.imshow(first_annotations.show())
plt.title('first')
plt.figure()
plt.imshow(second_annotations.show())
plt.title('second')

In [None]:
success, results = predictions_utils.measure_item(first_item, second_item, ignore_labels=True, match_threshold=0)
results[dl.AnnotationType.SEGMENTATION].to_df()

In [None]:
results[dl.AnnotationType.SEGMENTATION].matches._annotations_raw_df[0]

And the total score for these items is:

In [None]:
print(results['total_mean_score'])
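
For intuition, IoU between two segmentation masks is the number of pixels both masks cover divided by the number of pixels either mask covers. The NumPy sketch below is a standalone example, not the dtlpy implementation; `mask_iou` is a hypothetical helper and the masks are synthetic. A polygon could be scored the same way once it has been rasterized into a mask.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """IoU of two same-shaped binary masks."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0  # assumption for this sketch: two empty masks score 0
    return float(np.logical_and(mask_a, mask_b).sum() / union)


# Two 6x6 squares on a 10x10 canvas that share a 3x3 region
mask_a = np.zeros((10, 10), dtype=bool)
mask_b = np.zeros((10, 10), dtype=bool)
mask_a[0:6, 0:6] = True
mask_b[3:9, 3:9] = True

mask_score = mask_iou(mask_a, mask_b)  # 9 / (36 + 36 - 9)
print(round(mask_score, 3))
```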

## Three Sets Comparison
To match across multiple annotators, we calculate the scoring matrix between all pairs of annotators.
In this example we'll see 3 annotators with bounding box annotations:

In [None]:
first_item = dl.items.get(item_id='6214ea1d0ec695cd9c35dfbd')
second_item = dl.items.get(item_id='6214ea29e2b78c7ca1adc6b7')
third_item = dl.items.get(item_id='6214ea310ec695600635dfc6')
first_annotations = first_item.annotations.list()
second_annotations = second_item.annotations.list()
third_annotations = third_item.annotations.list()

In [None]:
plt.figure()
plt.imshow(first_annotations.show())
plt.title('first')
plt.figure()
plt.imshow(second_annotations.show())
plt.title('second')
plt.figure()
plt.imshow(third_annotations.show())
plt.title('third')

Plotting all three on top of each other, with a different thickness for each to tell them apart:

In [None]:
plt.imshow(first_annotations.show(thickness=20))
plt.imshow(second_annotations.show(thickness=10))
plt.imshow(third_annotations.show(thickness=3))

In [None]:
items = [first_item, second_item, third_item]
n_annotators = len(items)
items_scores = np.zeros((n_annotators, n_annotators))
for i_item in range(n_annotators):
    for j_item in range(i_item, n_annotators):
        success, results = predictions_utils.measure_item(items[i_item], items[j_item], ignore_labels=True)
        items_scores[i_item, j_item] = results['total_mean_score']      
        items_scores[j_item, i_item] = results['total_mean_score']

In [None]:
import seaborn as sns
sns.heatmap(items_scores, 
            annot=True, 
            cmap='Blues',
            xticklabels=['Annotator A','Annotator B','Annotator C'],
            yticklabels=['Annotator A','Annotator B','Annotator C'])