# Tutorial: scoring for quality tasks


This tutorial describes the flow for calculating scores for quality tasks and walks you through creating each type of task. Here, a quality task is any of the following (to learn more about each task type, see the linked Dataloop documentation):

1. [Qualification tasks](https://dataloop.ai/docs/qualification-honeypot)
2. [Honeypot tasks](https://dataloop.ai/docs/qualification-honeypot#honeypot)
3. [Consensus tasks](https://dataloop.ai/docs/consensus)


In general, an annotator will receive an assignment to complete their annotation task. For a given item in a consensus task, each assignment will be cross-compared with every other assignment. In the case of qualification and honeypot tasks, each item will only have one assignment associated with it. 


### _Behind the scenes_

A task score is determined by the scores of all items in the task. To calculate an item score, the following steps will be taken:
- for each item, all annotations will be collected by assignment
- for qualification and honeypot tasks, all annotations *not* associated with the assignment will be considered the ground truth
    - we assume there is only *one* assignment for qualification and honeypot tasks
- for consensus tasks, there are no ground truth annotations because we are comparing each assignee to every other assignee
    - during comparison, each set of assignee annotations will be compared twice, once as the reference set and once as the test set
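
The pairing rules above can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name `comparison_pairs` and the `GROUND_TRUTH` marker are hypothetical, and assignments are represented only by their ids.

```python
from itertools import permutations

# Hypothetical marker for annotations that belong to no assignment.
GROUND_TRUTH = "ground_truth"


def comparison_pairs(assignment_ids, has_ground_truth):
    """Return the (reference, test) pairs to compare for a single item."""
    if has_ground_truth:
        # Qualification / honeypot: exactly one assignment is assumed,
        # and it is compared against the ground truth annotations.
        if len(assignment_ids) != 1:
            raise ValueError("expected exactly one assignment for this task type")
        return [(GROUND_TRUTH, assignment_ids[0])]
    # Consensus: every ordered pair, so each assignment appears once as the
    # reference set and once as the test set against every other assignment.
    return list(permutations(assignment_ids, 2))


consensus_pairs = comparison_pairs(["a", "b", "c"], has_ground_truth=False)
honeypot_pairs = comparison_pairs(["a"], has_ground_truth=True)
```

With three assignees, the consensus case yields six ordered pairs, for example both `("a", "b")` and `("b", "a")`.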


The flow of the functions and entities used to calculate the score is illustrated here:
![Flow chart for scoring](scoring_flow.jpg)



## Annotation scores created

During scoring, these scores will be created for each annotation:
- label score
- geometry score
- attribute score
- overall annotation score


1. __Label score__ is a score of annotation label agreement that is either 0 or 1, with the default being 1 (i.e. in the case that two annotations show there is no label, it will be considered "agreed"). 
2. __Geometry score__ is the annotations' geometric overlap (such as IOU), ignoring labels. Because labels are ignored, bounding boxes with high overlap but different labels will still be matched and given a geometry score.
3. __Attribute score__ is the score for attribute agreement (score of 0 or 1), with a default of NA, depending on whether attributes exist in the dataset recipe.
4. __Overall annotation score__ is the average of all scores associated with this annotation.
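
As a rough sketch of how these four scores relate, the snippet below scores one matched pair of bounding-box annotations. It is illustrative only: the dictionary layout, the function names, and the choice of plain box IOU are assumptions made for this example, not the library's data model.

```python
def box_iou(box_a, box_b):
    """IOU of two boxes given as (left, top, right, bottom)."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def annotation_scores(ref, test, use_attributes=False):
    """Score a matched (reference, test) annotation pair."""
    scores = {
        # Two missing labels compare equal, so the label score defaults to 1.
        "label": 1.0 if ref.get("label") == test.get("label") else 0.0,
        # Geometry ignores labels entirely.
        "geometry": box_iou(ref["box"], test["box"]),
    }
    if use_attributes:
        # Only scored when attributes exist; otherwise left out (NA).
        same = ref.get("attributes") == test.get("attributes")
        scores["attribute"] = 1.0 if same else 0.0
    # Overall is the average of the scores collected so far.
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores


ref = {"label": "cat", "box": (0, 0, 10, 10)}
test = {"label": "dog", "box": (0, 0, 10, 5)}
pair_scores = annotation_scores(ref, test)
```

Here the labels disagree, yet the pair still receives a geometry score from the overlap of the two boxes.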


Additional scores that will be added for each item:

1. __Label confusion__ is a per-item count of how many annotations with a given reference classification label were matched with a given test-set classification label (the reference label is the `entityID`, and the test label is the `relative` context). For example, if 10 annotations in the reference item had the label "cat", and in the test set 8 of them were matched with the label "cat" and 2 with the label "dog", each count in the item's label confusion would be stored as a score in the following format:
```
label_confusion_score = Score(type=ScoreType.LABEL_CONFUSION,
                              value=row['counts'],
                              entity_id=row['first_label'],
                              relative=row['second_label'])
```

2. __Item overall__ is the average of all annotation scores associated with this item (note, this is *not* an average of the overall annotation score for each annotation). 
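
The two item-level scores can be sketched as below, using the "cat"/"dog" example from the text. This is an illustration under stated assumptions: the function names and plain dictionaries are hypothetical stand-ins for the real score entities, and "all annotation scores" is assumed to mean the pooled label, geometry, and attribute scores of every annotation in the item.

```python
from collections import Counter


def label_confusion(matched_labels):
    """Count (reference_label, test_label) pairs for one item."""
    counts = Counter(matched_labels)
    # One row per label pair, mirroring the value / entity_id / relative fields.
    return [
        {"value": count, "entity_id": ref_label, "relative": test_label}
        for (ref_label, test_label), count in counts.items()
    ]


def item_overall(per_annotation_scores):
    """Average of all component scores pooled across the item's annotations."""
    pooled = [s for scores in per_annotation_scores for s in scores.values()]
    return sum(pooled) / len(pooled)


# 10 reference "cat" annotations: 8 matched with "cat", 2 with "dog".
matches = [("cat", "cat")] * 8 + [("cat", "dog")] * 2
confusion_rows = label_confusion(matches)

# The second annotation has an attribute score, the first does not (NA).
scores = [
    {"label": 1.0, "geometry": 0.8},
    {"label": 1.0, "geometry": 0.6, "attribute": 0.0},
]
pooled_average = item_overall(scores)
mean_of_overalls = sum(sum(s.values()) / len(s) for s in scores) / len(scores)
```

Under this assumption, `pooled_average` and `mean_of_overalls` differ whenever annotations have different numbers of component scores, which is why the item overall is not simply the mean of the overall annotation scores.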


To understand these scores, we will create a dataset and quality task to demonstrate the use of each of these functions.


## Qualification task




## Honeypot task




## Consensus task



## Snacks dataset example, confusion matrix

In this example, we are using a classification dataset that has three labels: ice cream, popsicle, and pizza. 

