
Session 3: Data Processing & CrowdTruth Metrics

Session Summary

In this session we discussed optimal ways of representing and analyzing crowdsourcing results by applying the CrowdTruth metrics. We have prepared a collection of Jupyter Notebooks (also available as Colab notebooks that can be run from a Google Drive account) that illustrate how to run the metrics on the tasks discussed in Session 2:

Closed Tasks: the crowd picks from a set of annotations that is known beforehand

Open-Ended Tasks: the crowd dynamically creates the list of annotations, or the set of annotations is too big to compute beforehand
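
To make the distinction concrete, here is a minimal sketch of how a worker's judgment might be turned into an annotation vector in each case. It is plain Python, not the CrowdTruth package's own code; the function names and the example labels are hypothetical and chosen only for illustration.

```python
def closed_vector(selected, annotation_set):
    """One slot per annotation known beforehand; 1 if the worker picked it."""
    chosen = set(selected)
    return [1 if label in chosen else 0 for label in annotation_set]


def open_vectors(judgments):
    """Derive the vector space from the labels the workers actually produced."""
    space = sorted({label for labels in judgments for label in labels})
    return space, [closed_vector(labels, space) for labels in judgments]


# Closed task: the label set is fixed in advance (hypothetical labels).
RELATIONS = ["causes", "treats", "none"]
closed = closed_vector(["treats"], RELATIONS)

# Open-ended task: the label set emerges from the judgments themselves.
space, open_vecs = open_vectors([["dog", "puppy"], ["dog"], ["animal"]])
```

In the closed case the vector length is known before any data is collected; in the open-ended case it grows with every new label, which is why open-ended tasks tend to yield sparser vectors and more apparent disagreement.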

Session Exercises

  1. Install the CrowdTruth package & follow the How to run guide in order to get started.

  2. Explore (some of) the notebooks above that implement CrowdTruth for different annotation tasks.

  3. Compare the results of the CrowdTruth metrics when the same task is processed with a closed vs. open-ended annotation vector, considering the trade-off between the degree of expressivity in crowd annotations and the potential for ambiguity and disagreement. The following notebook can be used as an example:

  4. Dimensionality reduction techniques are useful to reduce some of the noise in crowd annotations, particularly for open-ended tasks as they produce very diverse labels. These techniques can be applied to both input units and annotations. Compare the results of the CrowdTruth metrics for an annotation task before & after dimensionality reduction in the following crowd tasks:

  5. Implement the annotation vector you designed in Session 2 as a CrowdTruth pre-processing configuration.
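
For the dimensionality reduction exercise, the sketch below shows why merging equivalent labels can change agreement scores on an open-ended task. It is a simplified illustration in plain Python: the agreement measure is an unweighted mean pairwise cosine similarity between workers, not the CrowdTruth metrics themselves, and the labels, the `MERGE` mapping, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_agreement(judgments):
    """Unweighted average cosine similarity over all pairs of workers."""
    vecs = [{label: 1 for label in set(labels)} for labels in judgments]
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    if not pairs:
        return 0.0
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)


def reduce_labels(judgments, mapping):
    """Normalize case and whitespace, then merge labels via the mapping."""
    reduced = []
    for labels in judgments:
        cleaned = [label.strip().lower() for label in labels]
        reduced.append([mapping.get(label, label) for label in cleaned])
    return reduced


# Three workers label the same unit (hypothetical data).
raw = [["Dog"], ["puppy"], ["dog ", "cat"]]
MERGE = {"puppy": "dog"}  # hypothetical synonym mapping

before = mean_pairwise_agreement(raw)
after = mean_pairwise_agreement(reduce_labels(raw, MERGE))
```

Before reduction the three workers share no label strings at all, so agreement is zero; after normalization and merging, all three agree on "dog". When running the exercise with the real metrics, check whether the reduction removes noise or hides genuine ambiguity in the unit.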