
Getting Started with CrowdTruth

The CrowdTruth framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The central part of the framework is the collection of CrowdTruth metrics that capture and interpret inter-annotator disagreement in crowdsourcing. The CrowdTruth metrics model the inter-dependency between the three main components of a crowdsourcing system -- workers, input data, and annotations. The goal of the metrics is to capture the degree of ambiguity in each of these three components.

This document shows how to get started using the CrowdTruth Python package to process data collected from crowdsourcing microtasks. A detailed description of the CrowdTruth metrics is available in this paper. You can follow the full CrowdTruth Tutorial to learn and practice the specifics of the CrowdTruth approach. Other useful resources are:

If you use this software in your research, please consider citing:

  author    = {Anca Dumitrache and Oana Inel and Lora Aroyo and Benjamin Timmermans and Chris Welty},
  title     = {CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement},
  year      = {2018},
  url       = {},


To install the stable version from PyPI, install pip for your OS, then install the package using:

pip install crowdtruth

To install the latest version from source, download the library and install it using:

python setup.py install

How to run

After installing the CrowdTruth package, you can run the metrics on your own crowdsourced data. We currently support automated processing of files generated by Amazon Mechanical Turk and Figure Eight. It is also possible to define your own custom file format.

1. Define the configuration

The pre-processing configuration defines how to interpret the raw crowdsourcing input. To do this, we need to define a configuration class.

import crowdtruth
from crowdtruth.configuration import DefaultConfig

class TestConfig(DefaultConfig):
    pass  # override the attributes described below as needed

Our test class inherits the default configuration DefaultConfig. The following attributes can be used to customize the configuration to the task:

  • inputColumns: list of input columns from the .csv file with the input data
  • outputColumns: list of output columns from the .csv file with the answers from the workers
  • customPlatformColumns: list of columns from the .csv file that define a standard annotation task, in the following order: judgment id, unit id, worker id, started time, submitted time. This variable is used for input files that do not come from AMT or Figure Eight (formerly known as CrowdFlower).
  • csv_file_separator: string that separates the columns in the file; default value is a comma (,)
  • annotation_separator: string that separates the crowd annotations within a cell (for the columns defined in outputColumns); default value is a comma (,)
  • none_token: string corresponding to the name of the annotation vector component that counts how many workers picked no answer for a given unit; set to NONE by default
  • remove_empty_rows: boolean variable controlling whether to remove empty judgments from the data, or to replace them with none_token; default value is True
  • open_ended_task: boolean variable defining whether the task is open-ended (i.e. the possible crowd annotations are not known beforehand, like in the case of free text input) or not (i.e. the crowd picks from a pre-selected list of annotations)
  • annotation_vector: list of possible crowd answers, obligatory when open_ended_task is False
  • processJudgments: method that defines additional processing of the raw crowd data
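The attributes above can be combined into a configuration for a closed task. The following is a minimal sketch rather than a reference configuration: the column names and the annotation values are hypothetical, the processJudgments body assumes that the raw crowd data is passed in as a pandas DataFrame, and the stand-in DefaultConfig only exists so that the snippet can be run without the package installed.

```python
try:
    from crowdtruth.configuration import DefaultConfig
except ImportError:
    # Stand-in so the sketch can be run without the package installed.
    class DefaultConfig(object):
        pass


class TestConfig(DefaultConfig):
    # Hypothetical column names; replace with the headers of your own .csv file.
    inputColumns = ["sentence", "term"]
    outputColumns = ["relations"]

    # Assumption: a worker's answers in the output column are space-separated.
    annotation_separator = " "

    # Closed task: the set of possible answers is known beforehand.
    open_ended_task = False
    annotation_vector = ["causes", "treats", "prevents"]  # hypothetical values

    def processJudgments(self, judgments):
        # Assumption: judgments is a pandas DataFrame with the raw crowd data.
        # Lowercase the answers so that they match the values in annotation_vector.
        for col in self.outputColumns:
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
        return judgments
```

Attributes that are not overridden keep the default values inherited from DefaultConfig.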

2. Pre-process the data

After declaring the configuration of our input file, we are ready to pre-process the crowd data:

data, config = crowdtruth.load(
    file = ...,
    config = TestConfig()
)

To process all of the files in one folder with the same pre-defined configuration, replace the file attribute of crowdtruth.load with directory.

3. Calculate the metrics

The pre-processed data can then be used to calculate the CrowdTruth metrics:

results = crowdtruth.run(data, config)

The method returns a dictionary object with the following keys:

  • units: quality metrics for the input units
  • workers: quality metrics for the workers
  • annotations: quality metrics for the crowd annotations
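Steps 2 and 3 can be wrapped in a small helper. This is a minimal sketch, assuming that crowdtruth.load accepts either a file or a directory argument as described above and that the metrics are computed with crowdtruth.run; the helper name run_metrics and its is_directory flag are illustrative and not part of the package.

```python
def run_metrics(path, config, is_directory=False):
    """Load crowd data from one file (or a folder of files) and compute the metrics."""
    # Imported here so that defining the helper does not require the package.
    import crowdtruth

    if is_directory:
        data, config = crowdtruth.load(directory=path, config=config)
    else:
        data, config = crowdtruth.load(file=path, config=config)

    results = crowdtruth.run(data, config)
    # Unpack the three result sets described above.
    return results["units"], results["workers"], results["annotations"]
```

With a configuration class such as TestConfig, a call would look like run_metrics("my_task.csv", TestConfig()), where my_task.csv is a placeholder for your own input file.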

Example tasks

Below you can find a collection of Jupyter Notebooks that show how to use the CrowdTruth package on different types of crowdsourcing tasks. See also the tutorial slide decks for further explanation of the task design and of how to run the CrowdTruth metrics in the Python notebooks:

Closed Tasks: the crowd picks from a set of annotations that is known beforehand

Open-Ended Tasks: the crowd dynamically creates the list of annotations, or the set of annotations is too big to compute beforehand
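The difference between closed and open-ended tasks can be illustrated with a small, self-contained sketch that counts how many workers picked each annotation for a single input unit. This is not the package's implementation; the function name unit_vector and its arguments are hypothetical and only mirror the annotation_separator, annotation_vector and none_token settings. Empty answers are counted under none_token here, which corresponds to keeping empty judgments rather than removing them.

```python
from collections import Counter


def unit_vector(judgments, separator=",", annotation_vector=None, none_token="NONE"):
    """Count how many workers picked each annotation for one input unit.

    judgments: list of raw answer strings, one per worker.
    annotation_vector: fixed list of choices for a closed task, or None for an
    open-ended task, where the choices are collected from the answers.
    """
    picked = []
    for answer in judgments:
        labels = {a.strip() for a in answer.split(separator) if a.strip()}
        # A worker who picked nothing contributes to the none_token component.
        picked.append(labels if labels else {none_token})

    if annotation_vector is None:
        # Open-ended task: the vocabulary is whatever the crowd produced.
        vocabulary = sorted(set().union(*picked))
    else:
        # Closed task: the vocabulary is fixed, plus the none_token component.
        vocabulary = list(annotation_vector)
        if none_token not in vocabulary:
            vocabulary.append(none_token)

    counts = Counter(label for labels in picked for label in labels)
    return {label: counts[label] for label in vocabulary}


closed = unit_vector(["cat,dog", "cat", ""], annotation_vector=["cat", "dog", "bird"])
open_ended = unit_vector(["red car", "car"], separator=" ")
```

In the closed case every known choice appears in the result, even with a count of zero; in the open-ended case only the annotations that workers actually produced appear.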

An example of a Jupyter Notebook that shows how to use the CrowdTruth package with a custom platform input file can be seen below:

Multiple choice tasks: the crowd picks multiple annotations out of a set list of choices that are the same for every input unit