Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Clone this wiki locally
Start using CrowdTruth right now, completely free, and explore all its possiblities. Follow the installation guide for Windows or Linux, learn how to get started using CrowdTruth or checkout our crowdsourcing templates. Please clone the repository and contribute to the open-source framework.
The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images, sounds and videos. The approach is focussed specifically on collecting gold standard data for training and evaluation of cognitive computing systems. The original framework was inspired by the IBM Watson project for providing improved (multi-perspective) gold standard (medical) text annotation data for the training and evaluation of various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.
The CrowdTruth framework supports the composition of CrowdTruth gathering workﬂows, where a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. CrowdFlower and Amazon Mechanical Turk) and applications (e.g. Expert annotation game Dr. Detective). The CrowdTruth framework has a special focus on micro-tasks for knowledge extraction in medical text (e.g. medical documents, from various sources such as Wikipedia articles or patient case reports). The main steps involved in the CrowdTruth workﬂow are:
- Exploring & processing of input data
- Collecting of annotation data
- Applying disagreement analytics on the results
These steps are realised in an automatic end-to-end workﬂow, that can support a continuous collection of high quality gold standard data with feedback loop to all steps of the process. Have a look at our presentations and papers for more details on the research.