Unshared Task for the 3rd Workshop on Argument Mining, ACL 2016, Berlin

Contact person: Ivan Habernal,


We provide four variants of the task across various registers. Each variant is split into three parts:

  • Development set: We encourage you to perform exploratory analysis, task definition, annotation experiments, etc. on this set.
  • Test set: This small set can serve as a benchmark for testing your annotation model (or even a computational model, if you go that far) and for reporting agreement measures (if applicable).
  • Crowdsourcing set: A somewhat larger set, in case you plan any crowdsourcing experiments.

Note that these splits are only a recommendation and are not obligatory. You are free to use the entire dataset if your task requires it.


Variant A: Debate portals

  • Samples from a two-sided debate portal
    • Posts are identified by #id and are delimited by two empty lines
    • There are four types of posts: normal post, dispute of previous post, support of previous post, and clarification of previous post
    • An empty line is a paragraph break
  • 8 devel files
  • 2 test files
  • 18 crowdsourcing files
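
The post layout described above (posts delimited by two empty lines, paragraphs by one) could be read with something like the following minimal sketch. It assumes that "two empty lines" means three or more consecutive newline characters and that empty lines contain no stray whitespace. The sample string, including the way the #id line is written, is hypothetical and not taken from the data.

```python
import re


def split_posts(text):
    """Split the text of a Variant A file into posts, each a list of paragraphs."""
    # Normalize Windows line endings so that empty lines are counted reliably.
    text = text.replace("\r\n", "\n").strip("\n")
    # Two empty lines between posts show up as three or more newlines in a row.
    raw_posts = re.split(r"\n{3,}", text)
    posts = []
    for raw in raw_posts:
        # Inside a post, a single empty line separates paragraphs.
        paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
        if paragraphs:
            posts.append(paragraphs)
    return posts


# Hypothetical sample; the exact form of the "#id" line is an assumption.
sample = "#1\nFirst post, first paragraph.\n\nSecond paragraph.\n\n\n#2\nA reply."
for post in split_posts(sample):
    print(len(post), "paragraph(s):", post)
```

The sketch does not interpret the post type (normal, dispute, support, clarification), since the way it is encoded in the files is not described here.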

Variant B: Debate transcript

  • Two speeches (opening and closing) for both the proponent and the opponent from Intelligence Squared debates
    • The full debates have more participants, but complete transcripts would be too long for the purposes of the unshared task
  • 3 devel files
  • 2 test files
  • 5 crowdsourcing files

Variant C: Opinionated newswire article

  • Editorial articles from Room for Debate from the N.Y. Times
    • Each article has a debate title and debate description
  • 8 devel files
  • 2 test files
  • 12 crowdsourcing files

Variant D: Discussion under opinionated articles

  • Discussions from Room for Debate
    • The IDs of the debates correspond to IDs of the articles from Variant C (for instance, Dd001.txt has a corresponding article Cd001.txt)
    • Each discussion starts with a debate title, debate description, and title of the corresponding article
  • 8 devel files
  • 2 test files
  • 12 crowdsourcing files
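
The ID correspondence between Variant D and Variant C can be expressed as a small helper. This is only a sketch: the function name is hypothetical, and it assumes that the correspondence holds for every subset in the same way as in the Dd001.txt / Cd001.txt example.

```python
def corresponding_article(discussion_name):
    """Return the Variant C file name matching a Variant D file name."""
    if not discussion_name.startswith("D"):
        raise ValueError(f"not a Variant D file name: {discussion_name!r}")
    # Only the leading variant letter changes; subset letter and number stay the same.
    return "C" + discussion_name[1:]


print(corresponding_article("Dd001.txt"))
```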


Data are stored in plain text format (UTF-8 encoding). The name of each file consists of the variant letter, a letter for the subset, and the number within that subset. For example, Bd002.txt is file number 2 from the development set (d) of Variant B.
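
As an illustration of the naming scheme, the following sketch splits a file name into its three parts. It assumes that the subset is always a single lowercase letter. Only "d" (development) is documented here, so the letters used for the test and crowdsourcing sets are deliberately not guessed. The names DataFileName and parse_name are hypothetical.

```python
import re
from typing import NamedTuple


class DataFileName(NamedTuple):
    variant: str  # "A", "B", "C" or "D"
    subset: str   # single lowercase letter, e.g. "d" for the development set
    number: int   # running number within the subset


# Assumed pattern: variant letter, subset letter, digits, ".txt".
_NAME_PATTERN = re.compile(r"^([A-D])([a-z])(\d+)\.txt$")


def parse_name(file_name):
    """Split a file name such as 'Bd002.txt' into variant, subset, and number."""
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return DataFileName(match.group(1), match.group(2), int(match.group(3)))


print(parse_name("Bd002.txt"))
```

Files are best opened with an explicit encoding, for example open(path, encoding="utf-8"), because the platform default may differ.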


For licensing information, see LICENSE.txt or README.txt in the respective folders.


data/links.txt contains the full list of URLs and their corresponding files. We used selenium-firefox-driver and JSoup to scrape the content of Room for Debate. See for details. Each file was also reformatted to lines of at most 80 characters; see src/bash/. Variants C and D were created semi-automatically, Variants A and B manually.
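
For readers who want to apply a comparable 80-character line limit to their own text, a rough equivalent using Python's standard textwrap module is sketched below. This is not the script from src/bash/, and the function name wrap_text is hypothetical. Runs of empty lines are kept unchanged so that paragraph breaks and post delimiters survive.

```python
import re
import textwrap


def wrap_text(text, width=80):
    """Re-wrap each paragraph to at most `width` characters per line."""
    # The capturing group keeps the runs of empty lines in the result,
    # so odd-numbered parts are separators and even-numbered parts are text.
    parts = re.split(r"(\n{2,})", text)
    wrapped = [
        part if index % 2 else textwrap.fill(part, width=width)
        for index, part in enumerate(parts)
    ]
    return "".join(wrapped)


long_text = "word " * 30 + "\n\n\n" + "short paragraph"
for line in wrap_text(long_text).split("\n"):
    print(len(line), line)
```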


Supplementary data for the Unshared Task at the 3rd Argument Mining workshop, ACL 2016





