UKPLab/argmin2016-unshared-task

Unshared Task for the 3rd Workshop on Argument Mining, ACL 2016, Berlin

Contact person: Ivan Habernal, habernal@ukp.informatik.tu-darmstadt.de

http://www.ukp.tu-darmstadt.de/

http://www.tu-darmstadt.de/

Content

We provide four variants of the task across various registers. Each variant is split into three parts:

  • Development set: We encourage you to perform the exploratory analysis, task definition, annotation experiments, etc. on this set.
  • Test set: This small set can serve as a benchmark for testing your annotation model (or even a computational model, if you go that far) and for reporting agreement measures (if applicable).
  • Crowdsourcing set: A somewhat larger set, intended for any crowdsourcing experiments you plan.

Note that these splits are only a recommendation and are not obligatory; you are free to use the entire dataset if your task requires it.

Variants

Variant A: Debate portals

  • Samples from two-sided debate portal createdebate.com
    • Posts are identified by #id and are delimited by two empty lines
    • There are four types of posts: normal post, dispute of previous post, support of previous post, and clarification of previous post
    • An empty line marks a paragraph break
  • 8 devel files
  • 2 test files
  • 18 crowdsourcing files
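
The delimiter rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the repository: the function name `split_posts` and the sample header lines are hypothetical, and the exact layout of the `#id` line is not specified here, so the sketch leaves it inside the first paragraph of each post instead of parsing it.

```python
import re


def split_posts(text: str) -> list[list[str]]:
    """Split the text of a Variant A file into posts, each a list of paragraphs.

    Assumptions (from the README description only):
      - posts are delimited by two empty lines (three consecutive newlines),
      - paragraphs inside a post are separated by one empty line.
    """
    # Normalize Windows line endings and drop leading/trailing blank lines.
    text = text.replace("\r\n", "\n").strip("\n")
    posts = []
    for raw_post in re.split(r"\n{3,}", text):
        # Within a post, a single empty line separates paragraphs.
        paragraphs = [p.strip() for p in re.split(r"\n{2}", raw_post) if p.strip()]
        posts.append(paragraphs)
    return posts


if __name__ == "__main__":
    # Hypothetical sample; the real header format of a post may differ.
    sample = (
        "#1 some post header\nFirst paragraph.\n\nSecond paragraph."
        "\n\n\n"
        "#2 another post header\nReply text."
    )
    for post in split_posts(sample):
        print(post)
```

Because the files are wrapped to 80-character lines, each paragraph may still contain single newlines; join them with a space if you need unwrapped text.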

Variant B: Debate transcript

  • Two speeches (opening and closing) for both the proponent and the opponent from Intelligence Squared debates
    • The full debate has more participants, but the complete transcript would be too long for the purposes of the unshared task
  • 3 devel files
  • 2 test files
  • 5 crowdsourcing files

Variant C: Opinionated newswire article

  • Editorial articles from Room for Debate (The New York Times)
    • Each article has a debate title and debate description
  • 8 devel files
  • 2 test files
  • 12 crowdsourcing files

Variant D: Discussion under opinionated articles

  • Discussions from Room for Debate
    • The IDs of the debates correspond to IDs of the articles from Variant C (for instance, Dd001.txt has a corresponding article Cd001.txt)
    • Each discussion starts with a debate title, debate description, and title of the corresponding article
  • 8 devel files
  • 2 test files
  • 12 crowdsourcing files
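
The ID correspondence between Variant D and Variant C can be sketched as a one-line name mapping. The helper name `corresponding_article` is hypothetical, and the sketch assumes that only the leading variant letter differs between the two file names, as in the Dd001.txt / Cd001.txt example above.

```python
def corresponding_article(discussion_file: str) -> str:
    """Map a Variant D discussion file name to its Variant C article file name.

    Assumes the two names differ only in the leading variant letter,
    as in the README example (Dd001.txt -> Cd001.txt).
    """
    if not discussion_file.startswith("D"):
        raise ValueError(f"not a Variant D file name: {discussion_file!r}")
    return "C" + discussion_file[1:]


if __name__ == "__main__":
    print(corresponding_article("Dd001.txt"))
```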

Data

Data are stored in plain text format (UTF-8 encoding). The name of each file consists of the variant letter, the subset letter, and the number within the subset. For example, Bd002.txt is file number 2 of variant B in the development (d) subset.
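
The naming scheme can be decoded with a small regular expression. The following is a sketch only: the names `DataFileName` and `parse_file_name` are hypothetical, and because only the `d` (development) subset letter is documented here, the subset letter is returned as-is without being mapped to a subset name.

```python
import re
from typing import NamedTuple


class DataFileName(NamedTuple):
    variant: str  # "A", "B", "C", or "D"
    subset: str   # subset letter, e.g. "d" for the development set
    number: int   # number of the file within the subset


# Variant letter, one lowercase subset letter, digits, then ".txt".
_FILE_NAME = re.compile(r"([A-D])([a-z])(\d+)\.txt")


def parse_file_name(name: str) -> DataFileName:
    """Decode a data file name such as 'Bd002.txt' into its three parts."""
    match = _FILE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    variant, subset, number = match.groups()
    return DataFileName(variant, subset, int(number))


if __name__ == "__main__":
    print(parse_file_name("Bd002.txt"))
```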

License

See LICENSE.txt or README.txt in the respective folders.

Reproducibility

data/links.txt contains the full list of URLs and their corresponding files. We used selenium-firefox-driver and JSoup for scraping the content of Room for Debate; see de.tudarmstadt.ukp.argumentation.data.roomfordebate.DataFetcher for details. Each file was also wrapped so that lines are at most 80 characters long; see src/bash/runFoldOnAllFiles.sh. Variants C and D were created semi-automatically; Variants A and B were created manually.
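
If you re-create or modify files, a quick check that the 80-character limit still holds may be useful. The sketch below is not the repository's wrapping script; the function name `overlong_lines` is hypothetical, and it counts Unicode characters, which may differ from how the original tooling measured line width.

```python
def overlong_lines(text: str, limit: int = 80) -> list[tuple[int, int]]:
    """Return (line number, length) for every line longer than `limit`.

    Line numbers are 1-based; length is counted in characters.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    sample = "a" * 80 + "\n" + "b" * 81
    # The first line is exactly at the limit; only the second is reported.
    print(overlong_lines(sample))
```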
