The MIT License

This directory contains the data of the Potsdam Twitter Sentiment Corpus (PotTS, ISLRN 714-621-985-491-3). To open the files of this corpus, you need to download and launch MMAX2 (a freely distributed annotation tool) and then select one of the *.mmax projects from the directories corpus/annotator-1/ or corpus/annotator-2/.
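As a convenience before launching MMAX2, the project files of one annotator can be listed with a few lines of Python. This is only a minimal sketch: the helper name find_mmax_projects is hypothetical, and the directory names are taken from the paragraph above, so adjust them if your checkout is laid out differently.

```python
from pathlib import Path


def find_mmax_projects(corpus_dir, annotator="annotator-2"):
    """Return the sorted *.mmax project files found in one annotator directory.

    `corpus_dir` is the path to the corpus/ directory; `annotator` is the name
    of the annotator subdirectory (an assumption based on the paths above).
    """
    annotator_dir = Path(corpus_dir) / annotator
    # Only files directly inside the annotator directory are considered.
    return sorted(annotator_dir.glob("*.mmax"))
```

Calling find_mmax_projects("corpus") from the root of the repository would return the paths that can then be opened in MMAX2.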

This folder comprises the following files and directories:

  • corpus/ -- directory containing corpus files;

    • annotator-1/ -- directory containing MMAX projects for the first annotator;
      • markables/ -- directory containing annotation files for the first annotator;
    • annotator-2/ -- directory containing MMAX projects for the second annotator;
      • markables/ -- directory containing annotation files for the second annotator;
    • basedata/ and source/ -- original corpus tokenization;
    • custom/, scheme/, and style/ -- auxiliary MMAX2 data;
  • docs/ -- directory containing annotation guidelines and other accompanying documents;

  • scripts/ -- directory containing scripts that were used to process corpus data;

    • examples/ -- directory containing examples of input files for the scripts;
    • -- auxiliary module used for annotation alignment;
    • -- auxiliary module for AWK-like input/output operations;
    • -- auxiliary module for handling CONLL sentences;
    • -- script for measuring corpus agreement;
    • -- script for aligning annotation from the corpus with the automatically processed CONLL data;
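This README does not state which agreement metric the agreement script computes. Purely as an illustration of how agreement between two annotators can be quantified, the sketch below computes Cohen's kappa over two parallel label sequences, one label per token. The function name and the labels are hypothetical and are not taken from the corpus scripts.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Compute Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of positions where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels of two annotators for a four-token tweet.
first = ["sentiment", "sentiment", "none", "none"]
second = ["sentiment", "none", "none", "none"]
kappa = cohen_kappa(first, second)
```

For these two sequences the observed agreement is 0.75 and the chance agreement is 0.5, so kappa comes out at 0.5.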

Examples of invocations are given in the script files; you can also run any script with the --help option to see its usage.


We strongly recommend using the annotation of annotator-2 on the branch eexpression-revision (run git checkout eexpression-revision after cloning this project).