WladimirSidorenko/PotTS

Description

The MIT License

This directory contains the data of the Potsdam Twitter Sentiment Corpus (ISLRN 714-621-985-491-3). To open the files of this corpus, you need to download and launch MMAX2—a freely distributed annotation tool—and then select one of the *.mmax projects from the directories corpus/annotator-1/ or corpus/annotator-2/.
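Before launching MMAX2, it can help to see which project files are available. The following is a minimal sketch, not part of the repository: the helper name `list_mmax_projects` is hypothetical, and it assumes that the `*.mmax` files sit directly inside the annotator directory named above.

```python
from pathlib import Path


def list_mmax_projects(corpus_dir, annotator="annotator-2"):
    """Return the sorted *.mmax project files found for one annotator.

    Hypothetical helper: assumes the project files are stored directly
    in <corpus_dir>/<annotator>/, not in its subdirectories.
    """
    annotator_dir = Path(corpus_dir) / annotator
    if not annotator_dir.is_dir():
        raise FileNotFoundError(f"no such annotator directory: {annotator_dir}")
    # glob() is non-recursive here, so files under markables/ are skipped.
    return sorted(annotator_dir.glob("*.mmax"))
```

Called from the repository root as `list_mmax_projects("corpus")`, it returns the paths that can then be selected in MMAX2.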

Folder Structure

The folders of this project are structured as follows:

  • corpus/ – directory containing corpus files;

    • annotator-1/ – directory containing MMAX projects for the first annotator;
      • markables/ – directory containing annotation files for the first annotator;
    • annotator-2/ – directory containing MMAX projects for the second annotator;
      • markables/ – directory containing annotation files for the second annotator;
    • basedata/ and source/ – original corpus tokenization;
    • custom/, scheme/, and style/ – auxiliary MMAX2 data;
  • docs/ – directory containing annotation guidelines and other accompanying documents;

  • scripts/ – directory containing scripts that were used to process corpus data;

    • examples/ – directory containing examples of input files for the scripts;
    • align.py – auxiliary module used for annotation alignment;
    • alt_fio.py – auxiliary module for AWK-like input/output operations;
    • conll.py – auxiliary module for handling CoNLL sentences;
    • measure_corpus_agreement.py – script for measuring corpus agreement;
    • merge_conll_mmax.py – script for aligning annotation from the corpus with the automatically processed CoNLL data.

Invocation examples are given in the script files themselves; you can also run any script with the --help option to see its usage.
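To collect the usage text of a script programmatically, a sketch along the following lines may be enough. The helper name `show_usage` is hypothetical, and the code assumes that the script runs under the same Python interpreter as the caller, which this README does not state.

```python
import subprocess
import sys


def show_usage(script_path):
    """Run a script with --help and return the text it prints.

    Assumption: the script can be executed by the current interpreter.
    """
    result = subprocess.run(
        [sys.executable, script_path, "--help"],
        capture_output=True,
        text=True,
        check=False,  # do not raise if the script exits with an error code
    )
    # Help text normally goes to stdout; fall back to stderr otherwise.
    return result.stdout or result.stderr
```

For example, `print(show_usage("scripts/measure_corpus_agreement.py"))` would print that script's help message when run from the repository root.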

Note

I strongly recommend using the annotation of annotator-2 on the branch eexpression-revision (run git checkout eexpression-revision after cloning this project).