This repository has been archived by the owner on Oct 26, 2020. It is now read-only.

conj_head: Head Identification in Coordinating Conjunctions

Contents

  1. Documentation
  2. Included Files
  3. Using This Module
  4. RunTime
  5. Results
  6. References

Documentation

The problem statement, the approaches considered, and further discussion are in Chapter 5 of the thesis document.

Included Files

  1. Annotations directory

    Contains the manually annotated data for the experiment. The results presented in the documentation are based on these annotations. More about the annotations can be read in this document.

  2. Extra Data (Not Annotated) directory

    Contains some results that are not annotated but are useful for evaluating the algorithm. End users do not need these files unless they want to extend the algorithm. More details about the files in this category are in the documentation for the Annotations folder, here and here.

  3. makefile

    File that can be used to run the module on the local system. Check here for details.

  4. README.md

    The current file

  5. requirements.txt

    Lists the packages needed to run this project. To install them, clone the repository to your system and run the following in a terminal:

    pip install -r requirements.txt

    You might want to use pip3 instead of pip, depending on your system.

  6. scripts directory

    Contains the main script that runs the module. The script must be copied into the correct udapy folder location (see makefile for details) before it can be used.

  7. RunTimes directory

    Contains the timing files for running the block on the Afrikaans and Arabic data. Each file holds 2 lines per run, for a total of 100 runs. Format:

    Time when block starts processing
    Time when block stops processing and next block takes over
    

Using This Module

To get started, clone this repository to your system, then run the following commands in the given order:

make getdata

Downloads the required dependencies listed in requirements.txt and the UDv2.4 data (using the link here), then prepares working copies of the treebanks in the current directory.

make stats

Reports all instances of misdirected dependencies involving the CCONJ UPOS tag and the cc deprel, across all treebanks in UDv2.4, in *.stats files. Requires the UDv2.4 data in the HOME folder, but downloads and unzips the package if it is not found.
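As an illustration of the kind of check this step reports, here is a minimal sketch in plain Python. It is not the script shipped in this repository. It assumes that "misdirected" means a token tagged CCONJ and attached as cc whose head precedes it, since in UD v2 a cc normally attaches to the conjunct that follows it. The names `iter_sentences`, `find_left_headed_cc`, and `SAMPLE` are hypothetical.

```python
from typing import Iterator, List, Tuple


def iter_sentences(conllu_text: str) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of 10-column CoNLL-U token rows."""
    rows: List[List[str]] = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if rows:
                yield rows
                rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                rows.append(cols)
    if rows:
        yield rows


def find_left_headed_cc(conllu_text: str) -> List[Tuple[int, int, str]]:
    """Return (sentence index, token id, form) for CCONJ/cc tokens whose head is to the left.

    Assumption: this is only one possible reading of "misdirected".
    """
    hits = []
    for sent_idx, rows in enumerate(iter_sentences(conllu_text), start=1):
        for cols in rows:
            tok_id, form, upos = int(cols[0]), cols[1], cols[3]
            head, deprel = cols[6], cols[7]
            if not head.isdigit():
                continue
            # Treat subtypes such as cc:preconj as cc as well.
            is_cc = deprel.split(":")[0] == "cc"
            if upos == "CCONJ" and is_cc and 0 < int(head) < tok_id:
                hits.append((sent_idx, tok_id, form))
    return hits


# Two toy sentences: in the first, "and" attaches leftwards to "cats";
# in the second, it attaches rightwards to "dogs".
SAMPLE = "\n".join([
    "# text = cats and dogs",
    "1\tcats\tcat\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tand\tand\tCCONJ\t_\t_\t1\tcc\t_\t_",
    "3\tdogs\tdog\tNOUN\t_\t_\t1\tconj\t_\t_",
    "",
    "# text = cats and dogs",
    "1\tcats\tcat\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tand\tand\tCCONJ\t_\t_\t3\tcc\t_\t_",
    "3\tdogs\tdog\tNOUN\t_\t_\t1\tconj\t_\t_",
    "",
])

if __name__ == "__main__":
    for sent_idx, tok_id, form in find_left_headed_cc(SAMPLE):
        print(f"sentence {sent_idx}, token {tok_id}: {form}")
```

Only the first toy sentence is flagged. The criteria actually used by the makefile target may differ, so check the thesis document and the script for the authoritative definition.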

make correction

Copies the script into the correct location and runs it. Corrects the instances detected in the *.direction file, creates a *2.conllu file with all the corrections, and a *2.direction file with the instances not handled by the algorithm.

make clean

Removes all .conllu files and the files generated by this makefile.

RunTime

Run times are based on 100 runs on Ubuntu 18.04 (64-bit) with a 4-core Intel i5-6300HQ processor.

| Language | RunTime (in ms) |
| --- | --- |
| af | 81.33 ± 7.094 |
| ar | 317.05 ± 23.996 |
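A summary of this form could be computed from a file in the RunTimes directory, which stores a start line and a stop line for each run. The sketch below makes two assumptions that this README does not confirm: each line holds a timestamp in seconds as a plain decimal number, and the ± value is the sample standard deviation. The function names `run_durations_ms` and `summarize` are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, List, Tuple


def run_durations_ms(lines: Iterable[str]) -> List[float]:
    """Pair consecutive (start, stop) timestamps and return durations in milliseconds.

    Assumption: each non-empty line is a timestamp in seconds, e.g. "10.080".
    """
    stamps = [float(line) for line in lines if line.strip()]
    if len(stamps) % 2 != 0:
        raise ValueError("expected an even number of timestamps (start/stop pairs)")
    starts, stops = stamps[0::2], stamps[1::2]
    return [(stop - start) * 1000.0 for start, stop in zip(starts, stops)]


def summarize(durations: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the durations."""
    return mean(durations), stdev(durations)


if __name__ == "__main__":
    # Three made-up runs; a real file would hold 100 runs (200 lines).
    toy_lines = ["10.000", "10.080", "20.000", "20.090", "30.000", "30.070"]
    avg, spread = summarize(run_durations_ms(toy_lines))
    print(f"{avg:.2f} ± {spread:.3f} ms")
```

If the files store timestamps in another format, only the `float(line)` conversion needs to change.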

Results

For a thorough analysis of the results of the experiment, see the documentation, which investigates them in detail with examples.

References

  1. Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. Dangerous relations in dependency treebanks. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, pages 201–210, Prague, Czech Republic, 2017. URL https://www.aclweb.org/anthology/W17-7624.

  2. Martin Popel, Zdeněk Žabokrtský, and Martin Vojtek. Udapi: Universal API for Universal Dependencies. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 96–101, Gothenburg, Sweden, May 2017. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/W17-0412.

  3. Leon Stassen. And-languages and WITH-languages. Linguistic Typology, 4(1):1–54, 2000. doi: https://doi.org/10.1515/lity.2000.4.1.1. URL https://www.degruyter.com/view/journals/lity/4/1/article-p1.xml.

  4. Viacheslav Chirikba. Evidential category and evidential strategy in Abkhaz. Typological Studies in Language, 54:243–272, 2003. URL https://benjamins.com/catalog/tsl.54.14chi.

  5. Winfried Boeder. The South Caucasian languages. Lingua, 115(1):5–89, 2005. ISSN 0024-3841. doi: https://doi.org/10.1016/j.lingua.2003.06.002. URL http://www.sciencedirect.com/science/article/pii/S0024384103001244.

  6. Nivre, Joakim; Abrams, Mitchell; Agić, Željko; et al., 2019, Universal Dependencies 2.4, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-2988.