Code for the paper "Cliche expressions in literary and genre novels", presented at the LaTeCH-CLfL 2018 workshop.
This repository is intended for documentation purposes, as the relevant data cannot be made publicly available. The analysis relies on the following data:
- The Riddle of Literary Quality corpus of 401 novels and survey data; http://literaryquality.huygens.knaw.nl/
- Cliche expressions that formed the basis for the book with ISBN 978-94-004-0511-0; https://www.dathoorjemijnietzeggen.nl/
- Lassy small, Lassy large; http://www.let.rug.nl/~vannoord/Lassy/
- Corpus Gesproken Nederlands (Spoken Dutch corpus); http://lands.let.ru.nl/cgn/
See requirements.txt. Install with: pip3 install -r requirements.txt
cliche_queries.txt
: The original file with cliches, converted to regular expressions (using regular expressions...) with the sed script conv.sed.
runqueries
: Counts the cliches in the novels by running each query on all the novels.
postprocess
: Produces HTML files, plots, and CSV files.
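As a rough illustration of the counting step and the CSV output, the sketch below assumes that each query is a Python-compatible regular expression matched against lowercased novel text. The queries produced by conv.sed may use sed syntax that differs from Python's re module, and the query strings, the function names count_queries and to_csv, and the CSV layout are all hypothetical rather than taken from this repository.

```python
import csv
import io
import re

# Hypothetical example queries (not taken from cliche_queries.txt);
# assumed to be Python regular expressions applied to lowercased text.
QUERIES = [
    r"\been speld in een hooiberg\b",
    r"\bde druppel die de emmer deed overlopen\b",
]


def count_queries(queries, novels):
    """Count non-overlapping matches of each query in each novel.

    novels maps a novel identifier to its (lowercased) text.
    Returns a dict: novel id -> list of counts, one per query."""
    compiled = [re.compile(q) for q in queries]
    return {name: [len(pattern.findall(text)) for pattern in compiled]
            for name, text in novels.items()}


def to_csv(queries, counts):
    """Render the count table as CSV: one row per novel, one column per query."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["novel"] + queries)
    for name, row in sorted(counts.items()):
        writer.writerow([name] + row)
    return out.getvalue()


if __name__ == "__main__":
    novels = {
        "novel_a": "het was zoeken naar een speld in een hooiberg. "
                   "echt een speld in een hooiberg!",
        "novel_b": "dat was de druppel die de emmer deed overlopen.",
    }
    print(to_csv(QUERIES, count_queries(QUERIES, novels)))
```

Compiling each pattern once and reusing it across all novels keeps the per-novel cost to a single scan per query; for a large query list, a real implementation might instead combine queries or use a faster matching engine.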
The lassyextract.sh and lassyngrams.sh scripts were used to extract a table of n-gram counts from the SONAR part of Lassy Large using Colibri-Core. Run them from the Lassy Large Data directory.
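To illustrate what such an n-gram count table contains, here is a minimal pure-Python sketch using collections.Counter. It is a stand-in, not Colibri-Core's API: the function names ngram_counts and ngram_table and the max_n and min_count parameters are hypothetical. An in-memory dictionary of this kind would likely not scale to a corpus the size of Lassy Large, which is the sort of workload Colibri-Core's compact pattern encoding is meant for.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams of length n over tokenized sentences.

    n-grams never cross sentence boundaries; sentences shorter
    than n contribute nothing."""
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def ngram_table(sentences, max_n, min_count=1):
    """Map each n-gram of length 1..max_n (as a space-separated string)
    to its count, keeping only n-grams seen at least min_count times."""
    table = {}
    for n in range(1, max_n + 1):
        for gram, count in ngram_counts(sentences, n).items():
            if count >= min_count:
                table[" ".join(gram)] = count
    return table


if __name__ == "__main__":
    sentences = [["de", "druppel", "die", "de", "emmer"], ["de", "druppel"]]
    print(ngram_table(sentences, max_n=2, min_count=2))
```

With min_count=2, the example keeps only "de" (3 occurrences), "druppel" (2) and "de druppel" (2); a frequency threshold like this is a common way to keep such tables manageable.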
The rest of the analysis is done in the notebook.
@InProceedings{vancranenburgh2018cliche,
author={van Cranenburgh, Andreas},
title={Cliche Expressions in Literary and Genre Novels},
year={2018},
booktitle={Proceedings of LaTeCH-CLfL workshop},
pages={34--43},
url={http://www.aclweb.org/anthology/W18-4504}
}