What determines canonicity? Is this purely subjective, or can we partly attribute it to textual features?
- Collect a corpus of Dutch novels 1800-2000
- Investigate canonicity with distant reading
- Release an open access dataset of textual features and metadata
- Create an online demo
- World domination
This Researcher-in-Residence project ran from April to October 2021. The results are described in two blog posts:
- https://lab.kb.nl/about-us/blog/dataset-dutch-novels-1800-2000
- https://lab.kb.nl/about-us/blog/machine-learning-canonicity-dutch-novels-1800-2000
The code in this repository is mostly intended for documentation purposes. Re-building the dataset and fully reproducing all results requires data from a variety of sources; please get in touch if you are interested in this.
- @andreasvc: https://andreasvc.github.io/
- @Veldhoen: https://lab.kb.nl/person/sara-veldhoen