This repository provides the compiled full-text corpus of the Allgemeine Literatur-Zeitung (General Literature Gazette) published from 1785 to 1849.
Its current version (V2, May 2019) contains 26,612 pages of full text from 261 volumes, equivalent to 120,369,005 tokens. It includes the review volumes (the main part of the ALZ) as well as supplementary and intelligence notes. The folder v2_201905 contains an overview table and the whole corpus in XML and TXT format.
The files v2_201905/romantik.tsv and v2_201905/musik.tsv contain search results for keywords relevant to romanticism and music. For more details, see the scripts v2_201905/find_romantik.py and v2_201905/find_musik.py.
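As a starting point for inspecting these result files, the following is a minimal sketch that reads a tab-separated file row by row. The helper name `read_tsv_rows` is hypothetical and not part of the repository. The sketch assumes UTF-8 encoding and makes no assumption about column names or a header row, since the column layout is not described here; check the scripts mentioned above for the actual structure.

```python
import csv
from pathlib import Path


def read_tsv_rows(path, limit=None):
    """Return the rows of a tab-separated file as lists of strings.

    Hypothetical helper: the encoding (UTF-8) is an assumption, and the
    columns are returned as-is because their meaning is not documented here.
    """
    rows = []
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks in the text as ordinary characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, row in enumerate(reader):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    tsv_path = Path("v2_201905") / "romantik.tsv"
    if tsv_path.exists():
        # Print the first few rows to see how the file is laid out.
        for row in read_tsv_rows(tsv_path, limit=5):
            print(row)
    else:
        print(f"{tsv_path} not found; run this from the repository root.")
```

Printing only a handful of rows first is a cheap way to find out whether the file has a header line before loading it into a larger analysis.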
This work was conducted within the Graduate School “Romanticism as a Model” (http://modellromantik.uni-jena.de) supported by the German Research Foundation (DFG) under Grant No.: GRK 2041/1.
This work is licensed under CC-BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/
Please cite the following paper if you use our corpus:
Udo Hahn, Tinghui Duan. 2019. Corpus Assembly as Text Data Integration from Digital Libraries and the Web. In JCDL ’19: Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries, June 02–06, 2019, Urbana-Champaign, IL, USA. [Paper]