From Manuscript to Text Analytics


Tuesday 2019-07-09, 09:00 - 16:00

DH2019 Conference, Utrecht, Netherlands

Building: Tivoli Vredenburg

Address: Vredenburgkade 11, 3511 WC Utrecht, Netherlands

Room: Pandora Foyer


In one day, we take participants through the entire workflow, from holding real manuscripts in their hands to performing complex database computations on the texts those manuscripts contain.


As this workshop is packed with insights and practices, what you get out of it depends directly on your preparation.

Click on the program items to see what kind of preparation is recommended.


Last minute note

Mladen will not be here due to sudden logistical problems.

Our apologies!

time           title                                                                tutor
09:00 - 09:30  Intro, aim, agenda                                                   all
Stage 1: From physical manuscript to digital manuscript
09:30 - 10:00  Explanation on the variety of digitization technologies              Cornelis van Lit
10:00 - 10:30  Practicum on evaluating digitized manuscripts                        Cornelis van Lit
10:30 - 11:00  Break
Stage 2: From digital manuscript to text extraction
11:00 - 11:30  Explanation on pattern recognition and deep learning                 Mladen Popović
               Practicum on pattern recognition                                     Maruf Dhali
12:00 - 13:15  Lunch, graciously offered by Cornelis in Café Luden (10 min walk, follow us)
Stage 3: From text extraction to database
13:15 - 13:30  Explanation on preparing texts in a uniform manner                   Wido van Peursen
13:30 - 14:15  Practicum: counting in the Dead Sea Scrolls and tagging in Akkadian  Dirk Roorda
Stage 4: From database to text analysis
14:15 - 14:45  Text analysis for texts from manuscripts (part 1)                    Pierre van Hecke
14:45 - 15:15  Break
15:15 - 15:45  Text analysis for texts from manuscripts (part 2)                    Mathias Coeckelbergs
15:45 - 16:00  Conclusion, follow-ups


Introduction of all tutors of this workshop.


You can join the Ancient-Data Slack workspace in advance. Sign up by clicking the link.

In this workspace, there are two channels:

  • DH2019 where you can discuss the workshop and leave feedback;
  • DH2019-who-is-who where you can introduce yourself to the other participants and to the workshop organizers.

We recommend joining this workspace to get the most out of the workshop, receive last-minute announcements, and stay in contact for follow-up.

If you are not registered as a participant, you can ask Dirk for an invite.

Scholarly communication

Consider submitting a data paper in the Brill-DANS Research Data Journal.

There are several intensely studied corpora in the humanities. For such corpora it pays off to curate and prepare them for repeated processing. Although this is not research proper, it lays the foundations for game-changing research, and as such corpus construction merits inclusion in the academic record. A data paper in this peer-reviewed journal achieves exactly that. The journal is Open Access, so have a look at some examples.

