
Abstract

The current display system used at CDLI requires that a user read a text while absorbing visual and textual information simultaneously, and interpret the mapping between them, since the image and the transliteration are shown side by side (example: https://cdli.ucla.edu/P315663). Experts in cuneiform studies can usually discern this mapping only within their areas of expertise; non-experts and informal learners have no direct means of relating image and annotation content.

Previous work on extracting and classifying cuneiform characters has relied on scanned 3D cuneiform datasets (Nils M. Kriege et al. [1], D. Fisseler et al. [2], Hubert Mara et al. [3]). Extracting characters from these 3D models required specialised software, and in some cases, after individual wedges were extracted, time-consuming manual post-processing was needed to assign the wedges to cuneiform signs/characters. The main limitation is that these techniques work only on 3D datasets, and producing such datasets requires specialised scanning equipment. Since CDLI has a large collection of 2D cuneiform images (100,000+), a system that performs OCR directly on them would be beneficial.

I formulated my project as an object (line/sign) detection and classification task, in which I detect individual cuneiform characters and assign each a class. I do not use any language model for the cuneiform script, i.e., relations between characters, Zipf's law, or character histograms, to verify or improve the detections.

  • For example, if the model had learned these properties, it would recognise that the word sequence "the the" does not appear in English text and would lower the score/confidence of such a detection.
  • Another example is spelling mistakes: lugal (lu-gal) corresponds to "man"-"big" (king), but if the detection system incorrectly reads it as lu-kisal (kisal and gal look similar), a language model would recognise that lukisal does not exist in the cuneiform dictionary and would either correct it to lugal or lower the confidence/score of the detection during training. A minimal sketch of this idea follows the list.
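
The following is a minimal sketch of the rescoring idea described above (it is not part of my pipeline); the toy dictionary, sign names and penalty factor are purely illustrative:

```python
# Hypothetical sketch of dictionary-based rescoring (not used in this project):
# penalise detections whose sign sequence is not attested in a word list.
KNOWN_WORDS = {"lu-gal", "e-gal"}          # toy dictionary of attested readings

def rescore(detections, penalty=0.5):
    """detections: list of (sign_name, confidence) ordered as read on the line."""
    word = "-".join(sign for sign, _ in detections)
    if word not in KNOWN_WORDS:
        # sequence unattested: lower every detection's confidence
        return [(sign, conf * penalty) for sign, conf in detections]
    return detections

print(rescore([("lu", 0.9), ("kisal", 0.8)]))   # penalised: "lu-kisal" not attested
print(rescore([("lu", 0.9), ("gal", 0.8)]))     # kept: "lu-gal" is attested
```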

At first, ad-hoc/traditional image processing techniques were used for line and sign detection. They work relatively well for line detection on clean, properly scribed cuneiform tablets, but not for sign detection (ridge detection and connected components): it is very hard to make image processing techniques understand which wedges belong to which cuneiform signs.
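
As an illustration, line detection on a clean tablet photograph can be approximated with a horizontal projection profile. This is only a sketch of the traditional approach; the binarisation and threshold choices are assumptions, not the exact pipeline used here:

```python
# Minimal sketch: detect horizontal text-line bands from a tablet photograph
# using a projection profile of the binarised image.
import cv2
import numpy as np

def detect_line_bands(path, thresh_ratio=0.05):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Binarise so that dark wedge impressions become foreground
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    profile = binary.sum(axis=1)                    # amount of "ink" per image row
    is_text = profile > thresh_ratio * profile.max()
    # Group consecutive text rows into (top, bottom) line bands
    bands, start = [], None
    for y, flag in enumerate(is_text):
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(is_text)))
    return bands
```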

I then looked at object detection models that could help with character detection. Most high-performance, state-of-the-art object detection models are supervised and require labelled datasets. CDLI currently has no annotated 2D images (class names with coordinates), i.e., no bounding-box coordinates for the characters together with their class names. Annotation can be done manually with tools such as BBox label or Yolo Mark, but the process is very time-consuming.
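
For reference, this is the kind of label that is missing: one bounding box per sign, here shown in YOLO format (class id followed by normalised centre coordinates, width and height). The pixel values below are invented:

```python
# Convert a pixel-space bounding box to a YOLO-format label line.
def to_yolo(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

print(to_yolo(12, 140, 60, 210, 110, img_w=1024, img_h=768))
```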

Therefore, to solve this issue, we generate synthetic cuneiform images with annotations automatically. We try to make the synthetic images look as real as possible with respect to character shape/orientation, lighting, reflection, tablet texture and color. The locations of the characters are saved in an annotation file. With this we can start training for object detection and classification.
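
A simplified sketch of this generation step is shown below; the file names, layout and class names are placeholders, and the real pipeline additionally varies lighting, reflection and texture:

```python
# Sketch: paste isolated sign images onto a tablet-like background and record
# their bounding boxes in a JSON annotation file.
import json
import random
from PIL import Image

def make_synthetic(background_path, sign_paths, out_image, out_annotation):
    canvas = Image.open(background_path).convert("RGB")
    boxes = []
    x, y = 20, 20
    for cls, sign_path in sign_paths:              # (class name, glyph image) pairs
        glyph = Image.open(sign_path).convert("RGBA")
        canvas.paste(glyph, (x, y), glyph)         # alpha-composite the sign
        boxes.append({"class": cls,
                      "bbox": [x, y, x + glyph.width, y + glyph.height]})
        x += glyph.width + random.randint(5, 15)   # simple left-to-right layout
    canvas.save(out_image)
    with open(out_annotation, "w") as f:
        json.dump(boxes, f, indent=2)
```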

The goal of the project is to develop a system that:

  1. takes cuneiform text and an image as input and generates image segments equal in number to the lines of transliteration; appropriate segment indexing then lets us map each transliteration line to its segment (see the sketch after this list).

  2. detects and recognises cuneiform characters to generate transliterations directly from the image.
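
A sketch of goal 1, assuming line bands have already been detected (for example with the projection-profile approach above); the file naming is illustrative:

```python
# Sketch: crop one image segment per transliteration line and index both so
# that text and image regions stay aligned.
from PIL import Image

def map_lines_to_segments(image_path, transliteration_lines, bands):
    img = Image.open(image_path)
    mapping = []
    for idx, (text, (top, bottom)) in enumerate(zip(transliteration_lines, bands), 1):
        segment = img.crop((0, top, img.width, bottom))
        segment.save(f"segment_{idx:02d}.png")
        mapping.append({"line": idx, "text": text,
                        "segment": f"segment_{idx:02d}.png"})
    return mapping
```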

The following image shows the process:

Methods

Traditional Method

Learning-based Method