DT-UCPH/sp

The Samaritan Pentateuch

DOI License: CC BY-NC 4.0

This repository contains our Text-Fabric dataset of the Samaritan Pentateuch. The dataset is a work in progress; so far we have added a number of word features, which are stored in the tf folder. The features are similar to those of the Biblia Hebraica Stuttgartensia Amstelodamensis (BHSA), so we refer to the BHSA feature documentation for further explanation of the features.

The CACCHT project: Creating Annotated Corpora of Classical Hebrew Text

This dataset is developed as part of the CACCHT project, a collaboration of Christian Canu Højgaard, Martijn Naaijer, Martin Ehrensvärd, Robert Rezetko, Oliver Glanz, and Willem van Peursen. The goal of CACCHT is to prepare and digitally publish ancient Semitic texts for use in research.

Text

The text was provided by the Samaritanus project based at Martin-Luther-Universität Halle-Wittenberg, directed by Stefan Schorch. It is based on a transcription of MS Dublin Chester Beatty Library 751 (Gen 1-Deut 32:36) + MS Garizim 1 (Deut 32:36b-34); cf. Stefan Schorch (ed.), The Samaritan Pentateuch: A critical editio maior. Berlin: de Gruyter, 2018-.

We have made a small change to the original verse division. Instead of assigning the additions after Genesis 30:36 to the verse numbers 36a, 36b, and 36c, we group them under verse 36.
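The regrouping described above can be pictured with a minimal Python sketch. It assumes that verse labels arrive as strings such as `36a`; the helper name `base_verse` is hypothetical, and this is not the conversion code used to build the dataset.

```python
import re

# Hypothetical helper: map a sub-verse label such as "36a" or "36 c"
# to its base verse number, so that additions are grouped under verse 36.
_VERSE_LABEL = re.compile(r"(\d+)\s*([a-z]?)")


def base_verse(label: str) -> int:
    """Return the numeric verse for a label, dropping any letter suffix."""
    match = _VERSE_LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognised verse label: {label!r}")
    return int(match.group(1))


# Illustrative labels only; they are not read from the dataset.
labels = ["35", "36", "36a", "36b", "36 c", "37"]
grouped = sorted({base_verse(label) for label in labels})
print(grouped)  # [35, 36, 37]
```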

Use of the dataset

You can use the dataset freely for research and education. If you do so, please refer to it in the following way:

Christian Canu Højgaard, Martijn Naaijer, & Stefan Schorch. (2023). Text-Fabric Dataset of the Samaritan Pentateuch. Zenodo. https://doi.org/10.5281/zenodo.7734632

You can also refer to specific versions of the dataset.

Versions

This repo is a work in progress. Before version 2.0, the dataset contained only the text of Genesis. In version 3.0, all morphemes were added for the entire Samaritan Pentateuch. Parsing of the morphemes (verbal tense, gender, etc.) is complete for Genesis only; morphology will be implemented gradually for Exodus-Deuteronomy. If a feature has not yet been implemented for those books, its value is '?'.

Version

  • 0.1 (9 November 2022): First data of the book of Genesis.
  • 1.0 (29 December 2022)
  • 2.0 (23 February 2023): Addition of g_cons_raw of Exodus-Deuteronomy.
  • 3.0 (3 June 2023): Addition of all morphemes of Genesis-Deuteronomy.

Features

For some objects we still need to decide which value a feature should have. In these cases, the value is "absent".
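When working with the feature values, it may help to separate real values from the two placeholders this README names: '?' (not yet implemented) and "absent" (still to be decided). The following is a minimal sketch under that assumption; the function name `parsed_share` and the sample records are made up for illustration and are not part of the dataset.

```python
from collections import defaultdict

# Placeholder values named in this README: '?' (not yet implemented)
# and 'absent' (value still to be decided).
PLACEHOLDERS = {"?", "absent"}


def parsed_share(records):
    """Return, per book, the fraction of values that are not placeholders.

    `records` is an iterable of (book, value) pairs.
    """
    totals = defaultdict(int)
    parsed = defaultdict(int)
    for book, value in records:
        totals[book] += 1
        if value not in PLACEHOLDERS:
            parsed[book] += 1
    return {book: parsed[book] / totals[book] for book in totals}


# Made-up sample values for illustration only, not taken from the dataset.
sample = [
    ("Genesis", "perf"),
    ("Genesis", "absent"),
    ("Exodus", "?"),
    ("Exodus", "?"),
]
print(parsed_share(sample))  # {'Genesis': 0.5, 'Exodus': 0.0}
```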

Currently, the following features exist for all books:

  • g_cons
  • lex
  • sp
  • g_vbs
  • g_pfm
  • g_lex
  • g_vbe
  • g_nme
  • g_uvf
  • g_prs
  • vt
  • ps
  • prs_ps
  • nu
  • prs_nu
  • gn
  • prs_gn
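The features listed above can be read with the Text-Fabric Python package. The sketch below assumes that the dataset can be addressed as `DT-UCPH/sp` through `tf.app.use` and that version `3.0` is available; `load_sp` and `word_features` are hypothetical helper names, not part of this repository.

```python
# Feature names copied from the list above.
WORD_FEATURES = [
    "g_cons", "lex", "sp", "g_vbs", "g_pfm", "g_lex", "g_vbe", "g_nme",
    "g_uvf", "g_prs", "vt", "ps", "prs_ps", "nu", "prs_nu", "gn", "prs_gn",
]


def load_sp(version="3.0"):
    """Load the dataset with Text-Fabric (requires `pip install text-fabric`).

    Assumption: the repository is addressable as "DT-UCPH/sp".
    """
    # Imported lazily so that word_features() works without Text-Fabric.
    from tf.app import use

    return use("DT-UCPH/sp", version=version)


def word_features(F, node, names=WORD_FEATURES):
    """Collect the values of the listed features for one word node.

    `F` is a Text-Fabric node-feature API object, e.g. `A.api.F`.
    """
    return {name: getattr(F, name).v(node) for name in names}
```

With Text-Fabric installed, a call such as `A = load_sp()` followed by `word_features(A.api.F, 1)` should return the feature values of the first word in the corpus. For books where a feature is not yet implemented, expect the value '?'.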