biblenerd/awesome-bible-developer-resources

Awesome Bible Developer Resources

Awesome License: CC0-1.0 PRs Welcome

A curated list of awesome resources for developers (and other nerds) working with biblical texts and related tools.

For non-specialist enthusiasts (like myself), it can be challenging to get started and understand what types of formats and datasets are available. This awesome list will (hopefully) assist folks in getting started and identifying helpful biblical resources.

For specialists, it can be hard to find links to relevant resources and their associated tools and documentation, not least because so many links go dead over time. This list can help locate that information and serve as a reference for onboarding colleagues to projects working with biblical corpora.

📝 ©️ ®️ ™️ I've attempted to include license badges for resources where the license is known. When no badge is shown, assume the resource is copyright © its respective author(s). The underlying corpora within resources may have separate licensing terms.

Formats

Formats / encodings used specifically for storing biblical texts and/or associated metadata (e.g., commentary, footnotes, annotations, tags, references, etc.).

Comparisons

Which format(s) should you use for your next project?

It depends on your goals and requirements. Here is some (hopefully) helpful information as you deliberate:

  • Many formats predate Unicode standards and so may require specific fonts to render primary source languages. Modern formats should be Unicode-compliant.
  • Most biblical texts / translations used in the Bible translation field are distributed in USFM and USX formats (USFM is one of the oldest SFM formats used to digitally store biblical translations).
  • Many biblical primary source texts made available from academic centers/projects use USFM, TEI, Text-Fabric, or plaintext formats.
  • Some formats are only intended for biblical texts, whereas others can store texts themselves along with additional metadata, and others still are only for metadata, linguistic tagging, other annotations, etc. Be sure to review documentation for what each format supports.
  • Article: Bible File Encoding for Bible Translators, Publishers, and Software Developers by Kahunapule Michael Johnson

Conversion Tools

  • Converting SFM Bibles to OSIS — Wiki page discussing the conversion of USFM to OSIS format.

    • usfm2osis License: GPL v3 — Python scripts for converting USFM to OSIS XML.
    • u2o License: Unlicense — Another Python USFM to OSIS bible format converter.
    • SWORD Module Tools — Converts USFM to SWORD module and also has tools to take regular OSIS Bibles and convert them into SWORD modules, along with some other OSIS-related conversions.
  • Haiola — Accepts USFM, USFX, and DBL USX file(s) input and outputs to a variety of user-friendly formats including (but not limited to) HTML, EPUB3, SWORD Project modules, Microsoft Word XML, and PDF.

  • ptx2pdf — XeTeX-based macro package for typesetting USFM-formatted (Paratext output) scripture files and outputting as a PDF.

  • SWORD to JSON Converter License — Tool to convert SWORD modules to JSON.

  • USFM to Other Formats — Java utility to convert from USFM to other formats.

CES

The Corpus Encoding Standard (CES) is an SGML (ISO 8879:1986) format that is, or at least was, TEI-conformant; an XML version (XCES) is also available. CES is hosted by the Vassar College Department of Computer Science.

OSIS

Open Scripture Information Standard (OSIS) is an XML format by CrossWire Bible Society, used in the SWORD Project (along with TEI). OSIS is closely related to TEI but not necessarily conformant; it is better described as a customization of TEI.
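As a rough illustration of what working with OSIS can look like, the sketch below reads verse text from a tiny, hypothetical OSIS-style fragment using only the Python standard library. It assumes verses are written as container elements carrying an `osisID` attribute; real OSIS files often mark verses as milestones (`sID` / `eID`) instead, which this sketch does not handle. The sample text and the `osis_verses` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by OSIS documents (assumed here; check your file's root element).
OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Hypothetical, heavily simplified fragment; not a schema-valid OSIS document.
SAMPLE_OSIS = f"""<osis xmlns="{OSIS_NS}">
  <osisText>
    <div type="book" osisID="Gen">
      <chapter osisID="Gen.1">
        <verse osisID="Gen.1.1">In the beginning God created the heavens and the earth.</verse>
        <verse osisID="Gen.1.2">The earth was formless and empty.</verse>
      </chapter>
    </div>
  </osisText>
</osis>"""


def osis_verses(xml_text):
    """Map each container-style verse's osisID to its text content."""
    root = ET.fromstring(xml_text)
    return {
        verse.get("osisID"): "".join(verse.itertext()).strip()
        for verse in root.iter(f"{{{OSIS_NS}}}verse")
    }


verses = osis_verses(SAMPLE_OSIS)
print(verses["Gen.1.1"])
```

For real modules, one of the conversion tools or SWORD utilities listed above is likely a safer starting point than hand-rolled parsing.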

PAULA

The Potsdamer Austauschformat Linguistischer Annotationen / Potsdam Interchange Format for Linguistic Annotations (PAULA) is an XML interchange format for linguistic annotations. It was developed by the Collaborative Research Center (Sonderforschungsbereich / SFB 632) Information Structure: The Linguistic Means of Structuring Utterances, Sentences and Texts, funded by the German Research Foundation (DFG). A distinguishing feature of the format is that each layer of linguistic annotation (part-of-speech annotations, lemmatizations, syntax trees, coreference annotation, etc.) is stored in a separate XML file, and all of these files refer to the same raw data.

  • Main Website — Main website that includes references and links to current documentation.
  • ANNIS — An open source, cross-platform, browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation. ANNIS can visualize and query PAULA data; it also has its own distinct format and can work with other formats.
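The idea of keeping each annotation layer separate while pointing back at the same raw data (often called standoff annotation) can be sketched in plain Python. This is not PAULA's XML syntax: the layer names, token IDs, and tags below are hypothetical and only illustrate the principle.

```python
# The raw text is stored once and never modified by the annotation layers.
RAW_TEXT = "In the beginning God created"

# Token layer: token ID -> (start, end) character offsets into RAW_TEXT.
TOKENS = {
    "tok_1": (0, 2),
    "tok_2": (3, 6),
    "tok_3": (7, 16),
    "tok_4": (17, 20),
    "tok_5": (21, 28),
}

# Independent annotation layers that refer to tokens only by ID.
POS_LAYER = {"tok_1": "ADP", "tok_2": "DET", "tok_3": "NOUN", "tok_4": "PROPN", "tok_5": "VERB"}
LEMMA_LAYER = {"tok_1": "in", "tok_2": "the", "tok_3": "beginning", "tok_4": "God", "tok_5": "create"}


def resolve(layer):
    """Pair each token's surface form with its value in the given layer."""
    return [(RAW_TEXT[start:end], layer[tok_id]) for tok_id, (start, end) in TOKENS.items()]


print(resolve(POS_LAYER))
print(resolve(LEMMA_LAYER))
```

Because each layer only references token IDs, a new layer (for example, syntax or coreference) can be added without touching the raw text or the existing layers.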

TEI

The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains guidelines for the representation of texts in digital form. The TEI guidelines are used by several notable content-based projects including:

List of TEI resources:

Text-Fabric

Text-Fabric is a format and accompanying Python library / API for working with ancient texts and associated linguistic annotations as annotated graphs. It can accept corpora in a variety of formats and convert these to TF format.

USFM

Unified Standard Format Markers (USFM) was developed within Paratext for Bible translations.

  • Paratext USFM page — Main Paratext web page for USFM with a brief history of the format and links to its documentation and Paratext stylesheets.
  • USFM GitHub Repository — Main USFM GitHub repo.
  • USFM Documentation — USFM docs.
  • USFM Tools License: MIT — Python tools for parsing and rendering USFM files.
  • Paratext Software — A freely available professional Bible translation software program with a graphical user interface (GUI) for working with USFM-formatted Bible texts (can also work with USX).
  • Bibledit Software License: GPL v3 — A freely available professional Bible translation software program with a graphical user interface (GUI) for working with USFM-formatted Bible texts.
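To give a feel for the format, here is a minimal sketch that pulls verses out of a small USFM snippet. It only understands the `\id`, `\c`, and `\v` markers on their own lines and ignores everything else (paragraph markers, footnotes, character styles, verse text that wraps across lines). The sample text and the `parse_usfm_verses` helper are hypothetical; for real projects, the USFM tools listed above are a better fit.

```python
import re

# Hypothetical excerpt for illustration; real USFM files contain many more markers.
SAMPLE_USFM = r"""\id GEN Sample text
\c 1
\p
\v 1 In the beginning God created the heavens and the earth.
\v 2 The earth was formless and empty.
"""


def parse_usfm_verses(text):
    """Return {(book, chapter, verse): text} from simple one-line verse markers."""
    book = None
    chapter = None
    verses = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\id "):
            book = line.split()[1]          # book code follows the \id marker
        elif line.startswith("\\c "):
            chapter = int(line.split()[1])  # chapter number follows \c
        elif line.startswith("\\v "):
            match = re.match(r"\\v (\d+) (.*)", line)
            if match:
                verses[(book, chapter, int(match.group(1)))] = match.group(2)
    return verses


verses = parse_usfm_verses(SAMPLE_USFM)
print(verses[("GEN", 1, 1)])
```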

USFX

Unified Scripture Format XML (USFX) is an XML format that was derived from USFM.

License: LGPL v2.1+ or License: Common Public License 0.5+ (see LICENSING.txt).

USX

Unified Scripture XML (USX) is an XML format that is closely related to USFM. Used by Paratext and the DBL.

  • USX GitHub Repository — Main USX GitHub repo.
  • USX Documentation — USX docs.
  • Paratext USX page — Main Paratext web page for USX with links to its documentation.
  • Digital Bible Library (DBL) — United Bible Societies (UBS) online digital asset and licensing management platform. DBL gathers, validates, and safeguards a large collection of quality, standardized, digital Scripture texts and publication assets in hundreds of languages that are predominantly in USX format.
  • DBL USX Documentation — DBL docs for USX format.
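The following sketch shows one way to read verse text from a USX-style fragment with the Python standard library. It assumes the milestone style in which a `<verse>` element with a `sid` attribute marks the start of a verse and the verse text follows it as trailing text; verify this against the USX documentation for the version you are working with. The sample fragment and the `usx_verses` helper are hypothetical, and nested elements such as notes or character styles are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment modeled on milestone-style verse markup.
SAMPLE_USX = """<usx version="3.0">
  <book code="GEN" style="id">Sample</book>
  <chapter number="1" style="c" sid="GEN 1"/>
  <para style="p"><verse number="1" style="v" sid="GEN 1:1"/>In the beginning God created the heavens and the earth.<verse eid="GEN 1:1"/> <verse number="2" style="v" sid="GEN 1:2"/>The earth was formless and empty.<verse eid="GEN 1:2"/></para>
</usx>"""


def usx_verses(xml_text):
    """Map each verse start milestone's sid to the plain text that follows it."""
    root = ET.fromstring(xml_text)
    verses = {}
    for para in root.iter("para"):
        for verse in para.iter("verse"):
            sid = verse.get("sid")
            if sid:  # skip end milestones, which carry eid instead of sid
                verses[sid] = (verse.tail or "").strip()
    return verses


verses = usx_verses(SAMPLE_USX)
print(verses["GEN 1:1"])
```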

Zefania

Simplistic XML format for Bible translations. License: GPL v3

Biblical Corpora

Biblical corpora (i.e., Bible translations and primary source texts) and related resources available for download and offline use in various formats.

Coptic

Hebrew

Greek

LXX

New Testament

English Translations

  • American Standard Version (ASV) Bible Public Domain — Full text, footnotes, and formatting of the ASV Bible (1901) in USX format.

  • Multilingual Parallel Bible Corpus License: CC0-1.0 — A multilingual parallel corpus (aligned by verse) created from over 100 translations of the Bible intended for linguistic research and natural language processing (NLP) applications (e.g., statistical machine translation (SMT) training data, linguistic structure projected learning, etc.). There is also a repo with associated tools.

  • New English Translation of the Septuagint (NETS) — Translation of LXX under Computer Assisted Tools for Septuagint Studies (CATSS) Project. Distributed as PDF files and also available in print.

  • Open Bibles License: CC BY-SA 4.0 or Public Domain — GitHub repository of public domain and freely (libre) licensed Bibles in XML formats (including USFX, OSIS, and Zefania). Available English translations include KJV, BBE, OEB, and WEB. Also contains Hebrew Leningrad Codex, Clementine Latin Vulgate, and translations in other languages.

  • Open English Bible License: CC0-1.0 — A public domain revision of the Twentieth Century New Testament, which was a translation of the New Testament published in the early twentieth century based on the Greek text of Westcott and Hort. Available in USFM and user-friendly format; has associated GitHub repo.

  • OSIS Bibles — A collection of freely licensed translations of biblical text in OSIS format. Licensing varies per corpus.

  • SWORD Project License: GPL — CrossWire Bible Society's free Bible software project. Many modules containing Bible texts (and other resources) are available.

  • The 2001 Translation Public Domain — A free collaborative Bible translation continually corrected and refined by volunteers. Translates OT from the LXX and NT from Greek and Aramaic texts. There are a number of distinctives this translation follows that differ significantly from mainstream English Bible translation conventions. Downloads available in HTML, ePub, Kindle/MOBI, and Microsoft Word document formats.

  • unfoldingWord® Literal Text (ULT) License: CC BY-SA 4.0 — An open-licensed update of the ASV, intended to provide a ‘form-centric’ understanding of the Bible. It increases the translator’s understanding of the lexical and grammatical composition of the underlying text by adhering closely to the word order and structure of the originals. Available in USFM format in source repo.

  • unfoldingWord® Simplified Text (UST) License: CC BY-SA 4.0 — An open-licensed translation, intended to provide a ‘functional’ understanding of the Bible. It increases the translator’s understanding of the text by translating theological terms as descriptive phrases. Available in USFM format in source repo.

  • World English Bible (WEB) Public Domain — WEB is a modern English translation that is based on the American Standard Version (ASV) of the Holy Bible first published in 1901, the Biblia Hebraica Stuttgartensia (BHS) Old Testament, and the Greek Majority Text New Testament. The companion Deuterocanon/Apocrypha is derived from the Revised Version Apocrypha and the Brenton translation of the Septuagint into English. It is in draft form and is currently being edited for accuracy and readability (although the Protestant Christian canon is essentially complete). Available in numerous formats including USFX, USFM, SWORD modules, Microsoft Word XML, HTML, plaintext, PDF, e-reader formats, XeTeX, and more.

APIs

Biblical corpora (i.e., Bible translations and primary source texts) and related resources available for online use via application programming interfaces (APIs).

Lexical Resources

English-language lexica, dictionaries, and related resources for understanding words in primary-source Bible texts.

Hebrew Lexica

Greek Lexica

Other Awesome Lists

Lists of resources prepared by others that may also be helpful for developers working with Bible texts and related resources.

  • Awesome Bible Data — List of biblical data including translations, tagged original language texts, second temple literature, early church writings, dictionaries, and cross references.

  • Awesome Bible NLP — A curated list of resources dedicated to Biblical Natural Language Processing (NLP).

  • Biblical Humanities Dashboard — This dashboard lists available data resources that are important for digital biblical humanities.

  • Hackathon.Bible — A resource site with Bible texts and datasets for building apps and new innovations for Bible engagement.
