The Structure of a Book in Sefaria

JonMosenkis edited this page Nov 8, 2017 · 6 revisions

Index and Versions

Each "book" in Sefaria has a single index and one or more versions.

The Index

The Index of a book contains metadata about the book: who the author is, how the book is categorized in our library, and how the book is structured. The structural information is referred to as the book's "Schema".


Versions

Each index may have one or more Versions associated with it. Versions may be in different languages, but every version must be structured in exactly the same way. For example, Genesis had 18 versions at the time this was written, in languages including Hebrew, English, French, Spanish and German. While the versions may differ slightly in their text, they all follow the exact same structure, so the same reference can be retrieved from any version to get that version's edition of the text. For example, here is the text for Genesis 1:1 across several of our versions:

בראשית ברא אלהים את השמים ואת הארץ

בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃

In the beginning God created the heaven and the earth.

[de] Elokim destes céus e desta terra.

In the beginning God created the heavens and the earth.

In the beginning God created the heaven and the earth.
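Because every version shares one structure, the same address can be looked up in each of them. The following is a minimal sketch of that idea, not Sefaria's actual API: the version labels, the nested-list layout and the helper text_at are all assumptions made for illustration, and only the verse texts come from the examples above.

```python
# Hypothetical in-memory model: each version is a list of chapters,
# and each chapter is a list of verse strings.
versions = {
    "Hebrew, without vowels": [["בראשית ברא אלהים את השמים ואת הארץ"]],
    "Hebrew, with vowels": [["בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃"]],
    "English": [["In the beginning God created the heaven and the earth."]],
}


def text_at(version_text, chapter, verse):
    """Return one verse from a version, using 1-based chapter and verse numbers."""
    return version_text[chapter - 1][verse - 1]


# The same address (chapter 1, verse 1) works for every version.
genesis_1_1 = {label: text_at(text, 1, 1) for label, text in versions.items()}
for label, verse in genesis_1_1.items():
    print(f"{label}: {verse}")
```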

The Index Schema

The structure of a book is defined in the Index schema. For information on how to create a Schema, please refer to Index Records for Simple & Complex Texts. A schema is essentially a tree, with named nodes as required. The leaves of the tree will be JaggedArrayNodes.
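To make the tree idea concrete, here is a rough sketch of a schema as nested dictionaries, with a walk that collects the leaves. The field names used here ("key", "nodes", "nodeType", "depth") and the book itself are assumptions for illustration only; the page on Index Records describes the real format.

```python
def leaf_nodes(node, path=()):
    """Yield (path, node) for every leaf of a schema tree."""
    here = path + (node["key"],)
    children = node.get("nodes")
    if not children:
        # No children: this node is a leaf and holds the actual text.
        yield here, node
        return
    for child in children:
        yield from leaf_nodes(child, here)


# Hypothetical schema: a root with two named children, both leaves.
schema = {
    "key": "Example Book",
    "nodes": [
        {"key": "Introduction", "nodeType": "JaggedArrayNode", "depth": 1},
        {"key": "Body", "nodeType": "JaggedArrayNode", "depth": 2},
    ],
}

leaves = list(leaf_nodes(schema))
for path, node in leaves:
    print(" > ".join(path), node["nodeType"], node["depth"])
```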

JaggedArrayNodes and JaggedArrays

The text in Sefaria is structured in JaggedArrays - nested arrays, with the lowest level being an array of strings. The position of an item in the array reflects the structural information about that item.

Let's illustrate this with an example: A simple schema that consists of a single JaggedArrayNode. The entire book exists within a single array:

[['verse 1', 'verse 2'], ['foo'], ['hello', 'world', 'foo', 'bar']]

The first item below the root is itself an array which represents the first chapter. The "chapter" is just an array of strings - where each string represents a verse.
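In Python terms, the example array can be indexed directly. This sketch assumes that references count from 1 (as in "Genesis 1:1") while list indices count from 0:

```python
book = [['verse 1', 'verse 2'], ['foo'], ['hello', 'world', 'foo', 'bar']]

first_chapter = book[0]          # the array for chapter 1
chapter_3_verse_2 = book[2][1]   # chapter 3, verse 2 (indices are 0-based)

# A jagged array is "jagged" because chapters may differ in length.
chapter_lengths = [len(chapter) for chapter in book]

print(first_chapter)
print(chapter_3_verse_2)
print(chapter_lengths)
```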

Sections and Segments

In Sefaria, we refer to the strings at the bottom of the JaggedArray as segments. The arrays that contain those strings are called sections. In general, segments reflect the maximum resolution that we are capable of reaching in a given text - one cannot directly reference a part of a segment.
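One way to picture the distinction is a walk over the jagged array that stops at each string and reports its address. This is an illustrative sketch; iter_segments is a hypothetical helper, and the 1-based addresses are an assumption based on how references such as "Genesis 1:1" are written.

```python
def iter_segments(node, address=()):
    """Yield (address, text) for every segment in a jagged array of any depth."""
    if isinstance(node, str):
        # A string is a segment: the finest unit that can be referenced.
        yield address, node
        return
    # A list is a section: recurse into its children, numbering them from 1.
    for position, child in enumerate(node, start=1):
        yield from iter_segments(child, address + (position,))


book = [['verse 1', 'verse 2'], ['foo'], ['hello', 'world', 'foo', 'bar']]
segments = list(iter_segments(book))
for address, text in segments:
    print(":".join(str(part) for part in address), text)
```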

Going Below the Segment Level

On occasion it becomes necessary to convey information about a specific word or location within a segment. A common example is text formatting: the structure we have outlined so far offers no way of indicating that some words need to be formatted in one way or another.

A common solution is to use HTML/XML tags within a segment. A major drawback of this is that such data is version-specific: it lives in the text of one version and is not shared by the others. Unfortunately, this is the best method we currently have for including sub-segment level data.

Text Formatting

Formatting text is relatively straightforward. One only needs to wrap the text that is to be formatted in an appropriate HTML tag. So, to mark a word as bold, simply wrap it in a <b> tag. This is commonly used in commentaries that start off with a quote from the text they are commenting on (דיבור המתחיל).
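As a small illustration of the commentary case, the sketch below bolds a comment's opening quote. The helper bold_opening_quote and the sample comment are hypothetical, and the code assumes the quote is already known and appears at the very start of the comment.

```python
def bold_opening_quote(comment, quote):
    """Wrap the quote that opens a comment in a <b> tag; leave other comments unchanged."""
    if not quote or not comment.startswith(quote):
        return comment
    return "<b>" + quote + "</b>" + comment[len(quote):]


comment = "In the beginning - an invented comment on these words."
formatted = bold_opening_quote(comment, "In the beginning")
print(formatted)
```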


Footnotes

Footnotes contain two parts: a marker (usually a number) and text. A footnote is added to a segment using a <sup> tag for the marker, followed by an <i class="footnote"> tag for the text. An example of a segment with a footnote:

Lorem ipsum dolor sit amet, <sup>1</sup><i class="footnote">The text inside the footnote</i>consectetur adipiscing elit.
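A consumer of such a segment may want to separate the footnotes from the running text. The following sketch does this with a regular expression. It is not Sefaria's own parser, and it assumes that the marker and text tags are adjacent, exactly as in the example, and that a footnote contains no nested <i> tags.

```python
import re

# Matches a marker tag immediately followed by its footnote text tag.
FOOTNOTE = re.compile(r'<sup>(.*?)</sup><i class="footnote">(.*?)</i>')


def split_footnotes(segment):
    """Return (text without footnotes, list of (marker, footnote text) pairs)."""
    notes = FOOTNOTE.findall(segment)
    body = FOOTNOTE.sub("", segment)
    return body, notes


segment = ('Lorem ipsum dolor sit amet, <sup>1</sup>'
           '<i class="footnote">The text inside the footnote</i>'
           'consectetur adipiscing elit.')
body, notes = split_footnotes(segment)
print(body)
print(notes)
```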

Inline Reference

Sometimes a commentary will refer to a specific word or phrase within another text without quoting it. In printed books, a marker similar to a footnote appears at that point. Placing an entire commentary inside the parent text, as we did with footnotes, is not acceptable. Instead, we display a small marker within the segment, which becomes visible to the user when a commentary that uses such a reference is selected in the sidebar.

Creating such a reference requires data to exist not only within the text segment, but also on the link between the commentator and the parent text. The parent segment will contain an <i> tag with the attributes data-commentator and data-order. data-commentator must match the commentator's "collective_title", and data-order will usually match the segment number of the linked comment (the exception is when individual comments are so long that they have their own internal structure; in such cases the section number can be used). In addition, an optional data-label can be added. This is an all-purpose override of any display logic: whatever it is set to is what will be displayed.

The link object requires the field inline_reference. The value of inline_reference is a dictionary with the keys data-commentator, data-order and (optionally) data-label, whose values are identical to those set on the <i> tag in the parent segment.

Parent segment: Lorem ipsum <i data-commentator="Child" data-order="1"></i>consectetur adipiscing elit.

Link Object:

    {
        'refs': ['Parent 1:1', 'Child 1:1'],
        'type': 'commentary',
        'inline_reference': {
            'data-commentator': 'Child',
            'data-order': 1
        }
    }

Here as well, the data-commentator field must match the collective_title of the commentator.
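Since the tag and the link must carry identical values, one option is to generate both from a single place. A minimal sketch, in which make_inline_reference is a hypothetical helper and not part of Sefaria's code:

```python
import html


def make_inline_reference(commentator, order, label=None):
    """Build the <i> marker tag and the matching inline_reference dictionary."""
    attrs = {"data-commentator": commentator, "data-order": order}
    if label is not None:
        attrs["data-label"] = label  # optional display override
    rendered = " ".join(
        f'{name}="{html.escape(str(value), quote=True)}"' for name, value in attrs.items()
    )
    return f"<i {rendered}></i>", attrs


tag, inline_reference = make_inline_reference("Child", 1)
parent_segment = f"Lorem ipsum {tag}consectetur adipiscing elit."
link = {
    "refs": ["Parent 1:1", "Child 1:1"],
    "type": "commentary",
    "inline_reference": inline_reference,
}
print(parent_segment)
print(link)
```

Building both values from the same arguments avoids the case where the marker in the parent segment and the link silently drift apart.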
