<a href="https://colab.research.google.com/github/Clear-Bible/missional-ai/blob/main/07_Alignments.ipynb" target="_parent">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Bible Alignments

Clear Bible is building a repository of Bible alignments and supporting Python code at https://github.com/Clear-Bible/alignments. 

This notebook shows a simple example of loading and displaying the alignment data for Mark 1-2.

A subset of the data for the NA27 Greek New Testament and the Lexham English Bible (LEB) is included in this repository. The full data is not included because of copyright restrictions.

In [9]:
%%capture
# quietly install the code and dependencies
# comment out the capture line for debugging information
!pip install "bible-alignments>=0.2.10"

In [13]:
from bible_alignments import grapecity
from bible_alignments import config

import os
from pathlib import Path

# reset paths for the notebook
# this overwrites the values from bible-alignments
config.ROOT = Path(os.getcwd())

# these four lines are for Colab use: comment them out when running locally
if "missional-ai" not in os.listdir():
  !git clone https://github.com/Clear-Bible/missional-ai.git
if "missional-ai" in os.listdir():
  config.ROOT = Path(os.getcwd()) / "missional-ai"

DATAPATH = config.ROOT / "data"
print(f"DATAPATH = {DATAPATH}")

config.ALIGNMENTS = DATAPATH / "alignments"
config.SOURCES = DATAPATH / "sources"
config.TARGETS = DATAPATH / "targets"
config.NAMES = DATAPATH / "names"


Cloning into 'missional-ai'...
remote: Enumerating objects: 174, done.
remote: Counting objects: 100% (174/174), done.
remote: Compressing objects: 100% (108/108), done.
remote: Total 174 (delta 82), reused 115 (delta 49), pack-reused 0
Receiving objects: 100% (174/174), 9.27 MiB | 6.48 MiB/s, done.
Resolving deltas: 100% (82/82), done.
DATAPATH = /Users/sboisen/git/Clear-Bible/missional-ai/missional-ai/data


In [14]:
# display the source data for Mark 1:1
from bible_alignments import gcsource
sourcerd = gcsource.Reader(sourceid="NA27", targetid="LEB")

_fields: tuple = ("identifier", "text", "lemma", "pos", "morph", "gloss")
print("\t     ".join(_fields))
for k in sourcerd.keys():
    if k.startswith("41001001"):
        s = sourcerd[k]
        print("\t     ".join(f"{getattr(s, f):10}" for f in _fields))

identifier	     text	     lemma	     pos	     morph	     gloss
410010010011	     Ἀρχὴ      	     ἀρχή      	     noun      	     n- -nsf-  	     [the] beginning
410010010021	     τοῦ       	     ὁ         	     det       	     ra -gsn-  	     of the    
410010010031	     εὐαγγελίου	     εὐαγγέλιον	     noun      	     n- -gsn-  	     good news 
410010010041	     Ἰησοῦ     	     Ἰησοῦς    	     Name      	     nr -gsm-  	     of Jesus  
410010010051	     Χριστοῦ   	     Χριστός   	     Name      	     nr -gsm-  	     Christ    
410010010061	     υἱοῦ      	     υἱός      	     noun      	     n- -gsm-  	     [the] son 
410010010071	     θεοῦ      	     θεός      	     noun      	     n- -gsm-  	     of God.   
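
The identifiers in this output appear to pack a Bible reference into fixed-width digit fields, which is why `startswith("41001001")` selects Mark 1:1. The sketch below assumes a layout of two digits for the book (41 for Mark), three digits each for chapter, verse, and word, and a trailing part digit that only the 12-digit source identifiers carry (the target identifiers further down have 11 digits). That layout is inferred from the output shown in this notebook, not taken from the bible-alignments documentation, and `TokenRef` and `parse_identifier` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class TokenRef(NamedTuple):
    """Hypothetical container for the parts of a token identifier."""

    book: int
    chapter: int
    verse: int
    word: int
    part: Optional[int]  # only present on 12-digit source identifiers


def parse_identifier(identifier: str) -> TokenRef:
    """Split an identifier, assuming a BBCCCVVVWWW[P] layout (an assumption)."""
    if len(identifier) not in (11, 12) or not identifier.isdigit():
        raise ValueError(f"unexpected identifier: {identifier!r}")
    part = int(identifier[11]) if len(identifier) == 12 else None
    return TokenRef(
        book=int(identifier[0:2]),
        chapter=int(identifier[2:5]),
        verse=int(identifier[5:8]),
        word=int(identifier[8:11]),
        part=part,
    )


# a source identifier (12 digits) and a target identifier (11 digits)
print(parse_identifier("410010010031"))
print(parse_identifier("41001001005"))
```

If the assumed layout holds, a verse prefix such as `"41001001"` is simply the book, chapter, and verse fields concatenated.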


In [16]:
# likewise, display the target data for Mark 1:1
from bible_alignments import gctarget
targetrd = gctarget.Reader(sourceid="NA27", targetid="LEB")

_fields: tuple = ("identifier", "text", "transType", "isPunc", "isPrimary")
print("\t     ".join(_fields))
for k in targetrd.keys():
    if k.startswith("41001001"):
        s = targetrd[k]
        print("\t     ".join(f"{getattr(s, f):10}" for f in _fields))

identifier	     text	     transType	     isPunc	     isPrimary
41001001001	     The       	               	              0	              0
41001001002	     beginning 	     k         	              0	              1
41001001003	     of        	     m         	              0	              0
41001001004	     the       	     k         	              0	              1
41001001005	     gospel    	     k         	              0	              1
41001001006	     of        	     m         	              0	              0
41001001007	     Jesus     	     k         	              0	              1
41001001008	     Christ    	     k         	              0	              1
41001001009	     .         	               	              1	              0


In [17]:
rd = grapecity.Reader(sourceid="NA27", targetid="LEB", languageid="eng", processid="manual")

In [18]:
# show an overview of the loaded alignment data
# note there are about 60% more English tokens than Greek ones: many of these are likely to be punctuation.
rd.display()

Source:	NA27	(1249 words)
Target:	LEB	(2034 words)
Process:	manual
1130 alignments
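
The counts reported by `rd.display()` make it easy to quantify the gap between the two token totals. This is a small arithmetic sketch: the numbers are copied from the output above, and the variable names are only for illustration.

```python
# counts copied from the rd.display() output above
source_words = 1249
target_words = 2034
alignment_count = 1130

# proportion by which target (English) tokens exceed source (Greek) tokens
surplus = target_words / source_words - 1
print(f"{surplus:.0%} more target tokens than source tokens")

# average number of alignment records per source word
per_source_word = alignment_count / source_words
print(f"{per_source_word:.2f} alignments per source word")
```

The second figure being below 1 suggests that some Greek words have no alignment record in this subset, though the counts alone do not say which ones.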


In [19]:
# display the alignments for Mark 1:1
for k in rd.keys():
    if k.startswith("41001001"):
        rd[k].display()

41001001.1: ['Ἀρχὴ']	['beginning']
41001001.2: ['τοῦ']	['the']
41001001.3: ['εὐαγγελίου']	['gospel', 'of']
41001001.4: ['Ἰησοῦ']	['Jesus', 'of']
41001001.5: ['Χριστοῦ']	['Christ']
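
Comparing these alignments with the source and target tokens printed earlier shows which tokens were left unaligned in Mark 1:1. The sketch below works on plain lists copied from the outputs in this notebook and matches tokens by their surface text, which is a simplification: the real data presumably links tokens by identifier. `unaligned` is a hypothetical helper, not part of bible-alignments.

```python
from collections import Counter

# Mark 1:1 tokens and alignments, copied from the outputs in this notebook
source_tokens = ["Ἀρχὴ", "τοῦ", "εὐαγγελίου", "Ἰησοῦ", "Χριστοῦ", "υἱοῦ", "θεοῦ"]
target_tokens = ["The", "beginning", "of", "the", "gospel", "of", "Jesus", "Christ", "."]
alignments = [
    (["Ἀρχὴ"], ["beginning"]),
    (["τοῦ"], ["the"]),
    (["εὐαγγελίου"], ["gospel", "of"]),
    (["Ἰησοῦ"], ["Jesus", "of"]),
    (["Χριστοῦ"], ["Christ"]),
]


def unaligned(tokens, aligned_groups):
    """Return the tokens left over once every aligned occurrence is removed."""
    remaining = Counter(tokens)
    for group in aligned_groups:
        remaining.subtract(group)
    # walk the tokens in order, keeping each one as often as it is left over
    leftovers = []
    for token in tokens:
        if remaining[token] > 0:
            leftovers.append(token)
            remaining[token] -= 1
    return leftovers


unaligned_source = unaligned(source_tokens, [src for src, _ in alignments])
unaligned_target = unaligned(target_tokens, [tgt for _, tgt in alignments])
print(unaligned_source)
print(unaligned_target)
```

Using a `Counter` rather than a set keeps repeated words such as "of" from being treated as a single token.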


In [20]:
# display the alignments for Mark 1:2
for k in rd.keys():
    if k.startswith("41001002"):
        rd[k].display()

41001002.1: ['Καθὼς']	['Just', 'as']
41001002.2: ['γέγραπται']	['written', 'it', 'is']
41001002.3: ['ἐν']	['in']
41001002.4: ['Ἠσαΐᾳ']	['Isaiah']
41001002.5: ['τῷ']	['the']
41001002.6: ['προφήτῃ']	['prophet']
41001002.7: ['ἰδοὺ']	['Behold']
41001002.8: ['ἀποστέλλω']	['sending', 'I', 'am']
41001002.9: ['ἄγγελόν']	['messenger']
41001002.10: ['μου']	['my']
41001002.11: ['πρὸ']	['before']
41001002.12: ['προσώπου']	['face']
41001002.13: ['σου']	['your']
41001002.14: ['ὃς']	['who']
41001002.15: ['κατασκευάσει']	['prepare', 'will']
41001002.16: ['ὁδόν']	['way']
41001002.17: ['σου']	['your']
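
Several of these alignments map one Greek word to more than one English word. As a rough illustration, the displayed lines can be parsed back into Python lists to pick those out. This sketch assumes each line has the form `id: [source words]<TAB>[target words]`, as the output above appears to, and it works on copied text instead of the reader object; `parse_display_line` is a hypothetical helper.

```python
import ast

# a few lines copied from the Mark 1:2 output above
lines = [
    "41001002.1: ['Καθὼς']\t['Just', 'as']",
    "41001002.2: ['γέγραπται']\t['written', 'it', 'is']",
    "41001002.3: ['ἐν']\t['in']",
    "41001002.8: ['ἀποστέλλω']\t['sending', 'I', 'am']",
    "41001002.15: ['κατασκευάσει']\t['prepare', 'will']",
    "41001002.16: ['ὁδόν']\t['way']",
]


def parse_display_line(line):
    """Split one displayed line into (alignment id, source words, target words)."""
    alignment_id, _, rest = line.partition(": ")
    source_repr, target_repr = rest.split("\t")
    # each half is a Python list literal, so literal_eval can read it safely
    return alignment_id, ast.literal_eval(source_repr), ast.literal_eval(target_repr)


parsed = [parse_display_line(line) for line in lines]

# alignments where a single source word maps to several target words
one_to_many = [aid for aid, src, tgt in parsed if len(src) == 1 and len(tgt) > 1]
print(one_to_many)
```

In a real workflow it would be more robust to read the source and target tokens from the alignment objects directly, if the library exposes them, than to parse printed text.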
