# Motivation

Now that I've extracted the names of people and geopolitical entities in the Bible, I want to know how often God, or Yahweh, is either the nominal subject or the direct object of a clause. The nominal subject is the person or thing doing something in a clause. The direct object is the person or thing receiving the action. In other words, I'd like to know how often God is doing something in the Bible and how often something is being done to God.

# Setup

This is my typical setup: I import the modules I will use, set my project directory, remove pandas' column and row display limits, and let Jupyter display all of the output from each cell.

In [1]:
import os
import pandas as pd
import numpy as np
import sqlite3
import spacy
from datetime import datetime

# Set project folder as directory
os.chdir(r'C:/Users/david/Projects/Bible Analytics')

# Remove row and column limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

# Display all output from each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Accessing data

In [2]:
database = 'Data/SQL database.db'

In [3]:
conn = sqlite3.connect(database)

df = pd.read_sql_query('SELECT * FROM t_web', conn)

# Release the database file now that the table is loaded
conn.close()

In [4]:
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31102 entries, 0 to 31101
Data columns (total 9 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     31102 non-null  object
 1   old_new  31102 non-null  object
 2   group    31102 non-null  int64 
 3   id       31102 non-null  int64 
 4   b        31102 non-null  int64 
 5   c        31102 non-null  int64 
 6   v        31102 non-null  int64 
 7   t        31102 non-null  object
 8   clean_t  31102 non-null  object
dtypes: int64(5), object(4)
memory usage: 2.1+ MB


| | name | old_new | group | id | b | c | v | t | clean_t |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Genesis | OT | 1 | 1001001 | 1 | 1 | 1 | In the beginning God{After "God," the Hebrew has the two letters "Aleph Tav" (the first and last letters of the Hebrew alphabet) as a grammatical marker.} created the heavens and the earth. | In the beginning God created the heavens and the earth. |
| 1 | Genesis | OT | 1 | 1001002 | 1 | 1 | 2 | Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters. | Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters. |
| 2 | Genesis | OT | 1 | 1001003 | 1 | 1 | 3 | God said, "Let there be light," and there was light. | God said, "Let there be light," and there was light. |
| 3 | Genesis | OT | 1 | 1001004 | 1 | 1 | 4 | God saw the light, and saw that it was good. God divided the light from the darkness. | God saw the light, and saw that it was good. God divided the light from the darkness. |
| 4 | Genesis | OT | 1 | 1001005 | 1 | 1 | 5 | God called the light Day, and the darkness he called Night. There was evening and there was morning, one day. | God called the light Day, and the darkness he called Night. There was evening and there was morning, one day. |


# Counting God as subject and object

First, I define an nlp object by loading spaCy's large trained English pipeline, `en_core_web_lg`. I'm using the large pipeline because I want the model to label nominal subjects and direct objects as accurately as possible. Unlike the small pipeline, it includes a large table of pretrained word vectors, which should help it parse these relationships somewhat more accurately. Next, I define four counters, verses_ns, count_ns, verses_do and count_do, and set each of them to 0. Then I begin iterating through the Biblical text.

The very first thing I do inside the FOR loop is define two temporary counters, sub_count_ns and sub_count_do, and set them to 0. To know how many verses contain God as a nominal subject or direct object, I check whether these counters are greater than 0 after iterating through every word in the verse.

Next, I use a TRY block to handle any errors that may occur. Within this TRY block, I define doc by applying the nlp object to the verse's cleaned text. Then I use a nested FOR loop to iterate through each token. Within this nested loop, a conditional statement checks whether the token is "God" or "Yahweh". If so, further conditional statements check whether the token's dependency label is `nsubj` (nominal subject) or `dobj` (direct object) and increment the matching temporary counter. Because the comparison is an exact, case-sensitive match, a lowercase "god" is not counted.
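The token check in the nested loop can be pulled out into a small function. This is a minimal sketch, not the notebook's code: to keep it runnable without downloading `en_core_web_lg`, it takes a list of `(text, dep)` pairs standing in for spaCy tokens (`token.text`, `token.dep_`). The function name `count_roles` and the sample labels are hypothetical and hand-written, not real parser output.

```python
from typing import Iterable, Tuple

# Names treated as referring to God, matching the notebook's check
GOD_NAMES = {"God", "Yahweh"}


def count_roles(tokens: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (nominal-subject count, direct-object count) for one verse.

    `tokens` is a sequence of (text, dependency label) pairs, a stand-in
    for iterating over a spaCy Doc and reading token.text and token.dep_.
    """
    ns = do = 0
    for text, dep in tokens:
        if text in GOD_NAMES:  # exact, case-sensitive match
            if dep == "nsubj":
                ns += 1
            elif dep == "dobj":
                do += 1
    return ns, do


# Hand-labeled example (hypothetical labels): "God saw the light."
verse = [("God", "nsubj"), ("saw", "ROOT"), ("the", "det"),
         ("light", "dobj"), (".", "punct")]
print(count_roles(verse))  # (1, 0)
```

Keeping the check in a function that accepts plain pairs makes it easy to test the counting rules separately from the much slower parsing step.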

After iterating through the entire verse, I check whether either temporary counter is greater than zero. If so, I add that counter to the corresponding running total (count_ns or count_do) and add 1 to the corresponding verse count (verses_ns or verses_do).
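The bookkeeping described above can be sketched on its own. This assumes the per-verse results are already available as `(nominal subject, direct object)` count pairs; the `tally` function and the dictionary it returns are illustrative, not part of the notebook. It also shows why a total can exceed its verse count: a verse with two mentions adds 2 to the total but only 1 to the verse count.

```python
from typing import Dict, Iterable, Tuple


def tally(per_verse: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Combine per-verse (ns, do) counts into running totals.

    Each count is added to its total, and a verse is counted once for a
    role if that role appears in it at least once.
    """
    totals = {"count_ns": 0, "verses_ns": 0, "count_do": 0, "verses_do": 0}
    for ns, do in per_verse:
        if ns > 0:
            totals["count_ns"] += ns
            totals["verses_ns"] += 1
        if do > 0:
            totals["count_do"] += do
            totals["verses_do"] += 1
    return totals


# Three hypothetical verses: (2, 0), (0, 1), (1, 1)
print(tally([(2, 0), (0, 1), (1, 1)]))
# {'count_ns': 3, 'verses_ns': 2, 'count_do': 2, 'verses_do': 2}
```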

If an exception is raised, the same code is repeated on the original, uncleaned text (the `t` column). If that also raises an exception, the book, chapter and verse are printed so I can check that verse by hand.
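The fallback can also be written as a loop over the two text columns. This is a sketch under assumptions: `parse_with_fallback` is a hypothetical helper, `parse` stands in for `nlp`, and `toy_parse` is a made-up parser that fails on empty text so the example runs without spaCy. One design difference from the notebook: only the parse is retried, so counting happens once on whichever text parsed. In the notebook, the temporary counters are not reset before the retry.

```python
from typing import Any, Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def parse_with_fallback(row: Mapping[str, Any],
                        parse: Callable[[str], T]) -> Optional[T]:
    """Parse the cleaned text; if that raises, retry on the raw text.

    Returns None, after printing the verse reference, when both fail.
    """
    for column in ("clean_t", "t"):
        try:
            return parse(row[column])
        except Exception:
            continue  # try the next column
    print("Check out this verse:", row["name"], row["c"], row["v"])
    return None


def toy_parse(text: str) -> list:
    # Hypothetical stand-in for nlp(): fails on an empty string
    if not text.strip():
        raise ValueError("empty verse")
    return text.split()


row = {"name": "Example", "c": 1, "v": 1, "clean_t": "", "t": "God said"}
print(parse_with_fallback(row, toy_parse))  # ['God', 'said']
```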

In [30]:
nlp = spacy.load("en_core_web_lg")

verses_ns = 0
count_ns = 0

verses_do = 0
count_do = 0

# Time the loop
start = datetime.now()


for index, row in df.iterrows():
    
    sub_count_ns = 0
    sub_count_do = 0
    
    try:
        
        doc = nlp(row['clean_t'])        
    
        for token in doc:
            
            if token.text in ('God', 'Yahweh'):
                
                if token.dep_=='nsubj':
                    sub_count_ns+=1
                
                elif token.dep_=='dobj':
                    sub_count_do+=1
        
        if sub_count_ns>0:
            
            count_ns+=sub_count_ns
            verses_ns+=1
        
        if sub_count_do>0:
            
            count_do+=sub_count_do
            verses_do+=1
        
            
    except Exception:
        
        try:
            
            doc = nlp(row['t'])        
        
            for token in doc:
                
                if token.text in ('God', 'Yahweh'):
                    
                    if token.dep_=='nsubj':
                        sub_count_ns+=1
                    
                    elif token.dep_=='dobj':
                        sub_count_do+=1
            
            if sub_count_ns>0:
                
                count_ns+=sub_count_ns
                verses_ns+=1
            
            if sub_count_do>0:
                
                count_do+=sub_count_do
                verses_do+=1
                    
        except Exception:
            
            print('Check out this verse:')
            print(row['name'], row['c'], row['v'])        
            print()

# Stop timing
stop = datetime.now()

print('This process took', stop-start)
print()
df.info()

This process took 0:03:42.508037

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31102 entries, 0 to 31101
Data columns (total 9 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     31102 non-null  object
 1   old_new  31102 non-null  object
 2   group    31102 non-null  int64 
 3   id       31102 non-null  int64 
 4   b        31102 non-null  int64 
 5   c        31102 non-null  int64 
 6   v        31102 non-null  int64 
 7   t        31102 non-null  object
 8   clean_t  31102 non-null  object
dtypes: int64(5), object(4)
memory usage: 2.1+ MB


In [31]:
print('God is the nominal subject', count_ns, 'times in', verses_ns, 'verses.')
print('God is the direct object', count_do, 'times in', verses_do, 'verses.')
print('There are', len(df), 'verses in this translation of the Bible and 1,189 chapters.')

God is the nominal subject 3582 times in 3296 verses.
God is the direct object 562 times in 512 verses.
There are 31102 verses in this translation of the Bible and 1,189 chapters.


In [33]:
3296/31102
512/31102

0.10597389235418944

0.01646196386084496

# Results

God is the nominal subject 3,582 times across 3,296 verses in the Bible. This means He is the "doer" in about 11% of its verses. Clearly, He is very active in the Bible. By contrast, He is the direct object only 562 times across 512 verses, just under 2% of the verses. An example would be a sentence like "They forsook Yahweh," where Yahweh receives the action of the verb. A sentence such as "The people of Israel turned to God" would not count, because there "God" is the object of the preposition "to" rather than a direct object.
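The percentages above come from dividing each verse count by the 31,102 verses in this translation. A short sketch using the counts copied from the notebook's output (the `summarize` helper is hypothetical) makes the rounding explicit and adds how many mentions a matching verse contains on average:

```python
TOTAL_VERSES = 31102  # verses in this translation, from len(df)

# (mentions, verses) for each role, copied from the notebook's output
RESULTS = {
    "nominal subject": (3582, 3296),
    "direct object": (562, 512),
}


def summarize(results: dict, total_verses: int) -> list:
    """Format each role's share of verses and mentions per matching verse."""
    lines = []
    for role, (mentions, verses) in results.items():
        share = verses / total_verses
        per_verse = mentions / verses
        lines.append(f"{role}: {share:.1%} of verses, "
                     f"{per_verse:.2f} mentions per matching verse")
    return lines


for line in summarize(RESULTS, TOTAL_VERSES):
    print(line)
# nominal subject: 10.6% of verses, 1.09 mentions per matching verse
# direct object: 1.6% of verses, 1.10 mentions per matching verse
```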

I was actually surprised by this. I expected the number of times God did something and the number of times something was done to God to be close. Instead, there is a huge difference between the two. Almost ironically, the God who should be served, adored and worshiped is putting forth far more effort than His people, at least by this grammatical measure. In other words, He is actively engaging with His people in about 11% of the verses of the Bible, while His people are actively engaging with Him in just under 2%.
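To put a number on the "huge difference": by this grammatical measure, God is the subject roughly six and a half times as often as He is the direct object, whether you count mentions or verses. A quick check with the counts from the output above:

```python
ns_mentions, ns_verses = 3582, 3296  # nominal subject, from the output above
do_mentions, do_verses = 562, 512    # direct object, from the output above

mention_ratio = ns_mentions / do_mentions
verse_ratio = ns_verses / do_verses
print(f"{mention_ratio:.1f}x as many subject mentions, "
      f"{verse_ratio:.1f}x as many subject verses")
# 6.4x as many subject mentions, 6.4x as many subject verses
```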