# Basically Bible Analytics 

## Starting with Basic Metrics

In this notebook, I will look at some basic metrics for the Bible. For instance, it is fairly easy to learn that there are sixty-six books in the Bible, but I don't think I have ever heard anyone share how many chapters or verses it contains. I will explore some of these basic questions.

To do this, I need the Bible text. I obtained it from Kaggle here: https://www.kaggle.com/oswinrh/bible#t_asv.csv. If you follow the link, you will see many available versions, but I decided to use the Bible in Basic English because my goal is to eventually apply text analytics to the text.

In [2]:
import pandas as pd
import sqlite3

In [None]:
bible = pd.read_csv(r'C:\Bible Research\Translations\Bible in Basic English\t_bbe.csv')

### Basic structure of the data

In [3]:
bible.head()

Unnamed: 0,id,b,c,v,t
0,1001001,1,1,1,At the first God made the heaven and the earth.
1,1001002,1,1,2,And the earth was waste and without form; and ...
2,1001003,1,1,3,"And God said, Let there be light: and there wa..."
3,1001004,1,1,4,"And God, looking on the light, saw that it was..."
4,1001005,1,1,5,"Naming the light, Day, and the dark, Night. An..."


In [4]:
bible.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31103 entries, 0 to 31102
Data columns (total 5 columns):
id    31103 non-null int64
b     31103 non-null int64
c     31103 non-null int64
v     31103 non-null int64
t     31103 non-null object
dtypes: int64(4), object(1)
memory usage: 1.1+ MB


Before analyzing the data, I want to save this dataframe in a SQL database. I begin by connecting to the SQLite database I created, *biblesql*.

In [5]:
conn = sqlite3.connect(r"C:\Bible Research\SQL database\biblesql.db")

Next, I store the dataframe *bible* as a SQL table called *bible_bbe*.

In [10]:
bible.to_sql('bible_bbe', conn, if_exists='replace', index=False)

Finally, I query the SQL table I created.

In [11]:
pd.read_sql('select * from bible_bbe where b = 5 limit 5', conn)

Unnamed: 0,id,b,c,v,t
0,5001001,5,1,1,These are the words which Moses said to all Is...
1,5001002,5,1,2,It is eleven days' journey from Horeb by the w...
2,5001003,5,1,3,"Now in the fortieth year, on the first day of ..."
3,5001004,5,1,4,"After he had overcome Sihon, king of the Amori..."
4,5001005,5,1,5,"On the far side of Jordan in the land of Moab,..."
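
The store-then-query round trip above can be tried without touching any files. This is only a minimal sketch: the table name `toy_verses`, the connection name `demo_conn`, and the verse text are made up for illustration, and `":memory:"` stands in for the real database file.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the verse table; the text values are made up.
toy = pd.DataFrame(
    {
        "id": [1001001, 1001002, 5001001],
        "b": [1, 1, 5],
        "c": [1, 1, 1],
        "v": [1, 2, 1],
        "t": ["first verse", "second verse", "another book"],
    }
)

# ":memory:" creates a throwaway database instead of a file on disk.
demo_conn = sqlite3.connect(":memory:")

# if_exists="replace" drops and recreates the table, so reruns are safe.
toy.to_sql("toy_verses", demo_conn, if_exists="replace", index=False)

# A parameterized query keeps the book number out of the SQL string.
book_five = pd.read_sql(
    "select * from toy_verses where b = ? limit 5", demo_conn, params=(5,)
)
print(book_five)

demo_conn.close()
```

Passing the book number through `params` is optional for a notebook like this one, but it avoids rebuilding the query string every time the filter changes.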


### Basic counts

We already know that there are sixty-six books in the Bible, but I'll run some code to confirm. I will then find out how many chapters and verses are in the Bible. I will also find out how many words are in this particular version of the Bible. This obviously changes by version.

In [16]:
pd.read_sql('select count(distinct(b)) as books from bible_bbe', conn)

Unnamed: 0,books
0,66


There are 66 books,

In [32]:
pd.read_sql('select count(c) as chapters from (select distinct(b), c from bible_bbe)', conn)

Unnamed: 0,chapters
0,1189


1,189 chapters, 

In [33]:
pd.read_sql('select count(v) as verses from bible_bbe', conn)

Unnamed: 0,verses
0,31103


and 31,103 verses in the Bible in Basic English.
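
The same three counts can also be taken directly from a dataframe, which is a handy cross-check on the SQL. A small sketch on made-up data (the `toy` frame below is an assumption, not the real table):

```python
import pandas as pd

# Toy verse table: 2 books, 3 chapters, 5 verses (made-up values).
toy = pd.DataFrame(
    {
        "b": [1, 1, 1, 2, 2],
        "c": [1, 1, 2, 1, 1],
        "v": [1, 2, 1, 1, 2],
    }
)

n_books = toy["b"].nunique()
# A chapter is identified by the (book, chapter) pair, not by c alone.
n_chapters = len(toy[["b", "c"]].drop_duplicates())
n_verses = len(toy)

print(n_books, n_chapters, n_verses)
```

Counting distinct values of `c` alone would undercount, because every book restarts its chapter numbering at 1. That is why the SQL version above wraps a `distinct` over both columns in a subquery.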

Getting a word count is a little more complex. This site was helpful: https://www.geeksforgeeks.org/python-program-to-count-words-in-a-sentence

In [34]:
# Count the whitespace-separated words in each verse
verse_word_counts = []

for index, row in bible.iterrows():
    verse_word_counts.append(len(row['t'].split()))

print('There are', sum(verse_word_counts), 'words in this translation.')

There are 840357 words in this translation.
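
The row-by-row loop works, but pandas string methods can probably do the same count in one pass. A sketch on a made-up three-verse frame (the `toy` data is an assumption):

```python
import pandas as pd

toy = pd.DataFrame({"t": ["one two three", "four five", "six"]})

# str.split() with no argument splits on any run of whitespace,
# and str.len() then counts the pieces in each resulting list.
words_per_verse = toy["t"].str.split().str.len()
total_words = int(words_per_verse.sum())

print(total_words)
```

Both approaches define a "word" as anything between whitespace, so punctuation attached to a word is counted as part of it.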


My next question is: how long would it take the average person to read the Bible all the way through? Everyone reads at a different pace, but a Google search suggests that the average person can read 300 words a minute: https://www.google.com/search?q=how+many+words+can+the+average+person+read+per+minute&rlz=1C1CHBF_enUS855US855&oq=how+many+words+can+the+average+per&aqs=chrome.0.0j69i57j0l6.8896j0j7&sourceid=chrome&ie=UTF-8

If we divide 840357 by 300, this should tell us how many minutes it would take an average person to read the Bible.

In [35]:
840357/300

2801.19

In [36]:
# 300 words per minute * 60 minutes = 18,000 words per hour
print('Which is', 840357/18000, 'hours')

Which is 46.6865 hours


This is roughly 46 hours and 40 minutes. This site (http://www.euxton.com/bible.htm) says that it takes 70 hours and 40 minutes to read the Bible at "pulpit rate." I assume reading aloud at that rate is much slower than the average person reads silently.
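
To avoid retyping the division for every reading speed, the arithmetic can be wrapped in a small helper. This is a sketch; the function name `reading_time` is my own, and the 300 words-per-minute default is the assumed rate quoted above.

```python
def reading_time(words, words_per_minute=300):
    """Return (hours, minutes) needed to read `words` at a steady pace."""
    # Round to whole minutes first so the minutes part never reaches 60.
    total_minutes = round(words / words_per_minute)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes


hours, minutes = reading_time(840357)
print(hours, 'hours and', minutes, 'minutes')
```

Changing `words_per_minute` makes it easy to compare a fast silent reader with a slower read-aloud pace.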

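Next, how many words are in each book of the Bible, what proportion of the total does each book account for, and how long would each book take an average person to read? Note that the per-book estimates below use a more conservative 250 words a minute rather than the 300 used above.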

In [37]:
b = []        # book numbers
words = []    # total words per book
chapter = []  # number of chapters per book

for i in bible.b.unique():
    b.append(i)

    # Count the words in every verse of book i
    book = []
    for index, row in bible[bible.b == i].iterrows():
        book.append(len(row['t'].split()))

    words.append(sum(book))
    chapter.append(bible[bible.b == i].c.nunique())

In [38]:
books = pd.DataFrame()

books['b'] = b
books['c'] = chapter
books['words'] = words
books['proportion'] = books.words/sum(books.words)
books['minutes'] = books.words/250
books['hours'] = books.minutes/60

In [39]:
books.head()

Unnamed: 0,b,c,words,proportion,minutes,hours
0,1,50,38244,0.045509,152.976,2.5496
1,2,40,32329,0.038471,129.316,2.155267
2,3,27,24939,0.029677,99.756,1.6626
3,4,36,32357,0.038504,129.428,2.157133
4,5,34,29250,0.034807,117.0,1.95
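
The nested loops that build this table could likely be replaced by a single `groupby`. Below is a sketch on a made-up frame (the `toy` data and the `WORDS_PER_MINUTE` constant are assumptions), using the same 250 words a minute as the table above.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "b": [1, 1, 1, 2, 2],
        "c": [1, 1, 2, 1, 1],
        "t": ["a b c", "d e", "f", "g h", "i j k l"],
    }
)

WORDS_PER_MINUTE = 250  # same rate as the per-book table above

per_book = (
    # Add a per-verse word count, then aggregate once per book.
    toy.assign(words=toy["t"].str.split().str.len())
    .groupby("b", as_index=False)
    .agg(c=("c", "nunique"), words=("words", "sum"))
)
per_book["proportion"] = per_book["words"] / per_book["words"].sum()
per_book["minutes"] = per_book["words"] / WORDS_PER_MINUTE
per_book["hours"] = per_book["minutes"] / 60

print(per_book)
```

Keeping the reading rate in one named constant also makes it harder for two different rates to slip into the same analysis.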


This seems like a good time to store this dataframe as a SQL table.

In [55]:
books.to_sql('books', conn, if_exists='replace', index=False)

This is interesting, but at this point we are referring to the books of the Bible by their order rather than their names. I'm going to read in a key to attach names, which will be more insightful. I will also save this dataframe as a SQL table.

In [41]:
book_key = pd.read_csv(r'C:\Bible Research\key_english.csv')
book_key.to_sql('book_key', conn, if_exists='replace', index=False)

In [43]:
pd.read_sql('select * from book_key limit 5', conn)

Unnamed: 0,b,name,old_new,group
0,1,Genesis,OT,1
1,2,Exodus,OT,1
2,3,Leviticus,OT,1
3,4,Numbers,OT,1
4,5,Deuteronomy,OT,1


This dataframe contains the book order, which will allow me to tie this information to the dataframe I already have. It also contains the book name, the testament each book belongs to, and a group variable. The group variable refers to the type of book. For instance, Genesis is part of the Law, so it's in group 1. Jude is an epistle, so it's in group 7.

Now, I will merge the two dataframes and sort to see which books are the longest and shortest.

In [77]:
merged = pd.read_sql('select k.*, e.c, e.words, e.proportion, e.minutes, e.hours from books AS e inner join book_key AS k on e.b = k.b order by e.proportion desc', conn)

In [78]:
merged.to_sql('bible_metrics', conn, if_exists='replace', index=False)

In [87]:
pd.read_sql('select * from bible_metrics', conn)

Unnamed: 0,b,name,old_new,group,c,words,proportion,minutes,hours
0,19,Psalms,OT,3,150,48740,0.057999,194.960,3.249333
1,24,Jeremiah,OT,4,52,46304,0.055100,185.216,3.086933
2,26,Ezekiel,OT,4,48,41366,0.049224,165.464,2.757733
3,23,Isaiah,OT,4,66,40078,0.047692,160.312,2.671867
4,1,Genesis,OT,1,50,38244,0.045509,152.976,2.549600
5,4,Numbers,OT,1,36,32357,0.038504,129.428,2.157133
6,2,Exodus,OT,1,40,32329,0.038471,129.316,2.155267
7,5,Deuteronomy,OT,1,34,29250,0.034807,117.000,1.950000
8,42,Luke,NT,5,24,28249,0.033615,112.996,1.883267
9,14,2 Chronicles,OT,2,36,27122,0.032274,108.488,1.808133
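
The SQL inner join has a direct pandas counterpart in `DataFrame.merge`. A sketch with two hypothetical miniature tables (`toy_books` and `toy_key`, with invented names and word counts):

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
toy_books = pd.DataFrame({"b": [1, 2, 3], "words": [300, 500, 200]})
toy_key = pd.DataFrame(
    {"b": [1, 2, 3], "name": ["Book A", "Book B", "Book C"]}
)
toy_books["proportion"] = toy_books["words"] / toy_books["words"].sum()

# how="inner" keeps only book numbers present in both tables;
# validate="one_to_one" raises if either side repeats a book number.
toy_merged = (
    toy_key.merge(toy_books, on="b", how="inner", validate="one_to_one")
    .sort_values("proportion", ascending=False)
    .reset_index(drop=True)
)

print(toy_merged)
```

The `validate` argument is a cheap guard: if the key file ever listed a book twice, the merge would fail loudly instead of silently duplicating rows.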


It looks like the shortest book of the Bible is 2 John, which takes only a little over a minute to read. I'm not surprised to see Psalms in first place, but I thought Genesis would be second.

In [86]:
pd.read_sql('select sum(proportion) as proportion from bible_metrics where b in (1,2,3,4,5)', conn)

Unnamed: 0,proportion
0,0.186967


Even though Genesis is only the fifth-longest book of the Bible, the first five books, the Torah, account for about 19% of the text. When you finish Deuteronomy, you're nearly one fifth of the way through.
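
That share can be double-checked by hand from numbers already shown in this notebook: the word counts for books 1 through 5 and the overall total. The variable names below are my own.

```python
# Word counts for books 1-5, copied from the per-book table above.
torah_words = [38244, 32329, 24939, 32357, 29250]
total_words = 840357  # whole-Bible word count reported earlier

# Running share of the Bible completed after each of the five books.
running = 0
for book_number, words in enumerate(torah_words, start=1):
    running += words
    print(book_number, round(running / total_words, 4))

torah_share = sum(torah_words) / total_words
```

The final running value agrees with the proportion returned by the SQL query.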

Lastly, I want to see the tables I created in my SQL database.

In [91]:
cursor = conn.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor.fetchall())

[('bible_bbe',), ('book_key',), ('books',), ('bible_metrics',)]
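
The notebook as shown never closes its connection. One way to list tables and then release the database cleanly is sketched below, using a throwaway in-memory database with made-up table names (`demo_a`, `demo_b`).

```python
import sqlite3
from contextlib import closing

# closing() guarantees the connection is closed when the block exits.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute("create table demo_a (x integer)")
    demo_conn.execute("create table demo_b (y text)")
    rows = demo_conn.execute(
        "select name from sqlite_master where type = 'table' order by name"
    ).fetchall()

# Each row is a 1-tuple, so unpack it to get plain strings.
table_names = [name for (name,) in rows]
print(table_names)
```

Unpacking the 1-tuples gives a flat list of names, which is a little easier to read than the raw `fetchall()` output.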
