# Basically Bible Analytics 

## Starting with Basic Metrics

In this notebook, I will look at some basic metrics for the Bible. For instance, it is easy to learn that there are sixty-six books in the Bible, but I don't think I have ever heard anyone mention how many chapters or verses it contains. I will explore some of these basic questions.

To do this, I need the Bible text, which I obtained from Kaggle: https://www.kaggle.com/oswinrh/bible#t_asv.csv. The dataset offers many versions, but I chose the Bible in Basic English because my goal is to eventually apply text analytics to the text.

In [1]:
import pandas as pd
bible = pd.read_csv(r'C:\Bible Research\Translations\Bible in Basic English\t_bbe.csv')

### Basic structure of the data

In [2]:
bible.head()

|   | id | b | c | v | t |
|---|---|---|---|---|---|
| 0 | 1001001 | 1 | 1 | 1 | At the first God made the heaven and the earth. |
| 1 | 1001002 | 1 | 1 | 2 | And the earth was waste and without form; and ... |
| 2 | 1001003 | 1 | 1 | 3 | And God said, Let there be light: and there wa... |
| 3 | 1001004 | 1 | 1 | 4 | And God, looking on the light, saw that it was... |
| 4 | 1001005 | 1 | 1 | 5 | Naming the light, Day, and the dark, Night. An... |


In [3]:
bible.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31103 entries, 0 to 31102
Data columns (total 5 columns):
id    31103 non-null int64
b     31103 non-null int64
c     31103 non-null int64
v     31103 non-null int64
t     31103 non-null object
dtypes: int64(4), object(1)
memory usage: 1.2+ MB


### Basic counts

We already know that there are sixty-six books in the Bible, but I'll run some code to confirm. I will then find out how many chapters and verses are in the Bible, and how many words are in this particular version. The word count naturally varies from one translation to another.

In [4]:
# Chapter numbers restart in each book, so count distinct (book, chapter) pairs
print('There are', bible.b.nunique(), 'books,', bible.groupby(['b', 'c']).ngroups, 'chapters, and', bible.id.nunique(), 'verses in this translation.')

There are 66 books, 1189 chapters, and 31103 verses in this translation.
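A minimal sketch of the same counts on a tiny made-up verse table (the toy values are hypothetical, not from the Kaggle file). Because chapter numbers restart in every book, a chapter is identified by the `(b, c)` pair, so `groupby(['b', 'c']).ngroups` counts chapters directly, and `groupby('b')['c'].nunique()` gives the number of chapters in each book.

```python
import pandas as pd

# Hypothetical stand-in for the verse table: one row per verse,
# with book (b), chapter (c) and verse (v) numbers.
verses = pd.DataFrame({
    'b': [1, 1, 1, 1, 2, 2],
    'c': [1, 1, 2, 2, 1, 1],
    'v': [1, 2, 1, 2, 1, 2],
})

n_books = verses['b'].nunique()
# Chapter 1 of book 1 and chapter 1 of book 2 are different chapters,
# so group on both columns and count the groups
n_chapters = verses.groupby(['b', 'c']).ngroups
n_verses = len(verses)  # one row per verse

# Number of distinct chapters within each book
chapters_per_book = verses.groupby('b')['c'].nunique()

print(n_books, n_chapters, n_verses)  # 2 3 6
print(chapters_per_book)
```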


Getting a word count is a little more complex. This site was helpful: https://www.geeksforgeeks.org/python-program-to-count-words-in-a-sentence

In [5]:
# Split each verse on whitespace, count the pieces, then add up all verses
word_counts = bible['t'].str.split().str.len()

print('There are', word_counts.sum(), 'words in this translation.')

There are 840357 words in this translation.
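To see why splitting on whitespace gives a sensible word count, here is a small check on two sentences modeled on the verses above (the extra double space and trailing space are added for illustration). `str.split()` with no argument splits on runs of whitespace, and punctuation stays attached to its word, so "light:" still counts as one word. The pandas `.str.split().str.len()` chain should give the same counts as a per-row loop without iterating row by row.

```python
import pandas as pd

# Two illustrative sentences; the second has a double space and trailing whitespace
texts = pd.Series([
    'At the first God made the heaven and the earth.',
    'And God said, Let there be light:  and there was light. ',
])

# Row-by-row approach: split each string on whitespace and count the pieces
loop_counts = [len(t.split()) for t in texts]

# Vectorized equivalent: with no pattern, str.split() also splits on runs of whitespace
vector_counts = texts.str.split().str.len()

print(loop_counts)             # [10, 11]
print(vector_counts.tolist())  # [10, 11]
```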


My next question is: how long would it take the average person to read the Bible all the way through? Everyone reads at a different pace, but according to this search, the average person can read about 300 words per minute: https://www.google.com/search?q=how+many+words+can+the+average+person+read+per+minute&rlz=1C1CHBF_enUS855US855&oq=how+many+words+can+the+average+per&aqs=chrome.0.0j69i57j0l6.8896j0j7&sourceid=chrome&ie=UTF-8

Dividing 840,357 words by 300 words per minute gives the number of minutes it would take an average person to read the Bible.

In [29]:
840357/300

2801.19

In [31]:
# 300 words per minute x 60 minutes = 18,000 words per hour
print('Which is', 840357/18000, 'hours')

Which is 46.6865 hours


This is about 46 hours and 41 minutes. This site (http://www.euxton.com/bible.htm) says that it takes 70 hours and 40 minutes to read the Bible at "pulpit rate." I assume this is much slower than the pace at which the average person reads silently.
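Converting decimal hours such as 46.6865 into hours and minutes is easy to get slightly wrong by eye. Here is a small helper, hypothetical and not part of the original notebook, that rounds to the nearest whole minute first and then uses `divmod`. At 300 words per minute, 840,357 words come to 2,801.19 minutes, which is 46 hours and about 41 minutes.

```python
def reading_time(words, wpm=300):
    """Return (hours, minutes) needed to read `words` at `wpm` words per minute."""
    # Round to whole minutes first so the result can never show "60 minutes"
    total_minutes = round(words / wpm)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes

print(reading_time(840357))           # (46, 41)
print(reading_time(840357, wpm=250))  # (56, 1)
```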

Next, how many words are in each book of the Bible, what proportion does each book account for, and how long would each book take an average person to read?

In [34]:
b = []        # book numbers, in canonical order
words = []    # total words in each book
chapter = []  # number of chapters in each book

for i in bible.b.unique():
    book_verses = bible[bible.b == i]
    b.append(i)
    # Count whitespace-separated words in every verse of the book, then total them
    words.append(book_verses['t'].str.split().str.len().sum())
    chapter.append(book_verses.c.nunique())

In [37]:
books = pd.DataFrame()

books['b'] = b
books['c'] = chapter
books['words'] = words
books['proportion'] = books.words/books.words.sum()
# Note: this uses 250 words per minute, slower than the 300 wpm used for the whole-Bible estimate
books['minutes'] = books.words/250
books['hours'] = books.minutes/60

In [38]:
books.head()

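|   | b | c | words | proportion | minutes | hours |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 38244 | 0.045509 | 152.976 | 2.5496 |
| 1 | 2 | 40 | 32329 | 0.038471 | 129.316 | 2.155267 |
| 2 | 3 | 27 | 24939 | 0.029677 | 99.756 | 1.6626 |
| 3 | 4 | 36 | 32357 | 0.038504 | 129.428 | 2.157133 |
| 4 | 5 | 34 | 29250 | 0.034807 | 117.0 | 1.95 |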
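The per-book loop above can also be written as a single `groupby` with named aggregation. This is a sketch on a made-up four-verse table, assuming pandas 0.25 or later (for named aggregation). The reading speed `WPM` is a hypothetical parameter here, so the per-book and whole-Bible estimates can share one value.

```python
import pandas as pd

# Hypothetical verse table: book (b), chapter (c), verse text (t)
verses = pd.DataFrame({
    'b': [1, 1, 1, 2],
    'c': [1, 1, 2, 1],
    't': ['one two three', 'four five', 'six', 'seven eight nine ten'],
})

WPM = 300  # assumed reading speed, words per minute

# Words per verse, then one row per book with its chapter count and word total
verses['words'] = verses['t'].str.split().str.len()
books = (
    verses.groupby('b')
          .agg(c=('c', 'nunique'), words=('words', 'sum'))
          .reset_index()
)
books['proportion'] = books['words'] / books['words'].sum()
books['minutes'] = books['words'] / WPM
books['hours'] = books['minutes'] / 60

print(books)
```

`groupby` sorts by `b` by default, so the rows come out in canonical book order, matching the loop.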


This is interesting, but so far we are referring to the books of the Bible by their order rather than their names. I'm going to read in a key that attaches the names, which will make the results easier to interpret.

In [43]:
book_key = pd.read_csv(r'C:\Bible Research\key_english.csv')
book_key

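|   | b | name | old_new | group |
|---|---|---|---|---|
| 0 | 1 | Genesis | OT | 1 |
| 1 | 2 | Exodus | OT | 1 |
| 2 | 3 | Leviticus | OT | 1 |
| 3 | 4 | Numbers | OT | 1 |
| 4 | 5 | Deuteronomy | OT | 1 |
| ... | ... | ... | ... | ... |
| 61 | 62 | 1 John | NT | 7 |
| 62 | 63 | 2 John | NT | 7 |
| 63 | 64 | 3 John | NT | 7 |
| 64 | 65 | Jude | NT | 7 |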


This dataframe contains the book order (`b`), which lets me join it to the dataframe I already have. It also contains each book's name, the testament it belongs to (`old_new`), and a `group` variable that indicates what type of book it is. For instance, Genesis is part of the Law, so it is in group 1; Jude is an epistle, so it is in group 7.

Now, I will merge the two dataframes and sort to see which books are the longest and shortest.

In [44]:
books = book_key.merge(books, how='inner', on='b')
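An inner merge silently drops any book that is missing from either table. One way to check before merging, sketched on small made-up tables: `validate='one_to_one'` makes pandas raise a `MergeError` if `b` is duplicated on either side, and `indicator=True` adds a `_merge` column showing whether each row came from the left table, the right table, or both.

```python
import pandas as pd

# Hypothetical per-book statistics and a name key, both keyed on book number b
stats = pd.DataFrame({'b': [1, 2, 3], 'words': [100, 80, 60]})
key = pd.DataFrame({'b': [1, 2], 'name': ['Genesis', 'Exodus']})

# Outer merge keeps every row; _merge records where each row was found
check = key.merge(stats, how='outer', on='b', validate='one_to_one', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['b', '_merge']])  # book 3 appears only in stats ('right_only')

# The inner merge keeps only books present in both tables
merged = key.merge(stats, how='inner', on='b')
```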

In [46]:
books.sort_values('proportion')

|   | b | name | old_new | group | c | words | proportion | minutes | hours |
|---|---|---|---|---|---|---|---|---|---|
| 62 | 63 | 2 John | NT | 7 | 1 | 343 | 0.000408 | 1.372 | 0.022867 |
| 63 | 64 | 3 John | NT | 7 | 1 | 385 | 0.000458 | 1.540 | 0.025667 |
| 56 | 57 | Philemon | NT | 7 | 1 | 490 | 0.000583 | 1.960 | 0.032667 |
| 30 | 31 | Obadiah | OT | 4 | 1 | 689 | 0.000820 | 2.756 | 0.045933 |
| 64 | 65 | Jude | NT | 7 | 1 | 733 | 0.000872 | 2.932 | 0.048867 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0 | 1 | Genesis | OT | 1 | 50 | 38244 | 0.045509 | 152.976 | 2.549600 |
| 22 | 23 | Isaiah | OT | 4 | 66 | 40078 | 0.047692 | 160.312 | 2.671867 |
| 25 | 26 | Ezekiel | OT | 4 | 48 | 41366 | 0.049224 | 165.464 | 2.757733 |
| 23 | 24 | Jeremiah | OT | 4 | 52 | 46304 | 0.055100 | 185.216 | 3.086933 |


Wow! The shortest book of the Bible is 2 John, which takes only a little over a minute to read. I'm not surprised to see Psalms in first place, but I actually thought Genesis would be second. It's not even close!
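Sorting the whole table works, but `nsmallest` and `nlargest` return just the extremes. A sketch using five rows copied from the sorted output above (names and word counts only, so "longest" here means longest among these five):

```python
import pandas as pd

# Five rows taken from the sorted output above
books = pd.DataFrame({
    'name': ['2 John', 'Philemon', 'Jude', 'Genesis', 'Jeremiah'],
    'words': [343, 490, 733, 38244, 46304],
})

shortest = books.nsmallest(2, 'words')  # two rows with the fewest words
longest = books.nlargest(2, 'words')    # two rows with the most words

print(shortest['name'].tolist())  # ['2 John', 'Philemon']
print(longest['name'].tolist())   # ['Jeremiah', 'Genesis']
```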

In [49]:
round(books.loc[books.group == 1, 'proportion'].sum(), 2)

0.19

Even though Genesis is only the fifth-longest book of the Bible, the first five books (the Law) account for 19% of the Bible. When you finish Deuteronomy, you're about a fifth of the way through.
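The same idea extends to every group at once, and a cumulative sum in book order shows how far through the text a reader is after each book. A sketch on made-up proportions for six books in two groups (the real table has 66 books):

```python
import pandas as pd

# Hypothetical proportions for six books in canonical order, with group labels
books = pd.DataFrame({
    'b': [1, 2, 3, 4, 5, 6],
    'group': [1, 1, 1, 2, 2, 2],
    'proportion': [0.25, 0.15, 0.10, 0.20, 0.20, 0.10],
})

# Share of the whole text contributed by each group
share_by_group = books.groupby('group')['proportion'].sum()

# Running share in reading order: fraction of the text finished after each book
books = books.sort_values('b')
books['cumulative'] = books['proportion'].cumsum()

print(share_by_group.round(2))
print(books[['b', 'cumulative']])
```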

This concludes the basic analytics of the Bible. 