# Motivation

Several years ago, a work friend encouraged me to read the entire Bible from cover to cover. Even though I was well into my thirties and was raised in a Christian home, I'd never done this. There were times I tried, but I usually sputtered out around Exodus or Leviticus.

But John was so excited about the Bible that every day he would walk through the door and say something like, "I just read Judges. It was amazing!" As time went by, he continued to share his progress and excitement, and my desire to read the Bible grew. One day I took the plunge, and I have been reading ever since.

Years later, I've read or listened to the entire Bible every year except one. It is the most exciting book I've ever read, and I would love to inspire others to take the same plunge. To do this, I'm going to start with some basic analytics and then dig deeper into everything I can think of. I hope something I share will spark the same desire in others.

I will use the World English Bible translation as my data source. I have a limited number of translations to choose from, and I want to do text analytics using everyday English. I downloaded the text from Kaggle. You can find it here: https://www.kaggle.com/oswinrh/bible#t_asv.csv

# Set up
In this cell I import the necessary packages for this study, set my project folder as the working directory for easier navigation, remove row and column display limits, and tell Jupyter to display all of the output from each cell, not just the last.

In [None]:
import os
import pandas as pd
import numpy as np
import sqlite3
import spacy

# Set project folder as directory
os.chdir(r'C:/Users/david/Projects/Bible Analytics')

# Remove row and column limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

# Display all output from each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Getting counts

One of the most basic analytics we can do with the Bible text is a counts analysis. In this next section, I will pull the data from our SQL database, tokenize the text using spaCy, and count the number of non-punctuation tokens in each verse. I will then use these counts to generate some basic insights. 

## Access the data from our SQL database

In [None]:
database = 'Data/SQL database.db'

In [None]:
conn = sqlite3.connect(database)
 
df = pd.read_sql_query('SELECT * FROM t_web', conn)
 
conn.close()

In [None]:
df.info()
df.head()

## NLP

Now that I've pulled the data, I'm going to process it using spaCy. I'll wrap the code in a FOR loop so that I can process each row of data using the same approach. 

In the first line of code I define a language object called nlp. When defining nlp, I load a small, trained English-language pipeline called "en_core_web_sm". I only need this object to tokenize the data (split the text into individual words, numbers, punctuation marks, etc.) so that I can get a word count for each verse. As such, I won't load anything bigger than this.

Next, I create an empty list into which the word count for each row of text can be inserted. I'll name this word_count. I'm also going to time this process, so we can see how quickly this happens. Feel free to ignore the *datetime, start, stop* stuff.

After this, I use a FOR loop to iterate through the rows of data. This code is pretty typical and you can find tons of resources with more detail about iterating through dataframes. Within the FOR loop, I'm defining "doc" by applying our nlp object to each row of text data. Then, I'm creating a variable called "count" and setting its initial value to zero. For each row, count will reset to zero before the text data is processed.

Once this is done, I'm creating a nested FOR loop which will iterate through every token in our doc object. I'm not interested in punctuation, so I use an IF statement to tell the code to ignore punctuation tokens. Then I increase the count variable by 1 for each non-punctuation token. This continues until all non-punctuation tokens have been accounted for.

Once this happens, the nested FOR loop will end and the final count for that row of data will be appended to our predefined list, word_count. This continues until all of the rows of data have been analyzed. The last thing I'll do before analyzing this data is insert all of the word counts into our dataframe.

Shew... take a breath. We're done with the technical stuff for now!

In [None]:
nlp = spacy.load("en_core_web_sm")

word_count = []

# Ignore this
from datetime import datetime
start = datetime.now()
# Stop ignoring

for index, row in df.iterrows():
    
    doc = nlp(row['clean_t'])
    
    count = 0
    
    for token in doc:
        if not token.is_punct:
            
            count+=1
            
    word_count.append(count)
    
df['word_count'] = pd.Series(word_count)

# Ignore this
stop = datetime.now()

print('This process took', stop-start)
print()
df.info()
df.head()

Yay! If I were to physically count each word in the Bible, it would take months and there would be a high likelihood of mistakes. Thanks to technology, I'm able to accomplish the same task in just over three minutes with no chance of mistakes... as long as I got everything right. 

Looking at the first five rows of our data, the counts appear to be in order.
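
If the loop above feels slow, spaCy's `nlp.pipe` can process texts in batches instead of one call per row. The sketch below is one possible variant, not the approach used in this notebook. It assumes a blank English pipeline (`spacy.blank("en")`, tokenizer only) is enough for counting tokens, and the `count_words` helper and the two sample verses are made up for illustration.

```python
import spacy

# Assumption: a tokenizer-only pipeline is sufficient for counting tokens
nlp = spacy.blank("en")

def count_words(texts):
    """Return the number of non-punctuation tokens in each text."""
    counts = []
    # nlp.pipe streams the texts through the pipeline in batches
    for doc in nlp.pipe(texts):
        counts.append(sum(1 for token in doc if not token.is_punct))
    return counts

# Hypothetical sample input standing in for df['clean_t']
sample = ["Jesus wept.", "Rejoice always!"]
print(count_words(sample))
```

With the real data, something like `df['word_count'] = count_words(df['clean_t'])` could stand in for the loop, though I haven't timed it against the original.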

# Let the analysis begin!!

In [None]:
books = len(df['b'].unique())
chapters = len(df[['b', 'c']].drop_duplicates())
verses = len(df[['b', 'c', 'v']].drop_duplicates())
words = df['word_count'].sum()

print('There are {} books in the Bible, {} chapters in those books, {} verses in those chapters, and {} words in those verses.'.format(books, chapters, verses, words))

The number of books and chapters shouldn't change from one translation to the next, but we already know the number of verses will be lower in this particular translation and the number of words will almost certainly be different.

## What's the shortest verse in the Bible?

Notice that I filtered to verses whose count exceeds 0. In this translation, some verses have no words because the translating team determined that those verses were not in the earliest manuscripts.

In [None]:
df[df['word_count']>0][['name', 'c', 'v', 'clean_t', 'word_count']].sort_values(['word_count']).head()

We have a three-way tie. There is the classic John 11:35 - "Jesus wept", there is also 1 Thessalonians 5:16 - "Rejoice always", and my new favorite, Job 3:2 - "Job answered." 

# What are the longest books in the Bible? Shortest?

In [None]:
temp = df[['name', 'word_count']].groupby(['name'])['word_count'].sum()

book_count = pd.DataFrame(temp).sort_values(['word_count'], ascending=False).reset_index()

book_count.head()
book_count.tail()


Surprisingly, Jeremiah is the longest book in this translation of the Bible. Then Psalms, Ezekiel, Genesis and Isaiah. 3 John is the shortest, followed by 2 John, Philemon, Jude and Obadiah.

# How long does it take to read the Bible?
According to scholarwithin.com, the average adult can silently read 238 words per minute and read aloud 183.

https://scholarwithin.com/average-reading-speed#:~:text=Adult%20Average%20Reading%20Speed,-It%20has%20been&text=Silent%20reading%20adults%20average%20238,average%20183%20words%20per%20minute.

## Calculating

In [None]:
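# Total minutes to read 769,817 words silently at 238 words per minute
minutes = 769817/238

# Days needed when reading for 60, 30, 15, or 10 minutes per day
hour_days = minutes/60
half_hour_days = minutes/30
quarter_hour_days = minutes/15
ten_minute_days = minutes/10

# Minutes per day needed to finish in one year
in_a_year = minutes/365

round(hour_days)
round(half_hour_days)
round(quarter_hour_days)
round(ten_minute_days)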

## What it takes
Based on the average number of words an adult can silently read per minute, it would take most adults 54 hours to silently read the entire Bible. That's 108 days if you read for half an hour per day, 216 days if you read for fifteen minutes per day, and 323 days if you read for ten minutes. So, the average adult can read the entire Bible in just under a year by silently reading for around 10 minutes per day. 
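
If you'd like to try your own reading speed or daily time budget, the same arithmetic can be wrapped in a small function. This is just a sketch: `days_to_finish` is a name I made up for illustration, and the word count and reading speed are the figures used above.

```python
def days_to_finish(total_words, words_per_minute, minutes_per_day):
    """Days needed to read total_words at the given pace."""
    total_minutes = total_words / words_per_minute
    return total_minutes / minutes_per_day

TOTAL_WORDS = 769817   # word count used in the calculation above
SILENT_WPM = 238       # silent reading speed cited from scholarwithin.com

# Try a few daily reading budgets, in minutes
for minutes_per_day in (60, 30, 15, 10):
    days = days_to_finish(TOTAL_WORDS, SILENT_WPM, minutes_per_day)
    print(minutes_per_day, 'minutes per day:', round(days), 'days')
```

Swapping in a different `words_per_minute` shows how much a slower or faster reader's schedule would shift.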

Of course, this requires a ton of discipline, which means some people will be better at this approach than others. Personally, I've never had much luck setting aside ten minutes a day to read. I either read for way longer or completely miss a day, then two, then a week... well, you get the point. Some of us are just better at this than others.

So what about the rest of us?

# You could always listen to the audio Bible
Most of the time, I use the audio function on my phone to listen to the Bible while taking a morning walk. This seems to work best for me. So, the question is, how long does it take to listen to the entire Bible? 

According to scholarwithin.com, the average person can read aloud 183 words per minute. I will assume this is a good estimate for the person reading my preferred translation of the Bible. However, before jumping straight into the calculations, we should acknowledge a couple of other factors that may interfere:
1. My reader starts each book and chapter with a short intro. This may not seem like much, but remember there are 66 books in the Bible and 1,189 chapters. This adds up. 
2. Sometimes, I want to listen to a passage again for clarification or because I've never thought about it before.

We can't do anything about the second factor, but we can adjust for the first. My reader says around ten words to introduce each book and something like, "Genesis chapter one" before reading each chapter. That's ten additional words for each book and three additional words per chapter. 

In [None]:
3*1189 + 10*66

## Calculating time

In [None]:
minutes = (769817 + 4227)/183

# Listening for 30 minutes each weekday
days1 = minutes/30
weeks1 = days1/5

# Listening for 15 minutes every day
days2 = minutes/15
weeks2 = days2/7

# Listening for 20 minutes each weekday
days3 = minutes/20
weeks3 = days3/5

print('You could listen to the entire Bible in', 
      round(weeks1), 'weeks if you listened for thirty minutes each weekday,', 
      round(weeks2), 'weeks if you listened for 15 minutes every single day, and',
      round(weeks3), 'weeks if you listened for 20 minutes each weekday.')

## What it takes

I usually go for a half-hour walk in the mornings, so based on the above calculations it would take around 28 weeks to listen to the entire Bible during my walks, assuming I never miss a day and only do this on weekdays. At this rate, if I start listening to Genesis on January 1st, I would finish Revelation halfway through July.
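
As a quick check on that finishing date, the 28 weeks can be added to a start date with Python's `datetime` module. This is a rough sketch: `finish_date` is a made-up helper, the year is an arbitrary assumption, and it simply counts whole calendar weeks without worrying about which weekday January 1st falls on.

```python
from datetime import date, timedelta

def finish_date(start, weeks):
    """Date reached after the given number of whole weeks."""
    return start + timedelta(weeks=weeks)

# Assumption: 2023 is an arbitrary example year
start = date(2023, 1, 1)
print(finish_date(start, 28))
```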

# Other possibilities

Here's another possibility. We each take daily showers... ish. Well, let's just assume we do. And let's say this basic standard of hygiene takes fifteen minutes a day. If you have a shower speaker, you could listen to the entire Bible in 40 weeks.

Perhaps you still drive to work each morning. Let's assume an average of 20 minutes and that you work five days a week. You could have the Bible finished in 42 weeks by listening on the drive to work, or in half that time by listening both ways.

There's also time spent doing daily chores: washing dishes, doing laundry, cleaning the living room. These are fairly mindless tasks, and with the technology we have today, you no longer have to focus all of your attention on reading the Bible. You can pop in some earbuds and work away.

I know this may not be as simple as it sounds. Depending on where you are in life, you may have a toddler who won't allow you to properly use the bathroom without barging in or a newborn who cries while you're folding clothes. There are those among us whose every free moment is spoken for, and I get that. There are also those among us who feel like listening to some old man read through the Bible is just the worst. If so, you should check out the Streetlights app. It's pretty amazing!

The point is, for most of us there are many opportunities to listen to God's word every day without setting aside ten minutes each morning to read without distraction. I hope reading this will inspire you to think about the possibilities.