# Motivation

I want to prepare the data for analysis and store it in a SQL database where it is easy to access.

# Set up

This is my typical setup. I import the packages I will use, set my project directory, remove the display limits on columns and rows, and allow Jupyter to display all of the output from each cell.

In [None]:
import os
import pandas as pd
import numpy as np
import sqlite3
import re

# Set project folder as directory
os.chdir(r'C:/Users/david/Projects/Bible Analytics')

# Remove row and column limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

# Display all output from each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Pulling the data

I only have a few translations of the Bible to choose from, but I think the text from the World English Bible translation will work nicely as my data source. I am going to do text analytics, so using everyday English will make this easier. I downloaded the text from Kaggle. You can find the text I used here: https://www.kaggle.com/oswinrh/bible#t_asv.csv

In [None]:
df = pd.read_csv('Translations/World English Bible/t_web.csv')

In [None]:
df.info()
df.head()

The World English Bible contains extra text as definitions, which can be helpful, but I am only interested in the actual text. The definitions are contained within curly brackets, so I will use a regular expression to handle these. However, I do not want to throw away any part of the original text, so I will create another column to contain the cleaned text.

I also discovered that this translation collapses some verses, such as Romans 14:23-25, into a single observation and then marks the collapsed verses with chapter and verse like this: (14:24). I would prefer this not be the case, but all of the text appears to be present. However, I want to get rid of the markers since they are not part of the Biblical text. I will use regular expressions to handle these cases as well.

Finally, when I remove the chapter and verse markers I am left with additional whitespace. I will remove this as well.

To do this I will create a function called *clean* and apply it to each row of text.
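
The three cleaning steps described above can be traced on a single string. This is only a sketch: the sample text is made up for illustration (it is not a row from t_web.csv), and the patterns are one reasonable way to express "text in curly brackets" and "chapter:verse in parentheses".

```python
import re

# Made-up sample text (not an actual row from t_web.csv): one definition in
# curly brackets and one collapsed-verse marker.
sample = "He gives thanks {or, praise} to God. (14:24) Now to him who is able"

# Step 1: drop anything enclosed in curly brackets.
no_notes = re.sub(r"\{[^}]*\}", "", sample)

# Step 2: drop chapter:verse markers such as (14:24).
no_markers = re.sub(r"\(\d+:\d+\)", "", no_notes)

# Step 3: collapse the runs of spaces left behind and trim the ends.
cleaned = re.sub(r" {2,}", " ", no_markers).strip()

print(cleaned)
```

Printing the intermediate variables `no_notes` and `no_markers` shows where the extra spaces come from: each removal leaves the spaces on both sides of the deleted text behind.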

In [None]:
df[(df['b']==45) & (df['c']==14) & (df['v']==23)]

In [None]:
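def clean(my_str):
    # Remove definitions enclosed in curly brackets
    cleaned = re.sub(r'\{[^}]*\}', '', my_str)
    # Remove collapsed-verse markers such as (14:24)
    cleaned = re.sub(r'\(\d+:\d+\)', '', cleaned)
    # Collapse the runs of spaces left behind by the removals
    cleaned = re.sub(r' {2,}', ' ', cleaned).strip()

    return cleaned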

In [None]:
df[(df['b']==1) & (df['c']==1) & (df['v']==1)]['t']

In [None]:
df[(df['b']==1) & (df['c']==1) & (df['v']==1)]['t'].apply(clean)

In [None]:
df[(df['b']==45) & (df['c']==14) & (df['v']==23)]['t']

In [None]:
df[(df['b']==45) & (df['c']==14) & (df['v']==23)]['t'].apply(clean)

In [None]:
df['clean_t'] = df['t'].apply(clean)

In [None]:
df.info()
df.head()

# Merging with book names

Next, I want to add the actual book names. Currently, each line of text is marked with a book number, which is not very helpful as a reference. Fortunately, I have a reference dataset called "key_english". It contains the book names and book numbers, as well as which part of the Bible each book comes from, e.g. Old Testament or New Testament. I will import this data and merge it with the text data.

In [None]:
key = pd.read_csv('Jupyter/Jupyter data/key_english.csv')

In [None]:
df = key.merge(df, how='inner', on='b')

df.info()
df.head()
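
An inner merge silently drops any verse whose book number is missing from the key. One way to check for that is sketched below on toy data. The two small DataFrames and their values are made up, and `verses` is a hypothetical name standing in for the text data; only the column `b` is taken from the real tables.

```python
import pandas as pd

# Toy stand-ins for the real tables; the values are made up for illustration.
key = pd.DataFrame({"b": [1, 2], "name": ["Genesis", "Exodus"]})
verses = pd.DataFrame({"b": [1, 1, 2, 3], "c": [1, 1, 1, 1], "v": [1, 2, 1, 1]})

# validate='many_to_one' raises MergeError if a book number repeats in key.
# how='left' with indicator=True keeps every verse and flags unmatched ones.
merged = verses.merge(key, how="left", on="b", validate="many_to_one", indicator=True)

# Rows marked 'left_only' are verses an inner merge would have dropped.
unmatched = merged[merged["_merge"] == "left_only"]
print(len(unmatched))
```

If `unmatched` is empty on the real data, the inner merge and the left merge return the same rows.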

I don't like the column order because 'b' is separated from 'c' and 'v'. I'll change that below.

In [None]:
df.columns

In [None]:
df = df[['name', 'old_new', 'group', 'id', 'b', 'c', 'v', 't', 'clean_t']]

In [None]:
df.head()

# Creating the SQL database

I'll create a SQL database for this project. This database will contain all of the data I pull or produce.

In [None]:
database = 'Data/SQL database.db'

In [None]:
conn = sqlite3.connect(database) 
print(sqlite3.sqlite_version)  # version of the SQLite library in use
conn.close()
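
The cells in this notebook open and close the connection by hand. As an alternative, here is a minimal sketch using `contextlib.closing`, which closes the connection even if a statement raises an error. It assumes an in-memory database and a made-up `demo` table so that it does not touch the project file.

```python
import sqlite3
from contextlib import closing

# ':memory:' is a throwaway in-memory database used only for this sketch;
# the notebook itself connects to the file path stored in `database`.
with closing(sqlite3.connect(":memory:")) as conn:
    # 'demo' is a hypothetical table, not part of the project database.
    conn.execute("CREATE TABLE demo (b INTEGER, t TEXT)")
    conn.execute("INSERT INTO demo VALUES (?, ?)", (1, "sample text"))
    conn.commit()
    row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

print(row_count)
```

Note that using the connection itself in a `with` statement only manages the transaction; it does not close the connection, which is why `closing` is used here.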

# Pushing the Bible DataFrame to the SQL database

In [None]:
conn = sqlite3.connect(database)
df.to_sql('t_web', conn, if_exists='replace', index=False)
conn.close()
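
A quick way to confirm that a write like this stored every row is to count the rows in the table and compare the count with the DataFrame. The sketch below assumes a toy DataFrame with made-up values and an in-memory database; the table name matches the one used above.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Toy DataFrame with made-up values standing in for the real data.
toy = pd.DataFrame({"b": [1, 1], "c": [1, 1], "v": [1, 2], "clean_t": ["first", "second"]})

with closing(sqlite3.connect(":memory:")) as conn:
    # Same call pattern as the notebook, but against a throwaway database.
    toy.to_sql("t_web", conn, if_exists="replace", index=False)
    stored = pd.read_sql_query("SELECT COUNT(*) AS n FROM t_web", conn)["n"].iloc[0]

# True when every row of the DataFrame made it into the table.
print(int(stored) == len(toy))
```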

# Viewing tables in the SQL database

In [None]:
conn = sqlite3.connect(database)
cursor = conn.cursor()

cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")

for i in cursor.fetchall():
    print(i[0])
    
conn.close()

# Viewing column names in t_web

In [None]:
conn = sqlite3.connect(database)
cursor = conn.cursor()

cursor.execute("SELECT * FROM t_web")

# cursor.description holds one tuple per column; the name is the first item
for col in cursor.description:
    print(col[0])
    
conn.close()
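
Another way to list the columns, without selecting any rows, is SQLite's `PRAGMA table_info`, which also reports each column's declared type. This is a sketch against an in-memory database; the stand-in table below has only a few of the t_web columns and its types are assumptions.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    # Hypothetical stand-in table with a few of the t_web columns.
    conn.execute("CREATE TABLE t_web (b INTEGER, c INTEGER, v INTEGER, t TEXT)")
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(t_web)")]

print(columns)
```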

# Viewing the first ten rows of t_web

In [None]:
conn = sqlite3.connect(database)

print(pd.read_sql_query("SELECT * FROM t_web LIMIT 10", conn))

# Call close() so the connection is actually released
conn.close()

# Wrap up

That's it. I've pulled and cleaned the text for the World English Bible, created a SQL database using SQLite3, and pushed the data to this database. I've also viewed the data in the database to confirm it was stored as expected. I can now use this data for additional analyses.