# Master Bible Data Prep

This notebook creates the `master_bible` dataset. The data is first assembled as a pandas DataFrame, then exported to several formats.

You need the following Python packages. Note that `regex` is a third-party package, imported here under the name `re`:

In [1]:
import os
import pandas as pd
import regex as re
import csv
import sqlite3

If your working directory is the top-level directory of the GitHub repository, this code lists the filenames of the data files stored in the `bible_csvs` folder.

In [2]:
bible_dir = "bible_csvs/"
bible_csvs = os.listdir(bible_dir)
bible_csvs

['AMP.csv',
 'NKJV.csv',
 'KJV.csv',
 'NASB.csv',
 'NIV.csv',
 'ESV.csv',
 'KSGM.tsv']
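`os.listdir` returns every entry in the folder in arbitrary order, so a stray file would be passed to the import step as well. A minimal sketch of a filtered, sorted listing follows; the helper name `list_bible_files` and the demonstration filenames are assumptions for illustration, not part of the repository.

```python
import os
import tempfile

def list_bible_files(directory, extensions=(".csv", ".tsv")):
    """Return the data files in `directory`, sorted by name (hypothetical helper)."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )

# Demonstration on a throwaway directory with made-up, empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["NIV.csv", "KSGM.tsv", "AMP.csv", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = list_bible_files(tmp)

print(found)
```

Sorting makes the row order of the combined dataset reproducible across machines, which a plain `os.listdir` call does not guarantee.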

The CSV files were downloaded from [here](http://my-bible-study.appspot.com/).

The `book_index.txt` file contains more information about each book (title, total chapters, testament) than the downloaded CSV files provide. We load it here and join it to each version as that version is imported.

In [3]:
bible_index = pd.read_csv('metadata/book_index.txt',names=['book','osisID','title','total_chapters','testament'],skiprows=1)
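The join on `book` can be sketched with toy data. Both frames below are made-up stand-ins (book `99` is deliberately missing from the index) and are not taken from the real files. The sketch shows that `merge` defaults to an inner join, so verse rows without a matching index entry are dropped, and that `indicator=True` is one way to surface them.

```python
import pandas as pd

# Toy stand-in for book_index.txt (illustrative values only).
index = pd.DataFrame({
    "book": [1, 2],
    "title": ["Genesis", "Exodus"],
    "testament": ["OT", "OT"],
})

# Toy stand-in for one version's verses; book 99 has no index entry.
verses = pd.DataFrame({
    "book": [1, 2, 99],
    "chapter": [1, 1, 1],
    "verse": [1, 1, 1],
    "text": ["a", "b", "c"],
})

# Default inner join: the row for book 99 is silently dropped.
inner = index[["testament", "book", "title"]].merge(verses, on=["book"])

# A right join with an indicator column keeps every verse row
# and marks the ones that found no match in the index.
checked = index[["testament", "book", "title"]].merge(
    verses, on=["book"], how="right", indicator=True
)
unmatched = checked[checked["_merge"] == "right_only"]

print(len(inner), unmatched["book"].tolist())
```

Running a check like this once per version is a cheap way to confirm that no verses are lost in the join.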

This function imports a single file. The `if/else` statement handles the one `.tsv` file in the directory, which is tab-separated rather than comma-separated.

In [4]:
def import_bible(filename):
    if filename == 'KSGM.tsv':
        # The one tab-separated file in the directory.
        dat = pd.read_csv(bible_dir + filename,
                          sep='\t',
                          names=['book','chapter','verse','text'],
                          header=None)
        dat['version'] = re.sub('\\.tsv$', '', filename)
    else:
        # Comma-separated files that use a backslash as the escape character.
        dat = pd.read_csv(bible_dir + filename,
                          lineterminator='\n',
                          header=None,
                          names=['book','chapter','verse','text'],
                          escapechar='\\')
        dat['version'] = re.sub('\\.csv$', '', filename)

    # Attach testament and title from the book index (inner join on book).
    dat = bible_index[['testament','book','title']].merge(dat, on=['book'])

    return dat
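The version name and the separator both follow from the file extension, so the special case on the literal filename could be generalized. A minimal sketch follows, assuming a hypothetical `SEPARATORS` mapping and helper name. It covers only the separator and the version label, not the `lineterminator` and `escapechar` options that the CSV branch also sets.

```python
import os

# Hypothetical mapping from file extension to field separator.
SEPARATORS = {".csv": ",", ".tsv": "\t"}

def describe_bible_file(filename):
    """Split a filename into (version, separator); hypothetical helper."""
    version, ext = os.path.splitext(filename)
    return version, SEPARATORS[ext.lower()]

version, sep = describe_bible_file("KSGM.tsv")
print(repr(version), repr(sep))
```

An unknown extension raises `KeyError` here, which fails loudly instead of misreading a file with the wrong separator.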

Here, we generate the master bible dataset.

In [5]:
master_bible = (
    pd.concat([import_bible(f) for f in bible_csvs], axis=0)
    .reset_index(drop=True)
    .astype({'book': 'int64',
             'title': 'string',
             'chapter': 'int64',
             'verse': 'int64',
             'text': 'string',
             'version': 'string'})
)

In [6]:
master_bible

| | testament | book | title | chapter | verse | text | version |
|---|---|---|---|---|---|---|---|
| 0 | OT | 1 | Genesis | 1 | 1 | IN THE beginning God (prepared formed fashione... | AMP |
| 1 | OT | 1 | Genesis | 1 | 2 | The earth was without form and an empty waste ... | AMP |
| 2 | OT | 1 | Genesis | 1 | 3 | And God said Let there be light; and there was... | AMP |
| 3 | OT | 1 | Genesis | 1 | 4 | And God saw that the light was good (suitable ... | AMP |
| 4 | OT | 1 | Genesis | 1 | 5 | And God called the light Day and the darkness ... | AMP |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 186712 | NT | 777 | Gospel of Mary Magdalene | 4 | 105 | and take in the purest human form that we acqu... | KSGM |
| 186713 | NT | 777 | Gospel of Mary Magdalene | 4 | 106 | like He commanded us. | KSGM |
| 186714 | NT | 777 | Gospel of Mary Magdalene | 4 | 107 | We are to preach the gospel and not lay down a... | KSGM |
| 186715 | NT | 777 | Gospel of Mary Magdalene | 4 | 108 | After Levi had said these things, they left an... | KSGM |


In [7]:
master_bible.dtypes

testament    object
book          int64
title        string
chapter       int64
verse         int64
text         string
version      string
dtype: object

Now we save the dataset in several formats: a pickle file, a CSV file, and a SQLite database.

In [8]:
output_dir = 'MASTER_BIBLE/'
master_bible.to_pickle(output_dir + 'master_bible.pkl')

In [9]:
master_bible.to_csv(output_dir + 'master_bible.csv',index=False,quoting=csv.QUOTE_NONNUMERIC)
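The effect of `csv.QUOTE_NONNUMERIC` can be seen with the standard library's own writer. This is a sketch with a made-up row, on the assumption that pandas applies the constant the same way: non-numeric fields are wrapped in quotes and numeric fields are written bare, so commas inside verse text do not break the columns.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)

# Header and one made-up row; the text field contains a comma on purpose.
writer.writerow(["book", "chapter", "verse", "text"])
writer.writerow([1, 1, 3, "Sample text, with a comma"])

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```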

In [20]:
conn = sqlite3.connect(output_dir + 'master_bible.db')

In [19]:
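master_bible.to_sql('master_bible',con=conn,if_exists='replace')
# Derive one view name per source file by stripping the extension.
tbls = pd.Series(bible_csvs).apply(lambda x: re.sub('\\..*','',x)).tolist()

for i in tbls:
    # IF NOT EXISTS lets this cell be re-run without failing on existing views.
    conn.execute(f"""
    CREATE VIEW IF NOT EXISTS {i} AS
    select * from master_bible
    where version = '{i}'
    """)

conn.close()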
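The per-version views can be tried out on an in-memory database. The sketch below uses a reduced, made-up table (three columns, two sample rows) in place of the real `master_bible` table. It shows that each view behaves like a table filtered to one version, and that `IF NOT EXISTS` makes the creation step safe to repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced stand-in for the master_bible table with made-up rows.
conn.execute("CREATE TABLE master_bible (book INTEGER, text TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO master_bible VALUES (?, ?, ?)",
    [(1, "sample verse A", "KJV"), (1, "sample verse B", "NIV")],
)

# Run the creation loop twice to show that repeating it does not raise.
for _ in range(2):
    for version in ["KJV", "NIV"]:
        # Identifiers cannot be bound as SQL parameters, so the view name is
        # interpolated; this is only safe because the names are known values.
        conn.execute(
            f"CREATE VIEW IF NOT EXISTS {version} AS "
            f"SELECT * FROM master_bible WHERE version = '{version}'"
        )

kjv_rows = conn.execute("SELECT text FROM KJV").fetchall()
conn.close()

print(kjv_rows)
```

Because a view stores only the query, the database does not duplicate the verse text for each version.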