# Motivation

The next thing I want to do with the Biblical text is topic modeling. To do this, I will first collapse all of the text in each chapter of the Bible into a single observation. I will then run the Latent Dirichlet Allocation (LDA) algorithm on the chapters so that I can uncover the various topics covered in Scripture.

# Set up

This is my typical set up. I import the packages I will use, set my project directory, remove column and row limits, and allow Jupyter to display all of the output from each cell.

In [1]:
import os
import pandas as pd
import numpy as np
import sqlite3
import spacy
from datetime import datetime

# Set project folder as directory
os.chdir(r'C:/Users/david/Projects/Bible Analytics')

# Remove row and column limits
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

# Display all output from each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Accessing data

In [2]:
database = 'Data/SQL database.db'

In [3]:
conn = sqlite3.connect(database)
 
df = pd.read_sql_query('SELECT * FROM t_web', conn)
 
conn.close()

In [4]:
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31102 entries, 0 to 31101
Data columns (total 9 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     31102 non-null  object
 1   old_new  31102 non-null  object
 2   group    31102 non-null  int64 
 3   id       31102 non-null  int64 
 4   b        31102 non-null  int64 
 5   c        31102 non-null  int64 
 6   v        31102 non-null  int64 
 7   t        31102 non-null  object
 8   clean_t  31102 non-null  object
dtypes: int64(5), object(4)
memory usage: 2.1+ MB


Unnamed: 0,name,old_new,group,id,b,c,v,t,clean_t
0,Genesis,OT,1,1001001,1,1,1,"In the beginning God{After ""God,"" the Hebrew has the two letters ""Aleph Tav"" (the first and last letters of the Hebrew alphabet) as a grammatical marker.} created the heavens and the earth.",In the beginning God created the heavens and the earth.
1,Genesis,OT,1,1001002,1,1,2,Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.,Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.
2,Genesis,OT,1,1001003,1,1,3,"God said, ""Let there be light,"" and there was light.","God said, ""Let there be light,"" and there was light."
3,Genesis,OT,1,1001004,1,1,4,"God saw the light, and saw that it was good. God divided the light from the darkness.","God saw the light, and saw that it was good. God divided the light from the darkness."
4,Genesis,OT,1,1001005,1,1,5,"God called the light Day, and the darkness he called Night. There was evening and there was morning, one day.","God called the light Day, and the darkness he called Night. There was evening and there was morning, one day."


# Begin

The first thing I will do is combine all of the text in each chapter into a single observation. This is actually quite simple. I will group our data by book and chapter and apply the sum function to our clean text column, which concatenates the verse strings within each chapter. Then I will reset the index so that book and chapter show up as columns. Next, I will merge this data with our key dataset, which contains the book names. I could have merged with the original dataset and produced the same result, but the key is readily available and much easier to process. Finally, I will reorder the columns to suit my preferences.

In [5]:
chapter_text = pd.DataFrame(df.groupby(['b','c'])['clean_t'].sum())
chapter_text.reset_index(inplace=True)

In [6]:
chapter_text.head(2)

Unnamed: 0,b,c,clean_t
0,1,1,"In the beginning God created the heavens and the earth.Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.God said, ""Let there be light,"" and there was light.God saw the light, and saw that it was good. God divided the light from the darkness.God called the light Day, and the darkness he called Night. There was evening and there was morning, one day.God said, ""Let there be an expanse in the midst of the waters, and let it divide the waters from the waters.""God made the expanse, and divided the waters which were under the expanse from the waters which were above the expanse, and it was so.God called the expanse sky. There was evening and there was morning, a second day.God said, ""Let the waters under the sky be gathered together to one place, and let the dry land appear,"" and it was so.God called the dry land Earth, and the gathering together of the waters he called Seas. God saw that it was good.God said, ""Let the earth put forth grass, herbs yielding seed, and fruit trees bearing fruit after their kind, with its seed in it, on the earth,"" and it was so.The earth brought forth grass, herbs yielding seed after their kind, and trees bearing fruit, with its seed in it, after their kind: and God saw that it was good.There was evening and there was morning, a third day.God said, ""Let there be lights in the expanse of sky to divide the day from the night; and let them be for signs, and for seasons, and for days and years;and let them be for lights in the expanse of sky to give light on the earth,"" and it was so.God made the two great lights: the greater light to rule the day, and the lesser light to rule the night. He also made the stars.God set them in the expanse of sky to give light to the earth,and to rule over the day and over the night, and to divide the light from the darkness. 
God saw that it was good.There was evening and there was morning, a fourth day.God said, ""Let the waters swarm with swarms of living creatures, and let birds fly above the earth in the open expanse of sky.""God created the large sea creatures, and every living creature that moves, with which the waters swarmed, after their kind, and every winged bird after its kind. God saw that it was good.God blessed them, saying, ""Be fruitful, and multiply, and fill the waters in the seas, and let birds multiply on the earth.""There was evening and there was morning, a fifth day.God said, ""Let the earth bring forth living creatures after their kind, cattle, creeping things, and animals of the earth after their kind,"" and it was so.God made the animals of the earth after their kind, and the cattle after their kind, and everything that creeps on the ground after its kind. God saw that it was good.God said, ""Let us make man in our image, after our likeness: and let them have dominion over the fish of the sea, and over the birds of the sky, and over the cattle, and over all the earth, and over every creeping thing that creeps on the earth.""God created man in his own image. In God's image he created him; male and female he created them.God blessed them. God said to them, ""Be fruitful, multiply, fill the earth, and subdue it. Have dominion over the fish of the sea, over the birds of the sky, and over every living thing that moves on the earth.""God said, ""Behold, I have given you every herb yielding seed, which is on the surface of all the earth, and every tree, which bears fruit yielding seed. It will be your food.To every animal of the earth, and to every bird of the sky, and to everything that creeps on the earth, in which there is life, I have given every green herb for food."" And it was so.God saw everything that he had made, and, behold, it was very good. There was evening and there was morning, the sixth day."
1,1,2,"The heavens and the earth were finished, and all the host of them.On the seventh day God finished his work which he had made; and he rested on the seventh day from all his work which he had made.God blessed the seventh day, and made it holy, because he rested in it from all his work which he had created and made.This is the history of the generations of the heavens and of the earth when they were created, in the day that Yahweh God made earth and the heavens.No plant of the field was yet in the earth, and no herb of the field had yet sprung up; for Yahweh God had not caused it to rain on the earth. There was not a man to till the ground,but a mist went up from the earth, and watered the whole surface of the ground.Yahweh God formed man from the dust of the ground, and breathed into his nostrils the breath of life; and man became a living soul.Yahweh God planted a garden eastward, in Eden, and there he put the man whom he had formed.Out of the ground Yahweh God made every tree to grow that is pleasant to the sight, and good for food; the tree of life also in the midst of the garden, and the tree of the knowledge of good and evil.A river went out of Eden to water the garden; and from there it was parted, and became four heads.The name of the first is Pishon: this is the one which flows through the whole land of Havilah, where there is gold;and the gold of that land is good. There is aromatic resin and the onyx stone.The name of the second river is Gihon: the same river that flows through the whole land of Cush.The name of the third river is Hiddekel: this is the one which flows in front of Assyria. 
The fourth river is the Euphrates.Yahweh God took the man, and put him into the garden of Eden to dress it and to keep it.Yahweh God commanded the man, saying, ""Of every tree of the garden you may freely eat:but of the tree of the knowledge of good and evil, you shall not eat of it: for in the day that you eat of it you will surely die.""Yahweh God said, ""It is not good that the man should be alone; I will make him a helper suitable for him.""Out of the ground Yahweh God formed every animal of the field, and every bird of the sky, and brought them to the man to see what he would call them. Whatever the man called every living creature, that was its name.The man gave names to all cattle, and to the birds of the sky, and to every animal of the field; but for man there was not found a helper suitable for him.Yahweh God caused a deep sleep to fall on the man, and he slept; and he took one of his ribs, and closed up the flesh in its place.He made the rib, which Yahweh God had taken from the man, into a woman, and brought her to the man.The man said, ""This is now bone of my bones, and flesh of my flesh. She will be called Woman, because she was taken out of Man.""Therefore a man will leave his father and his mother, and will join with his wife, and they will be one flesh.They were both naked, the man and his wife, and were not ashamed."
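One thing visible in the output above is that summing strings adds no separator, so verse boundaries run together ("earth.Now"). This may affect how a tokenizer splits those words later. A minimal sketch of an alternative, using a toy DataFrame (the three verses below are made up for illustration and stand in for the real `df`), joins the verses with a space instead:

```python
import pandas as pd

# Toy verse-level data standing in for the real `df` (values are illustrative)
verses = pd.DataFrame({
    'b': [1, 1, 1],
    'c': [1, 1, 2],
    'clean_t': ['First verse.', 'Second verse.', 'Third verse.'],
})

# Join the verses of each chapter with a single space instead of summing them
chapter_demo = (
    verses.groupby(['b', 'c'])['clean_t']
    .agg(' '.join)
    .reset_index()
)

print(chapter_demo['clean_t'].tolist())
# ['First verse. Second verse.', 'Third verse.']
```

The rest of this notebook keeps the original `sum` approach, so the outputs shown here reflect text without the added spaces.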


In [7]:
key = pd.read_csv('Jupyter/Jupyter data/key_english.csv')

In [8]:
chapter_text = key.merge(chapter_text, how='inner', left_on='b', right_on='b')

chapter_text.info()
chapter_text.head(2)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1189 entries, 0 to 1188
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   b        1189 non-null   int64 
 1   name     1189 non-null   object
 2   old_new  1189 non-null   object
 3   group    1189 non-null   int64 
 4   c        1189 non-null   int64 
 5   clean_t  1189 non-null   object
dtypes: int64(3), object(3)
memory usage: 65.0+ KB


Unnamed: 0,b,name,old_new,group,c,clean_t
0,1,Genesis,OT,1,1,"In the beginning God created the heavens and the earth.Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.God said, ""Let there be light,"" and there was light.God saw the light, and saw that it was good. God divided the light from the darkness.God called the light Day, and the darkness he called Night. There was evening and there was morning, one day.God said, ""Let there be an expanse in the midst of the waters, and let it divide the waters from the waters.""God made the expanse, and divided the waters which were under the expanse from the waters which were above the expanse, and it was so.God called the expanse sky. There was evening and there was morning, a second day.God said, ""Let the waters under the sky be gathered together to one place, and let the dry land appear,"" and it was so.God called the dry land Earth, and the gathering together of the waters he called Seas. God saw that it was good.God said, ""Let the earth put forth grass, herbs yielding seed, and fruit trees bearing fruit after their kind, with its seed in it, on the earth,"" and it was so.The earth brought forth grass, herbs yielding seed after their kind, and trees bearing fruit, with its seed in it, after their kind: and God saw that it was good.There was evening and there was morning, a third day.God said, ""Let there be lights in the expanse of sky to divide the day from the night; and let them be for signs, and for seasons, and for days and years;and let them be for lights in the expanse of sky to give light on the earth,"" and it was so.God made the two great lights: the greater light to rule the day, and the lesser light to rule the night. He also made the stars.God set them in the expanse of sky to give light to the earth,and to rule over the day and over the night, and to divide the light from the darkness. 
God saw that it was good.There was evening and there was morning, a fourth day.God said, ""Let the waters swarm with swarms of living creatures, and let birds fly above the earth in the open expanse of sky.""God created the large sea creatures, and every living creature that moves, with which the waters swarmed, after their kind, and every winged bird after its kind. God saw that it was good.God blessed them, saying, ""Be fruitful, and multiply, and fill the waters in the seas, and let birds multiply on the earth.""There was evening and there was morning, a fifth day.God said, ""Let the earth bring forth living creatures after their kind, cattle, creeping things, and animals of the earth after their kind,"" and it was so.God made the animals of the earth after their kind, and the cattle after their kind, and everything that creeps on the ground after its kind. God saw that it was good.God said, ""Let us make man in our image, after our likeness: and let them have dominion over the fish of the sea, and over the birds of the sky, and over the cattle, and over all the earth, and over every creeping thing that creeps on the earth.""God created man in his own image. In God's image he created him; male and female he created them.God blessed them. God said to them, ""Be fruitful, multiply, fill the earth, and subdue it. Have dominion over the fish of the sea, over the birds of the sky, and over every living thing that moves on the earth.""God said, ""Behold, I have given you every herb yielding seed, which is on the surface of all the earth, and every tree, which bears fruit yielding seed. It will be your food.To every animal of the earth, and to every bird of the sky, and to everything that creeps on the earth, in which there is life, I have given every green herb for food."" And it was so.God saw everything that he had made, and, behold, it was very good. There was evening and there was morning, the sixth day."
1,1,Genesis,OT,1,2,"The heavens and the earth were finished, and all the host of them.On the seventh day God finished his work which he had made; and he rested on the seventh day from all his work which he had made.God blessed the seventh day, and made it holy, because he rested in it from all his work which he had created and made.This is the history of the generations of the heavens and of the earth when they were created, in the day that Yahweh God made earth and the heavens.No plant of the field was yet in the earth, and no herb of the field had yet sprung up; for Yahweh God had not caused it to rain on the earth. There was not a man to till the ground,but a mist went up from the earth, and watered the whole surface of the ground.Yahweh God formed man from the dust of the ground, and breathed into his nostrils the breath of life; and man became a living soul.Yahweh God planted a garden eastward, in Eden, and there he put the man whom he had formed.Out of the ground Yahweh God made every tree to grow that is pleasant to the sight, and good for food; the tree of life also in the midst of the garden, and the tree of the knowledge of good and evil.A river went out of Eden to water the garden; and from there it was parted, and became four heads.The name of the first is Pishon: this is the one which flows through the whole land of Havilah, where there is gold;and the gold of that land is good. There is aromatic resin and the onyx stone.The name of the second river is Gihon: the same river that flows through the whole land of Cush.The name of the third river is Hiddekel: this is the one which flows in front of Assyria. 
The fourth river is the Euphrates.Yahweh God took the man, and put him into the garden of Eden to dress it and to keep it.Yahweh God commanded the man, saying, ""Of every tree of the garden you may freely eat:but of the tree of the knowledge of good and evil, you shall not eat of it: for in the day that you eat of it you will surely die.""Yahweh God said, ""It is not good that the man should be alone; I will make him a helper suitable for him.""Out of the ground Yahweh God formed every animal of the field, and every bird of the sky, and brought them to the man to see what he would call them. Whatever the man called every living creature, that was its name.The man gave names to all cattle, and to the birds of the sky, and to every animal of the field; but for man there was not found a helper suitable for him.Yahweh God caused a deep sleep to fall on the man, and he slept; and he took one of his ribs, and closed up the flesh in its place.He made the rib, which Yahweh God had taken from the man, into a woman, and brought her to the man.The man said, ""This is now bone of my bones, and flesh of my flesh. She will be called Woman, because she was taken out of Man.""Therefore a man will leave his father and his mother, and will join with his wife, and they will be one flesh.They were both naked, the man and his wife, and were not ashamed."


In [9]:
chapter_text = chapter_text[['name', 'old_new', 'group', 'b', 'c', 'clean_t']]

In [10]:
chapter_text.head(2)

Unnamed: 0,name,old_new,group,b,c,clean_t
0,Genesis,OT,1,1,1,"In the beginning God created the heavens and the earth.Now the earth was formless and empty. Darkness was on the surface of the deep. God's Spirit was hovering over the surface of the waters.God said, ""Let there be light,"" and there was light.God saw the light, and saw that it was good. God divided the light from the darkness.God called the light Day, and the darkness he called Night. There was evening and there was morning, one day.God said, ""Let there be an expanse in the midst of the waters, and let it divide the waters from the waters.""God made the expanse, and divided the waters which were under the expanse from the waters which were above the expanse, and it was so.God called the expanse sky. There was evening and there was morning, a second day.God said, ""Let the waters under the sky be gathered together to one place, and let the dry land appear,"" and it was so.God called the dry land Earth, and the gathering together of the waters he called Seas. God saw that it was good.God said, ""Let the earth put forth grass, herbs yielding seed, and fruit trees bearing fruit after their kind, with its seed in it, on the earth,"" and it was so.The earth brought forth grass, herbs yielding seed after their kind, and trees bearing fruit, with its seed in it, after their kind: and God saw that it was good.There was evening and there was morning, a third day.God said, ""Let there be lights in the expanse of sky to divide the day from the night; and let them be for signs, and for seasons, and for days and years;and let them be for lights in the expanse of sky to give light on the earth,"" and it was so.God made the two great lights: the greater light to rule the day, and the lesser light to rule the night. He also made the stars.God set them in the expanse of sky to give light to the earth,and to rule over the day and over the night, and to divide the light from the darkness. 
God saw that it was good.There was evening and there was morning, a fourth day.God said, ""Let the waters swarm with swarms of living creatures, and let birds fly above the earth in the open expanse of sky.""God created the large sea creatures, and every living creature that moves, with which the waters swarmed, after their kind, and every winged bird after its kind. God saw that it was good.God blessed them, saying, ""Be fruitful, and multiply, and fill the waters in the seas, and let birds multiply on the earth.""There was evening and there was morning, a fifth day.God said, ""Let the earth bring forth living creatures after their kind, cattle, creeping things, and animals of the earth after their kind,"" and it was so.God made the animals of the earth after their kind, and the cattle after their kind, and everything that creeps on the ground after its kind. God saw that it was good.God said, ""Let us make man in our image, after our likeness: and let them have dominion over the fish of the sea, and over the birds of the sky, and over the cattle, and over all the earth, and over every creeping thing that creeps on the earth.""God created man in his own image. In God's image he created him; male and female he created them.God blessed them. God said to them, ""Be fruitful, multiply, fill the earth, and subdue it. Have dominion over the fish of the sea, over the birds of the sky, and over every living thing that moves on the earth.""God said, ""Behold, I have given you every herb yielding seed, which is on the surface of all the earth, and every tree, which bears fruit yielding seed. It will be your food.To every animal of the earth, and to every bird of the sky, and to everything that creeps on the earth, in which there is life, I have given every green herb for food."" And it was so.God saw everything that he had made, and, behold, it was very good. There was evening and there was morning, the sixth day."
1,Genesis,OT,1,1,2,"The heavens and the earth were finished, and all the host of them.On the seventh day God finished his work which he had made; and he rested on the seventh day from all his work which he had made.God blessed the seventh day, and made it holy, because he rested in it from all his work which he had created and made.This is the history of the generations of the heavens and of the earth when they were created, in the day that Yahweh God made earth and the heavens.No plant of the field was yet in the earth, and no herb of the field had yet sprung up; for Yahweh God had not caused it to rain on the earth. There was not a man to till the ground,but a mist went up from the earth, and watered the whole surface of the ground.Yahweh God formed man from the dust of the ground, and breathed into his nostrils the breath of life; and man became a living soul.Yahweh God planted a garden eastward, in Eden, and there he put the man whom he had formed.Out of the ground Yahweh God made every tree to grow that is pleasant to the sight, and good for food; the tree of life also in the midst of the garden, and the tree of the knowledge of good and evil.A river went out of Eden to water the garden; and from there it was parted, and became four heads.The name of the first is Pishon: this is the one which flows through the whole land of Havilah, where there is gold;and the gold of that land is good. There is aromatic resin and the onyx stone.The name of the second river is Gihon: the same river that flows through the whole land of Cush.The name of the third river is Hiddekel: this is the one which flows in front of Assyria. 
The fourth river is the Euphrates.Yahweh God took the man, and put him into the garden of Eden to dress it and to keep it.Yahweh God commanded the man, saying, ""Of every tree of the garden you may freely eat:but of the tree of the knowledge of good and evil, you shall not eat of it: for in the day that you eat of it you will surely die.""Yahweh God said, ""It is not good that the man should be alone; I will make him a helper suitable for him.""Out of the ground Yahweh God formed every animal of the field, and every bird of the sky, and brought them to the man to see what he would call them. Whatever the man called every living creature, that was its name.The man gave names to all cattle, and to the birds of the sky, and to every animal of the field; but for man there was not found a helper suitable for him.Yahweh God caused a deep sleep to fall on the man, and he slept; and he took one of his ribs, and closed up the flesh in its place.He made the rib, which Yahweh God had taken from the man, into a woman, and brought her to the man.The man said, ""This is now bone of my bones, and flesh of my flesh. She will be called Woman, because she was taken out of Man.""Therefore a man will leave his father and his mother, and will join with his wife, and they will be one flesh.They were both naked, the man and his wife, and were not ashamed."


# Pushing chapter data to SQL database

Before moving forward, I want to save this dataset to our SQL database.

In [11]:
conn = sqlite3.connect(database)

chapter_text.to_sql('chapter_text', conn, if_exists='replace', index=False)

conn.close()

1189

In [12]:
# List the tables in the database (SQL string literals take single quotes)
 
conn = sqlite3.connect(database)
cursor = conn.cursor()
 
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
 
for i in cursor.fetchall():
    print(i[0])
    
conn.close()

<sqlite3.Cursor at 0x2863231f6c0>

t_web
people_names
gpe_name
chapter_text
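The connect, query, close pattern above can also be wrapped so the connection is closed even if the query fails. The sketch below is one way to do that with `contextlib.closing`; the helper name `list_tables` and the in-memory database are assumptions for illustration, not part of the project.

```python
import sqlite3
from contextlib import closing

def list_tables(conn):
    """Return the table names in an open SQLite connection, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# closing() calls conn.close() on exit, even if an exception is raised
with closing(sqlite3.connect(':memory:')) as conn:
    # Throwaway tables standing in for the real database
    conn.execute('CREATE TABLE t_web (id INTEGER)')
    conn.execute('CREATE TABLE chapter_text (b INTEGER)')
    tables = list_tables(conn)

print(tables)
# ['chapter_text', 't_web']
```

Note that using a `sqlite3` connection directly as a context manager only manages transactions; it does not close the connection, which is why `closing` is used here.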


# Compiling stopwords from NLP packages

In [13]:
from nltk.corpus import stopwords
nltk_stopwords = stopwords.words('english')

from gensim.parsing.preprocessing import STOPWORDS
gen_stopwords = list(STOPWORDS)

nlp = spacy.load('en_core_web_lg')
spacy_stopwords = list(nlp.Defaults.stop_words)

print('There are', len(nltk_stopwords), 'stopwords in nltk,', len(gen_stopwords), 'in gensim, and', len(spacy_stopwords), 'in spacy.')

There are 179 stopwords in nltk, 337 in gensim, and 326 in spacy.


There are 179 default stopwords in nltk, 337 in gensim, and 326 in spacy. These lists are all slightly different from each other, so which should we use? Why not all of them? I am going to review the differences between these lists and compile a single list of stopwords taken from all three.

First, I will look at the universal stopwords.

In [14]:
universal_stopwords = [i for i in nltk_stopwords if i in gen_stopwords and i in spacy_stopwords]
len(universal_stopwords)

123
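The comparisons in this section can also be expressed with Python sets, which make intersections, differences, and unions explicit. The sketch below uses three tiny made-up lists in place of `nltk_stopwords`, `gen_stopwords`, and `spacy_stopwords`, so the counts are illustrative only.

```python
# Toy lists standing in for the three packages' stopword lists
list_a = ['the', 'and', "don't", 'of']
list_b = ['the', 'and', 'of', 'kg']
list_c = ['the', 'of', 'and', "'re"]

# Words shared by all three lists (intersection)
universal = set(list_a) & set(list_b) & set(list_c)

# Words in the first list that are not shared by all three (difference)
only_a = set(list_a) - universal

# Every word that appears in at least one list (union)
combined = set(list_a) | set(list_b) | set(list_c)

print(sorted(universal))  # ['and', 'of', 'the']
print(sorted(only_a))     # ["don't"]
print(len(combined))      # 6
```

Set membership tests are also faster than list membership tests, although with a few hundred stopwords the difference is small.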

There are 123 stopwords common to all three NLP packages. Next, I will review nltk's stopwords that are not common to all three packages.

In [15]:
[i for i in nltk_stopwords if i not in universal_stopwords]

["you're",
 "you've",
 "you'll",
 "you'd",
 "she's",
 "it's",
 'theirs',
 "that'll",
 'having',
 's',
 't',
 'don',
 "don't",
 "should've",
 'd',
 'll',
 'm',
 'o',
 've',
 'y',
 'ain',
 'aren',
 "aren't",
 'couldn',
 "couldn't",
 'didn',
 "didn't",
 'doesn',
 "doesn't",
 'hadn',
 "hadn't",
 'hasn',
 "hasn't",
 'haven',
 "haven't",
 'isn',
 "isn't",
 'ma',
 'mightn',
 "mightn't",
 'mustn',
 "mustn't",
 'needn',
 "needn't",
 'shan',
 "shan't",
 'shouldn',
 "shouldn't",
 'wasn',
 "wasn't",
 'weren',
 "weren't",
 'won',
 "won't",
 'wouldn',
 "wouldn't"]

These make sense, so I will keep all of nltk's stop words. Now, I want to look at the stopwords that are universal to gensim and spacy but not to nltk.

In [16]:
gen_spacy = [i for i in gen_stopwords if i in spacy_stopwords and i not in nltk_stopwords]
gen_spacy

['seem',
 'another',
 'must',
 'fifty',
 'back',
 'could',
 'empty',
 'front',
 'nowhere',
 'ten',
 'make',
 'whose',
 'hundred',
 'hence',
 'show',
 'throughout',
 'towards',
 'nobody',
 'around',
 'sixty',
 'behind',
 'thereupon',
 'well',
 'done',
 'however',
 'together',
 'much',
 'due',
 'everywhere',
 'whereafter',
 'anything',
 'wherever',
 'enough',
 'thru',
 'third',
 'thereby',
 'therefore',
 'full',
 'amongst',
 'moreover',
 'elsewhere',
 'whole',
 'sometimes',
 'thereafter',
 'give',
 'formerly',
 'along',
 'three',
 'everything',
 'onto',
 'else',
 'perhaps',
 'move',
 'less',
 'herein',
 'last',
 'sometime',
 'eleven',
 'serious',
 'hereby',
 'whatever',
 'ever',
 'say',
 'every',
 'something',
 'either',
 'beside',
 'first',
 'take',
 'would',
 'quite',
 'always',
 'please',
 'seeming',
 'others',
 'various',
 'many',
 'fifteen',
 'thus',
 'unless',
 'whenever',
 'may',
 'whoever',
 'twenty',
 'two',
 'beforehand',
 'somehow',
 'call',
 'really',
 'never',
 'amount',
 'a

These look good, too. Now, I am going to look at any words that are unique to gensim or spacy.

In [17]:
[i for i in gen_stopwords if i not in gen_spacy + nltk_stopwords]

['cry',
 'thin',
 'ltd',
 'couldnt',
 'kg',
 'sincere',
 'thick',
 'fire',
 'inc',
 'co',
 'con',
 'cant',
 'mill',
 'hasnt',
 'find',
 'ie',
 'eg',
 'interest',
 'computer',
 'found',
 'detail',
 'etc',
 'km',
 'describe',
 'system',
 'de',
 'bill',
 'amoungst',
 'un',
 'fill']

In [18]:
[i for i in spacy_stopwords if i not in gen_spacy + nltk_stopwords]

['’ll',
 '’ve',
 "'re",
 '’re',
 "'m",
 "'ve",
 "'ll",
 '‘d',
 "n't",
 "'s",
 '’d',
 'n’t',
 '‘ll',
 "'d",
 '’m',
 '‘re',
 '’s',
 'n‘t',
 '‘s',
 'ca',
 '‘ve',
 '‘m']

I like all these, so I'm going to keep all of the stopwords from each of the NLP packages.

In [19]:
stopwords = list(set(nltk_stopwords+gen_stopwords+spacy_stopwords))
len(stopwords)

sorted(stopwords)

412

["'d",
 "'ll",
 "'m",
 "'re",
 "'s",
 "'ve",
 'a',
 'about',
 'above',
 'across',
 'after',
 'afterwards',
 'again',
 'against',
 'ain',
 'all',
 'almost',
 'alone',
 'along',
 'already',
 'also',
 'although',
 'always',
 'am',
 'among',
 'amongst',
 'amoungst',
 'amount',
 'an',
 'and',
 'another',
 'any',
 'anyhow',
 'anyone',
 'anything',
 'anyway',
 'anywhere',
 'are',
 'aren',
 "aren't",
 'around',
 'as',
 'at',
 'back',
 'be',
 'became',
 'because',
 'become',
 'becomes',
 'becoming',
 'been',
 'before',
 'beforehand',
 'behind',
 'being',
 'below',
 'beside',
 'besides',
 'between',
 'beyond',
 'bill',
 'both',
 'bottom',
 'but',
 'by',
 'ca',
 'call',
 'can',
 'cannot',
 'cant',
 'co',
 'computer',
 'con',
 'could',
 'couldn',
 "couldn't",
 'couldnt',
 'cry',
 'd',
 'de',
 'describe',
 'detail',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'done',
 'down',
 'due',
 'during',
 'each',
 'eg',
 'eight',
 'either',
 'eleven',
 'else

After an initial run, I added a few more stopwords to the list.

In [20]:
stopwords = set(stopwords+['shall', 'say', 'yahweh', 'god', 'let', 'come', 'take', 'go', 'give', 'know', 'like'])

In [21]:
stopwords

{"'d",
 "'ll",
 "'m",
 "'re",
 "'s",
 "'ve",
 'a',
 'about',
 'above',
 'across',
 'after',
 'afterwards',
 'again',
 'against',
 'ain',
 'all',
 'almost',
 'alone',
 'along',
 'already',
 'also',
 'although',
 'always',
 'am',
 'among',
 'amongst',
 'amoungst',
 'amount',
 'an',
 'and',
 'another',
 'any',
 'anyhow',
 'anyone',
 'anything',
 'anyway',
 'anywhere',
 'are',
 'aren',
 "aren't",
 'around',
 'as',
 'at',
 'back',
 'be',
 'became',
 'because',
 'become',
 'becomes',
 'becoming',
 'been',
 'before',
 'beforehand',
 'behind',
 'being',
 'below',
 'beside',
 'besides',
 'between',
 'beyond',
 'bill',
 'both',
 'bottom',
 'but',
 'by',
 'ca',
 'call',
 'can',
 'cannot',
 'cant',
 'co',
 'come',
 'computer',
 'con',
 'could',
 'couldn',
 "couldn't",
 'couldnt',
 'cry',
 'd',
 'de',
 'describe',
 'detail',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'done',
 'down',
 'due',
 'during',
 'each',
 'eg',
 'eight',
 'either',
 'eleven

# Processing the text

We now have 412 stopwords that we will remove from our text. I will also lemmatize the text; the code below compares each word's lemma against the stopword list. Lemmatizing reverts each word to its dictionary form (its lemma). For instance, "ran" will be changed to "run." By doing this, we ensure run and ran are treated as the same word.
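
To make the idea of lemmatization concrete, here is a minimal sketch that uses a tiny hand-written lookup table. The table and the `toy_lemmatize` function are hypothetical stand-ins for illustration only; spaCy's lemmatizer is far more complete and is what the notebook actually uses.

```python
# Toy lemma table: a hypothetical stand-in for spaCy's lemmatizer, for illustration only
TOY_LEMMAS = {"ran": "run", "running": "run", "runs": "run", "waters": "water"}

def toy_lemmatize(word):
    """Return the lemma from the toy table, or the lowercased word if it is not listed."""
    word = word.lower()
    return TOY_LEMMAS.get(word, word)

words = ["Ran", "run", "running", "waters"]
lemmas = [toy_lemmatize(w) for w in words]
print(lemmas)
```

Every form of "run" collapses to the same string, so later steps count them as one word.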

I will begin by defining an nlp object and loading the large English-language pipeline. I will then define an empty list that will become the corpus used for topic modeling. The start and stop timestamps only measure how long the loop takes. Then I will use a FOR loop to iterate through the text of each chapter in the Bible. The first thing I'll do for each chapter is define an empty list called temp. Then I will start a TRY block to handle any exceptions that may come up. Within this TRY block, I will create a document for each chapter by applying our nlp object to the chapter text. Next, I will create a nested FOR loop that iterates through each token in the chapter. Within this nested FOR loop, conditional statements drop stopwords and punctuation and keep the lowercased lemma of everything else.

In [22]:
chapter_text_sub = chapter_text.head(1)

In [23]:
nlp = spacy.load("en_core_web_lg")

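corpus_list = []

# Record the start time so we can report how long the loop takes
start = datetime.now()

for index, row in chapter_text.iterrows():

    # Lemmas kept for the current chapter
    temp = []

    try:
        # Run the spaCy pipeline on the chapter text
        doc = nlp(row['clean_t'])

        for token in doc:
            # Keep the lowercased lemma unless it is a stopword or punctuation
            if token.lemma_.lower() not in stopwords and not token.is_punct:
                temp.append(token.lemma_.lower())

        corpus_list.append(temp)

    except Exception:
        # Report the chapter that failed and move on to the next one
        print('Check out this chapter:')
        print(row['name'], row['c'])
        print()

# Report the elapsed time
stop = datetime.now()

print('This process took', stop-start)
print()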


This process took 0:01:55.549252



In [24]:
corpus_list[0]

['beginning',
 'create',
 'heaven',
 'earth',
 'earth',
 'formless',
 'darkness',
 'surface',
 'deep',
 'spirit',
 'hover',
 'surface',
 'water',
 'light',
 'light',
 'light',
 'good',
 'divide',
 'light',
 'darkness',
 'light',
 'day',
 'darkness',
 'night',
 'evening',
 'morning',
 'day',
 'expanse',
 'midst',
 'water',
 'divide',
 'water',
 'water',
 '"god',
 'expanse',
 'divide',
 'water',
 'expanse',
 'water',
 'expanse',
 'expanse',
 'sky',
 'evening',
 'morning',
 'second',
 'day',
 'water',
 'sky',
 'gather',
 'place',
 'dry',
 'land',
 'appear',
 'dry',
 'land',
 'earth',
 'gathering',
 'water',
 'seas',
 'good',
 'earth',
 'forth',
 'grass',
 'herb',
 'yield',
 'seed',
 'fruit',
 'tree',
 'bear',
 'fruit',
 'kind',
 'seed',
 'earth',
 'earth',
 'bring',
 'forth',
 'grass',
 'herb',
 'yield',
 'seed',
 'kind',
 'tree',
 'bear',
 'fruit',
 'seed',
 'kind',
 'good',
 'evening',
 'morning',
 'day',
 'light',
 'expanse',
 'sky',
 'divide',
 'day',
 'night',
 'sign',
 'season',
 'd

This code took just under two minutes, and the output looks good.
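
The filtering rule inside the loop can also be checked in isolation, without loading the spaCy model. The sketch below is only an illustration: `FakeToken`, `keep_lemmas`, and the sample data are hypothetical names that mimic the two token attributes the loop reads (`lemma_` and `is_punct`).

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token: only the two attributes the loop reads
FakeToken = namedtuple("FakeToken", ["lemma_", "is_punct"])

def keep_lemmas(tokens, stopwords):
    """Return lowercased lemmas of tokens that are neither stopwords nor punctuation."""
    kept = []
    for token in tokens:
        lemma = token.lemma_.lower()
        if lemma not in stopwords and not token.is_punct:
            kept.append(lemma)
    return kept

# Hand-written tokens for "In the beginning God created the heavens."
sample_tokens = [
    FakeToken("in", False),
    FakeToken("the", False),
    FakeToken("beginning", False),
    FakeToken("God", False),
    FakeToken("create", False),
    FakeToken("the", False),
    FakeToken("heaven", False),
    FakeToken(".", True),
]
sample_stopwords = {"in", "the", "god"}

result = keep_lemmas(sample_tokens, sample_stopwords)
print(result)
```

Because the lemma is lowercased before the lookup, "God" is matched against the lowercase entry in the stopword set.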

# Converting corpus_list into a corpus

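Before I can get started, I need to convert my list of lists into a gensim corpus: a dictionary that maps each word to an integer id, plus a bag-of-words representation of each chapter.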

In [25]:
import gensim.corpora as corpora

id2word = corpora.Dictionary(corpus_list)

# Bag-of-words: a list of (word id, count) pairs for each chapter
chapter_corpus = [id2word.doc2bow(text) for text in corpus_list]
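
To show what the dictionary and bag-of-words steps produce, here is a plain-Python sketch on two tiny made-up chapters. It is an approximation for illustration: `toy_corpus_list`, `toy_word2id`, and `toy_doc2bow` are hypothetical names, and gensim's `Dictionary` assigns its own ids, so the numbering it produces may differ.

```python
from collections import Counter

# Two tiny hand-made "chapters" standing in for corpus_list
toy_corpus_list = [
    ["light", "darkness", "light"],
    ["water", "light", "water", "water"],
]

# Assign each distinct word an integer id in order of first appearance
# (gensim's Dictionary builds its own ids; the exact numbering may differ)
toy_word2id = {}
for chapter in toy_corpus_list:
    for word in chapter:
        toy_word2id.setdefault(word, len(toy_word2id))

def toy_doc2bow(chapter):
    """Return sorted (word id, count) pairs for one chapter."""
    counts = Counter(toy_word2id[word] for word in chapter)
    return sorted(counts.items())

toy_corpus = [toy_doc2bow(chapter) for chapter in toy_corpus_list]
print(toy_word2id)
print(toy_corpus)
```

Each chapter becomes a short list of pairs, and word order is discarded. That is the form LDA expects as input.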

# Topic modeling

The Bible covers a lot of subjects, so my plan was to set the number of topics to 100. The run shown here uses num_topics = 5. We'll evaluate this later and either increase or decrease it. Then I'm going to run LDA on the chapter_corpus.
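
For reference, the standard generative model that LDA assumes can be written as follows. The notation is an assumption of this sketch rather than something defined in the notebook: $K$ is the number of topics (num_topics), $d$ indexes chapters, $n$ indexes word positions within a chapter, and $\alpha$ and $\beta$ are the Dirichlet prior parameters.

```latex
\begin{aligned}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K && \text{(word distribution of topic } k\text{)} \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha) & & && \text{(topic mixture of chapter } d\text{)} \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d) & & && \text{(topic assigned to word position } n\text{)} \\
w_{d,n} &\sim \mathrm{Categorical}(\phi_{z_{d,n}}) & & && \text{(observed word)}
\end{aligned}
```

Fitting the model means estimating the $\phi_k$ and $\theta_d$ that best explain the observed bag-of-words counts, which is why the choice of $K$ matters so much.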

In [36]:
import gensim
from pprint import pprint
# number of topics
num_topics = 5
# Build LDA model
lda_model = gensim.models.LdaMulticore(corpus=chapter_corpus,
                                       id2word=id2word,
                                       num_topics=num_topics)

In [37]:
import pyLDAvis
import pyLDAvis.gensim_models

# Visualize the topics
pyLDAvis.enable_notebook()

LDAvis_prepared = pyLDAvis.gensim_models.prepare(lda_model, chapter_corpus, id2word)

LDAvis_prepared

