# Goals of this notebook

1. Run named entity recognition on Aquinas's Summa Theologica
2. Build an ontology of the Summa


In [1]:
import json
with open("aquinas.json", "r") as handle:
    data = json.load(handle)
    
data[:5]

[{'volume': 'Volume 1',
  'questionTitle': 'Question 1. The nature and extent of sacred doctrine',
  'question': 'Question 1.',
  'articleTitle': 'Article 1. Whether, besides philosophy, any further doctrine is required?',
  'article': 'Article 1.',
  'articleObjection': ['Objection 1. It seems that, besides philosophical science, we have no need of any further knowledge. For man should not seek to know what is above reason: "Seek not the things that are too high for thee" (Sirach 3:22). But whatever is not above reason is fully treated of in philosophical science. Therefore any other knowledge besides philosophical science is superfluous.',
   'Objection 2. Further, knowledge can be concerned only with being, for nothing can be known, save what is true; and all that is, is true. But everything that is, is treated of in philosophical science—even God Himself; so that there is a part of philosophy called theology, or the divine science, as Aristotle has proved (Metaph. vi). Therefore, b

In [125]:
def getCorpusContent(data):
    """Flatten each article's body paragraphs into one newline-terminated string."""
    articles = []
    for chunk in data:
        # Every paragraph is followed by a newline, including the last one.
        text = "".join(paragraph + "\n" for paragraph in chunk["articleBody"])
        articles.append({
            "volume": chunk["volume"],
            "question": chunk["question"],
            "article": chunk["article"],
            "data": text,
        })
    return articles

corpus = getCorpusContent(data)
corpus[0]

{'volume': 'Volume 1',
 'question': 'Question 1.',
 'article': 'Article 1.',
 'data': 'On the contrary, It is written (2 Timothy 3:16): "All Scripture, inspired of God is profitable to teach, to reprove, to correct, to instruct in justice." Now Scripture, inspired of God, is no part of philosophical science, which has been built up by human reason. Therefore it is useful that besides philosophical science, there should be other knowledge, i.e. inspired of God.\nI answer that, It was necessary for man\'s salvation that there should be a knowledge revealed by God besides philosophical science built up by human reason. Firstly, indeed, because man is directed to God, as to an end that surpasses the grasp of his reason: "The eye hath not seen, O God, besides Thee, what things Thou hast prepared for them that wait for Thee" (Isaiah 64:4). But the end must first be known by men who are to direct their thoughts and actions to the end. Hence it was necessary for the salvation of man that certa

In [126]:
import spacy
nlp = spacy.load("en_core_web_sm")

# Inspect the token-level annotations of the first article only.
doc = nlp(corpus[0]["data"])
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.is_stop)


On on ADP IN prep True
the the DET DT det True
contrary contrary NOUN NN pobj False
, , PUNCT , punct False
It -PRON- PRON PRP nsubjpass True
is be AUX VBZ auxpass True
written write VERB VBN ROOT False
( ( PUNCT -LRB- punct False
2 2 NUM CD nummod False
Timothy Timothy PROPN NNP npadvmod False
3:16 3:16 NUM CD nummod False
) ) PUNCT -RRB- punct False
: : PUNCT : punct False
" " PUNCT `` punct False
All all DET DT det True
Scripture Scripture PROPN NNP nsubj False
, , PUNCT , punct False
inspired inspire VERB VBN advcl False
of of ADP IN prep True
God God PROPN NNP pobj False
is be AUX VBZ ccomp True
profitable profitable ADJ JJ acomp False
to to PART TO aux True
teach teach VERB VB xcomp False
, , PUNCT , punct False
to to PART TO aux True
reprove reprove VERB VB xcomp False
, , PUNCT , punct False
to to PART TO aux True
correct correct VERB VB xcomp False
, , PUNCT , punct False
to to PART TO aux True
instruct instruct VERB VB xcomp False
in in ADP IN prep True
justice justice NOUN N

In [144]:
from tqdm import tqdm

class Collector:
    def __init__(self):
        pass
    
    @classmethod
    def collect_logical_units(cls, corpus, nlp):
        """Return, per article, the non-stop-word subjects, objects, and verbs."""
        cls.logical_unit_corpus = []
        for i in tqdm(range(len(corpus)), position=0, leave=True, desc="Collecting Logical Units", ncols=100, bar_format='{l_bar}{bar}|'):
            c = corpus[i]["data"]
            doc = nlp(c)
            # Collect logical units
            logical_units = []
            for token in doc:
                # Catch subjects, objects, and verbs.
                # The NOUN, PROPN, and ROOT checks below are currently disabled.
                if "subj" in token.dep_ or "obj" in token.dep_ or "VERB" == token.pos_:
                    #"NOUN" == token.pos_ or
                    #"PROPN" == token.pos_ or
                    #"ROOT" == token.dep_ or
                    if not token.is_stop:
                        logical_units.append([token.text, (token.dep_, token.pos_)])
            cls.logical_unit_corpus.append(logical_units)
        return cls.logical_unit_corpus
    
    @classmethod
    def collect_logical_entities(cls, corpus, nlp):
        """Return, per article, the named entities as [text, label] pairs."""
        cls.logical_entity_corpus = []
        for i in tqdm(range(len(corpus)), position=0, leave=True, desc="Collecting Logical Entities", ncols=100, bar_format='{l_bar}{bar}|'):
            c = corpus[i]["data"]
            
            doc = nlp(c)

            # Collect logical entities
            logical_entities = []
            for ent in doc.ents:
                # Store the entity text rather than the Span, so the parsed Doc is not kept alive.
                logical_entities.append([ent.text, ent.label_])
            cls.logical_entity_corpus.append(logical_entities)
        return cls.logical_entity_corpus
    
lunits = Collector().collect_logical_units(corpus, nlp)
len(lunits)
#lents = Collector().collect_logical_entities(corpus, nlp)

Collecting Logical Units: 100%|████████████████████████████████████████████████████████████████████|


3148
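
The entity collector is defined above but its call is commented out. As a minimal sketch of what could be done with its result, the snippet below tallies entity labels with `collections.Counter`. The `sample_entities` list and the `count_entity_labels` helper are hypothetical; the sample only mimics the per-article `[entity, label]` shape and is not real output from the corpus.

```python
from collections import Counter

# Hypothetical sample shaped like the return value of collect_logical_entities:
# one list per article, each item an [entity, label] pair.
sample_entities = [
    [["Aristotle", "PERSON"], ["Isaiah", "PERSON"]],
    [["Augustine", "PERSON"], ["first", "ORDINAL"]],
]

def count_entity_labels(entity_corpus):
    """Count how often each entity label occurs across all articles."""
    counts = Counter()
    for article_entities in entity_corpus:
        for _entity, label in article_entities:
            counts[label] += 1
    return counts

label_counts = count_entity_labels(sample_entities)
print(label_counts.most_common())
```

A tally like this gives a quick check on whether the model's labels are plausible for the text before building anything on top of them.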

In [116]:
"""
Rules.
1. nsubj - Subject
2. *obj (e.g. pobj, dobj) - Object of the Subject
3. 'VERB' - Predicate relating the Subject to the Object
"""
import copy

class Subject:
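    def __init__(self, name, objects=None, predicates=None):
        #print("[Creating Subject {}]".format(name))
        self.name = name
        # Create fresh lists per instance; a mutable default would be shared across instances.
        self.objects = objects if objects is not None else []
        self.predicates = predicates if predicates is not None else []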
        
    def __str__(self):
        return self.name

    def addObject(self, newObject):
        #print("[Adding Object {}]".format(newObject))
        self.objects.append(newObject)
        
    def addPredicate(self, newPredicate):
        #print("[Adding Predicate {}]".format(newPredicate))
        self.predicates.append(newPredicate)
        
    def objectCount(self):
        return len(self.objects)
    
    def predicateCount(self):
        return len(self.predicates)
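
One way to read the rules in the docstring above is to attach each verb and each object to the most recent subject. The sketch below applies that reading to hand-written rows in the `[text, (dep, pos)]` shape produced by `collect_logical_units`. The rows and the `group_by_subject` helper are illustrative assumptions, and plain dictionaries stand in for `Subject` so the snippet runs on its own.

```python
# Hand-written rows in the [text, (dep, pos)] shape; not real parser output.
rows = [
    ["Scripture", ("nsubj", "PROPN")],
    ["inspired", ("advcl", "VERB")],
    ["God", ("pobj", "PROPN")],
    ["reason", ("nsubj", "NOUN")],
    ["discovered", ("ROOT", "VERB")],
    ["revelation", ("pobj", "NOUN")],
]

def group_by_subject(rows):
    """Attach each verb and object to the most recent nsubj token."""
    groups = []
    current = None
    for text, (dep, pos) in rows:
        if dep == "nsubj":
            # Rule 1: a new subject starts a new group.
            current = {"name": text, "objects": [], "predicates": []}
            groups.append(current)
        if current is None:
            continue  # nothing to attach to before the first subject
        if pos == "VERB":
            # Rule 3: verbs become predicates of the current subject.
            current["predicates"].append(text)
        elif "obj" in dep:
            # Rule 2: any *obj dependency becomes an object of the current subject.
            current["objects"].append(text)
    return groups

groups = group_by_subject(rows)
for group in groups:
    print(group)
```

Because grouping depends only on token order, a verb or object that follows a new subject is attached to it even when the parse tree links it elsewhere.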

In [191]:
from collections import defaultdict

class Ontology:
    def __init__(self):
        pass
    
    @classmethod
    def createSubjectUnits(self, lunits):
        
        allSubjects = []
        for i in tqdm(range(len(lunits)), position=0, leave=True, desc="Creating Subjects", ncols=100, bar_format='{l_bar}{bar}|'):
            unit = lunits[i]
            subjects = []
            currentSubject = None
            for row in unit:
                word, x = row
                dep, pos = x

                # If Subject
                if "nsubj" == dep:
                    #print("___")

                    """
                    if currentSubject:
                        
                        if (currentSubject.objectCount() < 1 or currentSubject.predicateCount() < 1):
                            
                            # Duplicate name problem
                            if currentSubject.name == word:
                                newName = currentSubject.name
                            else:
                                newName = currentSubject.name + " " + word
                            
                            subject = Subject(newName, currentSubject.objects, currentSubject.predicates)
                            del subjects[-1]
                        else:
                            subject = Subject(word)
                        
                    else:
                        subject = Subject(word)
                    """
                    subject = Subject(word)
                    currentSubject = copy.deepcopy(subject)
                    subjects.append(currentSubject)

                if currentSubject:
                    # If Predicate
                    if pos == "VERB":
                        currentSubject.addPredicate(word)
                    # If Object
                    elif "obj" in dep:
                        currentSubject.addObject(word)

                    #print(row)
            allSubjects.append(subjects)
        return allSubjects
    
    @classmethod
    def displaySubjectUnit(self, subjectUnit):
        
        for row in subjectUnit:
            print("Subject:")
            print("{}".format(row.name))
            print("\tObjects:")
            for obj in row.objects:
                print("\t\t{}".format(obj))
            print("\tPredicates:")
            for pred in row.predicates:
                print("\t\t{}".format(pred))
                
    @classmethod
    def convertSubjectsToList(self, corpus, allSubjects):
        D = []
        for i in tqdm(range(len(allSubjects)), position=0, leave=True, desc="Converting Subjects To List", ncols=100, bar_format='{l_bar}{bar}|'):
            unit = allSubjects[i]
            for row in unit:
                for obj in row.objects:
                    for pred in row.predicates:
                        D.append([corpus[i]["volume"],corpus[i]["question"], corpus[i]["article"], row.name, pred, obj])
        return D
    
    @classmethod
    def convertSubjectsToGraph(self, corpus, allSubjects):
        D = {"nodes":{}, "edges":{}}
        for i in tqdm(range(len(allSubjects)), position=0, leave=True, desc="Converting Subjects To Graph", ncols=100, bar_format='{l_bar}{bar}|'):
            unit = allSubjects[i]
            for row in unit:
                for obj in row.objects:
                    for pred in row.predicates:
                        
                        if row.name not in D["nodes"]:
                            D["nodes"][row.name] = {"type":"subject", "weight":1}
                        else:
                            D["nodes"][row.name]["weight"] += 1
                        
                        if obj not in D["nodes"]:
                            D["nodes"][obj] = {"type":"object", "weight":1}
                        else:
                            D["nodes"][obj]["weight"] += 1

                        # Look the edge up under the same key it is stored under,
                        # so repeated triples increment the weight instead of resetting it.
                        edgeKey = row.name + "_" + pred + "_" + obj
                        if edgeKey not in D["edges"]:
                            D["edges"][edgeKey] = {"weight":1, "from": row.name, "by":pred, "to":obj}
                        else:
                            D["edges"][edgeKey]["weight"] += 1
        return D
    
    @classmethod
    def convertSubjectsToJSON(self, corpus, allSubjects):
        D = {"subjects":[]}
        for i in tqdm(range(len(allSubjects)), position=0, leave=True, desc="Converting Subjects To JSON", ncols=100, bar_format='{l_bar}{bar}|'):
            unit = allSubjects[i]
            for row in unit:
                D["subjects"].append({"name":row.name, "objects":row.objects, "predicates":row.predicates})
        return D
                
    
subjectUnits = Ontology().createSubjectUnits(lunits)
len(subjectUnits)

Creating Subjects: 100%|███████████████████████████████████████████████████████████████████████████|


3148
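
Edge weights in a graph like the one `convertSubjectsToGraph` builds are counts of repeated subject, predicate, object triples. As a hedged alternative sketch, the counting can be done with `collections.Counter` keyed by tuples, which avoids ambiguity if a word ever contains an underscore. The `triples` list and the `count_edges` helper are hypothetical and only mimic the name/pred/obj combinations iterated over above.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples; not taken from the corpus.
triples = [
    ("reason", "discovered", "revelation"),
    ("reason", "discovered", "revelation"),
    ("truth", "known", "God"),
]

def count_edges(triples):
    """Build an edge dict keyed by the triple itself, with a weight per triple."""
    edge_weights = Counter(triples)
    return {
        key: {"weight": weight, "from": key[0], "by": key[1], "to": key[2]}
        for key, weight in edge_weights.items()
    }

edges = count_edges(triples)
for key, attrs in edges.items():
    print(key, attrs["weight"])
```

Tuple keys cannot be written to JSON directly, so they would need to be joined into strings before export.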

In [192]:
Ontology().displaySubjectUnit(subjectUnits[0])

Subject:
Scripture
	Objects:
		God
		justice
	Predicates:
		instruct
		inspired
		teach
		correct
		reprove
Subject:
Scripture
	Objects:
		God
		end
		grasp
		reason
		salvation
		science
	Predicates:
		built
		revealed
		inspired
		surpasses
		answer
		directed
Subject:
hath
	Objects:
		Thee
		things
	Predicates:
		seen
Subject:
Thou
	Objects:
	Predicates:
Subject:
hast
	Objects:
		truths
		God
		thoughts
		men
		Thee
		end
		man
		reason
		salvation
		revelation
	Predicates:
		wait
		exceed
		regards
		prepared
		direct
		known
Subject:
reason
	Objects:
		revelation
	Predicates:
		discovered
		taught
Subject:
truth
	Objects:
		God
		admixture
		reason
		errors
		time
	Predicates:
		discover
		known
Subject:
salvation
	Objects:
		truths
		God
		men
		truth
		reason
		order
		revelation
		knowledge
		science
	Predicates:
		depends
		learned
		built
		brought
		taught


In [193]:
df_ls = Ontology().convertSubjectsToList(corpus, subjectUnits)
df_graph = Ontology().convertSubjectsToGraph(corpus, subjectUnits)
df_json = Ontology().convertSubjectsToJSON(corpus, subjectUnits)


Converting Subjects To List: 100%|█████████████████████████████████████████████████████████████████|
Converting Subjects To Graph: 100%|████████████████████████████████████████████████████████████████|
Converting Subjects To JSON: 100%|█████████████████████████████████████████████████████████████████|


In [190]:
df_graph

{'nodes': {'Scripture': {'type': 'subject', 'weight': 718},
  'God': {'type': 'object', 'weight': 27011},
  'justice': {'type': 'object', 'weight': 3141},
  'end': {'type': 'object', 'weight': 5193},
  'grasp': {'type': 'object', 'weight': 17},
  'reason': {'type': 'object', 'weight': 13008},
  'salvation': {'type': 'object', 'weight': 990},
  'science': {'type': 'object', 'weight': 627},
  'hath': {'type': 'subject', 'weight': 2399},
  'Thee': {'type': 'object', 'weight': 236},
  'things': {'type': 'object', 'weight': 18568},
  'hast': {'type': 'subject', 'weight': 988},
  'truths': {'type': 'object', 'weight': 91},
  'thoughts': {'type': 'object', 'weight': 262},
  'men': {'type': 'object', 'weight': 5927},
  'man': {'type': 'object', 'weight': 43940},
  'revelation': {'type': 'object', 'weight': 544},
  'truth': {'type': 'subject', 'weight': 3227},
  'admixture': {'type': 'object', 'weight': 136},
  'errors': {'type': 'object', 'weight': 96},
  'time': {'type': 'object', 'weight': 2
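
A dictionary of this size is hard to read as raw output. The sketch below ranks nodes by weight, using a small subset of the node weights printed above, copied by hand. The `top_nodes` helper is a hypothetical name, not part of the notebook.

```python
# A hand-copied subset of the node weights shown in the output above.
graph = {
    "nodes": {
        "Scripture": {"type": "subject", "weight": 718},
        "God": {"type": "object", "weight": 27011},
        "reason": {"type": "object", "weight": 13008},
        "man": {"type": "object", "weight": 43940},
        "grasp": {"type": "object", "weight": 17},
    }
}

def top_nodes(graph, n=3):
    """Return the n heaviest nodes as (name, weight) pairs, heaviest first."""
    ranked = sorted(
        graph["nodes"].items(),
        key=lambda item: item[1]["weight"],
        reverse=True,
    )
    return [(name, attrs["weight"]) for name, attrs in ranked[:n]]

top = top_nodes(graph)
print(top)  # [('man', 43940), ('God', 27011), ('reason', 13008)]
```

Node weights are incremented once per object and predicate combination, so they measure how often a word takes part in extracted triples, not how often it appears in the text.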