## Identify the Most Commonly Used Key Phrases 
After the key phrases in a book of the Bible have been extracted with *bible_key_phrase_extractor.ipynb*, this notebook analyzes the resulting key-phrase output files.

I will now scan the full list of key phrases for each book to find the phrases used most often across the whole text.
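
The counting step amounts to flattening every per-section list of phrases into one list and tallying it. Below is a minimal sketch using `collections.Counter`, assuming (as the flattening in `count_key_phrases` suggests) that each JSON file maps a section identifier to a list of key-phrase strings. The sample dictionary and the `top_phrases` helper are hypothetical, made up for illustration:

```python
from collections import Counter
from itertools import chain

# Hypothetical data in the assumed shape: section id -> list of key phrases.
sample_kp_dict = {
    "1": ["Lord", "earth", "people"],
    "2": ["Lord", "Zion", "earth"],
    "3": ["Lord", "justice"],
}

def top_phrases(kp_dict, n=None):
    """Return {phrase: count} ordered from most to least common.

    n=None keeps every phrase; otherwise only the n most common are kept.
    Ties keep the order in which phrases were first seen.
    """
    counts = Counter(chain.from_iterable(kp_dict.values()))
    return dict(counts.most_common(n))

print(top_phrases(sample_kp_dict, 2))  # {'Lord': 3, 'earth': 2}
```

`Counter` tallies the phrases in a single pass, whereas calling `list.count` once per phrase rescans the whole list each time.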

In [1]:
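import importlib
import json

# ts_functions is a local helper module in this project.
import ts_functions as ts

# Reload it so edits to ts_functions.py take effect without restarting the kernel.
importlib.reload(ts)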



In [55]:

import os
from collections import Counter

# Count how often each key phrase appears in one book.
# Reads a JSON file from datasets/key_phrases_results/ and returns a dictionary mapping
# each key phrase to the number of times it appears, sorted from most to least common.
# If drop is greater than -1, only the `drop` most common phrases are returned;
# for example, drop = 10 returns the 10 most common phrases.
def count_key_phrases(filename, drop=-1):

    # Read the JSON file at datasets/key_phrases_results/<filename> into kp_dict.
    with open(os.path.join('datasets', 'key_phrases_results', filename)) as f:
        kp_dict = json.load(f)

    # Each value in kp_dict is a list of key phrases, so the values form a list of lists.
    # Flatten them into a single list of phrases.
    all_phrases = [phrase for sublist in kp_dict.values() for phrase in sublist]

    # Count how many times each phrase appears (one pass, instead of list.count per phrase).
    key_phrase_counts = Counter(all_phrases)

    # Sort from highest to lowest count.
    key_phrase_counts = dict(sorted(key_phrase_counts.items(), key=lambda item: item[1], reverse=True))

    if drop > -1:
        # Keep only the first `drop` entries, i.e. the most common phrases.
        key_phrase_counts = dict(list(key_phrase_counts.items())[:drop])

    return key_phrase_counts


# Compare two collections of key phrases and return the phrases they share.
# Each argument may be a list of phrases or a dictionary in the format returned by
# count_key_phrases (in which case its keys are used). The result keeps the order of phrases_dict1.
def shared_words(phrases_dict1, phrases_dict2):

    def to_list(phrases):
        # For dictionaries, use the keys (the key phrases); lists are used as-is.
        if isinstance(phrases, dict):
            return list(phrases.keys())
        if isinstance(phrases, list):
            return phrases
        raise TypeError(f"Expected a list or dict of key phrases, got {type(phrases).__name__}")

    list1 = to_list(phrases_dict1)
    list2 = to_list(phrases_dict2)

    shared = []
    for key_phrase in list1:
        if key_phrase in list2:
            shared.append(key_phrase)

    return shared
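
`shared_words` checks membership with `in` against a list, which scans that list for every phrase. Here is a sketch of the same order-preserving intersection that converts the second collection to a set first. `shared_phrases` is a hypothetical name; like the original, it accepts either lists or `{phrase: count}` dictionaries, because iterating a dict yields its keys. The sample counts are copied from the printed Matthew and John results in this notebook:

```python
def shared_phrases(phrases1, phrases2):
    """Return the phrases present in both inputs, in the order of phrases1.

    Each argument may be a list of phrases or a {phrase: count} dict;
    iterating a dict yields its keys, so both cases behave the same way.
    """
    lookup = set(phrases2)  # average O(1) membership tests
    return [phrase for phrase in phrases1 if phrase in lookup]

matthew = {"Lord": 23, "Jesus": 21, "kingdom": 17}
john = {"Jesus": 16, "world": 15, "Lord": 12}
print(shared_phrases(matthew, john))  # ['Lord', 'Jesus']
```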

In [56]:
Isaiah_key_phrase_count = count_key_phrases('OT-27_Isaiah_key_phrases.json',30)
John_key_phrase_count = count_key_phrases('NT-04_John_key_phrases.json',30)
Matthew_key_phrase_count = count_key_phrases('NT-01_Matthewkey_phrases.json',30)
Lk_key_phrase_count = count_key_phrases('NT-03_Lukekey_phrases.json',30)
Mk_key_phrase_count = count_key_phrases('NT-02_Mark.key_phrases.json',30)
print(f"There were {len(Isaiah_key_phrase_count)} key phrases found in Isaiah: {Isaiah_key_phrase_count}")
print(f"There were {len(John_key_phrase_count)} key phrases found in John: {John_key_phrase_count}")
print(f"There were {len(Matthew_key_phrase_count)} key phrases found in Matthew: {Matthew_key_phrase_count}")
print(f"There were {len(Lk_key_phrase_count)} key phrases found in Luke: {Lk_key_phrase_count}")
print(f"There were {len(Mk_key_phrase_count)} key phrases found in Mark: {Mk_key_phrase_count}")

There were 30 key phrases found in Isaiah: {'earth': 33, 'Israel': 31, 'Lord': 30, 'people': 29, 'hosts': 28, 'glory': 28, 'God': 27, 'name': 27, 'face': 26, 'hand': 25, 'house': 23, 'heart': 23, 'Zion': 22, 'land': 21, 'eyes': 21, 'nations': 21, 'day': 21, 'Jerusalem': 20, 'judgment': 20, 'strength': 20, 'justice': 20, 'place': 19, 'mouth': 18, 'voice': 18, 'waters': 18, 'Judah': 17, 'sea': 17, 'iniquity': 16, 'The Lord': 16, 'praise': 16}
There were 30 key phrases found in John: {'Jesus': 16, 'things': 15, 'world': 15, 'God': 14, 'Jews': 13, 'Lord': 12, 'disciples': 12, 'testimony': 11, 'Father': 11, 'name': 10, 'truth': 10, 'one': 9, 'place': 9, 'word': 9, 'glory': 8, 'Pharisees': 8, 'hour': 8, 'judgment': 8, 'Son': 8, 'man': 7, 'law': 7, 'Jerusalem': 7, 'Galilee': 7, 'will': 6, 'Moses': 6, 'Passover': 6, 'eternal life': 6, 'Amen': 6, 'works': 6, 'Christ': 6}
There were 30 key phrases found in Matthew: {'Lord': 23, 'Jesus': 21, 'kingdom': 17, 'heaven': 16, 'God': 15, 'disciples': 14,
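
In the Isaiah output, `'Lord'` (30) and `'The Lord'` (16) are counted as separate phrases. If they should be treated as one, a possible approach, not part of the original pipeline, is to normalize phrases before tallying. The sketch below uses a hypothetical rule that strips a leading English article; `normalize` and `merge_counts` are illustrative names:

```python
import re
from collections import Counter

# Hypothetical normalization rule: strip a leading "the", "a", or "an" (any case).
LEADING_ARTICLE = re.compile(r"^(the|a|an)\s+", flags=re.IGNORECASE)

def normalize(phrase):
    """Trim whitespace and drop a leading article followed by whitespace."""
    return LEADING_ARTICLE.sub("", phrase.strip())

def merge_counts(counts):
    """Re-tally a {phrase: count} dict after normalizing each phrase."""
    merged = Counter()
    for phrase, n in counts.items():
        merged[normalize(phrase)] += n
    return dict(merged.most_common())

print(merge_counts({"Lord": 30, "people": 29, "The Lord": 16}))  # {'Lord': 46, 'people': 29}
```

Normalizing before the top-30 cutoff is safer than after it, since a variant just outside the cutoff would otherwise miss its merge partner. Whether the two forms should be merged at all is an interpretive choice.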

In [59]:

# Intersect the four Gospels' top phrases pairwise; the result keeps Matthew's order.
shared_gospels = shared_words(Matthew_key_phrase_count, Mk_key_phrase_count)
shared_gospels = shared_words(shared_gospels, Lk_key_phrase_count)
shared_gospels = shared_words(shared_gospels, John_key_phrase_count)
print(shared_gospels)

['Lord', 'Jesus', 'God', 'disciples', 'things', 'name', 'Pharisees', 'Galilee', 'word', 'Jerusalem']
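
The three chained calls to `shared_words` above fold a pairwise intersection across the four Gospels. The same fold can be expressed with `functools.reduce`. In this self-contained sketch, `intersect_ordered` is a hypothetical stand-in for `shared_words`, and the phrase lists are made up for illustration rather than taken from the real outputs:

```python
from functools import reduce

def intersect_ordered(a, b):
    """Keep items of a that also appear in b, preserving a's order."""
    lookup = set(b)
    return [item for item in a if item in lookup]

# Made-up phrase lists standing in for each Gospel's top phrases.
gospels = [
    ["Lord", "Jesus", "kingdom", "God"],  # Matthew (illustrative)
    ["Jesus", "God", "Lord", "Galilee"],  # Mark (illustrative)
    ["Lord", "God", "Jesus", "Spirit"],   # Luke (illustrative)
    ["Jesus", "world", "God", "Lord"],    # John (illustrative)
]

# reduce applies intersect_ordered left to right: ((Matthew & Mark) & Luke) & John.
shared = reduce(intersect_ordered, gospels)
print(shared)  # ['Lord', 'Jesus', 'God']
```

Adding another book then only means appending its list, rather than writing another assignment line.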
