# NLP - Project 2
## Rinehart Analysis with Word Vectors
**Team**: *Jean Merlet, Konstantinos Georgiou, Matt Lane*

## Where to put the code
- Place the preprocessing functions/classes in [nlp_libs/books/preprocessing.py](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/nlp_libs/books/preprocessing.py)
- The custom word embeddings functions/classes (task 1) in [nlp_libs/books/word_embeddings.py](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/nlp_libs/books/word_embeddings.py) (separate class)
- The pretrained word embeddings functions/classes (task 2) in [nlp_libs/books/word_embeddings.py](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/nlp_libs/books/word_embeddings.py) (separate class)
- The functions/classes (if any) that compare the results (tasks 3, 4, 5) in [nlp_libs/books/compare_statistics.py](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/nlp_libs/books/compare_statistics.py)
- Any plotting related functions in [nlp_libs/books/plotter.py](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/nlp_libs/books/plotter.py)

**The code is reloaded automatically. Any class object needs to be reinitialized, though.**

## Config file
The yml/config file is located at: [confs/proj_2.yml](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/confs/proj_2.yml)<br>
To load it run:
```python
config_path='confs/proj_2.yml'
conf = Configuration(config_src=config_path)
# Get the books dictionary
books = conf.get_config('data_loader')['config']['books'] # type = Dict
print(books.keys())
print(books['The_Bat'])
```
To reload the config, rerun the `Configuration(...)` and `get_config(...)` lines.

## Libraries Overview:
All the libraries are located under *"\<project root>/nlp_libs"*
- nlp_libs/**books**: This project's code (imported later)
- nlp_libs/**configuration**: Class that creates config objects from yml files
- nlp_libs/**fancy_logger**: Logger that can be used instead of prints for text formatting (color, bold, underline etc)

## Project 1 Code
If you need to import anything from Project 1 just run:
```python
import proj1_nlp_libs.books.processed_book as proc
import proj1_nlp_libs.books.book_extractor as extr
import proj1_nlp_libs.books.plotter as pl
```

## For more info check out:
- the **[Project Board](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/projects/1)**
- the **[README](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/blob/main/README.md)**
- and the **[Current Issues](https://github.com/NLPaladins/rinehartAnalysis_wordVectors/issues)**

# ------------------------------------------------------------------

## On Google Colab?
- **If yes, run the two cells below and press the button each one displays.**
- Otherwise go to "***Import the base Libraries***"

In [1]:
# Import Jupyter Widgets
import os
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
from IPython.display import display
# Clone the repository if you're on Google Colab
def clone_project(_button=None):
    # ipywidgets passes the clicked Button instance to the callback
    print("Cloning Project..")
    !git clone https://github.com/NLPaladins/rinehartAnalysis_wordVectors.git
    print("Project cloned.")

print("Clone project?")
print("(If you do this you will overwrite local changes on other files e.g. configs)")
print("Not needed if you're not on Google Colab")
btn = widgets.Button(description="Yes, clone")
btn.on_click(clone_project)
display(btn)

Clone project?
(If you do this you will overwrite local changes on other files e.g. configs)
Not needed if you're not on Google Colab


Button(description='Yes, clone', style=ButtonStyle())

In [2]:
# Change into the cloned repository and install its requirements (Google Colab only)
def change_dir(_button=None):
    # ipywidgets passes the clicked Button instance to the callback
    try:
        print("Changing dir..")
        # git clone creates a directory named after the repository
        os.chdir('/content/rinehartAnalysis_wordVectors')
        print('done')
        print("Current dir:")
        print(os.getcwd())
        print("Dir Contents:")
        print(os.listdir())
        print("\nInstalling Requirements")
        !pip install -r requirements.txt
    except Exception:
        print("Error: Project not cloned")

print("Are you on Google Colab?")
btn = widgets.Button(description="Yes")
btn.on_click(change_dir)
display(btn)

Are you on Google Colab?


Button(description='Yes', style=ButtonStyle())

### To commit and push the Google Colab notebook to GitHub
Click **File > Save a copy in GitHub**

# ------------------------------------------------------------------

# Initializations

## Import the base Libraries

In [3]:
# Imports
%load_ext autoreload
%autoreload 2
from importlib import reload as reload_lib
from typing import *
import os
import re
from pprint import pprint
# Numpy and pandas (pd is used below to load the pickled results)
import numpy as np
import pandas as pd

# Import preprocessing lib
from nlp_libs.books import *

## Load the YML file

In [4]:
from nlp_libs import Configuration

In [5]:
# The path of configuration and log save path
config_path = "confs/proj_2.yml"
# !cat "$config_path"
# Load the configuration
conf = Configuration(config_src=config_path)
# Get the books dict
books_conf = conf.get_config('data_loader')['config']['books']
# print(books_conf.keys())
# pprint(books_conf)  # Pretty print the books dict

2021-11-01 00:51:08 Config       INFO     Configuration file loaded successfully from path: /Users/gkos/Insync/delfinas7kostas@gmail.com/Google Drive/Projects/UTK/NLP-Project2/Code/confs/proj_2.yml
2021-11-01 00:51:08 Config       INFO     Configuration Tag: proj2


## Setup Logger and Example

In [6]:
log_path = "logs/proj_2.log"
# Load and setup logger
logger = ColorizedLogger(logger_name='Notebook', color='cyan')
ColorizedLogger.setup_logger(log_path=log_path, debug=False, clear_log=True)
# Examples
logger.info("Logger Examples:")
logger.nl(num_lines=1) # New lines
logger.warn("Logger Warning underlined", attrs=['underline']) 
# Attrs: bold, dark, underline, blink, reverse, concealed
logger.error("Logger Error in red&yellow", color="yellow", on_color="on_red")
# Colors: on_grey, on_red, on_green, on_yellow, on_blue, on_magenta, on_cyan, on_white

2021-11-01 00:51:08 FancyLogger  INFO     Logger is set. Log file path: /Users/gkos/Insync/delfinas7kostas@gmail.com/Google Drive/Projects/UTK/NLP-Project2/Code/logs/proj_2.log
2021-11-01 00:51:08 Notebook     INFO     Logger Examples:

2021-11-01 00:51:08 Notebook     ERROR    Logger Error in red&yellow


# ------------------------------------------------------------------

# Start of Project Code

In [7]:
from nlp_libs import books as books_lib

## Preprocessing

# The Circular Staircase

In [8]:
# Load conf
book_meta = books_conf['The_Circular_Staircase']
book = ProcessedBook(book_meta)

In [9]:
# Lemmatize sentences
protagonist_subs = list(book_meta['protagonists'][0].values())[0]
substitution = (protagonist_subs, 'protagonist')
sentences_substituted = book.lemmatize_by_sentence(word_subs=substitution)
sentences = book.lemmatize_by_sentence()
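
The `word_subs` argument appears to collapse every alias of the protagonist into a single `'protagonist'` token, so that an embedding model learns one vector for the character. The internals of `ProcessedBook.lemmatize_by_sentence` are not shown in this notebook, so the following is only a minimal sketch of that idea; `substitute_aliases`, the sample sentences, and the alias list are hypothetical.

```python
from typing import List, Tuple


def substitute_aliases(sentences: List[List[str]],
                       word_subs: Tuple[List[str], str]) -> List[List[str]]:
    """Replace every token that matches an alias with one replacement token."""
    aliases, replacement = word_subs
    alias_set = {alias.lower() for alias in aliases}
    return [[replacement if token.lower() in alias_set else token
             for token in sentence]
            for sentence in sentences]


# Hypothetical tokenized sentences and aliases, for illustration only
sample_sentences = [["jamieson", "examine", "the", "staircase"],
                    ["the", "detective", "find", "a", "revolver"]]
sample_subs = (["jamieson", "detective"], "protagonist")
substituted = substitute_aliases(sample_sentences, sample_subs)
print(substituted)
```

Both sample sentences end up referring to the same `protagonist` token, which is the effect the `(protagonist_subs, 'protagonist')` tuple is presumably meant to achieve.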

In [10]:
# Generate word combinations
protagonists_antagonists = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['protagonists'], 
                                            get_all_sub_values_1=True,
                                            keys_2=['antagonists'],
                                            get_all_sub_values_2=True,
                                            ignore_words_with_spaces=True)
antagonists_crime_weapon = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['antagonists'],
                                            get_all_sub_values_1=True,
                                            keys_2=['crime', 'crime_weapon'],
                                            get_all_sub_values_2=False,
                                            ignore_words_with_spaces=True)
antagonists_crime_objects = books_lib.word_embeddings\
                           .get_combinations(conf=book_meta,
                                             keys_1=['antagonists'],
                                             get_all_sub_values_1=True,
                                             keys_2=['crime', 'crime_objects'],
                                             get_all_sub_values_2=False,
                                             ignore_words_with_spaces=True)

print("\nprotagonists_antagonists: ")
pprint(protagonists_antagonists)
print("\nantagonists_crime_weapon: ")
pprint(antagonists_crime_weapon)
print("\nantagonists_crime_objects: ")
pprint(antagonists_crime_objects)


protagonists_antagonists: 
[('jamieson', 'watson'), ('detective', 'watson'), ('winters', 'watson')]

antagonists_crime_weapon: 
[('watson', 'revolver')]

antagonists_crime_objects: 
[('watson', 'staircase'), ('watson', 'floor'), ('watson', 'waistcoat')]
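
The printed pairs are consistent with a Cartesian product of the two word groups, with multi-word entries dropped when `ignore_words_with_spaces=True`. A minimal sketch under that assumption follows; `pair_words` is a hypothetical stand-in, not the actual `get_combinations`, and the multi-word entry `"anne watson"` is made up to show the filter.

```python
from itertools import product
from typing import List, Tuple


def pair_words(words_1: List[str], words_2: List[str],
               ignore_words_with_spaces: bool = True) -> List[Tuple[str, str]]:
    """Pair every word of the first group with every word of the second."""
    if ignore_words_with_spaces:
        # Multi-word names cannot match a single token in the vocabulary
        words_1 = [word for word in words_1 if " " not in word]
        words_2 = [word for word in words_2 if " " not in word]
    return list(product(words_1, words_2))


pairs = pair_words(["jamieson", "detective", "winters"],
                   ["watson", "anne watson"])  # second entry is hypothetical
print(pairs)
```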


In [11]:
# Calculate distances with custom word embeddings
protag_antag_dists = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(sentences=sentences, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(sentences=sentences, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(sentences=sentences, 
                                                       word_pairs=antagonists_crime_objects)
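
The resulting frames report a `cosineSim` and a `dotSim` value for each word pair. How `calculate_differing_distances` computes them is not shown in this notebook; the sketch below uses the standard definitions of the two measures in plain Python, with made-up vectors.

```python
import math
from typing import Sequence


def dot_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of element-wise products; grows with the vectors' lengths."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product divided by both norms; depends on direction only."""
    norm_product = math.sqrt(dot_similarity(u, u)) * math.sqrt(dot_similarity(v, v))
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_similarity(u, v) / norm_product


# Made-up vectors: vec_b points in the same direction as vec_a but is twice as long
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]
print(dot_similarity(vec_a, vec_b), cosine_similarity(vec_a, vec_b))
```

Because the cosine ignores vector length while the dot product does not, the two columns can rank the same configurations differently.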

In [12]:
# Save the results
protag_antag_dists.to_pickle(f"data{os.sep}The_Circular_Staircase__protag_antag_dists.pkl")
antag_crime_weap_dists.to_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_weap_dists.pkl")
antag_crime_obj_dists.to_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_obj_dists.pkl")
# To load them
protag_antag_dists = pd.read_pickle(f"data{os.sep}The_Circular_Staircase__protag_antag_dists.pkl")
antag_crime_weap_dists = pd.read_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_weap_dists.pkl")
antag_crime_obj_dists = pd.read_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_obj_dists.pkl")
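
The same file-name pattern recurs for every book and every result frame. A small helper could build these paths in one place; `result_path` is a hypothetical name and is not part of `nlp_libs`.

```python
import os


def result_path(book_key: str, result_name: str,
                pretrained: bool = False, data_dir: str = "data") -> str:
    """Build '<data_dir>/<book_key>__<result_name>[__PRETRAINED].pkl'."""
    suffix = "__PRETRAINED" if pretrained else ""
    return os.path.join(data_dir, f"{book_key}__{result_name}{suffix}.pkl")


custom_path = result_path("The_Circular_Staircase", "protag_antag_dists")
pretrained_path = result_path("The_Circular_Staircase", "protag_antag_dists",
                              pretrained=True)
print(custom_path)
print(pretrained_path)
```

With such a helper, a save call would read `protag_antag_dists.to_pickle(result_path(...))`.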

In [13]:
display(protag_antag_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
2,jamieson,watson,50,3,0.998246,4.83571
0,jamieson,watson,50,2,0.998251,3.797696
4,jamieson,watson,100,2,0.998713,3.567988
1,jamieson,watson,50,5,0.998749,6.229471
3,jamieson,watson,50,10,0.998974,7.455702
6,jamieson,watson,100,3,0.998998,4.841573
5,jamieson,watson,100,5,0.999268,6.257964
7,jamieson,watson,100,10,0.999372,7.488316
8,jamieson,watson,200,2,0.99944,3.782495
10,jamieson,watson,200,3,0.999517,4.726704


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,watson,revolver,50,2,0.996274,2.06146
1,watson,revolver,50,5,0.996442,2.232622
2,watson,revolver,50,3,0.996913,2.522306
3,watson,revolver,50,10,0.997319,2.971783
4,watson,revolver,100,2,0.998261,2.05818
5,watson,revolver,100,5,0.998306,2.211048
6,watson,revolver,100,3,0.998533,2.503365
7,watson,revolver,100,10,0.998652,2.778102
8,watson,revolver,200,2,0.999087,2.090296
9,watson,revolver,200,5,0.999124,2.240093


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,watson,staircase,50,2,0.99481,3.144682
2,watson,staircase,50,3,0.995501,3.803063
1,watson,staircase,50,5,0.995625,4.407054
4,watson,staircase,100,2,0.996188,3.138797
6,watson,staircase,100,3,0.996293,3.615664
5,watson,staircase,100,5,0.996545,4.226993
3,watson,staircase,50,10,0.996619,5.29794
7,watson,staircase,100,10,0.99745,5.206012
8,watson,staircase,200,2,0.998315,3.06469
10,watson,staircase,200,3,0.998445,3.609994
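
The index column in these tables suggests that each row comes from one (vector size, window size) configuration. The row indices visible in this notebook's result tables are consistent with looping over vector sizes 50, 100, 200, 300 on the outside and window sizes 2, 5, 3, 10 on the inside. The sketch below reproduces that numbering; the ordering is inferred from the outputs, not confirmed from the library source.

```python
from itertools import product

# Orderings inferred from the row indices in the displayed tables (an assumption)
vector_sizes = [50, 100, 200, 300]
window_sizes = [2, 5, 3, 10]

# Each entry is (row index, (vector size, window size))
grid = list(enumerate(product(vector_sizes, window_sizes)))
for index, (vector_size, window_size) in grid:
    print(index, vector_size, window_size)
```

For example, index 2 maps to vector size 50 with window size 3, and index 10 to vector size 200 with window size 3, matching the rows shown.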


In [14]:
# Calculate distances for pretrained embeddings
model_names = books_lib.word_embeddings.get_model_names()
model_names = [mn for mn in model_names if 'glove-wiki-gigaword' in mn]
print(model_names)

protag_antag_dists_pre = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(model_names=model_names, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists_pre = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(model_names=model_names, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists_pre = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(model_names=model_names, 
                                                       word_pairs=antagonists_crime_objects)

['glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300']


In [15]:
# Save the results
protag_antag_dists_pre.to_pickle(f"data{os.sep}The_Circular_Staircase__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre.to_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre.to_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_obj_dists__PRETRAINED.pkl")
# To load them
protag_antag_dists_pre = pd.read_pickle(f"data{os.sep}The_Circular_Staircase__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre = pd.read_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre = pd.read_pickle(f"data{os.sep}The_Circular_Staircase__antag_crime_obj_dists__PRETRAINED.pkl")


In [16]:
display(protag_antag_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists_pre.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,jamieson,watson,glove-wiki-gigaword-300,0.177056,5.408132
2,jamieson,watson,glove-wiki-gigaword-200,0.246759,6.121186
1,jamieson,watson,glove-wiki-gigaword-100,0.378605,6.556598
0,jamieson,watson,glove-wiki-gigaword-50,0.434102,5.792758


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,watson,revolver,glove-wiki-gigaword-300,0.076764,2.823694
2,watson,revolver,glove-wiki-gigaword-200,0.080159,2.572687
1,watson,revolver,glove-wiki-gigaword-100,0.092983,2.159217
0,watson,revolver,glove-wiki-gigaword-50,0.189547,3.656829


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,watson,staircase,glove-wiki-gigaword-300,-0.061957,-2.265938
2,watson,staircase,glove-wiki-gigaword-200,-0.007154,-0.225405
1,watson,staircase,glove-wiki-gigaword-100,0.004739,0.112529
0,watson,staircase,glove-wiki-gigaword-50,0.040019,0.829024


# The Man in Lower Ten

In [17]:
# Load conf
book_meta = books_conf['The_Man_in_Lower_Ten']
book = ProcessedBook(book_meta)

In [18]:
# Lemmatize sentences
protagonist_subs = list(book_meta['protagonists'][0].values())[0]
substitution = (protagonist_subs, 'protagonist')
sentences_substituted = book.lemmatize_by_sentence(word_subs=substitution)
sentences = book.lemmatize_by_sentence()

In [19]:
# Generate word combinations
protagonists_antagonists = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['protagonists'], 
                                            get_all_sub_values_1=True,
                                            keys_2=['antagonists'],
                                            get_all_sub_values_2=True,
                                            ignore_words_with_spaces=True)
antagonists_crime_weapon = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['antagonists'],
                                            get_all_sub_values_1=True,
                                            keys_2=['crime', 'crime_weapon'],
                                            get_all_sub_values_2=False,
                                            ignore_words_with_spaces=True)
antagonists_crime_objects = books_lib.word_embeddings\
                           .get_combinations(conf=book_meta,
                                             keys_1=['antagonists'],
                                             get_all_sub_values_1=True,
                                             keys_2=['crime', 'crime_objects'],
                                             get_all_sub_values_2=False,
                                             ignore_words_with_spaces=True)

print("\nprotagonists_antagonists: ")
pprint(protagonists_antagonists)
print("\nantagonists_crime_weapon: ")
pprint(antagonists_crime_weapon)
print("\nantagonists_crime_objects: ")
pprint(antagonists_crime_objects)


protagonists_antagonists: 
[('mcknight', 'curtis'),
 ('mcknight', 'alice'),
 ('richey', 'curtis'),
 ('richey', 'alice')]

antagonists_crime_weapon: 
[('curtis', 'dagger'), ('alice', 'dagger')]

antagonists_crime_objects: 
[('curtis', 'watch'),
 ('curtis', 'diamond'),
 ('curtis', 'revolver'),
 ('curtis', 'suit-case'),
 ('alice', 'watch'),
 ('alice', 'diamond'),
 ('alice', 'revolver'),
 ('alice', 'suit-case')]


In [20]:
# Calculate distances with custom word embeddings
protag_antag_dists = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(sentences=sentences, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(sentences=sentences, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(sentences=sentences, 
                                                       word_pairs=antagonists_crime_objects)

In [21]:
# Save the results
protag_antag_dists.to_pickle(f"data{os.sep}The_Man_in_Lower_Ten__protag_antag_dists.pkl")
antag_crime_weap_dists.to_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_weap_dists.pkl")
antag_crime_obj_dists.to_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_obj_dists.pkl")
# To load them
protag_antag_dists = pd.read_pickle(f"data{os.sep}The_Man_in_Lower_Ten__protag_antag_dists.pkl")
antag_crime_weap_dists = pd.read_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_weap_dists.pkl")
antag_crime_obj_dists = pd.read_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_obj_dists.pkl")

In [22]:
display(protag_antag_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,mcknight,curtis,50,2,0.995041,3.341126
2,mcknight,curtis,50,3,0.996081,4.208818
1,mcknight,curtis,50,5,0.996168,4.826505
3,mcknight,curtis,50,10,0.996556,5.770578
4,mcknight,curtis,100,2,0.998301,3.34833
5,mcknight,curtis,100,5,0.998521,4.736731
6,mcknight,curtis,100,3,0.998639,4.372715
7,mcknight,curtis,100,10,0.998763,5.769204
8,mcknight,curtis,200,2,0.998966,3.187205
9,mcknight,curtis,200,5,0.999196,4.769861


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
3,curtis,dagger,50,10,0.960456,0.344755
0,curtis,dagger,50,2,0.976637,0.433768
7,curtis,dagger,100,10,0.980329,0.361576
2,curtis,dagger,50,3,0.9808,0.518912
1,curtis,dagger,50,5,0.984352,0.594748
4,curtis,dagger,100,2,0.986757,0.395649
6,curtis,dagger,100,3,0.989744,0.52313
5,curtis,dagger,100,5,0.990015,0.514379
8,curtis,dagger,200,2,0.994035,0.422451
12,curtis,dagger,300,2,0.994556,0.341979


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,curtis,watch,50,2,0.995857,1.76572
1,curtis,watch,50,5,0.996697,1.894393
2,curtis,watch,50,3,0.996997,2.175083
3,curtis,watch,50,10,0.997277,2.515061
4,curtis,watch,100,2,0.997733,1.78684
5,curtis,watch,100,5,0.997879,1.917013
7,curtis,watch,100,10,0.998241,2.471807
6,curtis,watch,100,3,0.998255,2.215778
8,curtis,watch,200,2,0.99881,1.740641
11,curtis,watch,200,10,0.99887,1.855818


In [23]:
# Calculate distances for pretrained embeddings
model_names = books_lib.word_embeddings.get_model_names()
model_names = [mn for mn in model_names if 'glove-wiki-gigaword' in mn]
print(model_names)

protag_antag_dists_pre = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(model_names=model_names, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists_pre = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(model_names=model_names, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists_pre = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(model_names=model_names, 
                                                       word_pairs=antagonists_crime_objects)

['glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300']


In [24]:
# Save the results
protag_antag_dists_pre.to_pickle(f"data{os.sep}The_Man_in_Lower_Ten__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre.to_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre.to_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_obj_dists__PRETRAINED.pkl")
# To load them
protag_antag_dists_pre = pd.read_pickle(f"data{os.sep}The_Man_in_Lower_Ten__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre = pd.read_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre = pd.read_pickle(f"data{os.sep}The_Man_in_Lower_Ten__antag_crime_obj_dists__PRETRAINED.pkl")

In [25]:
display(protag_antag_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists_pre.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,mcknight,curtis,glove-wiki-gigaword-300,0.247055,6.907011
2,mcknight,curtis,glove-wiki-gigaword-200,0.315148,7.548784
1,mcknight,curtis,glove-wiki-gigaword-100,0.442035,6.762619
0,mcknight,curtis,glove-wiki-gigaword-50,0.637331,7.10529


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,curtis,dagger,glove-wiki-gigaword-300,-0.002432,-0.087583
2,curtis,dagger,glove-wiki-gigaword-200,0.02331,0.695448
1,curtis,dagger,glove-wiki-gigaword-100,0.046768,0.98247
0,curtis,dagger,glove-wiki-gigaword-50,0.079586,1.282962


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
2,curtis,watch,glove-wiki-gigaword-200,0.140903,3.895205
3,curtis,watch,glove-wiki-gigaword-300,0.158067,4.947235
1,curtis,watch,glove-wiki-gigaword-100,0.217362,4.732463
0,curtis,watch,glove-wiki-gigaword-50,0.259233,4.31563


# The After House

In [26]:
# Load conf
book_meta = books_conf['The_After_House']
book = ProcessedBook(book_meta)

In [27]:
# Lemmatize sentences
protagonist_subs = list(book_meta['protagonists'][0].values())[0]
substitution = (protagonist_subs, 'protagonist')
sentences_substituted = book.lemmatize_by_sentence(word_subs=substitution)
sentences = book.lemmatize_by_sentence()

In [28]:
# Generate word combinations
protagonists_antagonists = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['protagonists'], 
                                            get_all_sub_values_1=True,
                                            keys_2=['antagonists'],
                                            get_all_sub_values_2=True,
                                            ignore_words_with_spaces=True)
antagonists_crime_weapon = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['antagonists'],
                                            get_all_sub_values_1=True,
                                            keys_2=['crime', 'crime_weapon'],
                                            get_all_sub_values_2=False,
                                            ignore_words_with_spaces=True)
antagonists_crime_objects = books_lib.word_embeddings\
                           .get_combinations(conf=book_meta,
                                             keys_1=['antagonists'],
                                             get_all_sub_values_1=True,
                                             keys_2=['crime', 'crime_objects'],
                                             get_all_sub_values_2=False,
                                             ignore_words_with_spaces=True)

print("\nprotagonists_antagonists: ")
pprint(protagonists_antagonists)
print("\nantagonists_crime_weapon: ")
pprint(antagonists_crime_weapon)
print("\nantagonists_crime_objects: ")
pprint(antagonists_crime_objects)


protagonists_antagonists: 
[('mcwhirter', 'charlie'), ('mcwhirter', 'jones')]

antagonists_crime_weapon: 
[('charlie', 'axe'), ('jones', 'axe')]

antagonists_crime_objects: 
[('charlie', 'cabin'), ('jones', 'cabin')]


In [29]:
# Calculate distances with custom word embeddings
protag_antag_dists = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(sentences=sentences, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(sentences=sentences, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(sentences=sentences, 
                                                       word_pairs=antagonists_crime_objects)

In [30]:
# Save the results
protag_antag_dists.to_pickle(f"data{os.sep}The_After_House__protag_antag_dists.pkl")
antag_crime_weap_dists.to_pickle(f"data{os.sep}The_After_House__antag_crime_weap_dists.pkl")
antag_crime_obj_dists.to_pickle(f"data{os.sep}The_After_House__antag_crime_obj_dists.pkl")
# To load them
protag_antag_dists = pd.read_pickle(f"data{os.sep}The_After_House__protag_antag_dists.pkl")
antag_crime_weap_dists = pd.read_pickle(f"data{os.sep}The_After_House__antag_crime_weap_dists.pkl")
antag_crime_obj_dists = pd.read_pickle(f"data{os.sep}The_After_House__antag_crime_obj_dists.pkl")

In [31]:
display(protag_antag_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,mcwhirter,charlie,50,2,0.994946,2.983171
2,mcwhirter,charlie,50,3,0.995811,3.77822
1,mcwhirter,charlie,50,5,0.997358,5.130692
3,mcwhirter,charlie,50,10,0.998105,6.22253
4,mcwhirter,charlie,100,2,0.998276,3.036815
6,mcwhirter,charlie,100,3,0.998568,3.770924
5,mcwhirter,charlie,100,5,0.999052,5.324931
8,mcwhirter,charlie,200,2,0.99913,2.992051
10,mcwhirter,charlie,200,3,0.999321,3.743343
7,mcwhirter,charlie,100,10,0.999324,6.304576


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,charlie,axe,50,2,0.995675,3.49108
2,charlie,axe,50,3,0.996109,4.280466
1,charlie,axe,50,5,0.997087,5.180049
4,charlie,axe,100,2,0.998,3.592905
6,charlie,axe,100,3,0.998224,4.221454
3,charlie,axe,50,10,0.998257,6.894542
5,charlie,axe,100,5,0.998752,5.094182
8,charlie,axe,200,2,0.998941,3.376753
10,charlie,axe,200,3,0.999183,4.230534
7,charlie,axe,100,10,0.999222,6.794313


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
2,charlie,cabin,50,3,0.995839,5.053222
0,charlie,cabin,50,2,0.995842,4.543077
1,charlie,cabin,50,5,0.997336,7.111734
4,charlie,cabin,100,2,0.998154,4.322909
6,charlie,cabin,100,3,0.99845,5.070008
3,charlie,cabin,50,10,0.998608,10.758627
8,charlie,cabin,200,2,0.998851,4.036192
5,charlie,cabin,100,5,0.999055,7.544358
10,charlie,cabin,200,3,0.999081,4.931735
12,charlie,cabin,300,2,0.999318,4.226752


In [32]:
# Calculate distances for pretrained embeddings
model_names = books_lib.word_embeddings.get_model_names()
model_names = [mn for mn in model_names if 'glove-wiki-gigaword' in mn]
print(model_names)

protag_antag_dists_pre = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(model_names=model_names, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists_pre = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(model_names=model_names, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists_pre = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(model_names=model_names, 
                                                       word_pairs=antagonists_crime_objects)

['glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300']


In [33]:
# Save the results
protag_antag_dists_pre.to_pickle(f"data{os.sep}The_After_House__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre.to_pickle(f"data{os.sep}The_After_House__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre.to_pickle(f"data{os.sep}The_After_House__antag_crime_obj_dists__PRETRAINED.pkl")
# To load them
protag_antag_dists_pre = pd.read_pickle(f"data{os.sep}The_After_House__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre = pd.read_pickle(f"data{os.sep}The_After_House__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre = pd.read_pickle(f"data{os.sep}The_After_House__antag_crime_obj_dists__PRETRAINED.pkl")

In [34]:
display(protag_antag_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists_pre.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
1,mcwhirter,charlie,glove-wiki-gigaword-100,-0.027188,-0.592603
0,mcwhirter,charlie,glove-wiki-gigaword-50,-0.001796,-0.033404
3,mcwhirter,charlie,glove-wiki-gigaword-300,0.020236,0.944832
2,mcwhirter,charlie,glove-wiki-gigaword-200,0.028993,1.106474


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,charlie,axe,glove-wiki-gigaword-300,0.110767,4.122083
1,charlie,axe,glove-wiki-gigaword-100,0.120016,2.692579
2,charlie,axe,glove-wiki-gigaword-200,0.121701,4.145815
0,charlie,axe,glove-wiki-gigaword-50,0.19056,3.260837


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,charlie,cabin,glove-wiki-gigaword-300,0.019189,0.776641
2,charlie,cabin,glove-wiki-gigaword-200,0.049996,1.810939
0,charlie,cabin,glove-wiki-gigaword-50,0.19036,3.861888
1,charlie,cabin,glove-wiki-gigaword-100,0.191625,4.886503


# The Window at the White Cat

In [35]:
# Load conf
book_meta = books_conf['The_Window_at_the_White_Cat']
book = ProcessedBook(book_meta)

In [36]:
# Lemmatize sentences
protagonist_subs = list(book_meta['protagonists'][0].values())[0]
substitution = (protagonist_subs, 'protagonist')
sentences_substituted = book.lemmatize_by_sentence(word_subs=substitution)
sentences = book.lemmatize_by_sentence()

In [37]:
# Generate word combinations
protagonists_antagonists = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['protagonists'], 
                                            get_all_sub_values_1=True,
                                            keys_2=['antagonists'],
                                            get_all_sub_values_2=True,
                                            ignore_words_with_spaces=True)
antagonists_crime_weapon = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['antagonists'],
                                            get_all_sub_values_1=True,
                                            keys_2=['crime', 'crime_weapon'],
                                            get_all_sub_values_2=False,
                                            ignore_words_with_spaces=True)
antagonists_crime_objects = books_lib.word_embeddings\
                           .get_combinations(conf=book_meta,
                                             keys_1=['antagonists'],
                                             get_all_sub_values_1=True,
                                             keys_2=['crime', 'crime_objects'],
                                             get_all_sub_values_2=False,
                                             ignore_words_with_spaces=True)

print("\nprotagonists_antagonists: ")
pprint(protagonists_antagonists)
print("\nantagonists_crime_weapon: ")
pprint(antagonists_crime_weapon)
print("\nantagonists_crime_objects: ")
pprint(antagonists_crime_objects)


protagonists_antagonists: 
[('hunter', 'butler')]

antagonists_crime_weapon: 
[('butler', 'revolver')]

antagonists_crime_objects: 
[('butler', 'table'),
 ('butler', 'papers'),
 ('butler', 'ink'),
 ('butler', 'pen'),
 ('butler', 'tray')]
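
The pair lists above could be produced by a cross product with a filter on multi-word entries. The sketch below is a hypothetical stand-in for `get_combinations`, not its actual code: `make_pairs` is an invented name, the inputs are plain lists rather than the config dictionary, and the `'writing desk'` entry is made up to show the effect of `ignore_words_with_spaces`.

```python
from itertools import product


def make_pairs(words_1, words_2, ignore_words_with_spaces=True):
    """Return every (word_1, word_2) pair, optionally dropping multi-word entries."""
    def clean(words):
        if ignore_words_with_spaces:
            return [w for w in words if ' ' not in w]
        return list(words)

    return list(product(clean(words_1), clean(words_2)))


antagonists = ['butler']
# 'writing desk' is a made-up entry that the space filter removes
crime_objects = ['table', 'papers', 'ink', 'pen', 'tray', 'writing desk']

pairs = make_pairs(antagonists, crime_objects)
print(pairs)
```

Dropping entries with spaces keeps every pair member a single token, which matters because the embeddings are looked up one word at a time.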


In [38]:
# Calculate distances with custom word embeddings
protag_antag_dists = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(sentences=sentences, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(sentences=sentences, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(sentences=sentences, 
                                                       word_pairs=antagonists_crime_objects)

In [39]:
# Save the results
protag_antag_dists.to_pickle(f"data{os.sep}The_Window_at_the_White_Cat__protag_antag_dists.pkl")
antag_crime_weap_dists.to_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_weap_dists.pkl")
antag_crime_obj_dists.to_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_obj_dists.pkl")
# To load them
protag_antag_dists = pd.read_pickle(f"data{os.sep}The_Window_at_the_White_Cat__protag_antag_dists.pkl")
antag_crime_weap_dists = pd.read_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_weap_dists.pkl")
antag_crime_obj_dists = pd.read_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_obj_dists.pkl")

In [40]:
display(protag_antag_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,hunter,butler,50,2,0.998619,6.72214
2,hunter,butler,50,3,0.998995,8.5122
1,hunter,butler,50,5,0.999143,10.293833
3,hunter,butler,50,10,0.999255,12.203511
4,hunter,butler,100,2,0.999279,6.882215
6,hunter,butler,100,3,0.999518,8.456774
5,hunter,butler,100,5,0.999599,10.218754
8,hunter,butler,200,2,0.999638,6.409602
7,hunter,butler,100,10,0.999644,12.137957
10,hunter,butler,200,3,0.999728,8.213093


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,butler,revolver,50,2,0.997407,3.505509
1,butler,revolver,50,5,0.99808,5.058284
2,butler,revolver,50,3,0.99814,4.490715
3,butler,revolver,50,10,0.998209,5.22847
4,butler,revolver,100,2,0.999029,3.693836
6,butler,revolver,100,3,0.999114,4.351472
5,butler,revolver,100,5,0.99924,5.027448
7,butler,revolver,100,10,0.999333,5.571593
8,butler,revolver,200,2,0.999476,4.061533
10,butler,revolver,200,3,0.999509,4.375998


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
1,butler,table,50,5,0.997314,6.833781
0,butler,table,50,2,0.997644,4.86787
3,butler,table,50,10,0.997797,8.111781
2,butler,table,50,3,0.998032,5.826626
4,butler,table,100,2,0.998544,4.624686
5,butler,table,100,5,0.998801,6.739102
7,butler,table,100,10,0.998953,8.067116
6,butler,table,100,3,0.998954,5.695745
8,butler,table,200,2,0.999207,4.553554
11,butler,table,200,10,0.999221,7.430847


In [41]:
# Calculate distances for pretrained embeddings
model_names = books_lib.word_embeddings.get_model_names()
model_names = [mn for mn in model_names if 'glove-wiki-gigaword' in mn]
print(model_names)

protag_antag_dists_pre = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(model_names=model_names, 
                                                       word_pairs=protagonists_antagonists)
antag_crime_weap_dists_pre = books_lib\
                            .word_embeddings\
                            .calculate_differing_distances(model_names=model_names, 
                                                           word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists_pre = books_lib\
                           .word_embeddings\
                           .calculate_differing_distances(model_names=model_names, 
                                                          word_pairs=antagonists_crime_objects)

['glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300']


In [42]:
# Save the results
protag_antag_dists_pre.to_pickle(f"data{os.sep}The_Window_at_the_White_Cat__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre.to_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre.to_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_obj_dists__PRETRAINED.pkl")
# To load them
protag_antag_dists_pre = pd.read_pickle(f"data{os.sep}The_Window_at_the_White_Cat__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre = pd.read_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre = pd.read_pickle(f"data{os.sep}The_Window_at_the_White_Cat__antag_crime_obj_dists__PRETRAINED.pkl")

In [43]:
display(protag_antag_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists_pre.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,hunter,butler,glove-wiki-gigaword-300,0.330243,10.111649
2,hunter,butler,glove-wiki-gigaword-200,0.398672,10.615868
1,hunter,butler,glove-wiki-gigaword-100,0.502751,9.411463
0,hunter,butler,glove-wiki-gigaword-50,0.544165,7.285058


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,butler,revolver,glove-wiki-gigaword-300,0.005197,0.19523
2,butler,revolver,glove-wiki-gigaword-200,0.053686,1.795451
1,butler,revolver,glove-wiki-gigaword-100,0.146359,3.29242
0,butler,revolver,glove-wiki-gigaword-50,0.213074,3.636106


Unnamed: 0,word1,word2,model_name,cosineSim,dotSim
3,butler,table,glove-wiki-gigaword-300,0.058439,2.008679
2,butler,table,glove-wiki-gigaword-200,0.112859,3.529393
1,butler,table,glove-wiki-gigaword-100,0.18277,4.206408
0,butler,table,glove-wiki-gigaword-50,0.215771,3.605739


# The Bat

In [44]:
# Load conf
book_meta = books_conf['The_Bat']
book = ProcessedBook(book_meta)

In [45]:
# Lemmatize sentences
protagonist_subs = list(book_meta['protagonists'][0].values())[0]
substitution = (protagonist_subs, 'protagonist')
sentences_substituted = book.lemmatize_by_sentence(word_subs=substitution)
sentences = book.lemmatize_by_sentence()

In [46]:
# Generate word combinations
protagonists_antagonists = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['protagonists'], 
                                            get_all_sub_values_1=True,
                                            keys_2=['antagonists'],
                                            get_all_sub_values_2=True,
                                            ignore_words_with_spaces=True)
antagonists_crime_weapon = books_lib.word_embeddings\
                          .get_combinations(conf=book_meta, 
                                            keys_1=['antagonists'],
                                            get_all_sub_values_1=True,
                                            keys_2=['crime', 'crime_weapon'],
                                            get_all_sub_values_2=False,
                                            ignore_words_with_spaces=True)
antagonists_crime_objects = books_lib.word_embeddings\
                           .get_combinations(conf=book_meta,
                                             keys_1=['antagonists'],
                                             get_all_sub_values_1=True,
                                             keys_2=['crime', 'crime_objects'],
                                             get_all_sub_values_2=False,
                                             ignore_words_with_spaces=True)

print("\nprotagonists_antagonists: ")
pprint(protagonists_antagonists)
print("\nantagonists_crime_weapon: ")
pprint(antagonists_crime_weapon)
print("\nantagonists_crime_objects: ")
pprint(antagonists_crime_objects)


protagonists_antagonists: 
[('anderson', 'bat')]

antagonists_crime_weapon: 
[('bat', 'gun'), ('bat', 'revolver')]

antagonists_crime_objects: 
[('bat', 'blueprint'), ('bat', 'money'), ('bat', 'revolver'), ('bat', 'stairs')]


In [47]:
# Calculate distances with custom word embeddings
protag_antag_dists = books_lib\
                     .word_embeddings\
                     .calculate_differing_distances(sentences=sentences, 
                                                    word_pairs=protagonists_antagonists)
antag_crime_weap_dists = books_lib\
                         .word_embeddings\
                         .calculate_differing_distances(sentences=sentences, 
                                                        word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(sentences=sentences, 
                                                       word_pairs=antagonists_crime_objects)

blueprint not in vocabulary! Skipping..
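
The message above indicates that pairs containing an unseen word are dropped before any distance is computed. A minimal sketch of such a check follows; it assumes the vocabulary behaves like a set of known words, and `filter_pairs` and the toy vocabulary are hypothetical, not part of `nlp_libs`.

```python
def filter_pairs(word_pairs, vocabulary):
    """Keep only the pairs whose two words both appear in the vocabulary."""
    kept = []
    for word_1, word_2 in word_pairs:
        missing = [w for w in (word_1, word_2) if w not in vocabulary]
        if missing:
            for w in missing:
                print(f"{w} not in vocabulary! Skipping..")
            continue
        kept.append((word_1, word_2))
    return kept


# Toy vocabulary: 'blueprint' is deliberately absent
vocabulary = {'bat', 'money', 'revolver', 'stairs'}
word_pairs = [('bat', 'blueprint'), ('bat', 'money'),
              ('bat', 'revolver'), ('bat', 'stairs')]

kept = filter_pairs(word_pairs, vocabulary)
print(kept)
```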


In [48]:
# Save the results
protag_antag_dists.to_pickle(f"data{os.sep}The_Bat__protag_antag_dists.pkl")
antag_crime_weap_dists.to_pickle(f"data{os.sep}The_Bat__antag_crime_weap_dists.pkl")
antag_crime_obj_dists.to_pickle(f"data{os.sep}The_Bat__antag_crime_obj_dists.pkl")
# To load them
protag_antag_dists = pd.read_pickle(f"data{os.sep}The_Bat__protag_antag_dists.pkl")
antag_crime_weap_dists = pd.read_pickle(f"data{os.sep}The_Bat__antag_crime_weap_dists.pkl")
antag_crime_obj_dists = pd.read_pickle(f"data{os.sep}The_Bat__antag_crime_obj_dists.pkl")

In [49]:
display(protag_antag_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists.sort_values(['cosineSim', 'dotSim']))

Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,anderson,bat,50,2,0.997752,9.564051
2,anderson,bat,50,3,0.998095,12.85491
1,anderson,bat,50,5,0.998452,17.702887
4,anderson,bat,100,2,0.998918,9.194827
8,anderson,bat,200,2,0.999117,9.592024
3,anderson,bat,50,10,0.99912,24.345646
6,anderson,bat,100,3,0.999366,12.445475
12,anderson,bat,300,2,0.999462,9.698387
5,anderson,bat,100,5,0.999582,17.75194
10,anderson,bat,200,3,0.999624,13.251512


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
0,bat,gun,50,2,0.982294,1.269234
1,bat,gun,50,5,0.984349,1.904429
3,bat,gun,50,10,0.985204,2.342054
2,bat,gun,50,3,0.988845,1.95285
4,bat,gun,100,2,0.990554,1.444168
7,bat,gun,100,10,0.990876,2.261071
6,bat,gun,100,3,0.990888,1.594261
5,bat,gun,100,5,0.991835,2.119043
8,bat,gun,200,2,0.995668,1.334126
9,bat,gun,200,5,0.995703,1.705578


Unnamed: 0,word1,word2,vectorSize,windowSize,cosineSim,dotSim
1,bat,money,50,5,0.99533,8.145314
0,bat,money,50,2,0.996189,5.520333
3,bat,money,50,10,0.996217,11.349937
2,bat,money,50,3,0.996736,7.245745
5,bat,money,100,5,0.998281,8.271401
7,bat,money,100,10,0.998822,10.915357
6,bat,money,100,3,0.998979,6.265683
4,bat,money,100,2,0.998995,5.561276
9,bat,money,200,5,0.99913,8.142879
11,bat,money,200,10,0.99944,11.06499


In [None]:
# Calculate distances for pretrained embeddings
protag_antag_dists_pre = books_lib\
                        .word_embeddings\
                        .calculate_differing_distances(model_names=model_names, 
                                                       word_pairs=protagonists_antagonists)
antag_crime_weap_dists_pre = books_lib\
                            .word_embeddings\
                            .calculate_differing_distances(model_names=model_names, 
                                                           word_pairs=antagonists_crime_weapon)
antag_crime_obj_dists_pre = books_lib\
                           .word_embeddings\
                           .calculate_differing_distances(model_names=model_names, 
                                                          word_pairs=antagonists_crime_objects)

In [None]:
# Save the results
protag_antag_dists_pre.to_pickle(f"data{os.sep}The_Bat__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre.to_pickle(f"data{os.sep}The_Bat__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre.to_pickle(f"data{os.sep}The_Bat__antag_crime_obj_dists__PRETRAINED.pkl")
# To load them
protag_antag_dists_pre = pd.read_pickle(f"data{os.sep}The_Bat__protag_antag_dists__PRETRAINED.pkl")
antag_crime_weap_dists_pre = pd.read_pickle(f"data{os.sep}The_Bat__antag_crime_weap_dists__PRETRAINED.pkl")
antag_crime_obj_dists_pre = pd.read_pickle(f"data{os.sep}The_Bat__antag_crime_obj_dists__PRETRAINED.pkl")

In [None]:
display(protag_antag_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_weap_dists_pre.sort_values(['cosineSim', 'dotSim']))
display(antag_crime_obj_dists_pre.sort_values(['cosineSim', 'dotSim']))

## Compare Vector distances and report similarities using Pretrained Embeddings

In [None]:
# cs.my_pretrained_embeddings_compare_function()
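
One simple way to start this comparison is to summarise `cosineSim` per embedding source for a single word pair. The sketch below is only an illustration: `summarize` is a hypothetical helper, the rows are a small subset copied from the *The Window at the White Cat* tables above, and they are written as plain dictionaries (the shape `DataFrame.to_dict('records')` would give) so the example has no dependencies.

```python
from statistics import mean


def summarize(rows, key='cosineSim'):
    """Return the min, max and mean of one numeric column over a list of rows."""
    values = [row[key] for row in rows]
    return {'min': min(values), 'max': max(values), 'mean': mean(values)}


# Subset of the custom-embedding results for (hunter, butler)
custom = [
    {'word1': 'hunter', 'word2': 'butler', 'vectorSize': 50,
     'windowSize': 2, 'cosineSim': 0.998619},
    {'word1': 'hunter', 'word2': 'butler', 'vectorSize': 50,
     'windowSize': 10, 'cosineSim': 0.999255},
]
# Subset of the pretrained results for the same pair
pretrained = [
    {'word1': 'hunter', 'word2': 'butler',
     'model_name': 'glove-wiki-gigaword-300', 'cosineSim': 0.330243},
    {'word1': 'hunter', 'word2': 'butler',
     'model_name': 'glove-wiki-gigaword-50', 'cosineSim': 0.544165},
]

for source, rows in (('custom', custom), ('pretrained', pretrained)):
    stats = summarize(rows)
    print(source, stats)
```

For these rows the custom embeddings sit within a narrow band just below 1, while the pretrained values are much lower and vary more between models.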

## Extra Analysis? Plots?

In [None]:
# Too much work