<a href="https://colab.research.google.com/github/andrewpkitchin/moral-concern-word-embeddings/blob/master/calculating_cosines_historic_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this Python 2 notebook, we demonstrate how to use the historical word embedding models supported by the Histwords library to generate the results used in [link to paper]. We calculate the cosine similarity between each focal entity and each moral concern word.

To use this notebook, you will need to clone the Histwords GitHub repository (https://github.com/williamleif/histwords) and download the All English (1800s-1990s) pretrained models from https://nlp.stanford.edu/projects/histwords/.

Since Python 2 is deprecated in Google Colab, you may have to run this notebook elsewhere. As of the date of upload, this link creates a Python 2 notebook: https://colab.research.google.com/notebook#create=true&language=python2

In [None]:
# Dependencies

import csv
import numpy as np
from google.colab import drive

In [None]:
# For our purposes, we cloned the Histwords GitHub repository to Google Drive and load the models from there.

drive.mount('/content/drive')
%cd /content/drive/My Drive/word2vecProject/histwords

Mounted at /content/drive
/content/drive/My Drive/word2vecProject/histwords


Here we create a function which returns the normalized vector representation of a word from a model. We also create a function to compute the cosine similarity of each pair of words from two input lists.

In [None]:
# Normalizing vectors.

def norm(word, startOfDec, model_name):
  # Imported here so that it resolves after changing into the histwords directory.
  from representations.embedding import Embedding
  # Load the embedding for the requested model and decade.
  embeddings = Embedding.load("/content/drive/My Drive/word2vecProject/histwords/embeddings/{}/{}".format(model_name, startOfDec))

  # Look the word up once, then scale it to unit length.
  # A zero-length vector would produce NaN values here.
  vec = embeddings.represent(word)
  return vec/np.linalg.norm(vec)


def cosine_similarity(vec1, vec2):
  # Dot product divided by the product of the two vector lengths.
  return np.dot(vec1, vec2)/(np.linalg.norm(vec1)*np.linalg.norm(vec2))
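
To make the role of normalization and of the later NaN check concrete, here is a minimal, self-contained sketch that uses toy vectors instead of Histwords embeddings. The vectors `a`, `b`, `c` and `zero` are made up for illustration. `zero` stands in for a word without a usable vector, on the assumption that such a word is represented by an all-zero vector.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors."""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Toy vectors (hypothetical, not taken from any embedding model).
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction as a
c = np.array([-3.0, 0.0, 1.0])   # orthogonal to a
zero = np.zeros(3)               # assumed stand-in for a missing word

parallel = cosine_similarity(a, b)
orthogonal = cosine_similarity(a, c)

# Rescaling both vectors to unit length does not change the cosine.
unit_a = a / np.linalg.norm(a)
unit_b = b / np.linalg.norm(b)
rescaled = cosine_similarity(unit_a, unit_b)

# A zero-length vector leads to 0/0, which NumPy reports as NaN.
# The runtime warning is silenced for this demonstration only.
with np.errstate(invalid='ignore', divide='ignore'):
    undefined = cosine_similarity(zero, a)

print(parallel, orthogonal, rescaled, undefined)
```

Because the cosine does not depend on vector length, normalizing the vectors first is optional when they are passed straight to `cosine_similarity`. The NaN case is the one the CSV step replaces with `'NA'`.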

In [None]:
def cosineToCSV(csvName, entitiesList, moralList, startOfDec, model):
  from representations.embedding import Embedding
  # Load the embedding for the requested model and decade.
  embeddings = Embedding.load("/content/drive/My Drive/word2vecProject/histwords/embeddings/{}/{}".format(model, startOfDec))

  with open(csvName, 'wt') as csvFile:
    writer = csv.writer(csvFile)

    # Write the headings to the csv: the decade, then one column per moral concern word.
    # Building a new list leaves moralList itself unchanged.
    writer.writerow([startOfDec] + moralList)

    for entity in entitiesList:
      # Each row starts with the entity word.
      listOfCosines = [entity]
      entityVec = embeddings.represent(entity)

      for moral in moralList:
        # cosine_similarity divides by the vector lengths, so no separate normalization is needed.
        sim = cosine_similarity(entityVec, embeddings.represent(moral))
        # The cosine is NaN when either vector has zero length.
        if np.isnan(sim):
          listOfCosines.append('NA')
        else:
          listOfCosines.append(sim)

      # Writing the cosine scores to the csv.
      writer.writerow(listOfCosines)
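
The layout of the resulting CSV files can be sketched without loading any embeddings. In the example below, `build_rows`, `toy_similarity` and the scores in `toy_scores` are hypothetical stand-ins for the embedding lookups. Only the row structure (decade and moral concern words in the header, one row per entity, `'NA'` for undefined cosines) mirrors the function above.

```python
import csv
import math
import os
import tempfile

def build_rows(decade, entities, morals, similarity):
    """Return the header row plus one row per entity, using 'NA' for NaN scores."""
    rows = [[decade] + list(morals)]
    for entity in entities:
        row = [entity]
        for moral in morals:
            score = similarity(entity, moral)
            row.append('NA' if math.isnan(score) else score)
        rows.append(row)
    return rows

# Made-up scores standing in for real cosine similarities.
toy_scores = {
    ('dog', 'care'): 0.25,
    ('dog', 'apathy'): float('nan'),   # an undefined cosine
    ('river', 'care'): 0.5,
    ('river', 'apathy'): -0.125,
}

def toy_similarity(entity, moral):
    return toy_scores[(entity, moral)]

rows = build_rows(1990, ['dog', 'river'], ['care', 'apathy'], toy_similarity)

# Write the rows to a temporary file and read them back as text.
path = os.path.join(tempfile.mkdtemp(), 'toy_cosines.csv')
with open(path, 'w') as handle:
    csv.writer(handle).writerows(rows)

with open(path) as handle:
    read_back = list(csv.reader(handle))

for row in read_back:
    print(row)
```

Keeping the row construction separate from the file writing makes the `'NA'` substitution easy to check on its own. This is one possible arrangement, not the structure used in the notebook.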

Lists of words

In [None]:
moral_concern_list = ['care', 'cares', 'caring', 'cared', 'concern', 'concerns', 'concerned', 'concerning', 'empathy', 'empathetic', 'empathize', 'sympathy', 'sympathize', 'sympathetic', 'compassion', 'compassionate', 'uncaring', 'apathetic', 'apathy', 'unconcerned', 'unconcerning', 'indifference', 'indifferent', 'unsympathetic', 'unempathetic', 'uncompassionate', 'disregard', 'disregarded', 'disregarding']

In [None]:
humans_animals = ['husband', 'wife', 'father', 'mother', 'son', 'daughter', 'brother', 'sister', 'uncle', 'aunt', 'niece', 'nephew', 'grandmother', 'grandfather', 'acquaintance', 'ally', 'associate', 'colleague', 'counterpart', 'fellow', 'neighbor', 'patriot', 'confidant', 'companion', 'partner', 'supporter', 'member', 'follower', 'emigrant', 'foreigner', 'intruder', 'settler', 'stranger', 'visitor', 'vagrant', 'opposition', 'rival', 'opponent', 'adversary', 'competitor', 'invader', 'occupier', 'arab', 'beggar', 'blacks', 'crippled', 'disabled', 'jew', 'mexican', 'unemployed', 'native', 'elderly', 'indian', 'woman', 'chinese', 'pauper', 'animal', 'ape', 'bird', 'elephant', 'chicken', 'cow', 'dog', 'fish', 'pig', 'shark', 'bear', 'snake', 'monkey', 'lion', 'nature', 'forest', 'lake', 'mountain', 'ocean', 'reef', 'river', 'tree', 'sea', 'beach', 'island', 'coast', 'earth', 'planet']


Models

In [None]:
list_of_models = ['eng-all_sgns','Genre-Balanced_American_English_(1830s-2000s)']

# 'eng-all_sgns', 'Genre-Balanced_American_English_(1830s-2000s)'

Computing the cosine similarity between each entity word and each moral concern word.


In [None]:
# Write one CSV per model and per decade (1830 through 1990).
for model in list_of_models:
  for i in range(1830,2000,10):
    cosineToCSV('entities_and_moral_concern_cosines_{}_{}.csv'.format(model,i), humans_animals, moral_concern_list, i, model)
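
As a quick check on what this loop produces, the sketch below lists the decades and output file names without touching any embeddings. It reuses the model names and the file name pattern from the cells above. The variables `decades` and `filenames` are introduced here for illustration only.

```python
list_of_models = ['eng-all_sgns', 'Genre-Balanced_American_English_(1830s-2000s)']

# range() excludes its upper bound, so the last decade is 1990.
decades = list(range(1830, 2000, 10))

# One file name per (model, decade) pair, in the same order as the loop.
filenames = ['entities_and_moral_concern_cosines_{}_{}.csv'.format(model, decade)
             for model in list_of_models
             for decade in decades]

print(len(decades), decades[0], decades[-1])
print(len(filenames))
print(filenames[0])
```

With two models and seventeen decades, the loop writes 34 CSV files into the current working directory.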