<a href="https://colab.research.google.com/github/andrewpkitchin/moral-concern-word-embeddings/blob/master/calculating_cosines_contempary_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this Python 3 notebook, we demonstrate how to use contemporary word embedding models supported by the Gensim library to generate the results used in [link to paper]. We calculate the cosine similarity between each focal entity and each moral concern word.

See https://github.com/RaRe-Technologies/gensim-data for a list of models and documentation.

In [None]:
# Dependencies

import csv
import numpy as np
import gensim.downloader as api

In [None]:
# List of contemporary models we have used. Information on these can be found at https://github.com/RaRe-Technologies/gensim-data

list_of_models = ['word2vec-google-news-300']

#Google News Model
  #'word2vec-google-news-300'
#Twitter Models
  #'glove-twitter-100', 'glove-twitter-200', 'glove-twitter-25', 'glove-twitter-50', 
#Wikipedia Models
  #'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-wiki-gigaword-50',
  #'fasttext-wiki-news-subwords-300'

Here we define a function that returns the normalized (unit-length) vector representation of a word from a model. Note that word_vector is a global variable, which will be defined as follows:

> word_vector = api.load(model_name)

We assign it when we load and run the models below. We also define a function that computes the cosine similarity between two vectors, and a function that writes the cosine similarity of each pair of words from two input lists to a CSV file.

In [None]:
def norm(word):
  return word_vector[word]/np.linalg.norm(word_vector[word])
 
def cosine_similarity(vec1,vec2):
  return np.dot(vec1, vec2)/(np.linalg.norm(vec1)* np.linalg.norm(vec2))

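def average_vector(list_of_words, average_vec):
  # average_vec is the starting vector; if it is a NumPy array, += updates it in place.
  for word in list_of_words:
    try:
      average_vec += norm(word)
    except KeyError:
      # Skip words that are missing from the model's vocabulary.
      continue
  
  # The divisor counts every listed word, including any skipped above, plus one.
  return average_vec/(len(list_of_words)+1)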
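
As a quick sanity check of the helper functions above, the sketch below uses hand-written toy vectors (not taken from any model) to show that the dot product of two unit-length vectors equals their cosine similarity. The names toy_vectors and unit are hypothetical stand-ins for word_vector and norm, so that the example runs without downloading a model.

```python
import numpy as np

# Hypothetical 3-dimensional stand-ins for real embeddings.
toy_vectors = {
  'care': np.array([1.0, 2.0, 2.0]),
  'dog': np.array([2.0, 0.0, 0.0]),
}

def unit(word):
  # Scale the vector to length 1, mirroring what norm() does above.
  vec = toy_vectors[word]
  return vec / np.linalg.norm(vec)

def cosine_similarity(vec1, vec2):
  return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Cosine similarity computed directly from the raw vectors.
raw = cosine_similarity(toy_vectors['care'], toy_vectors['dog'])

# Dot product of the unit-length vectors.
normalized = np.dot(unit('care'), unit('dog'))

print(raw, normalized)
```

Both values agree (up to floating-point rounding), which is why passing already-normalized vectors to cosine_similarity is redundant but harmless.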

In [None]:
def cosines_to_csv(csv_name, list1, list2, model_name):
  with open(csv_name, 'w', newline='') as file:
    writer = csv.writer(file)

    # Write the headings to the csv: the model name followed by the words in list2.
    # Building a new list leaves the caller's list2 unmodified.
    writer.writerow([model_name] + list2)

    for word in list1:
      listOfCosines = []
      listOfCosines.append(word)
      
      for entity in list2:
        try:
          listOfCosines.append(cosine_similarity(norm(word),norm(entity)))
        except KeyError:
          listOfCosines.append('NA')

      # Writing the cosine scores to the csv.
      writer.writerow(listOfCosines)
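
To illustrate the layout of the resulting file, here is a minimal sketch that follows the same row and column logic but writes to an in-memory buffer instead of a file. The toy vectors, the name toy_cosine, and the model label 'toy-model' are assumptions made for illustration only; 'reef' is deliberately left out of the toy vocabulary to show how a missing word is recorded.

```python
import csv
import io
import numpy as np

# Hypothetical toy embeddings; 'reef' is deliberately missing.
toy_vectors = {
  'care': np.array([1.0, 0.0]),
  'dog': np.array([1.0, 0.0]),
  'fish': np.array([0.0, 1.0]),
}

def toy_cosine(word1, word2):
  # Raises KeyError when either word has no vector.
  vec1, vec2 = toy_vectors[word1], toy_vectors[word2]
  return float(np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))

row_words = ['care']
column_words = ['dog', 'fish', 'reef']

buffer = io.StringIO()
writer = csv.writer(buffer)

# Header row: the model label followed by the column words.
writer.writerow(['toy-model'] + column_words)

for word in row_words:
  row = [word]
  for entity in column_words:
    try:
      row.append(toy_cosine(word, entity))
    except KeyError:
      # Out-of-vocabulary words are recorded as 'NA'.
      row.append('NA')
  writer.writerow(row)

# Read the buffer back to inspect the layout.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

Each row starts with a word from the first list, followed by one cosine per word in the second list, with 'NA' wherever a word has no vector.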

Lists of words

In [None]:
moral_concern_list = ['care', 'cares', 'caring', 'cared', 'concern', 'concerns', 'concerned', 'concerning', 'empathy', 'empathetic', 'empathize', 'sympathy', 'sympathize', 'sympathetic', 'compassion', 'compassionate', 'uncaring', 'apathetic', 'apathy', 'unconcerned', 'unconcerning', 'indifference', 'indifferent', 'unsympathetic', 'unempathetic', 'uncompassionate', 'disregard', 'disregarded', 'disregarding']


In [None]:
entities_list = ['husband', 'wife', 'father', 'mother', 'son', 'daughter', 'brother', 'sister', 'uncle', 'aunt', 'niece', 'nephew', 'grandmother', 'grandfather', 'acquaintance', 'ally', 'associate', 'colleague', 'counterpart', 'fellow', 'neighbor', 'patriot', 'confidant', 'companion', 'partner', 'supporter', 'member', 'follower', 'emigrant', 'foreigner', 'intruder', 'settler', 'stranger', 'visitor', 'vagrant', 'opposition', 'rival', 'opponent', 'adversary', 'competitor', 'invader', 'occupier', 'arab', 'beggar', 'blacks', 'crippled', 'disabled', 'jew', 'mexican', 'unemployed', 'native', 'elderly', 'indian', 'woman', 'chinese', 'pauper', 'animal', 'ape', 'bird', 'elephant', 'chicken', 'cow', 'dog', 'fish', 'pig', 'shark', 'bear', 'snake', 'monkey', 'lion', 'nature', 'forest', 'lake', 'mountain', 'ocean', 'reef', 'river', 'tree', 'sea', 'beach', 'island', 'coast', 'earth', 'planet']


Computing the cosine similarity between each entity word and each moral concern word.

In [None]:
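for model_name in list_of_models:
  # norm() reads the global word_vector, so it is reassigned for each model.
  word_vector = api.load(model_name)
  cosines_to_csv('entities_and_moral_concern_cosines_{}.csv'.format(model_name), moral_concern_list, entities_list, model_name)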
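
Words that are missing from a model's vocabulary end up as 'NA' in the output, so it can be useful to list them before interpreting the results. The sketch below is one possible way to do that, assuming the loaded model supports the in operator for membership tests, as Gensim's KeyedVectors objects do. The function name missing_words and the toy vocabulary are hypothetical; a plain dictionary stands in for word_vector here.

```python
def missing_words(words, vocabulary):
  # Return the words that have no entry in the vocabulary, preserving order.
  return [word for word in words if word not in vocabulary]

# Hypothetical stand-in for a loaded model; only membership matters here.
toy_vocabulary = {'care': None, 'dog': None}

absent = missing_words(['care', 'dog', 'reef'], toy_vocabulary)
print(absent)
```

With a real model, the same call could be made as missing_words(entities_list, word_vector) after word_vector has been loaded.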

