<a href="https://colab.research.google.com/github/andrewpkitchin/moral-circle-word-embeddings/blob/master/contempary_models_average_vectors.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this Python 3 notebook we demonstrate how to use contemporary word embedding models supported by the Gensim library to generate the results used in [link to paper]. We calculate the cosine similarity between each focal entity and the average vector of the moral words.

We suggest mounting a Google Drive so that the results can be saved as CSV files of cosines, with one row per model.

See https://github.com/RaRe-Technologies/gensim-data for a list of models and documentation.

In [1]:
# Dependencies

import csv

import numpy as np
import gensim.downloader as api
from google.colab import drive


In [2]:
# Mounting a Google Drive.

drive.mount('/content/drive')

%cd drive/My\ Drive/word2vecProject/csvFiles

Mounted at /content/drive
/content/drive/My Drive/word2vecProject/csvFiles


In [3]:
# List of contemporary models we have used. Information on these can be found at https://github.com/RaRe-Technologies/gensim-data

list_of_models = ['word2vec-google-news-300','glove-twitter-100', 'glove-twitter-200', 'glove-twitter-25', 'glove-twitter-50','glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-wiki-gigaword-50','fasttext-wiki-news-subwords-300']

#Google News Model
  #'word2vec-google-news-300'
#Twitter Models
  #'glove-twitter-100', 'glove-twitter-200', 'glove-twitter-25', 'glove-twitter-50', 
#Wikipedia Models
  #'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-wiki-gigaword-50',
  #'fasttext-wiki-news-subwords-300'

Here we create a function that returns the normalized (unit-length) vector representation of a word from a model. The function reads a global variable, word_vector, which we define later, when each model is loaded and run:

> word_vector = api.load(model_name)

We also create a function to compute the cosine similarity of two vectors and a function to compute the average vector of a list of words.

In [4]:
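def norm(word):
  # Unit-length vector for `word`; raises KeyError if the word is not in the model's vocabulary.
  vec = word_vector[word]
  return vec/np.linalg.norm(vec)


def cosine_similarity(vec1,vec2):
  return np.dot(vec1, vec2)/(np.linalg.norm(vec1)* np.linalg.norm(vec2))


def average_vector(list_of_words, average_vec):
  # average_vec is a zero vector of the model's dimensionality; it is used as the running sum.
  words_found = 0
  for word in list_of_words:
    try:
      average_vec += norm(word)
      words_found += 1
    except KeyError:
      # Skip words that are missing from the model's vocabulary.
      continue

  # Divide by the number of words actually summed.
  # Cosine similarity ignores scale, so this choice does not change the cosines.
  return average_vec/max(words_found, 1)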
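As a self-contained illustration of the three helpers above, the sketch below runs the same steps on a tiny hand-made embedding table. The names `toy_vectors`, `unit`, `cosine` and `mean_unit_vector` are hypothetical and exist only for this example; `toy_vectors` is a stand-in for the object returned by `api.load`, and its values are made up. The last two lines also check that rescaling the average vector leaves the cosine unchanged.

```python
import numpy as np

# Hypothetical 2-d embeddings standing in for a loaded Gensim model.
toy_vectors = {
    "care": np.array([3.0, 4.0]),
    "help": np.array([0.0, 2.0]),
    "dog": np.array([5.0, 0.0]),
}

def unit(vec):
    """Scale a vector to length 1."""
    return vec / np.linalg.norm(vec)

def cosine(vec1, vec2):
    """Cosine of the angle between two vectors."""
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def mean_unit_vector(words, vectors):
    """Average the unit vectors of the words that are in the vocabulary."""
    found = [unit(vectors[w]) for w in words if w in vectors]
    return np.mean(found, axis=0)

# "missing-word" is not in the table, so it is skipped.
moral_mean = mean_unit_vector(["care", "help", "missing-word"], toy_vectors)
dog_cosine = cosine(moral_mean, unit(toy_vectors["dog"]))

# Multiplying the average vector by a constant does not change the cosine.
rescaled_cosine = cosine(10 * moral_mean, unit(toy_vectors["dog"]))
```

Because the cosine depends only on direction, the divisor used when averaging affects the length of the average vector but not the similarity scores written to the CSV.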

Here we define a function that computes the cosine similarity between each word in a list and an average (group) vector, and appends the results to a CSV file as a single row.

In [5]:
def average_vector_cosines_to_csv(csv_name, average_vec, list_of_words, model_name):
  # The row starts with the model name, followed by one cosine per word.
  list_of_cosines = [model_name]

  for word in list_of_words:
    try:
      list_of_cosines.append(cosine_similarity(average_vec,norm(word)))
    except KeyError:
      # The word is not in this model's vocabulary.
      list_of_cosines.append('NA')

  # Appending the cosine scores to the csv as a new row.
  with open(csv_name, 'a', newline='') as file:
    csv.writer(file).writerow(list_of_cosines)
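To make the resulting file layout concrete, here is a minimal sketch that writes a header row and one model row to an in-memory buffer instead of a file on the Drive. The helper `cosine_row`, the model name `toy-model` and the cosine values are all invented for illustration; they are not results from any model.

```python
import csv
import io

def cosine_row(model_name, words, cosines_by_word):
    """Build one CSV row: the model name, then one cosine (or 'NA') per word."""
    return [model_name] + [cosines_by_word.get(w, "NA") for w in words]

words = ["husband", "wife", "zzz-unknown"]
made_up_cosines = {"husband": 0.25, "wife": 0.5}  # invented values, illustration only

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([" "] + words)  # header: blank corner cell, then the words
writer.writerow(cosine_row("toy-model", words, made_up_cosines))
csv_text = buffer.getvalue()
```

Each model contributes one row, and a word that is missing from a model's vocabulary appears as `NA` in that row.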

Lists of words

In [6]:
moral_list = ['compassion', 'compassionate','care', 'cares', 'cared', 'caring','help', 'helps', 'helped', 'helping','responsibility', 'responsibilities','duty', 'duties','concern', 'concerns', 'concerned', 'concerning','welfare','rights','support', 'supports', 'supporting', 'supported','assist', 'assists', 'assisting', 'assisted']

In [7]:
focal_entities_list = ['husband', 'wife', 'father', 'mother', 'son', 'daughter', 'brother', 'sister', 'uncle', 'aunt', 'niece', 'nephew', 'grandmother', 'grandfather', 'acquaintance', 'ally', 'associate', 'colleague', 'confidant', 'companion', 'counterpart', 'fellow', 'follower', 'member', 'neighbor', 'partner', 'patriot', 'supporter', 
'arab', 'beggar', 'blacks', 'chinese', 'crippled', 'disabled', 'elderly', 'indian', 'jew', 'mexican', 'native', 'pauper', 'unemployed', 'woman', 'adversary', 'competitor', 'emigrant', 'foreigner', 'intruder', 'invader', 'occupier', 'opponent', 'opposition', 'rival', 'settler', 'stranger', 'visitor', 'vagrant', 
'convict', 'criminal', 'crook', 'delinquent', 'deserter', 'enemy', 'felon', 'murderer', 'offender', 'robber', 'scoundrel', 'thief', 'traitor', 'villain', 'animal', 'ape', 'bear', 'bird', 'chicken', 'cow', 'dog', 'elephant', 'fish', 'lion', 'monkey', 'pig', 'shark', 'snake', 'beach', 'coast', 'earth', 'forest', 'island', 
'lake', 'mountain', 'nature', 'ocean', 'planet', 'reef', 'river', 'sea', 'tree']
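Words that are absent from a model's vocabulary are skipped when averaging and recorded as NA in the output, so it can be useful to check coverage before running a model. The sketch below is one possible way to do that; `split_by_vocabulary` and `toy_vocabulary` are hypothetical names introduced for this example. A plain set is used as the vocabulary here, on the assumption that the loaded model can be passed in its place because Gensim's keyed-vector objects support the `in` operator.

```python
def split_by_vocabulary(words, vocabulary):
    """Return (present, missing) lists, preserving the input order."""
    present = [w for w in words if w in vocabulary]
    missing = [w for w in words if w not in vocabulary]
    return present, missing

# Hypothetical vocabulary; with a real model, pass the loaded vectors instead.
toy_vocabulary = {"husband", "wife", "dog"}
present, missing = split_by_vocabulary(["husband", "pauper", "dog"], toy_vocabulary)
```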

Computing the cosine similarity between each focal entity and the average vector of the moral words. 

In [8]:
with open('focal_entities_and_moral_average_vec_contempary.csv', 'w', newline='') as file:
  writer = csv.writer(file)

  # Header row: a blank corner cell followed by the focal entities.
  writer.writerow([" "] + focal_entities_list)


for model_name in list_of_models:
  # norm() reads this global, so it must be set before any vectors are computed.
  word_vector = api.load(model_name)

  # Start from a zero vector with the model's dimensionality.
  average_vec = np.zeros(word_vector.vector_size)

  moral_list_aver_vec = average_vector(moral_list, average_vec)

  average_vector_cosines_to_csv('focal_entities_and_moral_average_vec_contempary.csv', moral_list_aver_vec, focal_entities_list, model_name)
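Once the loop has finished, the CSV can be read back for analysis. The following is a minimal sketch, assuming the layout produced above (a header row with a blank first cell, then one row per model, with `NA` for missing words). The function `read_cosine_table` is a hypothetical helper, and the sample text uses invented values in place of the real file.

```python
import csv
import io
import math

def read_cosine_table(handle):
    """Parse the results CSV into {model_name: {entity: cosine or nan}}."""
    reader = csv.reader(handle)
    header = next(reader)[1:]  # the first cell is the blank corner label
    table = {}
    for row in reader:
        model_name, values = row[0], row[1:]
        table[model_name] = {
            entity: (math.nan if value == "NA" else float(value))
            for entity, value in zip(header, values)
        }
    return table

# Invented sample standing in for the real output file.
sample = io.StringIO(" ,husband,wife\r\ntoy-model,0.25,NA\r\n")
table = read_cosine_table(sample)
```

To read the real file, an open file handle (for example from `open(csv_name, newline='')`) can be passed in place of `sample`.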





