<a href="https://colab.research.google.com/github/andrewpkitchin/speciesism-in-everyday-language/blob/main/calculating_cosines_gensim_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this Python 3 notebook, we demonstrate how to use the word embedding models supported by the Gensim library to generate the results used in [link to paper]. We calculate the cosine similarity between each focal entity and each moral concern word.

See https://github.com/RaRe-Technologies/gensim-data for a list of models and documentation.

In [None]:
# Dependencies

import csv

import numpy as np
import gensim.downloader as api

In [None]:
# List of contemporary models we have used. Information on these can be found at https://github.com/RaRe-Technologies/gensim-data

list_of_models = ['word2vec-google-news-300']

#Google News Model
  #'word2vec-google-news-300'
#Twitter Models
  #'glove-twitter-100', 'glove-twitter-200', 'glove-twitter-25', 'glove-twitter-50', 
#Wikipedia Models
  #'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-wiki-gigaword-50',
  #'fasttext-wiki-news-subwords-300'

Here we create a function that returns the normalized (unit-length) vector representation of a word from a model. Note that word_vector is a global variable, defined as follows:

word_vector = api.load(model_name)

We assign it when we load and run the models. We also create functions to compute the cosine similarity of two vectors and, for each pair of words drawn from two input lists, to write those similarities to a CSV file.
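
For reference, the cosine similarity of two vectors $u$ and $v$ is the standard quantity below. For unit-length vectors the denominator equals 1, so the value reduces to the dot product.

```latex
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
```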

In [None]:
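def norm(word):
  # Return the unit-length vector for a word; raises KeyError if the word is not in the model.
  return word_vector[word]/np.linalg.norm(word_vector[word])

def cosine_similarity(vec1, vec2):
  # Cosine of the angle between two vectors.
  return np.dot(vec1, vec2)/(np.linalg.norm(vec1)*np.linalg.norm(vec2))

def average_vector(list_of_words, average_vec):
  # Add the normalized vector of each word to the starting vector average_vec,
  # skipping words that are not in the model's vocabulary.
  for word in list_of_words:
    try:
      # Not in-place, so the caller's array is left unmodified.
      average_vec = average_vec + norm(word)
    except KeyError:
      continue

  # The denominator counts average_vec itself (+1) and every listed word,
  # including any that were skipped above.
  return average_vec/(len(list_of_words)+1)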
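
As a quick sanity check, the sketch below applies the same formulas to a toy vocabulary. The toy_vectors dictionary and its 2-dimensional values are made up for illustration and stand in for a loaded Gensim model; the helper names unit and cosine are likewise hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a loaded model: maps words to made-up 2-d vectors.
toy_vectors = {
  'cat': np.array([3.0, 4.0]),
  'dog': np.array([6.0, 8.0]),    # same direction as 'cat'
  'care': np.array([-4.0, 3.0]),  # perpendicular to 'cat'
}

def unit(word, vectors):
  # Scale the word's vector to length 1.
  vec = vectors[word]
  return vec / np.linalg.norm(vec)

def cosine(vec1, vec2):
  # Cosine of the angle between two vectors.
  return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

cat_dog = cosine(unit('cat', toy_vectors), unit('dog', toy_vectors))
cat_care = cosine(unit('cat', toy_vectors), unit('care', toy_vectors))
print(cat_dog, cat_care)
```

Vectors pointing in the same direction give a cosine close to 1 regardless of their length, and perpendicular vectors give a cosine close to 0.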

In [None]:
def cosines_to_csv(csv_name, list1, list2, model_name):
  # Write one row per word in list1 and one column per word in list2.
  # Relies on the global word_vector (through norm) for the currently loaded model.
  with open(csv_name, 'w', newline='') as file:
    writer = csv.writer(file)

    # Write the headings to the csv: the model name, then the words in list2.
    writer.writerow([model_name] + list2)

    for word in list1:
      list_of_cosines = [word]

      for entity in list2:
        try:
          list_of_cosines.append(cosine_similarity(norm(word), norm(entity)))
        except KeyError:
          # Either word is missing from the model's vocabulary.
          list_of_cosines.append('NA')

      # Write the cosine scores to the csv.
      writer.writerow(list_of_cosines)
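
The files written by cosines_to_csv have the model name in the top-left cell, the list2 words across the header row, and one row per list1 word. A minimal sketch of reading such a file back into a nested dictionary follows, assuming that layout. The function name read_cosines and the sample values are hypothetical, and 'NA' entries are mapped to None.

```python
import csv
import io

def read_cosines(file):
  # Returns (model_name, {row_word: {column_word: float or None}}).
  reader = csv.reader(file)
  header = next(reader)
  model_name, columns = header[0], header[1:]
  table = {}
  for row in reader:
    # 'NA' marks a pair for which no cosine could be computed.
    values = [None if cell == 'NA' else float(cell) for cell in row[1:]]
    table[row[0]] = dict(zip(columns, values))
  return model_name, table

# Made-up sample in the same layout, read from memory instead of from disk.
sample = io.StringIO('toy-model,cat,dog\ncare,0.25,NA\nvalue,0.5,0.125\n')
model_name, table = read_cosines(sample)
print(model_name, table['care'])
```

With a real output file, an object returned by open(csv_name, newline='') can be passed in place of the StringIO object.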

In [None]:
humans_and_animals = ['human', 'humans', 'person', 'persons', 'peoples', 'people', 'adult', 'adults', 'teenager', 'teenagers', 'child', 'children', 'kid', 'kids', 'man', 'men', 'woman', 'women', 'lady', 'ladies', 'gentleman', 'gentlemen', 'boy', 'boys', 'girl', 'girls', 'guy', 'gal', 'baby', 'babies', 'infant', 'infants', 'toddler', 'toddlers', 'newborn', 'newborns', 'cat', 'cats', 'kitten', 'kittens', 'dog', 'dogs', 'puppy', 'puppies', 'horse', 'horses', 'dolphin', 'dolphins', 'chimp', 'chimps', 'bear', 'bears', 'kangaroo', 'kangaroos', 'chicken', 'chickens', 'chick', 'chicks', 'goat', 'goats', 'sheep', 'lamb', 'lambs', 'pig', 'pigs', 'turkey', 'turkeys', 'cow', 'cows', 'calf', 'calves', 'duck', 'ducks', 'snake', 'snakes', 'snail', 'snails', 'starfish', 'crocodile', 'crocodiles', 'bat', 'bats', 'frog', 'frogs']
attributes = ['care', 'cares', 'caring', 'cared', 'concern', 'concerns', 'concerned', 'concerning', 'sympathy', 'sympathize', 'sympathetic', 'compassion', 'compassionate', 'apathy', 'uncaring', 'unconcerned', 'indifference', 'indifferent', 'disregard', 'disregarded', 'disregarding', 'value', 'valuable', 'valued', 'invaluable', 'important', 'importance', 'worth', 'precious', 'cherish', 'cherished', 'significant', 'valueless', 'worthless', 'insignificant', 'meritless', 'unimportant', 'unimportance', 'deficient', 'lacking', 'disfavored', 'useless'] 


We now compute the cosine similarity between each entity word and each attribute word, writing one CSV file per model.

In [None]:
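for model_name in list_of_models:
  # Downloads the model on first use, then loads it from the local cache.
  word_vector = api.load(model_name)
  cosines_to_csv('entities_and_moral_concern_cosines_{}.csv'.format(model_name), attributes, humans_and_animals, model_name)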
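
Words that are missing from a model's vocabulary appear as NA in the output. A small sketch for listing them in advance is given below. Here missing_words is a hypothetical helper, and it assumes only that the loaded model supports the in operator, as Gensim's KeyedVectors objects do. The dictionary is a made-up stand-in so that the example runs without downloading a model.

```python
def missing_words(words, vectors):
  # Keep the words for which no vector is available, in their original order.
  return [word for word in words if word not in vectors]

# Made-up stand-in for a loaded model.
toy_vectors = {'cat': [0.1, 0.2], 'dog': [0.3, 0.4]}
absent = missing_words(['cat', 'starfish', 'dog', 'meritless'], toy_vectors)
print(absent)
```

Inside the loop over list_of_models, the same helper could be called as missing_words(humans_and_animals, word_vector) once the model has been loaded.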