## Files used in bertFuncs.py

In [1]:
from bertFuncs import analyzeWord, getBert
from createDims import createPolarDimension
import nltk
from nltk.corpus import wordnet as wn
import numpy as np
import pickle
import json
import string
import ast
import torch



## Creating the required lookup files

The lookup files are needed to set up the POLAR dimensions and to match these dimensions to word sense definitions and example sentences when the results are analyzed.

The function ``create_lookup_files`` takes a list of lists as input. Each inner list contains one polar sense pair, and each word sense must be given in WordNet-readable format (e.g. ``cold.a.01``) so that definitions and example sentences can be retrieved automatically. All lookup files are stored in the folder ``lookup_path``.
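
Before calling WordNet, it can help to check that every entry at least has the expected ``lemma.pos.nn`` shape. The following is a minimal sketch; the helper ``malformed_senses`` and the regular expression are assumptions for illustration and not part of the notebook. The check only looks at the string format and does not confirm that the synset actually exists in WordNet.

```python
import re

# lemma.pos.nn, where pos is one of the WordNet tags n, v, a, s, r
# and nn is a zero-padded sense number (assumed to have at least two digits)
SENSE_PATTERN = re.compile(r"\S+\.[nvasr]\.\d{2,}")

def malformed_senses(pairs):
    """Return every entry that is not shaped like 'lemma.pos.nn'."""
    return [
        sense
        for pair in pairs
        for sense in pair
        if not SENSE_PATTERN.fullmatch(sense)
    ]

pairs = [["cold.a.01", "hot.a.01"], ["bad", "good.a.01"]]
bad_entries = malformed_senses(pairs)
print(bad_entries)
```

An empty result means all entries are at least well-formed; any listed entry would make ``wn.synset`` fail later on.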


In [2]:
# helper functions

def get_name(antonym):
    return wn.synset(antonym).lemma_names()[0]

def get_examples(antonym):
    examples = wn.synset(antonym).examples()
    # replace punctuation symbols with spaces
    examples = [sent.translate(str.maketrans({k: " " for k in string.punctuation})) for sent in examples]
    # add a space after each sentence
    return ['{} '.format(sent) for sent in examples]
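
To see what the cleaning step in ``get_examples`` does to a sentence, the following minimal sketch applies the same ``str.translate`` call to a plain string, without touching WordNet. The helper name ``clean_sentence`` and the sample sentence are illustrative and not part of the notebook.

```python
import string

# Same translation table as in get_examples:
# every punctuation character is mapped to a single space
PUNCT_TO_SPACE = str.maketrans({k: " " for k in string.punctuation})

def clean_sentence(sent):
    """Replace punctuation with spaces and append one trailing space."""
    return "{} ".format(sent.translate(PUNCT_TO_SPACE))

cleaned = clean_sentence("the stove isn't hot.")
print(repr(cleaned))
```

Because punctuation is replaced rather than removed, contractions such as "isn't" are split into two whitespace-separated pieces, and the sentence length grows only by the appended space.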

In [3]:
def create_lookup_files(antonyms, lookup_path):
    if len(np.unique(antonyms, axis=0)) != len(antonyms):
        print("Your antonym list contains duplicates. Please try again!")
        return
    
    # get all word sense definitions
    synset_defs = [[wn.synset(anto).definition() for anto in pair] for pair in antonyms]
    # get example sentences from wordnet
    examples_readable = {str(pair):{get_name(anto): get_examples(anto) for anto in pair} for pair in antonyms}
    examples_lookup = [[[get_name(anto), get_examples(anto)] for anto in pair] for pair in antonyms]
    
    # save all lookup files to lookup_path (not to the global out_path)
    with open(lookup_path + 'lookup_synset_dict.txt', 'w') as t:
        t.write(json.dumps(antonyms, indent=4))
    with open(lookup_path + 'lookup_synset_dict.pkl', 'wb') as p:
        pickle.dump(antonyms, p)
    with open(lookup_path + 'lookup_synset_definition.txt', 'w') as t:
        t.write(json.dumps(synset_defs, indent=4))  
    with open(lookup_path + 'lookup_synset_definition.pkl', 'wb') as p:
        pickle.dump(synset_defs, p)        
    with open(lookup_path + 'antonym_wordnet_example_sentences_readable_extended.txt', 'w') as t:
        t.write(json.dumps(examples_readable, indent=4))  
    with open(lookup_path + 'lookup_anto_example_dict.txt', 'w') as t:
        t.write(json.dumps(examples_lookup, indent=4))      
    with open(lookup_path + 'lookup_anto_example_dict.pkl', 'wb') as p:
        pickle.dump(examples_lookup, p)
    return
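
The duplicate check in ``create_lookup_files`` uses ``np.unique(antonyms, axis=0)``, which compares pairs as ordered rows, so a reversed pair such as ``['hot.a.01', 'cold.a.01']`` is not reported as a duplicate of ``['cold.a.01', 'hot.a.01']``. If reversed pairs should also count as duplicates (an assumption about the intended behaviour), an order-insensitive check could look like the sketch below. ``find_duplicate_pairs`` is a hypothetical helper, not part of the notebook.

```python
def find_duplicate_pairs(pairs):
    """Return pairs whose two senses were already seen, in either order."""
    seen = set()
    duplicates = []
    for pair in pairs:
        key = frozenset(pair)  # ignores the order of the two senses
        if key in seen:
            duplicates.append(pair)
        else:
            seen.add(key)
    return duplicates

pairs = [
    ["cold.a.01", "hot.a.01"],
    ["bad.a.01", "good.a.01"],
    ["hot.a.01", "cold.a.01"],  # same dimension as the first pair, reversed
]
duplicates = find_duplicate_pairs(pairs)
print(duplicates)
```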


In [38]:
wn.synsets("hot")[0].name()

'hot.a.01'

In [22]:
# iterate over the synsets directly; indexing with a fixed range(5)
# raises an IndexError when the word has fewer than five senses
definitions = [synset.definition() for synset in wn.synsets("two")]
definitions

## Example usage

Polar dimensions should be __given__ as a nested list of antonym pairs in WordNet representation (sense-annotated).

_Example:_   
`
[
    ['a_posteriori.a.01', 'a_priori.a.01'],
    ['abaxial.a.01', 'adaxial.a.01'],
    ['abridge.v.01', 'elaborate.v.01'],
    ...
]`

In [4]:
# folder in which all lookup files will be stored
out_path = 'antonyms/'

In [5]:
# define 4 exemplary POLAR dimensions
dims = [
    ['cold.a.01', 'hot.a.01'],
    ['bad.a.01', 'good.a.01'],
    ['intelligent.a.01', 'unintelligent.a.01'],
    ['capable.a.01', 'incapable.a.01'],
]
    
# create all lookup files
create_lookup_files(dims, out_path)
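
Note that ``create_lookup_files`` builds file names by string concatenation, so the folder must already exist and the path must end with a separator (as in ``'antonyms/'``). The sketch below shows one way to avoid both requirements with ``os.makedirs`` and ``os.path.join``, and reads the files back to confirm the round trip. ``save_lookup`` is a hypothetical helper and the temporary directory is used only for illustration.

```python
import json
import os
import pickle
import tempfile

def save_lookup(obj, folder, stem):
    """Write obj as <stem>.txt (JSON) and <stem>.pkl (pickle) inside folder."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    txt_path = os.path.join(folder, stem + ".txt")
    pkl_path = os.path.join(folder, stem + ".pkl")
    with open(txt_path, "w") as t:
        json.dump(obj, t, indent=4)
    with open(pkl_path, "wb") as p:
        pickle.dump(obj, p)
    return txt_path, pkl_path

dims = [["cold.a.01", "hot.a.01"], ["bad.a.01", "good.a.01"]]

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "antonyms")  # no trailing slash needed
    txt_path, pkl_path = save_lookup(dims, folder, "lookup_synset_dict")
    # read both files back to check that they hold the same data
    with open(txt_path) as t:
        from_json = json.load(t)
    with open(pkl_path, "rb") as p:
        from_pickle = pickle.load(p)

print(from_json == dims, from_pickle == dims)
```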

In [6]:
# get the embedding model 
tokenizer, model = getBert()

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Create the POLAR matrix (for base change or projection) from a given set of antonyms. The antonyms and their example sentences are forwarded to an embedding model (here: BERT), from which the required embeddings and difference vectors are created. ``antonym_path`` specifies where the readable example sentence lookup file is stored. ``out_path`` specifies where the POLAR matrix will be written.

The corresponding function can be found in ``createDims.py``, which is not part of the official SensePOLAR repo.
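
The two ways of using the difference vectors, projection and base change, can be illustrated on toy data. This is only a sketch under stated assumptions: the 3-dimensional vectors are made up, the sign convention (right pole minus left pole) is assumed, and the code is not taken from ``createDims.py`` or ``bertFuncs.py``.

```python
import numpy as np

# Toy 3-d vectors standing in for averaged BERT sense embeddings (made-up numbers)
left = np.array([[1.0, 0.0, 0.0],     # e.g. "cold"
                 [0.0, 1.0, 0.0]])    # e.g. "bad"
right = np.array([[-1.0, 0.0, 0.0],   # e.g. "hot"
                  [0.0, -1.0, 0.0]])  # e.g. "good"

# One difference vector per POLAR dimension
# (assumed sign convention: right pole minus left pole)
directions = right - left

# Toy embedding of the word to analyze
word = np.array([-0.5, 0.25, 0.1])

# Projection: dot product of the word with each normalized direction
unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
projection_scores = unit @ word

# Base change: least-squares coordinates of the word in the basis
# spanned by the difference vectors, computed via the pseudo-inverse
base_change_scores = np.linalg.pinv(directions.T) @ word

print(projection_scores)
print(base_change_scores)
```

With only a handful of dimensions, the difference vectors span a small subspace of the embedding space, which is one plausible reason why the two methods can give quite different values.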

In [7]:
# create the base change matrix (this might take some time)
createPolarDimension(model, tokenizer, out_path=out_path, antonym_path=out_path + "antonym_wordnet_example_sentences_readable_extended.txt")

Start forwarding the Polar opposites ...


In [8]:
# base change does not work well with only a few dimensions -> compare with projection
antonym_path = out_path + "polar_dimensions.pkl"
word = "school"
context = "school teaches you a lot of smart things"
analyzeWord(word, context, model=model, tokenizer=tokenizer, antonym_path=antonym_path, lookup_path=out_path, numberPolar=4)  # method="projection"

Analyzing the word:  school
In the context of:  school teaches you a lot of smart things
Top:  1
Dimension:  capable<------>incapable
Definitions:  (usually followed by `of') having capacity or ability<------>(followed by `of') lacking capacity or ability
Value: -0.21221492


Top:  2
Dimension:  bad<------>good
Definitions:  having undesirable or negative qualities<------>having desirable or positive qualities especially those suitable for a thing specified
Value: -0.15057667


Top:  3
Dimension:  intelligent<------>unintelligent
Definitions:  having the capacity for thought and reason especially to a high degree<------>lacking intelligence
Value: -0.1265351


Top:  4
Dimension:  cold<------>hot
Definitions:  having a low or inadequate temperature or feeling a sensation of coldness or having been made cold by e.g. ice or refrigeration<------>used of physical heat; having a high or higher than desirable temperature or giving off heat or feeling or causing a sensation of heat or burning


['capable---incapable',
 'bad---good',
 'intelligent---unintelligent',
 'cold---hot']

In [9]:
antonym_path = out_path + "polar_dimensions.pkl"
word = "fire"
context = "the fire is burning"

analyzeWord(word, context, model=model, tokenizer=tokenizer, antonym_path=antonym_path, lookup_path=out_path, numberPolar=4, method="projection")

Analyzing the word:  fire
In the context of:  the fire is burning
Top:  1
Dimension:  cold<------>hot
Definitions:  having a low or inadequate temperature or feeling a sensation of coldness or having been made cold by e.g. ice or refrigeration<------>used of physical heat; having a high or higher than desirable temperature or giving off heat or feeling or causing a sensation of heat or burning
Value: 3.1664367


Top:  2
Dimension:  intelligent<------>unintelligent
Definitions:  having the capacity for thought and reason especially to a high degree<------>lacking intelligence
Value: -1.3334554


Top:  3
Dimension:  capable<------>incapable
Definitions:  (usually followed by `of') having capacity or ability<------>(followed by `of') lacking capacity or ability
Value: 0.23145732


Top:  4
Dimension:  bad<------>good
Definitions:  having undesirable or negative qualities<------>having desirable or positive qualities especially those suitable for a 

['cold---hot',
 'intelligent---unintelligent',
 'capable---incapable',
 'bad---good']
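
In both runs above, the dimensions whose values are shown appear in descending order of absolute value (for example 3.17, -1.33, 0.23 for "fire"). A minimal sketch of such a ranking is given below, assuming the scores are available as a plain dictionary. ``rank_dimensions`` is a hypothetical helper and not necessarily how ``analyzeWord`` is implemented; the three values are copied from the "fire" output.

```python
def rank_dimensions(scores, top_k):
    """Return the top_k (dimension, value) pairs with the largest absolute value."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]

# Values taken from the "fire" run above, deliberately given in a different order
scores = {
    "capable---incapable": 0.23145732,
    "cold---hot": 3.1664367,
    "intelligent---unintelligent": -1.3334554,
}

top = rank_dimensions(scores, top_k=2)
for name, value in top:
    print(name, value)
```

Ranking by absolute value treats both poles of a dimension equally: a strongly negative value is as informative as a strongly positive one.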