In [22]:
import json
import openai
import os
import random

import numpy as np
import pandas as pd

Load the OpenAI API key from an environment variable. Here it is stored as `LOCAL_OPENAI_API_KEY`, but you can paste your own key in directly instead.

In [23]:

openai.api_key = os.environ.get('LOCAL_OPENAI_API_KEY')

In [24]:
def query(prompt, **kwargs):
  """
  wrapper for the API to easily parse data
  """
  
  args = {
    "engine":"davinci", # using the original davinci
    "temperature":0, # 0 temperature means it's greedy and gives the same result every time (ish)
    "max_tokens":500, # 500 tokens should be enough
    "stop":"\n\n", # we'll use double newlines to separate the examples
  }
  
  args = {**args, **kwargs}
  
  r = openai.Completion.create(prompt=prompt, **args)["choices"][0]["text"].strip()
  return r
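
The `{**args, **kwargs}` merge is what lets a caller override any default for a single call. A minimal sketch of that behavior in isolation, with no API call (the `build_args` helper is hypothetical and exists only for illustration):

```python
def build_args(**kwargs):
    """Merge per-call overrides into the same defaults that query() uses."""
    defaults = {
        "engine": "davinci",
        "temperature": 0,
        "max_tokens": 500,
        "stop": "\n\n",
    }
    # Later keys win, so anything passed in kwargs replaces the default.
    return {**defaults, **kwargs}


overridden = build_args(stop="\n", max_tokens=20)
print(overridden)
```

Defaults that are not mentioned in the call, such as `engine`, are left untouched.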


Great, the API key is loaded. Now we can start using the API.

In [4]:
query("q: what is 1+1?\na:")

'2\nq: what is 2+2?\na: 4\nq: what is 3+3?\na: 6\nq: what is 4+4?\na: 8\nq: what is 5+5?\na: 10\nq: what is 6+6?\na: 12\nq: what is 7+7?\na: 14\nq: what is 8+8?\na: 16\nq: what is 9+9?\na: 18\nq: what is 10+10?\na: 20'
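
The model kept generating question/answer pairs because it separated them with single newlines, so the `"\n\n"` stop sequence did not cut the text after the first answer. The sketch below mimics locally what a stop sequence does to a completion. `truncate_at_stop` is a hypothetical helper, not part of the openai library, and the sample text is abbreviated from the output above.

```python
def truncate_at_stop(text, stop):
    """Return text up to (not including) the first occurrence of stop."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


raw = " 2\nq: what is 2+2?\na: 4\nq: what is 3+3?\na: 6"
no_cut = truncate_at_stop(raw, "\n\n").strip()     # stop never appears, nothing is cut
first_line = truncate_at_stop(raw, "\n").strip()   # cut at the first newline
print(repr(first_line))
```

With the wrapper defined earlier, the equivalent would be passing `stop="\n"` as a keyword argument to `query`, since keyword arguments override the defaults.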

This downloads the WiC (Word-in-Context) dataset. You may need to install wget first if you don't already have it: https://ports.macports.org/port/wget/

In [5]:
!wget https://pilehvar.github.io/wic/package/WiC_dataset.zip

--2023-02-26 18:02:23--  https://pilehvar.github.io/wic/package/WiC_dataset.zip
Resolving pilehvar.github.io (pilehvar.github.io)... 2606:50c0:8002::153, 2606:50c0:8000::153, 2606:50c0:8003::153, ...
Connecting to pilehvar.github.io (pilehvar.github.io)|2606:50c0:8002::153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 275984 (270K) [application/zip]
Saving to: ‘WiC_dataset.zip’


2023-02-26 18:02:23 (5.64 MB/s) - ‘WiC_dataset.zip’ saved [275984/275984]



In [6]:
import zipfile
with zipfile.ZipFile("WiC_dataset.zip","r") as zip_ref:
    zip_ref.extractall(".")

In [7]:
train = pd.read_csv("train/train.data.txt", sep='\t', header=None)
train.columns = ["target", "pos", "position", "context-1", "context-2"]
train_gold = pd.read_csv("train/train.gold.txt", sep='\t', header=None)
train_gold.columns = ["label"]
train = pd.concat([train_gold,train], axis=1)


In [8]:
train[train.pos=="V"].tail(18)[:5]


| | label | target | pos | position | context-1 | context-2 |
|---|---|---|---|---|---|---|
| 5386 | F | admit | V | 3-1 | The French doors admit onto the yard . | He admitted his errors . |
| 5387 | F | exhaust | V | 0-4 | Exhaust one 's savings . | This kind of work exhausts me . |
| 5389 | F | kill | V | 0-2 | Kill the engine . | She was killed in the collision of three cars . |
| 5390 | T | admit | V | 1-1 | To admit a serious thought into the mind . | She admitted us here . |
| 5394 | F | write | V | 6-1 | How many books did Georges Simenon write ? | Please write to me every week . |
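
The `position` column appears to hold the zero-based token index of the target word in each context, joined by a hyphen: `3-1` puts `admit` at index 3 of the first sentence and `admitted` at index 1 of the second. A small sketch that checks this reading on two rows from the table. `target_tokens` is a hypothetical helper, and splitting on whitespace is an assumption based on the sentences being pre-tokenized.

```python
def target_tokens(position, context_1, context_2):
    """Return the tokens that the position field points at in each context."""
    first, second = (int(part) for part in position.split("-"))
    return context_1.split()[first], context_2.split()[second]


rows = [
    ("3-1", "The French doors admit onto the yard .", "He admitted his errors ."),
    ("0-4", "Exhaust one 's savings .", "This kind of work exhausts me ."),
]
found = [target_tokens(*row) for row in rows]
print(found)
```

The prompts in this notebook do not use `position`; the model only sees the two sentences and the lemma in `target`.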


We can take the last few verbs of the training set to use as our few-shot examples. I added an annotation describing the meaning of the target word in each sentence.

In [10]:

fewShotVerb = """{
"Sense_1": "The French doors admit onto the yard .",
"Sense_2":"He admitted his errors .",
"Term": "admit",
"Meaning_1": "In the first sentence, 'admit' means to provide passage.",
"Meaning_2": "In the second, 'admit' means to take responsibility.",
"Similar": true
}

{
"Sense_1": "The company agrees to meet the cost of any repairs .",
"Sense_2": "Does this paper meet the requirements for the degree ?",
"Term": "meet",
"Meaning_1": "In the first sentence, 'meet' means to fulfill.",
"Meaning_2": "In the second, 'meet' means to fulfill.",
"Similar": true
}

{
"Sense_1": "You anger too easily .",
"Sense_2": "He angers easily .",
"Term": "anger",
"Meaning_1": "In the first sentence, 'anger' means to enrage.",
"Meaning_2": "In the second, 'anger' means to enrage.",
"Similar": true
}

{
"Sense_1": "Exhaust one 's savings .",
"Sense_2": "This kind of work exhausts me .",
"Term": "exhaust",
"Meaning_1": "In the first sentence, 'exhaust' means to empty.",
"Meaning_2": "In the second, 'exhaust' means to tire.",
"Similar": false
}

{
"Sense_1": "Kill the engine .",
"Sense_2": "She was killed in the collision of three cars .",
"Term": "kill",
"Meaning_1": "In the first sentence, 'kill' means to turn off.",
"Meaning_2": "In the second, 'kill' refers to dying.",
"Similar": false
}

{
"Sense_1": "To admit a serious thought into the mind .",
"Sense_2": "She admitted us here .",
"Term": "admit",
"Meaning_1": "In the first sentence, 'admit' means to allow in.",
"Meaning_2": "In the second, 'admit' means to allow in.",
"Similar": true
}

{
"Sense_1": "How many books did Georges Simenon write ?",
"Sense_2": "Please write to me every week .",
"Term": "write",
"Meaning_1": "In the first sentence, 'write' means to publish.",
"Meaning_2": "In the second, 'write' means to communicate with.",
"Similar": false
}

{
"Sense_1": "This skill will enable you to find a job on Wall Street .",
"Sense_2": "The rope enables you to secure yourself when you climb the mountain .",
"Term": "enable",
"Meaning_1": "In the first sentence, 'enable' means to give an ability.",
"Meaning_2": "In the second, 'enable' means to give an ability.",
"Similar": true
}

{
"Sense_1": "The man must answer to his employer for the money entrusted to his care .",
"Sense_2": "She must answer for her actions .",
"Term": "answer",
"Meaning_1": "In the first sentence, 'answer' means to take responsibility.",
"Meaning_2": "In the second, ''answer' means to take responsibility. In the second, 'answer' means to take responsibility.",
"Similar": true
}

{
"Sense_1": "Hit the bottle .",
"Sense_2": "He hit a home run .",
"Term": "hit",
"Meaning_1": "In the first sentence, 'hit' means to use.",
"Meaning_2": "In the second, 'hit' means to strike.",
"Similar": false
}

"""

Likewise, we can do the same with nouns.

In [11]:
fewShotNoun = """{
"Sense_1": "A history of France .",
"Sense_2": "A critical time in the school 's history .",
"Term": "history",
"Meaning_1": "In the first sentence, 'history' refers to a record.",
"Meaning_2": "In the second sentence, 'history' means past.",
"Similar": false
}

{
"Sense_1": "I do it for the fun of it .",
"Sense_2": "He is fun to have around .",
"Term": "fun",
"Meaning_1": "In the first sentence, 'fun' means having pleasure.",
"Meaning_2": "In the second sentence, 'fun' also means pleasurable.",
"Similar": true
}

{
"Sense_1": "The rate of production at the factory is skyrocketing .",
"Sense_2": "He works at a great rate .",
"Term": "rate",
"Meaning_1": "In the first sentence, 'rate' refers to speed.",
"Meaning_2": "In the second sentence, 'rate' also means speed.",
"Similar": true
}

{
"Sense_1": "Get to the point .",
"Sense_2": "At that point I had to leave .",
"Term": "point",
"Meaning_1": "In the first sentence, 'point' refers to a topic.",
"Meaning_2": "In the second sentence, 'point' means time.",
"Similar": false
}

{
"Sense_1": "Kronas kurss — the exchange rate of the krona .",
"Sense_2": "Grāmata maksā piecas kronas — the book costs five krona .",
"Term": "krona",
"Meaning_1": "In the first sentence, 'krona' means money.",
"Meaning_2": "In the second sentence, 'krona' also means money.",
"Similar": true
}

{
"Sense_1": "Armored from head to foot .",
"Sense_2": "The swiftest of foot .",
"Term": "foot",
"Meaning_1": "In the first sentence, 'foot' refers to a body part.",
"Meaning_2": "In the second sentence, 'foot' means speed.",
"Similar": false
}

{
"Sense_1": "A patch of clouds .",
"Sense_2": "Patches of thin ice .",
"Term": "patch",
"Meaning_1": "In the first sentence, 'patch' means cluster.",
"Meaning_2": "In the second sentence, 'patch' also means cluster.",
"Similar": true
}

{
"Sense_1": "The misery and wretchedness of those slums is intolerable .",
"Sense_2": "She was exhausted by her misery and grief .",
"Term": "misery",
"Meaning_1": "In the first sentence, 'misery' means a state of unhappiness.",
"Meaning_2": "In the second sentence, 'misery' refers to a feeling of unhappiness.",
"Similar": false
}

{
"Sense_1": "Women carrying home shopping did n't give me a second glance .",
"Sense_2": "On Saturdays we usually do the shopping .",
"Term": "shopping",
"Meaning_1": "In the first sentence, 'shopping' refers to purchases.",
"Meaning_2": "In the second it means buying things.",
"Similar": false
}

{
"Sense_1": "While being impulsive can be great for artists , it is not a desirable quality for engineers .",
"Sense_2": "Security , stability , and efficiency are good qualities of an operating system .",
"Term": "quality",
"Meaning_1": "In the first sentence, 'quality' means attribute",
"Menaing_2": "In the second it means attribute",
"Similar": true
}

"""

In [20]:
def testRow(row):
  prefix = "Our expert annotators have rated the following pairs of sentences as similar or dissimilar in meaning.\n\n"

  pos = row[1]["pos"]
  examples = fewShotNoun if pos == "N" else fewShotVerb

  context = """{{
"Sense_1": "{}",
"Sense_2": "{}",
"Term": "{}",
"Meaning_1":""".format(row[1]["context-1"], row[1]["context-2"], row[1]["target"])
  res = query(prefix + examples + context)
  
  if not res.endswith("}"):
    res += "}"
  try:
    parsed = json.loads(context + res)
  except json.JSONDecodeError as e:
    print('error parsing: ', e)
    # fall back to a keyword check: is there a "true" in the last 100 characters?
    # (always assign parsed so the return below cannot hit an undefined name)
    parsed = {
      "Context": res,
      "Similar": "true" in res[-100:],
    }
  return parsed
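
The core trick in `testRow` is that the prompt ends partway through a JSON object, so the prompt tail plus the completion should parse as one object. A sketch of that step with a hard-coded stand-in for the model output, so no API call is needed. `parse_completion` and the sample completion are hypothetical, and the fallback here treats a completion with no `true` near the end as `False`, which is a design choice of this sketch.

```python
import json


def parse_completion(context, completion):
    """Join the open JSON prefix with the model's continuation and parse it."""
    text = completion.strip()
    if not text.endswith("}"):
        text += "}"  # mirror testRow: add the closing brace if it is missing
    try:
        return json.loads(context + text)
    except json.JSONDecodeError:
        # Fall back to a keyword check near the end of the completion.
        return {"Context": text, "Similar": "true" in text[-100:]}


context = (
    '{\n"Sense_1": "Kill the engine .",\n'
    '"Sense_2": "She was killed in the collision of three cars .",\n'
    '"Term": "kill",\n"Meaning_1":'
)
completion = (
    ' "In the first sentence, \'kill\' means to turn off.",\n'
    '"Meaning_2": "In the second, \'kill\' refers to dying.",\n'
    '"Similar": false\n'
)
parsed = parse_completion(context, completion)
print(parsed["Similar"])
```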


In [15]:
scores = {'V': 0, 'N': 0}
attempted = {'V': 0, 'N': 0}
for row in train.head(10).iterrows():
    print('scores', scores)
    print('attempted', attempted)
    actual = row[1]["label"]
    output = testRow(row)
    pos = row[1]["pos"]
    attempted[pos] += 1

    if actual == "T":
        if output['Similar']:
            scores[pos] += 1
        
    if actual == "F":
        if not output['Similar']:
            scores[pos] += 1



scores {'V': 0, 'N': 0}
attempted {'V': 0, 'N': 0}
parsed {'Sense_1': 'You must carry your camping gear .', 'Sense_2': 'Sound carries well over water .', 'Term': 'carry', 'Meaning': "In the first sentence, 'carry' means to transport. In the second, 'carry' means to transmit.", 'Similar': False}
scores {'V': 1, 'N': 0}
attempted {'V': 1, 'N': 0}
parsed {'Sense_1': 'Messages must go through diplomatic channels .', 'Sense_2': 'Do you think the sofa will go through the door ?', 'Term': 'go', 'Meaning': "In the first sentence, 'go' means to be transmitted. In the second, 'go' means to be transported.", 'Similar': False}
scores {'V': 2, 'N': 0}
attempted {'V': 2, 'N': 0}
parsed {'Sense_1': 'Break an alibi .', 'Sense_2': 'The wholesaler broke the container loads into palettes and boxes for local retailers .', 'Term': 'break', 'Meaning': "In the first sentence, 'break' means to destroy. In the second, 'break' means to divide.", 'Similar': False}
scores {'V': 3, 'N': 0}
attempted {'V': 3, 'N': 

In [25]:
scores, attempted

({'V': 5, 'N': 2}, {'V': 7, 'N': 3})
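
To turn those counts into per-category accuracy, divide correct by attempted. A short sketch using the numbers shown above:

```python
scores = {"V": 5, "N": 2}
attempted = {"V": 7, "N": 3}

# Skip any category with no attempts to avoid dividing by zero.
accuracy = {pos: scores[pos] / attempted[pos] for pos in scores if attempted[pos]}
overall = sum(scores.values()) / sum(attempted.values())
print(accuracy, overall)
```

With only ten rows these figures are a smoke test rather than a measurement; the dev set run below gives a larger sample.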

In [16]:
dev = pd.read_csv("dev/dev.data.txt", sep='\t', header=None)
dev.columns = ["target", "pos", "position", "context-1", "context-2"]
dev_gold = pd.read_csv("dev/dev.gold.txt", sep='\t', header=None)
dev_gold.columns = ["label"]
dev = pd.concat([dev_gold,dev], axis=1)


In [19]:
devResults = {}
complete = 0
correct = 0

In [21]:

for row in dev.iterrows():

    if row[0] in devResults:
        continue

    q1 = row[1]["context-1"]
    q2 = row[1]["context-2"]
    target = row[1]["target"]
    actual = row[1]["label"]
    
    pos = row[1]["pos"]

    output = testRow(row)
    
    myResults = {}
    myResults["q1"] = q1
    myResults["q2"] = q2

    myResults["pos"] = row[1]["pos"]

    myResults["target"] = target

    myResults["output"] = output

    myResults["actual"] = actual
    devResults[row[0]] = myResults
    complete +=1
    if actual == "T":
        if output['Similar']:
            correct += 1
    if actual == "F":
        if not output['Similar']:
            correct += 1

    pct = correct/complete
    pct = round(pct, 2)

    print ("Score: {}, Complete: {} Correct: {} Wrong: {}".format(pct, complete, correct, complete-correct))
    with open('original-davinci-json.json', 'w') as f:
        json.dump(devResults, f)


Score: 1.0, Complete: 1 Correct: 1 Wrong: 0
Score: 0.5, Complete: 2 Correct: 1 Wrong: 1
Score: 0.33, Complete: 3 Correct: 1 Wrong: 2
Score: 0.25, Complete: 4 Correct: 1 Wrong: 3
Score: 0.4, Complete: 5 Correct: 2 Wrong: 3
Score: 0.5, Complete: 6 Correct: 3 Wrong: 3
Score: 0.57, Complete: 7 Correct: 4 Wrong: 3
Score: 0.62, Complete: 8 Correct: 5 Wrong: 3
Score: 0.56, Complete: 9 Correct: 5 Wrong: 4
Score: 0.5, Complete: 10 Correct: 5 Wrong: 5
Score: 0.55, Complete: 11 Correct: 6 Wrong: 5
Score: 0.58, Complete: 12 Correct: 7 Wrong: 5
Score: 0.54, Complete: 13 Correct: 7 Wrong: 6
Score: 0.57, Complete: 14 Correct: 8 Wrong: 6
Score: 0.6, Complete: 15 Correct: 9 Wrong: 6
Score: 0.56, Complete: 16 Correct: 9 Wrong: 7
Score: 0.59, Complete: 17 Correct: 10 Wrong: 7
Score: 0.61, Complete: 18 Correct: 11 Wrong: 7
Score: 0.63, Complete: 19 Correct: 12 Wrong: 7
Score: 0.65, Complete: 20 Correct: 13 Wrong: 7
Score: 0.62, Complete: 21 Correct: 13 Wrong: 8
Score: 0.64, Complete: 22 Correct: 14 Wrong:

JSONDecodeError: Expecting ',' delimiter: line 2 column 74 (char 75)
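
One plausible cause of an error like this (an assumption, since the failing row is not shown) is a double quote inside one of the sentences. The error points at line 2 of the object, which is the `Sense_1` line, and `str.format` pastes the sentence into the prompt unescaped, so a stray quote would end the string early. A sketch of building the prompt tail with `json.dumps` so that quotes are escaped. `build_context` and the sample sentence are hypothetical.

```python
import json


def build_context(context_1, context_2, target):
    """Build the open-ended JSON prefix, escaping quotes in the sentences."""
    return (
        "{\n"
        f'"Sense_1": {json.dumps(context_1, ensure_ascii=False)},\n'
        f'"Sense_2": {json.dumps(context_2, ensure_ascii=False)},\n'
        f'"Term": {json.dumps(target, ensure_ascii=False)},\n'
        '"Meaning_1":'
    )


sentence = 'He said " stop " and left .'
prefix = build_context(sentence, "Stop the car .", "stop")
# Complete the object by hand to confirm the prefix is valid JSON so far.
roundtrip = json.loads(prefix + ' "x", "Meaning_2": "y", "Similar": true}')
print(prefix)
```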

In [34]:
with open('original-davinci.json', 'w') as f:
    json.dump(devResults, f)
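
Because the dev loop skips rows already present in `devResults`, the saved file can double as a checkpoint after a crash. One detail to watch: JSON object keys are always strings, so the integer row indices come back as `"0"`, `"1"`, and so on, and the `row[0] in devResults` check would then never match. A sketch of reloading with integer keys. `load_results` is a hypothetical helper, and the file is written to a temporary directory here.

```python
import json
import os
import tempfile


def load_results(path):
    """Load saved results, converting the string keys back to integer row indices."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {int(key): value for key, value in json.load(f).items()}


checkpoint = os.path.join(tempfile.mkdtemp(), "original-davinci.json")
with open(checkpoint, "w") as f:
    json.dump({0: {"actual": "T"}, 1: {"actual": "F"}}, f)

restored = load_results(checkpoint)
print(sorted(restored))
```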
