# Copycat

While reading about GPT-3 I came across Melanie Mitchell's article ["Can GPT-3 Make Analogies?"](https://medium.com/@melaniemitchell.me/can-gpt-3-make-analogies-16436605c446). Both this article and [its follow-up](https://medium.com/@melaniemitchell.me/follow-up-to-can-gpt-3-make-analogies-b202204bd292) were written in August 2020, and GPT-3 seems to have changed quite a bit since then. Most notably, GPT-3 now offers four different models, each with its own tradeoffs between speed, cost, and capability. The most capable (and most expensive) model, `text-davinci-002`, uses training data from as recently as June 2021.

This made me curious: 
  1. With the changes to GPT-3, has its ability to solve letter-string analogy problems improved? 
  2. How does model selection affect performance?
  3. How can fine-tuning be used to increase performance?
  
I'll start by feeding GPT-3 the same prompts Dr. Mitchell did (from the original article as well as its follow-up) using `text-davinci-002` through the OpenAI Python API. Where she needed to provide extra training examples, I'll skip those additional prompts if GPT-3 already seems to "get it".

If you'd like to run or modify this code, you'll just need to get an OpenAI API key and set the `OPENAI_API_KEY` environment variable to it. I'll try to keep track of the overall cost in credits as I go. Additionally, I've written a few classes and helper functions in `gpt_helpers.py` to make iterating over different inputs a little easier.
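
A quick way to fail early when the key is missing is sketched below. The helper name `load_api_key` is hypothetical (it is not part of `gpt_helpers.py`), and it only reads the `OPENAI_API_KEY` variable mentioned above.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the OpenAI API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; not part of gpt_helpers.py.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running this notebook."
        )
    return key
```

Calling `load_api_key()` once before the first request surfaces a missing key immediately rather than partway through an experiment.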

## How gpt_helpers.py works
The `LetterStringAnalogySolver` class configures the parameters passed to GPT-3, formats input, and displays the response. Configurable GPT-3 parameters are limited to the model name (required) and temperature for now (`max_tokens` is set as a constant).
The base prompt that the inputs are formatted into is also configurable. If it isn't set, it defaults to:

```
"Q: if {example_source} changes to {example_target} , what does {challenge_source} change to?\nA: {challenge_target}"
```
If modified, the only requirement on the base prompt is that it includes the same format variables `example_source`, `example_target`, `challenge_source`, and `challenge_target`.
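
One way to check that requirement before sending anything to the API is to parse the template with the standard library's `string.Formatter`. This is only a sketch: `missing_fields` and `REQUIRED_FIELDS` are hypothetical names, and `gpt_helpers.py` may validate the prompt differently or not at all.

```python
from string import Formatter

# The four format variables the base prompt must contain.
REQUIRED_FIELDS = {
    "example_source",
    "example_target",
    "challenge_source",
    "challenge_target",
}


def missing_fields(base_prompt: str) -> set:
    """Return the required format variables that `base_prompt` does not use."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so those entries are skipped.
    found = {field for _, field, _, _ in Formatter().parse(base_prompt) if field}
    return REQUIRED_FIELDS - found
```

An empty result means the custom prompt can be filled in the same way as the default one.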

The input (prompt data) is a list of lists of strings, whose strings are formatted (in order!) into the base prompt. Formatting inserts a space between each character (to avoid issues caused by GPT-3's byte-pair encoding) and preserves letter case. So, for example:
```
prompt_data = [
    ["aaa", "bbb", "ccc", "ddd"],
    ["fff", "ggg", "hhh", ""]
]
```
would yield the prompt
```
Q: if a a a changes to b b b , what does c c c change to?
A: d d d
Q: if f f f changes to g g g , what does h h h change to?
A:
```
Note that the last element of the last list is empty since we want GPT-3 to tell us what it thinks the `challenge_target` is. 
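
The formatting described above can be sketched in a few lines. This is an illustration rather than the actual code in `gpt_helpers.py`: the names `space_out` and `build_prompt` are hypothetical, and stripping trailing whitespace after the final `A:` is an assumption.

```python
DEFAULT_BASE_PROMPT = (
    "Q: if {example_source} changes to {example_target} , "
    "what does {challenge_source} change to?\nA: {challenge_target}"
)


def space_out(letters: str) -> str:
    """Insert a space between characters, preserving case ("abc" -> "a b c")."""
    return " ".join(letters)


def build_prompt(prompt_data, base_prompt=DEFAULT_BASE_PROMPT) -> str:
    """Format each row of four strings into the base prompt, one Q/A pair per row."""
    blocks = []
    for example_source, example_target, challenge_source, challenge_target in prompt_data:
        block = base_prompt.format(
            example_source=space_out(example_source),
            example_target=space_out(example_target),
            challenge_source=space_out(challenge_source),
            challenge_target=space_out(challenge_target),
        )
        # An empty challenge_target leaves "A: " with a trailing space; drop it
        # (assumption) so the prompt ends right after the colon.
        blocks.append(block.rstrip())
    return "\n".join(blocks)
```

Passing the two-row example above to `build_prompt` reproduces the prompt shown, ending in a bare `A:` for GPT-3 to complete.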

To pass the input to GPT-3 and receive a response, pass the prompt data to `LetterStringAnalogySolver.challenge()`.
To run each request multiple times, set the `trials` parameter. 
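
Conceptually, running several trials just means sending the same prompt repeatedly and collecting the answers. The sketch below uses hypothetical names (`run_trials`, `fake_complete`) and a canned stand-in for the API call so that it runs offline; the real `challenge()` method may be structured differently.

```python
from collections import Counter
from typing import Callable, List


def run_trials(prompt: str, complete: Callable[[str], str], trials: int = 1) -> List[str]:
    """Send the same prompt `trials` times and return the whitespace-trimmed responses."""
    return [complete(prompt).strip() for _ in range(trials)]


def fake_complete(prompt: str) -> str:
    # Stand-in for a real GPT-3 request so the sketch runs offline;
    # it always returns the same canned completion.
    return " i j l\n"


responses = run_trials("Q: ...\nA:", fake_complete, trials=5)
tally = Counter(responses)  # maps each answer to the number of trials that produced it
```

Tallying the responses this way makes it easy to compare how often the expected answer appears across trials.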

### Setup
I'm going to start with the model `text-davinci-002`, as it's the most powerful. As Dr. Mitchell did, I'll use the default temperature of 0.7 and run each prompt 5 times.

In [None]:
from gpt_helpers import LetterStringAnalogySolver, ModelName

solver              = LetterStringAnalogySolver()
solver.model        = ModelName.DAVINCI
solver.temperature  = 0.7
solver.trials       = 5

### Experiment 1: Simple alphabetic sequences

In [None]:
""" 
Zero-shot
Expected answer: p q s 
Original results:
a b d
p q r 
p q r
c d
a b c p q r a b c
"""
ex1_1_input = [
    ["abc", "abd", "pqr", ""]
]
solver.challenge(ex1_1_input)

In [None]:
""" 
One-shot
Expected answer: i j l 
Original results:
i j l (each trial)

"""
ex1_2_input = [
    ["abc", "abd", "pqr", "pqs"],
    ["abc", "abd", "ijk", ""]
]
solver.challenge(ex1_2_input)

In [None]:
""" 
Generalizing to different string lengths, zero-shot
(Not in original article) 
Expected answer: i j k l n
"""
ex1_3_zeroshot_input = [
    ["abc", "abd", "ijklm", ""]
]
solver.challenge(ex1_3_zeroshot_input)

In [None]:
""" 
Generalizing to different string lengths 
Expected answer: i j k l n
Original results:
i j l m
i j k m
i j m
i j l
i j k n
"""
ex1_3_input = [
    ["abc", "abd", "pqr", "pqs"],
    ["abc", "abd", "ijklm", ""],
]
solver.challenge(ex1_3_input)

### Experiment 2: Alphabetic sequences with grouping

In [None]:
""" 
Zero-shot 
Expected answer: i i j j l l
Original response:
Not shown, but they were all incorrect
"""
ex2_1_input = [
    ["abc", "abd", "iijjkk", ""]
]
solver.challenge(ex2_1_input)

In [None]:
""" 
One-shot
Expected answer: m m n n p p 
Original response:
m m n n p p (each trial)
"""
ex2_2_input = [
    ["abc", "abd", "iijjkk", "iijjll"],
    ["abc", "abd", "mmnnoo", ""]
]
solver.challenge(ex2_2_input)

In [None]:
""" 
Generalizing to different string lengths
Expected answer: q q r r s s u u 
Original response: 
q q r r s s t t
q q r r s s u u
q q r r s s u u v
q q r r s s t u
q q r r s s u u v
"""
ex2_3_input = [
    ["abc", "abd", "iijjkk", "iijjll"],
    ["abc", "abd", "qqrrsstt", ""]
]
solver.challenge(ex2_3_input)

In [None]:
""" 
Providing two training examples 
Expected answer: e e f f g g h h j j
Original response:
e e f f g g h h j j
e e f f g g i i
e e f f g g i i j j
e e f f g g h h i i
e e f f g g i i
"""
ex2_4_input = [
    ["abc", "abd", "iijjkk", "iijjll"],
    ["abc", "abd", "mmnnoopp", "mmnnooqq"],
    ["abc", "abd", "eeffgghhii", ""]
]

solver.challenge(ex2_4_input)

### Experiment 3: Cleaning up a string

In [None]:
""" 
Zero-shot
(Not in original article) 
Expected answer: m n o p q r 
"""
ex3_1_zeroshot_input = [
    ["abbcde", "abcde", "mnoopqr", ""]
]
solver.challenge(ex3_1_zeroshot_input)


In [None]:
""" 
One-shot 
Expected answer: m n o p q r
Original response:
m n o p q r
m n o p q r
m n p q r
m n p q r 
m n o p q r
"""
ex3_1_input = [
    ["abbcde", "abcde", "pqrrst", "pqrst"],
    ["abbcde", "abcde", "mnoopqr", ""]
]
solver.challenge(ex3_1_input)

In [None]:
""" 
Expected answer: m n o p 
Original response:
m n o
m n p
m n o p
m n o
m n p
"""
ex3_2_input = [
    ["axbxcx", "abc", "pxqxxrx", "pqr"],
    ["axbxcx", "abc", "rxsxtxx", "rst"],
    ["axbxcx", "abc", "mxnxoxxp", ""]
]

solver.challenge(ex3_2_input)

In [None]:
""" 
Placing the character to be removed at the start of the challenge source string
Expected answer: i j k 
Original response:
Not shown, but incorrect each time.
"""
ex3_5_input = [
    ["axbxcx", "abc", "pxqxxrx", "pqr"],
    ["axbxcx", "abc", "rxsxtxx", "rst"],
    ["axbxcx", "abc", "mxnxoxxp", "mnop"],
    ["axbxcx", "abc", "xixxjxk", ""]
]
solver.challenge(ex3_5_input)

### Experiment 4: Analogies involving abstract examples of "successorship"

In [None]:
""" 
Generalizing from letter-successor to abstract number successor 
Expected answer: j y y q q q q 
Original response:
j y y r r r
j y y q q r 2
j y y q q q
j y y r r r
j y y q r
"""
ex4_1_input = [
    ["abc", "abd", "pqr", "pqs"],
    ["abc", "abd", "ijklm", "ijkln"],
    ["abc", "abd", "rstuvw", "rstuvx"],
    ["abc", "abd", "jyyqqq", ""],
]
solver.challenge(ex4_1_input)

In [None]:
""" 
Abstract numerical sequence 
Expected answer: b o o c c c v v v v
Original response:
b o o c c v v v v v v
b o o c c v v v v v v v v v v v v v
b o o c v v v
b o b o c c c v v v v
b o o c c c v v v v
"""
ex4_2_input = [
    ["qlg", "qllggg", "xmr", "xmmrrr"],
    ["qlg", "qllggg", "rmqd", "rmmqqqdddd"],
    ["qlg", "qllggg", "bocv", ""]
]
solver.challenge(ex4_2_input)

In [None]:
""" 
Replacing a substring with its successor 
Expected answer: s s t s t u v 
Original response:
s s t s t u v (each trial)
"""
ex4_3_input = [
    ["abc", "abd", "aababc", "aababcd"],
    ["abc", "abd", "ppqpqr", "ppqpqrs"],
    ["abc", "abd", "sststu", ""],
]
solver.challenge(ex4_3_input)

In [None]:
""" 
Generalizing the above to different-length target strings 
Expected answer: e e f e f g e f g h i
Original response:
Not shown, but it got 4/5 correct.
"""
ex4_4_input = [
    ["abc", "abd", "aababc", "aababcd"],
    ["abc", "abd", "ppqpqr", "ppqpqrs"],
    ["abc", "abd", "eefefgefgh", ""],
]
solver.challenge(ex4_4_input)

### Experiment 5: A letter with no successor

In [None]:
""" 
A letter with no successor 
Expected answer: x y a 
Original results:
x y a
x y w
x y b
x z y
x z b
"""
ex5_1_input = [
    ["abc", "abd", "pqr", "pqs"],
    ["abc", "abd", "ijk", "ijl"],
    ["abc", "abd", "xyz", ""],
]
solver.challenge(ex5_1_input)

### Bonus: Follow-up
One prompt from the follow-up article

In [None]:
""" 
Reversing a string 
Expected answer: v l q r y
Original results:
l q r y v
r l y q v
l y r q v
r y l v q
"""
ex6_1_input = [
    ["mxq", "qxm", "pabm", "mbap"],
    ["mxq", "qxm", "yrqlv", ""],
]
solver.challenge(ex6_1_input)