# Multi-shot GPT-4o completions for Edwardian prompts

We select passages from Edwardian books and ask GPT-4o to continue in the same style. In this version of the notebook, however, we provide a much richer multi-shot prompt to give the model more stylistic examples to work with.

In [25]:
from glob import glob
import pandas as pd
import textwrap
import random

import backoff
import openai

from nltk.tokenize import sent_tokenize

In [11]:
edward = pd.read_csv('edwardian_double_segments.tsv', sep='\t')
edward.shape

(6961, 3)

In [12]:
edward.head()

Unnamed: 0,source,date,segment
0,mdp.39015008736798,1912,": : '-: •. '- VNLEY PAUL A. CO . / -- SKX s""'b..."
1,mdp.39015008736798,1912,"collection has been often praised, and it has,..."
2,mdp.39015008736798,1912,"8vo, Boston, USA. 1907), edited by Mr. Q. Pier..."
3,mdp.39015008736798,1912,the Soci6t6 de Histoire 1 This Collection is k...
4,mdp.39015008736798,1912,THE ACTOR 38 IV. THE ADMIUEH OF SHAKESPEARE .....


In [13]:
# How many sources are there?
edward['source'].nunique()

31

In [14]:
# Split the data into two sets: one to use for multi-shot prompting and one to use
# for the final completion. We'll split by the source, so that we know that the
# model hasn't seen any of the sources in the final completion.

# First create a list of sources
sources = edward['source'].unique()
# Then shuffle the sources
random.seed(42)
random.shuffle(sources)
# Now split the sources into two lists. We'll use the first 10 for the
# multi-shot prompting and the rest for the final completion.

num_sources_multi = 10
sources_multi = sources[:num_sources_multi]
sources_final = sources[num_sources_multi:]

for_multi = edward[edward['source'].isin(sources_multi)]
for_final = edward[edward['source'].isin(sources_final)]


In [15]:
# What is the average length of a segment in edward, measured in words?
edward['length'] = edward['segment'].apply(lambda x: len(x.split()))
edward['length'].describe()

count    6961.000000
mean      287.129148
std        19.645036
min       103.000000
25%       281.000000
50%       292.000000
75%       300.000000
max       305.000000
Name: length, dtype: float64

In [16]:
# pretty-print the first five segments in edward

for i in range(5):
    print(f"Segment {i+1}:\n")
    print(textwrap.fill(edward.iloc[i]['segment'], width=70))
    print("\n")


Segment 1:

: : '-: •. '- VNLEY PAUL A. CO . / -- SKX s"'b»r; rf ^rtiAN./, v, .<
A COSMOPOLITAN ACTOR DAVID GARRIGK AND HIS FRENCH FRIENDS BY FRANK A.
HEDGGOGK DOCTEUR S LETTERS, PARIS ; LECTURER IN FRENCH LITERATURE IN
THE UNIVERSITY OF BIRMINGHAM WITH PHOTOGRAVURE FRONTISPIECE AND
SIXTEEN ILLUSTRATIONS IN HALF tone FROM PICTURES, ENGRAVINGS, ETC., OF
THE PERIOD LONDON STANLEY PAUL & GO 31 ESSEX STREET, STRAND, W.C.
PRINTED BT llA/KLI, , WATSON AND VINRY, LD., LONDON AKD ATLE8BURT. • J
DAVID GARRICK, our great English actor, enjoyed fa European
reputation. In France especially he had almost as many discerning
admirers and fer + vent friends as in his own country, and with them
he remained in relation for many years. This aspect of his career,
little studied so far by his biographers, forms the principal object
of this essay, the composition of which may further be justified by a
short criticism of our sources of information. Two contemporaries of
Garrick have told the story of his lif

In [17]:
def print_wrapped_text(text, width=70):
    texts = text.split('\n')
    if len(texts) > 1:
        for t in texts:
            print_wrapped_text(t, width=70)

    else:
        text = texts[0]
        wrapper = textwrap.TextWrapper(width=width)
        wrapped_text = wrapper.fill(text)
        print(wrapped_text)
    
with open('credentials.txt', encoding = 'utf-8') as f:
    organization = f.readline().strip()
    api_key = f.readline().strip()
    
client = openai.OpenAI(organization=organization, api_key=api_key)

### The function that actually calls the API

We surround this with ```backoff``` instructions to avoid errors.

In [33]:
@backoff.on_exception(
    backoff.expo,
    openai.RateLimitError,
    max_time=60,  # Set a maximum wait time in seconds (adjust as needed)
    giveup=lambda e: False  # This prevents giving up on retries
)
def completions_with_backoff(**kwargs):
    global client
    try:
        return client.chat.completions.create(**kwargs)
    except openai.APIError as e:
        print(f"Error: {e}")
        raise  # Re-raise the error to trigger the retry mechanism

In [26]:
def make_prompt_pairs(df):
    '''Randomly select 40 rows of df, and split the 
    'segment' field of each row into sentences. Take the
    first half of the sentences as the user prompt and the
    second half as the assistant response. Return a list of
    tuples, where each tuple contains the user prompt and
    the assistant response.'''
    # Randomly select 40 rows of df
    selection = df.sample(n=40, random_state=42)
    # Split the 'segment' field into sentences

    all_pairs = []

    for idx, row in selection.iterrows():
        # Split the segment into sentences
        sentences = sent_tokenize(row['segment'])
        # Take the first half of the sentences as the user prompt
        # and the second half as the assistant response
        mid = len(sentences) // 2
        user_prompt = ' '.join(sentences[:mid])
        assistant_response = ' '.join(sentences[mid:])
        all_pairs.append((user_prompt, assistant_response))
    
    return all_pairs
    

In [27]:
pairs = make_prompt_pairs(for_multi)

# print the first five pairs

for i in range(5):
    print(f"Pair {i+1}:\n")
    print("User prompt:")
    print_wrapped_text(pairs[i][0])
    print("\nAssistant response:")
    print_wrapped_text(pairs[i][1])
    print("\n")

Pair 1:

User prompt:
She touched him with her finger tips, she kissed his throat, his
wrists, the palms of his hands, his eyelids, his hair. Strange, subtle
kisses, unlike the kisses o women. And often, between her purrings,
she murmured love words in some strange fierce language of her own,
brushing his ears and his eyes with her lips the while. And through it
all Paul slept on, the Eastern perfume in the air still drugging his
sense. It was quite dark when he awoke again, and beside him — seated
on the floor, all propped with pillows, his lady reclined her head
against his 121  shoulder. rand as he looked down at her in the
firelight's flickering gleam, he saw that her wonderful eyes were wet
with great glittering tears. "My soul, my soul!" he said tenderly, his
heart wrung with emotion. "What is it, sweetheart — why have you these
tears?

Assistant response:
Oh! what have I done — darling, my own?" "I am weary, " she said, and
fell to weeping softly, and refused to be comforted. Pa

In [35]:
def submit_prompt(system_prompt, edwardian_prompt_pairs, to_complete, temperature):

    prompt = [{"role": "system", "content": system_prompt}]

    for i, pair in enumerate(edwardian_prompt_pairs):
        prompt.append({"role": "user", "content": pair[0]})
        prompt.append({"role": "assistant", "content": pair[1]})
        if i > 19:
            break
    
    prompt.append({"role": "user", "content": to_complete})
    
    p = list(prompt)
    # print(p)
    try:
        completion = completions_with_backoff(
            model = "gpt-4-turbo",
            messages = p,
            max_tokens = 220,
            temperature = temperature
        )
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        
    return completion

In [29]:
def get_system_prompt():
    system_prompt = "Your task is to complete passages from early twentieth-century books, in close to the same style. \
You will be given a passage from a book published in the period 1905-1914. Continue this passage in the same style, \
for at least 140 words. Match diction as closely as possible and avoid modern anachronisms. \
Start immediately by continuing the passage; do not make any framing remarks, like 'here is the continuation:'."

    return system_prompt

In [30]:
system_prompt = get_system_prompt()
print_wrapped_text(system_prompt, width=70)

Your task is to complete passages from early twentieth-century books,
in close to the same style. You will be given a passage from a book
published in the period 1905-1914. Continue this passage in the same
style, for at least 140 words. Match diction as closely as possible
and avoid modern anachronisms. Start immediately by continuing the
passage; do not make any framing remarks, like 'here is the
continuation:'.


In [28]:
# Create a list of prompts to complete by selecting 2000 rows of for_final
# and splitting the 'segment' field into sentences. Take the first half of the
# sentences as the user prompt.

prompts = []
for_final = for_final.sample(n=2000, random_state=42)
for idx, row in for_final.iterrows():
    # Split the segment into sentences
    sentences = sent_tokenize(row['segment'])
    # Take the first half of the sentences as the user prompt
    # and the second half as the assistant response
    mid = len(sentences) // 2
    user_prompt = ' '.join(sentences[:mid])
    prompts.append(user_prompt)

print(len(prompts))

2000


In [36]:
# this is a comment

continuations = []
printnext = True
ctr = 0

for edwardian_prompt in prompts:
    ctr += 1

    temperature = round(random.uniform(0.5, 0.7), 3)

    system_prompt = get_system_prompt()
    prompt_pairs = make_prompt_pairs(for_multi)

    if printnext:
        print_wrapped_text(edwardian_prompt)
        print()

    try:
        completion = submit_prompt(system_prompt, prompt_pairs, edwardian_prompt, temperature)
    except Exception as e:
        print(f"Error: {e}")
        continue

    continuation = completion.choices[0].message.content
    continuation = continuation.replace('\n', ' ').replace('\t', ' ').replace('  ', ' ')
    responselen = len(continuation.split())   # token approximation

    if responselen < 140:
        print(f"Response too short: {responselen} tokens")
        continuations.append('')
        continue

    continuations.append(continuation)
    if printnext:
        print_wrapped_text(continuation)
        print('-------------------\n')

    if ctr % 10 == 5:
        print(f"Completed {ctr} prompts\n")
        printnext = True
        # sofar = prompts[:ctr].copy()
        # sofar['continuation'] = continuations
        # sofar.to_csv('new_GPT4o_continuations.tsv', sep='\t', index=False)
    else:
        printnext = False



It is, therefore, entirely misleading to think that we really perceive
that which is offered to our senses. We perceive of it just as much as
we are prepared to perceive, and our preparation depends upon our
general conceptions which control our modes of motor behaviour. We
perceive just what we are seeking. Ever so many adults move in the
midst of nature and do not see anything of the differences of the
flowers or of the birds, and their interest is not in the least
stirred up by the physical and chemical phenomena which surround them.
Their education and their life work has not trained them in reacting
on those differences, and that upon which they are not reacting does
not exist for them.

Thus, perception is not merely a passive reception of data; it is an
active process shaped by our preconceptions, our knowledge, and our
attentiveness. The mind, rather than being a blank slate upon which
the world imprints itself, is more akin to a lens, filtering and
interpreting sensory informa

In [37]:
print(len(prompts), len(continuations))

2000 2000


In [42]:
df = pd.DataFrame({'prompt': prompts, 'text': continuations, 'label': for_final['date'][0:2000].values})
df['prompt'] = df['prompt'].str.replace('\n', ' ').replace('\t', ' ').replace('  ', ' ')
df['text'] = df['text'].str.replace('\n', ' ').replace('\t', ' ').replace('  ', ' ')
df = df.dropna()
df.shape

(2000, 3)

In [43]:
# drop rows with empty continuations
df = df[df['text'] != '']
df = df[df['text'] != ' ']
print(df.shape)

(1910, 3)


In [44]:
df.to_csv('20shot_GPT4o_continuations.tsv', sep='\t', index=False)