# Use GPT models for annotating social orientation tags
**Notebook outline:**
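1. Load a sample text message conversation (`M01000G9A_social_orientation.csv`)
2. Pass this conversation to several OpenAI models and compare their predictions
3. Check whether GPT-4 can recover speaker identities when they are hidden
4. Estimate the cost of annotating the full corpus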

In [94]:
%load_ext autoreload
%autoreload 2
import math
import os
import openai
import pandas as pd
import tiktoken

import utils



In [95]:
utils.set_api_key(user='yanda')

In [96]:
openai.organization = os.getenv("OPENAI_ORG_ID")
openai.api_key = os.getenv("OPENAI_API_KEY")

In [97]:
models = openai.Model.list()

gpt_models = [x for x in models['data'] if 'gpt' in x['id']]

In [98]:
gpt_models[:1]

[<Model model id=gpt-3.5-turbo at 0x7f5f53571850> JSON: {
   "created": 1677610602,
   "id": "gpt-3.5-turbo",
   "object": "model",
   "owned_by": "openai",
   "parent": null,
   "permission": [
     {
       "allow_create_engine": false,
       "allow_fine_tuning": false,
       "allow_logprobs": true,
       "allow_sampling": true,
       "allow_search_indices": false,
       "allow_view": true,
       "created": 1683329687,
       "group": null,
       "id": "modelperm-t1KTzS4psIS6bJ1GTGGkPzDi",
       "is_blocking": false,
       "object": "model_permission",
       "organization": "*"
     }
   ],
   "root": "gpt-3.5-turbo"
 }]

In [228]:
# prices in USD per 1K tokens
gpt_4_model = 'gpt-4'
gpt_4_prompt_price = 0.03
gpt_4_completion_price = 0.06
gpt_3_model = 'gpt-3.5-turbo'
gpt_3_price = 0.002
davinci_model = 'davinci'
davinci_price = 0.0200

## Load data

In [100]:
sample_df = pd.read_csv('M01000G9A_social_orientation.csv')

In [134]:
# number utterances starting from 1
sample_df['Utterance ID'] = range(1, len(sample_df) + 1)

In [135]:
sample_df.head()

Unnamed: 0.1,Unnamed: 0,Utterance ID,Participant,Time,Original Text,chat_gpt_labels,chat_gpt_explanations,gpt_3.5_labels,gpt_3.5_explanations,gpt_4_labels,gpt_4_explanations,timestamp,impact_scalar,comment,Complete Line,Complete Line (Unknown Speaker)
0,0,1,A,2014-09-25 15:13:04 UTC,起床了吗,Unassuming-Ingenuous,This utterance does not provide enough informa...,Unassuming-Ingenuous,The speaker is asking a simple question in a s...,Aloof-Introverted,"Speaker A asks if Speaker B is awake, which is...",,,,Speaker 1 (1): 起床了吗,Speaker unknown (1): 起床了吗
1,1,2,B,2014-09-25 15:14:19 UTC,在干活呢 咋了,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is responding to the question in a...,Warm-Agreeable,Speaker B responds that they are working and a...,,,,Speaker 2 (2): 在干活呢 咋了,Speaker unknown (2): 在干活呢 咋了
2,2,3,A,2014-09-25 15:36:32 UTC,没啥jiuwenwen,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is responding to the previous stat...,Unassuming-Ingenuous,Speaker A says there's nothing much going on.,,,,Speaker 1 (3): 没啥jiuwenwen,Speaker unknown (3): 没啥jiuwenwen
3,3,4,A,2014-09-25 15:36:32 UTC,喔喔对哦,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is acknowledging the previous stat...,Warm-Agreeable,Speaker A acknowledges Speaker B's response.,,,,Speaker 1 (4): 喔喔对哦,Speaker unknown (4): 喔喔对哦
4,4,5,B,2014-09-25 15:36:52 UTC,好,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is responding to the previous stat...,Unassuming-Ingenuous,"Speaker B simply responds with ""好"" (okay).",,,,Speaker 2 (5): 好,Speaker unknown (5): 好


In [136]:
def create_line(row, include_speaker=True):
    # TODO: generalize this to more than two speakers
    speaker_map = {'A': '1', 'B': '2'}
    if include_speaker:
        # TODO: could optionally include the time
        return f"Speaker {speaker_map[row['Participant']]} ({row['Utterance ID']}):  {row['Original Text']}"
    else:
        return f"Speaker unknown ({row['Utterance ID']}):  {row['Original Text']}"

In [137]:
sample_df['Complete Line'] = sample_df.apply(create_line, axis=1)
# create another column where the speaker is unknown
sample_df['Complete Line (Unknown Speaker)'] = sample_df.apply(lambda x: create_line(x, include_speaker=False), axis=1)
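
The TODO in `create_line` could be addressed along the following lines. This is only a sketch: `build_speaker_map` and `create_line_general` are hypothetical names that do not exist in the notebook or in `utils`, and the third participant `C` in the example is invented purely to show a conversation with more than two speakers. Speakers are numbered in order of first appearance.

```python
def build_speaker_map(participants):
    """Map each participant ID to a 1-based speaker number (as a string),
    in order of first appearance."""
    speaker_map = {}
    for participant in participants:
        if participant not in speaker_map:
            speaker_map[participant] = str(len(speaker_map) + 1)
    return speaker_map


def create_line_general(row, speaker_map=None):
    """Format one utterance; pass speaker_map=None to hide the speaker."""
    if speaker_map is not None:
        speaker = speaker_map[row['Participant']]
    else:
        speaker = 'unknown'
    return f"Speaker {speaker} ({row['Utterance ID']}): {row['Original Text']}"


# Illustrative rows; participant 'C' is hypothetical.
rows = [
    {'Participant': 'A', 'Utterance ID': 1, 'Original Text': '起床了吗'},
    {'Participant': 'B', 'Utterance ID': 2, 'Original Text': '在干活呢 咋了'},
    {'Participant': 'C', 'Utterance ID': 3, 'Original Text': '好'},
]
speaker_map = build_speaker_map(row['Participant'] for row in rows)
lines = [create_line_general(row, speaker_map) for row in rows]
print('\n'.join(lines))
```

With a DataFrame, the same map could be built from `sample_df['Participant']` and passed through `sample_df.apply(..., axis=1)`.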

In [138]:
sample_df.head()

Unnamed: 0.1,Unnamed: 0,Utterance ID,Participant,Time,Original Text,chat_gpt_labels,chat_gpt_explanations,gpt_3.5_labels,gpt_3.5_explanations,gpt_4_labels,gpt_4_explanations,timestamp,impact_scalar,comment,Complete Line,Complete Line (Unknown Speaker)
0,0,1,A,2014-09-25 15:13:04 UTC,起床了吗,Unassuming-Ingenuous,This utterance does not provide enough informa...,Unassuming-Ingenuous,The speaker is asking a simple question in a s...,Aloof-Introverted,"Speaker A asks if Speaker B is awake, which is...",,,,Speaker 1 (1): 起床了吗,Speaker unknown (1): 起床了吗
1,1,2,B,2014-09-25 15:14:19 UTC,在干活呢 咋了,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is responding to the question in a...,Warm-Agreeable,Speaker B responds that they are working and a...,,,,Speaker 2 (2): 在干活呢 咋了,Speaker unknown (2): 在干活呢 咋了
2,2,3,A,2014-09-25 15:36:32 UTC,没啥jiuwenwen,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is responding to the previous stat...,Unassuming-Ingenuous,Speaker A says there's nothing much going on.,,,,Speaker 1 (3): 没啥jiuwenwen,Speaker unknown (3): 没啥jiuwenwen
3,3,4,A,2014-09-25 15:36:32 UTC,喔喔对哦,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is acknowledging the previous stat...,Warm-Agreeable,Speaker A acknowledges Speaker B's response.,,,,Speaker 1 (4): 喔喔对哦,Speaker unknown (4): 喔喔对哦
4,4,5,B,2014-09-25 15:36:52 UTC,好,Unassuming-Ingenuous,The speaker is casual and does not exhibit any...,Unassuming-Ingenuous,The speaker is responding to the previous stat...,Unassuming-Ingenuous,"Speaker B simply responds with ""好"" (okay).",,,,Speaker 2 (5): 好,Speaker unknown (5): 好


## Prepare prompt for OpenAI API

In [139]:
# load prompt to prepend:
with open('prompt.txt', 'r') as f:
    prompt = f.read()

# load addendum
with open('prompt_speaker_unknown.txt', 'r') as f:
    prompt_speaker_unknown = f.read()

In [140]:
# remove the last two lines of the prompt and add the speaker unknown prompt
prompt_speaker_unknown = '\n'.join(prompt.split('\n')[:-2]) + '\n' + prompt_speaker_unknown

In [155]:
model_input = prompt + '\n'.join(sample_df['Complete Line'].tolist()) + '\n\nOutput:\n'
model_input_speaker_unknown = prompt_speaker_unknown + '\n'.join(sample_df['Complete Line (Unknown Speaker)'].tolist()) + '\n\nOutput:\n'

In [156]:
# create messages for the chat conversation (speaker IDs included)
messages = [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": model_input},
]
# same conversation, but with speaker IDs hidden
messages_speaker_unknown = [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": model_input_speaker_unknown},
]
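
Since the two message lists differ only in the prompt and the conversation lines, the construction could be factored into one helper. The function name `build_messages` is an assumption made for this sketch and is not part of the notebook. It mirrors the string layout used above: prompt, utterances joined by newlines, then the `Output:` cue.

```python
def build_messages(prompt, lines, system_prompt="You are a helpful assistant."):
    """Assemble chat messages: prompt + utterances + 'Output:' cue."""
    user_content = prompt + '\n'.join(lines) + '\n\nOutput:\n'
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


# Tiny illustration with a placeholder prompt.
example_messages = build_messages(
    'PROMPT\n',
    ['Speaker 1 (1): 起床了吗', 'Speaker 2 (2): 在干活呢 咋了'],
)
print(example_messages[1]['content'])
```

In the notebook this would be called once with `prompt` and `sample_df['Complete Line']`, and once with `prompt_speaker_unknown` and `sample_df['Complete Line (Unknown Speaker)']`.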

In [163]:
token_count = utils.num_tokens_from_messages(messages, model=gpt_4_model)
print(token_count)

2488


In [164]:
token_count_speaker_unknown = utils.num_tokens_from_messages(messages_speaker_unknown, model=gpt_4_model)
print(token_count_speaker_unknown)

3173


In [165]:
gpt4_model_limit = 8_192
generation_capacity = gpt4_model_limit - token_count
# should be plenty
print(generation_capacity)

5704
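
The capacity check above uses only the speaker-known prompt. A small sketch that applies the same subtraction to both prompts is shown below. The helper names are hypothetical, and the 2,000-token completion allowance is an assumed figure chosen for illustration, not a measured one.

```python
GPT4_CONTEXT_LIMIT = 8_192  # same value as gpt4_model_limit above


def generation_capacity(prompt_tokens, context_limit=GPT4_CONTEXT_LIMIT):
    """Tokens left for the completion after the prompt is counted."""
    return context_limit - prompt_tokens


def fits(prompt_tokens, expected_completion_tokens, context_limit=GPT4_CONTEXT_LIMIT):
    """True if the expected completion fits in the remaining context."""
    return expected_completion_tokens <= generation_capacity(prompt_tokens, context_limit)


# Prompt sizes printed above: 2488 (speaker known) and 3173 (speaker unknown).
for name, prompt_tokens in [('speaker known', 2488), ('speaker unknown', 3173)]:
    capacity = generation_capacity(prompt_tokens)
    print(f'{name}: {capacity} tokens left, fits 2000-token completion: {fits(prompt_tokens, 2000)}')
```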


## Send requests to GPT models
- temperature = 0.2 for all models
- leave top_p at default (1.0)

In [166]:
temperature = 0.2
top_p = 1.0

### Try sending a request to GPT-3.5 (gpt-3.5-turbo)

In [167]:
result = openai.ChatCompletion.create(
  model=gpt_3_model,
  messages=messages,
  temperature=temperature,
  top_p=top_p,
)

In [168]:
print(result['usage'])

{
  "completion_tokens": 1447,
  "prompt_tokens": 2490,
  "total_tokens": 3937
}


In [170]:
# add this into sample_df
anno_string = result['choices'][0]['message']['content']
gpt_3_annotations = anno_string.split('\n')

In [171]:
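# split only on the first ': ' and the first ' - ' so that punctuation inside
# an explanation does not produce extra fields; skip blank lines
labels_explanations = [x.split(': ', 1)[1].split(' - ', 1) for x in gpt_3_annotations if x.strip()]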
labels, explanations = zip(*labels_explanations)

In [174]:
sample_df['gpt_3.5_labels'] = labels
sample_df['gpt_3.5_explanations'] = explanations
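
String splitting works when every output line follows the expected `Speaker N (ID): Label - Explanation` pattern, but it raises an error on blank or truncated lines. A more defensive parser could look like the sketch below. The regular expression and the function `parse_annotations` are assumptions written for this example. They are not taken from `utils`, and the pattern only covers the output format seen in this notebook.

```python
import re

# Speaker <id> (<utterance id>): <label> - <explanation>
ANNOTATION_RE = re.compile(
    r'^Speaker\s+(?P<speaker>\S+)\s+\((?P<utterance_id>\d+)\):\s*'
    r'(?P<label>.+?)\s+-\s+(?P<explanation>.*)$'
)


def parse_annotations(text):
    """Return (parsed, skipped): parsed is a list of dicts, skipped holds
    the non-empty lines that did not match the expected pattern."""
    parsed, skipped = [], []
    for line in text.split('\n'):
        line = line.strip()
        if not line:
            continue
        match = ANNOTATION_RE.match(line)
        if match is None:
            skipped.append(line)
            continue
        record = match.groupdict()
        record['utterance_id'] = int(record['utterance_id'])
        parsed.append(record)
    return parsed, skipped


sample_output = (
    'Speaker 1 (1): Warm-Agreeable - The speaker is asking if the other person is awake.\n'
    '\n'
    'Speaker 2 (5): Unassuming-Ingenuous - The speaker responds with a simple "好" (okay).\n'
    'Speaker 1 (10): Warm-Agre'
)
parsed, skipped = parse_annotations(sample_output)
print(len(parsed), 'parsed;', len(skipped), 'skipped')
```

Keeping the utterance ID makes it possible to join the annotations back onto `sample_df` by `Utterance ID` rather than relying on row order, which matters when a line is missing.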

### GPT-4

In [175]:
result = openai.ChatCompletion.create(
  model=gpt_4_model,
  messages=messages,
  temperature=temperature,
  top_p=top_p,
)

In [176]:
print(result['usage'])

{
  "completion_tokens": 1563,
  "prompt_tokens": 2488,
  "total_tokens": 4051
}


In [177]:
print(result['choices'][0]['message']['content'])

Speaker 1 (1): Warm-Agreeable - The speaker is asking if the other person is awake, showing concern for their well-being.
Speaker 2 (2): Unassured-Submissive - The speaker responds with a simple statement about what they are doing and asks what's up.
Speaker 1 (3): Unassuming-Ingenuous - The speaker says there's nothing important to discuss.
Speaker 1 (4): Unassured-Submissive - The speaker acknowledges the other person's situation.
Speaker 2 (5): Unassuming-Ingenuous - The speaker responds with a simple "好" (okay).
Speaker 2 (6): Warm-Agreeable - The speaker mentions they will ask about the afternoon plans after finishing their work.
Speaker 1 (7): Warm-Agreeable - The speaker shares a personal experience about playing squash and their own attitude.
Speaker 2 (8): Warm-Agreeable - The speaker asks what the other person said about the situation.
Speaker 1 (9): Warm-Agreeable - The speaker shares the other person's opinion about their attitude towards mistakes.
Speaker 1 (10): Warm-Agre

In [178]:
# add this into sample_df
anno_string = result['choices'][0]['message']['content']
gpt_4_annotations = anno_string.split('\n')
# split only on the first ': ' and the first ' - '; skip blank lines
labels_explanations = [x.split(': ', 1)[1].split(' - ', 1) for x in gpt_4_annotations if x.strip()]
labels, explanations = zip(*labels_explanations)

In [180]:
sample_df['gpt_4_labels'] = labels
sample_df['gpt_4_explanations'] = explanations

In [181]:
# model agreement rate
(sample_df['gpt_3.5_labels'] == sample_df['gpt_4_labels']).sum() / len(sample_df)

0.2647058823529412

In [188]:
sample_df[['Original Text', 'gpt_3.5_labels', 'gpt_4_labels']].head()

Unnamed: 0,Original Text,gpt_3.5_labels,gpt_4_labels
0,起床了吗,Aloof-Introverted,Warm-Agreeable
1,在干活呢 咋了,Unassuming-Ingenuous,Unassured-Submissive
2,没啥jiuwenwen,Unassured-Submissive,Unassuming-Ingenuous
3,喔喔对哦,Aloof-Introverted,Unassured-Submissive
4,好,Unassuming-Ingenuous,Unassuming-Ingenuous
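
To see where the two models disagree, not just how often, the label pairs can be tallied. The sketch below uses only the five rows displayed above, so its agreement figure applies to that excerpt and not to the full conversation. The function names are hypothetical.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of positions where the two label sequences match."""
    if len(labels_a) != len(labels_b):
        raise ValueError('label sequences must have the same length')
    if not labels_a:
        raise ValueError('label sequences must not be empty')
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def disagreement_pairs(labels_a, labels_b):
    """Count each (label_a, label_b) pair where the models differ."""
    return Counter((a, b) for a, b in zip(labels_a, labels_b) if a != b)


# First five rows of the table above.
gpt_35 = ['Aloof-Introverted', 'Unassuming-Ingenuous', 'Unassured-Submissive',
          'Aloof-Introverted', 'Unassuming-Ingenuous']
gpt_4 = ['Warm-Agreeable', 'Unassured-Submissive', 'Unassuming-Ingenuous',
         'Unassured-Submissive', 'Unassuming-Ingenuous']

print(agreement_rate(gpt_35, gpt_4))
for pair, count in disagreement_pairs(gpt_35, gpt_4).most_common():
    print(pair, count)
```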


### Try a completion-based model


In [182]:
models = openai.Model.list()

davinci_models = [x for x in models['data'] if 'davinci' in x['id']]

In [183]:
davinci_models[:1]

[<Model model id=davinci at 0x7f5f4fa78590> JSON: {
   "created": 1649359874,
   "id": "davinci",
   "object": "model",
   "owned_by": "openai",
   "parent": null,
   "permission": [
     {
       "allow_create_engine": false,
       "allow_fine_tuning": false,
       "allow_logprobs": true,
       "allow_sampling": true,
       "allow_search_indices": false,
       "allow_view": true,
       "created": 1669066355,
       "group": null,
       "id": "modelperm-U6ZwlyAd0LyMk4rcMdz33Yc3",
       "is_blocking": false,
       "object": "model_permission",
       "organization": "*"
     }
   ],
   "root": "davinci"
 }]

In [184]:
encoding = tiktoken.encoding_for_model('text-davinci-003')

In [185]:
token_len = len(encoding.encode(model_input))

In [186]:
max_tokens = 4000 - token_len - 10

In [189]:
# try text-davinci-003
result = openai.Completion.create(
  model="text-davinci-003",
  prompt=model_input,
  max_tokens=max_tokens,
  temperature=0
)

In [199]:
# not enough tokens: the max_tokens budget runs out before every utterance is annotated
print(result['choices'][0]['text'][50:])

asking a simple question without revealing much about themselves.
Speaker 2 (2): Assured-Dominant - The speaker is being assertive and taking charge of the conversation.
Speaker 1 (3): Unassuming-Ingenuous - The speaker is being honest and straightforward.
Speaker 1 (4): Unassured-Submissive - The speaker is responding with a simple "喔喔对哦" which indicates a lack of confidence or assertiveness.
Speaker 2 (5): Unassured-Submissive - The speaker responds with a simple "好" which indicates a lack of confidence or assertiveness.
Speaker 2 (6): Assured-Dominant - The speaker is taking charge of the conversation and making plans.
Speaker 1 (7): Cold - The speaker expresses a lack of understanding and shows no sympathy for the situation.
Speaker 2 (8): Unassuming-Ingenuous - The speaker is asking a straightforward question.
Speaker 1 (9): Unassuming-Ingenuous - The speaker is being honest and straightforward.
Speaker 1 (10): Cold - The speaker expresses a lack of understanding and shows no symp
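
One possible workaround for the token shortfall is to split the conversation into chunks and annotate each chunk in its own request. The sketch below is an assumption about how that might be done, not something this notebook implements. `chunk_lines` is a hypothetical helper, and the token counter is passed in as a function so that `len` can stand in for a real tokenizer in the toy example.

```python
def chunk_lines(lines, max_tokens_per_chunk, count_tokens):
    """Greedily group consecutive lines so each chunk stays within the budget."""
    chunks, current, current_tokens = [], [], 0
    for line in lines:
        n = count_tokens(line)
        if n > max_tokens_per_chunk:
            raise ValueError(f'a single line needs {n} tokens, over the budget')
        # start a new chunk when adding this line would exceed the budget
        if current and current_tokens + n > max_tokens_per_chunk:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n
    if current:
        chunks.append(current)
    return chunks


# Toy example: character count stands in for a token count.
toy_lines = ['aaaa', 'bb', 'cccc', 'd']
print(chunk_lines(toy_lines, max_tokens_per_chunk=6, count_tokens=len))
```

In the notebook, `count_tokens` could be something like `lambda s: len(encoding.encode(s))`. Note that chunking removes earlier turns from the model's view, which may change the labels it assigns.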

## Evaluate ability to detect the speaker ID

In [207]:
result = openai.ChatCompletion.create(
  model=gpt_4_model,
  messages=messages_speaker_unknown,
  temperature=temperature,
  top_p=top_p,
)

In [209]:
# add this into sample_df
anno_string = result['choices'][0]['message']['content']
gpt_4_annotations = anno_string.split('\n')

In [210]:
speakers = [x.split(' ')[1] for x in gpt_4_annotations]

In [213]:
# split only on the first ': ' and the first ' - '; skip blank lines
labels_explanations = [x.split(': ', 1)[1].split(' - ', 1) for x in gpt_4_annotations if x.strip()]
labels, explanations = zip(*labels_explanations)

In [216]:
sample_df['Speaker'] = sample_df['Participant'].apply(lambda x: {'A': '1', 'B': '2'}[x])



In [217]:
sample_df['gpt_4_speakers'] = speakers

In [218]:
# evaluate accuracy of speaker identification
(sample_df['Speaker'] == sample_df['gpt_4_speakers']).sum() / len(sample_df)

0.6617647058823529
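
When speaker IDs are hidden, the model has no way to know which participant the notebook calls "1" and which "2", so a consistent but swapped numbering would be scored as wrong by the exact-match comparison above. One way to check for this, sketched under that assumption, is to score every relabeling of the predicted IDs and keep the best. `best_permutation_accuracy` is a hypothetical helper, and the toy lists below are invented for illustration.

```python
from itertools import permutations


def best_permutation_accuracy(true_speakers, predicted_speakers):
    """Return (accuracy, mapping) for the relabeling of predicted IDs
    that matches the true IDs best. Practical only for a few speakers,
    since the number of relabelings grows factorially."""
    if len(true_speakers) != len(predicted_speakers):
        raise ValueError('inputs must have the same length')
    if not true_speakers:
        raise ValueError('inputs must not be empty')
    ids = sorted(set(true_speakers) | set(predicted_speakers))
    best_accuracy, best_mapping = -1.0, {}
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))  # predicted ID -> relabeled ID
        hits = sum(mapping[p] == t for t, p in zip(true_speakers, predicted_speakers))
        accuracy = hits / len(true_speakers)
        if accuracy > best_accuracy:
            best_accuracy, best_mapping = accuracy, mapping
    return best_accuracy, best_mapping


# Toy data: predictions are mostly right once '1' and '2' are swapped.
true_ids = ['1', '2', '1', '1', '2']
pred_ids = ['2', '1', '2', '2', '2']
accuracy, mapping = best_permutation_accuracy(true_ids, pred_ids)
print(accuracy, mapping)
```

For a two-speaker conversation in which every prediction is either "1" or "2", the swapped score is simply one minus the raw score.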

In [219]:
sample_df['gpt_4_labels_speaker_unknown'] = labels
sample_df['gpt_4_explanations_speaker_unknown'] = explanations

In [220]:
# determine if this changes the way it labels social orientation tags
# prediction quality might take a hit without speaker information
(sample_df['gpt_4_labels'] == sample_df['gpt_4_labels_speaker_unknown']).sum() / len(sample_df)

0.5735294117647058

In [221]:
sample_df[['Original Text', 'gpt_4_labels', 'gpt_4_labels_speaker_unknown']]

Unnamed: 0,Original Text,gpt_4_labels,gpt_4_labels_speaker_unknown
0,起床了吗,Warm-Agreeable,Unassured-Submissive
1,在干活呢 咋了,Unassured-Submissive,Warm-Agreeable
2,没啥jiuwenwen,Unassuming-Ingenuous,Unassured-Submissive
3,喔喔对哦,Unassured-Submissive,Unassured-Submissive
4,好,Unassuming-Ingenuous,Unassured-Submissive
...,...,...,...
63,不想吃两顿,Unassured-Submissive,Unassured-Submissive
64,好,Unassuming-Ingenuous,Unassured-Submissive
65,我发现我们楼三楼有poster board的。。z,Warm-Agreeable,Unassuming-Ingenuous
66,各种广告,Warm-Agreeable,Unassured-Submissive


### Save sample_df to disk

In [223]:
# take an explicit copy so later column assignments do not trigger SettingWithCopyWarning
sample_df = sample_df[['Utterance ID', 'Participant', 'Time', 'Original Text',
       'chat_gpt_labels', 'chat_gpt_explanations', 'gpt_3.5_labels',
       'gpt_3.5_explanations', 'gpt_4_labels', 'gpt_4_explanations',
       'gpt_4_labels_speaker_unknown', 'gpt_4_explanations_speaker_unknown',
       'timestamp', 'impact_scalar', 'comment']].copy()

In [224]:
sample_df.to_csv('M01000G9A_social_orientation.csv', index=False)

## Estimate cost

In [226]:
# prompt length + typical conversation length + typical completion length
print(result['usage'])

{
  "completion_tokens": 1752,
  "prompt_tokens": 3173,
  "total_tokens": 4925
}


In [227]:
# tally lengths
prompt_length = result['usage']['prompt_tokens']
completion_length = result['usage']['completion_tokens']

In [229]:
# break cost down
prompt_cost = gpt_4_prompt_price * (prompt_length / 1000)
completion_cost = gpt_4_completion_price * (completion_length / 1000)

In [231]:
total_cost = prompt_cost + completion_cost
print(f'Approx. cost per annotated conversation: {total_cost}')

Approx. cost per annotated conversation: 0.20031


In [232]:
# we have ~10,000 conversations
total_cost * 10_000

2003.1

In [237]:
# ~600 text conversations
total_cost * 600

120.18599999999999
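
The cost arithmetic above can be wrapped in a single function so that other token counts or prices are easy to try. This is a minimal sketch: `estimate_cost` is a hypothetical name, the default prices are the GPT-4 figures recorded earlier in this notebook, and the corpus projections assume every conversation resembles this one sample.

```python
GPT_4_PROMPT_PRICE = 0.03      # USD per 1K prompt tokens, as set above
GPT_4_COMPLETION_PRICE = 0.06  # USD per 1K completion tokens, as set above


def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price=GPT_4_PROMPT_PRICE,
                  completion_price=GPT_4_COMPLETION_PRICE):
    """Cost in USD of one request, given per-1K-token prices."""
    prompt_cost = prompt_price * (prompt_tokens / 1000)
    completion_cost = completion_price * (completion_tokens / 1000)
    return prompt_cost + completion_cost


# Token usage of the speaker-unknown GPT-4 request printed above.
per_conversation = estimate_cost(prompt_tokens=3173, completion_tokens=1752)
print(f'per conversation: ${per_conversation:.5f}')
for n_conversations in (600, 10_000):
    print(f'{n_conversations} conversations: ${per_conversation * n_conversations:,.2f}')
```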

In [239]:
print(model_input_speaker_unknown)

Circumplex theory is a social psychology based theory that characterizes social interactions between speakers. The social orientation tagset includes: {Assured-Dominant, Gregarious-Extraverted, Warm-Agreeable, Unassuming-Ingenuous, Unassured-Submissive, Aloof-Introverted, Cold, Arrogant-Calculating}, which are defined below in more detail.

Assured-Dominant - Demands to be the center of interest, demands attention, does most of the talking, speaks loudly, is firm, is self-confident, is forceful, is ambitious, is assertive, is persistent, is domineering, not self-conscious

Gregarious-Extraverted - Feels comfortable around people, starts conversations, talks to a lot of different people, loves large groups, is friendly, is enthusiastic, is warm, is extraverted, is good-natured, is cheerful / happy, is pleasant, is outgoing, is approachable, is not shy, is "lively"

Warm-Agreeable - is interested in people, reassures others, inquires about others' well-being, gets along well with others,