# ChatGPT API, Prompt Engineering, Processing Batch Request
## The Association for Computational Linguistics
## WASSA 2023 Shared Task on Empathy Emotion and Personality Detection in Interactions
More details [here](https://codalab.lisn.upsaclay.fr/competitions/11167#learn_the_details)

In [14]:
import openai
import os
import re
import backoff          # used by get_response() to retry on rate-limit errors
import numpy as np
import pandas as pd
from typing import List
from sklearn.metrics import classification_report, multilabel_confusion_matrix

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 400)

# to see all env variables:
#for name, value in os.environ.items():
#    print("{0}: {1}".format(name, value))

# Prompts

In [2]:
# categories are listed in order of decreasing frequency
prompt_one   = '''Act as a very accurate zero-shot text classifier and classify the provided text into one most relevant category from the following list of categories: Sadness, Neutral, Anger, Disgust, Hope, Fear, Surprise, Joy. Classify the following text with the most relevant category from the above list and output ONLY the category and nothing else: "{}"'''
prompt_multi = '''Act as a very accurate zero-shot text classifier and classify the text provided below into one most relevant category from the following list of categories: Sadness, Neutral, Anger, Disgust, Hope, Fear, Surprise, Joy. Add another category from the above list if and only if the second category is also relevant to the text provided below. If two categories are relevant, sort them lexicographically and concatenate them using the forward slash. Using these instructions, classify the following text and output ONLY the category or categories and nothing else: "{}"'''
s = 'This is a text sample'
print(prompt_one.format(s), '\n')
print(prompt_multi.format(s))

Act as a very accurate zero-shot text classifier and classify the provided text into one most relevant category from the following list of categories: Sadness, Neutral, Anger, Disgust, Hope, Fear, Surprise, Joy. Classify the following text with the most relevant category from the above list and output ONLY the category and nothing else: "This is a text sample" 

Act as a very accurate zero-shot text classifier and classify the text provided below into one most relevant category from the following list of categories: Sadness, Neutral, Anger, Disgust, Hope, Fear, Surprise, Joy. Add another category from the above list if and only if the second category is also relevant to the text provided below. If two categories are relevant, sort them lexicographically and concatenate them using the forward slash. Using these instructions, classify the following text and output ONLY the category or categories and nothing else: "This is a text sample"
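prompt_multi asks for either one category or two categories that are sorted lexicographically and joined by a forward slash. The sketch below is a minimal, strict check of that format. `parse_multi_response` and `PROMPT_CATEGORIES` are hypothetical names and are not part of the notebook. A `None` result could serve as the cue to send followup2.

```python
import re
from typing import List, Optional

# Hypothetical constant: the eight categories named in prompt_multi
PROMPT_CATEGORIES = ['Sadness', 'Neutral', 'Anger', 'Disgust', 'Hope', 'Fear', 'Surprise', 'Joy']

# One word, optionally followed by "/" and a second word, and nothing else
_MULTI_PATTERN = re.compile(r'^([A-Za-z]+)(?:/([A-Za-z]+))?$')


def parse_multi_response(response: str) -> Optional[List[str]]:
    '''Return the categories if response follows prompt_multi's output format, else None.'''
    match = _MULTI_PATTERN.match(response.strip().rstrip('.'))
    if match is None:
        return None                                   # extra words, spaces or other separators
    found = [g.capitalize() for g in match.groups() if g is not None]
    if any(c not in PROMPT_CATEGORIES for c in found):
        return None                                   # a word that is not one of the eight categories
    if len(found) == 2 and (found[0] == found[1] or found != sorted(found)):
        return None                                   # duplicate, or not sorted lexicographically
    return found
```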


In [3]:
# Follow-up Q1 can improve the response. If the response has multiple words, first parse it and try to find
# the category in it. Only if this doesn't work, send follow-up Q2. ChatGPT may offer a second category in response
# to Q1, but it can change its mind again and offer a third category if asked Q2
followup1 = 'Are you sure about that?'
followup2 = 'Output only the category and nothing else'
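The comment above describes a two-step strategy: parse the reply first, and send follow-up Q2 only if parsing fails. The sketch below is a minimal version of that strategy. It assumes a hypothetical `ask(messages) -> str` callable that wraps the API call (in this notebook, something like `lambda m: get_response(model, m)`), which lets the logic be tested without network access. `extract_category`, `classify_with_followup` and `KNOWN_CATEGORIES` are illustrative names. The default follow-up text matches followup2. Unlike the notebook's verify_label, this sketch keeps only the first category it finds.

```python
import re
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]

# Hypothetical: lowercase versions of the eight categories used in the prompts
KNOWN_CATEGORIES = {'sadness', 'neutral', 'anger', 'disgust', 'hope', 'fear', 'surprise', 'joy'}


def extract_category(reply: str) -> Optional[str]:
    '''Return the first known category word in reply, capitalized, or None.'''
    for word in re.findall(r'[A-Za-z]+', reply):
        if word.lower() in KNOWN_CATEGORIES:
            return word.capitalize()
    return None


def classify_with_followup(ask: Callable[[List[Message]], str], prompt: str,
                           followup: str = 'Output only the category and nothing else') -> Optional[str]:
    '''Parse the first reply; send the follow-up question only if parsing fails.'''
    messages: List[Message] = [{'role': 'user', 'content': prompt}]
    reply = ask(messages)
    label = extract_category(reply)          # step 1: look for a category in the (possibly long) reply
    if label is not None:
        return label
    # step 2: keep the conversation history and ask the follow-up question
    messages += [{'role': 'assistant', 'content': reply},
                 {'role': 'user', 'content': followup}]
    return extract_category(ask(messages))
```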

# Load data

In [4]:
file1    = 'data/df_train.pkl'
df_train = pd.read_pickle(file1)

file2    = 'data/df_dev.pkl'
df_dev   = pd.read_pickle(file2)

print(df_train.shape, df_dev.shape)



(792, 30) (208, 30)


In [5]:
cols = [ 'essay_clean_spellchecked', 'emotion', ]
with pd.option_context('display.max_colwidth', None):
    display(df_dev[cols].head(25))

| | essay_clean_spellchecked | emotion |
|---|---|---|
| 0 | How sad is it that this kind of pain and suffering, and those kind of living conditions still exist today? what a gap we have in society between developed countries and those that aren't. It's crazy to drive around the US and see all the money people spend on pointless things, and then to think about how the people in Haiti are living. | [Sadness] |
| 1 | The article is kind of tragic and hits close to home as I am the son of Haitian immigrants. Haiti has a lot of problems that only become exaggerated during natural disasters. I think what the Haitian people really need from the international community is help developing infrastructure so they can address these issues themselves. Foreign aid only acts as a band aid. | [Sadness] |
| 2 | I think that these kinds of stories, are sad, yet inspirational and leave you with kind of a good feeling. Even though his story is sad, it's cool and inspiring/motivational to see that he rose up against his circumstances. That he worked hard to make something of himself and he succeeded in what he wanted to do. | [Sadness] |
| 3 | It's crazy that random accidents like this happen everyday. I am not a baseball fan but of course enjoy a baseball game every now and again. I lived and worked in Miami too so I am vaguely familiar with that baseball player who unfortunately passed away. The effort to save him was great but unfortunately bad things seem to happen every day. He was so young too so it makes it worse. | [Neutral] |
| 4 | This story makes me so so sad.... As someone who also grew up in the system, I can strongly relate. It's sad that America has not figured out a better and more safe system to handle kid's without parents or with parents who are unfit. A lot of the times, the system is no better, or even worse than the situation kids were in before, and I think this story is a good example of that. | [Sadness] |
| 5 | After reading the article, my first reaction and feeling is that i feel really bad for the brothers. I feel like people their age should not have to be locked inside a jail cell. They should be out in the world improving themselves and being normal people. It's also really sad for the family members of these brothers as well because they are probably all suffering and worrying. | [Sadness] |
| 6 | I didn't know coal mining had such adverse effects on the surrounding environment. It has basically ruined the lives of the people who live nearby these mines. And the animal populations too, imagine a heard of elephants not able to sustain themselves with the food available and needing to invade human territory...They must really be in a desperate situation. | [Neutral] |
| 7 | This is very sad. I can't imagine having elephants come stampede my house in the middle of the night. What a terrible and sad situation, and these poor people can't even do anything about it. Someone needs to stop the deforestation and stop polluting the air these people breathe, it is not right that they are doing and all for the sake of turning a profit. | [Sadness] |
| 8 | Guys, reading this article really hits home for me. If you or someone you know is having suicidal thoughts, please get help from the available sources. Suicide is no joke and it is a shame when someone does not get the help they need. I've struggled with this for a few years now but I got the help I needed. This woman was not as fortunate. | [Sadness] |
| 9 | Hey guys. So I just read this article about Iraqi Christians being persecuted by Muslims in Iraq. I don't understand why people of different religious backgrounds can't get along there. I'm sure it is a cultural thing but it is such unnecessary violence and conflict. It hurts both sides and I wish there was a way we could get them to set aside their differences. But not military action. We don't need another war. | [Neutral] |


# Sample request

In [None]:
openai.api_key = os.getenv("OPENAI_API_KEY")
model  = 'gpt-3.5-turbo'
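labels_set   = {'sadness', 'neutral', 'anger', 'disgust', 'hope', 'fear', 'surprise', 'joy'}   # lowercase categories used by verify_label()
clean        = re.compile(r'[^a-zA-Z ]+')   # any run of characters other than letters and spaces
multi_spaces = re.compile(r' +')            # runs of spaces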

In [None]:
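def verify_label(label_):
    '''
       Verify if label_ contains any of the categories from the predefined set of labels;
       return the matching categories, capitalized, sorted and joined by '/', or None
    '''
    label_ = clean.sub(' ', label_)                                     # drop punctuation and digits
    label_ = multi_spaces.sub(' ', label_).lower().split()
    res    = sorted({i.capitalize() for i in label_ if i in labels_set})  # unique, deterministic order
    return '/'.join(res) if res else None                                 # capitalized to match label2key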

In [None]:
def verify_num_tokens(model, messages):
    '''Check that there are enough tokens left for a ChatGPT response'''
    num_tokens_tiktoken = num_tokens_from_messages(messages, model)
    if num_tokens_tiktoken > 3950:
        print(f'Number of tokens is {num_tokens_tiktoken} which exceeds 3950')
        print(f"TEXT: {messages[-1]['content']}\n")      # the user message that is too long
        return False
    else:
        return True


@backoff.on_exception(backoff.expo, openai.error.RateLimitError, max_time=10)
def get_response(model, messages, temperature=0, max_tokens=None):
    '''Send request, return response'''
    response  = openai.ChatCompletion.create(
        model = model,
        messages = messages,
        temperature = temperature,        # range(0,2): higher = more random, lower = more focused / deterministic
        top_p = 1,                        # top probability mass, e.g. 0.1 = only tokens from top 10% proba mass
        n = 1,                            # number of chat completions
        #max_tokens = max_tokens,          # tokens to return
        stream = False,        
        stop=None,                        # sequence to stop generation (new line, end of text, etc.)
        )
    content = response['choices'][0]['message']['content'].strip()
    #num_tokens_api = response['usage']['prompt_tokens']
    return content
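get_response relies on the `backoff` package to retry when it hits `openai.error.RateLimitError`. To show roughly what exponential backoff with jitter does, here is a hand-rolled sketch. `retry_with_backoff` is a hypothetical helper, and the `backoff` library's exact wait schedule may differ. The `sleep` and `clock` arguments can be swapped out so the function can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def retry_with_backoff(func: Callable[[], T],
                       retry_on: Tuple[Type[BaseException], ...],
                       max_time: float = 10.0,
                       base: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> T:
    '''Call func(), retrying on retry_on exceptions with exponentially growing random waits.'''
    start = clock()
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            # "full jitter": wait a random time in [0, base**attempt], i.e. up to 1, 2, 4, ... seconds
            wait = random.uniform(0, base ** attempt)
            if clock() - start + wait > max_time:
                raise                      # give up: the total time budget would be exceeded
            sleep(wait)
            attempt += 1
```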

In [None]:
def translate_text(text_, prompt_):
    '''Translate text_ using prompt_ and ChatGPT API'''    
        
    # compose messages and check num_tokens
    messages = [            
            { "role": "system", "content": "You are an accurate translator from Roman Urdu.", },
            { "role": "user", "content": prompt_.format(text_), },
            ]
    if not verify_num_tokens(model, messages): return None
    return get_response(model, messages)

In [181]:
def classify_text(text_, prompt_):
    '''
       Classify text_ using prompt_ and ChatGPT API;
       return (label, num_tokens), where num_tokens is the tiktoken count of the last request's messages
    '''

    # compose messages and check num_tokens
    messages = [
            { "role": "system", "content": "You are a very accurate text classifier.", },
            { "role": "user", "content": prompt_.format(text_), },
            ]
    if not verify_num_tokens(model, messages): return None, None
    label_    = get_response(model, messages)
    old_label = label_
    label_    = verify_label(label_)        # get just the category if response is too long

    # if no category is found in the response, extend the chat with followup1
    if label_ is None:
        messages += [
            { "role": "assistant", "content": old_label, },
            { "role": "user", "content": followup1, }
            ]
        label_    = get_response(model, messages)
        old_label = label_
        label_    = verify_label(label_)        # get just the category if response is too long

    num_tokens = num_tokens_from_messages(messages, model)
    return (label_ if label_ is not None else old_label), num_tokens

In [9]:
idx = 19
text, groundtruth_labels = df_dev[['essay_clean_spellchecked', 'emotion']].values[idx]
label, tokens = classify_text(text, prompt_one)

#print(prompt_one.format( text ))
print(f"\nGROUNDTRUTH LABEL:\n{'/'.join( groundtruth_labels )}")
print(f"\nPREDICTED LABEL:\n{label}")
print(f'\nTOTAL TOKENS: {tokens}')


GROUNDTRUTH LABEL:
Neutral

PREDICTED LABEL:
Sadness

TOTAL TOKENS: 185


# Batch Request

In [10]:
# target variables
label2key = {   
    'Anger':    0,
    'Disgust':  1,
    'Fear':     2,
    'Hope':     3,    
    'Joy':      4,
    'Neutral':  5,
    'Sadness':  6,
    'Surprise': 7,
}
key2label = {v: k for k,v in label2key.items()}
print(key2label)

{0: 'Anger', 1: 'Disgust', 2: 'Fear', 3: 'Hope', 4: 'Joy', 5: 'Neutral', 6: 'Sadness', 7: 'Surprise'}


In [15]:
def get_target(emotions: List[str])->List[int]:
    '''
        Convert list of strings with categories into list of 0s and 1s with length 8 because there are 8 categories;
        1 in the i-th position means that this essay belongs to the i-th category as in key2label[i]
    '''
    res  = [0]*8
    idxs = [label2key[e] for e in emotions]    
    for idx in idxs:
        res[idx] = 1
    return res
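get_target builds a multi-hot vector by hand. For comparison, scikit-learn's `MultiLabelBinarizer` produces the same encoding when you give it the classes in label2key's order. The sketch below repeats the class list so that it runs on its own.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Same order as label2key: column i corresponds to key2label[i]
classes = ['Anger', 'Disgust', 'Fear', 'Hope', 'Joy', 'Neutral', 'Sadness', 'Surprise']
mlb = MultiLabelBinarizer(classes=classes)

# One list of categories per essay, e.g. the result of pred.split('/')
encoded = mlb.fit_transform([['Sadness'], ['Anger', 'Fear']])
print(encoded.tolist())
print(mlb.inverse_transform(encoded))
```

`inverse_transform` maps each row back to a tuple of category names, which can be handy when inspecting misclassified essays.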

In [16]:
df_dev['pred_all'] = df_dev['essay_clean_spellchecked'].apply( lambda x: classify_text(x, prompt_one) )

In [20]:
df_dev = df_dev.rename( columns={'pred': 'pred_all'} )   # no-op unless the predictions were stored in a 'pred' column

In [21]:
df_dev['pred'] = df_dev['pred_all'].apply( lambda x: x[0] )

In [22]:
df_dev['pred'].value_counts()

Sadness     119
Disgust      21
Hope         20
Fear         18
Anger        17
Neutral      12
Surprise      1
Name: pred, dtype: int64

In [23]:
df_dev['pred_encoded'] = df_dev['pred'].apply( lambda x: get_target(x.split('/')) )

In [25]:
y_dev_encoded      = np.array( df_dev['target_encoded'].values.tolist() )
y_dev_pred_encoded = np.array( df_dev['pred_encoded'].values.tolist() )
labels = list(label2key.keys())
print( classification_report( y_dev_encoded, y_dev_pred_encoded, target_names=labels, digits=4 ) )

              precision    recall  f1-score   support

       Anger     0.6471    0.2895    0.4000        38
     Disgust     0.4286    0.3750    0.4000        24
        Fear     0.2222    0.5000    0.3077         8
        Hope     0.3000    0.3750    0.3333        16
         Joy     0.0000    0.0000    0.0000         2
     Neutral     0.7500    0.1667    0.2727        54
     Sadness     0.7479    0.8812    0.8091       101
    Surprise     0.0000    0.0000    0.0000         3

   micro avg     0.6154    0.5203    0.5639       246
   macro avg     0.3870    0.3234    0.3154       246
weighted avg     0.6402    0.5203    0.5246       246
 samples avg     0.6154    0.5385    0.5641       246





## This is predicting just one label!
Things to try next:
* Predict 1 label, but use followup1 = 'Are you sure about that?' and parse the longer response
* Try predicting 2 labels
* Try providing examples of categories: how should the examples be chosen, and what would the prompt be?
* Play with the prompt?
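To try the idea of providing examples of categories, you can pass the examples as earlier user/assistant turns, because assistant messages may contain examples of the desired behavior. The sketch below uses a hypothetical `build_few_shot_messages` helper. One way to avoid leaking dev labels is to draw the examples from df_train only and evaluate on df_dev.

```python
from typing import Dict, List, Sequence, Tuple


def build_few_shot_messages(system: str, prompt: str, text: str,
                            examples: Sequence[Tuple[str, str]]) -> List[Dict[str, str]]:
    '''Compose: system message, one user/assistant pair per example, then the real query.'''
    messages = [{'role': 'system', 'content': system}]
    for example_text, example_label in examples:
        messages.append({'role': 'user', 'content': prompt.format(example_text)})
        messages.append({'role': 'assistant', 'content': example_label})   # the desired answer
    messages.append({'role': 'user', 'content': prompt.format(text)})
    return messages
```

In the notebook, each assistant turn could be an essay's `emotion` list joined by `'/'`. Every extra example also adds to the prompt's token count, so verify_num_tokens still applies.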

# REFERENCE

Source: https://platform.openai.com/docs/guides/chat/introduction  
* The main parameter: __messages__: array of message objects, where each object has a role ("system", "user", or "assistant") and content. Conversations can be as short as 1 message or fill many pages. If a conversation can't __fit within the model’s token limit__, it must be shortened in some way.
* Limit for gpt-3.5-turbo - 4,096 tokens
* Conversation: __system message__ + __ALTERNATING user and assistant messages__.
* __System message__ sets the assistant's behavior (note: gpt-3.5-turbo does not always pay strong attention to system messages).
* __User messages__ instruct the assistant (written either by the app's end user or by the developer as instructions).
* __Assistant messages__ store prior responses (conversation history) OR contain examples of desired behavior. The models have no memory of past requests, so all relevant information must be supplied via the conversation.
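Because the model has no memory and the whole conversation must fit within the token limit, long chats need trimming. The sketch below assumes a hypothetical `trim_history` helper and a `count_tokens(messages)` callable. In this notebook, that callable could be `lambda m: num_tokens_from_messages(m, model)`. The helper keeps the system message and drops the oldest other messages first. A variant could drop user/assistant pairs together to preserve strict alternation.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message],
                 count_tokens: Callable[[List[Message]], int],
                 budget: int) -> List[Message]:
    '''Return a copy of messages with the oldest non-system messages dropped until it fits budget.'''
    trimmed = list(messages)                      # never modify the caller's list
    while count_tokens(trimmed) > budget:
        # position of the oldest message that is not the system message
        drop = next((i for i, m in enumerate(trimmed) if m['role'] != 'system'), None)
        if drop is None or drop == len(trimmed) - 1:
            # only the system message and/or the latest message are left
            raise ValueError('cannot fit the conversation within the token budget')
        del trimmed[drop]
    return trimmed
```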

In [None]:
# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

Parameters for openai.ChatCompletion.create():

__Required__: 
* model / str - model ID.
* messages / array - see above

__Optional__
* `temperature` / number - default 1, range(0,2). Higher values like 0.8 make the output more random, lower values like 0.2 make it more focused and deterministic. Alter this or `top_p`, but not both.
* `top_p` / number - default 1, nucleus sampling = model considers the results of the tokens with top_p probability mass (0.1 = considering only tokens comprising the top 10% probability mass).
* `n` / int - default 1: how many chat completions to generate.
* `stream` / bool - default false, if set, partial message deltas will be sent as data-only server-sent events as they become available; the stream is terminated by a `data: [DONE]` message. More [here](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb)
* `stop` / str or array - default null, up to 4 sequences where the API will stop generating further tokens (new line, end of text `<|endoftext|>`, etc.).
* `max_tokens` / int / Optional - default inf, max number of tokens in chat completion.
* `presence_penalty` / number - default 0, range(-2.0, 2.0), increase likelihood of new topics in chat completion by using positive values.
* `frequency_penalty` / number - default 0, range(-2.0, 2.0), positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeating the same line verbatim.
* `logit_bias` / map {token ID (from tokenizer): value in range(-100, 100)} - default null, modifies the likelihood of specified tokens appearing in the completion. Math: this bias is added to the logits prior to sampling. Values between -1 and 1 slightly decrease or increase the likelihood of selection; values like -100 or 100 result in a ban or exclusive selection of the relevant token.
* `user` / string - end-user id, helps OpenAI monitor and detect abuse
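You can check the ranges listed above before sending a request. The sketch below uses a hypothetical `check_chat_params` helper that encodes only the limits quoted in this list. The API performs its own validation, and the "alter temperature or top_p, not both" advice is reported here as a problem even though it is only a recommendation.

```python
from typing import Any, Dict, List

# Numeric ranges quoted in the parameter list above
RANGES = {'temperature': (0.0, 2.0), 'presence_penalty': (-2.0, 2.0), 'frequency_penalty': (-2.0, 2.0)}


def check_chat_params(params: Dict[str, Any]) -> List[str]:
    '''Return human-readable problems with the optional parameters (empty list = OK).'''
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f'{name}={params[name]} is outside [{low}, {high}]')
    # recommendation: change temperature or top_p, but not both (both default to 1)
    if params.get('temperature', 1) != 1 and params.get('top_p', 1) != 1:
        problems.append('both temperature and top_p are altered')
    stop = params.get('stop')
    if isinstance(stop, list) and len(stop) > 4:
        problems.append('stop accepts at most 4 sequences')
    for token_id, bias in (params.get('logit_bias') or {}).items():
        if not -100 <= bias <= 100:
            problems.append(f'logit_bias for token {token_id} is outside [-100, 100]')
    return problems
```

For example, the settings used in get_response (`temperature=0, top_p=1`) pass without problems.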

In [None]:
# API response
{
 'id': 'chatcmpl-6p9XYPYSTTRi0xEviKjjilqrWU2Ve',
 'object': 'chat.completion',
 'created': 1677649420,
 'model': 'gpt-3.5-turbo',
 'usage': {'prompt_tokens': 56, 'completion_tokens': 31, 'total_tokens': 87},
 'choices': [
   {
    'message': {
      'role': 'assistant',
      'content': 'The 2020 World Series was played in Arlington, Texas at the Globe Life Field, which was the new home stadium for the Texas Rangers.'},
    'finish_reason': 'stop',
    'index': 0
   }
  ]
}

In Python, the assistant’s reply can be extracted with `response['choices'][0]['message']['content']`  
Every response has a finish_reason with these possible values:
* `stop`: API returned complete model output
* `length`: Incomplete model output due to max_tokens parameter or token limit
* `content_filter`: Omitted content due to a flag from OpenAI's content filters
* `null`: API response still in progress or incomplete
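Given the finish_reason values above, it helps to check each reply before using it. The sketch below uses a hypothetical `extract_reply` helper that works on a response shaped like the one shown above. The test uses a plain dict. The `OpenAIObject` returned by openai 0.27 is a dict subclass, so `.get` should work on it too.

```python
import warnings
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    '''Return the assistant's reply and warn when finish_reason is not "stop".'''
    choice = response['choices'][0]
    reason = choice.get('finish_reason')
    if reason == 'length':
        warnings.warn('reply is incomplete: max_tokens or the token limit was reached')
    elif reason == 'content_filter':
        warnings.warn('content was omitted due to a content filter flag')
    elif reason is None:
        warnings.warn('response is still in progress or incomplete')
    return choice['message']['content'].strip()
```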

NOTE: leave enough tokens for the response. Very long conversations may receive incomplete replies: if a conversation is 4,090 tokens long with model='gpt-3.5-turbo', the reply will be cut off after just 6 tokens.
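The arithmetic behind this note is the context limit minus the prompt length. The sketch below uses a hypothetical `max_reply_tokens` helper and the 4,096-token limit quoted above for gpt-3.5-turbo.

```python
CONTEXT_LIMIT = 4096  # gpt-3.5-turbo limit quoted in the reference notes


def max_reply_tokens(prompt_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    '''Tokens left for the reply once the prompt is counted (never negative).'''
    return max(context_limit - prompt_tokens, 0)


print(max_reply_tokens(4090))  # the 4,090-token conversation from the note
print(max_reply_tokens(3950))  # the threshold used by verify_num_tokens
```

4096 - 4090 leaves 6 tokens. The 3950 threshold in verify_num_tokens leaves 146 tokens for the reply, which is plenty for a one-word category.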

In [5]:
# list models
openai.Model.list()

<OpenAIObject list at 0x7fcbd46feed0> JSON: {
  "data": [
    {
      "created": 1649358449,
      "id": "babbage",
      "object": "model",
      "owned_by": "openai",
      "parent": null,
      "permission": [
        {
          "allow_create_engine": false,
          "allow_fine_tuning": false,
          "allow_logprobs": true,
          "allow_sampling": true,
          "allow_search_indices": false,
          "allow_view": true,
          "created": 1669085501,
          "group": null,
          "id": "modelperm-49FUp5v084tBB49tC4z8LPH5",
          "is_blocking": false,
          "object": "model_permission",
          "organization": "*"
        }
      ],
      "root": "babbage"
    },
    {
      "created": 1649359874,
      "id": "davinci",
      "object": "model",
      "owned_by": "openai",
      "parent": null,
      "permission": [
        {
          "allow_create_engine": false,
          "allow_fine_tuning": false,
          "allow_logprobs": true,
          "allow_sa

In [None]:
# classify text using DaVinci model - verified, it works
import os
import openai

# Read the key from the environment instead of overwriting the key set above;
# replace 'your_api_key' only if OPENAI_API_KEY is not set
openai.api_key = os.getenv("OPENAI_API_KEY", "your_api_key")

# renamed so that it doesn't shadow classify_text() defined above
def classify_text_davinci(text, categories):
    completions = openai.Completion.create(
        engine="text-davinci-002",  # You can change the engine if you prefer another one
        prompt=(
            f"Given the text: '{text}', "
            f"please classify it into one of the following categories: {', '.join(categories)}.\n"
            f"Text: '{text}'\nCategory:"
        ),
        max_tokens=50,
        n=1,
        stop=None,
        temperature=0.5,
    )

    message = completions.choices[0].text.strip()
    return message


# Example usage (new variable names so the notebook's `text` and `labels` are not overwritten)
sample_text = "A blog post about the benefits of adopting a plant-based diet"
sample_categories = ["Health", "Technology", "Environment", "Entertainment"]

category = classify_text_davinci(sample_text, sample_categories)
print(f"Category: {category}")

In [7]:
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        #print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo":
        #print("Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301")
    elif model == "gpt-4":
        #print("Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.")
        return num_tokens_from_messages(messages, model="gpt-4-0314")
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif model == "gpt-4-0314":
        tokens_per_message = 3
        tokens_per_name = 1
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens