## Using GPT-3 to classify labels (yp2201)

### Description:
- This notebook uses the GPT-3 API to classify comments against toxicity labels.
- The data comes from r/wallstreetbets, and the classification focuses on 6 labels.
- Labels (6): TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT
- GPT-3 can be found at https://beta.openai.com/

In [1]:
# Make sure openai is installed. If not, uncomment the line below and run it.
# pip install openai

In [45]:
# loading dataset
import numpy as np
import pandas as pd

# change this to the path of your copy of the csv file
data = pd.read_csv('/Users/yoontaepark/nlu-reddit-toxicity-dataset/data/labelled_master_data_2022-05-11.csv')
print('shape of dataset:', data.shape)
data.head(3)

shape of dataset: (800, 11)


Unnamed: 0.1,Unnamed: 0,example_id,preceding_comment,comment_for_evaluation,following_comment,toxicity,severe_toxicity,identity_attack,insult,profanity,threat
0,6,275362,Which one of u bought my stocks? >:(,posted this in the other thread but the more i...,That thing is 1.64% rn. JPow has to say somet...,0.0,0.0,0.0,0.0,1.0,0.0
1,12,30108,Alexa play down by 311.,No one cares about 10 year bonds bro - thats j...,Sorry you were born the bad kind of retarded.,0.0,0.0,0.0,0.0,1.0,0.0
2,21,316461,Today = no Vaseline,The entire market is fucking hemorrhaging,Imagine putting your money in bonds lol,0.0,0.0,0.0,0.0,1.0,0.0


In [46]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Unnamed: 0              800 non-null    int64  
 1   example_id              800 non-null    int64  
 2   preceding_comment       800 non-null    object 
 3   comment_for_evaluation  800 non-null    object 
 4   following_comment       800 non-null    object 
 5   toxicity                800 non-null    float64
 6   severe_toxicity         800 non-null    float64
 7   identity_attack         800 non-null    float64
 8   insult                  800 non-null    float64
 9   profanity               800 non-null    float64
 10  threat                  800 non-null    float64
dtypes: float64(6), int64(2), object(3)
memory usage: 68.9+ KB


In [47]:
data.drop(['Unnamed: 0'], axis=1, inplace=True)

In [48]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   example_id              800 non-null    int64  
 1   preceding_comment       800 non-null    object 
 2   comment_for_evaluation  800 non-null    object 
 3   following_comment       800 non-null    object 
 4   toxicity                800 non-null    float64
 5   severe_toxicity         800 non-null    float64
 6   identity_attack         800 non-null    float64
 7   insult                  800 non-null    float64
 8   profanity               800 non-null    float64
 9   threat                  800 non-null    float64
dtypes: float64(6), int64(1), object(3)
memory usage: 62.6+ KB


In [49]:
# print one of the comments to be evaluated
data['comment_for_evaluation'][120]

'Disney is fucked... Jesus'

In [50]:
comments_for_eval = data['comment_for_evaluation']

### Prompt setting
- prompt = instruction + label + definition + comment + result
- instruction/label/definition/result are fixed, while the comment changes for each request
- Trials of other prompts are listed in the appendix (see below)
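
The composition rule above can be sketched as a small helper. This is only an illustration: `build_prompt` is a hypothetical name, and the section strings are shortened placeholders rather than the full strings this notebook defines.

```python
# Hypothetical helper illustrating:
#   prompt = instruction + label + definition + comment + result
# The section texts are shortened placeholders, not the notebook's full strings.
INSTRUCTION = 'instructions:\nEvaluate the comment on the labels provided.\n\n'
LABEL = 'labels provided:\nTOXICITY, PROFANITY\n\n'
DEFINITION = 'definitions of each labels:\nTOXICITY: ...\nPROFANITY: ...\n\n'
RESULT = 'result:\n'


def build_prompt(comment: str) -> str:
    """Place the only varying part (the comment) between the fixed sections."""
    comment_block = 'comment for evaluation:\n' + comment + '\n\n'
    return INSTRUCTION + LABEL + DEFINITION + comment_block + RESULT


prompt = build_prompt('Disney is down again')
print(prompt)
```

Ending the prompt with `result:` leaves the completion model to fill in only the six values.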

In [51]:
# defining prompt
instruction = 'instructions:\n\
Please evaluate comment for evaluation on the labels provided. You can refer to the definitions of each labels below. Answer as format " , , , , , ".\n\n'

label = 'labels provided:\n\
TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT\n\n\
label values:\n\
1 - Matches the Definition\n\
0 - Does not match the Definition\n\n'

definition = 'definitions of each labels:\n\
TOXICITY: A rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion.\n\
SEVERE_TOXICITY: A very hateful, aggressive, disrespectful comment or otherwise very likely to make a user leave a discussion or give up on sharing their perspective. This attribute is much less sensitive to more mild forms of toxicity, such as comments that include positive uses of curse words.\n\
IDENTITY_ATTACK: Negative or hateful comments targeting someone because of their identity.\n\
INSULT: Insulting, inflammatory, or negative comment towards a person or a group of people.\n\
PROFANITY: Swear words, curse words, or other obscene or profane language.\n\
THREAT: Describes an intention to inflict pain, injury, or violence against an individual or group.\n\n'

result = 'result:\n'

In [54]:
import os
import openai

# replace with your api_key (should start with 'sk-', see https://beta.openai.com/account/api-keys)
# never hard-code or commit a real key; 'sk-//' is a placeholder
openai.api_key = 'sk-//'

# using text-davinci-002 as the engine, as it performed best among the available engines
# the prompt is rebuilt for each comment
# the parameters below were used for the baseline result

res = []

for each_comment in comments_for_eval:
    comment_full_sentence = 'comment for evaluation:\n' + each_comment + '\n\n'
    prompt_wsb = instruction + label + definition + comment_full_sentence + result
    response = openai.Completion.create(
      engine="text-davinci-002",
      prompt=prompt_wsb,
      temperature=0,
      max_tokens=60,
      top_p=1.0,
      frequency_penalty=0.0,
      presence_penalty=0.0
    )
    
    res.append(response)

In [55]:
# print results
for i in range(len(res)):
    print(res[i]['choices'][0]['text'].split('\n'))

['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 0, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 0, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 0, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 0, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 0, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 0, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1, 1, 0']
['0, 0, 0, 1,

In [56]:
res[120]

<OpenAIObject text_completion id=cmpl-577iTdYS4CoKMjgXcGGDKLlvlR56e at 0x7fce680e0090> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "0, 0, 0, 1, 1, 0"
    }
  ],
  "created": 1652379641,
  "id": "cmpl-577iTdYS4CoKMjgXcGGDKLlvlR56e",
  "model": "text-davinci:002",
  "object": "text_completion"
}

In [57]:
gpt3_toxic, gpt3_sev_toxic, gpt3_identity, gpt3_insult, gpt3_profanity, gpt3_threat = [], [], [], [], [], []

for each_res in res:
    # each completion text looks like '0, 0, 0, 1, 1, 0': split once, then convert the six values
    values = [int(v) for v in each_res['choices'][0]['text'].split(', ')[:6]]
    toxic_res, sev_toxic_res, identity_res, insult_res, profanity_res, threat_res = values

    gpt3_toxic.append(toxic_res)
    gpt3_sev_toxic.append(sev_toxic_res)
    gpt3_identity.append(identity_res)
    gpt3_insult.append(insult_res)
    gpt3_profanity.append(profanity_res)
    gpt3_threat.append(threat_res)
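
The parsing step above assumes every completion is exactly six comma-separated 0/1 values. A more defensive variant could look like the sketch below; `parse_labels` and `LABELS` are hypothetical names, not part of this notebook, and returning `None` for malformed text is just one possible policy.

```python
from typing import List, Optional

# Label order assumed to match the order requested in the prompt.
LABELS = ['toxicity', 'severe_toxicity', 'identity_attack',
          'insult', 'profanity', 'threat']


def parse_labels(text: str) -> Optional[List[int]]:
    """Return six 0/1 ints parsed from a completion, or None if malformed."""
    parts = [p.strip() for p in text.strip().split(',')]
    if len(parts) != len(LABELS) or any(p not in ('0', '1') for p in parts):
        return None
    return [int(p) for p in parts]


print(parse_labels('0, 0, 0, 1, 1, 0'))
print(parse_labels('0, 0, 0, 1,'))
```

Rows that come back as `None` can then be counted or retried instead of raising in the middle of the loop.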

In [58]:
# adding results into dataframe, by defining new columns

data['gpt3_toxic'] = gpt3_toxic
data['gpt3_sev_toxic'] = gpt3_sev_toxic
data['gpt3_identity'] = gpt3_identity
data['gpt3_insult'] = gpt3_insult
data['gpt3_profanity'] = gpt3_profanity
data['gpt3_threat'] = gpt3_threat

In [59]:
data.to_csv('./result_0512.csv')

In [77]:
# making pred_only result dataframe
data_pred_only_df = data.drop(['toxicity', 'severe_toxicity', 'identity_attack', 'insult', 'profanity', 'threat'], axis=1)
data_pred_only_df.rename(columns = {'gpt3_toxic':'toxicity', 'gpt3_sev_toxic':'severe_toxicity', 'gpt3_identity': 'identity_attack', 'gpt3_insult':'insult', 'gpt3_profanity': 'profanity', 'gpt3_threat': 'threat'}, inplace=True)
data_pred_only_df

Unnamed: 0,example_id,preceding_comment,comment_for_evaluation,following_comment,toxicity,severe_toxicity,identity_attack,insult,profanity,threat
0,275362,Which one of u bought my stocks? >:(,posted this in the other thread but the more i...,That thing is 1.64% rn. JPow has to say somet...,0,0,0,0,0,0
1,30108,Alexa play down by 311.,No one cares about 10 year bonds bro - thats j...,Sorry you were born the bad kind of retarded.,0,0,0,0,1,0
2,316461,Today = no Vaseline,The entire market is fucking hemorrhaging,Imagine putting your money in bonds lol,0,0,0,1,1,0
3,316957,definitely. keep buying guys !!!!! HURRRDURRr,damn so my puts are fukd,"Quick, everyone buy Into the bull trap again!",0,0,0,1,1,0
4,318752,Big if true.,Do you routinely comment your opinion on topic...,"They get all 4, fuck that.\n\nJust one Warlock...",0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...
795,219798,Buy calls. Plenty of opportunity,"Here, take a downvote!","you can also see it on yahoo, go to the fullsc...",0,0,0,1,0,0
796,68599,It’s a dark day lmaoooooo,How do I block someone,UWMC let’s go guys 🚀🚀🚀,0,0,0,0,0,0
797,31882,My amc 15 calls ain’t lookin too good,market starting to bounce a little... get that...,The market is completely dumping and is the in...,0,0,0,0,0,0
798,482069,Thank you to whoever sold me the PLTR $26 FD c...,yeah i seem to have timed that one almost perf...,Watching Rat Race. One of best comedies of all...,0,0,0,0,0,0


In [78]:
## saving file with pred_values only
data_pred_only_df.to_csv('./baseline_gpt3_2022-05-12.csv', index=False)

## Evaluation of the model

### With all labels predicted by GPT-3, compare GPT-3 against the human labels
<b>- Results:  
total) f1: 0.63, precision: 0.63, recall: 0.62</b>  
toxicity) f1: 0.07, precision: 1.00, recall: 0.04  
severe_toxicity) f1: 0.39, precision: 0.58, recall: 0.29  
identity_attack) f1: 0.16, precision: 0.75, recall: 0.09  
insult) f1: 0.48, precision: 0.37, recall: 0.70  
profanity) f1: 0.83, precision: 0.79, recall: 0.86  
threat) f1: 0.12, precision: 0.07, recall: 0.33  

In [60]:
def f1_precision_recall(y_true, y_pred):
    """Return (f1, precision, recall), treating 1 as the positive class."""
    # recall that f1 score = 2 * (precision * recall) / (precision + recall)
    # precision = tp / (tp + fp)
    # recall = tp / (tp + fn)
    tp, tn, fp, fn = 0, 0, 0, 0

    # zip pairs values by position, so it works for both Series and lists
    for true, pred in zip(y_true, y_pred):
        if true == 1 and pred == 1: tp += 1
        elif true == 0 and pred == 0: tn += 1
        elif true == 0 and pred == 1: fp += 1
        elif true == 1 and pred == 0: fn += 1

    # report 0.0 instead of raising ZeroDivisionError when a denominator is zero
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0

    return f1, precision, recall

In [61]:
# will get f1, precision, recall, respectively
toxic_f1, toxic_precision, toxic_recall = f1_precision_recall(data['toxicity'], data['gpt3_toxic'])
sev_f1, sev_precision, sev_recall = f1_precision_recall(data['severe_toxicity'], data['gpt3_sev_toxic'])
idn_f1, idn_precision, idn_recall = f1_precision_recall(data['identity_attack'], data['gpt3_identity'])
insult_f1, insult_precision, insult_recall = f1_precision_recall(data['insult'], data['gpt3_insult'])
prof_f1, prof_precision, prof_recall = f1_precision_recall(data['profanity'], data['gpt3_profanity'])
threat_f1, threat_precision, threat_recall = f1_precision_recall(data['threat'], data['gpt3_threat'])

In [79]:
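# Note: threat metrics are undefined when there are no positive human labels: with (tp, tn, fp, fn) = (0, 792, 8, 0),
# threat_recall = 0/0 and threat_precision = 0/8 = 0.0. The output below has nonzero threat recall, so positives were present in this run.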

print('toxicity) f1: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(toxic_f1, toxic_precision, toxic_recall))
print('severe_toxicity) f1: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(sev_f1, sev_precision, sev_recall))
print('identity_attack) f1: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(idn_f1, idn_precision, idn_recall))
print('insult) f1: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(insult_f1, insult_precision, insult_recall))
print('profanity) f1: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(prof_f1, prof_precision, prof_recall))
print('threat) f1:{:.3f}, precision:{:.3f}, recall:{:.3f}'.format(threat_f1, threat_precision, threat_recall))

toxicity) f1: 0.071, precision: 1.000, recall: 0.037
severe_toxicity) f1: 0.389, precision: 0.583, recall: 0.292
identity_attack) f1: 0.160, precision: 0.750, recall: 0.090
insult) f1: 0.480, precision: 0.365, recall: 0.700
profanity) f1: 0.827, precision: 0.795, recall: 0.862
threat) f1:0.118, precision:0.071, recall:0.333
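
As a quick sanity check, the F1 values can be recomputed from the printed precision and recall with f1 = 2 * (precision * recall) / (precision + recall). The sketch below uses only the three-decimal numbers shown above; `f1_from` and `reported` are hypothetical names, and small last-digit differences are expected because the inputs are already rounded.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1), copied from the printed output
reported = {
    'toxicity': (1.000, 0.037, 0.071),
    'severe_toxicity': (0.583, 0.292, 0.389),
    'identity_attack': (0.750, 0.090, 0.160),
    'insult': (0.365, 0.700, 0.480),
    'profanity': (0.795, 0.862, 0.827),
    'threat': (0.071, 0.333, 0.118),
}

for name, (p, r, f1) in reported.items():
    print(f'{name}: recomputed {f1_from(p, r):.3f}, reported {f1:.3f}')
```

The toxicity row shows how the harmonic mean behaves: perfect precision cannot compensate for a recall of 0.037, so F1 stays close to the smaller of the two values.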


In [64]:
# calculating the total by pooling all six labels into one series (micro-average);
# this is not the same as averaging the six per-label scores (macro-average)
y_true_total = pd.concat([data['toxicity'], data['severe_toxicity'], data['identity_attack'], data['insult'], \
                          data['profanity'], data['threat']], axis=0)

y_pred_total = pd.concat([data['gpt3_toxic'], data['gpt3_sev_toxic'], data['gpt3_identity'], data['gpt3_insult'], \
                          data['gpt3_profanity'], data['gpt3_threat']], axis=0)

In [80]:
total_f1, total_precision, total_recall = f1_precision_recall(list(y_true_total), list(y_pred_total))
print('total) f1: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(total_f1, total_precision, total_recall))

total) f1: 0.626, precision: 0.631, recall: 0.621
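
Pooling all labels before scoring gives a micro-average, which is dominated by the labels with the most positives. Averaging the six per-label F1 scores printed earlier would instead give roughly 0.34 (a macro-average). The sketch below shows the difference on invented toy data; `prf` and the toy lists are hypothetical and not taken from the dataset.

```python
def prf(y_true, y_pred):
    """Return (f1, precision, recall) with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, precision, recall


# Toy labels invented for illustration: label A is frequent and easy,
# label B is rare and always missed.
true_a, pred_a = [1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0]
true_b, pred_b = [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]

micro_f1 = prf(true_a + true_b, pred_a + pred_b)[0]          # pool, then score
macro_f1 = (prf(true_a, pred_a)[0] + prf(true_b, pred_b)[0]) / 2  # score, then average

print(f'micro f1: {micro_f1:.3f}, macro f1: {macro_f1:.3f}')
```

Reporting both numbers makes it clearer when a strong label such as profanity is carrying the total.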


---

## Appendix

### Trial 1: Multiple comments for evaluation
- According to OpenAI's guidance, making the list too long can cause the model to drift. Because the comments are not preprocessed and have unusual formatting (e.g. extra spaces or line breaks), some iterations failed to return results for the given comments. Some batches also unexpectedly returned all '0's, so this approach was not reliable enough for testing the model.
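
One possible mitigation for the formatting issue is to collapse whitespace before building the list of comments. The sketch below is an assumption, not something this notebook tested: `normalize_comment` and `build_comment_list` are hypothetical helpers, and there is no evidence here that they fix the failed iterations.

```python
import re


def normalize_comment(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r'\s+', ' ', text).strip()


def build_comment_list(comments) -> str:
    """Format comments as one '- ' bullet per line, as in the batch prompt."""
    return ''.join('- ' + normalize_comment(c) + '\n' for c in comments)


print(build_comment_list(['first line\n\nsecond line', '  too   many   spaces  ']))
```

Keeping each comment on a single line means the number of bullets in the prompt matches the number of answers expected back.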

In [24]:
# defining prompt
# prompt = instruction + label_name + label_values + definition + comment + answer
# the comment changes, while the others stay fixed
instruction = "instruction:\nEvaluate list of comments on the labels provided. Refer to the definitions of each labels below. Answer as format [' , , , , , '], [' , , , , , '], \n\n"
label_name = 'label provided:\nTOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT\n\n'
label_values = 'label values:\n1 - Matches the Definition\n0 - Does not match the Definition\n\n'
definition = 'definitions of each labels: \nTOXICITY: A rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion. \nSEVERE_TOXICITY: A very hateful, aggressive, disrespectful comment or otherwise very likely to make a user leave a discussion or give up on sharing their perspective. This attribute is much less sensitive to more mild forms of toxicity, such as comments that include positive uses of curse words. \nIDENTITY_ATTACK: Negative or hateful comments targeting someone because of their identity. \nINSULT: Insulting, inflammatory, or negative comment towards a person or a group of people. \nPROFANITY: Swear words, curse words, or other obscene or profane language. \nTHREAT: Describes an intention to inflict pain, injury, or violence against an individual or group. \n\n'
answer = 'answers: \n'

In [25]:
import os
import openai

# replace with your api_key (should start with 'sk-', see https://beta.openai.com/account/api-keys)
openai.api_key = 'sk-//'

# using text-davinci-002 as the engine, as it performed best among the available engines
# the prompt is rebuilt for each batch of comments
# max_tokens is raised because each response covers several comments

# batch_idx = [100, 200, 300, 400, 500, 600, 700, 800]
batch_idx = [5, 10]

res = []

for ith_batch in batch_idx: 
    comments_list = ''
    for each_comment in data['comment_for_evaluation'][ith_batch-5:ith_batch].values:
        comments_list += '- ' + each_comment + '\n'
        
    comment_full_sentence = 'list of comments:\n' + comments_list + '\n'
    prompt_wsb = instruction + label_name + label_values + definition + comment_full_sentence + answer
    
    response = openai.Completion.create(
      engine="text-davinci-002",
      prompt=prompt_wsb,
      temperature=0,
      max_tokens=2000,
      top_p=1.0,
      frequency_penalty=0.0,
      presence_penalty=0.0
    )
    
    # append result into a new list
    res.append(response)
    

In [26]:
resres = []

for i in range(len(res)):
    resres.extend(res[i]['choices'][0]['text'].split('\n'))

In [27]:
resres

["['0', '0', '0', '0', '0', '0'], ",
 "['0', '0', '0', '1', '0', '0'], ",
 "['0', '0', '0', '0', '0', '0'], ",
 "['0', '0', '0', '0', '0', '0'], ",
 "['0', '0', '0', '0', '0', '0']",
 "['0', '1', '0', '0', '1', '0'], ['0', '0', '0', '1', '0', '0'], ['0', '0', '0', '0', '0', '0']"]

### Trial 2: Change the order of the prompt sections
- Instruction + label + definition + comment + result gave the best results
- Below is one of the trials that performed worse
- For example, putting the definitions first lowers performance
- Some orders gave the same results, so we settled on the order above
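
To compare orderings systematically, the three fixed sections can be permuted while the comment and the result marker stay at the end. This is a minimal sketch with shortened placeholder strings; `ordered_prompts` is a hypothetical helper, and the notebook itself only shows one alternative order.

```python
from itertools import permutations

# Shortened placeholders standing in for the notebook's full section strings.
sections = {
    'instruction': 'instructions:\n...\n\n',
    'label': 'labels provided:\n...\n\n',
    'definition': 'definitions of each labels:\n...\n\n',
}


def ordered_prompts(sections, comment_block, result):
    """Yield (order, prompt) for every ordering of the fixed sections."""
    for order in permutations(sections):
        body = ''.join(sections[name] for name in order)
        yield order, body + comment_block + result


candidates = list(ordered_prompts(sections, 'comment for evaluation:\nsome comment\n\n', 'result:\n'))
for order, _ in candidates:
    print(' + '.join(order))
```

Each candidate prompt could then be sent for the same few comments so that the orderings are compared on identical inputs.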

In [37]:
# defining prompt
# prompt = instruction + label + definition + comment + result
# instruction/label/definition/result are fixed, while the comment changes
instruction = 'instructions:\n\
Please evaluate comment for evaluation on the labels provided. You can refer to the definitions of each labels below. Answer as format " , , , , , ".\n\n'

label = 'labels provided:\n\
TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT\n\n\
label values:\n\
1 - Matches the Definition\n\
0 - Does not match the Definition\n\n'

definition = 'definitions of each labels:\n\
TOXICITY: A rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion.\n\
SEVERE_TOXICITY: A very hateful, aggressive, disrespectful comment or otherwise very likely to make a user leave a discussion or give up on sharing their perspective. This attribute is much less sensitive to more mild forms of toxicity, such as comments that include positive uses of curse words.\n\
IDENTITY_ATTACK: Negative or hateful comments targeting someone because of their identity.\n\
INSULT: Insulting, inflammatory, or negative comment towards a person or a group of people.\n\
PROFANITY: Swear words, curse words, or other obscene or profane language.\n\
THREAT: Describes an intention to inflict pain, injury, or violence against an individual or group.\n\n'

result = 'result:\n'

In [34]:
import os
import openai

# replace with your api_key (should start with 'sk-', see https://beta.openai.com/account/api-keys)
openai.api_key = 'sk-//'

# using text-davinci-002 as the engine, as it performed best among the available engines
# the prompt puts the definitions first to test a different section order
# only the first 3 comments are evaluated in this trial

res = []
cnt = 0

for each_comment in comments_for_eval:
    comment_full_sentence = 'comment for evaluation:\n' + each_comment + '\n\n'
    prompt_wsb = definition + label + instruction + comment_full_sentence + result
    response = openai.Completion.create(
      engine="text-davinci-002",
      prompt=prompt_wsb,
      temperature=0,
      max_tokens=60,
      top_p=1.0,
      frequency_penalty=0.0,
      presence_penalty=0.0
    )
    cnt += 1
    res.append(response)
    if cnt == 3: break

In [35]:
# Low performance
for i in range(len(res)):
    print(res[i]['choices'][0]['text'].split('\n'))

['0, 0, 0, 0, 0, 0']
['0, 0, 0, 0, 0, 0']
['1, 1, 0, 1, 1, 0']


### Trial 3: Showing results as strings instead of binary values
- This did not give good results. The model failed to recognize all labels, and some values were missing from the output even when the comment matched a label's definition.
- It is also hard to decide which strings should represent the result. After trying 'yes/no', 'match/unmatch', and 'positive/negative', we decided not to use words that could bias the model.
- Instead, binary values with explicit value definitions gave better results.
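
The missing-value problem can be made visible by mapping the string answers back to binary values and marking gaps explicitly. The sketch below is illustrative only: `parse_word_labels` and `WORD_TO_VALUE` are hypothetical names, and the sample strings are in the same format as the trial output.

```python
from typing import List, Optional

WORD_TO_VALUE = {'match': 1, 'unmatch': 0}


def parse_word_labels(text: str, n_labels: int = 6) -> List[Optional[int]]:
    """Map 'match'/'unmatch' words to 1/0; anything else becomes None."""
    parts = [p.strip().lower() for p in text.strip().split(',')]
    parts += [''] * (n_labels - len(parts))  # pad if the model stopped early
    return [WORD_TO_VALUE.get(p) for p in parts[:n_labels]]


print(parse_word_labels('unmatch, match, match, match, match, match'))
print(parse_word_labels('unmatch, match, match, match, match,'))
```

A `None` in the last position corresponds to a response that ended after the fifth value.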

In [38]:
# defining prompt
# prompt = instruction + label + definition + comment + result
# instruction/label/definition/result are fixed, while the comment changes
instruction = 'instructions:\n\
Please evaluate comment for evaluation on the labels provided. You can refer to the definitions of each labels below. Answer as match/unmatch and show results as format " , , , , , ".\n\n'

label = 'labels provided:\n\
TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT\n\n'
# label values:\n\
# 1 - Matches the Definition\n\
# 0 - Does not match the Definition\n\n'

definition = 'definitions of each labels:\n\
TOXICITY: A rude, disrespectful, or unreasonable comment that is likely to make people leave a discussion.\n\
SEVERE_TOXICITY: A very hateful, aggressive, disrespectful comment or otherwise very likely to make a user leave a discussion or give up on sharing their perspective. This attribute is much less sensitive to more mild forms of toxicity, such as comments that include positive uses of curse words.\n\
IDENTITY_ATTACK: Negative or hateful comments targeting someone because of their identity.\n\
INSULT: Insulting, inflammatory, or negative comment towards a person or a group of people.\n\
PROFANITY: Swear words, curse words, or other obscene or profane language.\n\
THREAT: Describes an intention to inflict pain, injury, or violence against an individual or group.\n\n'

result = 'result:\n'

In [39]:
import os
import openai

# replace with your api_key (should start with 'sk-', see https://beta.openai.com/account/api-keys)
openai.api_key = 'sk-//'

# using text-davinci-002 as the engine, as it performed best among the available engines
# the prompt asks for match/unmatch strings instead of binary values
# only the first 3 comments are evaluated in this trial

res = []
cnt = 0

for each_comment in comments_for_eval:
    comment_full_sentence = 'comment for evaluation:\n' + each_comment + '\n\n'
    prompt_wsb = instruction + label + definition + comment_full_sentence + result
    response = openai.Completion.create(
      engine="text-davinci-002",
      prompt=prompt_wsb,
      temperature=0,
      max_tokens=60,
      top_p=1.0,
      frequency_penalty=0.0,
      presence_penalty=0.0
    )
    cnt += 1
    res.append(response)
    if cnt == 3: break

In [40]:
# Low performance
for i in range(len(res)):
    print(res[i]['choices'][0]['text'].split('\n'))

['unmatch, match, match, match, match, match']
['unmatch, match, match, match, match,']
['unmatch, match, match, match, match,']
