# OpenAI prompt engineering test

### Name: Bingfeng Hu
### CID: 01137798

I declare that the work below is my own, and that I have worked on this assignment independently. The cell below lists the package dependencies; uncomment it to install them.

In this notebook we test the OpenAI APIs as a way to label our unlabelled news data. In particular, we look at the code needed to call the model APIs, how to speed up the calls with concurrent requests, and how to engineer the prompt to get the best sentiment labels from the best available OpenAI chat models.

## Workflow

Take the filtered data from the initial EDA and then run the workflow below.

Test out the OpenAI code workflow.



In [8]:
from dotenv import load_dotenv

# load the openai api key needed from .env to call openAI models from the API
load_dotenv()

True

In [13]:
import os
from openai import OpenAI

import pandas as pd
import numpy as np

# allow for asyncio
import asyncio
from openai import AsyncOpenAI

# Testing the model

A quick test of the model for our use case, following the official Python client:
https://github.com/openai/openai-python


In [10]:


client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
)

In [11]:
chat_completion.choices[0].message.content

'This is a test.'

In [17]:
client = AsyncOpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)


In [19]:
# change the section here such that we have dictionary below instead for each completion.


async def get_chat_completion(semaphore: asyncio.Semaphore, message: str):
    async with semaphore:
        return await client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": message,
                }
            ],
            model="gpt-3.5-turbo",
        )

# change this function to include a dataframe passed in instead!!

async def main(max_concurrent_requests: int) -> None:
    # Semaphore to control the number of concurrent requests
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    # List of messages to be sent
    messages = ["Say this is a test", "Say hello to Arunav", "Say hello to Gerardo"]

    # Using asyncio.gather with semaphore to limit concurrent requests
    tasks = [get_chat_completion(semaphore, message) for message in messages]
    results = await asyncio.gather(*tasks)

    # Processing the results
    for result in results:
        print(result)

# Specify the maximum number of concurrent requests
max_concurrent_requests = 5
await main(max_concurrent_requests)  # in a script: asyncio.run(main(max_concurrent_requests))

ChatCompletion(id='chatcmpl-8a9ysyWzYE3KJkaN6L5HLFNnhmg0N', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='This is a test.', role='assistant', function_call=None, tool_calls=None))], created=1703628866, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5, prompt_tokens=12, total_tokens=17))
ChatCompletion(id='chatcmpl-8a9ys7ylbOEkEhMpCsbETlslJ0pn2', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello Arunav!', role='assistant', function_call=None, tool_calls=None))], created=1703628866, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5, prompt_tokens=13, total_tokens=18))
ChatCompletion(id='chatcmpl-8a9ysqSo5R7ygTVH5pdgsHg292MnQ', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello Gerar
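
To check that the semaphore really caps the number of in-flight requests, the sketch below swaps the API call for `asyncio.sleep` and records the peak number of overlapping calls. The names `fake_request` and `measure_peak_concurrency` are hypothetical helpers introduced only for this illustration; they are not part of the OpenAI client.

```python
import asyncio


async def fake_request(semaphore: asyncio.Semaphore, state: dict, delay: float = 0.01) -> None:
    """Stand-in for an API call (assumption: no network); tracks overlapping calls."""
    async with semaphore:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(delay)  # simulates network latency
        state["active"] -= 1


async def measure_peak_concurrency(n_requests: int, max_concurrent_requests: int) -> int:
    """Run n_requests fake calls and return the largest number that ran at once."""
    semaphore = asyncio.Semaphore(max_concurrent_requests)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(fake_request(semaphore, state) for _ in range(n_requests)))
    return state["peak"]
```

In a notebook cell this can be run with `await measure_peak_concurrency(20, 5)`; in a plain script, wrap the call in `asyncio.run(...)`. The peak should never exceed `max_concurrent_requests`.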

In [1]:
# add code here to compare the two models

# take the news from the filtered news before to do this.

# Test and improve the sentiment analysis prompting

Set up a prototype prompt for use later.

In [195]:
# Classify one news text; returns a dictionary with the input text and the model's label.


async def get_news_sentiment(semaphore: asyncio.Semaphore, message: str) -> dict:
    async with semaphore:

        response = await client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a news sentiment classifier, only return values -1, 0, 1 for negative, neutral and positive sentiment. Return only values -1, 0, 1",
                },
                {
                    "role": "user",
                    "content": message,
                }
            ],
            model="gpt-4-1106-preview",
            # fixed seed for more reproducible outputs (best effort, not guaranteed)
            seed=30224,
        )

        # the prediction is the raw string returned by the model, e.g. '1', '0' or '-1'
        return {"text": message, "prediction": response.choices[0].message.content}

# TODO: change this function to accept a DataFrame instead of a list of messages.

async def main_sentiment_labelling(max_concurrent_requests: int, messages) -> list:
    # Semaphore to control the number of concurrent requests
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    # Using asyncio.gather with semaphore to limit concurrent requests;
    # gather returns the results in the same order as `messages`
    tasks = [get_news_sentiment(semaphore, message) for message in messages]
    results = await asyncio.gather(*tasks)

    return list(results)



# Prototype sentiment labels

In [196]:
# Specify the maximum number of concurrent requests
max_concurrent_requests = 5
await main_sentiment_labelling(max_concurrent_requests, ["I am very happy", "I am very angry", "Meh I don't feel anything"])

# Complex prompts seem to confuse the model; since the task is simple, use a simple prompt.

[{'text': 'I am very happy', 'prediction': '1'},
 {'text': 'I am very angry', 'prediction': '-1'},
 {'text': "Meh I don't feel anything", 'prediction': '0'}]
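
The predictions come back as strings, and nothing guarantees the model always returns exactly `-1`, `0`, or `1`. A minimal sketch of a validation step is shown below; `parse_sentiment_label` and `attach_labels` are hypothetical helpers introduced for illustration, assuming the result dictionaries have the `text` and `prediction` keys shown above.

```python
VALID_LABELS = {-1, 0, 1}


def parse_sentiment_label(raw):
    """Return -1, 0 or 1 if `raw` is exactly one of those labels, otherwise None."""
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):
        # not an integer string (e.g. 'positive'), or not a string at all (e.g. None)
        return None
    return value if value in VALID_LABELS else None


def attach_labels(results):
    """Copy each result dict and add an integer `label` (None marks an invalid reply)."""
    return [{**r, "label": parse_sentiment_label(r["prediction"])} for r in results]
```

Rows whose `label` is `None` can then be inspected or re-sent rather than silently entering the training data.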

In [197]:
file_name = "processed_data/compiled_news_all_bert_compliant.json"

# df_useful_sources_short_articles.to_json(file_name, orient="records")

df_useful_sources_short_articles = pd.read_json(file_name)

In [198]:
df_useful_sources_short_articles.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5748 entries, 0 to 5747
Data columns (total 45 columns):
 #   Column                             Non-Null Count  Dtype 
---  ------                             --------------  ----- 
 0   organizations                      5748 non-null   object
 1   uuid                               5748 non-null   object
 2   author                             5748 non-null   object
 3   url                                5748 non-null   object
 4   ord_in_thread                      5748 non-null   int64 
 5   title                              5748 non-null   object
 6   locations                          5748 non-null   object
 7   highlightText                      5748 non-null   object
 8   language                           5748 non-null   object
 9   persons                            5748 non-null   object
 10  text                               5748 non-null   object
 11  external_links                     5748 non-null   object
 12  publis

In [199]:
df_useful_sources_short_articles["text_full"].values[:10]

array(['Anadarko not interested in selling down Mozambique stake -CEO\nHOUSTON, July 29 Anadarko Petroleum Corp has no interest in selling down its stake in its planned liquefied natural gas (LNG) project in Mozambique, the company\'s chief executive said on Wednesday.\nThe Houston company\'s desire to sell some of its interest in the project "approaches zero," CEO Al Walker told investors on a conference call to discuss second-quarter earnings.\nAnadarko has a 26.5 percent stake in the Area 1 license in the Rovuma Basin offshore Mozambique.\n(Reporting by Anna Driver )',
       'Greek debt restructuring is inevitable, says IMF chief\nWASHINGTON, July 29 Greece\'s international creditors will have no choice but to accept an easing of the terms of Athens\' debts, the head of the International Monetary Fund said on Wednesday.\n"It\'s inevitable that there is an element of debt restructuring," IMF Managing Director Christine Lagarde said at a news conference.\nThe IMF has teamed up with t

In [200]:
print(df_useful_sources_short_articles["text_full"].values[:10][-1])

FT Reports Axel Springer In Talks To Be Owner
Tip : Use comma (,) to separate multiple quotes. Learn more... 
Thu, Jul 23, 2015, 14:39 BST - UK Markets close in 1 hr 51 mins FT Reports Axel Springer In Talks To Be Owner By (c) Sky News 2015 | Sky News – 4 minutes 11 seconds ago View Photo 

The owner of the Financial Times (FT) has confirmed it is in "advanced" discussions on its sale, with the newspaper reporting Axel Springer is the front-runner. 
Pearson (Xetra: 858266 - news ) released a statement to the Stock Exchange moments after the Reuters news agency, citing a source close to the deal, reported that it was to sell the financial title to a "global, digital news company". 
The statement said: "Pearson notes recent press speculation and confirms that it is in advanced discussions regarding the potential disposal of FT Group although there is no certainty that the discussions will lead to a transaction." 
The FT later reported that German publisher Axel Springer, the owner of Bil

In [201]:
print(df_useful_sources_short_articles["text_full"].values[:10][3])

Chevron to lay off 1,500 workers amidst oil price slump
July 28 Chevron Corp, the second-largest U.S. oil company, said on Tuesday it would lay off 1,500 employees, about 2 percent of its global work force, as it trims costs to offset declining crude prices.
Nearly all of the layoffs will be in Texas, where the company has expanded in recent years to develop land in the Permian shale formation, and California, where Chevron is headquartered.
Fifty international employees will be laid off and roughly 600 contractor positions will be canceled, the company said in a statement.
Of the 1,500 jobs being eliminated, 270 are currently empty and will not be filled, Chevron said.
Chevron had previously labeled the Permian as one of its premium assets.
Oil prices have plunged by about 55 percent in the past year due to oversupply concerns both in the United States and internationally.
"In light of the current market environment, Chevron is taking action to reduce internal costs in multiple operat

In [202]:
print(df_useful_sources_short_articles["text_full"].values[:10][4])

Junk Bondholders Weigh Emerging-Market Exit From Benchmark Index
Bank of America Corp. will decide this week whether to remove emerging-market companies from its $2.2 trillion global high-yield index.
The lender asked junk-bond investors to vote on the matter as part of an annual review of its benchmarks after a slew of downgrades to Brazilian and Russian companies boosted their share of the index. Bank of America said it will implement any changes at the end of September.
The decision pits investors who want the measure to better reflect the corporate high-yield market against those seeking higher yields associated with extra political risk. A removal of emerging-market companies, which accounted for 18 percent of the index at the end of March, may trigger price swings if bondholders are forced to sell those securities and raise funding costs for excluded companies.
“It may be detrimental if emerging-market companies were removed from the index because this could narrow the investor b

In [203]:
print(df_useful_sources_short_articles["text_full"].values[:10][5])

BP less likely to be acquired after $18.7 bln settlement - CEO
49.90 -0.100 
LONDON, July 28 (Reuters) - BP is less likely to be acquired following its $18.7 billion settlement over the 2010 Macondo oil spill, Chief Executive Officer Bob Dudley said on Tuesday. 
The has been much speculation in recent months that the British oil and gas giant could become an acquisition target for a larger rival, leading the British government to warn it would oppose any takeover bid. 
While some analysts expected BP to be even more attractive following its settlement with the U.S. authorities this month to resolve most claims from the Gulf of Mexico oil spill, Dudley sought to quell any speculation. 
"As a result of the settlement in the U.S. it is actually less likely that someone would want to acquire BP and it is certainly not our intention to put the company up for sale," Dudley told reporters. 
Merger and acquisition activity in the energy sector has been rekindled as company valuations slumped a

In [204]:
max_concurrent_requests = 5
res = await main_sentiment_labelling(max_concurrent_requests, df_useful_sources_short_articles["text_full"].values[:20])  # label the first 20 articles

In [205]:
res

[{'text': 'Anadarko not interested in selling down Mozambique stake -CEO\nHOUSTON, July 29 Anadarko Petroleum Corp has no interest in selling down its stake in its planned liquefied natural gas (LNG) project in Mozambique, the company\'s chief executive said on Wednesday.\nThe Houston company\'s desire to sell some of its interest in the project "approaches zero," CEO Al Walker told investors on a conference call to discuss second-quarter earnings.\nAnadarko has a 26.5 percent stake in the Area 1 license in the Rovuma Basin offshore Mozambique.\n(Reporting by Anna Driver )',
  'prediction': '1'},
 {'text': 'Greek debt restructuring is inevitable, says IMF chief\nWASHINGTON, July 29 Greece\'s international creditors will have no choice but to accept an easing of the terms of Athens\' debts, the head of the International Monetary Fund said on Wednesday.\n"It\'s inevitable that there is an element of debt restructuring," IMF Managing Director Christine Lagarde said at a news conference.\n

# Conclusion

We can see above that the latest GPT-4 model, `gpt-4-1106-preview`, performs sentiment analysis relatively well in our tests. Next, let's use it to label the entire dataset of interest for training, validation, and testing.
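
Labelling all 5748 articles means many more calls than the 20 tested here, so transient failures such as rate limits or timeouts become likely. One possible approach, sketched below under assumptions, is to wrap each call in a retry with exponential backoff. `with_retries` is a hypothetical helper, not part of the OpenAI client, and catching a bare `Exception` is a simplification; in practice it would be narrowed to the client library's rate-limit and timeout errors.

```python
import asyncio
import random


async def with_retries(make_call, max_attempts: int = 5, base_delay: float = 1.0):
    """Await make_call(); on failure, wait with exponential backoff and try again.

    `make_call` must be a zero-argument callable returning a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except Exception:  # assumption: treat every error as transient
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # delay doubles each attempt; random jitter spreads out concurrent retries
            await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Inside `main_sentiment_labelling`, each task could then be built as `with_retries(lambda m=message: get_news_sentiment(semaphore, m))`, where the default argument binds the current message to each lambda.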

In [130]:
res[4]

{'text': 'Junk Bondholders Weigh Emerging-Market Exit From Benchmark Index\nBank of America Corp. will decide this week whether to remove emerging-market companies from its $2.2 trillion global high-yield index.\nThe lender asked junk-bond investors to vote on the matter as part of an annual review of its benchmarks after a slew of downgrades to Brazilian and Russian companies boosted their share of the index. Bank of America said it will implement any changes at the end of September.\nThe decision pits investors who want the measure to better reflect the corporate high-yield market against those seeking higher yields associated with extra political risk. A removal of emerging-market companies, which accounted for 18 percent of the index at the end of March, may trigger price swings if bondholders are forced to sell those securities and raise funding costs for excluded companies.\n“It may be detrimental if emerging-market companies were removed from the index because this could narrow 

In [131]:
res[3]

{'text': 'Chevron to lay off 1,500 workers amidst oil price slump\nJuly 28 Chevron Corp, the second-largest U.S. oil company, said on Tuesday it would lay off 1,500 employees, about 2 percent of its global work force, as it trims costs to offset declining crude prices.\nNearly all of the layoffs will be in Texas, where the company has expanded in recent years to develop land in the Permian shale formation, and California, where Chevron is headquartered.\nFifty international employees will be laid off and roughly 600 contractor positions will be canceled, the company said in a statement.\nOf the 1,500 jobs being eliminated, 270 are currently empty and will not be filled, Chevron said.\nChevron had previously labeled the Permian as one of its premium assets.\nOil prices have plunged by about 55 percent in the past year due to oversupply concerns both in the United States and internationally.\n"In light of the current market environment, Chevron is taking action to reduce internal costs i