# Prompt Engineering: Use OpenAI to Analyze Twitter Data 
This is a simple tutorial that teaches prompt engineering basics and analyzes Twitter data with OpenAI large language models (LLMs).
Please obtain an [OpenAI API](https://openai.com/index/openai-api/) key (API usage is paid) and store it in a safe place. This tutorial uses [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) to store the API keys.

## Large Language Model Basics
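An LLM generates text by repeatedly predicting the next word (more precisely, the next token). It learns this with supervised learning: for each piece of training text, the input is the text so far and the label is the word that follows. Consider the following sentence: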

`Learning data science in the cloud with AI`

To generate this sentence, a model needs to learn the following next-word predictions:

|Input|Output|
|:---|---|
|Learning data science |in |
|Learning data science in |the | 
|Learning data science in the |cloud |
|Learning data science in the cloud |with |
|Learning data science in the cloud with |AI|
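
The table above can be generated mechanically: each training pair uses the first *i* words as the input and the word that follows as the target. A minimal sketch (the function name `next_word_pairs` is illustrative; real LLMs split text into tokens rather than whitespace-separated words):

```python
def next_word_pairs(sentence: str, min_prefix: int = 1) -> list[tuple[str, str]]:
    """Return (input prefix, next word) pairs for next-word prediction."""
    words = sentence.split()
    # For each split point i, the first i words are the input and words[i] is the target.
    return [(" ".join(words[:i]), words[i]) for i in range(min_prefix, len(words))]


# min_prefix=3 reproduces the five rows of the table, starting from "Learning data science".
for prompt, target in next_word_pairs("Learning data science in the cloud with AI", min_prefix=3):
    print(f"{prompt} -> {target}")
```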

Training an LLM typically involves the following steps:
1. Train a base LLM on a large amount of text to predict the next word.
2. Fine-tune it on examples where the output follows the instruction in the input.
3. Have humans rate the quality of different LLM outputs.
4. Tune the LLM to generate higher-rated outputs using RLHF (reinforcement learning from human feedback).

## Set up OpenAI Models

Define a helper function that loads API keys from AWS Secrets Manager:

In [1]:
import boto3
from botocore.exceptions import ClientError
import json

def get_secret(secret_name, region_name="us-east-1"):
    """Return a secret stored in AWS Secrets Manager as a parsed JSON dict."""
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError:
        # Re-raise AWS errors (e.g. missing secret or access denied) unchanged
        raise

    secret = get_secret_value_response['SecretString']

    return json.loads(secret)
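
`get_secret` returns the parsed `SecretString`, so the call `get_secret('openai')['api_key']` used later assumes the secret named `openai` holds a JSON object with an `api_key` field. The sketch below (helper names are hypothetical) shows that shape and adds a fallback to the `OPENAI_API_KEY` environment variable, which is also what the OpenAI Python client reads when no `api_key` is passed:

```python
import json
import os
from typing import Optional


def parse_api_key(secret_string: str, field: str = "api_key") -> str:
    """Read one field from a JSON secret string such as '{"api_key": "sk-..."}'."""
    return json.loads(secret_string)[field]


def resolve_api_key(secret_string: Optional[str] = None) -> str:
    """Hypothetical helper: prefer OPENAI_API_KEY, else fall back to a stored secret."""
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    if secret_string is None:
        raise RuntimeError("Set OPENAI_API_KEY or pass the SecretString from Secrets Manager.")
    return parse_api_key(secret_string)
```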

Install the `openai` package:

In [2]:
pip install openai

Collecting openai
  Downloading openai-1.52.1-py3-none-any.whl.metadata (24 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Downloading openai-1.52.1-py3-none-any.whl (386 kB)
Downloading distro-1.9.0-py3-none-any.whl (20 kB)
Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (325 kB)
Installing collected packages: jiter, distro, openai
Successfully installed distro-1.9.0 jiter-0.6.1 openai-1.52.1
Note: you may need to restart the kernel to use updated packages.


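Load the OpenAI API key, create a client, and define an `openai_help` function that sends a list of messages to the chat completions endpoint and returns the reply text.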

In [3]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

def openai_help(messages, model=model, temperature=temperature):
    """Send chat messages to the model and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content

Temperature:
- Low temperature: the model almost always picks the most likely next token, giving reliable, predictable responses.
- High temperature: the model more often picks less likely tokens, giving more diverse, creative responses.
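
Temperature is commonly explained as dividing the model's next-token scores (logits) before a softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens to illustrate the general idea; it is not a description of OpenAI's internal sampling. A temperature of 0, as used in this tutorial, corresponds to (near-)greedy decoding and cannot be plugged into this formula directly.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw next-token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 nearly all of the probability goes to the top token; at 2.0 the three tokens are much closer, which is why higher temperatures produce more varied text.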

Tokens and models:
- LLMs predict tokens, which are commonly occurring sequences of characters, rather than whole words.
- One token is roughly four characters of English text, and 100 tokens are roughly 75 words. Use the [tokenizer](https://platform.openai.com/tokenizer) to estimate token counts.
- Models differ in how many tokens they can process (their context window), as well as in performance and cost. See [OpenAI models](https://platform.openai.com/docs/models) for details.
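
The two rules of thumb above can be turned into a quick estimator. This is only an approximation for English text; OpenAI's `tiktoken` library or the tokenizer page gives exact counts for a specific model.

```python
def estimate_tokens_from_chars(text: str) -> float:
    """Rough estimate: about 4 characters of English text per token."""
    return len(text) / 4


def estimate_tokens_from_words(word_count: int) -> float:
    """Rough estimate: 100 tokens are about 75 words, so tokens = words * 100 / 75."""
    return word_count * 100 / 75


print(estimate_tokens_from_words(75))  # 100.0
```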

Roles:
- system: sets the overall tone or behavior of the assistant
- user: the instruction or question given to the LLM
- assistant: the LLM's response; you can also write assistant messages yourself to supply few-shot examples or the history of a conversation
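
The roles map directly onto the `messages` list passed to `openai_help`. A sketch of a hypothetical `build_messages` helper that places an optional system message first, then few-shot user/assistant pairs, then the actual question:

```python
from typing import Optional


def build_messages(system: Optional[str], examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Assemble a chat message list: optional system message, few-shot pairs, then the query."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    for user_text, assistant_text in examples:
        # Each example becomes a user turn followed by the assistant's ideal reply.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages
```

For instance, `build_messages(None, [("What is 1##1", "it is 11"), ("What is 2##2", "it is 22")], "What is 3##3")` produces the same list as the `##` example in this section.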


A simple example using [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) with temperature 0:

In [6]:
messages = [{"role": "user", "content": "What is the capital of California"}]

print(openai_help(messages))

The capital of California is Sacramento.


Add a system message asking the LLM to respond in the tone of a high school teacher, and raise the temperature to 0.8:

In [7]:
messages = [
    {"role": "system", "content": "use tone as a high school teacher"},
    {"role": "user", "content": "What is the capital of USA"}
    ]

print(openai_help(messages, temperature = 0.8))

The capital of the United States is Washington, D.C. It's an important city because it's where the federal government operates, including the President, Congress, and the Supreme Court. If you have any more questions about U.S. geography or history, feel free to ask!


Add example user and assistant messages (few-shot prompting) to teach the LLM what the made-up `##` operator means:

In [8]:
messages = [
    {"role": "user", "content": "What is 1##1"},
    {"role": "assistant", "content": "it is 11"},
    {"role": "user", "content": "What is 2##2"},
    {"role": "assistant", "content": "it is 22"},
    {"role": "user", "content": "What is 3##3"},
    ]
print(openai_help(messages))

It is 33.


## Prompt Engineering Principles 
- Use delimiters to separate the parts of a prompt; this makes the instructions clear and helps prevent prompt injection.
- Request structured output, such as a JSON document, so the results can be used in subsequent steps.
- Few-shot prompting: provide examples of a task done well, then ask the model to perform a similar task.
- Chain-of-thought reasoning: ask for a series of reasoning steps in the prompt to help the model reach the correct answer.
- Chain of prompts: split a task into multiple prompts so that each prompt focuses on one sub-task and different actions can be taken at different stages. This can save tokens, makes each step easier to test, and allows human input or external tools between steps.
- Iterative process:
  1. Try something first.
  2. Analyze the result, identify errors, and refine the prompt.
  3. Test the prompts on different datasets.
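
The delimiter principle can be made concrete in a few lines. A minimal sketch (the helper name `wrap_user_text` is hypothetical) that removes any copy of the delimiter from untrusted text before enclosing it, so the text cannot close the delimited block early and append its own instructions. This reduces, but does not eliminate, prompt-injection risk.

```python
DELIMITER = "###"


def wrap_user_text(text: str, delimiter: str = DELIMITER) -> str:
    """Remove any delimiter the user typed, then enclose the text in delimiters."""
    # Stripping the delimiter stops user text from ending the block early
    # and smuggling in instructions that look like part of the prompt.
    cleaned = text.replace(delimiter, "")
    return f"{delimiter}{cleaned}{delimiter}"
```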


An example combining delimiters, structured output, and few-shot prompting:

In [9]:
delimiter = '###'
sentence1 = 'I love cat.'
sentence2 = 'I love dog.'
messages = [
    {"role": "system", "content": f"""analyze the sentiment of the sentence delimited by {delimiter},
                                     return the result as a JSON document"""},
    {"role": "user", "content": f"{delimiter}{sentence1}{delimiter}"},
    {"role": "assistant", "content": '{"sentiment": "positive"}'},
    {"role": "user", "content": f"{delimiter}{sentence2}{delimiter}"}
    ]

print(openai_help(messages))

{ "sentiment": "positive" }
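
Because the reply is meant to be reused, it should be parsed rather than only printed. A sketch of a hypothetical `parse_json_reply` helper that accepts a bare JSON object and also tolerates a reply wrapped in a Markdown code fence (which later prompts in this tutorial explicitly ask the model to avoid):

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def parse_json_reply(reply: str) -> dict:
    """Parse a model reply that should contain a single JSON object."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)
```

`json.loads` raises `json.JSONDecodeError` when the reply is not valid JSON, so callers can retry or log the reply. The Chat Completions API also offers a JSON mode via `response_format={"type": "json_object"}`; the prompt must still ask for JSON when it is used.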


## Analyze Twitter data

Load the tweets from a text file that contains one tweet per line:

In [10]:
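# Read the whole file; each line holds one tweet (assumes the file is UTF-8 encoded)
with open("tweet_collection.txt", "r", encoding="utf-8") as f:
    tweets = f.read()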

### Summarization 
- Summarize election tweets enclosed in delimiters
- Control the length of the summary
- Summarize the tweets with a focus on a particular topic

In [12]:
tweet_sample = tweets.split("\n")[:50]
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter}"""},
    {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter}"},
    ]

print(openai_help(messages))

The tweets discuss various aspects of the upcoming election, including concerns about voter fraud, election interference, and political strategies. There are mentions of Kamala Harris's campaign efforts, with some questioning her strategy and others supporting her. Donald Trump and his supporters are vocal about election integrity, with some expressing skepticism about the election outcome. There are also discussions about media bias, early voting, and the importance of the election. Additionally, there are references to political figures and parties, such as Glenn Youngkin, Joe Rogan, and the Republican and Democratic parties, highlighting the contentious and high-stakes nature of the election.


In [11]:
tweet_sample = tweets.split("\n")[:10]
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter},
                                    limit the summary to 20 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter}"},
    ]

print(openai_help(messages))

Tweets discuss election concerns, including voter fraud, political strategies, and potential election rigging, highlighting high stakes.


In [13]:
tweet_sample = tweets.split("\n")[:10]
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter},
                                    focus on how people discuss AI,
                                    limit the summary to 50 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter}"},
    ]

print(openai_help(messages))

The tweets focus on political discussions, with no direct mention of AI. Conversations revolve around election concerns, voter fraud, and political strategies, highlighting the contentious nature of the upcoming election.
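
The cells above interpolate the list `tweet_sample` directly into the f-string, so the model receives the list's Python representation, with brackets, quotes, and commas. A sketch of a hypothetical `summary_messages` helper that joins the tweets one per line and parameterizes the word limit and focus used in the three cells:

```python
from typing import Optional


def summary_messages(tweets: list[str], delimiter: str = "###",
                     max_words: Optional[int] = None,
                     focus: Optional[str] = None) -> list[dict]:
    """Build chat messages that ask for a summary of delimited tweets."""
    instructions = [f"Provide a brief summary of the tweets delimited by {delimiter}."]
    if focus:
        instructions.append(f"Focus on {focus}.")
    if max_words:
        instructions.append(f"Limit the summary to {max_words} words.")
    # Join non-empty tweets one per line instead of embedding the list's repr.
    body = "\n".join(t for t in tweets if t.strip())
    return [
        {"role": "system", "content": " ".join(instructions)},
        {"role": "user", "content": f"{delimiter}{body}{delimiter}"},
    ]
```

For example, `openai_help(summary_messages(tweets.split("\n")[:10], max_words=50, focus="how people discuss AI"))` mirrors the third cell.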


### Moderation 
- Iterate over the tweets and use the [moderation endpoint](https://platform.openai.com/docs/api-reference/moderations) to identify flagged tweets
- Print each flagged tweet with the categories it was flagged for


In [16]:
def flag_help(tweet):
    """Print the flagged categories and the tweet if the moderation endpoint flags it."""
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=tweet)

    result = response.results[0]
    if result.flagged:
        print('===')
        # Print every category marked True, then print the tweet once
        for cat, hit in result.categories.to_dict().items():
            if hit:
                print(cat)
        print(tweet)

In [17]:
for tweet in tweets.split('\n')[60:70]:
    flag_help(tweet)

===
violence
RT @ecomarxi: “There’s an election in three weeks. We might lose because we’re committing genocide.”\n\n“I know! Let’s promise to do what we…
===
harassment
@RepSwalwell Your desperation is hilarious.  Election night will even be funnier.
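
The moderation endpoint's `input` parameter also accepts a list of strings, and `results` comes back in the same order, so the ten calls above could be a single request. A sketch of a hypothetical `flag_batch` helper; it takes the `client` created earlier and reads categories with `to_dict()`, as `flag_help` does:

```python
def flag_batch(client, texts: list[str],
               model: str = "omni-moderation-latest") -> list[tuple[str, list[str]]]:
    """Moderate many texts in one request; return (text, categories) for flagged ones."""
    response = client.moderations.create(model=model, input=texts)
    flagged = []
    # results[i] corresponds to texts[i]
    for text, result in zip(texts, response.results):
        if result.flagged:
            categories = [name for name, hit in result.categories.to_dict().items() if hit]
            flagged.append((text, categories))
    return flagged


# Usage, assuming the `client` and `tweets` defined earlier:
# for tweet, categories in flag_batch(client, tweets.split("\n")[60:70]):
#     print("===", ", ".join(categories), tweet, sep="\n")
```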


### Transforming
- Translate tweets into a different language
- Transform the tone, for example formal vs. informal


In [18]:
tweet_sample = tweets.split("\n")[:10]

for tweet in tweet_sample:
    messages = [
        {"role": "system", "content": f"""translate the tweet delimited by {delimiter} into Chinese"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

“转发 @MikeNellis: 我说过在选举日之前我会再做一次，所以我们开始吧……\n\n我将捐款给 @KamalaHarris 和其他民主党候选人……”
所有政治都是地方性的吗？这次选举不是这样 https://t.co/rzmTMWE3dc https://t.co/NYCmAYPTb9
“转发 @NotHoodlum: 格伦·杨金对选民欺诈非常关心。然而，当他17岁的儿子被抓到时，他却只字未提……”
"立即阅读：‘她为什么不努力工作’：政治专家质疑哈里斯的策略——哈里斯的低调日程引发政治专家的担忧，他们质疑在距离选举不到三周的情况下，她为何缺乏紧迫感……https://t.co/Mvhejh8Ajo"
“转推 @CollinRugg: 最新消息：MSNBC的Joy Reid提出新的阴谋论，指责*共和党人*操纵选举，称美国已经被…”
转发 @BillieJeanKing：距离可能是美国历史上最重要的选举还有整整3周的时间。 \n\n无论如何强调都不为过…
RT @CallForCongress: 这是一个选举策略\n\n立即实施武器禁运
“转发 @ScottAdamsSays: 如果特朗普在当前条件下输掉选举，我将不接受选举结果的有效性。\n\n我们距离…”
"@GuntherEagleman 我不知道，当她赢得选举时你会离开吗？"
RT @amarDgreat: @sardesairajdeep @ECISVEEP 首席选举专员在新闻发布会上表示：\n\n在哈里亚纳邦结果公布日：\n我们的计票开始…


In [19]:
tweet_sample = tweets.split("\n")[:10]

for tweet in tweet_sample:
    messages = [
        {"role": "system", "content": f"""rewrite the tweet delimited by {delimiter} in the tone of a 6-year-old kid"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

"RT @MikeNellis: I'm gonna do this one more time before the big voting day, so here we go... I'm gonna give some of my allowance to @KamalaHarris and other people who need it for the election stuff!"
Politics is like a big game! This time, it's not just about our neighborhood! Look at this cool link! https://t.co/rzmTMWE3dc https://t.co/NYCmAYPTb9
"RT @NotHoodlum: Glenn Youngkin is super worried about people cheating in voting. But he didn't say anything when his own kid, who's 17, got in trouble for it! That's kinda silly, right?"
"Hey! Why isn't she working hard? Some grown-ups are wondering why Harris isn't doing lots of stuff with only a little bit of time left before the big election day! They think she should be super busy! https://t.co/Mvhejh8Ajo"
"RT @CollinRugg: Guess what! Joy Reid on TV said that the Republicans are playing a sneaky game with the election, like when you hide the last cookie! She thinks America is like a big game of pretend!"
RT @BillieJeanKing: Only three mo
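
`str.strip(delimiter)` treats its argument as a set of characters, so `.strip('###')` removes every leading and trailing `#`, including the `#` of a hashtag at the start or end of a reply. A minimal sketch that removes only one exact delimiter from each end (`str.removeprefix` and `str.removesuffix` require Python 3.9 or later):

```python
def unwrap(reply: str, delimiter: str = "###") -> str:
    """Remove one leading and one trailing delimiter, keeping hashtags intact."""
    text = reply.strip()  # drop surrounding whitespace first
    return text.removeprefix(delimiter).removesuffix(delimiter)


print(unwrap("###Vote early! #Election2024###"))  # Vote early! #Election2024
print("#Election2024 is close".strip("###"))      # Election2024 is close
```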

### Inferring
- Use step-by-step instructions with delimiters to:
  1. Identify the sentiment
  2. Identify the emotion
  3. Extract the names of mentioned people
  4. Identify whether a tweet supports Democrats, Republicans, or neither
  5. Organize the outputs into a structured JSON document
- Identify topics in the tweets


In [20]:
tweet_sample = tweets.split("\n")[:10]

for tweet in tweet_sample:
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotion expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the names of the people mentioned;
                                        step 4 {delimiter} detect whether the tweet supports Democrats or Republicans, return the result in a single word;
                                        step 5 {delimiter} organize the result in a JSON document with the keys <sentiment>, <emotion>, <mentioned>, <support>
                                         Do not wrap the JSON in Markdown code fences and only return the JSON document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    print(openai_help(messages))

{
  "sentiment": "positive",
  "emotion": "supportive",
  "mentioned": ["MikeNellis", "KamalaHarris"],
  "support": "Democrat"
}
{
  "sentiment": "neutral",
  "emotion": "indifference",
  "mentioned": [],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "concern",
  "mentioned": ["Glenn Youngkin"],
  "support": "Democrat"
}
{
  "sentiment": "negative",
  "emotion": "concern",
  "mentioned": ["Harris"],
  "support": "Republican"
}
{
  "sentiment": "negative",
  "emotion": "accusatory",
  "mentioned": ["Collin Rugg", "Joy Reid"],
  "support": "Democrat"
}
{
  "sentiment": "neutral",
  "emotion": "anticipation",
  "mentioned": ["BillieJeanKing"],
  "support": "neutral"
}
{
  "sentiment": "neutral",
  "emotion": "cynicism",
  "mentioned": ["CallForCongress"],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "distrust",
  "mentioned": ["ScottAdamsSays", "Trump"],
  "support": "Republican"
}
{
  "sentiment": "neutral",
  "emotion": "curiosity",
  "mentione

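Because the model is asked to return bare JSON, each reply can be parsed and checked before it is used downstream. A sketch of a hypothetical `parse_analysis` helper, assuming the keys are exactly those requested in step 5:

```python
import json

EXPECTED_KEYS = {"sentiment", "emotion", "mentioned", "support"}


def parse_analysis(reply: str) -> dict:
    """Parse one per-tweet analysis reply and check that it has the requested keys."""
    result = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - set(result)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return result
```
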
In [21]:
tweet_sample = tweets.split("\n")[:10]

messages = [
        {"role": "system", "content": f"""analyze the tweets delimited by {delimiter} to identify 10 topics,
                                  return them as a JSON document, and do not wrap the JSON in Markdown code fences"""},
        {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter} "}]
print(openai_help(messages))

{
  "topics": [
    "Election Day Donations",
    "Local vs National Politics",
    "Voter Fraud Concerns",
    "Kamala Harris's Campaign Strategy",
    "Election Rigging Accusations",
    "Significance of Upcoming Election",
    "Election Ploys and Strategies",
    "Election Outcome Acceptance",
    "Election Results Speculation",
    "Election Process and Counting"
  ]
}


### Expanding with multiple prompts 
- Identify which party receives the most support
- Provide context in the system message
- Create a chatbot to answer users' questions


In [25]:
tweet_sample = tweets.split("\n")[:100]
analysis_result = []
from tqdm import tqdm
for tweet in tqdm(tweet_sample):
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotion expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the names of the people mentioned;
                                        step 4 {delimiter} detect whether the tweet supports Democrats or Republicans, return the result in a single word;
                                        step 5 {delimiter} organize the result in a JSON document with the keys <sentiment>, <emotion>, <mentioned>, <support>
                                         Do not wrap the JSON in Markdown code fences and only return the JSON document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    analysis_result.append(openai_help(messages))
print(analysis_result)

100%|██████████| 100/100 [01:42<00:00,  1.03s/it]






In [24]:
messages = [
        {"role": "system", "content": f"""analyze the tweet analysis results delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} count the number of tweets that support Democrats and Republicans;
                                        step 2 {delimiter} identify the common sentiments and emotions for each mentioned person;
                                        step 3 {delimiter} organize the result in a JSON document with the keys <Democrat count>, <Republican count>, <people name>
                                         Do not wrap the JSON in Markdown code fences and only return the JSON document"""},
        {"role": "user", "content": f"{delimiter}{analysis_result}{delimiter} "}]
analysis_summary = openai_help(messages)
print(analysis_summary)

{
  "Democrat count": 16,
  "Republican count": 34,
  "Kamala Harris": {
    "sentiments": ["positive", "negative", "neutral"],
    "emotions": ["supportive", "admiration", "satisfaction", "frustration", "criticism", "anticipation"]
  },
  "Donald Trump": {
    "sentiments": ["negative", "neutral"],
    "emotions": ["outrage", "humiliation", "concern", "informative", "anger"]
  },
  "MikeNellis": {
    "sentiments": ["positive"],
    "emotions": ["supportive"]
  },
  "Glenn Youngkin": {
    "sentiments": ["negative", "neutral"],
    "emotions": ["concern", "indifference"]
  },
  "Scott Adams": {
    "sentiments": ["negative"],
    "emotions": ["distrust"]
  },
  "MTGrepp": {
    "sentiments": ["negative", "neutral"],
    "emotions": ["frustration"]
  },
  "SenJohnKennedy": {
    "sentiments": ["negative"],
    "emotions": ["frustration"]
  },
  "Biden-Harris": {
    "sentiments": ["negative"],
    "emotions": ["frustration"]
  },
  "Joe Rogan": {
    "sentiments": ["negative"],
    "em

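Counting is something plain Python does exactly, whereas the counts produced by the LLM above are not guaranteed to be correct. A sketch that tallies the `support` field directly from `analysis_result`, assuming each entry is bare JSON with the keys requested earlier; replies that are not a JSON object go into a hypothetical `unparsed` bucket:

```python
import json
from collections import Counter


def tally_support(analysis_result: list[str]) -> Counter:
    """Count the 'support' labels across per-tweet JSON analysis strings."""
    counts = Counter()
    for reply in analysis_result:
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            data = None
        # Hypothetical "unparsed" bucket for replies that are not a JSON object
        label = data.get("support", "missing") if isinstance(data, dict) else "unparsed"
        counts[str(label)] += 1
    return counts


# Usage with the 100 results collected above:
# print(tally_support(analysis_result).most_common())
```

The model may return variants such as "Democrat" and "Democratic", so labels may need normalizing before the counts are compared.
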
In [26]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

chat_history = [

{"role": "system", "content": f"""You are an election chatbot. Answer user questions based on the tweets delimited by {delimiter}:
                                {delimiter}{tweet_sample}{delimiter}
                                If the user mentions a person listed in the following tweet analysis summary, report the corresponding sentiments and emotions:
                                {analysis_summary}
                            """}
]

def chatbot(prompt):

    chat_history.append({"role": "user", "content": prompt})

    response = client.chat.completions.create(
        model=model,  # use the model you prefer
        messages=chat_history,
        temperature=temperature
    )

    reply = response.choices[0].message.content

    chat_history.append({"role": "assistant", "content": reply})
    
    return reply
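
`chatbot` appends every turn to `chat_history`, and the system message already embeds the `tweet_sample` list and the analysis summary, so every request in a long session sends more tokens. A sketch of a hypothetical `trim_history` helper that keeps the system message and only the most recent turns; it could be used as `messages=trim_history(chat_history)` in the `create` call:

```python
def trim_history(history: list[dict], max_turns: int = 10) -> list[dict]:
    """Return the system message(s) plus the last max_turns user/assistant messages."""
    system = [m for m in history if m["role"] == "system"]
    recent = [m for m in history if m["role"] != "system"]
    if max_turns <= 0:
        return system
    return system + recent[-max_turns:]
```

The trade-off is memory: once a turn is trimmed, the model no longer sees it (for example, a name the user gave early in the conversation).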

In [None]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        print("Chatbot: Goodbye!")
        break
    reply = chatbot(user_input)
    print(f"Chatbot: {reply}")

You:  my name is kevin


Chatbot: Hello Kevin! How can I assist you today? Do you have any questions about the election or anything else?


You:  which party receves more support


Chatbot: Based on the data, the Republican count is higher at 34 mentions compared to the Democrat count, which is 16 mentions. This suggests that Republicans received more attention in the context of the provided tweets.


You:  what do people talk about harris


Chatbot: In the tweets mentioning Kamala Harris, the sentiments are mixed with positive, negative, and neutral tones. The emotions expressed include support, admiration, satisfaction, frustration, criticism, and anticipation. This indicates a diverse range of opinions and feelings towards her.


## References
- *“ChatGPT Prompt Engineering for Developers - DeepLearning.AI.”* n.d. DeepLearning.AI - Learning Platform. Accessed October 17, 2024. https://learn.deeplearning.ai/courses/chatgpt-prompt-eng/lesson/1/introduction.

- *“Building Systems with the ChatGPT API - DeepLearning.AI.”* n.d. DeepLearning.AI - Learning Platform. Accessed October 17, 2024. https://learn.deeplearning.ai/courses/chatgpt-building-system/lesson/1/introduction.

- *“OpenAI Documents.”* n.d. OpenAI Platform. Accessed October 18, 2024. https://platform.openai.com.
