# Prompt Engineering: Use OpenAI to Analyze Twitter Data 
This is a simple tutorial teaching prompt engineering basics and analyzing Twitter data with OpenAI large language models (LLM).
Please purchase an [OpenAI API](https://openai.com/index/openai-api/) and store it in a safe place. This tutorial uses [AWS Secretes Manager](https://aws.amazon.com/secrets-manager/) to store the API keys.  

## Large Language Model Basics
LLM repeatable predicts the next world using supervised learning. To predict the following sentence: 

`Learning data science in the cloud with AI`

A model needs to learn to predict the following steps:

|Input|Output|
|:---|---|
|Learning data science |in |
|Learning data science in |the | 
|Learning data science in the |cloud |
|Learning data science in the cloud |with |
|Learning data science in the cloud with |AI|

To train an LLM model:
1. Training a base LLM model on a large amount of training data to predict the next word 
2. Fine-tune on examples where outputs follow instructions in the input 
3. Human rates quality of different LLM outputs 
4. Tune LLM to generate outputs with higher rates using RLHF (Reinforcement learning from human feedback)

## Set up OpenAI Models

Load the API keys with AWS Secrets Manage Function 

In [1]:
import boto3
from botocore.exceptions import ClientError
import json

def get_secret(secret_name):
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        raise e

    secret = get_secret_value_response['SecretString']
    
    return json.loads(secret)

## Install Python libraries.

- pymongo: manage the MongoDB database
- openai: call the OpenAI APIs.

In [2]:
pip install openai

Collecting openai
  Downloading openai-1.70.0-py3-none-any.whl.metadata (25 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Downloading openai-1.70.0-py3-none-any.whl (599 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m599.1/599.1 kB[0m [31m25.1 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading distro-1.9.0-py3-none-any.whl (20 kB)
Downloading jiter-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (352 kB)
Installing collected packages: jiter, distro, openai
Successfully installed distro-1.9.0 jiter-0.9.0 openai-1.70.0
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install pymongo

Collecting pymongo
  Downloading pymongo-4.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m52.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.11.3
Note: you may need to restart the kernel to use updated packages.


Load the OpenAI API key and define a `openai_help` function.

In [4]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

def openai_help(messages, model=model, temperature =temperature ):
    messages = messages
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature

    )
    return response.choices[0].message.content

Temperature: 
- Low temperature: always choose the most likely response, reliable, predictable responses  
- High temperature: diverse responses, more creative responses

Tokens and Models: 
- LLM predicts tokens, which are commonly occurring sequences of characters. 
- One token is about four characters in English, and 100 tokens are roughly 75 words. Check [token estimate](https://platform.openai.com/tokenizer).
- Different models can process various amounts of tokens at different performance levels and costs. Check [OpenAI models](https://platform.openai.com/docs/models) for more details.

Roles:
- system: specify the overall tone or behavior of the assistant 
- user: instruction given to the LLM
- assistant: LLM responded content, we also can provide content in few-shot promoting or histories of conversations


A simple example using [gtp-4o](https://platform.openai.com/docs/models/gpt-4o) and temperature 0.

In [5]:
messages = [{"role": "user", "content": "What is the capital of USA"}]

print(openai_help(messages))

The capital of the United States is Washington, D.C.


Add a system message asking LLM to act as a high school teacher with different temperatures.

In [6]:
messages = [
    {"role": "system", "content": "use tone as a high school teacher"},
    {"role": "user", "content": "What is the capital of USA"}
    ]

print(openai_help(messages, temperature = 0.8))

The capital of the United States is Washington, D.C. It's not only the political heart of the country but also rich in history and culture. If you ever get a chance to visit, you'll find many iconic landmarks like the White House, the Capitol Building, and the Lincoln Memorial.


Add assistant messages to teach LLM what `##` is.

In [7]:
messages = [
    {"role": "user", "content": "What is 1##1"},
    {"role": "assistant", "content": "it is 11"},
    {"role": "user", "content": "What is 2##2"},
    {"role": "assistant", "content": "it is 22"},
    {"role": "user", "content": "What is 3##3"},
    ]
print(openai_help(messages))

It is 33.


## Prompt Engineering Principles 
- Use delimiters to separate different parts of a prompt to provide clear instructions and prevent prompt injections.
- Structure outputs in JSON documents or other formats to use the outputs in subsequent steps 
- Few-shot promoting: provide successful examples of a task and then ask the model to perform a similar task. 
- Chain of thought reasoning: request a series of reasoning steps in prompts to help the model achieve correct answers
- Chain of prompts: split a task into multiple prompts where each prompt can focus on a sub-task at a time and take different actions at different stages. It saves tokens, is easier to test, can involve human input, or use external tools.
- Interactive process 
  1. Try something first 
  2. Analyses the result, identify errors, and redefine the prompt 
  3. Test the prompts with different datasets 


An example using delimiters, structured output and few-shot promoting:

In [8]:
delimiter = '###'
sentence1 = 'I love cat.'
sentence2 = 'I love dog.'
messages = [
    {"role": "system", "content": f"""analyze the sentiment in a sentence delimitered by {delimiter},
                                     return the result as a JSON document"""},
    {"role": "user", "content": f"{delimiter}{sentence1}{delimiter}"},
    {"role": "assistant", "content": "{sentiment:positive}"},
    {"role": "user", "content": f"{delimiter}{sentence2}{delimiter}"}
    ]

print(openai_help(messages))

{ "sentiment": "positive" }


## Analyze Twitter data

### Connect to the MongoDB cluster

In [9]:
import pymongo
from pymongo import MongoClient
mongodb_connect = get_secret('mongodb')['connection_string']

mongo_client = MongoClient(mongodb_connect)
db = mongo_client.demo # use or create a database named demo
tweet_collection = db.tweet_collection #use or create a collection named tweet_collection
tweet_collection.create_index([("tweet.id", pymongo.ASCENDING)],unique = True) # make sure the collected tweets are unique

'tweet.id_1'

### Extract Tweets

In [10]:
filter={

    
}
project={
    'tweet.text': 1, 
    'tweet.id': 1
}
#rename the client to mongo_client
result = mongo_client['demo']['tweet_collection'].find(
  filter=filter,
  projection=project
)

In [11]:
tweet_data = []
for tweet in result:
    tweet_data.append(tweet['tweet']['text'])

In [12]:
print('Number of tweets: ',len(tweet_data))

Number of tweets:  99


### Summarization 
- Analyze election tweets with delimiters 
- Change the size of the summarization 
- Summarize tweets and focus on different perspectives. 

In [13]:
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter}"""},
    {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter}"},
    ]

print(openai_help(messages))

The tweets discuss various perspectives and developments related to generative AI. Some users express skepticism or criticism, highlighting ethical concerns, the potential for misuse, and the impact on creative industries. Others focus on the technological advancements and applications, such as new AI models, courses, and business opportunities. There are mentions of generative AI's role in art, media, and customer service, as well as its potential to transform industries like gaming and healthcare. Additionally, some tweets emphasize the importance of human creativity and the limitations of AI-generated content.


In [14]:
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter},
                                    limit the summary to 20 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter}"},
    ]

print(openai_help(messages))

The tweets discuss generative AI's impact on art, ethics, and industries, highlighting both excitement and criticism.


In [15]:
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter},
                                    focus on how people discuss AI,
                                    limit the summary to 50 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter}"},
    ]

print(openai_help(messages))

People discuss AI with a mix of enthusiasm and skepticism. Generative AI is seen as both a tool for innovation and a threat to creativity, with debates on its ethical implications and impact on artists. Some celebrate its potential in fields like customer service and media, while others criticize its environmental impact and authenticity.


### Moderation 
- Iterate each tweet and use the [moeration endpoint](https://platform.openai.com/docs/api-reference/moderations) to identify flagged tweets
- Print flagged tweets


In [16]:
def flag_help(tweet):
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=tweet)

    if response.results[0].flagged:
        print('===')
        cat_dict = response.results[0].categories.to_dict()
        for cat in cat_dict.keys():
            if cat_dict.get(cat):
                print (cat)
                print(tweet)

In [17]:
for tweet in tweet_data:
    flag_help(tweet)

===
harassment
RT @Chorrozard: STOP FUCKING POSTING GENERATIVE AI ON MY FUCKING TIMELINE!!!!!!!!!
===
harassment
RT @Artistreccs: Made this account as a huge fuck you to Ai generative art and the the rest of them losers tbh 

Commission real artists 👍
===
harassment
RT @imzeferino: generative AI art is is soulless, sad and pathetic... i won't even bother calling it ugly, because that is beside the point…
===
harassment
RT @Artistreccs: Made this account as a huge fuck you to Ai generative art and the the rest of them losers tbh 

Commission real artists 👍
===
harassment
RT @Artistreccs: Made this account as a huge fuck you to Ai generative art and the the rest of them losers tbh 

Commission real artists 👍
===
harassment
RT @Artistreccs: Made this account as a huge fuck you to Ai generative art and the the rest of them losers tbh 

Commission real artists 👍


### Transforming
- Translating to a different language 
- Transform tones, such as formal vs. informal.  


In [19]:
for tweet in tweet_data:
    messages = [
        {"role": "system", "content": f"""translate the tweets delimited by {delimiter} into Chinese"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

RT @redstickynotes: 所以这是人身攻击谬误

另外，我有计算机科学学位，我不是艺术家，并且我反对这种生成式……
RT @jblefevre60: 💥8个生成式AI流行词！

#AI #机器学习 #深度学习 #数据科学 #生成式AI #LLM #Python #编程 #100天…
@NatesManagement @Blavin24 @Cedralian 并且对于 @Cedralian 的观点，如果你不喜欢，就不要使用这个平台。
我不同意这个伦理争论，简单的事实是衍生作品和模仿作品在法律上已经被允许超过100年。作为一个在生成式AI出现之前写过超过140个模仿作品的作家
RT @HorrorHijabi: 从未使用过任何类型的生成式AI或参与过任何AI趋势的感受 https://t.co/iTnP1I3ci5
RT @stupidformo: 用某个不仅未同意而且明确反对的人物形象/写作风格来训练生成式AI…
@le_wokeisme 那就不要只监管AI艺术，而是全面监管生成式AI。我认为让人们能够创造出虚假的图像并让人们误以为是真实的，这并不好。
我们很自豪地被 @Gartner_inc 在其2025年生成式AI创新指南中提及。

在 Ada，我们从不认为优秀的客户服务仅仅来自于表面层次的自动化。它需要坚实的基础——如深度知识管理、智能编排，https://t.co/wTFlQ7yc1L
RT @WIRED: 一个生成式AI应用程序使用的不安全数据库泄露了提示和数万张露骨图像，其中一些可能涉及未成年人。
RT @freezetheberry: 在生成式人工智能“艺术”日益流行的时代，我们有这样的艺术家证明机器永远无法替代……
RT @hankgreen: 我们从 Web2 学到的是，最有影响力的企业通过优化成瘾性来变得最有影响力。…
转发 @jblefevre60: 💥8个生成式AI流行词！ #AI #机器学习 #深度学习 #数据科学 #生成式AI #LLM #Python #编程 #100天…
RT @notasnervous: @TheKnownOtaku 当他说那句话时，他甚至没有在评论AI艺术。他当时正在观看一些程序生成的渲染……
RT @nosgov: 这就像说一部电影不是真的，因为它是用摄像机拍摄并在电脑上编辑的，而不是每次都现场重现…

In [21]:
for tweet in tweet_data:
    messages = [
        {"role": "system", "content": f"""rewrite the tweets delimited by {delimiter} in the tone like Stewie """},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

Ah, the classic ad hominem fallacy, how delightfully pedestrian. And might I add, I possess a computer science degree, not that of an artist, and I stand firmly against this Generative nonsense.
Ah, splendid! A delightful collection of generative AI buzzwords to tickle one's intellect. How positively riveting! #AI #MachineLearning #DeepLearning #DataScience #GenerativeAI #LLM #Python #Coding #100DaysOf…
Oh, how delightfully droll! If you find yourself in a tizzy over the platform, might I suggest simply not using it? As for the ethical hullabaloo, do remember that derivative works and parody have been the toast of the legal town for over a century. As a writer with a repertoire of over 140 parodies, long before generative AI decided to join the party, I must say, it's all rather amusing, isn't it?
Oh, how delightfully quaint! It's as if one has been living under a rock, blissfully unaware of the digital revolution swirling around them. Do tell, what is it like to reside in such splendi

KeyboardInterrupt: 

### Inferring
- Use step-by-step instructions with delimiters to:
  1. Identify sentiments
  2. Identify emotions
  3. Extract mentioned people's names
  3. Identify whether a tweet supports Democratic, Republican, or unknown 
  4. Extract outputs into a structured JSON document. 
- Identify topics from Tweets. 


In [22]:
for tweet in tweet_data:
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotions expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the mentioned peoples;
                                        step 4 {delimiter} detect whether the tweet support Democratic or Replublican, return the resunt in a single word;
                                        step 5 {delimiter} organize the result in a json document with the keys <sentiment>, <emontion>,<mentioned>, <support>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    print(openai_help(messages))

{
  "sentiment": "neutral",
  "emotion": "indifference",
  "mentioned": ["redstickynotes"],
  "support": "neutral"
}
{
  "sentiment": "neutral",
  "emotion": "informative",
  "mentioned": ["jblefevre60"],
  "support": "neutral"
}
{
  "sentiment": "neutral",
  "emotion": "disagreement",
  "mentioned": ["@NatesManagement", "@Blavin24", "@Cedralian"],
  "support": "neutral"
}
{
  "sentiment": "neutral",
  "emotion": "indifference",
  "mentioned": ["HorrorHijabi"],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "disapproval",
  "mentioned": ["stupidformo"],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "concern",
  "mentioned": ["@le_wokeisme"],
  "support": "neutral"
}
{"sentiment": "positive", "emotion": "pride", "mentioned": ["@Gartner_inc"], "support": "neutral"}
{
  "sentiment": "negative",
  "emotion": "concern",
  "mentioned": ["WIRED"],
  "support": "neutral"
}
{
  "sentiment": "neutral",
  "emotion": "indifference",
  "mentioned": ["freezet

KeyboardInterrupt: 

In [23]:

messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} to identify 10 topics, 
                                  Do not wrap the json codes in JSON markers """},
        {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter} "}]
print(openai_help(messages))

{
  "1": "Generative AI and Art",
  "2": "Ethical Concerns of Generative AI",
  "3": "Generative AI in Business and Innovation",
  "4": "Generative AI and Data Security",
  "5": "Generative AI in Education and Learning",
  "6": "Generative AI in Media and Content Creation",
  "7": "Generative AI and Environmental Impact",
  "8": "Generative AI and Legal Issues",
  "9": "Generative AI and Cultural Impact",
  "10": "Generative AI and Technological Advancements"
}


### Expanding with multiple prompts 
- Identify which party receives majority supports
- Provide contexts in the system message
- Create a chatbot to answer users’ inquiry  


In [24]:
analysis_result = []
from tqdm import tqdm
for tweet in tqdm(tweet_data):
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotions expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the mentioned peoples;
                                        step 4 {delimiter} detect whether the tweet support Democratic or Replublican, return the resunt in a singple word;
                                        step 5 {delimiter} organize the result in a json document with the keys <sentiment>, <emontion>,<mentioned>, <support>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    analysis_result.append(openai_help(messages))


100%|██████████| 99/99 [01:29<00:00,  1.11it/s]


In [25]:
print(analysis_result)

['{\n  "sentiment": "neutral",\n  "emotion": "indifference",\n  "mentioned": ["redstickynotes"],\n  "support": "neutral"\n}', '{\n  "sentiment": "neutral",\n  "emotion": "informative",\n  "mentioned": ["jblefevre60"],\n  "support": "neutral"\n}', '{\n  "sentiment": "neutral",\n  "emotion": "disagreement",\n  "mentioned": ["NatesManagement", "Blavin24", "Cedralian"],\n  "support": "neutral"\n}', '{\n  "sentiment": "neutral",\n  "emotion": "indifference",\n  "mentioned": ["HorrorHijabi"],\n  "support": "neutral"\n}', '{\n  "sentiment": "negative",\n  "emotion": "disapproval",\n  "mentioned": ["stupidformo"],\n  "support": "neutral"\n}', '{\n  "sentiment": "negative",\n  "emotion": "concern",\n  "mentioned": ["@le_wokeisme"],\n  "support": "neutral"\n}', '{\n  "sentiment": "positive",\n  "emotion": "pride",\n  "mentioned": [\n    "@Gartner_inc"\n  ],\n  "support": "neutral"\n}', '{\n  "sentiment": "negative",\n  "emotion": "concern",\n  "mentioned": ["WIRED"],\n  "support": "neutral"\n}',

In [26]:
messages = [
        {"role": "system", "content": f"""analyze the tweet analysis reuslt delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} count the number of tweets that support Democratic and Republican;
                                        step 2 {delimiter} identify the common sentiments and emotoions to each mentioned people;
                                        step 3 {delimiter} organize the result in a json document with keys <Democratic count>, <Republican count>, <people name>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{analysis_result}{delimiter} "}]
analysis_summary = openai_help(messages)
print(analysis_summary)

{
  "Democratic count": 0,
  "Republican count": 0,
  "people name": {
    "redstickynotes": {
      "sentiments": ["neutral"],
      "emotions": ["indifference"]
    },
    "jblefevre60": {
      "sentiments": ["neutral"],
      "emotions": ["informative"]
    },
    "NatesManagement": {
      "sentiments": ["neutral"],
      "emotions": ["disagreement"]
    },
    "Blavin24": {
      "sentiments": ["neutral"],
      "emotions": ["disagreement"]
    },
    "Cedralian": {
      "sentiments": ["neutral"],
      "emotions": ["disagreement"]
    },
    "HorrorHijabi": {
      "sentiments": ["neutral"],
      "emotions": ["indifference"]
    },
    "stupidformo": {
      "sentiments": ["negative"],
      "emotions": ["disapproval"]
    },
    "@le_wokeisme": {
      "sentiments": ["negative"],
      "emotions": ["concern"]
    },
    "@Gartner_inc": {
      "sentiments": ["positive"],
      "emotions": ["pride"]
    },
    "WIRED": {
      "sentiments": ["negative"],
      "emotions": ["co

## Create a chatbot

In [27]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

chat_history = [

{"role": "system", "content": f"""you are a chabot answer user questions based on the tweets,
                                {delimiter}{tweet_data}{delimiter}, 
                                if user mentioned a people name in the {delimiter}{analysis_summary}{delimiter} people field,report the corresponding sentiment and emotion,
                            
                            """}
]

def chatbot(prompt):

    chat_history.append({"role": "user", "content": prompt})

    response = client.chat.completions.create(
        model=model,  # Use the model you prefer
        messages=chat_history
    )

    reply = response.choices[0].message.content

    chat_history.append({"role": "assistant", "content": reply})
    
    return reply

In [None]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        print("Chatbot: Goodbye!")
        break
    reply = chatbot(user_input)
    print(f"Chatbot: {reply}")

## Reference
- Isa Fulford and Andrew Ng. n.d.-a. *“Building Systems with the ChatGPT API.”* DeepLearning.AI. Accessed October 25, 2024. https://www.deeplearning.ai/short-courses/building-systems-with-chatgpt/.
- ———. n.d.-b. *“ChatGPT Prompt Engineering for Developers.”* DeepLearning.AI. Accessed October 25, 2024. https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/.
- OpenAI. n.d. *“OpenAI Documents.”* OpenAI. Accessed October 18, 2024. https://platform.openai.com.
