# OpenAI Responses API: Advanced Tweet Analysis with File & Web Search Integration

## What is the OpenAI Responses API?

The Responses API is a new API that OpenAI released in March 2025. It combines features of the 
Chat Completions API and the Assistants API, providing support for:

- **Traditional Chat Completions:** Generates conversational responses, as the Chat Completions API does.
- **Web Search:** Enables real-time information retrieval from the internet.
- **File Search:** Allows searching within files for relevant data.

OpenAI has announced that the Assistants API will be deprecated, with a target sunset in 2026.

> **For new users, OpenAI recommends using the Responses API instead of the Chat Completions API to leverage its expanded capabilities.**

For a comprehensive comparison between the Responses API and the Chat Completions API, refer to the official OpenAI documentation: 
[Responses vs. Chat Completions](https://platform.openai.com/docs/guides/responses-vs-chat-completions).

## Summary of This Notebook
This notebook provides a hands-on guide for using the **OpenAI Responses API** to analyze tweets. 
It covers essential techniques such as:

- **Connecting to a MongoDB database** to store and retrieve tweets.
- **Extracting tweets** and converting them into a structured format for further analysis.
- **Creating a vector store** and uploading tweets for semantic search.
- **Using file search** to analyze private datasets.
- **Performing web search** to retrieve the latest public information.
- **Utilizing stateful responses** to maintain conversation context.
- **Combining file and web search** to enhance retrieval-augmented generation (RAG) applications.

By the end of this notebook, users will be able to integrate OpenAI's Responses API for efficient data retrieval 
and analysis of structured and unstructured data.

## Install Required Libraries
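To use the OpenAI Responses API, interact with a MongoDB database, and load credentials from AWS Secrets Manager, we need to install the following libraries:

- **`openai`**: Provides access to OpenAI's APIs, including the Responses API.
- **`pymongo`**: A Python driver for MongoDB, used to store and retrieve tweets.
- **`boto3`**: The AWS SDK for Python, used to read API credentials from AWS Secrets Manager.

In [1]:
%pip install openai pymongo boto3 -q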

Note: you may need to restart the kernel to use updated packages.


## Import Required Libraries

In [2]:
from IPython.display import Markdown, display
import boto3
from botocore.exceptions import ClientError
import json
import io

## Retrieve Secrets from AWS Secrets Manager

In [3]:
def get_secret(secret_name, region_name="us-east-1"):
    """Fetch a JSON secret from AWS Secrets Manager and return it as a dict."""
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError:
        # Re-raise so the caller sees the original AWS error (e.g. access denied, secret not found)
        raise

    secret = get_secret_value_response['SecretString']
    return json.loads(secret)
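`get_secret` is called once for MongoDB and once for OpenAI, and every re-run of those cells repeats the AWS request. A minimal sketch, assuming you are happy to keep secrets in memory for the lifetime of the kernel: `make_cached_secret_getter` is a hypothetical helper that memoizes any function returning the raw `SecretString`, so it can be exercised without AWS access.

```python
import json
from functools import lru_cache
from typing import Callable, Dict


def make_cached_secret_getter(fetch_secret_string: Callable[[str], str]) -> Callable[[str], Dict]:
    """Return a get_secret-style function that fetches each secret at most once.

    `fetch_secret_string` is any callable mapping a secret name to its raw JSON
    string (hypothetical: e.g. a thin wrapper around get_secret_value).
    """

    @lru_cache(maxsize=None)
    def _raw(secret_name: str) -> str:
        # Only the first call per secret name reaches the underlying fetcher.
        return fetch_secret_string(secret_name)

    def get_secret_cached(secret_name: str) -> Dict:
        # Parse on every call so callers receive a fresh dict they can mutate safely.
        return json.loads(_raw(secret_name))

    return get_secret_cached


# Usage with a stand-in fetcher (replace with a real Secrets Manager call):
calls = []

def fake_fetch(name: str) -> str:
    calls.append(name)
    return json.dumps({"api_key": f"key-for-{name}"})

get_secret_cached = make_cached_secret_getter(fake_fetch)
print(get_secret_cached("openai")["api_key"])  # key-for-openai
print(get_secret_cached("openai")["api_key"])  # served from the cache
print(len(calls))  # 1
```

To wire it to AWS, the fetcher could be something like `lambda name: boto3.client("secretsmanager", region_name="us-east-1").get_secret_value(SecretId=name)["SecretString"]`.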

## Connect to MongoDB

In [4]:
import pymongo
from pymongo import MongoClient

mongodb_connect = get_secret('mongodb')['connection_string']

mongo_client = MongoClient(mongodb_connect)
db = mongo_client.demo  # use the "demo" database (MongoDB creates it on first write)
tweet_collection = db.tweet_collection  # use the "tweet_collection" collection (created on first write)
# tweet_collection.create_index([("tweet.id", pymongo.ASCENDING)], unique=True)  # make sure the collected tweets are unique

## Extract Tweets from MongoDB

In [5]:
query_filter = {}  # empty filter: match every document in the collection
projection = {
    'tweet.text': 1,  # return only the tweet text
    '_id': 0          # exclude MongoDB's _id field
}
result = tweet_collection.find(
    filter=query_filter,
    projection=projection
)
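With the projection `{'tweet.text': 1, '_id': 0}`, MongoDB returns nested documents of the form `{'tweet': {'text': '...'}}`, and a document with no `tweet.text` field comes back as `{}`. If you prefer to work with plain strings, the hypothetical helper below flattens those documents and skips empty ones; it is a sketch, not part of the original workflow.

```python
from typing import Iterable, List, Mapping


def extract_tweet_texts(docs: Iterable[Mapping]) -> List[str]:
    """Flatten projected documents like {'tweet': {'text': ...}} into a list of strings."""
    texts = []
    for doc in docs:
        tweet = doc.get("tweet") or {}
        text = tweet.get("text") if isinstance(tweet, Mapping) else None
        # Skip documents that had no tweet text (the projection returns {} for them).
        if isinstance(text, str) and text.strip():
            texts.append(text.strip())
    return texts


sample = [
    {"tweet": {"text": "Cyber threats are constant.  "}},
    {},                 # document without tweet.text
    {"tweet": {}},      # tweet object without a text field
    {"tweet": {"text": "Patch early, patch often."}},
]
print(extract_tweet_texts(sample))
# ['Cyber threats are constant.', 'Patch early, patch often.']
```

Because a pymongo cursor can only be iterated once, apply the helper to a materialized list such as `list(result)` rather than to the cursor itself.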

After retrieving tweets from MongoDB, we convert the query cursor into a list and serialize it to a JSON-formatted string
(`default=str` turns any values that JSON cannot represent, such as `ObjectId` or `datetime`, into strings).
Using `io.BytesIO`, we wrap that string in an in-memory file, so nothing has to be written to disk.
This is useful when a file is needed only temporarily, for example to upload a dataset to OpenAI's Files API
for use with file search, or to cloud storage for further analysis.

In [6]:
result_list = list(result)

# Convert result list to JSON string
json_data = json.dumps(result_list, default=str, indent=4)

# Create an in-memory JSON file
json_bytes = io.BytesIO(json_data.encode("utf-8"))
json_bytes.name = "tweet.json" 
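The vector store splits uploaded files into chunks by token count, so the search results shown further below start and end mid-JSON, and emoji appear as escapes like `\ud83d\udd12` because `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`). One alternative worth trying, offered as an untested assumption rather than a proven improvement, is to upload plain text with one tweet per paragraph. `build_text_file` is a hypothetical helper; the `.txt` name tells the upload which format it is.

```python
import io
from typing import Iterable


def build_text_file(texts: Iterable[str], name: str = "tweets.txt") -> io.BytesIO:
    """Write one tweet per paragraph into an in-memory UTF-8 text file."""
    # Collapse internal whitespace and newlines so each tweet stays a single paragraph.
    paragraphs = [" ".join(t.split()) for t in texts if t and t.strip()]
    buffer = io.BytesIO("\n\n".join(paragraphs).encode("utf-8"))
    buffer.name = name  # the file name (and extension) is sent along with the upload
    return buffer


buf = build_text_file(["First tweet\nwith a line break", "Second tweet 🔒"])
print(buf.read().decode("utf-8"))
```

The resulting buffer could then be passed to `client.files.create(file=buf, purpose="assistants")` in place of `json_bytes`; call `buf.seek(0)` first if it has already been read.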

In [7]:
print('Number of tweets: ',len(result_list))

Number of tweets:  187


## Initialize OpenAI Client

In [8]:
from openai import OpenAI
openai_api_key = get_secret('openai')['api_key']

client = OpenAI(api_key=openai_api_key)

## File Search API

### Introduction to File Search
The file search tool lets a model retrieve relevant information from uploaded files 
that have been indexed in a vector store. This is particularly useful 
for searching large datasets, extracting insights, and building retrieval-augmented generation (RAG) applications.

Rather than relying on exact keyword matches alone, file search uses embeddings 
to find semantically relevant content (alongside keyword search), which makes it well suited to analyzing structured 
and unstructured text data (OpenAI, 2025).

For more details, visit the official OpenAI documentation: 
[File Search in Responses API](https://platform.openai.com/docs/guides/tools-file-search).
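To illustrate the idea behind embedding-based retrieval, here is a toy sketch that ranks documents by cosine similarity between vectors. The three-dimensional vectors are invented for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions, and OpenAI's file search also applies keyword search and its own ranking, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: Sequence[float],
                       docs: Dict[str, Sequence[float]],
                       k: int = 2) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, highest score first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Invented 3-d "embeddings": axes loosely mean (security, finance, sports).
toy_docs = {
    "ransomware hits hospital": [0.9, 0.2, 0.0],
    "bank reports phishing wave": [0.7, 0.6, 0.0],
    "team wins championship": [0.0, 0.1, 0.95],
}
toy_query = [1.0, 0.1, 0.0]  # stand-in for the embedding of "current cyber threats"
for name, score in rank_by_similarity(toy_query, toy_docs):
    print(f"{score:.3f}  {name}")
```

Neither top result shares a keyword with the query; they rank highly only because their vectors point in a similar direction, which is the property semantic search relies on.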

### Create a Vector Store

In [9]:
vector_store = client.vector_stores.create(
    name="tweet_base"
)
vector_store_id = vector_store.id
# print(vector_store_id)

### Upload Tweets File

In [10]:
file = client.files.create(
    file=json_bytes,
    purpose="assistants",
)

file_id = file.id
# print(file_id)

### Attach File to Vector Store

In [11]:
attach_status = client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_id
)

# print(attach_status.id)
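Attaching a file starts asynchronous processing (chunking and embedding), so a search issued immediately afterwards may return nothing. The vector-store file object reports a `status` of `in_progress`, `completed`, `cancelled`, or `failed`. A minimal polling sketch, assuming `client.vector_stores.files.retrieve(file_id, vector_store_id=...)` as in recent versions of the `openai` Python SDK (which, depending on version, also offers a `create_and_poll` helper for the same purpose). `wait_until_indexed` is a hypothetical name, and `sleep` is injectable so the function can be tested without the API.

```python
import time
from typing import Any, Callable


def wait_until_indexed(client: Any, vector_store_id: str, file_id: str,
                       timeout: float = 120.0, interval: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a vector-store file until processing finishes; return the final status."""
    waited = 0.0
    while True:
        vs_file = client.vector_stores.files.retrieve(file_id, vector_store_id=vector_store_id)
        if vs_file.status != "in_progress":
            return vs_file.status  # "completed", "failed", or "cancelled"
        if waited >= timeout:
            raise TimeoutError(f"File {file_id} still in progress after {timeout} s")
        sleep(interval)
        waited += interval
```

In the notebook this would run between attaching the file and querying, e.g. `status = wait_until_indexed(client, vector_store_id, file_id)`, and the search should only proceed when `status == "completed"`.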

### Query the Vector Store

In [12]:
query = "current cyber threats"

In [13]:
search_results = client.vector_stores.search(
    vector_store_id=vector_store_id,
    query=query
)

for result in search_results.data[:5]:
    print(result.content[0].text[:100] + '\n Relevant score: ' + str(result.score))

{
        "tweet": {
            "text": "RT @Arcteraio: Is Your Data Truly Protected? \ud83d\udd12\
 Relevant score: 0.6960252957328726
{
        "tweet": {
            "text": "Cyber threats are constant, but so is our commitment to pr
 Relevant score: 0.6352605431429063
Yes! \ud83d\udd12\n\nWith quantum computing on the horizon, traditional encrypt\u2026"
        }
   
 Relevant score: 0.6256805606161229
including groups linked to state-sponsored"
        }
    },
    {
        "tweet": {
            "t
 Relevant score: 0.6256509189654528
\n\nhttps://t.co/krKYfaomlN https://t.co/1E6uxXUMw6"
        }
    },
    {
        "tweet": {
     
 Relevant score: 0.6148298490216632
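The raw printout above slices each chunk to its first 100 characters, so results begin mid-JSON and keep their literal line breaks. A hypothetical formatting helper, assuming each search hit exposes `score`, a `content` list whose items have a `text` attribute (as the loop above uses), and optionally `filename`:

```python
from typing import Iterable, List


def format_hits(hits: Iterable, width: int = 100) -> List[str]:
    """Render search hits as one line each: rank, score, file name, collapsed snippet."""
    lines = []
    for rank_no, hit in enumerate(hits, start=1):
        # Join all text parts of the chunk and collapse whitespace and newlines.
        text = " ".join(part.text for part in hit.content)
        snippet = " ".join(text.split())
        if len(snippet) > width:
            snippet = snippet[: width - 3] + "..."
        filename = getattr(hit, "filename", None) or "?"
        lines.append(f"{rank_no}. [{hit.score:.3f}] {filename}: {snippet}")
    return lines
```

Usage would be `print("\n".join(format_hits(search_results.data[:5])))`. The vector store search endpoint also accepts a `max_num_results` argument, which limits the number of hits on the server side instead of slicing afterwards.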


## OpenAI Responses API

### Simple Response

In [14]:
simple_response = client.responses.create(
  model="gpt-4o",
  input=[
      {
          "role": "user",
          "content": query
      }
  ]
)

In [15]:
display(Markdown(simple_response.output_text))

As of the latest information, several prominent cyber threats are affecting individuals and organizations worldwide:

1. **Ransomware**: This remains a significant threat, with attackers encrypting data and demanding payment for its release. High-profile attacks target critical infrastructure and large enterprises.

2. **Phishing and Spear Phishing**: Cybercriminals are using increasingly sophisticated tactics to trick individuals into revealing sensitive information via deceptive emails, messages, or websites.

3. **Malware**: This includes various malicious software types, such as viruses, worms, and trojans, used to damage or infiltrate systems.

4. **Credential Stuffing**: Attackers use automated tools to try a large number of username-password combinations to gain unauthorized access to accounts.

5. **Business Email Compromise (BEC)**: Scammers impersonate company executives or trusted contacts to trick employees into transferring funds or sharing sensitive information.

6. **Distributed Denial-of-Service (DDoS) Attacks**: Attackers flood a network with traffic to disrupt service, often demanding ransom to cease the attack.

7. **Zero-Day Exploits**: These are attacks on software vulnerabilities before developers can issue a patch or fix, targeting both well-known and emerging software.

8. **IoT Vulnerabilities**: With the rise of the Internet of Things, weak security in connected devices can be exploited to gain network access.

9. **Insider Threats**: Employees or contractors with access to sensitive information may intentionally or unintentionally compromise security.

10. **Deepfakes and Synthetic Media**: These technologies are used for misinformation, impersonation, and fraud, making detection challenging.

11. **Supply Chain Attacks**: Cybercriminals target less secure elements of the supply chain to infiltrate larger organizations.

Staying informed about these threats and implementing strong cybersecurity measures is essential to mitigate risk.

### File Search Response

In [16]:

file_search_response = client.responses.create(
    input=query,
    model="gpt-4o",
    temperature=0,
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    }]
)

In [17]:
display(Markdown(file_search_response.output_text))


Current cyber threats include:

1. **Ransomware Attacks**: These are increasingly targeting various sectors, including architecture and corporate environments.

2. **Phishing and Social Engineering**: These tactics are used to deceive individuals into revealing sensitive information.

3. **State-Sponsored Attacks**: Groups linked to state actors, such as those from China, are targeting critical infrastructure.

4. **AI-Driven Threats**: AI is being used both to enhance cybersecurity and to develop more sophisticated cyber threats.

5. **Financial Crime in Virtual Assets**: The digital asset ecosystem is a new battlefield for cyber threats.

6. **Critical Infrastructure Vulnerabilities**: Cyber threats to transportation infrastructure could impact military mobilization.

7. **Quantum Computing Risks**: Emerging technologies like quantum computing pose new challenges to traditional encryption methods.

These threats highlight the need for robust cybersecurity measures and continuous adaptation to evolving risks.
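`output_text` contains only the final answer; the sources that backed it are recorded in the response's `output` items. A sketch, assuming the response has been converted to a plain dict with `response.model_dump()` and follows the documented shape in which `message` items hold `output_text` parts with an `annotations` list (type `file_citation` with `file_id` and `filename`, or `url_citation` with `url` and `title`). `collect_citations` is a hypothetical helper, and fields are read with `.get` in case a given SDK version omits one.

```python
from typing import Dict, List


def collect_citations(response: Dict) -> List[Dict[str, str]]:
    """Return unique file and URL citations found in a Responses API output dict."""
    seen = set()
    citations = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool-call items such as file_search_call or web_search_call
        for part in item.get("content", []):
            for ann in part.get("annotations") or []:
                if ann.get("type") == "file_citation":
                    key = ("file", ann.get("file_id"))
                    entry = {"kind": "file", "id": ann.get("file_id"), "label": ann.get("filename") or ""}
                elif ann.get("type") == "url_citation":
                    key = ("url", ann.get("url"))
                    entry = {"kind": "url", "id": ann.get("url"), "label": ann.get("title") or ""}
                else:
                    continue
                if key not in seen:  # the same source is often cited several times
                    seen.add(key)
                    citations.append(entry)
    return citations
```

For the cell above this would be `collect_citations(file_search_response.model_dump())`; the same helper works on the web search responses below, where it picks up `url_citation` entries.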

## Web Search API

### Introduction to Web Search
The OpenAI Web Search tool allows models to retrieve real-time information from the internet. 
This capability is particularly useful for obtaining up-to-date data, fact-checking, and expanding knowledge 
without relying solely on pre-trained information. 

By using OpenAI's web search tool, the Responses API can fetch external data 
and ground its answers in current sources, citing the pages it used (OpenAI, 2025). 
This feature enhances applications that require the latest insights, such as news aggregation, research, 
or dynamic content generation.

For more details, visit the official OpenAI documentation: 
[Web Search in Responses API](https://platform.openai.com/docs/guides/tools-web-search).

### Perform Web Search

In [18]:
web_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [19]:
display(Markdown(web_search_response.output_text))

As of April 2025, the cyber threat landscape is increasingly complex, with several prominent threats affecting various sectors:

**1. State-Sponsored Cyberattacks:**
Geopolitical tensions have led to a surge in cyberwarfare activities. Notably, in 2024, Russian-linked hackers targeted rural Texas water plants, highlighting vulnerabilities in critical infrastructure. These attacks often serve as tests or warnings rather than causing immediate harm. Additionally, countries like China and Iran have been implicated in digital espionage campaigns aimed at stealing sensitive data and planting malware for potential future conflicts. ([apnews.com](https://apnews.com/article/9eceaf30ddc984ed482f067db5dee405?utm_source=openai))

**2. AI-Driven Cybercrime:**
Criminal networks are increasingly leveraging artificial intelligence to enhance the speed, reach, and sophistication of their operations. This includes creating advanced malware, generating deceptive synthetic media, and conducting targeted cyberattacks on governments and critical infrastructure. Europol has raised concerns about these AI-driven attacks, emphasizing the need for robust regulations and international cooperation to mitigate these evolving threats. ([ft.com](https://www.ft.com/content/755593c8-8614-4953-a4b2-09a0d2794684?utm_source=openai))

**3. Ransomware Attacks:**
Ransomware remains a significant threat, with attacks becoming more frequent and sophisticated. The Royal ransomware group, active since 2022, is known for aggressive targeting, high ransom demands, and double extortion tactics. They have impacted various sectors, including healthcare, finance, and critical infrastructure, with ransom demands ranging from $250,000 to over $2 million. ([en.wikipedia.org](https://en.wikipedia.org/wiki/Royal_%28cyber_gang%29?utm_source=openai))

**4. Supply Chain Attacks:**
Cybercriminals are increasingly targeting supply chains to infiltrate organizations. In October 2024, attackers published over 287 malicious packages on the Node Package Manager (NPM) platform, using typosquatting techniques to trick developers into downloading compromised code. Such attacks can have widespread implications, affecting numerous organizations and their customers. ([en.wikipedia.org](https://en.wikipedia.org/wiki/Supply_chain_attack?utm_source=openai))

**5. Exploitation of Zero-Day Vulnerabilities:**
There has been a notable increase in the exploitation of zero-day vulnerabilities. In 2023, 97 unique zero-day vulnerabilities were exploited in the wild, a 56% increase from the previous year. Chinese cyber espionage groups were particularly active in leveraging these vulnerabilities, underscoring the growing threat from state-sponsored actors. ([la-cyber.com](https://www.la-cyber.com/Current-Threat-Data.php?id=2890&utm_source=openai))

**6. Insider Threats:**
Insider threats continue to pose significant risks, with employees or contractors intentionally or unintentionally compromising security. Regular training sessions on secure practices and implementing role-based access controls are essential measures to mitigate these risks. ([markets.businessinsider.com](https://markets.businessinsider.com/news/stocks/the-top-five-cyber-threats-impacting-every-industry-and-how-to-avoid-them-1033445156?utm_source=openai))

**7. Cloud Security Vulnerabilities:**
As organizations increasingly adopt cloud services, vulnerabilities in cloud security have become more prominent. Ensuring proper configuration, regular audits, and adherence to best practices are crucial to safeguarding data in the cloud. ([markets.businessinsider.com](https://markets.businessinsider.com/news/stocks/the-top-five-cyber-threats-impacting-every-industry-and-how-to-avoid-them-1033445156?utm_source=openai))

To effectively combat these threats, organizations should adopt a multi-faceted approach, including regular system updates, employee training, robust access controls, and collaboration with cybersecurity experts to stay ahead of emerging risks.


## Recent Developments in Cybersecurity:
- [Countries shore up their digital defenses as global tensions raise the threat of cyberwarfare](https://apnews.com/article/9eceaf30ddc984ed482f067db5dee405?utm_source=openai)
- [Criminals use AI in 'proxy' attacks for hostile powers, warns Europol](https://www.ft.com/content/755593c8-8614-4953-a4b2-09a0d2794684?utm_source=openai)
- [🤝 Adversaries team up](https://www.axios.com/newsletters/axios-codebook-21f72b20-f17c-11ef-aa27-e35ae5e39239?utm_source=openai) 
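Every link returned by the web search tool above carries a `utm_source=openai` tracking parameter, and the same source is often cited more than once. If you want a clean, de-duplicated source list, a small standard-library sketch (hypothetical helpers `strip_tracking` and `clean_source_urls`) can drop `utm_*` parameters and keep the first occurrence of each URL.

```python
from typing import Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Remove utm_* query parameters while keeping the rest of the URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith("utm_")]
    # urlencode may re-encode some characters in the parameters that remain.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


def clean_source_urls(urls: Iterable[str]) -> List[str]:
    """Strip tracking parameters and drop duplicates, preserving first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        clean = strip_tracking(url)
        if clean not in seen:
            seen.add(clean)
            cleaned.append(clean)
    return cleaned


links = [
    "https://apnews.com/article/9eceaf30ddc984ed482f067db5dee405?utm_source=openai",
    "https://www.la-cyber.com/Current-Threat-Data.php?id=2890&utm_source=openai",
    "https://apnews.com/article/9eceaf30ddc984ed482f067db5dee405?utm_source=openai",
]
print(clean_source_urls(links))
```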

### Stateful Response

The OpenAI Responses API supports stateful interactions. Responses are stored by default, so a response can be fetched again by its ID, 
and passing that ID as `previous_response_id` in a new request lets the model continue from the earlier exchange. 
This makes it easy to refine or expand on a previous search, which is useful for iterative research, 
dynamic content generation, and applications that require follow-up queries based on prior responses.
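The chaining pattern just described can be wrapped in a small class that remembers the last response ID. A minimal sketch, assuming responses are stored server-side (the default, `store=True`) so that `previous_response_id` can refer to them; `ChainedSession` is a hypothetical name, and `previous_response_id` is sent only once a previous response exists.

```python
from typing import Any, Dict, List, Optional


class ChainedSession:
    """Send successive inputs to client.responses.create, linking each to the last response."""

    def __init__(self, client: Any, model: str = "gpt-4o", tools: Optional[List[Dict]] = None):
        self.client = client
        self.model = model
        self.tools = tools or []
        self.last_response_id: Optional[str] = None

    def ask(self, text: str) -> str:
        kwargs: Dict[str, Any] = {"model": self.model, "input": text}
        if self.tools:
            kwargs["tools"] = self.tools
        if self.last_response_id is not None:
            # Link to the previous turn so the model can use the earlier context.
            kwargs["previous_response_id"] = self.last_response_id
        response = self.client.responses.create(**kwargs)
        self.last_response_id = response.id
        return response.output_text
```

With the notebook's objects, `session = ChainedSession(client, tools=[{"type": "web_search"}])` followed by `session.ask(query)` and `session.ask("find different news")` would mirror the two web search cells in this section.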

In [20]:
fetched_response = client.responses.retrieve(response_id=web_search_response.id)
display(Markdown(fetched_response.output_text[:100]))

As of April 2025, the cyber threat landscape is increasingly complex, with several prominent threats

### Continue Query with Web Search

In [21]:
continue_query = 'find different news'

continue_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=continue_query,
    previous_response_id=web_search_response.id,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [22]:
display(Markdown(continue_search_response.output_text))

As of April 2025, the cyber threat landscape continues to evolve, presenting new challenges across various sectors. Here are some of the latest developments:

**1. AI-Driven Organized Crime:**
Europol has raised alarms about organized crime groups leveraging artificial intelligence to enhance their operations. These groups use AI to craft multilingual messages, create realistic impersonations, and automate processes, complicating detection efforts. Notably, AI is being utilized to generate child sexual abuse material, and there are concerns about the potential emergence of fully autonomous AI-controlled criminal networks. ([reuters.com](https://www.reuters.com/world/europe/europol-warns-ai-driven-crime-threats-2025-03-18/?utm_source=openai))

**2. Deepfake Scams and Social Engineering:**
The advancement of deepfake technology has led to more sophisticated social engineering attacks. Cybercriminals are creating convincing videos and audio recordings to impersonate company executives or trusted contacts, deceiving employees into transferring funds or sharing confidential information. These scams exploit trust within organizations, making them particularly effective. ([boardoftrade.com](https://www.boardoftrade.com/news/article/the-top-5-emerging-cybersecurity-threats-how-to-prepare-your-business-in-2025?utm_source=openai))

**3. Supply Chain Vulnerabilities:**
Cybercriminals are increasingly targeting supply chains by infiltrating third-party vendors or service providers. By exploiting the weakest link in the supply chain, attackers can distribute malware through legitimate software updates or services, affecting numerous organizations and their customers. This tactic underscores the importance of comprehensive vendor security assessments. ([boardoftrade.com](https://www.boardoftrade.com/news/article/the-top-5-emerging-cybersecurity-threats-how-to-prepare-your-business-in-2025?utm_source=openai))

**4. Quantum Computing Risks:**
The advent of quantum computing poses a looming threat to traditional encryption methods. Quantum computers have the potential to break encryption standards currently considered secure, necessitating a transition to quantum-resistant cryptography to safeguard sensitive data. ([securityboulevard.com](https://securityboulevard.com/2025/01/key-cyber-threats-to-watch-in-2025/?utm_source=openai))

**5. Insider Threats:**
Insider threats continue to be a significant concern, with employees or contractors intentionally or unintentionally compromising security. Implementing identity and access management (IAM) and privileged access management (PAM) systems can help mitigate these risks by controlling and monitoring access to critical information. ([appliedtech.us](https://www.appliedtech.us/resource-hub/top-cyber-threats-to-prepare-for-in-2025/?utm_source=openai))

**6. Cloud Environment Vulnerabilities:**
As organizations increasingly rely on cloud computing, vulnerabilities in cloud environments have become more prominent. Attackers exploit misconfigured or outdated cloud setups to exfiltrate sensitive information. Adopting dedicated cloud security solutions, including identity and access management (IAM) and advanced encryption technologies, is crucial to protect against these threats. ([securityboulevard.com](https://securityboulevard.com/2025/01/key-cyber-threats-to-watch-in-2025/?utm_source=openai))

**7. Geopolitical Cyber Threats:**
Geopolitical tensions have led to an increase in state-sponsored cyberattacks targeting critical infrastructure. For instance, during the 2025 Asian Winter Games in Harbin, China accused the U.S. of cyberattacks on the event's information systems and other critical infrastructure within Heilongjiang province. ([en.wikipedia.org](https://en.wikipedia.org/wiki/Cyberattacks_by_country?utm_source=openai))

To effectively combat these evolving threats, organizations should adopt a multi-faceted approach, including regular system updates, employee training, robust access controls, and collaboration with cybersecurity experts to stay ahead of emerging risks.


## Recent Developments in Cybersecurity:
- [AI is turbocharging organized crime, EU police agency warns](https://apnews.com/article/846847536f6feb2bbb423943fd96e1f1?utm_source=openai)
- [Europol warns of AI-driven crime threats](https://www.reuters.com/world/europe/europol-warns-ai-driven-crime-threats-2025-03-18/?utm_source=openai)
- [Criminals use AI in 'proxy' attacks for hostile powers, warns Europol](https://www.ft.com/content/755593c8-8614-4953-a4b2-09a0d2794684?utm_source=openai) 

### Combining File Search and Web Search

This example uses file search to analyze private data and web search to retrieve public, up-to-date information. 
The Responses API lets developers enable both tools in a single request to strengthen retrieval-augmented generation (RAG) applications. 
By combining file search with web search, an application can draw on internal knowledge while also retrieving current 
information from external sources, which helps keep responses comprehensive and up to date. 

In [23]:
combined_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    temperature=0,
    instructions="Retrieve the results from the file search first, and use the web search tool to expand the results with news resources",
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": [vector_store_id],
        },
        {
            "type": "web_search"
        }
    ]
)
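The `instructions` string asks the model to run file search first and then web search, but it does not force either call: with the default `tool_choice="auto"`, the model decides which tools to use (`"required"` forces at least one tool call, not a particular one). The answer printed below contains no web links, so it is worth checking which tools actually ran. A sketch, assuming the response is converted with `model_dump()` and that tool calls appear as `output` items whose type ends in `_call` (such as `file_search_call` and `web_search_call`); `tools_invoked` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def tools_invoked(response: Dict) -> Counter:
    """Count tool-call items (types ending in '_call') in a Responses API output dict."""
    return Counter(
        item.get("type")
        for item in response.get("output", [])
        if str(item.get("type", "")).endswith("_call")
    )
```

For example, `tools_invoked(combined_search_response.model_dump())["web_search_call"] == 0` would indicate that the web search tool was never called for this request.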

In [24]:
display(Markdown(combined_search_response.output_text))

Current cyber threats include a variety of sophisticated attacks and vulnerabilities. Some of the key threats are:

1. **Ransomware Attacks**: These attacks continue to be a significant threat, targeting various sectors including architecture and corporate environments.

2. **Phishing and Social Engineering**: Phishing remains a prevalent method for attackers to gain access to sensitive information. For example, Russia-linked groups have used phishing to target Ukraine.

3. **State-Sponsored Attacks**: There are ongoing threats from state-sponsored groups, such as those linked to China, targeting critical infrastructure like transportation.

4. **AI-Driven Threats**: Artificial Intelligence is being used both defensively and offensively, with AI-powered attacks becoming more common.

5. **Financial Crime in Virtual Assets**: The rise of digital currencies has led to new forms of financial crime, requiring updated strategies to combat these threats.

6. **Critical Infrastructure Vulnerabilities**: Cyber threats to infrastructure such as rail, ports, and airports pose risks to national security.

These threats highlight the need for robust cybersecurity measures, including AI-driven solutions, regular data backups, and comprehensive security strategies.
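The uploaded file and the vector store remain in your OpenAI project until they are deleted, and stored vector data may incur storage charges. The uploaded file is a separate object from the vector store, so a cleanup step should remove both. A sketch, assuming the SDK's `client.vector_stores.delete(...)` and `client.files.delete(...)` methods; `cleanup_search_resources` is a hypothetical helper that attempts both deletions and reports errors instead of stopping at the first one.

```python
from typing import Any, Dict


def cleanup_search_resources(client: Any, vector_store_id: str, file_id: str) -> Dict[str, str]:
    """Delete the demo vector store and the uploaded file; return a status per resource."""
    status: Dict[str, str] = {}
    steps = (
        ("vector_store", lambda: client.vector_stores.delete(vector_store_id)),
        ("file", lambda: client.files.delete(file_id)),
    )
    for name, delete in steps:
        try:
            delete()
            status[name] = "deleted"
        except Exception as exc:  # keep going so one failure does not leave the other resource behind
            status[name] = f"error: {exc}"
    return status
```

At the end of the notebook this would be called as `cleanup_search_resources(client, vector_store_id, file_id)`.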