# OpenAI Responses API: Advanced Tweet Analysis with File & Web Search Integration

## What is the OpenAI Responses API?

The Responses API, released in March 2025, combines capabilities of the traditional
Chat Completions API and the Assistants API. It supports:

- **Traditional Chat Completions:** Facilitates seamless conversational AI experiences.
- **Web Search:** Enables real-time information retrieval from the internet.
- **File Search:** Allows searching within files for relevant data.

As a result, OpenAI plans to retire the Assistants API in 2026.

> **For new users, OpenAI recommends using the Responses API instead of the Chat Completions API to leverage its expanded capabilities.**

For a comprehensive comparison between the Responses API and the Chat Completions API, refer to the official OpenAI documentation: 
[Responses vs. Chat Completions](https://platform.openai.com/docs/guides/responses-vs-chat-completions).

## Summary of This Notebook
This notebook provides a hands-on guide for using the **OpenAI Responses API** to analyze tweets. 
It covers essential techniques such as:

- **Connecting to a MongoDB database** to store and retrieve tweets.
- **Extracting tweets** and converting them into a structured format for further analysis.
- **Creating a vector store** and uploading tweets for semantic search.
- **Using file search** to analyze private datasets.
- **Performing web search** to retrieve the latest public information.
- **Utilizing stateful responses** to maintain conversation context.
- **Combining file and web search** to enhance retrieval-augmented generation (RAG) applications.

By the end of this notebook, users will be able to integrate OpenAI's Responses API for efficient data retrieval 
and analysis of structured and unstructured data.

## Install Required Libraries
To use the OpenAI Responses API, interact with a MongoDB database, and read credentials from AWS Secrets Manager, we need to install the following libraries:

- **`openai`**: Provides access to OpenAI's APIs, including the Responses API.
- **`pymongo`**: A Python driver for MongoDB to store and retrieve tweets.
- **`boto3`**: The AWS SDK for Python, used below to read secrets from AWS Secrets Manager.

In [1]:
%pip install openai pymongo boto3 -q

Note: you may need to restart the kernel to use updated packages.


## Import Required Libraries

In [2]:
from IPython.display import Markdown, display
import boto3
from botocore.exceptions import ClientError
import json
import io

## Retrieve Secrets from AWS Secrets Manager

In [3]:
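def get_secret(secret_name, region_name="us-east-1"):
    """Fetch a secret from AWS Secrets Manager and parse its JSON payload."""
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        # Surface AWS errors (missing secret, access denied, ...) to the caller
        raise e

    # The secret is stored as a JSON string; parse it into a dict
    secret = get_secret_value_response['SecretString']

    return json.loads(secret)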

## Connect to MongoDB

In [4]:
import pymongo
from pymongo import MongoClient

mongodb_connect = get_secret('mongodb')['connection_string']

mongo_client = MongoClient(mongodb_connect)
db = mongo_client.demo  # use or create a database named demo
tweet_collection = db.tweet_collection  # use or create a collection named tweet_collection
# Optional: enforce unique tweets with a unique index on the tweet ID
# tweet_collection.create_index([("tweet.id", pymongo.ASCENDING)], unique=True)

## Extract Tweets from MongoDB

In [5]:
# An empty filter matches every document in the collection
query_filter = {}
# Return only the tweet text and drop MongoDB's _id field
projection = {
    'tweet.text': 1,
    '_id': 0
}
result = tweet_collection.find(
    filter=query_filter,
    projection=projection
)

After retrieving tweets from MongoDB, we convert the query cursor into a list so it can be counted and reused.
The list is then serialized into a JSON-formatted string; `default=str` converts any value that is not JSON-serializable into a string.
Using `io.BytesIO`, we wrap the encoded string in an in-memory JSON file, eliminating the need for disk writes.
This approach is particularly useful for applications that require temporary file storage, such as uploading datasets
to OpenAI's file search API or cloud storage for further analysis.

In [6]:
result_list = list(result)

# Convert result list to JSON string
json_data = json.dumps(result_list, default=str, indent=4)

# Create an in-memory JSON file
json_bytes = io.BytesIO(json_data.encode("utf-8"))
json_bytes.name = "tweet.json" 

In [7]:
print('Number of tweets: ',len(result_list))

Number of tweets:  299
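
The search output later in this notebook shows chunks that begin or end in the middle of a JSON object, because the pretty-printed file is split without regard to tweet boundaries. One possible alternative, offered as a sketch rather than the notebook's method, is to upload the tweets as plain text with one tweet per line. `tweets_to_text_file` is a hypothetical helper, and it assumes each document has the `{"tweet": {"text": ...}}` shape produced by the projection above.

```python
import io


def tweets_to_text_file(docs, name="tweets.txt"):
    """Build an in-memory text file with one tweet per line (hypothetical helper)."""
    lines = []
    for doc in docs:
        text = doc.get("tweet", {}).get("text")
        if text:
            # Collapse internal newlines so each tweet stays on a single line
            lines.append(" ".join(text.split()))
    buffer = io.BytesIO("\n".join(lines).encode("utf-8"))
    # Same pattern as the original cell: give the buffer a filename
    buffer.name = name
    return buffer
```

Calling `tweets_to_text_file(result_list)` would return a buffer that can be passed to `client.files.create` in place of `json_bytes`. Whether this improves retrieval for a given dataset is something to verify empirically.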


## Initialize OpenAI Client

In [8]:
from openai import OpenAI
openai_api_key = get_secret('openai')['api_key']

client = OpenAI(api_key=openai_api_key)

## File Search API

### Introduction to File Search
The file search tool retrieves relevant information
from uploaded files using vector-based indexing. It is particularly useful
for searching large datasets, extracting insights, and improving retrieval-augmented generation (RAG) applications.

Unlike purely keyword-based search, file search uses embeddings
to identify semantically relevant content, making it well suited to analyzing structured
and unstructured text data (OpenAI, 2025).

For more details, visit the official OpenAI documentation: 
[File Search in Responses API](https://platform.openai.com/docs/guides/tools-file-search).
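
To make embedding-based matching concrete, here is a toy sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors and their labels are made up for illustration; real embeddings have far more dimensions, and this is not a description of how OpenAI scores results internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
query_vec = [0.9, 0.1, 0.0]
docs = {
    "election tweet": [0.8, 0.2, 0.1],
    "concert tweet": [0.0, 0.2, 0.9],
}

# Rank documents from most to least similar to the query
ranked = sorted(docs, key=lambda name: cosine_similarity(query_vec, docs[name]), reverse=True)
print(ranked)  # ['election tweet', 'concert tweet']
```

A document can rank highly even when it shares no exact words with the query, as long as its vector points in a similar direction.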

### Create a Vector Store

In [9]:
vector_store = client.vector_stores.create(
    name="tweet_base"
)
vector_store_id = vector_store.id
# print(vector_store_id)

### Upload Tweets File

In [10]:
file = client.files.create(
    file=json_bytes,
    purpose="assistants",
)

file_id = file.id
# print(file_id)

### Attach File to Vector Store

In [11]:
attach_status = client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_id
)

# print(attach_status.id)
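
Attaching a file starts indexing in the background, so a search issued immediately afterwards may return nothing. The sketch below polls the file's status until it leaves `in_progress`. `wait_until_indexed` is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions.

```python
import time


def wait_until_indexed(client, vector_store_id, file_id,
                       timeout=120, interval=2, sleep=time.sleep):
    """Poll a vector store file until indexing finishes (hypothetical helper)."""
    waited = 0
    while True:
        vs_file = client.vector_stores.files.retrieve(
            file_id=file_id, vector_store_id=vector_store_id
        )
        if vs_file.status != "in_progress":
            # Typically "completed", "failed", or "cancelled"
            return vs_file.status
        if waited >= timeout:
            raise TimeoutError(f"File {file_id} still indexing after {timeout}s")
        sleep(interval)
        waited += interval
```

Calling `wait_until_indexed(client, vector_store_id, file_id)` before querying gives a clear signal when indexing failed instead of silently returning empty results.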

### Query the Vector Store

In [16]:
query = "2020 election"

In [17]:
search_results = client.vector_stores.search(
    vector_store_id=vector_store_id,
    query=query
)

for result in search_results.data[:5]:
    print(result.content[0].text[:100] + '\n Relevant score: ' + str(result.score))

https://t.co/C2JWmyQU3N"
        }
    },
    {
        "tweet": {
            "text": "RT @BrandtRo
 Relevant score: 0.6263302536258392
{
        "tweet": {
            "text": "RT @MZHemingway: 4 Years After The Biden Laptop Coverup, M
 Relevant score: 0.605885158622891
5 presidential election could be won or lost in Pennsylvania, and a recent Cygnal poll in the state\
 Relevant score: 0.48105299255569217
\u201cImpromptu concert.\u201d \ud83d\udd95\u2026"
        }
    },
    {
        "tweet": {
       
 Relevant score: 0.45444984537772215
}
    },
    {
        "tweet": {
            "text": "RT @SenJohnKennedy: Three weeks before the la
 Relevant score: 0.4446398144236681
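
The scores above range from roughly 0.63 down to 0.44, and the lower-scoring chunks look only loosely related to the query. A minimal sketch of keeping only stronger matches follows. `strong_hits` is a hypothetical helper, the 0.5 cutoff is an arbitrary assumption, and the attributes it reads (`data`, `score`, `content[0].text`) are the same ones used in the cell above.

```python
def strong_hits(search_results, min_score=0.5, preview_chars=100):
    """Return (score, preview) pairs for results at or above min_score."""
    hits = []
    for result in search_results.data:
        if result.score >= min_score:
            # Some results may carry no content parts; fall back to an empty preview
            text = result.content[0].text if result.content else ""
            hits.append((round(result.score, 3), text[:preview_chars]))
    return hits
```

For example, `strong_hits(search_results, min_score=0.6)` would keep only the highest-scoring chunks from the query above. A suitable threshold depends on the dataset and should be tuned by inspection.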


## OpenAI Responses API

### Simple Response

In [18]:
simple_response = client.responses.create(
  model="gpt-4o",
  input=[
      {
          "role": "user",
          "content": query
      }
  ]
)

In [19]:
display(Markdown(simple_response.output_text))

The 2020 United States presidential election took place on November 3, 2020. The main candidates were the incumbent president, Donald Trump, from the Republican Party, and Joe Biden, the former vice president, from the Democratic Party.

Joe Biden won the election, becoming the 46th president of the United States. He received 306 electoral votes compared to Donald Trump’s 232. Biden also won the popular vote, receiving over 81 million votes, which was about 51.3% of the total, while Trump received over 74 million votes, about 46.8%.

Kamala Harris, Biden's running mate, made history by becoming the first female vice president, as well as the first Black and South Asian vice president in U.S. history. The election saw a very high voter turnout and was conducted during the COVID-19 pandemic, which led to a significant increase in mail-in voting.

### File Search Response

In [20]:

file_search_response = client.responses.create(
    input=query,
    model="gpt-4o",
    temperature=0,
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    }]
)

In [21]:
display(Markdown(file_search_response.output_text))


The 2020 election was a highly contentious event, with numerous claims and discussions surrounding its legitimacy. Some tweets suggest that there were allegations of election interference and fraud, with accusations directed at both political parties. There were also discussions about the role of media and legal battles related to the election results.
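
The answer above summarizes the tweets without showing which chunks it drew on. The sketch below asks the API to return the retrieved chunks alongside the answer through the `include` parameter. `file_search_with_sources` is a hypothetical helper, and the field names read from each hit (`filename`, `score`, `text`) reflect the API reference at the time of writing and may differ across SDK versions.

```python
def file_search_with_sources(client, question, vector_store_id, model="gpt-4o"):
    """Run a file-search response and collect the chunks the tool retrieved."""
    response = client.responses.create(
        model=model,
        input=question,
        tools=[{"type": "file_search", "vector_store_ids": [vector_store_id]}],
        # Ask for the raw search results in addition to the generated answer
        include=["file_search_call.results"],
    )
    sources = []
    for item in response.output:
        if item.type == "file_search_call":
            for hit in item.results or []:
                sources.append((hit.filename, hit.score, hit.text))
    return response.output_text, sources
```

Calling `file_search_with_sources(client, query, vector_store_id)` returns the answer text plus a list of `(filename, score, text)` tuples, which makes it easier to judge whether the summary is grounded in the uploaded tweets.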

## Web Search API

### Introduction to Web Search
The OpenAI Web Search tool allows models to retrieve real-time information from the internet. 
This capability is particularly useful for obtaining up-to-date data, fact-checking, and expanding knowledge 
without relying solely on pre-trained information. 

By leveraging OpenAI's web search functionality, the Responses API can fetch external data
and ground its answers in current sources (OpenAI, 2025).
This feature enhances applications that require the latest insights, such as news aggregation, research,
or dynamic content generation.

For more details, visit the official OpenAI documentation: 
[Web Search in Responses API](https://platform.openai.com/docs/guides/tools-web-search).

### Perform Web Search

In [22]:
web_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [23]:
display(Markdown(web_search_response.output_text))

The 2020 United States presidential election was held on November 3, 2020. The main candidates were:

- **Joe Biden**, the Democratic nominee, who was the former Vice President under Barack Obama and a long-serving Senator from Delaware.
- **Donald Trump**, the Republican nominee, who was the incumbent President.

Joe Biden won the election with 306 electoral votes compared to Donald Trump's 232. Biden received over 81 million votes, the most in U.S. history, and was inaugurated as the 46th President of the United States on January 20, 2021. Kamala Harris became Vice President, making history as the first woman, first Black, and first South Asian Vice President.
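
Web search answers can carry citations as annotations on the output text. The sketch below walks a response and collects them. `url_citations` is a hypothetical helper; it assumes annotations of type `url_citation` with `title` and `url` fields, which matches the API reference at the time of writing but should be checked against your SDK version.

```python
def url_citations(response):
    """Collect (title, url) pairs from url_citation annotations in a response."""
    citations = []
    for item in response.output:
        # Only message items carry generated text; skip tool-call items
        if item.type != "message":
            continue
        for part in item.content:
            for annotation in getattr(part, "annotations", None) or []:
                if annotation.type == "url_citation":
                    citations.append((annotation.title, annotation.url))
    return citations
```

Calling `url_citations(web_search_response)` returns an empty list when the model answered without citing any page, which is itself a useful signal when checking how well an answer is sourced.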

### Stateful Response

The OpenAI Responses API can store responses server-side, which enables continuity across interactions.
A stored response can be fetched again by its ID, and passing that ID as `previous_response_id` in a new request
lets the model continue with the earlier context, so users can refine or expand upon previous searches. This is particularly useful for iterative research,
dynamic content generation, and applications that require follow-up queries based on prior responses.

In [24]:
fetched_response = client.responses.retrieve(response_id=web_search_response.id)
display(Markdown(fetched_response.output_text[:100]))

The 2020 United States presidential election was held on November 3, 2020. The main candidates were:

### Continue Query with Web Search

In [25]:
continue_query = 'find different news'

continue_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=continue_query,
    previous_response_id=web_search_response.id,
    tools=[
        {
            "type": "web_search"
        }
    ]
)

In [26]:
display(Markdown(continue_search_response.output_text))

The 2020 U.S. presidential election was marked by significant events and developments:

**Election Results and Aftermath**

- Joe Biden secured the presidency by winning key battleground states, including Pennsylvania, Michigan, and Wisconsin, thereby rebuilding the "Blue Wall" that had been pivotal in previous elections. ([time.com](https://time.com/5907674/joe-biden-wins-2020-election/?utm_source=openai))

- Despite numerous legal challenges and allegations of widespread voter fraud by the Trump campaign, then-Attorney General Bill Barr stated in December 2020 that the Department of Justice found no evidence of fraud that would change the election outcome. ([axios.com](https://www.axios.com/2020/12/01/barr-voter-fraud-trump?utm_source=openai))

**Post-Election Investigations**

- In January 2025, a report by U.S. Special Counsel Jack Smith concluded that Donald Trump engaged in a significant criminal effort to overturn the 2020 election results. The report detailed actions such as pressuring state officials and promoting false claims of voter fraud. ([reuters.com](https://www.reuters.com/world/us/us-justice-dept-releases-report-trump-attempt-overturn-2020-election-2025-01-14/?utm_source=openai))

- An audio recording from November 2020 revealed that Trump campaign staff in Wisconsin acknowledged their electoral defeat but planned to propagate claims of election fraud. This strategy was intended to "fan the flame" of misinformation despite internal recognition of the loss. ([apnews.com](https://apnews.com/article/166e3f52e42cc45cff8f28b75dd23cfc?utm_source=openai))

**Public and Media Reactions**

- TIME photographers documented the tense period following the election, capturing scenes of anxiety, protests, and celebrations across the country as votes were counted and results were announced. ([time.com](https://time.com/5909201/2020-election-photographs/?utm_source=openai))


## Key Developments in the Aftermath of the 2020 U.S. Presidential Election:
- [Special counsel report found Trump engaged in 'criminal effort' to overturn 2020 election](https://www.reuters.com/world/us/us-justice-dept-releases-report-trump-attempt-overturn-2020-election-2025-01-14/?utm_source=openai)
- [Trump campaign staff on 2020 election lies: 'fan the flame'](https://apnews.com/article/166e3f52e42cc45cff8f28b75dd23cfc?utm_source=openai)
- [Tape reveals Donald Trump pressured Michigan officials not to certify 2020 vote, a new report says](https://apnews.com/article/05d0d0d7a0adbec796caecdc6862588b?utm_source=openai) 
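
The same chaining pattern extends to more than two turns. The sketch below sends a list of prompts in order and links each request to the previous response through `previous_response_id`. `ask_in_sequence` is a hypothetical helper, and the default model is simply the one used in this notebook.

```python
def ask_in_sequence(client, prompts, model="gpt-4o", tools=None):
    """Send prompts one after another, chaining each to the previous response."""
    previous_id = None
    answers = []
    for prompt in prompts:
        kwargs = {"model": model, "input": prompt}
        if tools:
            kwargs["tools"] = tools
        if previous_id:
            # Link this turn to the stored response from the prior turn
            kwargs["previous_response_id"] = previous_id
        response = client.responses.create(**kwargs)
        answers.append(response.output_text)
        previous_id = response.id
    return answers
```

For instance, `ask_in_sequence(client, [query, continue_query], tools=[{"type": "web_search"}])` would reproduce the two-step exchange above in a single call.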

### Combining File Search and Web Search

This example uses file search to analyze private data and web search to retrieve public, up-to-date information in a single request.
The Responses API allows developers to list both tools in one call to enhance retrieval-augmented generation (RAG) applications.
Combining file search with web search lets users draw on internal knowledge while also retrieving real-time
information from external sources, giving more comprehensive and current responses.

In [27]:
combined_search_response = client.responses.create(
    model="gpt-4o",  # or another supported model
    input=query,
    temperature=0,
    instructions="Retrieve the results from the file search first, and use the web search tool to expand the results with news resources",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],
    },
        {
            "type": "web_search"
        }
    ]
)

In [28]:
display(Markdown(combined_search_response.output_text))

The 2020 election was a highly contentious event with numerous claims and discussions surrounding its integrity and outcome. Some tweets from the file suggest allegations of election interference and fraud, with various individuals expressing skepticism about the results. There are also mentions of legal battles and political strategies leading up to the election.

If you need more detailed information or specific aspects of the 2020 election, feel free to ask!
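
The answer above appears to draw only on the tweets, so it is worth checking which tools the model actually invoked. The sketch below counts the tool-call items in a response. `tool_calls_made` is a hypothetical helper, and it assumes that tool-call output items have a `type` ending in `_call` (for example `file_search_call` or `web_search_call`).

```python
from collections import Counter


def tool_calls_made(response):
    """Count tool-call output items in a response, keyed by item type."""
    return Counter(
        item.type for item in response.output if item.type.endswith("_call")
    )
```

If `tool_calls_made(combined_search_response)` shows no `web_search_call` entry, the model chose not to search the web for this prompt, and the instructions or the query may need to ask for external sources more explicitly.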