In [2]:
%pip install tiktoken openai --quiet --upgrade

Note: you may need to restart the kernel to use updated packages.


# Using Responses API with Manual Token Management

This notebook demonstrates how to use the OpenAI Responses API with `store=False` (i.e. without persistent state storage) while manually managing and counting tokens using the [tiktoken](https://github.com/openai/tiktoken) package.

We'll cover:

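1. Sending conversation turns through the Responses API with `store=False` and chaining them by resending the conversation history with every request. `previous_response_id` only works for responses that were stored, so it is not an option here.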
2. Counting tokens in a list of messages using a custom function `num_tokens_from_messages`.

Let's get started.

## 1. Using the Responses API with `store=False`

In the following example, we use the Responses API to send a conversation turn. Setting `store=False` means the response is not persisted on OpenAI's side. Each request must therefore include the full conversation history we want the model to see.
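Because nothing is stored server-side, chaining turns means appending to a local message list and sending the whole list each time. The sketch below shows one turn. `send_turn` is a hypothetical helper, not part of the OpenAI SDK. It takes any client that exposes `responses.create`, such as the `client` created in the cells below.

```python
def send_turn(client, model, messages, user_text):
    """Append a user message, send the full history with store=False,
    then append the assistant's reply and return its text.

    `send_turn` is an illustrative helper, not an SDK function.
    """
    messages.append({"role": "user", "content": user_text})
    # store=False: the response is not persisted, so the next turn cannot
    # reference it by ID; the full history is resent instead.
    response = client.responses.create(
        model=model,
        input=messages,
        store=False,
    )
    reply = response.output_text
    # Keep the reply locally so the next request includes it.
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Calling `send_turn` twice on the same list leaves it holding the system message plus two user/assistant pairs. The second request sees all of them.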

In [None]:
from openai import OpenAI

In [1]:
MODEL = "gpt-4.1-mini"

In [1]:
import os

# Create an OpenAI client instance.
# The key is read from the OPENAI_API_KEY environment variable; avoid hard-coding secrets in a notebook.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

## 2. Managing Tokens Manually with tiktoken and a Custom Function

The following function estimates the number of tokens used by a list of messages. It picks a tiktoken encoding for the given model name, adds a fixed per-message overhead, and counts the tokens in every message field. This lets you keep a conversation within your model's context window and track cost. The result is an approximation: the overhead constants follow the Chat Completions message format, so the count the API reports may differ slightly.

Below is the custom function `num_tokens_from_messages` followed by an example of how to call it on a list of messages.
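The accounting that `num_tokens_from_messages` performs can be expressed independently of tiktoken. Each message costs a fixed overhead plus the tokens of every field. A `name` field costs one extra token. Finally, 3 tokens are added for priming the reply.

The sketch below takes any `encode` callable, so the arithmetic can be checked by hand. `count_chat_tokens` is a hypothetical name. The whitespace "tokenizer" is a toy assumption and does not reproduce real BPE counts.

```python
from typing import Callable, Dict, List


def count_chat_tokens(
    messages: List[Dict[str, str]],
    encode: Callable[[str], list],
    tokens_per_message: int = 3,
    tokens_per_name: int = 1,
    reply_priming: int = 3,
) -> int:
    """Same accounting as num_tokens_from_messages, with a pluggable encoder."""
    total = 0
    for message in messages:
        total += tokens_per_message          # fixed framing overhead per message
        for key, value in message.items():
            total += len(encode(value))      # tokens in role, content, name, ...
            if key == "name":
                total += tokens_per_name     # extra token when a name is present
    return total + reply_priming             # reply is primed with <|start|>assistant<|message|>


# Toy encoder that splits on whitespace (NOT a real tokenizer).
toy_encode = str.split

msgs = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Explain TARP in one line.", "name": "alice"},
]
print(count_chat_tokens(msgs, toy_encode))  # prints 20: (3+1+2) + (3+1+5+1+1) + 3
```

Passing a real encoder, for example `tiktoken.get_encoding("o200k_base").encode`, should give the same count as `num_tokens_from_messages(msgs)` for the models the function handles with these constants.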

In [24]:
import tiktoken

def num_tokens_from_messages(messages, model="gpt-4o-mini-2024-07-18"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using o200k_base encoding.")
        encoding = tiktoken.get_encoding("o200k_base")
    
    if model in {
        "gpt-3.5-turbo-0125",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        "gpt-4o-mini-2024-07-18",
        "gpt-4o-2024-08-06"
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0125.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0125")
    elif "gpt-4o-mini" in model:
        print("Warning: gpt-4o-mini may update over time. Returning num tokens assuming gpt-4o-mini-2024-07-18.")
        return num_tokens_from_messages(messages, model="gpt-4o-mini-2024-07-18")
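    elif "gpt-4o" in model:
        print("Warning: gpt-4o may update over time. Returning num tokens assuming gpt-4o-2024-08-06.")
        return num_tokens_from_messages(messages, model="gpt-4o-2024-08-06")
    elif "gpt-4.1" in model:
        # Assumption: gpt-4.1 models share gpt-4o's o200k_base tokenizer and per-message overhead.
        print("Warning: gpt-4.1 models are not listed. Returning num tokens assuming gpt-4o-2024-08-06.")
        return num_tokens_from_messages(messages, model="gpt-4o-2024-08-06")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")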
    else:
        raise NotImplementedError(f"num_tokens_from_messages() is not implemented for model {model}.")
    
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

## Example of Manual Token Management

In [26]:
article_headings = [
    "I. Introduction A. Definition of the 2008 Financial Crisis B. Overview of the Causes and Effects of the Crisis C. Importance of Understanding the Crisis",
    "II. Historical Background A. Brief History of the US Financial System B. The Creation of the Housing Bubble C. The Growth of the Subprime Mortgage Market",
    "III. Key Players in the Crisis A. Government Entities B. Financial Institutions C. Homeowners and Borrowers",
    "IV. Causes of the Crisis A. The Housing Bubble and Subprime Mortgages B. The Role of Investment Banks and Rating Agencies C. The Failure of Regulatory Agencies D. Deregulation of the Financial Industry",
    "V. The Domino Effect A. The Spread of the Crisis to the Global Financial System B. The Impact on the Real Economy C. The Economic Recession",
    "VI. Government Responses A. The Troubled Asset Relief Program (TARP) B. The American Recovery and Reinvestment Act C. The Dodd-Frank Wall Street Reform and Consumer Protection Act",
    "VII. Effects on Financial Institutions A. Bank Failures and Bailouts B. Stock Market Decline C. Credit Freeze",
    "VIII. Effects on Homeowners and Borrowers A. Foreclosures and Bankruptcies B. The Loss of Home Equity C. The Impact on Credit Scores",
    "IX. Effects on the Global Economy A. The Global Financial Crisis B. The Impact on Developing Countries C. The Role of International Organizations",
    "X. Criticisms and Controversies A. Bailouts for Financial Institutions B. Government Intervention in the Economy C. The Role of Wall Street in the Crisis",
    "XI. Lessons Learned A. The Need for Stronger Regulation B. The Importance of Transparency C. The Need for Better Risk Management",
    "XII. Reforms and Changes A. The Dodd-Frank Wall Street Reform and Consumer Protection Act B. Changes in Regulatory Agencies C. Changes in the Financial Industry",
    "XIII. Current Economic Situation A. Recovery from the Crisis B. Impact on the Job Market C. The Future of the US Economy",
    "XIV. Comparison to Previous Financial Crises A. The Great Depression B. The Savings and Loan Crisis C. The Dot-Com Bubble",
    "XV. Economic and Social Impacts A. The Widening Wealth Gap B. The Rise of Populist Movements C. The Long-Term Effects on the Economy",
    "XVI. The Role of Technology A. The Use of Technology in the Financial Industry B. The Impact of Technology on the Crisis C. The Future of the Financial Industry",
    "XVII. Conclusion A. Recap of the Causes and Effects of the Crisis B. The Importance of Learning from the Crisis C. Final Thoughts",
    "XVIII. References A. List of Sources B. Additional Reading C. Further Research",
    "XIX. Glossary A. Key Terms B. Definitions",
    "XX. Appendix A. Timeline of the Crisis B. Financial Statements of Key Players C. Statistical Data on the Crisis",
]

In [27]:
system_prompt = "You are a helpful assistant for a financial news website. You are writing a series of articles about the 2008 financial crisis. You have been given a list of headings for each article. You need to write a short paragraph for each heading. You can use the headings as a starting point for your writing.\n\n"
system_prompt += "All of the subheadings:\n"

# Set up the messages list:
messages = []

# Add all of the subheadings to the system prompt to give the model context:
for heading in article_headings:
    system_prompt += f"{heading}\n"

# Append the first developer/system message to the messages list:
messages.append({"role": "system", "content": system_prompt})

# Token budget for the chat history: whenever the history grows past this limit,
# the oldest non-system/developer messages are removed until it fits again.
MAX_TOKEN_SIZE = 2048
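The trimming policy can be factored into a reusable helper. This is a sketch, and `trim_history` is a hypothetical name. It accepts any token-counting function, such as `num_tokens_from_messages`.

Unlike a bare `while` loop, it stops when only system/developer messages remain. An oversized system prompt therefore cannot cause an infinite loop.

```python
from typing import Callable, Dict, List, Sequence


def trim_history(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[List[Dict[str, str]]], int],
    max_tokens: int,
    keep_roles: Sequence[str] = ("system", "developer"),
) -> List[Dict[str, str]]:
    """Drop the oldest messages whose role is not in keep_roles until the
    token count is at most max_tokens. Mutates and returns `messages`."""
    while count_tokens(messages) > max_tokens:
        # Index of the oldest message that is allowed to be dropped.
        idx = next(
            (i for i, m in enumerate(messages) if m["role"] not in keep_roles),
            None,
        )
        if idx is None:
            break  # only protected messages remain; they alone exceed the budget
        messages.pop(idx)
    return messages
```

In this notebook it would be called as `trim_history(messages, num_tokens_from_messages, MAX_TOKEN_SIZE)` after each assistant reply is appended.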

In [28]:
# Loop over all of the headings and generate a chunk for each one
for heading in article_headings:
    
    # Add on a user prompt to the chat history object:
    messages.append(
        {"role": "user", "content": f"Write a very large paragraph about {heading}. Make it very long and detailed."}
    )

    # Ask the model to generate a response; store=False means nothing is persisted server-side:
    response = client.responses.create(
        model=MODEL,
        input=messages,
        store=False
    )

    # Update the conversation history with the assistant's response
    messages.append({"role": "assistant", "content": response.output_text})
    print("Current message count", len(messages))
    print("Current token count", num_tokens_from_messages(messages))

    # While the chat history exceeds MAX_TOKEN_SIZE tokens, remove the oldest non-system/developer message:
    while num_tokens_from_messages(messages) > MAX_TOKEN_SIZE:

        # Find the index of the first message that is not a system or developer message:
        non_system_msg_index = next(
            (i for i, msg in enumerate(messages) if msg["role"] not in ["system", "developer"]), None
        )

        # Only system/developer messages remain; stop instead of looping forever:
        if non_system_msg_index is None:
            break

        messages.pop(non_system_msg_index)
        print("Removed a message to reduce token count!")

Current message count 3
Current token count 848
Current message count 5
Current token count 1023
Current message count 7
Current token count 1200
Current message count 9
Current token count 1398
Current message count 11
Current token count 1574
Current message count 13
Current token count 1785
Current message count 15
Current token count 1971
Current message count 17
Current token count 2157
Removed a message to reduce token count!
Removed a message to reduce token count!
Current message count 17
Current token count 2175
Removed a message to reduce token count!
Removed a message to reduce token count!
Current message count 17
Current token count 2216
Removed a message to reduce token count!
Removed a message to reduce token count!
Current message count 17
Current token count 2221
Removed a message to reduce token count!
Removed a message to reduce token count!
Current message count 17
Current token count 2255
Removed a message to reduce token count!
Removed a message to reduce token co
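The counts above are local estimates. The Responses API also reports its own figure in `response.usage.input_tokens`, and comparing the two shows how far the estimate drifts.

The sketch below assumes a response object that exposes `usage.input_tokens`. The helper `input_token_gap` is hypothetical. Compute the estimate on the list actually sent, before the assistant reply is appended, so that both numbers describe the same input.

```python
def input_token_gap(estimated_input_tokens: int, response) -> int:
    """Return estimate minus the API-reported input tokens.

    Positive means the local estimate was too high. `response` is assumed
    to be a Responses API result whose `usage` has `input_tokens`.
    `input_token_gap` is an illustrative helper, not an SDK function.
    """
    usage = getattr(response, "usage", None)
    if usage is None:
        raise ValueError("response carries no usage information")
    return estimated_input_tokens - usage.input_tokens
```

In the loop, record `num_tokens_from_messages(messages)` just before calling `client.responses.create(...)`. Then pass that estimate and the returned `response` to the helper. A small, stable gap suggests the estimate is good enough for budgeting.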

## Summary

- We used the Responses API with `store=False` to send conversation turns, chaining them manually by resending the message history with each request.
- We defined a custom function `num_tokens_from_messages` to count tokens in a list of messages with tiktoken, and used it to drop the oldest messages whenever the history exceeded `MAX_TOKEN_SIZE`.

This approach gives you detailed control over conversation state and token usage. That control is useful for tuning agent workflows and for staying within the model's context window.