In [1]:
%pip install openai --quiet

Note: you may need to restart the kernel to use updated packages.


- Streaming: rather than waiting for the entire response, you can use OpenAI streaming to receive the API's events in real time, as the model generates output, and pass that output on to users immediately.
  - Use it to reduce perceived latency, or to show the user the response while it is being generated.
    - For example, with a long completion you can display the first part while the rest is still being generated.
  - This suits chat applications, or any application where the user should see output as soon as possible.
  - Watching the output arrive can also help with debugging or with understanding how the model builds its response.

- In a streaming architecture, the response does not arrive in one piece at the end. Instead, the changes (deltas) are streamed as they happen, and you concatenate those deltas to build the final output.
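To make the "deltas become the final output" idea concrete, here is a minimal offline sketch. It does not call the API: the events are hypothetical `SimpleNamespace` stand-ins that only mimic the `type` and `delta` attributes used later in this notebook.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for streamed events: each has a `type`,
# and text-delta events also carry a `delta` string.
fake_events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Peter"),
    SimpleNamespace(type="response.output_text.delta", delta=" Piper"),
    SimpleNamespace(type="response.output_text.delta", delta=" picked"),
    SimpleNamespace(type="response.completed"),
]

def accumulate_deltas(events):
    """Join the text deltas, in arrival order, into the final output string."""
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)

final_text = accumulate_deltas(fake_events)
print(final_text)  # Peter Piper picked
```

Collecting fragments in a list and joining once at the end avoids rebuilding the string on every delta.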

# Streaming API Responses

In this notebook, we demonstrate how to stream responses from the OpenAI API using the Responses API. 

By default, the API generates the entire output before returning it. Using streaming, you can begin processing the response as the model generates output, which is useful for long completions or when you want to display partial results immediately.
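One way to quantify the latency benefit is to measure the time until the first text delta arrives. The sketch below assumes events shaped like the Responses API stream (a `type` attribute, plus `delta` on `response.output_text.delta` events); the `slow_fake_stream` generator is a hypothetical stand-in that simulates a model emitting words with a delay.

```python
import time
from types import SimpleNamespace

def time_to_first_delta(events, start=None):
    """Return (seconds until the first text delta, full text).

    `start` should be a time.perf_counter() reading taken just before the
    request was sent; if omitted, timing starts when iteration begins.
    """
    if start is None:
        start = time.perf_counter()
    first_delta_at = None
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            if first_delta_at is None:
                first_delta_at = time.perf_counter() - start
            parts.append(event.delta)
    return first_delta_at, "".join(parts)

def slow_fake_stream():
    # Hypothetical generator simulating a model that emits a word every 50 ms.
    for word in ["Peter", " Piper", " picked"]:
        time.sleep(0.05)
        yield SimpleNamespace(type="response.output_text.delta", delta=word)

ttfd, text = time_to_first_delta(slow_fake_stream())
print(f"first delta after {ttfd:.2f}s; full text: {text!r}")
```

With a real request you would take `t0 = time.perf_counter()` before calling `client.responses.create(..., stream=True)` and pass `start=t0`, so the measurement includes the network round trip.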

In our example, we'll ask the model to recite the tongue twister "Peter Piper picked a peck of pickled peppers" five times. We then loop through the streamed events and print each event, allowing you to see how the response is built incrementally.

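### Key Streaming Event Types (for reference):

- `response.created`: the response has been created; emitted at the start of the stream.
- `response.output_text.delta`: a new fragment of output text, available in the event's `delta` field.
- `response.completed`: the response has finished generating.
- `error`: an error occurred while streaming.

The output further down also shows intermediate events such as `response.in_progress`, `response.output_item.added`, and `response.content_part.added`. For the full list, refer to the API reference.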
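A common pattern is to route each event by its `type`. This is a sketch, not the SDK's own handling: the event objects are hypothetical `SimpleNamespace` stand-ins, and reading a `message` attribute from `error` events is an assumption (the code falls back to the whole event if it is absent).

```python
from types import SimpleNamespace

def handle_event(event, parts):
    """Route one streamed event by its `type`; append text deltas to `parts`.

    Returns True once the stream reports successful completion.
    """
    if event.type == "response.created":
        print("[stream started]")
    elif event.type == "response.output_text.delta":
        parts.append(event.delta)
    elif event.type == "response.completed":
        print("[stream completed]")
        return True
    elif event.type == "error":
        # Assumption: error events expose a human-readable `message` attribute.
        raise RuntimeError(f"stream error: {getattr(event, 'message', event)}")
    # Other event types (e.g. response.in_progress) are ignored here.
    return False

parts = []
done = False
for ev in [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.in_progress"),
    SimpleNamespace(type="response.output_text.delta", delta="Sure!"),
    SimpleNamespace(type="response.completed"),
]:
    done = handle_event(ev, parts)

print("".join(parts), done)
```

Raising on `error` keeps failures from being silently swallowed mid-stream; a UI might instead show the message to the user.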

## Code Example: Streaming a Tongue Twister

The following code demonstrates how to enable streaming by setting `stream=True` in your request. It uses the tongue-twister prompt to show how events are emitted and processed on the fly.

In [2]:
from openai import OpenAI

In [3]:
MODEL = "gpt-4.1-mini"

In [4]:
import os

# Create an OpenAI client instance.
# The key is read from the OPENAI_API_KEY environment variable; avoid hard-coding secrets in notebooks.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

In [10]:
# Request the model to recite the tongue twister five times using streaming
stream = client.responses.create(
    model=MODEL,
    input=[{
        "role": "user",
        "content": "Recite 'Peter Piper picked a peck of pickled peppers' five times in a row."
    }],
    stream=True
)

# Iterate over streaming events and print details
for event in stream:
    # Depending on the event type, you can handle it accordingly
    if hasattr(event, 'type'):
        print(f"Event Type: {event.type}")
    # If the event includes a delta with content, print it
    if hasattr(event, 'delta') and event.delta:
        print(f"Delta Content: {event.delta}")
    # Print a separator for clarity
    print("-------------------------")

Event Type: response.created
-------------------------
Event Type: response.in_progress
-------------------------
Event Type: response.output_item.added
-------------------------
Event Type: response.content_part.added
-------------------------
Event Type: response.output_text.delta
Delta Content: Sure
-------------------------
Event Type: response.output_text.delta
Delta Content: !
-------------------------
Event Type: response.output_text.delta
Delta Content:  Here
-------------------------
Event Type: response.output_text.delta
Delta Content:  it
-------------------------
Event Type: response.output_text.delta
Delta Content:  is
-------------------------
Event Type: response.output_text.delta
Delta Content: :


-------------------------
Event Type: response.output_text.delta
Delta Content: Peter
-------------------------
Event Type: response.output_text.delta
Delta Content:  Piper
-------------------------
Event Type: response.output_text.delta
Delta Content:  picked
---------------
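Printing every event quickly becomes noisy. As an alternative for inspection, you can tally the event types instead. This sketch uses hypothetical `SimpleNamespace` events shaped like the start of the output above; with a real stream, note that iterating consumes it, so you cannot count events and then display them from the same `stream` object.

```python
from collections import Counter
from types import SimpleNamespace

def summarize_event_types(events):
    """Count how many events of each `type` a stream produced."""
    return Counter(event.type for event in events)

# Hypothetical events mirroring the beginning of the printed output above.
sample = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.in_progress"),
    SimpleNamespace(type="response.output_item.added"),
    SimpleNamespace(type="response.content_part.added"),
    SimpleNamespace(type="response.output_text.delta", delta="Sure"),
    SimpleNamespace(type="response.output_text.delta", delta="!"),
]

counts = summarize_event_types(sample)
for event_type, n in counts.items():
    print(f"{event_type}: {n}")
```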

In [11]:
# Request the model to recite the tongue twister five times using streaming
stream = client.responses.create(
    model=MODEL,
    input=[{
        "role": "user",
        "content": "Recite 'Peter Piper picked a peck of pickled peppers' five times in a row."
    }],
    stream=True
)

text = ''

# Accumulate the text deltas and print the text received so far
for event in stream:
    # Only text-delta events carry new output text
    if event.type == 'response.output_text.delta':
        text += event.delta
        print(text)

Sure
Sure!
Sure! Here
Sure! Here it
Sure! Here it is
Sure! Here it is five
Sure! Here it is five times
Sure! Here it is five times in
Sure! Here it is five times in a
Sure! Here it is five times in a row
Sure! Here it is five times in a row:


Sure! Here it is five times in a row:

Peter
Sure! Here it is five times in a row:

Peter Piper
Sure! Here it is five times in a row:

Peter Piper picked
Sure! Here it is five times in a row:

Peter Piper picked a
Sure! Here it is five times in a row:

Peter Piper picked a pe
Sure! Here it is five times in a row:

Peter Piper picked a peck
Sure! Here it is five times in a row:

Peter Piper picked a peck of
Sure! Here it is five times in a row:

Peter Piper picked a peck of pick
Sure! Here it is five times in a row:

Peter Piper picked a peck of pickled
Sure! Here it is five times in a row:

Peter Piper picked a peck of pickled peppers
Sure! Here it is five times in a row:

Peter Piper picked a peck of pickled peppers.
Sure! Here it is five times 
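Reprinting the whole accumulated text on every delta, as in the cell above, makes the output grow quadratically. For a live display you would usually write only the new fragment and flush immediately. This is a sketch: `print_stream` is a hypothetical helper, and it is checked here offline against `SimpleNamespace` stand-in events written to an `io.StringIO` buffer.

```python
import io
import sys
from types import SimpleNamespace

def print_stream(events, out=None):
    """Write each text delta as it arrives (no newlines) and return the full text."""
    out = sys.stdout if out is None else out
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            out.write(event.delta)
            out.flush()  # push the fragment to the display right away
            parts.append(event.delta)
    out.write("\n")
    return "".join(parts)

# Offline check with hypothetical events; with the real API you would pass `stream`.
buffer = io.StringIO()
result = print_stream(
    [SimpleNamespace(type="response.output_text.delta", delta=d)
     for d in ["Sure", "!", " Here", " it", " is"]],
    out=buffer,
)
print(result)
```

With a stream created as in the cell above, `full_text = print_stream(stream)` shows the text growing in place and still returns the complete response for later use.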

## Summary

- **Streaming Enabled:** By setting `stream=True` in the API request, we stream the model's output incrementally.
- **Event Handling:** The first example loops over each event, printing its type and any text delta; the second keeps only `response.output_text.delta` events and concatenates their deltas into the full text.
- **Example Prompt:** We used a tongue-twister prompt, but the same loop works for any text output the model generates.

This notebook provides a basic starting point for leveraging streaming responses to optimize for latency or real-time display of model output.