# **Using OpenAI API with Python**

## **Securely Handling Sensitive Data in Colab**

To use OpenAI's GPT models, you first need to set up an API key.

Before we can call the Responses API endpoint, we need to authenticate using the **API key**. After completing the preparatory assignments, you should have your own API key, which you can use during the course as well as for the post assignment.

When working with sensitive information like API keys or passwords in Google Colab, it's crucial to handle data securely. Two common approaches for this are using **Colab's Secrets Manager**, which stores and retrieves secrets without exposing them in the notebook, and `getpass`, a Python function that securely prompts users to input secrets during runtime without showing them. Both methods help ensure your sensitive data remains protected.

### **Option 1: Using Google Colab Secrets Manager**

Google Colab provides an integrated Secrets Manager, allowing you to securely store and retrieve sensitive information such as API keys or authentication tokens without hardcoding them in your notebook.

**Step 1: Store Your Secret in Colab**

1. In the Colab notebook, navigate to the left sidebar.
2. Click on the **“Secrets”** tab (represented by a key icon).
3. Add your secret by clicking on **“+ Add a new secret”**. For example, you might add a secret called `OPENAI_API_KEY` with the value of your API key.

**Step 2: Access the Secret in Your Notebook**

Once you've added a secret, you can easily access it from within the notebook.

`OPENAI_API_KEY` is the name of the secret you've added in the Colab Secrets Manager. It will be retrieved securely without having to expose the key in the notebook.


In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

### **Option 2: Using Python's `getpass` for Secret Input**

Alternatively, the `getpass` module allows you to securely input secrets (e.g., passwords or API keys) during runtime, making sure they're not visible in the notebook output.

Here, the `getpass.getpass` function prompts the user to enter the secret without displaying it as they type, ensuring that sensitive data isn't exposed.

In [None]:
import os       # needed here too, in case the Option 1 cell was skipped
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass()
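
The two options can also be combined into one fallback routine. The sketch below is a minimal illustration, not part of the course material: the helper names `load_api_key` and `mask` are hypothetical. It reuses a key that is already in the environment and only prompts when none is found. The `mask` helper lets you confirm which key is loaded without printing it in full.

```python
import getpass
import os


def load_api_key(name="OPENAI_API_KEY", env=None, prompt_fn=getpass.getpass):
    """Return the key from the environment, prompting for it only if missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        # Nothing stored yet: ask the user without echoing the input.
        key = prompt_fn(f"Enter {name}: ").strip()
    if not key:
        raise ValueError(f"{name} is empty")
    env[name] = key  # make the key visible to libraries that read the environment
    return key


def mask(secret, visible=4):
    """Hide all but the last few characters of a secret for safe printing."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

In a notebook cell you could then run `print(mask(load_api_key()))` to check that a key is set without revealing it.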

## **Installing OpenAI**

In this notebook, you will learn how to use the OpenAI API with Python for tasks such as **text generation**, **speech-to-text**, **text-to-speech**, and **image generation**.

In [None]:
!pip install -q openai

## **Text Generation**

OpenAI's text generation models (often called large language models, or LLMs) generate text in response to input instructions. These models can understand and generate natural language and code, and can be guided using carefully designed prompts.

Text generation is accessed through the **Responses API**, which is the unified API for generating text and other outputs from OpenAI models.

For an overview of available models, see:  
https://platform.openai.com/docs/models





### **Prompts and Inputs**

Models generate text in response to **inputs**, often referred to as *prompts*. Designing a prompt is essentially how you “program” a language model.

A prompt may include:

- clear instructions describing the task  
- background context about the topic or audience  
- constraints or formatting requirements  
- examples of desired behavior  

Well-designed prompts significantly improve the quality and reliability of the generated output.
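
The prompt components listed above can be assembled programmatically. The following is a minimal sketch under our own assumptions: the function `build_prompt`, its parameter names, and the section labels ("Context:", "Constraints:", "Examples:") are illustrative choices, not a format required by the API.

```python
def build_prompt(instruction, context=None, constraints=None, examples=None):
    """Combine the optional prompt components into one plain-text prompt."""
    parts = [instruction.strip()]
    if context:
        parts.append(f"Context: {context.strip()}")
    if constraints:
        # One bullet per constraint keeps formatting requirements easy to scan.
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    if examples:
        # Each example is an (input, output) pair showing the desired behavior.
        shown = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
        parts.append("Examples:\n" + shown)
    return "\n\n".join(parts)


prompt = build_prompt(
    "Explain generative AI.",
    context="The audience is a 6-year-old.",
    constraints=["Use a fun analogy", "Stay under 150 words"],
)
print(prompt)
```

The resulting string can be passed as the `input` of a model call.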



### **Using the Responses API**

To generate text, you send an input to a model and receive the model's generated output in response.


In [None]:
from openai import OpenAI
client = OpenAI()

#### Example: Simple instruction

In [None]:
#Example with a simple instruction

#Example models to try
#
#Frontier models:
#gpt-5.2
#gpt-5-mini
#gpt-5-nano

response = client.responses.create(
    model="gpt-5-nano",
    input="Explain generative AI to a 6 year old."
)

The generated text can be accessed using `response.output_text`.


In [None]:
print(response.output_text)

#### Optional parameters

Some models support optional parameters that influence how text is generated.

- **`temperature`**  
  Controls the randomness of generated text.
  
  - Lower values → more focused and deterministic responses  
  - Higher values → more creative and varied responses  

> **Note:** For some reasoning-focused models, the `temperature` parameter may have limited or no effect.
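
To build intuition for what `temperature` does, the toy sketch below rescales a made-up set of token scores before turning them into probabilities. It is a simplified illustration of temperature-scaled softmax, not a description of how OpenAI's models are implemented. The scores and the function name are hypothetical.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # This toy version divides by the temperature, so it cannot accept 0.
        raise ValueError("temperature must be positive in this toy example")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(scores, 0.2)  # low temperature
varied = softmax_with_temperature(scores, 2.0)   # high temperature
print([round(p, 3) for p in focused])
print([round(p, 3) for p in varied])
```

With the low temperature, almost all probability goes to the top-scoring candidate. With the high temperature, the probabilities are spread more evenly across the candidates.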


#### Structured input (optional)

For more complex prompts, you can provide structured input using message-like objects. This can help organize instructions, but is not required.

Each message may include:
- **`role`** (for example, `system` or `user`)
- **`content`** (the text of the message)

Roles are optional and are mainly used for clarity in longer or multi-step prompts.


In [None]:
response = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {"role": "system", "content": "You explain concepts clearly and simply."},
        {"role": "user", "content": "Explain generative AI in simple terms."}
    ]
)

print(response.output_text)

You can make model output more readable in a Jupyter notebook by rendering it as formatted Markdown using IPython's `display` functions.


In [None]:
from IPython.display import Markdown, display

In [None]:
display(Markdown(response.output_text))

### **Prompting Guidelines**

The following guidelines help you write effective prompts for language models.  
They apply regardless of whether you use plain text input or structured messages with roles.




#### 1. Be clear and specific (most important)

Avoid vague instructions. Clearly state what you want the model to do, including any constraints or edge cases.

**Example:**  
“Write a Python function that calculates the Fibonacci sequence. Handle negative inputs and include explanatory comments.”

---

#### 2. Define the desired output format

Tell the model how the response should be structured. This improves readability and makes outputs easier to reuse.

**Example:**  
“Structure the response with a short introduction, three bullet points, and a one-sentence conclusion.”

---

#### 3. Limit the scope of the task

Narrow prompts lead to more relevant and focused answers.

**Example:**  
“Recommend three family-friendly activities in Paris. Include a short description for each.”

---

#### 4. Provide relevant context

Give background information about the audience, domain, or assumptions when needed.

**Example:**  
“Explain this concept for an audience with no prior programming experience.”

---

#### 5. Specify tone and style

Explicitly describe the desired tone, level of formality, or writing style.

**Example:**  
“Use a friendly, conversational tone and simple language.”

---



#### 6. Use examples when helpful

Providing examples of desired input–output behavior can improve results, especially for classification or formatting tasks.

(This technique is known as *few-shot prompting* and is covered in the **Few-shot Prompting** section below.)

---

#### **Key takeaway**

Clear instructions matter more than prompt structure.  
Roles and message formats can help organize prompts, but they are optional.

In [None]:
# This example demonstrates the most important concept:
# clear and specific instructions lead to better outputs.

response = client.responses.create(
    model="gpt-4.1",
    input=(
        "Explain generative AI to a 6-year-old using a fun analogy. "
        "Avoid technical terms and keep the explanation under 150 words."
    )
)

# The generated text can be accessed directly via response.output_text
display(Markdown(response.output_text))

In [None]:
# This example shows how output structure and tone
# can be controlled directly through instructions.

response = client.responses.create(
    model="gpt-4.1",
    input=(
        "Write a short, lighthearted article explaining why hiking is great for beginners.\n\n"
        "Structure the response as:\n"
        "- A 2-sentence introduction\n"
        "- Three bullet points listing benefits\n"
        "- A one-sentence conclusion\n\n"
        "Use simple language and friendly humor."
    )
)

display(Markdown(response.output_text))


In [None]:
# This example demonstrates how to tailor responses
# to a specific audience using instructions alone.

response = client.responses.create(
    model="gpt-4.1",
    input=(
        "Explain the difference between ML APIs (e.g., Google Cloud Vision, ChatGPT) "
        "and open-source, off-the-shelf pre-trained models.\n\n"
        "Target audience: executive MBA students.\n"
        "Use business-relevant examples and keep the explanation concise."
    )
)

display(Markdown(response.output_text))


In [None]:
# Roles are used here to separate background instructions
# from the specific task. This is optional and mainly
# improves readability.

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "system",
            "content": (
                "Explain complex topics in simple, child-friendly language. "
                "Use fun analogies and avoid technical jargon."
            )
        },
        {
            "role": "user",
            "content": "Explain generative AI to a 6-year-old."
        }
    ]
)

display(Markdown(response.output_text))


#### **Few-shot Prompting**

Few-shot prompting is a technique where you include a small number of examples (typically 2-5) directly in the prompt to demonstrate the desired format, style, or behavior for a task.

Each example consists of an input and the corresponding expected output. By observing these input-output pairs, the model can infer the underlying pattern and apply it to new, unseen inputs—without explicitly programming the rules.

Few-shot prompting is especially useful for tasks such as classification, translation, rewriting, and style transformation, where showing examples is often more effective than describing the logic in words.




In the following examples, we use few-shot prompting to demonstrate how examples can guide the model's behavior.


In [None]:
# Few-shot prompting example:
# The prompt includes a small number of example input–output pairs
# to demonstrate how corporate jargon should be translated into
# simple, plain English. These examples act as instructions that
# guide the model’s behavior.

response = client.responses.create(
    model="gpt-4.1",
    input=(
        # Adjacent string literals are joined as-is, so each line needs
        # an explicit "\n" to keep the examples separated in the prompt.
        "Translate corporate jargon into plain English.\n\n"
        "Examples:\n"
        "Corporate: New synergies will help drive top-line growth.\n"
        "Plain English: Things working well together will increase revenue.\n\n"
        "Corporate: Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\n"
        "Plain English: Let's talk later when we're less busy about how to do better.\n\n"
        "Corporate: This late pivot means we don't have time to boil the ocean for the client deliverable.\n"
        "Plain English:"
    ),
    temperature=0
)

print(response.output_text)


In [None]:
# Few-shot classification example:
# Example title–category pairs are provided inline to show
# how sports-related titles should be classified.

response = client.responses.create(
    model="gpt-4.1",
    input=(
        "Classify sports-related titles into categories. "
        "Examples: "
        "Title: Paris auf der Saslong nicht zu biegen. Category: Ski Alpin. "
        "Title: Kobayashi geht in Führung. Category: Skispringen. "
        "Title: Hütter fährt in Abfahrt aufs Podest. Category: Ski Alpin. "
        "Title: Schweizer Freudentag: Flury brilliert vor Hählen. Category: Ski Alpin. "
        "Title: Arsenal unterliegt Tottenham im Derby. Category: Fussball. "
        "Title: Seoanes Gladbach muss sich mit Remis begnügen. Category:"
    ),
    temperature=0,
)

print(response.output_text)
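
The few-shot prompts above are written by hand. When the examples live in a list, a small helper can assemble the prompt instead. This is a sketch only: `build_few_shot_prompt` and its label parameters are hypothetical names introduced for illustration.

```python
def build_few_shot_prompt(task, examples, query,
                          input_label="Input", output_label="Output"):
    """Build a few-shot prompt from (input, output) pairs plus one open query."""
    lines = [task, "", "Examples:"]
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
    lines.append("")
    # The final entry has no answer; the model is expected to complete it.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    task="Classify sports-related titles into categories.",
    examples=[
        ("Kobayashi geht in Führung.", "Skispringen"),
        ("Arsenal unterliegt Tottenham im Derby.", "Fussball"),
    ],
    query="Seoanes Gladbach muss sich mit Remis begnügen.",
    input_label="Title",
    output_label="Category",
)
print(prompt)
```

Keeping the examples in a list makes it easy to add, remove, or swap them while comparing how the model's answers change.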


### **Customizing the Responses API for Targeted Tasks**


In this section, we'll customize model behavior for targeted tasks by creating **small helper functions**. Each helper function uses:

- **static instructions** (the reusable part of the prompt), and  
- **dynamic input** (the user-provided text passed as a parameter).

This pattern represents a foundational way to customize model behavior.  Higher-level tools such as prompt templates and agent frameworks build on the same idea by abstracting static instructions and dynamic inputs.

> In Lab07, we will build on this concept using LangChain Prompt Templates to define reusable prompts with placeholders for dynamic input.
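
The static/dynamic split can be made explicit with a small factory function. The sketch below is an illustration under our own assumptions: `make_task` and `fake_generate` are hypothetical names, and `fake_generate` merely stands in for a real model call so the pattern can be tried without an API key.

```python
def make_task(instructions, generate):
    """Return a function that applies fixed instructions to varying input text."""
    def run(text):
        # Static part first, then the dynamic input supplied by the caller.
        prompt = f"{instructions}\n\nInput: {text}\nOutput:"
        return generate(prompt)
    return run


def fake_generate(prompt):
    """Stand-in for a model call; reports the prompt size instead of answering."""
    return f"[model would answer a prompt of {len(prompt)} characters]"


summarize = make_task("Summarize the input in one sentence.", fake_generate)
print(summarize("Colab notebooks run Python in the browser."))
```

In this notebook, `generate` could instead be a function that calls `client.responses.create(...)` with the prompt as `input` and returns `response.output_text`.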


#### Example: Text Translation (English → German)

The function below keeps the translation instructions fixed while passing the text to translate dynamically.

In [None]:
# The prompt is built as a formatted string:
# static instructions define the task, and the variable `text`
# is inserted dynamically using a Python f-string.

# Example:
# If text = "My name is Barbara.",
# the line
# f"English: {text}" becomes "English: My name is Barbara."


def translateFromEnglishToGerman(text):
    response = client.responses.create(
        model="gpt-4.1",
        input=(
          "Translate the following sentence from English to German. "
          "Return only the German translation.\n\n"
          f"English: {text}\n"
          "German:"
        ),
        temperature=0.2,
    )
    return response.output_text


In [None]:
translatedText = translateFromEnglishToGerman("My name is Barbara. What is yours?")

In [None]:
print(translatedText)

#### Example: Image Prompt Generation

The function below turns a short idea into a richer image-generation prompt. The instructions are fixed, and the user’s idea is passed in dynamically.


In [None]:
def createImagePrompt(text):
    response = client.responses.create(
        model="gpt-4.1",
        input=(
            "You are a skilled prompt generator. Rewrite the user's idea into a vivid, "
            "high-quality image-generation prompt. Include subject, environment, lighting, "
            "mood, composition, and style. Keep it to 1–2 sentences.\n\n"
            f"Idea: {text}\n"
            "Image prompt:"
        )
    )
    return response.output_text

In [None]:
imagePrompt = createImagePrompt("nice and cinematic mountain ranges")

In [None]:
display(Markdown(imagePrompt))

## **Text to Speech Generation**

OpenAI provides text-to-speech (TTS) capabilities that allow you to convert generated text into spoken audio. This is useful for applications such as:

- narrating written content
- producing spoken audio in multiple languages
- generating audio files programmatically

Text-to-speech models support multiple built-in voices, such as  
`alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, and `shimmer`.

For an overview of supported languages and voices, see:  
https://platform.openai.com/docs/guides/text-to-speech


So far, we have generated and transformed text. We can now use this text as input to generate spoken audio.


### **Audio Generation Example**

In [None]:
# This function converts text into spoken audio and streams
# the generated speech directly into an MP3 file.

def generateTextToSpeech(text):
    speech_file_path = "speech.mp3"
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice="coral",
        input=text
    ) as response:
        response.stream_to_file(speech_file_path)
        return speech_file_path


In [None]:
import IPython.display as ipd
fileName=generateTextToSpeech("Today is a wonderful day to build something people love!")
ipd.Audio(filename=fileName)

The following example combines text translation and text-to-speech into a simple end-to-end pipeline.


In [None]:
# Simple pipeline: text translation (reusing the functions we created earlier) followed by speech generation

translatedText = translateFromEnglishToGerman(
    "My name is Barbara. What is yours?"
)

fileName = generateTextToSpeech(translatedText)
ipd.Audio(filename=fileName)


## **Speech to Text (Audio API)**

OpenAI provides speech-to-text capabilities that allow you to transcribe spoken audio into text or translate it into English. These capabilities are based on the Whisper model.

Speech-to-text can be used to:
- transcribe audio in its original language
- translate and transcribe audio into English

Supported audio formats include `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, and `webm`.  
Uploaded files are currently limited to 25 MB.
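
Checking the format and size before uploading avoids a failed request. The following sketch encodes the limits stated above. The function name `check_audio_upload` is hypothetical, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```python
import os

SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def check_audio_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 25 MB")
    return problems


print(check_audio_upload("Podcast.mp3", 10_000_000))
print(check_audio_upload("notes.txt", 10))
```

For a file on disk, the size can be obtained with `os.path.getsize(path)`.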

### **Transcription Example**

In [None]:
def transcribeAudio(fileName):
    # Open the audio file in binary mode and send it for transcription
    with open(fileName, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="text"
        )
    return transcript

You will now download and transcribe a short news podcast.


In [None]:
!wget -O Podcast.mp3 https://download-media.srf.ch/world/audio/4x4_Podcast_radio/2025/01/4x4_Podcast_radio_AUDI20250128_NR_0038_3c81668cc4664fff9a26020d4eb47f0a.mp3


In [None]:
# The file name passed to transcribeAudio must match the file downloaded above
transcribedAudio = transcribeAudio("Podcast.mp3")
display(Markdown(transcribedAudio))

### **Summarizing the transcript**

We can now use the transcript as input to generate summaries using a text generation model.



In [None]:
response = client.responses.create(
    model="gpt-4.1",
    input=(
        "Create a bullet list of the different news topics covered in the following "
        "podcast transcript. Write in German and use 1–2 sentences per bullet point.\n\n"
        f"Transcript:\n{transcribedAudio}"
    )
)

summary = response.output_text
display(Markdown(summary))


### **Focused summarization (topic filtering)**

The next example creates a focused summary, restricted to specific topics.


In [None]:
response = client.responses.create(
    model="gpt-4.1",
    input=(
        "Summarize the following podcast transcript in German, focusing only on topics "
        "related to artificial intelligence (KI). Limit the summary to a maximum of "
        "20 sentences.\n\n"
        f"Transcript:\n{transcribedAudio}"
    )
)

summary = response.output_text
display(Markdown(summary))


### **From text back to speech**

We can reuse the summary as input for text-to-speech generation.


In [None]:
fileName=generateTextToSpeech(summary)
ipd.Audio(filename=fileName)

### **Transcription with translation**

If the original language is not suitable, audio can be transcribed and translated into English.


In [None]:
def transcribeAndTranslateAudio(fileName):
    with open(fileName, "rb") as audio_file:
        transcript = client.audio.translations.create(
            model="whisper-1",
            file=audio_file,
            response_format="text"
        )
    return transcript

In [None]:
display(Markdown(transcribeAndTranslateAudio("Podcast.mp3")))

### **Working with longer audio files**

The transcription endpoint accepts uploads of up to 25 MB. Larger audio files must be split into smaller segments before transcription.


In [None]:
!pip install -q pydub

In [None]:
# code adapted from: https://platform.openai.com/docs/guides/speech-to-text/longer-inputs

from pydub import AudioSegment

def segmentAudio(audioFile):
  audio = AudioSegment.from_mp3(audioFile)
  # PyDub handles time in milliseconds
  two_minutes = 2 * 60 * 1000
  segment = audio[:two_minutes]
  output_file = "TwoMinutes_" + audioFile
  segment.export(output_file, format="mp3")
  return output_file
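
Note that `segmentAudio` keeps only the first two minutes of the file. To cover a whole recording, you would first compute consecutive time ranges and then slice each one. The sketch below shows only the range calculation. `chunk_ranges` is a hypothetical helper, and the five-and-a-half-minute duration is a made-up example value.

```python
def chunk_ranges(total_ms, chunk_ms):
    """Split a duration into consecutive (start, end) ranges in milliseconds."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, total_ms))  # the last chunk may be shorter
        for start in range(0, total_ms, chunk_ms)
    ]


two_minutes = 2 * 60 * 1000
example_duration = 330_000  # hypothetical recording of 5.5 minutes
for start, end in chunk_ranges(example_duration, two_minutes):
    print(start, end)
```

With pydub, `len(audio)` gives the duration in milliseconds, and each range can be cut with `audio[start:end]` in the same way `segmentAudio` slices the first two minutes.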

The next cell downloads a longer audio file to be transcribed.

To explore segmentation, you can experiment with a longer audio file such as https://www.srf.ch/audio/tagesgespraech.

In [None]:
!wget -O Tagesgespraech.mp3 "https://download-media.srf.ch/world/audio/Tagesgespraech_radio/2025/01/Tagesgespraech_radio_AUDI20250118_NR_0030_bf244ffb90af4470abd2cebd91107e91.mp3?d=ap&assetId=641aa6a1-597f-3c50-bce5-a8e0193e76cc"

We can then play the split audio file.

In [None]:
twoMinuteFile = segmentAudio("Tagesgespraech.mp3")
ipd.Audio(filename=twoMinuteFile)

We can also transcribe the segment and translate it into English.

In [None]:
display(Markdown(transcribeAndTranslateAudio(twoMinuteFile)))

To experiment further with Whisper, see the [Whisper processing guide](https://github.com/openai/openai-cookbook/blob/main/examples/Whisper_processing_guide.ipynb). To learn more about prompting with Whisper, see the [Whisper prompting guide](https://github.com/openai/openai-cookbook/blob/main/examples/Whisper_prompting_guide.ipynb).

## **Image Generation**

OpenAI models can generate images directly from text prompts. Image generation allows you to create visual content based on natural language descriptions, making it useful for tasks such as illustration, design ideation, and creative exploration.

Image generation is driven by a text prompt that describes the desired scene, style, and visual details.



### **Creating Images from Text**

OpenAI provides image-generation models that create images from text prompts. Depending on your account configuration, different image-generation models may be available.

To keep this notebook accessible to everyone, we present **two options**:
- an older DALL·E-based model that may work without additional account verification
- a newer image-generation model that requires account verification



#### Option 1: DALL·E Image Generation (Legacy Model)

This option uses an older DALL·E-based image model.  
It may be available without completing additional account verification steps.


In [None]:
from openai import OpenAI
client = OpenAI()

response = client.images.generate(
  model="dall-e-3",
  prompt="a downhill skier in the swiss alps",
  size="1024x1024"
)

image_url = response.data[0].url

In [None]:
# Load and display the generated image
from PIL import Image
import urllib.request

with urllib.request.urlopen(image_url) as url:
    img=Image.open(url)
    display(img)

#### Option 2: Image Generation with gpt-image-1 (Account Verification Required)

This option uses a newer image-generation model. Access to this model requires account verification.


In [None]:
import base64
from openai import OpenAI
client = OpenAI()

response = client.images.generate(
    model="gpt-image-1",
    prompt="a downhill skier in the Swiss Alps, cinematic lighting",
    size="1024x1024",
)

b64 = response.data[0].b64_json

In [None]:
from io import BytesIO
from PIL import Image

img = Image.open(BytesIO(base64.b64decode(b64)))
display(img)

#### **Key takeaway**

Both options generate images from text prompts using the same workflow.
The difference lies in the image-generation model that is available for your account.
If the newer model is not accessible, the legacy DALL·E option can still be used.


## **Multimodal Example: Analyze the Generated Image**

Now that we have generated an image, we can pass it back to a multimodal model together with a text instruction. This allows the model to describe what it sees or extract structured information from the image.


In [None]:
def analyzeGeneratedImage(image_url):
    response = client.responses.create(
        model="gpt-4.1",
        input=[{
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe what you see in this image in 3 bullet points."},
                {"type": "input_image", "image_url": image_url},
            ],
        }],
    )
    return response.output_text

In [None]:
# Here we reuse the image_url from Option 1 (DALL·E) above

analysis = analyzeGeneratedImage(image_url)
display(Markdown(analysis))
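
If you generated the image with Option 2, there is no `image_url`, only the base64 string `b64`. One possible workaround is to wrap that string in a data URL. This is a sketch under the assumption that the `image_url` field of an `input_image` item also accepts base64 data URLs. The helper name `to_data_url` and the PNG default are likewise assumptions.

```python
import base64


def to_data_url(b64_data, mime_type="image/png"):
    """Wrap base64-encoded image data in a data URL."""
    return f"data:{mime_type};base64,{b64_data}"


# Placeholder bytes stand in for real image data in this demonstration.
sample_b64 = base64.b64encode(b"not a real image").decode("ascii")
print(to_data_url(sample_b64))
```

Under that assumption, `analyzeGeneratedImage(to_data_url(b64))` would analyze the Option 2 image in the same way.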


The same image can be reused with a different instruction to guide the model toward a new task.


In [None]:
# Same image, different instruction
response = client.responses.create(
    model="gpt-4.1",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What might be happening just before this scene?"},
            {"type": "input_image", "image_url": image_url},
        ],
    }],
)

display(Markdown(response.output_text))


In a multimodal prompt, different input types (such as text and images) are combined in a single message, allowing the model to reason over them together.
