# Section 1: OpenAI API Set Up

To use the OpenAI API, you'll need an API key. If you don't already have one, create a key [here](https://platform.openai.com/account/api-keys).

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `OPENAI_API_KEY`. Then pass the key to the SDK:

In [1]:
# Install the OpenAI Python library
%pip install openai



In [2]:
# Import the OpenAI library
import openai
import os

# Used to securely store your API key
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

Before you can make any API calls, you need to initialize the OpenAI client.

In [3]:
# Initialize the OpenAI client
client = openai.OpenAI()

Now you can make your first API call using the `gpt-4o-mini` model.

In [4]:
response = client.responses.create(
    model="gpt-4o-mini",
    max_output_tokens=50,
    input=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Дай ми история за българската история на български"}
    ]
)

# The Responses API puts text outputs in `response.output_text`
print(response.output_text)


Разбира се! Ето една кратка история за България:

### История на България

България е една от най-древните нации в Европа, с корени, датиращи от V век. Пър


## 1. Max Output Tokens

In this section, we demonstrate the `max_output_tokens` parameter, which caps the length of the model's output. Generation stops as soon as the cap is reached, so the text can end mid-word.
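A capped response can be detected programmatically instead of by eye. The sketch below assumes the response exposes `status` and `incomplete_details.reason` as described in the Responses API reference. `was_truncated` is a hypothetical helper name, and the stand-in object only mimics those fields so the snippet runs without an API call.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if generation stopped because the output token cap was hit."""
    details = getattr(response, "incomplete_details", None)
    return (
        getattr(response, "status", None) == "incomplete"
        and details is not None
        and getattr(details, "reason", None) == "max_output_tokens"
    )


# Stand-in for a real `client.responses.create(...)` result (assumed shape).
fake_response = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_output_tokens"),
    output_text="България е една от най-древните",
)

if was_truncated(fake_response):
    print("Output was cut off; consider raising max_output_tokens.")
```

With a real client, the same helper could be called on the object returned by `client.responses.create(...)`.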

In [None]:
response_with_max_tokens = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in bulgarian."} # Defines the role of the user
  ],
  max_output_tokens=50 # Limit the output to a maximum of 50 tokens
)

print(response_with_max_tokens.output_text)

Разказ за българската история

България е една от най-древните държави в Европа, с история, която датира от VI век пр.н.е., когато траките населявали тези земи. Пр


## 2. Temperature Parameter

The `temperature` parameter controls the randomness of the model's output. Higher values like 0.8 will make the output more random and creative, while lower values like 0.2 will make it more focused and deterministic.
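To build intuition for why lower values are more deterministic, here is a minimal sketch of the textbook temperature-scaled softmax. It illustrates the general technique, not OpenAI's internal implementation, and the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.8):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature almost all probability mass lands on the top-scoring token, while the higher temperature leaves more room for the alternatives.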

In [None]:
# High temperature example
response_high_temp = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in bulgarian."} # Defines the role of the user
  ],
  max_output_tokens=100,
  temperature=0.8 # Set temperature to a higher value for more randomness
)

print("Response with high temperature:")
print(response_high_temp.output_text)

Response with high temperature:
Разказ за българската история

Българската история е дълга и интересна, изпълнена с важни събития и културни постижения. Началото на българската държава се свързва с основаването на Първата българска държава през 681 година от хан Аспарух. Той води своя народ през река Дунав и създава нова територия, която


In [None]:
# Low temperature example
response_low_temp = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in bulgarian."} # Defines the role of the user
  ],
  max_output_tokens=100,
  temperature=0.2 # Set temperature to a lower value for less randomness
)

print("\nResponse with low temperature:")
print(response_low_temp.output_text)


Response with low temperature:
Разказ за българската история

България има дълга и вълнуваща история, която започва още в древността. Първите известни жители на територията на днешна България са траките, които оставят след себе си множество културни и исторически следи. Те са известни със своите ритуали, изкуство и златни съкровища.

През VII век, на терит


## 3. Top-p (Nucleus Sampling)

The `top_p` parameter, also known as nucleus sampling, controls the diversity of the model's output by sampling only from the smallest set of tokens whose cumulative probability reaches the `top_p` threshold. Lower values produce more focused and deterministic output, similar to a lower temperature, while higher values allow more diverse and creative output.
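The selection step can be sketched in a few lines of plain Python. This is an illustration of the general nucleus-sampling idea on a made-up distribution, not the API's actual implementation. `nucleus_filter` is a hypothetical helper name.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}  # made-up distribution
print(nucleus_filter(toy_probs, 0.1))  # only the top token survives
print(nucleus_filter(toy_probs, 0.9))  # the three most likely tokens survive
```

With `top_p=0.1` the model could only ever pick the single most likely token, which is why low values behave almost deterministically.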

In [None]:
# High top_p example
response_high_top_p = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in bulgarian."} # Defines the role of the user
  ],
  max_output_tokens=100,
  top_p=0.9 # Set top_p to a higher value for more diverse output
)

print("Response with high top_p:")
print(response_high_top_p.output_text)

Response with high top_p:
Разбира се! Ето една кратка история за българската история:

---

**История на България**

България е държава с дълга и богата история, която датира от V век, когато се формира Първата българска държава. През 681 година хан Аспарух създава българска държава на територията на днешна Североизточна България, след успешна бит


In [None]:
# Low top_p example
response_low_top_p = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in bulgarian."} # Defines the role of the user
  ],
  max_output_tokens=100,
  top_p=0.1 # Set top_p to a lower value for more focused output
)

print("\nResponse with low top_p:")
print(response_low_top_p.output_text)


Response with low top_p:
Разбира се! Ето една кратка история за българската история:

---

**История на България**

България има дълга и богата история, която започва още в древността. Първите известни жители на територията на днешна България са траките, които оставят значителен културен следа. През 681 година, хан Аспарух основава Първата българска държава,


## 4. Prompt Caching

Prompt caching lets the API reuse the already-processed prefix of a prompt from earlier requests, which reduces latency and cost. Caching only applies once a prompt reaches 1,024 tokens, so the example below puts a large static text block in a system message. When a later request begins with the same static prefix, the API can serve that prefix from the cache, and the number of reused tokens is reported in `cached_tokens` under the usage details (0 on the first request and 2176 on the second in the output below).
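Because the cache matches on the beginning of the prompt, the order of the messages matters: static content should come first and the changing question last. The sketch below illustrates this at the message level. The real cache works on tokens, so this is a simplification, and `shared_prefix_length` is a hypothetical helper.

```python
def shared_prefix_length(first, second):
    """Count how many leading messages are identical in two message lists."""
    count = 0
    for a, b in zip(first, second):
        if a != b:
            break
        count += 1
    return count


static_prefix = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Long reference text..."},
]
turn_1 = [*static_prefix, {"role": "user", "content": "Question A"}]
turn_2 = [*static_prefix, {"role": "user", "content": "Question B"}]
# Putting the changing part first breaks the shared prefix.
reordered = [{"role": "user", "content": "Question B"}, *static_prefix]

print(shared_prefix_length(turn_1, turn_2))     # 2
print(shared_prefix_length(turn_1, reordered))  # 0
```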

In [None]:
# 📦 Big static block to exceed 1024 tokens
static_text = ("This is static text for caching.\n" * 300)  # Adjust repeat count if needed

# Static prefix = same every request → cacheable
STATIC_PREFIX = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": static_text}
]

def run_turn(messages):
    res = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",  # caching works on GPT-4o and GPT-4o-mini
        messages=messages,
        temperature=0.2,
        max_tokens=100
    )
    reply = res.choices[0].message.content
    cached = res.usage.prompt_tokens_details.cached_tokens
    print("\nAssistant:", reply)
    print(f"Prompt tokens: {res.usage.prompt_tokens}")
    print(f"Cached tokens: {cached}")
    print(f"Completion tokens: {res.usage.completion_tokens}")
    return reply

# 🔹 TURN 1 — First request (creates the cache)
messages = [
    *STATIC_PREFIX,
    {"role": "user", "content": "Give me three fun facts about Bulgaria in bulgarian."}
]
a1 = run_turn(messages)

# 🔹 TURN 2 — Same static prefix, new question (prefix is cached)
messages = [
    *STATIC_PREFIX,
    {"role": "user", "content": "Give me three fun facts about Bulgaria in bulgarian."},
    {"role": "assistant", "content": a1},
    {"role": "user", "content": "Give me three fun facts about Macedonia in bulgarian."}
]
run_turn(messages)


Assistant: Разбира се! Ето три интересни факта за България на български:

1. **Старият град Пловдив** е един от най-старите непрекъснато населявани градове в света, с история, която датира повече от 6 000 години.

2. **Българската роза** е известна в целия свят. Страната е един от най-големите производители на розово масло
Prompt tokens: 2133
Cached tokens: 0
Completion tokens: 100

Assistant: Разбира се! Ето три интересни факта за Македония на български:

1. **Охридското езеро** в Македония е едно от най-дълбоките и най-старите езера в Европа, с дълбочина от около 288 метра и история, която датира над 4 милиона години.

2. **Скопие**, столицата на Македония, е известна със своите мног
Prompt tokens: 2253
Cached tokens: 2176
Completion tokens: 100



## 5. OpenAI Batch Processing

This section demonstrates how to use OpenAI's batch processing feature to run multiple API requests asynchronously. This can be useful for processing large numbers of requests more efficiently.

### 1. Prepare the batch input file

First, we create a JSONL file containing the details for each individual API request we want to include in the batch. Each line in the file is a JSON object representing a single request.
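Before uploading, it can help to check the file locally. The sketch below parses JSONL text and verifies that every request carries the four keys used in this notebook and that each `custom_id` is unique. `validate_batch_lines` is a hypothetical helper, and the checks are a reasonable minimum rather than the API's full validation rules.

```python
import json

REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def validate_batch_lines(jsonl_text):
    """Parse JSONL text and check required keys and custom_id uniqueness."""
    seen_ids = set()
    requests = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises ValueError on malformed JSON
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        if record["custom_id"] in seen_ids:
            raise ValueError(f"line {number}: duplicate custom_id {record['custom_id']!r}")
        seen_ids.add(record["custom_id"])
        requests.append(record)
    return requests


sample = "\n".join(
    json.dumps({"custom_id": f"req-{i}", "method": "POST",
                "url": "/v1/chat/completions",
                "body": {"model": "gpt-4o-mini", "messages": []}})
    for i in (1, 2)
)
print(len(validate_batch_lines(sample)), "valid requests")
```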

In [None]:
import json
import pathlib

batch_lines = [
    {
        "custom_id": "req-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Give me three facts about Paris."}
            ]
        }
    },
    {
        "custom_id": "req-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Give me three facts about Rome."}
            ]
        }
    }
]

pathlib.Path("batch.jsonl").write_text(
    "\n".join(json.dumps(line) for line in batch_lines),
    encoding="utf-8"
)

494

### 2. Upload the batch input file

Next, we upload the prepared JSONL file to OpenAI. This file will be used as the input for our batch job.

In [None]:
# Open the file in a context manager so the handle is closed after the upload
with open("batch.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
print("Uploaded file:", batch_file.id)

Uploaded file: file-GzeSTcTwou1myYPd1M7ErW


### 3. Create the batch job

Now, we create the batch job itself, specifying the uploaded input file ID, the endpoint to use for the requests (in this case, chat completions), and the completion window (how long OpenAI has to process the batch).

In [None]:
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print("Batch id:", batch.id)

Batch id: batch_68a210c2b83881909076ec41007ebb33


### 4. Poll for completion

Batch jobs are processed asynchronously. We need to poll the batch status periodically until it is completed.
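A polling loop is more robust when it stops on every terminal status and gives up after a deadline. The sketch below takes a `fetch_status` callable so it can run without the API. With a real client that callable might be `lambda: client.batches.retrieve(batch.id).status`. The helper name, the default interval, and the timeout are all illustrative assumptions.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_terminal_status(fetch_status, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Simulated status sequence standing in for repeated batch lookups.
fake_statuses = iter(["validating", "in_progress", "finalizing", "completed"])
print(wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda _: None))
```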

In [None]:
import time

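# Stop on any terminal status so a failed or expired batch cannot loop forever
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

while True:
    b = client.batches.retrieve(batch.id)
    print("Status:", b.status)
    if b.status in TERMINAL_STATUSES:
        break
    time.sleep(5)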

Status: completed


### 5. Download and print results

Once the batch job is completed, we can download the output file, which contains the results for each request in the batch. We then parse the JSONL output and print the responses.
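Output lines are best matched to requests by `custom_id` rather than by position, since the order of the output file may differ from the input. The sketch below indexes results that way. The sample lines are hand-written in the shape this notebook's parsing code expects (an assumption, not captured API output), and `example_error` is an invented placeholder.

```python
import json


def index_results(output_lines):
    """Map each custom_id to its reply text, or to an error description."""
    results = {}
    for line in output_lines:
        record = json.loads(line)
        response = record.get("response")
        if record.get("error") or not response or response.get("status_code") != 200:
            results[record["custom_id"]] = {"ok": False, "detail": record.get("error") or response}
        else:
            text = response["body"]["choices"][0]["message"]["content"]
            results[record["custom_id"]] = {"ok": True, "detail": text}
    return results


# Hand-written sample lines, deliberately out of order.
sample_lines = [
    json.dumps({"custom_id": "req-2", "error": None, "response": {
        "status_code": 200,
        "body": {"choices": [{"message": {"content": "Facts about Rome"}}]}}}),
    json.dumps({"custom_id": "req-1", "error": {"code": "example_error"}, "response": None}),
]
indexed = index_results(sample_lines)
for custom_id in ("req-1", "req-2"):
    print(custom_id, indexed[custom_id])
```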

In [None]:
out_id = b.output_file_id
out_data = client.files.content(out_id).content.decode("utf-8").splitlines()

for line in out_data:
    rec = json.loads(line)
    cid = rec["custom_id"]
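    # Each output line carries both keys and the unused one is null, so test the values
    if rec.get("response") and not rec.get("error"):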
        content = rec["response"]["body"]["choices"][0]["message"]["content"]
        print(f"\n{cid} → {content}")
    else:
        print(f"\n{cid} → ERROR: {rec['error']}")


req-1 → Sure! Here are three facts about Paris:

1. **Cultural Capital**: Paris is often referred to as the "Cultural Capital of the World" due to its rich history in art, fashion, literature, and philosophy. It is home to renowned museums such as the Louvre, which houses thousands of works of art including the Mona Lisa.

2. **Landmarks**: The Eiffel Tower, constructed for the 1889 World's Fair, is one of the most recognizable structures in the world and a symbol of Paris. Standing at 1,083 feet (330 meters) tall, it attracts millions of visitors each year.

3. **Historical Significance**: Paris has played a central role in many significant historical events, including the French Revolution. The city is known for its beautiful architecture and historic neighborhoods, such as Montmartre, which have been influential in shaping European history.

req-2 → Sure! Here are three interesting facts about Rome:

1. **Ancient Civilization**: Rome is one of the oldest continuously inhabited citi

## 6. Tool Calling

This section demonstrates tool calling with the Responses API, which lets the model call external functions (a custom function or a built-in tool such as web search) to retrieve information or perform actions based on the user's request. The example below defines a custom function for adding numbers and shows how the model can use both this function and the built-in web search tool.
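Once more than one custom function is involved, a small registry keeps the dispatch logic tidy. The sketch below maps tool names to Python callables and returns JSON strings, which is the form a function result is sent back in. `TOOL_REGISTRY` and `run_tool` are hypothetical names, and no API call is made.

```python
import json


def add_numbers(a, b):
    return {"sum": a + b}


# Map tool names (as declared in the `tools` list) to local Python callables.
TOOL_REGISTRY = {"add_numbers": add_numbers}


def run_tool(name, arguments_json):
    """Decode the JSON arguments, call the matching function, return a JSON string."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    kwargs = json.loads(arguments_json)
    return json.dumps(TOOL_REGISTRY[name](**kwargs))


print(run_tool("add_numbers", '{"a": 2, "b": 3}'))  # {"sum": 5}
print(run_tool("multiply", '{"a": 2, "b": 3}'))     # reports an unknown tool
```

Returning an error payload for unknown names, rather than raising, lets the model see the problem and recover in its next turn.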

In [None]:


import json  # needed below to decode tool arguments and encode results

# Custom function
def add_numbers(a, b):
    print("The Function is called")
    return {"sum": a + b}

# Define tools - custom function + built-in web search
tools = [
    {
        "type": "function",
        "name": "add_numbers",
        "description": "Add two numbers and return the sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"}
            },
            "required": ["a", "b"]
        }
    },
    {
        "type": "web_search_preview"
    }
]

prompt = input("Your prompt: ")

# Create input list
input_list = [{"role": "user", "content": prompt}]

# First call using Responses API
response = client.responses.create(
    model="gpt-4o-mini",
    tools=tools,
    input=input_list,
)

# Add response output to input list
input_list += response.output

# Check for function calls in output
function_calls = [item for item in response.output if item.type == "function_call"]

if function_calls:
    # Process each function call
    for call in function_calls:
        if call.name == "add_numbers":
            # Execute custom function
            args = json.loads(call.arguments)
            result = add_numbers(args["a"], args["b"])

            # Add function result to input list
            input_list.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": json.dumps(result),
            })
        # Note: web_search_preview is handled automatically by OpenAI

    # Get final response
    final_response = client.responses.create(
        model="gpt-4o-mini",
        input=input_list,
        tools=tools,
    )

    print(final_response.output_text)
else:
    print(response.output_text)

Your prompt: Какво е времето в София?
В момента в София е слънчево с температура около 31°C (88°F).

## Времето в София 22, Враца, 3320 Козлодуй, България:
В момента: Слънчево, 88°F (31°C)

Дневна прогноза:
* неделя, август 17: Минимална: 67°F (20°C), Максимална: 92°F (34°C), Прогноза: Предимно слънчево
* понеделник, август 18: Минимална: 63°F (17°C), Максимална: 86°F (30°C), Прогноза: Гръмотевична буря на места в района
* вторник, август 19: Минимална: 53°F (12°C), Максимална: 79°F (26°C), Прогноза: Гръмотевична буря на места в района
* сряда, август 20: Минимална: 56°F (13°C), Максимална: 88°F (31°C), Прогноза: По-топло
* четвъртък, август 21: Минимална: 65°F (18°C), Максимална: 97°F (36°C), Прогноза: Много горещо
* петък, август 22: Минимална: 63°F (17°C), Максимална: 93°F (34°C), Прогноза: Слънчево с мараня
* събота, август 23: Минимална: 55°F (13°C), Максимална: 86°F (30°C), Прогноза: Предимно слънчево


Прогнозата показва, че в следващите дни ще има предимно слънчево време с възм

## 7. Frequency and Presence Penalty

Frequency and presence penalties can be used to control the model's tendency to repeat tokens.

*   **Frequency Penalty:** Penalizes tokens based on how many times they have appeared in the text so far. Higher values reduce repetition of frequently occurring tokens.
*   **Presence Penalty:** Penalizes tokens based on whether they appear in the text so far. Higher values encourage the model to introduce new topics or terms.
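The difference between the two penalties is easiest to see as arithmetic on token scores. The sketch below follows the adjustment described in OpenAI's documentation (subtract `count * frequency_penalty` plus a flat `presence_penalty` for any token already seen). The logits and token list are made up for illustration.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens that already appear in the generated text."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        seen = 1.0 if count > 0 else 0.0
        # The frequency term grows with every repeat; the presence term is flat.
        adjusted[token] = logit - count * frequency_penalty - seen * presence_penalty
    return adjusted


toy_logits = {"Bulgaria": 2.0, "history": 2.0, "river": 2.0}  # made-up scores
so_far = ["Bulgaria", "Bulgaria", "Bulgaria", "history"]
print(apply_penalties(toy_logits, so_far, frequency_penalty=0.5, presence_penalty=0.25))
```

The token used three times is pushed down the most, the token used once gets a smaller reduction, and the unused token keeps its original score. Negative penalty values reverse the effect and make repetition more likely.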

In [None]:
# Frequency Penalty Example
response_freq_penalty = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in Bulgarian."} # Defines the role of the user
  ],
  max_tokens=100,
  frequency_penalty=1.0 # Increase frequency penalty to reduce repetition
)

print("Response with Frequency Penalty:")
print(response_freq_penalty.choices[0].message.content)

Response with Frequency Penalty:
Разказ за българската история

В началото на IX век, българската държава, основана от хан Аспарух, започва да се утвърджава на Балканския полуостров. Хан Аспарух е водил битки с Византийската империя и през 681 година успява да създаде първата българска държава в земите около река Дунав.

Скоро след


In [None]:
# Negative Frequency Penalty Example
response_neg_freq_penalty = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in Bulgarian."} # Defines the role of the user
  ],
  max_tokens=100,
  frequency_penalty=-2.0 # Decrease frequency penalty to encourage repetition
)

print("\nResponse with Negative Frequency Penalty:")
print(response_neg_freq_penalty.choices[0].message.content)


Response with Negative Frequency Penalty:
Разбира се! Ето една кратка история за българската история:

### История на България

България е една от най-древните държави в Европа, с история, която се простира хилядолетия назад. Първото българско държавно образувание, Първото българско царство, е основано през 681 година, когато хан Аспарух, вожд на българите, се


In [None]:
# Presence Penalty Example
response_pres_penalty = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in Bulgarian."} # Defines the role of the user
  ],
  max_tokens=100,
  presence_penalty=1.0 # Increase presence penalty to encourage new topics
)

print("\nResponse with Presence Penalty:")
print(response_pres_penalty.choices[0].message.content)


Response with Presence Penalty:
Разказ за българската история

България има дълга и забележителна история, която започва преди повече от 1300 години. Първото българско царство е основано през 681 г. от хан Аспарух. Със своето обединение на различни племена, той полага основите на една от първите държави в Европа.

През 864 г. княз Бор


In [None]:
# Negative Presence Penalty Example
response_neg_pres_penalty = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."}, # Defines the role of the AI model
    {"role": "user", "content": "Give me a story about the Bulgarian history in Bulgarian."} # Defines the role of the user
  ],
  max_tokens=100,
  presence_penalty=-2.0 # Decrease presence penalty to discourage new topics
)

print("\nResponse with Negative Presence Penalty:")
print(response_neg_pres_penalty.choices[0].message.content)


Response with Negative Presence Penalty:
Историята на България е дълга и оспорвана, изпълнена с множество важни събития и културни трансформации.

Създаването на Първото българско царство е датирано около 681 година, когато хан Аспарух, хан на българите, създава държава на Балканите. Българите, които произхождат от Централна Азия, успяват


## 8. Streaming

Streaming sends the model's response back in chunks as it is generated, rather than waiting for the entire response to be ready. This reduces perceived latency and improves the user experience.
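A common pattern is to display each chunk immediately while also collecting the pieces, so the complete text is available afterwards. The sketch below shows that pattern with a stand-in generator in place of a real API stream. `stream_and_collect` and `fake_stream` are hypothetical names.

```python
def stream_and_collect(text_chunks, write=print):
    """Show each chunk as soon as it arrives and return the full text at the end."""
    parts = []
    for piece in text_chunks:
        write(piece)
        parts.append(piece)
    return "".join(parts)


def fake_stream():
    """Stand-in generator that yields text pieces the way a streamed reply would."""
    yield from ["Black holes ", "are regions ", "of spacetime."]


full_text = stream_and_collect(
    fake_stream(), write=lambda s: print(s, end="", flush=True)
)
print()
print(len(full_text), "characters received")
```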

In [None]:
import sys

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain black holes in two short paragraphs."}
    ],
    temperature=0.7,
    stream=True,   # ← enable streaming
)

# Print tokens as they arrive
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta and delta.content:  # skip chunks that carry no text
        sys.stdout.write(delta.content)
        sys.stdout.flush()

print()  # newline at the end

Black holes are regions in space where the gravitational pull is so strong that nothing, not even light, can escape from them. They are formed when massive stars exhaust their nuclear fuel and collapse under their own gravity at the end of their life cycle. The boundary surrounding a black hole is called the event horizon, beyond which no information or matter can escape. Black holes can vary in size, with stellar black holes formed from individual stars and supermassive black holes found at the centers of galaxies, containing millions to billions of times the mass of the Sun.

Despite being invisible, black holes can be detected through their interactions with nearby matter. When a black hole pulls in gas and dust from a companion star, the material heats up and emits X-rays, which can be observed by telescopes. Additionally, black holes influence the motion of stars and gas in their vicinity, providing indirect evidence of their existence. The study of black holes not only deepens ou

## 9. Sending Images to GPT-4o Mini

GPT-4o mini can process images. This section demonstrates how to send an image to the model and get a description of its content.
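Images are passed inline as base64 data URLs, and the MIME type in the URL should match the file. The sketch below builds such a URL and guesses the type from the file extension with the standard library. `to_data_url` is a hypothetical helper, and the bytes are a placeholder rather than a real image.

```python
import base64
import mimetypes


def to_data_url(filename, raw_bytes):
    """Build a base64 data URL, guessing the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image type: {filename}")
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# A few placeholder bytes stand in for real image data.
print(to_data_url("photo.png", b"\x89PNG"))
```

This avoids hard-coding `image/jpeg` when the uploaded file is actually a PNG or another format.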

In [None]:
import base64

# Function to encode the image to base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image file
image_path = "/content/image_1.jpeg" # Make sure 'image_1.jpeg' is uploaded to your Colab environment

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.responses.create(
  model="gpt-4o-mini",
  input=[
    {"role": "system", "content": "You are a helpful assistant that describes images."},
    {"role": "user", "content": [
      {"type": "input_text", "text": "Describe this image:"},
      {"type": "input_image", "image_url": f"data:image/jpeg;base64,{base64_image}"}
    ]}
  ],
  max_output_tokens=300,
)

print("Description of the image:")
print(response.output_text)

Description of the image:
The image features a cat with a soft, fluffy coat that appears to be a mix of gray and cream colors. Its face is round with large, expressive eyes that give it a curious and endearing look. The cat is sitting with its front paws neatly together, and its ears are perked up, indicating attentiveness. The background is plain, emphasizing the cat's features.


## 10. OpenAI Assistants

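This section demonstrates how to use the OpenAI Assistants API, which lets you build AI assistants that understand context, maintain conversation history, and use tools. Recent versions of the Python SDK flag these `client.beta` endpoints as deprecated in favor of the Responses API.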

### 1. Create a new thread

Threads represent a conversation session between a user and an Assistant.

In [None]:
# 1. Create a new thread
thread = client.beta.threads.create()
print("Thread created with ID:", thread.id)


Thread created with ID: thread_jcYbmapE1AoMjKIKMY7XouZu


### 2. Add a user message to the thread

Messages are added to the thread. They can be from the user or the Assistant.

In [None]:
# 2. Add a user message to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hello, can you introduce yourself?"
)
print("User message added to thread.")


User message added to thread.


### 3. Run the assistant on the thread

A Run is an execution of an Assistant on a Thread. The Assistant uses its configuration and the Thread's messages to generate a response.

In [None]:
# 3. Run the assistant on the thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id="asst_lcgYb6wZaRp6UDxdhTi4qPoq" # Replace with the ID of an Assistant in your own account
)
print("Assistant run initiated with ID:", run.id)


Assistant run initiated with ID: run_QJe5759huBqGZj8uvqzgmWku


### 4. Wait for completion and fetch messages

We poll the run status until it is completed, then retrieve the messages.

In [None]:
# 4. Wait for completion and fetch messages
import time

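# Keep polling only while the run is still active; stop on any other status
ACTIVE_STATUSES = {"queued", "in_progress", "cancelling"}

while True:
    run_status = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    print("Run status:", run_status.status)
    if run_status.status not in ACTIVE_STATUSES:
        break
    time.sleep(1)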


Run status: completed


### 5. Retrieve and display messages

Finally, we retrieve all messages from the thread and display them in chronological order.

In [None]:
# 5. Retrieve all messages and display
messages = client.beta.threads.messages.list(thread_id=thread.id)

for m in reversed(messages.data):
    role = m.role
    # Check if content exists and is of type text
    if m.content and m.content[0].type == 'text':
        content = m.content[0].text.value
        print(f"{role}: {content}")
    elif m.content and m.content[0].type == 'image_file':
        # Handle image files if necessary, or just note them
        print(f"{role}: [Image file]")
    else:
        print(f"{role}: [Unsupported content type]")


user: Hello, can you introduce yourself?
user: Hello, can you introduce yourself?
assistant: Hello! I'm ChatGPT, an AI language model created by OpenAI. I'm here to help you with a wide range of tasks, from answering questions and providing explanations to generating creative text and assisting with problem-solving. How can I assist you today?


# Section 2: Anthropic API Set Up

To use the Anthropic API, you'll need an API key. If you don't already have one, create a key [here](https://console.anthropic.com/settings/keys).

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `ANTHROPIC_API_KEY`. Then pass the key to the SDK:

In [None]:
# Install the Anthropic Python library
%pip install anthropic


Collecting anthropic
  Downloading anthropic-0.64.0-py3-none-any.whl.metadata (27 kB)
Downloading anthropic-0.64.0-py3-none-any.whl (297 kB)
Installing collected packages: anthropic
Successfully installed anthropic-0.64.0


In [None]:
# Import the Anthropic library
import anthropic
import os

# Used to securely store your API key
from google.colab import userdata

os.environ['ANTHROPIC_API_KEY'] = userdata.get('ANTHROPIC_API_KEY')

In [None]:
client = anthropic.Anthropic()

In [None]:
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1500,
    thinking={
        "type": "enabled",
        "budget_tokens": 1024
    },
    messages=[{
        "role": "user",
        "content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
    }]
)

# The response will contain summarized thinking blocks and text blocks
for block in response.content:
    if block.type == "thinking":
        print(f"\nThinking summary: {block.thinking}")
    elif block.type == "text":
        print(f"\nResponse: {block.text}")


Thinking summary: This is asking about primes that are congruent to 3 modulo 4. Let me think about this.

First, let me recall what we know about primes modulo 4:
- The only even prime is 2
- All other primes are odd, so they're either ≡ 1 (mod 4) or ≡ 3 (mod 4)

The question is asking whether there are infinitely many primes p such that p ≡ 3 (mod 4).

This is actually a classic theorem in number theory. The answer is yes, and there's a beautiful proof similar to Euclid's proof that there are infinitely many primes.

Let me sketch the proof:

Suppose there are only finitely many primes congruent to 3 modulo 4. Let's call them p₁, p₂, ..., pₖ.

Consider the number N = 4(p₁p₂...pₖ) - 1.

Note that N ≡ 4(p₁p₂...pₖ) - 1 ≡ -1 ≡ 3 (mod 4).

Now, N must have prime divisors. Since N is odd, all its prime divisors are odd, so they're either ≡ 1 (mod 4) or ≡ 3 (mod 4).

If all prime divisors of N were ≡ 1 (mod 4), then N would be ≡ 1 (mod 4) (since the product of numbers ≡ 1 (mod 4) is ≡ 1 (mo

## 1. Streaming

This section demonstrates how to stream responses from the Anthropic API, which can provide a faster perceived response time by delivering the model's output in chunks as it is generated.

In [None]:
from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    temperature=0.3,
    top_p=0.9,
    messages=[{"role": "user", "content": "Explain transformers in 3 bullets."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")   # live output
    final = stream.get_final_message()

# (Optional) print a newline after streaming
print()

• **Architecture**: Transformers use self-attention mechanisms to process all parts of an input sequence simultaneously, rather than sequentially, allowing them to capture relationships between any two positions regardless of distance.

• **Key Innovation**: The "attention is all you need" approach eliminates the need for recurrent or convolutional layers, using multi-head attention to focus on different aspects of the input and enabling much more efficient parallel processing.

• **Impact**: They've revolutionized N


## 2. Prompt Caching

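Anthropic models support prompt caching to reduce cost and latency for repeated parts of the input. Marking a system block with `cache_control` tells the API that the content up to and including that block is stable and can be cached for subsequent requests. A prefix is only cached once it reaches the model's minimum cacheable length (1,024 tokens for Claude Sonnet models). The context in this example is well below that, which is why the recorded output reports `cache_create=0` and `cache_read=0` for both calls; increase the repeat count in `BIG_CONTEXT` to see cache writes and reads.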
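A quick length check before sending can save a confusing debugging session. The sketch below builds the system blocks with a cache breakpoint on the large block and applies a very rough size estimate. The four-characters-per-token ratio is a guess rather than a real tokenizer, the 1,024-token minimum is an assumption that varies by model, and both helper names are hypothetical.

```python
def build_system_blocks(instructions, reusable_context):
    """Return system blocks with a cache breakpoint on the large reusable block."""
    return [
        {"type": "text", "text": instructions},
        {
            "type": "text",
            "text": reusable_context,
            "cache_control": {"type": "ephemeral"},
        },
    ]


def probably_long_enough(text, min_tokens=1024, chars_per_token=4):
    """Very rough length check; chars_per_token is a guess, not a real tokenizer."""
    return len(text) / chars_per_token >= min_tokens


short_context = "Pride and Prejudice " * 50
long_context = "Pride and Prejudice " * 500
print(probably_long_enough(short_context))  # False: about 250 estimated tokens
print(probably_long_enough(long_context))   # True: about 2,500 estimated tokens
```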

In [None]:
# Big reusable block (simulating large context)
BIG_CONTEXT = "Pride and Prejudice " * 50

system_blocks = [
    {"type": "text", "text": "You are a helpful literature assistant."},
    {"type": "text", "text": BIG_CONTEXT, "cache_control": {"type": "ephemeral"}}
]

def ask(question: str):
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=100,
        system=system_blocks,
        messages=[{"role": "user", "content": question}]
    )
    print(resp.content[0].text)
    u = resp.usage
    print(f"Usage: input={u.input_tokens} output={u.output_tokens} "
          f"cache_create={getattr(u, 'cache_creation_input_tokens', 0)} "
          f"cache_read={getattr(u, 'cache_read_input_tokens', 0)}\n")

print("---- First call (creates cache) ----")
ask("Name two central themes in the book.")

print("---- Second call (reads from cache) ----")
ask("Briefly describe Elizabeth Bennet’s character arc.")

---- First call (creates cache) ----
Two central themes in "Pride and Prejudice
Usage: input=273 output=10 cache_create=0 cache_read=0

---- Second call (reads from cache) ----
Two central themes in "Pride and Prejudice
Usage: input=273 output=10 cache_create=0 cache_read=0
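
Both recorded calls above report `cache_create=0` and `cache_read=0`, so it can help to classify each response explicitly. The helper below is a minimal sketch: `cache_status` is a hypothetical name, and it relies only on the `cache_creation_input_tokens` and `cache_read_input_tokens` usage fields already read in `ask`. A `SimpleNamespace` stands in for `resp.usage` so the example runs without an API call.

```python
from types import SimpleNamespace


def cache_status(usage) -> str:
    """Classify a usage object as a cache 'read', 'write', or 'none'."""
    # The fields may be missing or None, so fall back to 0 in both cases.
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if read > 0:
        return "read"   # part of the prompt was served from the cache
    if created > 0:
        return "write"  # a cache entry was created on this call
    return "none"       # nothing was written to or read from the cache


# Stand-in for `resp.usage`, using the numbers from the recorded run above.
recorded = SimpleNamespace(
    input_tokens=273,
    output_tokens=10,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=0,
)
print(cache_status(recorded))  # none
```

A result of `none` on both calls, as here, suggests the marked block was never cached. One likely cause, which should be checked against the Anthropic documentation, is a prompt shorter than the model's minimum cacheable length.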



# **Section 3: Gemini API Set Up**

In [None]:
%pip install -q -U google-genai


**Make your first request**

Here is an example that uses the `generate_content` method to send a request to the Gemini API with the Gemini 2.5 Flash model.

In [None]:
from google import genai

os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')

# The client gets the API key from the environment variable `GEMINI_API_KEY`.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash", contents="Explain how AI works in a few words"
)
print(response.text)

AI learns patterns from data to make smart decisions or predictions.


**Gemini Flash Thinking**

In [None]:
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how AI works in a few words",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0) # Disables thinking
    ),
)
print(response.text)

AI works by finding patterns in data.


In [None]:
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Provide a list of 3 famous physicists and their key contributions",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024) # Allows up to 1024 thinking tokens
    ),
)
print(response.text)

Here are 3 famous physicists and their key contributions:

1.  **Isaac Newton** (1642–1727)
    *   Developed the **Laws of Motion**, forming the basis of classical mechanics.
    *   Formulated the **Law of Universal Gravitation**, explaining the force that governs the motion of planets and falling objects.
    *   Co-invented **calculus** (with Gottfried Leibniz), a fundamental tool in mathematics and science.
    *   Pioneering work in **optics**, including the nature of light and color.

2.  **Albert Einstein** (1879–1955)
    *   Developed the **theories of Special and General Relativity**, revolutionizing our understanding of space, time, gravity, and the universe.
    *   Famous for the mass-energy equivalence formula, **E=mc²**.
    *   Explained the **photoelectric effect** (for which he won the Nobel Prize), a crucial step in the development of quantum mechanics.
    *   Provided a theoretical explanation for **Brownian motion**, confirming the existence of atoms.

3.  **Mari
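
The `thinking_budget` values above cap how many tokens the model may spend on thinking. To see how many were actually used, the response's `usage_metadata` can be inspected. The sketch below assumes the `thoughts_token_count` and `candidates_token_count` fields of the google-genai usage metadata, and further assumes that `candidates_token_count` does not include thinking tokens. `thinking_share` is a hypothetical helper, and a `SimpleNamespace` with made-up numbers stands in for `response.usage_metadata` so the block runs offline.

```python
from types import SimpleNamespace


def thinking_share(usage_metadata) -> float:
    """Return the fraction of generated tokens that were spent on thinking."""
    # Either field can be None (for example when thinking is disabled).
    thoughts = getattr(usage_metadata, "thoughts_token_count", None) or 0
    answer = getattr(usage_metadata, "candidates_token_count", None) or 0
    total = thoughts + answer
    return thoughts / total if total else 0.0


# Illustrative numbers only, not taken from a real response.
with_thinking = SimpleNamespace(thoughts_token_count=300, candidates_token_count=100)
no_thinking = SimpleNamespace(thoughts_token_count=None, candidates_token_count=12)

print(thinking_share(with_thinking))  # 0.75
print(thinking_share(no_thinking))    # 0.0
```

With a real response, the call would be `thinking_share(response.usage_metadata)`.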

In [None]:
from google import genai
from google.genai import types

client = genai.Client()

prompt = """
Alice, Bob, and Carol each live in a different house on the same street: red, green, and blue.
The person who lives in the red house owns a cat.
Bob does not live in the green house.
Carol owns a dog.
The green house is to the left of the red house.
Alice does not own a cat.
Who lives in each house, and what pet do they own?
"""

thoughts = ""
answer = ""

for chunk in client.models.generate_content_stream(
    model="gemini-2.5-pro",
    contents=prompt,
    config=types.GenerateContentConfig(
      thinking_config=types.ThinkingConfig(
        include_thoughts=True
      )
    )
):
  for part in chunk.candidates[0].content.parts:
    if not part.text:
      continue
    elif part.thought:
      if not thoughts:
        print("Thoughts summary:")
      print(part.text)
      thoughts += part.text
    else:
      if not answer:
        print("Answer:")
      print(part.text)
      answer += part.text

Thoughts summary:
**Dissecting the Puzzle**

I've started by deconstructing the request. My initial focus is on the core goal: matching each person to their house color and pet. I'm identifying the key entities—Alice, Bob, Carol, red, green, blue, dog, cat, and bird. Now I am ready to begin interpreting the clues.



**Organizing the Variables**

I'm now implementing the table setup as the core framework. The goal is to visually represent the interconnected entities. The initial structure is in place, and I am populating the table with the information. I am also planning on updating the spatial relationship with a corresponding visual representation.



**Interpreting Connections**

My thinking has progressed to actively linking clues. Specifically, I'm integrating clues 3 and 5. I realize Carol owning a dog now directly contrasts with the red house/cat relationship established in clue 3, allowing me to conclude Carol doesn't live in the red house. I will now integrate this new knowled
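
The streaming loop above routes each part by its `thought` flag. The same routing can be written as a small pure function, which makes it testable without calling the API. `split_parts` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the SDK's part objects, assumed here to expose only `text` and `thought`.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Concatenate thought-summary text and answer text from part-like objects."""
    thoughts, answer = [], []
    for part in parts or []:   # guard against a missing parts list
        if not part.text:      # skip parts without text
            continue
        (thoughts if part.thought else answer).append(part.text)
    return "".join(thoughts), "".join(answer)


# Stand-ins for streamed parts; the text is illustrative only.
parts = [
    SimpleNamespace(text="Check clue 3. ", thought=True),
    SimpleNamespace(text=None, thought=False),
    SimpleNamespace(text="Carol owns the dog.", thought=None),
]
thoughts, answer = split_parts(parts)
print(repr(thoughts))  # 'Check clue 3. '
print(repr(answer))    # 'Carol owns the dog.'
```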

# Qwen3

In [None]:
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switch between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

To disable thinking, set the `enable_thinking` argument to `False` when applying the chat template:

In [None]:
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # True is the default value for enable_thinking.
)
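
The `</think>` parsing step in the generation cell above can be isolated and checked on its own. This is a minimal sketch: `split_thinking` and `END_THINK_ID` are hypothetical names, the token id 151668 is taken from that cell, and the input ids are toy values rather than real tokenizer output.

```python
END_THINK_ID = 151668  # token id of </think>, as used in the cell above


def split_thinking(output_ids, end_think_id=END_THINK_ID):
    """Split generated token ids into (thinking_ids, answer_ids)."""
    try:
        # Position just past the last occurrence of the </think> token.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0  # no </think> token: treat everything as the answer
    return output_ids[:index], output_ids[index:]


# Toy ids, not real tokenizer output.
print(split_thinking([11, 12, 151668, 21, 22]))  # ([11, 12, 151668], [21, 22])
print(split_thinking([21, 22]))                  # ([], [21, 22])
```

Note that the thinking slice includes the `</think>` id itself. When thinking is disabled, no such token is expected, so the whole output falls through to the answer slice.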

# Cerebras

In [6]:
%pip install --upgrade cerebras_cloud_sdk
import os
from cerebras.cloud.sdk import Cerebras

Collecting cerebras_cloud_sdk
  Downloading cerebras_cloud_sdk-1.49.0-py3-none-any.whl.metadata (19 kB)
Downloading cerebras_cloud_sdk-1.49.0-py3-none-any.whl (91 kB)
Installing collected packages: cerebras_cloud_sdk
Successfully installed cerebras_cloud_sdk-1.49.0


In [11]:
os.environ['CEREBRAS_API_KEY'] = userdata.get('CEREBRAS_API_KEY')

client = Cerebras(
    # This is the default and can be omitted
    api_key=os.environ.get("CEREBRAS_API_KEY")
)

stream = client.chat.completions.create(
    messages=[
        {
            # The request is a user turn, so it is sent with the "user" role
            "role": "user",
            "content": "Hello there, please tell me 3 dad jokes"
        }
    ],
    model="gpt-oss-120b",
    stream=True,
    max_completion_tokens=65536,
    temperature=1,
    top_p=1,
    reasoning_effort="medium"
)

for chunk in stream:
  print(chunk.choices[0].delta.content or "", end="")

Sure thing! Here are three classic dad jokes for you:

1. **Why did the scarecrow win an award?**  
   *Because he was outstanding in his field!*

2. **I’m reading a book about anti‑gravity.**  
   *It’s impossible to put down!*

3. **Why do you never see elephants hiding in trees?**  
   *Because they’re really good at it.* 😄
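
The loop above prints each delta as it arrives. When the full reply is also needed afterwards, the deltas can be accumulated instead. The sketch below assumes only the `chunk.choices[0].delta.content` attribute path used in the loop; `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks replace a real stream so the block runs without an API key.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a chat-completions stream into one string."""
    pieces = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks without choices
            continue
        pieces.append(chunk.choices[0].delta.content or "")
    return "".join(pieces)


def fake_chunk(text):
    """Build a stand-in with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Illustrative stream; the last chunk carries no text.
stream = [fake_chunk("Why did "), fake_chunk("the scarecrow"), fake_chunk(None)]
print(collect_stream(stream))  # Why did the scarecrow
```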

# Fireworks AI

In [5]:
!pip install --upgrade fireworks-ai openai
import os

Collecting openai
  Using cached openai-1.102.0-py3-none-any.whl.metadata (29 kB)


In [6]:
from fireworks import LLM

os.environ['FIREWORKS_API_KEY'] = userdata.get('FIREWORKS_API_KEY')

# Basic usage - SDK automatically selects optimal deployment type
llm = LLM(model="llama4-maverick-instruct-basic", deployment_type="auto", api_key=os.environ.get("FIREWORKS_API_KEY"))

response = llm.chat.completions.create(
    messages=[{"role": "user", "content": "Hello there, please tell me 3 dad jokes"}]
)

print(response.choices[0].message.content)

You want to groan and roll your eyes, huh? Here are three dad jokes for you:

1. Why did the scarecrow win an award? Because he was outstanding in his field! (get it?)
2. I told my wife she was drawing her eyebrows too high. She looked surprised.
3. Why did the mushroom go to the party? Because he was a fun-gi!

Hope these corny jokes made you chuckle (or at least roll your eyes in amusement)!
