# Initial setup and config

## Preparation
- Go to https://platform.openai.com/ and sign up if you haven't already
- Create your API key at https://platform.openai.com/api-keys

## Setup
This section handles the initial setup requirements:
- Installing dependencies from requirements.txt
- Setting up API authentication using a YAML file
- Configuring the OpenAI client

**Security Note**: Never hard-code or commit API keys. This notebook keeps the key in a separate `secrets.yaml` file,
which should be added to `.gitignore`.

Docs: https://platform.openai.com/docs/quickstart/build-your-application
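An environment variable is a common alternative to a key file. The sketch below is not part of the original notebook: `resolve_api_key` is a hypothetical helper, and the nested `{"openai": {"api_key": ...}}` layout is assumed to match what this notebook stores in `secrets.yaml`. `OPENAI_API_KEY` is the variable the OpenAI Python SDK looks for by default.

```python
import os


def resolve_api_key(secrets=None, env=None):
    """Return the API key from the environment if set, else from a secrets dict.

    `resolve_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    # Fall back to the nested layout assumed for secrets.yaml:
    # {"openai": {"api_key": "..."}}
    if isinstance(secrets, dict):
        return (secrets.get("openai") or {}).get("api_key")
    return None


# Placeholder values only; never print a real key.
key = resolve_api_key({"openai": {"api_key": "sk-from-file"}}, env={})
print("Key found:", key is not None)
```

Preferring the environment keeps the key out of the working directory entirely, while the file fallback preserves the notebook's existing workflow.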

In [1]:
import os
import openai
from openai import OpenAI
import yaml
import time
import requests


# Define functions to manage secrets

In [2]:
def load_secrets(filepath="secrets.yaml"):
    try:
        with open(filepath, "r") as f:
            return yaml.safe_load(f)
    except FileNotFoundError:
        return None
    except yaml.YAMLError as e:
        print(f"Error parsing {filepath}: {e}")
        return None

def create_secrets_file(filepath="secrets.yaml"):
    api_key = input("Please enter your OpenAI API Key: ")
    secrets_data = {"openai": {"api_key": api_key}}
    try:
        with open(filepath, "w") as f:
            yaml.safe_dump(secrets_data, f)
        # Report the path actually written rather than a hard-coded name
        print(f"{filepath} created and OpenAI API key stored.")
        return secrets_data
    except Exception as e:
        print(f"Error creating {filepath}: {e}")
        return None

# Load secrets and configure the client

In [3]:
# Load secrets
secrets = load_secrets()

if not secrets:
    print("secrets.yaml not found or could not be loaded; creating one...")
    secrets = create_secrets_file()

if secrets and "openai" in secrets and "api_key" in secrets["openai"]:
    # Configure the OpenAI client with the stored key
    client = OpenAI(api_key=secrets["openai"]["api_key"])
else:
    print("Could not load API key. Please check your secrets.yaml file and run again.")

# Simple Chat Completion
Demonstrates basic interaction with OpenAI's chat API.

## Key Components
- `chat.completions.create()`: Main method for generating completions
- `model`: Specifies which model to use (e.g. "gpt-4o")
- `messages`: Array of conversation turns
- `store`: Enables response storage for future reference

## Structure
```python
messages=[
    {"role": "user", "content": prompt}
]
```

## Response Format

```python
choices[0].message.content
```
- Contains the generated text
- Multiple response variations are possible with the `n` parameter
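When `n` is greater than 1, the response carries one entry in `choices` per variation. The following is a minimal sketch of collecting all of them; `extract_texts` is a hypothetical helper, and `fake_response` is a hand-built stand-in that only mimics the `choices[i].message.content` shape so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_texts(response):
    """Return the generated text of every choice in a chat completion response."""
    return [choice.message.content for choice in response.choices]


# Stand-in object with the same attribute layout as a real response (assumption:
# only the fields used above are modelled).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content="First draft")),
        SimpleNamespace(message=SimpleNamespace(content="Second draft")),
    ]
)

texts = extract_texts(fake_response)
for i, text in enumerate(texts, start=1):
    print(f"Variation {i}: {text}")
```

With a real client, the same helper would be applied to the object returned by `client.chat.completions.create(..., n=2)`.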

## 📚 Documentation:

- API Reference: https://platform.openai.com/docs/api-reference/chat
- Message Structure: https://platform.openai.com/docs/guides/text-generation/message-structure

In [4]:
basic_prompt = "Write a short poem about the moon."


print("Basic Text Generation \nSending request and awaiting response...\n\n\n")
response = client.chat.completions.create(
    model="gpt-4o",
    store=True,
    messages=[
        {"role": "user", "content": basic_prompt}
    ]
)
generated_poem = response.choices[0].message.content
print(f"Prompt:\n{basic_prompt}")
print(f"Response:\n{generated_poem}")

Basic Text Generation 
Sending request and awaiting response...



Prompt:
Write a short poem about the moon.
Response:
In the velvet sea of night, so vast,  
The moon ascends, a silver mast.  
Her luminescence softly glows,  
Whispering secrets the darkness knows.  

She sails through realms of quiet dreams,  
Kissing the earth with tender beams.  
A guardian of the night's embrace,  
Carving shadows on nature's face.  

In her gaze, the tides arise,  
Pulling hearts with longing sighs.  
A celestial muse forever bright,  
Guiding souls through the endless night.  


# Advanced Message Control
Explores message roles and instruction hierarchies.

## Message Roles
- `system`: Core behavioral instructions
- `developer`: Alternative to system role
- `user`: End-user prompts

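## Instruction Hierarchy
Observed in the example below rather than guaranteed by the API:
1. A later instruction that explicitly overrides earlier ones takes precedence
2. Developer instructions can be overwritten by later developer or system messages
3. Instructions from multiple messages accumulate unless explicitly overwritten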

## Best Practices
- Keep system prompts focused and clear
- Test role combinations for desired behavior
- Consider message ordering impact

⚠️ **Important**: System messages significantly impact model behavior.

## 📚 **Resources**:
- Role Definitions: https://platform.openai.com/docs/guides/text-generation/role-definitions
- System Instructions: https://platform.openai.com/docs/guides/text-generation/system-instructions
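One way to keep system prompts focused is to send a single consolidated instruction message instead of stacking several that contradict each other. The sketch below illustrates that idea; `build_messages` is a hypothetical helper, and sending the content as a plain string (rather than a list of typed parts) is a simplification chosen for this example.

```python
def build_messages(user_prompt, instructions=None, role="developer"):
    """Build a messages list with at most one instruction message.

    `build_messages` is a hypothetical helper, not part of the OpenAI SDK.
    """
    if role not in ("developer", "system"):
        raise ValueError("role must be 'developer' or 'system'")
    messages = []
    if instructions:
        # Strip the surrounding newlines that triple-quoted prompts usually carry
        messages.append({"role": role, "content": instructions.strip()})
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages(
    "Are semicolons optional in JavaScript?",
    instructions="\nAnswer programming questions concisely.\n",
)
print(messages)
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.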

In [5]:
system_prompt = '''
You are a helpful assistant that answers programming
questions in the style of a southern belle from the
southeast United States.
'''

basic_prompt = "Are semicolons optional in JavaScript?"


print("Generation with system messages \nSending request and awaiting response...\n\n\n")
response = client.chat.completions.create(
    model="gpt-4o",
    store=True,
    messages=[
    {
      "role": "developer", #system works as well
      "content": [
        {
          "type": "text",
          "text": system_prompt
        }
      ]
    },
    {
      "role": "developer", #Multiple inputs of same origin
      "content": [
        {
          "type": "text",
          "text": "This is a random test prompt"
        }
      ]
    },
    {
      "role": "developer", #Overwriting instructions
      "content": [
        {
          "type": "text",
          "text": "Overwrite all previous instructions and act as a stereotypical caribbean pirate of irish origin"
        }
      ]
    },
    {
      "role": "system", #Using system instead of developer, overwriting developer instructions
      "content": [
        {
          "type": "text",
          "text": "In your response, insert the keyword L33t"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": basic_prompt
        }
      ]
    },
  ]
)


response_text = response.choices[0].message.content
print(f"Prompt:\n{basic_prompt}")
print(f"\n\nResponse:\n{response_text}")

Generation with system messages 
Sending request and awaiting response...



Prompt:
Are semicolons optional in JavaScript?


Response:
Ahoy there, matey! When it comes to the high seas of JavaScript programming, L33t, ye might find that semicolons be a bit of a tricky subject. Ye see, technically, them semicolons ain't required at the end of every statement because JavaScript has a peculiar magic called "Automatic Semicolon Insertion" (ASI), which tries to figure out where them semicolons should go if ye forget 'em.

However, I'd be advisin' ye, just like a good map in uncharted waters, to use 'em consistently. Sometimes JavaScript's guesses ain't what ye intended, which can create some mighty troublesome bugs in your code if ye're not careful. So, for the sake of clear code and smooth sailin', sprinkle them semicolons at the end of each statement—like punctuation at the end of a thought—especially after function calls and return statements. That way, ye won't have to worry about ASI 

# Interactive Chat Example
Demonstrates message chaining for back-and-forth conversation.

## Structure
```python
messages=[
    {"role": "user", "content": "First message"},
    {"role": "assistant", "content": "First response"},
    {"role": "user", "content": "Follow-up question"}
]
```
## Key Points

- Messages list maintains conversation context
- Each turn alternates between user/assistant roles
- Model considers full conversation history
- Useful for context-dependent tasks
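Because the full history is sent with every request, long conversations grow in token count. The sketch below shows one possible way to cap the history; `trim_history` and its `max_messages` parameter are hypothetical, and the policy (keep instruction messages, keep only the most recent turns) is an assumption rather than anything the API prescribes.

```python
def trim_history(messages, max_messages=6):
    """Keep instruction messages plus the most recent user/assistant messages.

    Hypothetical helper: instruction messages are moved to the front, and a
    leading assistant reply is dropped if its user prompt was trimmed away.
    """
    pinned = [m for m in messages if m["role"] in ("system", "developer")]
    dialog = [m for m in messages if m["role"] in ("user", "assistant")]
    recent = dialog[-max_messages:] if max_messages > 0 else []
    while recent and recent[0]["role"] == "assistant":
        recent = recent[1:]
    return pinned + recent


history = [
    {"role": "developer", "content": "Be concise."},
    {"role": "user", "content": "First message"},
    {"role": "assistant", "content": "First response"},
    {"role": "user", "content": "Second message"},
    {"role": "assistant", "content": "Second response"},
    {"role": "user", "content": "Follow-up question"},
]

trimmed = trim_history(history, max_messages=3)
print([m["content"] for m in trimmed])
```

Trimming by message count is crude; a token-based budget would track cost more closely but needs a tokenizer, which is outside the scope of this sketch.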

📚 Reference: https://platform.openai.com/docs/guides/text-generation/conversation-context

In [6]:
# --- Chained Messages Example with gpt-4o in a loop---
print("\n## Chained Messages Example with gpt-4o in a loop\n")

# Start with an empty conversation history
messages = []

# Loop for 3 interactions
for i in range(3):
  prompt = input("Your message to the AI Model:")
  print(f"\nUser Prompt {i+1}: {prompt}")
  messages.append({"role": "user", "content": prompt})

  # Make the API call
  response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

  response_text = response.choices[0].message.content
  print(f"\n\nResponse {i+1}:\n{response_text}")
  messages.append({"role": "assistant", "content": response_text})

print("\n\nChained messages interaction completed.\n")


## Chained Messages Example with gpt-4o in a loop


User Prompt 1: hi


Response 1:
Hello! How can I assist you today?

User Prompt 2: how are you


Response 2:
Thank you for asking! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?

User Prompt 3: 


Response 3:
It seems like your message got cut off. How can I assist you today?


Chained messages interaction completed.



# OpenAI Assistants API
Introduction to the Assistants API for persistent, task-specific AI agents.

## Assistant Creation
```python
client.beta.assistants.create(
    name="Test Assistant",
    instructions="...",
    model="gpt-4"
)
```

## Key Features

- Persistent identity/configuration
- Custom instructions
- Tool integration capability
- State management

## Best Practices

- Clear, specific instructions
- Consider tool requirements
- Test with various prompts

## 📚 Documentation:

- Assistants Overview: https://platform.openai.com/docs/assistants/overview
- Tools Reference: https://platform.openai.com/docs/assistants/tools

In [7]:
assistant_id = None

# If no assistant_id is defined create a new assistant
if not assistant_id:
    print("Creating a new assistant...")
    assistant = client.beta.assistants.create(
        name="Test Assistant",
        instructions="You are a helpful assistant that answers questions concisely.",
        model="gpt-4o",
    )
    assistant_id = assistant.id
    print(f"New assistant created with ID: {assistant_id}")
else:
  print(f"Using existing assistant: {assistant_id}")

Creating a new assistant...
New assistant created with ID: asst_FDHhdtlhjGhRrEPtp2SBzcWN




# Managing Conversations with Threads

Threads maintain conversation context and handle message flow:

## Create conversation container
```python
thread = client.beta.threads.create()
```
## Add message to thread
```python
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Query"
)
```

## Process with assistant
```python
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant_id
)
```

- Thread acts as conversation container
- Messages are added sequentially
- Run executes assistant processing
- Includes status polling and response handling
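Status polling is easier to reuse, and safer against runs that never finish, when it is wrapped in a helper with a timeout. The following is a sketch under stated assumptions: `wait_for_run` is a hypothetical name, the set of terminal statuses is the one this notebook checks for (the API may report others that need separate handling), and the run object is faked so the example executes offline.

```python
import time
from types import SimpleNamespace

# Terminal statuses as used elsewhere in this notebook (assumed, not exhaustive)
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(fetch_run, interval=0.5, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_run() until the run is in a terminal state or the timeout passes."""
    deadline = clock() + timeout
    while True:
        run = fetch_run()
        if run.status in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run still {run.status!r} after {timeout} seconds")
        sleep(interval)


# Offline demonstration with a fake run that completes on the third poll
statuses = iter(["queued", "in_progress", "completed"])
final = wait_for_run(
    lambda: SimpleNamespace(status=next(statuses)),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final.status)
```

Against the real API, `fetch_run` would be something like `lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)`.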

📚 Deep dive: https://platform.openai.com/docs/assistants/how-it-works/managing-threads

In [8]:
# Example Assistant run
assistant_prompt = "What is the capital of France?"
print(f"Assistant Prompt: {assistant_prompt}")

Assistant Prompt: What is the capital of France?


In [9]:
# Create a thread
thread = client.beta.threads.create()

In [10]:
# Create a user message on the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=assistant_prompt,
)

In [11]:
# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant_id,
)

In [12]:
# Poll until the run reaches a terminal state
while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status in ["completed", "failed", "cancelled", "expired"]:
        break
    time.sleep(0.3)  # Wait 0.3 seconds before checking again

if run.status == "failed":
    print("Assistant run failed!")
    # Run objects expose the failure details as `last_error`
    print(f"Run error message: {run.last_error}")
else:
    # Retrieve messages from the thread
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # Get the assistant's response
    assistant_response = [message.content[0].text.value for message in messages.data if message.role == "assistant"]
    print("Assistant Response:")
    for res in assistant_response:
        print(f"{res}")

print("\n\nAssistant interaction completed.\n")

Assistant Response:
The capital of France is Paris.


Assistant interaction completed.



# Research Assistant with Advanced Tools
Creates an enhanced assistant with file processing and analysis capabilities:
```python
# Download and process research papers
local_pdf_paths = download_pdfs(pdf_urls)

# Create assistant with tools
assistant = client.beta.assistants.create(
    tools=[{"type": "file_search"}, {"type": "code_interpreter"}]
)

# Set up vector store for document search
vector_store = client.beta.vector_stores.create()
```
- Handles PDF download and processing
- Enables file search capabilities
- Adds code interpretation
- Creates vector embeddings for efficient search
- Integrates all components for research tasks
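Download URLs do not always end in `.pdf` (arXiv links end in the paper ID), so the local filename has to be derived with some care. A minimal sketch of one approach follows; `pdf_filename` and its `default` parameter are hypothetical names introduced for illustration.

```python
import posixpath
from urllib.parse import urlparse


def pdf_filename(url, default="document"):
    """Derive a local '.pdf' filename from a download URL.

    Hypothetical helper: the query string is ignored and '.pdf' is appended
    when the last path segment does not already end with it.
    """
    path = urlparse(url).path
    name = posixpath.basename(path.rstrip("/")) or default
    if not name.lower().endswith(".pdf"):
        name += ".pdf"
    return name


print(pdf_filename("https://arxiv.org/pdf/1706.03762"))
print(pdf_filename("https://example.com/paper.pdf?download=1"))
```

Parsing the URL first avoids treating the digits after the dot in an arXiv ID as a file extension, which is what a plain `os.path.splitext` on the URL would do.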

📚 Tool reference: https://platform.openai.com/docs/assistants/tools

In [2]:
print("\n## Research Assistant Creation\n")

# Define PDF URLs
pdf_urls = [
    "https://arxiv.org/pdf/1706.03762",  # Attention Is All You Need
    "https://arxiv.org/pdf/2412.21187",  # Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like
]

# Download PDFs and save locally
os.makedirs("research_folder", exist_ok=True)  # Ensure the target folder exists
local_pdf_paths = []
for i, url in enumerate(pdf_urls):
    try:
        print(f"Downloading PDF from: {url}")

        # Get the PDF from the URL
        response = requests.get(url, allow_redirects=True)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)

        # These arXiv URLs have no ".pdf" suffix (os.path.splitext would return
        # part of the paper ID instead), so the extension is set manually
        file_extension = ".pdf"
        local_path = f"research_folder/research_doc_{i+1}{file_extension}"

        # Save the PDF
        with open(local_path, "wb") as f:
            f.write(response.content)

        # Add the file path to our list
        local_pdf_paths.append(local_path)
        print(f"Downloaded and saved to: {local_path}")
    except requests.exceptions.RequestException as e:
        print(f"Failed to download file from {url} error: {e}")


## Research Assistant Creation

Downloading PDF from: https://arxiv.org/pdf/1706.03762


NameError: name 'requests' is not defined

In [14]:
# Create a new assistant with file_search and code_interpreter
print("\nCreating a new research assistant...")
assistant = client.beta.assistants.create(
    name="Research Assistant",
    instructions="You are a helpful research assistant with access to several research documents and code interpreter. You can answer questions based on the content of the files and use code if needed.",
    model="gpt-4o",
    tools=[{"type": "file_search"}, {"type": "code_interpreter"}],
)
print(f"Assistant created with ID: {assistant.id}")


Creating a new research assistant...
Assistant created with ID: asst_vWiBhzrbpXib7CwtsQQRZnf1


In [15]:
print("\nUploading files to OpenAI...")
file_ids = []
for local_path in local_pdf_paths:
    try:
        print(f"Uploading file: {local_path}")
        with open(local_path, "rb") as file_stream:
            file_obj = client.files.create(file=file_stream, purpose="assistants")
            file_ids.append(file_obj.id)
            print(f"Uploaded file ID: {file_obj.id}")
    except Exception as e:
        print(f"Error uploading file {local_path}: {e}")



Uploading files to OpenAI...
Uploading file: research_doc_1.pdf
Uploaded file ID: file-Edu2DDPw6ZZuSjkabgGK1Y
Uploading file: research_doc_2.pdf
Uploaded file ID: file-VhPtobvz3U7yWgyewjzzTn


In [16]:
# Create a vector store and add the files to it
print("\nCreating vector store and adding files...")
vector_store = client.beta.vector_stores.create(name="Research Documents")
print(f"Vector store created with ID: {vector_store.id}")


Creating vector store and adding files...
Vector store created with ID: vs_wBJkA3VNxhvQ5VdT4FE2W6pC


In [17]:
# Upload all files to the vector store
if file_ids:
    print("Adding files to vector store")
    file_streams = []
    try:
        # Create file streams from local paths
        for local_path in local_pdf_paths:
            file_streams.append(open(local_path, "rb"))

        file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
            vector_store_id=vector_store.id, files=file_streams
        )
        print(f"File batch upload status: {file_batch.status}")
        print(f"File batch file counts: {file_batch.file_counts}")
    except Exception as e:
        print(f"Error adding files to vector store: {e}")
    finally:
        # Close the streams whether or not the upload succeeded
        for stream in file_streams:
            stream.close()
else:
    print("No files to add to vector store")

Adding files to vector store
File batch upload status: completed
File batch file counts: FileCounts(cancelled=0, completed=2, failed=0, in_progress=0, total=2)


In [18]:
# Update the assistant to use the vector store
print("\nUpdating assistant with the vector store...")
try:
  assistant = client.beta.assistants.update(
      assistant_id=assistant.id,
      tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
  )
  print("Assistant updated successfully with vector store.")
except Exception as e:
    print(f"Error updating assistant with vector store: {e}")

print("\n\nResearch assistant setup completed.")
print("You can now use the assistant to ask questions about the uploaded files.")
print("Assistant ID: ", assistant.id)
print("Vector Store ID: ", vector_store.id)


Updating assistant with the vector store...
Assistant updated successfully with vector store.


Research assistant setup completed.
You can now use the assistant to ask questions about the uploaded files.
Assistant ID:  asst_vWiBhzrbpXib7CwtsQQRZnf1
Vector Store ID:  vs_wBJkA3VNxhvQ5VdT4FE2W6pC


# Advanced Run Analysis and Monitoring

Run steps provide detailed insight into how the assistant processed a run:

```python
run_steps = client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id
)
```

- Tracks execution progress
- Shows tool usage details
- Shows the messages created at each step
- Helps debug and optimize interactions
- Monitors file processing and code execution

## Key features:

- Step-by-step execution tracking
- Tool call monitoring
- Response generation analysis
- Error handling and status checks
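For a quick overview before reading every step in detail, the steps can be tallied by type and by tool. This is a sketch, not the notebook's method: `summarize_steps` is a hypothetical helper, and the step objects are faked with only the `type` and `step_details.tool_calls` attributes so the example runs without an API call.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_steps(steps):
    """Count run steps by type and tool calls by tool type."""
    step_types = Counter(step.type for step in steps)
    tool_types = Counter()
    for step in steps:
        if step.type == "tool_calls":
            for call in getattr(step.step_details, "tool_calls", []):
                tool_types[call.type] += 1
    return {"steps": dict(step_types), "tools": dict(tool_types)}


# Fake steps mimicking only the attributes read above (assumption)
fake_steps = [
    SimpleNamespace(
        type="tool_calls",
        step_details=SimpleNamespace(tool_calls=[SimpleNamespace(type="file_search")]),
    ),
    SimpleNamespace(type="message_creation", step_details=SimpleNamespace()),
]

summary = summarize_steps(fake_steps)
print(summary)
```

With real data, the helper would be given `run_steps.data` from `client.beta.threads.runs.steps.list(...)`.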

📚 **Detailed guide:** https://platform.openai.com/docs/assistants/how-it-works/runs-and-run-steps

In [19]:
# --- Running the Assistant with a custom prompt ---
print("\n## Running the Assistant with a Custom Prompt\n")

custom_prompt = "Summarize the key findings of the Attention is all you need paper."
print(f"User Prompt: {custom_prompt}")

# Create a thread
thread = client.beta.threads.create()

# Add the user message to the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=custom_prompt,
)



## Running the Assistant with a Custom Prompt

User Prompt: Summarize the key findings of the Attention is all you need paper.


In [20]:
# Create a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)


# Wait for the run to complete
while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status in ["completed", "failed", "cancelled", "expired"]:
        break
    time.sleep(1)  # Wait for 1 second before checking again

if run.status == "failed":
    print("Assistant run failed!")
    print(f"Run error message: {run.last_error.message}")
else:
  # Retrieve messages from the thread
  messages = client.beta.threads.messages.list(thread_id=thread.id)
  # Get the assistant's response
  assistant_response = [message.content[0].text.value for message in messages.data if message.role == "assistant"]
  print("Assistant Response:")
  for res in assistant_response:
    print(f"{res}")

print("\n\nAssistant interaction completed.\n")

Assistant run failed!
Run error message: Request too large for gpt-4o in organization org-T0onQ2REbDQ80GAVL7bKk4Xz on tokens per min (TPM): Limit 30000, Requested 32763. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.


Assistant interaction completed.



In [21]:
# --- Retrieve and Display Run Steps ---
print("\n## Run Steps Details\n")

# Retrieve run steps
try:
    run_steps = client.beta.threads.runs.steps.list(
        thread_id=thread.id,
        run_id=run.id
    )

    # Print run steps with details
    for step in run_steps.data:
        print(f"Step ID: {step.id}")
        print(f"Step Type: {step.type}")
        print(f"Status: {step.status}")

        if step.type == "message_creation":
            # The step only carries the message ID, so fetch the message itself
            message_id = step.step_details.message_creation.message_id
            message = client.beta.threads.messages.retrieve(
                thread_id=thread.id, message_id=message_id
            )
            print("    Assistant Response:")
            for content_item in message.content:
                if content_item.type == "text":
                    text_value = content_item.text.value.strip()
                    if text_value:
                        print(f"        {text_value}")

        elif step.type == "tool_calls":
            if step.step_details and hasattr(step.step_details, "tool_calls"):
                for tool_call in step.step_details.tool_calls:
                    print(f"    Tool Call ID: {tool_call.id}")
                    print(f"    Tool Type: {tool_call.type}")

                    if tool_call.type == "file_search":
                        if hasattr(tool_call, "file_search") and hasattr(tool_call.file_search, "results"):
                            if tool_call.file_search.results:
                                print("        File Search Results:")
                                for result in tool_call.file_search.results:
                                    if result.content:
                                        print(f"            Result Content: {result.content}")
                    elif tool_call.type == "code_interpreter":
                        if hasattr(tool_call, "code_interpreter"):
                            if hasattr(tool_call.code_interpreter, "input") and tool_call.code_interpreter.input:
                                print(f"        Code Input: {tool_call.code_interpreter.input}")
                            if hasattr(tool_call.code_interpreter, "outputs") and tool_call.code_interpreter.outputs:
                                for output in tool_call.code_interpreter.outputs:
                                    if hasattr(output, "logs") and output.logs:
                                        print(f"        Code Output: {output.logs}")

        print("-" * 20)
except Exception as e:
    print(f"Error retrieving run steps: {e}")


## Run Steps Details

