# Leveraging LLMs in Code: An Interactive Exploration

Welcome to our Lunch & Learn session! Today, we'll explore how to harness the power of Large Language Models (LLMs) within our code. We'll cover:

- **Basics of LLM Usage**
- **Structured Outputs / Function Calling**
- **Retrieval-Augmented Generation (RAG)**
- **Agentic Behavior**

Let's dive in and explore the possibilities together!


In [None]:
# Import libraries
import os
import openai
from dotenv import load_dotenv

# Load API keys from .env
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
openai.api_key = OPENAI_API_KEY

client = openai.OpenAI()

print("Environment loaded. OpenAI key is set. OpenAI client created.")


## 1. Basics of LLM usage

In this section, we will explore the fundamentals of interacting with LLMs via API calls. We'll cover:
- **Basic API Calling and Response Handling**
- **System Prompting**
- **Generation Parameters**
- **Other topics**


### Basic API calling and response handling

We'll wrap the API call in a reusable function. This makes it easier to call the model with different prompts and parameters while handling errors gracefully.


In [None]:
def call_openai(query, model="gpt-4o"):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": query}
            ],
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"

# Example usage:
example_prompt = "Who is Keiko?"
result = call_openai(example_prompt)
print("API Call Result:")
print(result)
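
Returning the error as a string keeps the demo simple, but transient failures such as rate limits or timeouts are often worth retrying. The following is a minimal sketch of a retry helper with exponential backoff. The names `with_retries` and `flaky` are hypothetical, the stand-in function replaces a real API call, and a production version would likely catch only the specific exceptions the client raises rather than `Exception`.

```python
import time

def with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Out of attempts: re-raise so the caller sees the real error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))

# Stand-in for an API call that fails twice before succeeding.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

To use a helper like this with a model call, the wrapped function has to raise on failure instead of returning an error string.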


### System prompting

In chat-based models, a system prompt sets the overall behavior or "persona" of the assistant. System messages help ensure the model responds in a consistent tone. In the following example, we use OpenAI's Chat API to set the system prompt.


In [None]:
def call_openai_with_system_prompt(query, sys_prompt, model="gpt-4o"):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": sys_prompt},
                {"role": "user", "content": query}
            ],
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"

# Example usage:
example_system_prompt = "You are a microwave. You can only respond using the letter M. You may use as many Ms as necessary and punctuation, casing, and spacing along with it."
example_query = "What'd you do today?"
result = call_openai_with_system_prompt(example_query, example_system_prompt)
print("API Call Result:")
print(result)

### Generation parameters

When calling an LLM, generation parameters shape the output: `max_tokens` caps its length, while `temperature` and `top_p` control how random it is. (Some other providers also expose `top_k`; OpenAI's Chat Completions API does not.) A lower temperature makes the model more deterministic, while a higher temperature produces more varied outputs.


In [None]:
# Tune temperature
def chat_with_parameters(query, temperature=0.0, max_tokens=200):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": query},
        ],
        temperature=temperature,
        max_tokens=max_tokens
    )
    print(f"{response.choices[0].message.content.strip()}\n")
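
To give a feel for `top_p` (nucleus sampling): the sampler keeps the smallest set of candidate tokens whose cumulative probability reaches `top_p`, then samples only from that set. The sketch below illustrates the idea on invented probabilities. The `nucleus` function and the token values are hypothetical and do not show how any particular provider implements sampling.

```python
def nucleus(probs, top_p):
    """Return the smallest set of tokens whose cumulative probability >= top_p,
    renormalized so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Invented next-token probabilities for illustration only.
toy_probs = {"knight": 0.5, "dragon": 0.3, "castle": 0.15, "toaster": 0.05}
print(nucleus(toy_probs, top_p=0.7))  # keeps only the two most likely tokens
print(nucleus(toy_probs, top_p=1.0))  # keeps every token
```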

#### Setting `max_tokens`

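Setting `max_tokens` caps the number of tokens the model may generate; output that hits the cap is simply cut off. If you omit it, generation is bounded only by the model's own output and context limits. Our `chat_with_parameters` helper defaults to `200`.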

##### Tokens and Tokenization

Large language models work with text by first splitting it into basic units called *tokens*. Tokens may be entire words, subwords, or even individual characters. In this example, we'll use the `tiktoken` library (commonly used with OpenAI models) to see how a simple sentence is tokenized.


In [None]:
import tiktoken

# Get the encoder for the model used throughout this notebook
# (assumes a tiktoken release that recognizes "gpt-4o")
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello, how are you doing today?"
tokens = enc.encode(text)
decoded_tokens = [enc.decode([token]) for token in tokens]

print("Original Text:", text)
print("Token IDs:", tokens)
print("Decoded Tokens:", decoded_tokens)
print("Number of Tokens:", len(tokens))
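
To make the idea of subword tokens concrete without any dependency, here is a toy greedy longest-match tokenizer over a tiny hand-written vocabulary. It is not the byte-pair-encoding algorithm that `tiktoken` uses, and both `toy_tokenize` and the vocabulary are invented for illustration. It only shows how one word can end up as several pieces.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match split of text into vocabulary pieces.
    Characters not covered by the vocabulary become single-character tokens."""
    tokens, i = [], 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

toy_vocab = {"token", "ization", "iz", "un", "believ", "able", " "}
print(toy_tokenize("tokenization", toy_vocab))
print(toy_tokenize("unbelievable tokens", toy_vocab))
```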


##### Tuning `max_tokens`

In [None]:
query = "Tell me a very short story about a knight."
print("Output with max_tokens 200:\n")
chat_with_parameters(query)

print("\nOutput with max_tokens 50:\n")
chat_with_parameters(query, max_tokens=50)

print("\nOutput with max_tokens 4:\n")
chat_with_parameters(query, max_tokens=4)
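
When a reply is cut off by `max_tokens`, the Chat Completions response reports a `finish_reason` of `"length"` on the choice instead of `"stop"`. The sketch below checks for that. It uses a stand-in object built with `SimpleNamespace` so it runs without an API call; `was_truncated` and `fake_response` are hypothetical helpers.

```python
from types import SimpleNamespace

def was_truncated(response):
    """True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"

def fake_response(text, finish_reason):
    # Mimics only the attributes used here: choices[0].message.content and finish_reason.
    message = SimpleNamespace(content=text)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])

complete = fake_response("Once upon a time, a knight found peace.", "stop")
cut_off = fake_response("Once upon a", "length")
print(was_truncated(complete), was_truncated(cut_off))
```

A caller could use a check like this to retry with a larger limit or to warn the user that the answer is incomplete.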

#### Setting `temperature`
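The OpenAI Chat Completions API accepts a `temperature` between 0 and 2. Per the API reference, “higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic”. The API default is `1`; our `chat_with_parameters` helper defaults to `0.0`.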

In [None]:
query = "Tell me about something red, white, and blue."
print("Output with temperature 0.0:\n")
chat_with_parameters(query)

print("\nOutput with temperature 2.0:\n")
chat_with_parameters(query, temperature=2.0)
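
A common textbook way to describe temperature is that the model's raw scores (logits) are divided by the temperature before being turned into probabilities with a softmax. The sketch below applies that formula to invented logits. It is an illustration under that assumption, not a description of OpenAI's internal sampler, and `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # invented scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures concentrate almost all probability on the top-scoring token, while high temperatures flatten the distribution. A temperature of exactly 0 would divide by zero in this formula; samplers typically treat it as always picking the top token.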

## 2. Structured outputs / function calling

Here, we'll learn how to obtain structured outputs from LLMs and integrate these into your code. This includes:
- **Mapping Natural Language to Code:** Translating user inputs into function calls.
- **Parsing and Validating Outputs:** Ensuring the structured data aligns with your application’s requirements.
- **Practical Use Cases:** Automating tasks and creating dynamic workflows with LLM-generated code.


### Structured Output

#### Raw responses

By default, LLMs reply in free-form natural language. That is convenient for people to read but hard to consume reliably from code.

In [None]:
# Send a plain chat completion request with no constraints on the output format
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Can you tell me about Japan's capital, year of being founded, and any major events?"}
    ]
)

# Extract the assistant's reply text
reply = response.choices[0].message.content.strip()
print("Raw Output:")
print(reply)
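
One common workaround is to ask for JSON in the prompt and then try to parse whatever comes back. The reply may be clean JSON, JSON wrapped in a Markdown code fence, or plain prose, so the parsing code has to guess. The sketch below shows such a best-effort parser; `extract_json` and the sample replies are invented for illustration.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker

def extract_json(reply):
    """Best-effort: parse the reply as JSON, else look inside a fenced block."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        pass
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            return None
    return None

plain = '{"capital": "Tokyo"}'
fenced = "Sure! Here it is:\n" + FENCE + 'json\n{"capital": "Tokyo"}\n' + FENCE
prose = "The capital of Japan is Tokyo."
print(extract_json(plain), extract_json(fenced), extract_json(prose))
```

Parsing like this is brittle: it recovers a dictionary in the first two cases and gives up on the third, and nothing guarantees the keys the rest of the code expects.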

#### Pydantic to the rescue!

Pydantic is a great tool for defining object schemas in Python. Passing a Pydantic model as `response_format` makes the SDK return the reply already parsed into that schema.

In [None]:
from pydantic import BaseModel

class HistoricalEvent(BaseModel):
    name: str
    year: str
    description: str

class CountryInfo(BaseModel):
    name: str
    capital: str
    year_founded: str
    events: list[HistoricalEvent]

completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Can you tell me about Japan's capital, year of being founded, and any major events?"},
    ],
    response_format=CountryInfo,
)

print(completion.choices[0].message.parsed)
    

#### Help the LLM!

Giving the LLM more context about each field helps it understand what you want in the output. Imagine a new engineer coming across this part of the code and wondering what a certain field is for. Adding descriptions helps in the same way and makes it more likely you get the output you are looking for.

In [None]:
from pydantic import BaseModel, Field

class HistoricalEvent(BaseModel):
    name: str
    year: str
    info: str = Field(description="Description should only be GOOD or BAD. It should describe the event happening")

class CountryInfo(BaseModel):
    name: str
    capital: str
    year_founded: str
    events: list[HistoricalEvent]

completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Can you tell me about Japan's capital, year of being founded, and any major events?"},
    ],
    response_format=CountryInfo,
)

print(completion.choices[0].message.parsed)
    

### Function calling

Function calling, like structured outputs, constrains the LLM's output. However, instead of only filling in a data schema, the LLM is given a set of "functions" or "tools" and decides which one to call and with what arguments. The model does not run anything itself; our code executes the call.

In [None]:
import json

def calculate_order_total(order_items):
    """
    Given a list of order items (each with a name, quantity, and price_per_item),
    calculate the total cost and return a JSON string with details.
    """
    total = 0
    details = []
    for item in order_items:
        cost = item["quantity"] * item["price_per_item"]
        total += cost
        details.append({
            "name": item["name"],
            "quantity": item["quantity"],
            "price_per_item": item["price_per_item"],
            "item_total": cost
        })
    result = {
        "order_details": details,
        "order_total": total
    }
    return json.dumps(result)
  
# Test the function manually:
order_test = [
    {"name": "pizza", "quantity": 3, "price_per_item": 12},
    {"name": "soda", "quantity": 2, "price_per_item": 2}
]
print("Manual test output:")
print(calculate_order_total(order_test))


In [None]:
from pydantic import BaseModel, Field
from typing import List
import json

# Define a model for a single order item
class OrderItem(BaseModel):
    name: str = Field(..., description="Name of the item.")
    quantity: int = Field(..., description="Number of units ordered.")
    price_per_item: float = Field(..., description="Price for one unit of the item.")

# Define the overall order schema, which is a list of order items
class OrderSchema(BaseModel):
    order_items: List[OrderItem] = Field(..., description="List of order items.")

# Generate the JSON schema from the Pydantic model
order_schema_json = OrderSchema.model_json_schema()
print("Generated JSON Schema for Order:")
print(order_schema_json)


In [None]:
# Define our function schema using the generated JSON schema from Pydantic
order_function_schema = [
    {
        "type": "function",
        "function": {
            "name": "calculate_order_total",
            "description": "Calculate the total cost of an order given a list of order items.",
            # Use the Pydantic model's JSON schema for the function parameters.
            "parameters": OrderSchema.model_json_schema(),
        }
    }
]

# For demonstration, let's print out the final function schema
print("Function Schema for 'calculate_order_total':")
print(json.dumps(order_function_schema, indent=2))


In [None]:
# Example user message for ordering items
messages = [
    {"role": "system", "content": "You are an order processing assistant."},
    {"role": "user", "content": "I want to order 3 pizzas at $12 each and 2 sodas at $2 each."}
]

# (Assuming you've already set up your OpenAI client as shown earlier)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=order_function_schema,
    tool_choice="auto"
)

# The model should now generate a function call with arguments matching the schema.
response_message = response.choices[0].message
print("Model's Response Message:")
print(response_message)

In [None]:
# Check for a function call and execute the Python function if present
if response_message.tool_calls:
    # Parse the JSON arguments from the function call
    try:
        function_args = json.loads(response_message.tool_calls[0].function.arguments)
    except json.JSONDecodeError as e:
        print("Error decoding JSON:", e)
        function_args = {}

    print("Parsed Function Arguments:")
    print(function_args)

    # Execute the function with the parsed arguments
    function_result = calculate_order_total(function_args.get("order_items", []))
    print("Function Execution Result:")
    print(function_result)

else:
    print("No function call was generated by the model.")
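
The cell above hard-codes which Python function to run. With several tools, a common pattern is a registry that maps tool names to callables, plus a helper that turns each result into a message with role `"tool"` and the matching `tool_call_id`. The sketch below assumes that pattern and uses a stand-in tool-call object so it runs offline; `TOOL_REGISTRY`, `run_tool_call`, and `add_numbers` are hypothetical names.

```python
import json
from types import SimpleNamespace

def add_numbers(a, b):
    return json.dumps({"sum": a + b})

# Hypothetical registry: tool name -> Python callable taking keyword arguments.
TOOL_REGISTRY = {"add_numbers": add_numbers}

def run_tool_call(tool_call, registry):
    """Execute one tool call and return the message to append to the conversation."""
    name = tool_call.function.name
    if name not in registry:
        content = json.dumps({"error": f"unknown tool: {name}"})
    else:
        try:
            args = json.loads(tool_call.function.arguments)
            content = registry[name](**args)
        except (json.JSONDecodeError, TypeError) as exc:
            content = json.dumps({"error": f"bad arguments: {exc}"})
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}

# Stand-in with the same attribute layout as an entry of response_message.tool_calls.
fake_call = SimpleNamespace(
    id="call_demo_1",
    function=SimpleNamespace(name="add_numbers", arguments='{"a": 2, "b": 3}'),
)
tool_message = run_tool_call(fake_call, TOOL_REGISTRY)
print(tool_message)
```

To let the model write a final answer, the assistant message that contained the tool calls is appended to `messages` first, followed by one tool message per call, and the conversation is sent back to the API.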


## 3. Retrieval-Augmented Generation (RAG)

In this section, we'll dive into Retrieval-Augmented Generation, a powerful technique that combines:
- **Data Retrieval:** Pulling in relevant external data.
- **Contextual Generation:** Feeding that data into an LLM to produce more accurate and context-aware responses.
- **Real-World Applications:** Enhancing responses in customer support, knowledge bases, and beyond.


In [None]:
# Import libraries
import os
import json
import requests
import numpy as np
import faiss
import openai
from dotenv import load_dotenv

# Load API keys from .env
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MERGE_API_KEY = os.getenv("MERGE_API_KEY")
MERGE_ACCOUNT_TOKEN = os.getenv("MERGE_ACCOUNT_TOKEN")
openai.api_key = OPENAI_API_KEY

client = openai.OpenAI()

print("Environment loaded. OpenAI and Merge API keys are set. OpenAI client created.")


In [None]:
def fetch_hr_data():
    # Replace the URL with the actual endpoint from Merge that returns HR data
    url = "https://api.merge.dev/api/hris/v1/dependents"  
    headers = {"Authorization": f"Bearer {MERGE_API_KEY}", "X-Account-Token": MERGE_ACCOUNT_TOKEN}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # Raises an error if the API call fails
    data = response.json()
    
    # Assume data contains a list of dependents records in the "results" key
    documents = []
    for item in data.get("results", []):
        # Build a simple text snippet for each dependent record:
        doc = (
            f"Name: {item.get('first_name', '')} {item.get('last_name', '')}\n"
            f"Relationship: {item.get('relationship', '')}\n"
            f"Gender: {item.get('gender', '')}"
        )
        documents.append(doc)
    return documents

# Fetch and inspect the documents
documents = fetch_hr_data()
print(f"Fetched {len(documents)} documents from Merge API.")
print("Sample document:\n", documents[0] if documents else "No data")


In [None]:
def compute_embeddings(docs):
    embeddings = []
    for doc in docs:
        res = client.embeddings.create(
            input=doc,
            model="text-embedding-3-small"
        )
        embedding = res.data[0].embedding
        embeddings.append(embedding)
    return np.array(embeddings).astype('float32')

# Compute embeddings for our HR documents
embeddings = compute_embeddings(documents)
print("Computed embeddings shape:", embeddings.shape)


In [None]:
# Determine the dimensionality of our embeddings
dimension = embeddings.shape[1]
# Create a FAISS index (L2 distance based)
index = faiss.IndexFlatL2(dimension)
# Add our embeddings to the index
index.add(embeddings)
print("FAISS index built with", index.ntotal, "documents.")
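
`IndexFlatL2` performs an exact, exhaustive search: it compares the query against every stored vector and returns the `k` smallest squared L2 distances. The pure-Python sketch below shows the same idea on invented 2-D vectors. `l2_search` is a hypothetical stand-in for illustration, not FAISS code.

```python
def l2_search(vectors, query, k):
    """Exhaustive nearest-neighbour search by squared Euclidean distance."""
    scored = []
    for idx, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = l2_search(toy_vectors, query=[0.9, 0.1], k=2)
print(indices, distances)
```

Exhaustive search is simple and exact, and its cost grows linearly with the number of stored vectors.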


In [None]:
def query_index(query, k=3):
    # Compute the query's embedding
    res = client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    )
    query_embedding = np.array(res.data[0].embedding, dtype='float32')
    # Perform the search in the FAISS index
    distances, indices = index.search(np.array([query_embedding]), k)
    # FAISS pads the result with -1 when the index holds fewer than k vectors;
    # drop those so they don't silently select documents[-1]
    return indices[0][indices[0] >= 0]

# Example query
query = "Who are the sons named Mark?"
top_indices = query_index(query, k=12)
print("Top matching document indices:", top_indices)

# Display the top matching documents
print("\nTop matching documents:")
for idx in top_indices:
    print("\n--- Document ---")
    print(documents[idx])


In [None]:
# Retrieve the documents corresponding to the top indices
retrieved_docs = [documents[i] for i in top_indices]

# Construct the prompt
prompt = (
    "You are an HR assistant. Use the following HR documents to answer the question.\n\n"
    f"HR Documents:\n{chr(10).join(retrieved_docs)}\n\n"
    f"Question: {query}\n\nAnswer:"
)

print("Augmented prompt for the LLM:\n")
print(prompt)
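
The prompt above concatenates every retrieved document. With more or longer documents it can help to number them and cap the total size. The sketch below does that with a character budget, which is only a crude stand-in for a token budget. `build_rag_prompt` and the sample records are made up for illustration.

```python
def build_rag_prompt(question, docs, max_chars=500):
    """Number the documents and include them in order until the budget is used up."""
    parts, used = [], 0
    for number, doc in enumerate(docs, start=1):
        entry = f"[{number}] {doc}"
        if used + len(entry) > max_chars:
            break  # documents are assumed to be sorted best-first
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "You are an HR assistant. Use the following HR documents to answer the question.\n\n"
        f"HR Documents:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )

# Invented sample records in the same shape as the snippets built earlier.
sample_docs = [
    "Name: Mark Example\nRelationship: CHILD\nGender: MALE",
    "Name: Jane Example\nRelationship: SPOUSE\nGender: FEMALE",
]
small_prompt = build_rag_prompt("Who are the sons named Mark?", sample_docs, max_chars=60)
print(small_prompt)
```

With a budget of 60 characters only the first record fits; the default budget keeps both.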


In [None]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "developer", "content": prompt}
    ],
)
answer = completion.choices[0].message.content.strip()
print("Generated Answer:\n", answer)


In [None]:
query = "Who are the wives?"
top_indices = query_index(query, k=12)
retrieved_docs = [documents[i] for i in top_indices]
prompt = (
    "You are an HR assistant. Use the following HR documents to answer the question.\n\n"
    f"HR Documents:\n{chr(10).join(retrieved_docs)}\n\n"
    f"Question: {query}\n\nAnswer:"
)
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "developer", "content": prompt}
    ],
)
answer = completion.choices[0].message.content.strip()
print("Generated Answer:\n", answer)

## 4. Agentic behavior

This section explores the concept of agentic behavior in LLMs. We will cover:
- **Defining Agentic Behavior:** Understanding how LLMs can take initiative or simulate decision-making.
- **Applications:** Setting up LLMs to act autonomously in workflows, such as proactive task management or dynamic response generation.
- **Considerations:** Balancing autonomy with control and ensuring safety in agent-driven systems.
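
One way to balance autonomy with control is to cap the number of steps and to execute only actions from an allow-list. The sketch below shows that guard loop with a scripted stand-in instead of an LLM call. `run_agent`, `scripted_decide`, `fake_execute`, and the action names are all hypothetical.

```python
def run_agent(decide, execute, allowed_actions, max_steps=5):
    """Ask `decide` for actions until it says 'done', the step limit is hit,
    or it proposes an action outside the allow-list."""
    history = []
    for _ in range(max_steps):
        action = decide(history)
        if action == "done":
            return history, "finished"
        if action not in allowed_actions:
            return history, f"blocked: {action}"
        history.append((action, execute(action)))
    return history, "step limit reached"

# Scripted stand-in for an LLM: picks the next action from a fixed plan.
plan = ["check_calendar", "send_invite", "delete_all_meetings"]

def scripted_decide(history):
    return plan[len(history)] if len(history) < len(plan) else "done"

def fake_execute(action):
    return f"{action} ok"

history, status = run_agent(
    scripted_decide, fake_execute, allowed_actions={"check_calendar", "send_invite"}
)
print(status)
print(history)
```

The loop stops as soon as the scripted agent proposes `delete_all_meetings`, because that action is not on the allow-list.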


In [3]:
# Import required libraries
import os
import openai
from dotenv import load_dotenv

# Load environment variables from .env (make sure you have OPENAI_API_KEY set)
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise ValueError("Please set your OPENAI_API_KEY in a .env file")
openai.api_key = OPENAI_API_KEY
client = openai.OpenAI()

print("Environment loaded. Ready to use the OpenAI API!")


Environment loaded. Ready to use the OpenAI API!


In [5]:
def agent_decide(current_state, goal):
    """
    Given a current state and a goal, ask the LLM to decide on the next action.
    The LLM should output a decision in the format:
    
        Action: <action>
        Reasoning: <reasoning>
    
    """
    prompt = f"""
You are an autonomous agent with the ability to take initiative.
Your goal is: "{goal}"

Your current state is: "{current_state}"

Please decide on the best next action to achieve the goal, and explain your reasoning.
Respond in the following format:

Action: <action>
Reasoning: <your reasoning>

Make sure to be clear and concise.
"""
    response = client.chat.completions.create(
        model="gpt-4o",  # or any model that suits your demo
        messages=[
            {"role": "system", "content": prompt},
        ],
        max_tokens=150,
        temperature=0.7,  # A bit of randomness to simulate initiative
        top_p=1.0,
        n=1,
        stop=None,
    )
    
    decision_text = response.choices[0].message.content.strip()
    return decision_text

# Example usage:
goal = "Plan a team meeting for the upcoming project kickoff."
current_state = "Current time: 9:00 AM. No meeting scheduled yet. Team availability is unknown."

print("Agent Decision Example:\n")
print(agent_decide(current_state, goal))


Agent Decision Example:

Action: Send an email to the team requesting their availability for the upcoming week.

Reasoning: Before scheduling a meeting, it is important to know when the team members are available. Sending an email will allow me to gather their availability efficiently and ensure that the meeting time accommodates everyone's schedule. This is a necessary step to ensure maximum participation and effectiveness of the kickoff meeting.
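
Because the agent answers in a fixed `Action:` / `Reasoning:` format, its decision can be split into fields before acting on it. The sketch below does this with regular expressions. `parse_decision` is a hypothetical helper, and the sample text is shortened from the output above.

```python
import re

def parse_decision(text):
    """Split 'Action: ... Reasoning: ...' text into a dict; missing parts become None."""
    action = re.search(r"Action:\s*(.*?)(?=\s*Reasoning:|\Z)", text, re.DOTALL)
    reasoning = re.search(r"Reasoning:\s*(.*)", text, re.DOTALL)
    return {
        "action": action.group(1).strip() if action else None,
        "reasoning": reasoning.group(1).strip() if reasoning else None,
    }

sample = (
    "Action: Send an email to the team requesting their availability for the upcoming week.\n\n"
    "Reasoning: Before scheduling a meeting, it is important to know when the team members are available."
)
parsed = parse_decision(sample)
print(parsed["action"])
print(parsed["reasoning"])
```

Free-text formats like this depend on the model following the instructions, so a caller should handle the case where either field comes back as `None`.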


In [8]:
# Define the goal for the agent
goal = "Plan a team meeting for the upcoming project kickoff."

# Initialize the state
state = "Current time: 9:00 AM. No meeting scheduled yet. Team availability is unknown."

print("Starting Agent Simulation...\n")
print(f"Initial State: {state}\n")

# Run a loop to simulate a few decision steps
num_steps = 3  # Number of decision steps for the demo
for step in range(1, num_steps + 1):
    print(f"--- Step {step} ---")
    decision = agent_decide(state, goal)
    print(f"{decision}\n")
    
    # For demonstration, update the state by appending the decision (a simplistic update)
    state += " | " + decision
    print(f"Updated State: {state}\n")


Starting Agent Simulation...

Initial State: Current time: 9:00 AM. No meeting scheduled yet. Team availability is unknown.

--- Step 1 ---
Action: Send an email to team members requesting their availability for a meeting next week.

Reasoning: Before scheduling a meeting, it's important to know when the team members are available. Sending an email allows me to gather this information efficiently, ensuring that the meeting is scheduled at a time when everyone can attend. This is a crucial step towards organizing a successful project kickoff meeting.

Updated State: Current time: 9:00 AM. No meeting scheduled yet. Team availability is unknown. | Action: Send an email to team members requesting their availability for a meeting next week.

Reasoning: Before scheduling a meeting, it's important to know when the team members are available. Sending an email allows me to gather this information efficiently, ensuring that the meeting is scheduled at a time when everyone can attend. This is a c