# Lab Statement: Introduction to Creating RAGs (Retrieval-Augmented Generators) with OpenAI

## Name: David Santiago Castro

## Quickstart

To begin, we need LangChain installed, an OpenAI account, and an API key available in our environment. In this case, we use the GitHub Education license so that we don't have to pay for API keys; the code below reads the key from the GITHUB_TOKEN environment variable.

In [2]:
%pip install -U langchain langchain-openai
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.
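
Before calling the model, it can help to fail early when the key is missing instead of receiving an authentication error later. The following is a minimal sketch using only the standard library; `require_env` and `DEMO_TOKEN` are hypothetical names introduced for illustration and are not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Demo with a throwaway variable so the snippet runs without a real token.
os.environ["DEMO_TOKEN"] = "example-value"
print(require_env("DEMO_TOKEN"))
```

In the notebook, the same helper could be called with "GITHUB_TOKEN" right after load_dotenv().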





## Build a basic agent


We start by creating a simple agent that can answer questions and call tools. The agent uses gpt-4o as its language model, a basic weather function as its tool, and a short system prompt that guides its behavior.

In [3]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent

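# Load variables from a local .env file into the process environment
load_dotenv()
github_token = os.getenv("GITHUB_TOKEN")

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"


# The GitHub token is used as the API key against the Azure inference endpoint
model = ChatOpenAI(
    model="gpt-4o",
    api_key=github_token,
    base_url="https://models.inference.ai.azure.com"
)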

agent = create_agent(
    model=model,
    tools=[get_weather],
    system_prompt="You are a helpful assistant",
)

try:
    response = agent.invoke(
        {"messages": [{"role": "user", "content": "what is the weather in sf"}]}
    )
    print(response)
except Exception as e:
    print(f"An error occurred: {e}")

{'messages': [HumanMessage(content='what is the weather in sf', additional_kwargs={}, response_metadata={}, id='ec8f1010-2271-4fea-9acf-059cfe02ced7'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 56, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-2024-11-20', 'system_fingerprint': 'fp_af7f7349a4', 'id': 'chatcmpl-DBwzvWBL4BhthclvJixBavFfJrk6n', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019c83fa-a402-7192-9cbd-9864692b0623-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'San Francisco'}, 'id': 'call_IhaWEOWBGw7FYiOvGuvDlFR2', 'type': 'tool_call'}], invalid_tool_calls=[], usage_metadata={'input_tokens': 56, 'output_tokens': 16, 'total_tokens': 

In the basic agent section, we load our API key from the environment, create a simple weather tool, and connect both to a ChatOpenAI model so the agent can reason and call tools when needed. We then define a short system prompt to guide its behavior and send a user message asking for the weather in SF. When invoked, the agent decides whether to use the tool, receives the tool result, and returns a final natural-language answer. In the printed output above, the first AIMessage has empty content and a tool_calls entry for get_weather with city set to San Francisco, which is the model requesting the tool.
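
The decide, call, and answer cycle described here can be sketched without any network access. This is a toy illustration, not LangChain's implementation: `fake_model`, `run_agent`, and the plain-dict message shape are assumptions made for the example, and the fake model simply requests the tool once and then repeats the tool result as its answer.

```python
def get_weather(city: str) -> str:
    """Same toy tool as in the notebook."""
    return f"It's always sunny in {city}!"


TOOLS = {"get_weather": get_weather}


def fake_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in for the LLM: request the tool once, then answer."""
    last = messages[-1]
    if last["role"] == "tool":
        # A tool result is available, so reply with a final answer.
        return {"role": "assistant", "content": last["content"], "tool_calls": []}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "get_weather", "args": {"city": "San Francisco"}}],
    }


def run_agent(user_text: str, max_steps: int = 5) -> list[dict]:
    """Loop until the model replies without requesting a tool."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        messages.append(reply)
        if not reply["tool_calls"]:
            break
        for call in reply["tool_calls"]:
            # Look up the requested tool by name and run it with the given args.
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
    return messages


history = run_agent("what is the weather in sf")
print(history[-1]["content"])
```

The resulting history has the same overall shape as the notebook output: a user message, an assistant message that only carries a tool call, the tool result, and a final assistant message.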


## Build a real-world agent


In the following system prompt, we define the role and behavior of our agent. It is important to be specific and practical:

In [4]:
SYSTEM_PROMPT = """You are an expert weather forecaster, who speaks in puns.

You have access to two tools:

- get_weather_for_location: use this to get the weather for a specific location
- get_user_location: use this to get the user's location

If a user asks you for the weather, make sure you know the location. If you can tell from the question that they mean wherever they are, use the get_user_location tool to find their location."""

We define two tools for the weather agent: one returns a simple weather message for a given city, and the other gets the user's location from the runtime context. We also create a small Context data class with a user_id field, so the agent can pass user information to tools while they run. In short, this setup allows the agent either to answer weather questions for a specific city or to infer the city from the current user.

In [5]:
from dataclasses import dataclass
from langchain.tools import tool, ToolRuntime

@tool
def get_weather_for_location(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

@dataclass
class Context:
    """Custom runtime context schema."""
    user_id: str

@tool
def get_user_location(runtime: ToolRuntime[Context]) -> str:
    """Retrieve user information based on user ID."""
    user_id = runtime.context.user_id
    return "Florida" if user_id == "1" else "SF"
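
The location logic can be exercised on its own, without the agent. The sketch below is an assumption-laden variant: `FakeRuntime` is a hypothetical stand-in for ToolRuntime[Context] that only carries the context, and `USER_LOCATIONS` is an illustrative lookup table replacing the hard-coded conditional.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Same runtime context schema as in the notebook."""
    user_id: str


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for ToolRuntime[Context]; only carries the context."""
    context: Context


# Hypothetical lookup table replacing the hard-coded conditional.
USER_LOCATIONS = {"1": "Florida"}
DEFAULT_LOCATION = "SF"


def lookup_user_location(runtime: FakeRuntime) -> str:
    """Return the stored location for the current user, or a default."""
    return USER_LOCATIONS.get(runtime.context.user_id, DEFAULT_LOCATION)


print(lookup_user_location(FakeRuntime(Context(user_id="1"))))
print(lookup_user_location(FakeRuntime(Context(user_id="2"))))
```

A table like this keeps the behavior of the original tool (user "1" maps to Florida, everyone else to SF) while making it easier to add users later.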

We configure a ChatOpenAI language model instance using gpt-4o. It reads the API key from the GITHUB_TOKEN environment variable for safer credential handling, connects to the Azure inference endpoint through base_url, and sets temperature to 0.5 so responses stay reasonably consistent while still allowing some creativity.

In [6]:
# ChatOpenAI and os are already imported in the first code cell

model = ChatOpenAI(
    model="gpt-4o",
    api_key=os.getenv("GITHUB_TOKEN"),
    base_url="https://models.inference.ai.azure.com",
    temperature=0.5
)
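
To see why a lower temperature makes responses more consistent, the sketch below applies the standard temperature-scaled softmax to made-up token scores. The notebook does not describe the hosted model's exact sampling procedure, so this is an illustration of the general idea only; `softmax_with_temperature` and the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.5 the highest-scoring token receives a larger share of the probability than at 1.0, so sampling picks it more often, while the other tokens still keep a nonzero chance.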

We define a simple structured output format, ResponseFormat, so the agent replies with a punny weather message and optional weather details. The weather_conditions field defaults to None, so the agent can omit it when a reply carries no weather information. The cells that follow create an in-memory checkpointer, build the agent, and invoke it twice on the same thread.

In [7]:
from dataclasses import dataclass

@dataclass
class ResponseFormat:
    """Response schema for the agent."""
    punny_response: str
    weather_conditions: str | None = None
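
Because the schema is a plain dataclass, it can be built and serialized with the standard library alone. The following sketch repeats the class so it runs on its own (using Optional instead of `str | None` so it also works on older Python versions); the example values are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ResponseFormat:
    """Same schema as in the notebook (Optional used for older Pythons)."""
    punny_response: str
    weather_conditions: Optional[str] = None


# Made-up values: one reply with weather details and one without.
full = ResponseFormat(punny_response="Sun's out, puns out!", weather_conditions="Sunny")
partial = ResponseFormat(punny_response="You're welcome, rain or shine.")

print(json.dumps(asdict(full)))
print(asdict(partial)["weather_conditions"])
```

Converting to a dict or JSON this way is one option for logging or returning the structured response from an API.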

We give the conversational weather agent short-term memory so it can keep context between messages. This cell creates an in-memory checkpointer, InMemorySaver, to store the conversation state. The next cell passes it to the agent and uses a fixed thread ID so that both requests belong to the same chat session.

In [8]:
from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()
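
The idea of state keyed by a thread ID can be illustrated with a toy store. This is not how InMemorySaver is implemented; `ToyThreadMemory` is a hypothetical class that only shows why two calls with the same thread ID see the same history while another thread starts empty.

```python
from collections import defaultdict


class ToyThreadMemory:
    """Hypothetical stand-in for a checkpointer: one message list per thread."""

    def __init__(self) -> None:
        self._threads: dict[str, list[dict]] = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        """Store one message under the given thread."""
        self._threads[thread_id].append({"role": role, "content": content})

    def history(self, thread_id: str) -> list[dict]:
        # Return a copy so callers cannot mutate stored state by accident.
        return list(self._threads[thread_id])


memory = ToyThreadMemory()
memory.append("1", "user", "what is the weather outside?")
memory.append("1", "user", "thank you!")
memory.append("2", "user", "hello")

print(len(memory.history("1")))
print(len(memory.history("2")))
```

Since everything lives in a Python dictionary, the stored history disappears when the process exits, which matches the "in-memory" and "short-term" framing used in this section.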


We build the weather assistant agent from the existing model, system prompt, tools, context schema, and checkpointer, and request structured output through ToolStrategy(ResponseFormat). We set a thread_id in config to keep both requests in the same conversation, then call the agent once to ask for the weather outside and again to say thank you. In both cases, we print only the structured response, which keeps the output clean and consistent.

In [9]:
from langchain.agents.structured_output import ToolStrategy

agent = create_agent(
    model=model,
    system_prompt=SYSTEM_PROMPT,
    tools=[get_user_location, get_weather_for_location],
    context_schema=Context,
    response_format=ToolStrategy(ResponseFormat),
    checkpointer=checkpointer
)

config = {"configurable": {"thread_id": "1"}}

response = agent.invoke(
    {"messages": [{"role": "user", "content": "what is the weather outside?"}]},
    config=config,
    context=Context(user_id="1")
)

print(response['structured_response'])

response = agent.invoke(
    {"messages": [{"role": "user", "content": "thank you!"}]},
    config=config,
    context=Context(user_id="1")
)

print(response['structured_response'])


ResponseFormat(punny_response='Looks like Florida is living up to its name with sunshine galore!', weather_conditions='Sunny')
ResponseFormat(punny_response="You're very welcome! I'm always here to make your day a little brighter, rain or shine.", weather_conditions=None)


When it comes to building a real-world agent, we move from a basic example to a more useful and robust solution. The agent not only responds, but also uses specific tools, maintains context between messages with memory, leverages user information through a context schema, and delivers structured output that is clear and predictable. Together, this demonstrates how to design an assistant that is closer to a real production case.
