1. Chat Model Initialization
The OpenAI chat model (ChatOpenAI) is instantiated.
Environment variables are loaded to securely pass the API key.
Temperature settings can be configured to adjust response creativity.

In [2]:
import datetime
import inspect
import json
from typing import (
    List,
    Dict,
    Literal,
    Callable,
    Any,
    get_type_hints
)


from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate

from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from env

# No api_key argument is needed when the environment variable is set.
# The variable name must be exactly OPENAI_API_KEY (uppercase); if it was
# set as openai_api_key anywhere, rename it.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
)

In [3]:
# test connection
llm.invoke("Hello there")

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 9, 'total_tokens': 18, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CNgxwciKIaLbx7JmFStesR7a3610m', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--b1a357ed-fda9-4ef9-bd0f-02ae42f1475b-0', usage_metadata={'input_tokens': 9, 'output_tokens': 9, 'total_tokens': 18, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

2. Message Structuring
LangChain supports structured message inputs including:
SystemMessage: Sets the assistant’s behavior.
HumanMessage: Represents user input.
AIMessage: Includes prior assistant responses.
Including previous assistant responses helps with few-shot learning, as it teaches the LLM how to respond in context.

In [None]:
messages = [ 
    SystemMessage("You are a geography tutor"),
    HumanMessage("What's the capital of Brazil?")
]

In [None]:
llm.invoke(messages)

In [None]:
# add one worked example (one-shot) before the new question
messages = [ 
    SystemMessage("You are a geography tutor"),
    HumanMessage("What's the capital of Brazil?"),
    AIMessage("The capital of Brazil is Brasília"),
    HumanMessage("What's the capital of Canada?"),
]

In [None]:
llm.invoke(messages)
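The seeding pattern used in the cells above is always the same: one system message, then alternating human/AI pairs, then the new question. A minimal plain-Python sketch of that ordering follows. It uses (role, content) tuples as stand-ins for the message classes, and `build_messages` is a hypothetical helper written for this note, not a LangChain function.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); stand-in for SystemMessage/HumanMessage/AIMessage


def build_messages(system: str, shots: List[Tuple[str, str]], question: str) -> List[Message]:
    """Assemble the system prompt, the worked examples, and the new question in order."""
    messages: List[Message] = [("system", system)]
    for user_text, assistant_text in shots:
        messages.append(("human", user_text))
        messages.append(("ai", assistant_text))
    messages.append(("human", question))
    return messages


seeded = build_messages(
    "You are a geography tutor",
    [("What's the capital of Brazil?", "The capital of Brazil is Brasília")],
    "What's the capital of Canada?",
)
for role, content in seeded:
    print(f"{role}: {content}")
```

Adding more (question, answer) pairs to `shots` grows the list by two messages per example, while the new question always stays last.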

3. Prompt Templates
LangChain offers two styles of templated prompting:

a. Basic PromptTemplate
Defines a single-variable input prompt with a placeholder.
  prompt = PromptTemplate(
    input_variables=["topic"],
    template="Tell me a joke about {topic}"
  )
The prompt can be formatted using .format() or .invoke():
  formatted_prompt = prompt.format(topic="Python")
  llm.invoke(prompt.invoke({"topic": "Python"}))

In [None]:
# pure Python prompt, option 1: f-string
topic = "Python"
prompt = f"Tell me a joke about {topic}"
llm.invoke(prompt)

In [None]:
# pure Python prompt, option 2: str.format on a plain string
prompt = "Tell me a joke about {topic}"
llm.invoke(prompt.format(topic = "Python"))
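At its core a prompt template is a string plus the names of its placeholders. The sketch below shows only that placeholder mechanism with the standard library, assuming `{name}`-style placeholders. PromptTemplate itself does more than this, so treat it as an illustration rather than a description of its internals.

```python
from string import Formatter

template = "Tell me a joke about {topic}"

# Collect the placeholder names that appear in the template.
variables = [name for _, name, _, _ in Formatter().parse(template) if name]
print(variables)

# Fill the template with a concrete value.
filled = template.format(topic="Python")
print(filled)

# A missing variable raises KeyError instead of silently producing a broken prompt.
try:
    template.format(subject="Python")
except KeyError as missing:
    print("Missing variable:", missing)
```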

Prompt templates

In [4]:
prompt_template = PromptTemplate(
    template="Tell me a joke about {topic}"
)

In [6]:
# format the prompt, then pass it to the model
llm.invoke(
    prompt_template.invoke({"topic":"Python"})
)

AIMessage(content='Why do Python programmers prefer dark mode?\n\nBecause light attracts bugs!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 13, 'total_tokens': 26, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CNh51ZVaeDShv46aWRFnBOrFj7bpO', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--2a0a58d5-9bc3-43f3-a6a1-4023e460dc7f-0', usage_metadata={'input_tokens': 13, 'output_tokens': 13, 'total_tokens': 26, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

b. Few-Shot PromptTemplate
Combines multiple structured examples to guide the LLM’s reasoning.
Components:
examples: List of example dictionaries (input → thought → output).
example_prompt: A PromptTemplate defining the format for each example.
suffix: The actual question for the current prompt.

examples = [
  {"input": "A train leaves City A...", "thought": "...", "output": "2 hours"},
  {"input": "A store applies a 20% discount...", "thought": "...", "output": "..."},
  ...
]

example_prompt = PromptTemplate(
  input_variables=["input", "thought", "output"],
  template="Question: {input}\nThought: {thought}\nResponse: {output}"
)

few_shot_prompt = FewShotPromptTemplate(
  examples=examples,
  example_prompt=example_prompt,
  suffix="Question: {input}",
  input_variables=["input"]
)

llm.invoke(few_shot_prompt.invoke({"input": "If today is Wednesday, what day will it be in 10 days?"}))
The result shows the model following the reasoning steps provided in the examples and outputting: "Saturday."

In [7]:
# few-shot: several worked examples
example_prompt = PromptTemplate(
    template="Question: {input}\nThought: {thought}\nResponse: {output}"
)

examples = [
    {
        "input": "A train leaves city A for city B at 60 km/h, and another train leaves city B for city A at 40 km/h. If the distance between them is 200 km, how long until they meet?", 
        "thought": "The trains are moving towards each other, so their relative speed is 60 + 40 = 100 km/h. The time to meet is distance divided by relative speed: 200 / 100 = 2 hours.",
        "output": "2 hours",
    },
    {
        "input": "If a store applies a 20% discount to a $50 item, what is the final price?", 
        "thought": "A 20% discount means multiplying by 0.8. So, $50 × 0.8 = $40.",
        "output": "$40",
    },
    {
        "input": "A farmer has chickens and cows. If there are 10 heads and 32 legs, how many of each animal are there?", 
        "thought": "Let x be chickens and y be cows. We have two equations: x + y = 10 (heads) and 2x + 4y = 32 (legs). Solving: x + y = 10 → x = 10 - y. Substituting: 2(10 - y) + 4y = 32 → 20 - 2y + 4y = 32 → 2y = 12 → y = 6, so x = 4.",
        "output": "4 chickens, 6 cows",
    },
    {
        "input": "If a car travels 90 km in 1.5 hours, what is its average speed?", 
        "thought": "Speed is distance divided by time: 90 km / 1.5 hours = 60 km/h.",
        "output": "60 km/h",
    },
    {
        "input": "John is twice as old as Alice. In 5 years, their combined age will be 35. How old is Alice now?", 
        "thought": "Let Alice's age be x. Then John’s age is 2x. In 5 years, their ages will be x+5 and 2x+5. Their sum is 35: x+5 + 2x+5 = 35 → 3x + 10 = 35 → 3x = 25 → x = 8.33.",
        "output": "8.33 years old",
    },
]

In [8]:
prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)
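To see roughly what text the model receives, the few-shot rendering can be approximated with plain string formatting: each example is formatted with the example template, and the suffix carrying the new question goes last. This sketch assumes the blocks are joined by a blank line and uses two shortened examples. `render_few_shot` is a hypothetical helper, not part of LangChain.

```python
example_template = "Question: {input}\nThought: {thought}\nResponse: {output}"
suffix = "Question: {input}"

# Two shortened examples; the "thought" text is abbreviated for this sketch.
shots = [
    {"input": "If a car travels 90 km in 1.5 hours, what is its average speed?",
     "thought": "Speed is distance divided by time: 90 / 1.5 = 60 km/h.",
     "output": "60 km/h"},
    {"input": "If a store applies a 20% discount to a $50 item, what is the final price?",
     "thought": "A 20% discount means multiplying by 0.8: 50 x 0.8 = 40.",
     "output": "$40"},
]


def render_few_shot(examples, question, separator="\n\n"):
    """Format each example, then append the suffix holding the new question."""
    blocks = [example_template.format(**example) for example in examples]
    blocks.append(suffix.format(input=question))
    return separator.join(blocks)


rendered = render_few_shot(shots, "If today is Wednesday, what day will it be in 10 days?")
print(rendered)
```

Because the rendered prompt ends with an open "Question:" line, the model tends to continue with its own "Thought:" and "Response:" lines in the same layout.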

In [9]:
response = llm.invoke(
    prompt_template.invoke({"input":"If today is Wednesday, what day will it be in 10 days?"})
)
print(response.content) # Response should be Saturday

Thought: To find the day of the week in 10 days, we can calculate the remainder of 10 divided by 7 (the number of days in a week). 10 ÷ 7 = 1 remainder 3. This means 10 days from Wednesday is 3 days later. Counting from Wednesday: Thursday (1), Friday (2), Saturday (3). 

Response: Saturday
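The model's answer can be checked independently with modular arithmetic. This is a small verification sketch; `day_after` is a hypothetical helper.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_after(start: str, offset: int) -> str:
    """Return the weekday that falls `offset` days after `start`."""
    return DAYS[(DAYS.index(start) + offset) % 7]


print(day_after("Wednesday", 10))  # Saturday
```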


4. Takeaways
Prompt engineering in LangChain is modular and powerful.
Structured messages and few-shot examples significantly improve LLM response quality.
PromptTemplates enable consistent formatting and reusability.

Summary: Chatbot Application with LangChain
Overview
This exercise walks through the creation of a simple chatbot using LangChain. The focus is on structuring the chatbot class to support memory, managing message formatting via LangChain message objects, and using prompt templates to enable contextual, character-driven interactions.

Key Steps Covered
1. Setup and Imports
Required libraries are imported, including LangChain modules.
Environment variables are loaded to securely manage the OpenAI API key.
2. Chatbot Class Construction
A chatbot class is created with an internal memory (self.messages), which stores a list of messages exchanged during the conversation.
The OpenAI chat model is instantiated using LangChain's ChatOpenAI, without explicitly passing the API key thanks to environment configuration.
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage

self.llm = ChatOpenAI()
self.messages = []
3. Implementing the invoke Method
The method accepts user input, wraps it in a HumanMessage, appends it to the message list, and invokes the model with the full conversation history.
The model's response is wrapped in an AIMessage and also appended to memory.
The AI response is returned at the end.
def invoke(self, user_input):
  human_msg = HumanMessage(content=user_input)
  self.messages.append(human_msg)

  ai_msg = self.llm.invoke(self.messages)
  self.messages.append(ai_msg)

  return ai_msg.content
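The memory pattern can be tried without an API key by swapping the model for a plain function. In the sketch below, `MemoryChatbot` and `fake_model` are hypothetical stand-ins, and messages are (role, content) tuples rather than LangChain message objects.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


class MemoryChatbot:
    """Keeps the whole conversation and sends it to the model on every turn."""

    def __init__(self, model: Callable[[List[Message]], str], instructions: str):
        self.model = model  # hypothetical stand-in for the chat model call
        self.messages: List[Message] = [("system", instructions)]

    def invoke(self, user_input: str) -> str:
        self.messages.append(("human", user_input))
        reply = self.model(self.messages)  # the model sees the full history
        self.messages.append(("ai", reply))
        return reply


def fake_model(history: List[Message]) -> str:
    # Stand-in model: reports how many human turns it has seen so far.
    turns = sum(1 for role, _ in history if role == "human")
    return f"This is turn {turns}."


demo_bot = MemoryChatbot(fake_model, "You are BEEP-42.")
print(demo_bot.invoke("Hello, what is 2+2?"))
print(demo_bot.invoke("Can you dream?"))
print(len(demo_bot.messages))  # 1 system message + 2 human + 2 ai
```

The second reply mentions turn 2 only because the first exchange was stored and resent, which is the behavior the memory list exists to provide.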
4. Prompt Template and Message Formatting
The prompt uses LangChain’s PromptTemplate with a structure that includes system instructions, human input, and AI output. This template manages consistent formatting for few-shot examples.

Sample message flow includes:

System message defining personality and role.
Human messages as input.
AI messages as example responses.
5. Creating and Interacting with the Chatbot
A bot named BEEP-42 is created, initialized with humorous and thematic system instructions.

A series of example interactions is seeded, such as:

"Hello, what is 2+2?"
"Can you dream?"
"Why did the robot go to therapy?"
Memory is inspected to confirm the full conversation history is stored correctly with role-based message types.

6. Sample Invocation
The bot is queried with a playful sequence of questions:
"HAL, is that you?" → Responds it's not HAL.
"RedQueen from The Terminator?" → Clarifies different protocols.
"Wall-E?" → Confirms it's not Wall-E.
"What's the answer for every question?" → Replies: "Answer 42."
7. Next Steps
Learners are encouraged to experiment further by:
Adjusting the temperature to influence creativity.
Modifying the bot’s personality.
Expanding the set of few-shot examples for better grounding.

Streaming in Generative AI Applications
Streaming enables faster and smoother user experiences in entertainment platforms like Spotify and Netflix, where content plays immediately while additional data is loaded in the background. The same concept applies to Generative AI applications, ensuring low-latency, real-time interactions.

Why Streaming Matters in AI Applications
• Without streaming, users must wait for the full response to generate, causing delays.

• With streaming, output is displayed progressively, reducing perceived latency and improving responsiveness.

• Example: ChatGPT streams text a few tokens at a time, making interactions feel fluid and natural.

Streaming in LangChain
LangChain provides built-in streaming support through the Runnable Interface, allowing developers to process responses as they are generated.

• stream() – Synchronous streaming, suitable for real-time processing.

• astream() – Asynchronous streaming, designed for non-blocking workflows.

Using stream() for Real-Time Processing
for chunk in component.stream(some_input):
    print(chunk)  # Processes each chunk as it's produced
• Enhances chat applications by displaying responses progressively.

• Allows interruption if the user no longer needs the full response.

• Requires efficient processing to avoid delays between chunks.
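The points above can be observed with an ordinary Python generator standing in for `component.stream()`. In this sketch `fake_stream` is a hypothetical helper, and the early `break` plays the part of a user who no longer needs the rest of the response.

```python
import time
from typing import Iterator, List


def fake_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Stand-in for component.stream(): yields one word at a time."""
    for word in text.split(" "):
        time.sleep(delay)  # simulates generation time per chunk
        yield word + " "


shown: List[str] = []
for chunk in fake_stream("Streaming shows partial output early"):
    shown.append(chunk)               # each chunk is usable as soon as it arrives
    print(chunk, end="", flush=True)
    if len(shown) == 3:               # the consumer may stop before the end
        break
print()
```

Any slow work inside the loop body delays the next chunk, which is why the per-chunk processing should stay cheap.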

Using astream() for Asynchronous Streaming
Works similarly but is optimized for async applications, ensuring smooth, non-blocking execution.
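A rough illustration of the non-blocking behavior, using `asyncio` and an async generator as a stand-in for `astream()`. `fake_astream` and `collect` are hypothetical names; the point is that two streams can be consumed concurrently on one thread.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_astream(text: str) -> AsyncIterator[str]:
    """Stand-in for component.astream(): an async generator of chunks."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hands control back to the event loop between chunks
        yield word


async def collect(label: str, text: str, log: List[str]) -> None:
    async for chunk in fake_astream(text):
        log.append(f"{label}:{chunk}")


async def main() -> List[str]:
    log: List[str] = []
    # Both streams make progress while the other one is waiting.
    await asyncio.gather(collect("a", "one two", log), collect("b", "three four", log))
    return log


stream_log = asyncio.run(main())
print(stream_log)
```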

Final Thoughts
Streaming significantly improves user experience by making LLM applications more responsive. Whether building chatbots, virtual assistants, or interactive AI tools, streaming ensures seamless real-time interactions.

Summary: Streaming Responses in LangChain
Overview
This demo introduces streaming capabilities in LangChain. Instead of waiting for the full response from the model, the output is streamed token-by-token or chunk-by-chunk. This approach enables real-time feedback, partial result handling, and dynamic response processing.

Key Steps Covered
1. Initial Setup
Necessary libraries are imported and environment variables loaded.
The OpenAI chat model (ChatOpenAI) is instantiated as the LLM.

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

Standard use with .invoke() is demonstrated:
Sends the full prompt and waits for the complete response.
2. Streaming Basics
Streaming is enabled by using .stream() instead of .invoke().
Responses are received incrementally as chunks.

chunks = []
for chunk in llm.stream("What does FIFA stand for?"):
  chunks.append(chunk)
  print(chunk.content, end="")

Output begins immediately instead of waiting for full completion.
Each chunk is an AIMessageChunk with content and additional metadata.
3. Working with Chunks
Chunks can be processed individually:
Slicing chunks (e.g., first 5 chunks) to form partial outputs.
Concatenating chunks to recreate the full final output.

complete_output = "".join(chunk.content for chunk in chunks)
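Slicing and joining can be tried offline with a small stand-in for the chunk objects. The `Chunk` dataclass below models only the `content` field, and the chunk boundaries are made up for illustration; real chunk boundaries depend on the model's tokenization.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Stand-in for AIMessageChunk; only the content field is modelled."""
    content: str


demo_chunks = [Chunk(text) for text in [
    "FIFA", " stands", " for", " Fédération", " Internationale",
    " de", " Football", " Association", ".",
]]

partial_output = "".join(chunk.content for chunk in demo_chunks[:5])  # first five chunks only
full_output = "".join(chunk.content for chunk in demo_chunks)         # everything received

print(partial_output)
print(full_output)
```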

4. Handling Interruptions
It is possible to interrupt a streaming response.
A KeyboardInterrupt is caught to gracefully stop streaming.
If interruption handling is disabled, the raw exception is displayed.

try:
  for chunk in llm.stream("Question..."):
      print(chunk.content, end="")
except KeyboardInterrupt:
  print("Interrupted!")
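The useful property of this pattern is that everything received before the interruption is still available. The sketch below simulates the Ctrl+C with a stand-in stream that raises `KeyboardInterrupt` itself; `interrupted_stream` is a hypothetical helper.

```python
from typing import Iterator, List


def interrupted_stream(words: List[str], stop_after: int) -> Iterator[str]:
    """Stand-in stream that simulates the user pressing Ctrl+C mid-response."""
    for index, word in enumerate(words):
        if index == stop_after:
            raise KeyboardInterrupt
        yield word


kept: List[str] = []
try:
    for chunk in interrupted_stream(["The", "answer", "is", "long"], stop_after=2):
        kept.append(chunk)
except KeyboardInterrupt:
    print("Interrupted!")

print(kept)  # chunks received before the interruption
```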

5. Resuming After Interruptions
A simple play() and resume() mechanism is demonstrated:
play() appends streamed chunks to memory.
resume() prompts the model to complete a previously interrupted response.

def play():
  ...  # Streams the response and stores the chunks in memory

def resume():
  ...  # Resumes based on memory if the output seems incomplete

If the model believes the prior output is unfinished, it continues the answer.
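A toy version of the mechanism, under loud assumptions: the demo asks the model to continue its own partial answer, whereas this sketch replaces the model with a fixed list of chunks and resumes by offset. All names here are hypothetical, and only the bookkeeping (store chunks, detect an unfinished answer, continue) is illustrated.

```python
from typing import Iterator, List

FULL_ANSWER = ["Streaming", "lets", "you", "pause", "and", "continue."]
memory: List[str] = []  # chunks received so far


def fake_stream(start: int = 0) -> Iterator[str]:
    """Stand-in for the model stream; `start` skips chunks already delivered."""
    yield from FULL_ANSWER[start:]


def play(limit: int) -> None:
    """Stream into memory, stopping after `limit` chunks to simulate an interruption."""
    for chunk in fake_stream():
        if len(memory) >= limit:
            break
        memory.append(chunk)


def resume() -> None:
    """Continue only if the stored output looks unfinished."""
    if len(memory) < len(FULL_ANSWER):
        memory.extend(fake_stream(start=len(memory)))


play(limit=3)
print(" ".join(memory))
resume()
print(" ".join(memory))
```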
6. On-the-Fly Processing
Words are counted dynamically during streaming.
Each incoming chunk can trigger updates or calculations.

word_count = 0
for chunk in llm.stream("Prompt..."):
  word_count += len(chunk.content.split())

This shows how streaming enables real-time processing and metric calculation.
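One caveat worth illustrating: chunk boundaries do not have to line up with word boundaries, so splitting each chunk separately can overcount. The sketch below carries an unfinished word over to the next chunk. It splits on spaces only, and `count_words_streaming` is a hypothetical helper operating on plain strings rather than message chunks.

```python
from typing import Iterable


def count_words_streaming(chunks: Iterable[str]) -> int:
    """Count space-separated words without double-counting words split across chunks."""
    count = 0
    tail = ""  # unfinished word carried over from the previous chunk
    for text in chunks:
        pieces = (tail + text).split(" ")
        tail = pieces.pop()  # the last piece may continue in the next chunk
        count += sum(1 for piece in pieces if piece)
    return count + (1 if tail else 0)


sample_chunks = ["Stream", "ing is", " use", "ful"]  # "Streaming is useful"
naive = sum(len(text.split()) for text in sample_chunks)
print(naive, count_words_streaming(sample_chunks))  # 5 3
```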
7. Event Handling
Events are emitted during streaming:
on_chat_model_start
on_chat_model_stream
on_chat_model_end
Listeners can be attached to these events to trigger additional actions.
Example:
After on_chat_model_end, a different process could be initiated.
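The listener idea can be sketched with a small event registry. Only the three event names come from the notes above; `on`, `emit`, and `run_stream` are hypothetical helpers, and LangChain's actual event API is not reproduced here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List

listeners: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)
event_log: List[str] = []


def on(event: str, handler: Callable[[str], None]) -> None:
    """Register a handler for one event name."""
    listeners[event].append(handler)


def emit(event: str, data: str = "") -> None:
    """Call every handler registered for this event."""
    for handler in listeners[event]:
        handler(data)


def run_stream(chunks: Iterable[str]) -> None:
    """Stand-in stream that emits the three event names listed above."""
    emit("on_chat_model_start")
    for chunk in chunks:
        emit("on_chat_model_stream", chunk)
    emit("on_chat_model_end")


on("on_chat_model_stream", lambda chunk: event_log.append(chunk))
on("on_chat_model_end", lambda _: event_log.append("[follow-up process started]"))
run_stream(["Hello", " world"])
print(event_log)
```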
8. Using Streaming in a Chatbot
The BEEP-42 chatbot is re-created with streaming enabled.
It now outputs text progressively during a conversation, creating a more interactive user experience.

bot = Chatbot(name="BEEP-42", instructions="...", examples=[...])
response = bot.ask("Tell me a joke.")

9. Conclusion
Streaming offers faster, more interactive, and more flexible user experiences.
Processing data as it arrives enables more sophisticated applications like real-time dashboards, live feedback systems, or conversational agents.

Summary: Structured Output Parsing with LangChain
Overview
This demo focuses on how to parse and structure outputs from LLMs using LangChain's output parsers, including strategies for handling structured text like dictionaries, booleans, datetimes, and using Pydantic models for robustness. It also covers how to fix parsing errors automatically when the LLM output is misformatted.

Key Steps Covered
1. Basic String Parsing
By default, calling .invoke() on an LLM returns an AIMessage object.
To extract the raw text, access ai_message.content.

response = llm.invoke("Hello there.")
raw_text = response.content

Alternatively, a StrOutputParser can be used to transform the output cleanly.

parser = StrOutputParser()
text = parser.invoke(response)

3. Datetime Parsing
A DatetimeOutputParser is used when you need to convert LLM output into a Python datetime object.
The LLM is prompted to produce a date in a specific format.

parser = DatetimeOutputParser()
datetime_obj = parser.invoke(response)
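Under the hood this kind of parsing comes down to matching the model's text against an agreed format string. A standard-library sketch, assuming the prompt asked for `YYYY-MM-DD`; the format and the `parse_datetime` helper are choices made for this example, not the parser's defaults.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format for this sketch; the prompt must request the same one


def parse_datetime(text: str) -> datetime:
    """Convert model text into a datetime, failing loudly on any other format."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


print(parse_datetime(" 1969-07-20\n"))
try:
    parse_datetime("July 20, 1969")
except ValueError as error:
    print("Parsing error:", error)
```

The second call fails on purpose: if the prompt and the parser disagree about the format, the mismatch surfaces as an exception instead of a wrong date.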

4. Boolean Parsing
A BooleanOutputParser converts "yes" or "no" responses into Python True or False.
Example:
Content: "yes" → True
Content: "no" → False

parser = BooleanOutputParser()
result = parser.invoke(AIMessage(content="yes"))
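The mapping itself is simple enough to sketch in plain Python. `parse_boolean` below is a hypothetical helper, and its tolerance for case and trailing punctuation is an assumption of this example rather than documented parser behavior.

```python
def parse_boolean(text: str, true_val: str = "yes", false_val: str = "no") -> bool:
    """Map a yes/no answer to a bool; anything else is rejected, not guessed."""
    cleaned = text.strip().strip(".!").lower()
    if cleaned == true_val:
        return True
    if cleaned == false_val:
        return False
    raise ValueError(f"Expected {true_val!r} or {false_val!r}, got {text!r}")


print(parse_boolean("Yes."), parse_boolean("no"))  # True False
```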

5. TypedDict Parsing
LangChain supports using TypedDict to define the structure of expected output.

class UserInfo(TypedDict):
  name: str
  country: str

Using with_structured_output(UserInfo), the model is guided to format its response accordingly.
Examples:

Input: "My name is Henrique and I am from Brazil." → { "name": "Henrique", "country": "Brazil" }

If no relevant info is found, defaults are used.
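A TypedDict describes a shape but performs no checks at runtime, which is part of the motivation for the Pydantic approach in the next step. The sketch below adds a manual check with `get_type_hints`; `conforms` is a hypothetical helper that only handles simple types such as `str`.

```python
from typing import TypedDict, get_type_hints


class UserInfo(TypedDict):
    name: str
    country: str


def conforms(data: dict, schema: type) -> bool:
    """Check that `data` has exactly the schema's keys with matching simple types."""
    hints = get_type_hints(schema)
    return data.keys() == hints.keys() and all(
        isinstance(data[key], expected) for key, expected in hints.items()
    )


print(conforms({"name": "Henrique", "country": "Brazil"}, UserInfo))  # True
print(conforms({"name": "Henrique"}, UserInfo))                       # False
```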
6. Pydantic Parsing
For more robust parsing and validation, Pydantic models are used.

class UserInfo(BaseModel):
  name: str
  country: str

Pydantic models provide automatic type checking and better error handling.

parsed = llm.with_structured_output(UserInfo).invoke("My name is Washington and I am from Australia.")

If the LLM output is properly structured, parsing succeeds.
If information is missing, fields fall back to empty strings or None, depending on how the model is configured.
7. Parsing Complex Structures
A more complex example is parsing a list of films (filmography) for an actor using a Pydantic model.

class Performer(BaseModel):
  name: str
  film_names: List[str]

Asking for "Scarlett Johansson filmography" returns the correct structured object with movie names.
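To show the shape of the result without calling a model, the sketch below loads a hand-written JSON string into a standard-library dataclass. The dataclass with `__post_init__` checks is only a stand-in for the Pydantic model, and the JSON is a hypothetical, shortened model output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Performer:
    name: str
    film_names: List[str]

    def __post_init__(self) -> None:
        # Minimal runtime checks; Pydantic performs this kind of validation automatically.
        if not isinstance(self.name, str):
            raise TypeError("name must be a string")
        if not isinstance(self.film_names, list) or not all(
            isinstance(title, str) for title in self.film_names
        ):
            raise TypeError("film_names must be a list of strings")


# Hypothetical model output, shortened for the sketch.
raw = '{"name": "Scarlett Johansson", "film_names": ["Lost in Translation", "Her"]}'
performer = Performer(**json.loads(raw))
print(performer.name, len(performer.film_names))
```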
8. Handling Parsing Errors
Sometimes the LLM outputs poorly formatted JSON or semi-structured text.
If parsing fails (e.g., bad quotes, wrong format), an OutputParserException is raised.

try:
  parser.invoke(bad_output)
except OutputParserException as e:
  print("Parsing error caught!")

9. Fixing Misformatted Outputs Automatically
LangChain provides an OutputFixingParser.
This parser:
Detects format errors.
Attempts to reformat the output using the LLM itself.

fixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
corrected_output = fixing_parser.invoke(misformatted_output)

This enables parsing even from imperfect LLM outputs, making workflows much more reliable.
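The fix-and-retry loop can be sketched without a model: try to parse, and on failure hand the bad text and the error to a "fixer" before trying again. In the real parser the LLM plays the fixer role; here `fake_fixer` is a hypothetical stand-in that only repairs single quotes, and `parse_with_fixing` is not a LangChain function.

```python
import json
from typing import Callable


def parse_json(text: str) -> dict:
    return json.loads(text)


def parse_with_fixing(text: str, fixer: Callable[[str, str], str], max_retries: int = 1) -> dict:
    """Try to parse; on failure, ask `fixer` for a corrected string and retry."""
    for attempt in range(max_retries + 1):
        try:
            return parse_json(text)
        except json.JSONDecodeError as error:
            if attempt == max_retries:
                raise  # give up once the retry budget is spent
            text = fixer(text, str(error))
    raise AssertionError("unreachable")


def fake_fixer(bad_text: str, error: str) -> str:
    # Stand-in for the LLM: repairs the one mistake this sketch uses (single quotes).
    return bad_text.replace("'", '"')


fixed = parse_with_fixing("{'name': 'Henrique', 'country': 'Brazil'}", fake_fixer)
print(fixed)
```

Capping the retries matters in practice, because each retry with a real model costs another call and may still fail.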
10. Conclusion
Structured output parsing transforms unstructured LLM responses into reliable Python objects.
TypedDicts and Pydantic models improve structure and validation.
Parsers combined with automatic fixing allow workflows to handle imperfect LLM behavior gracefully.

The Evolution from Chains to Runnables in LangChain
LangChain originally introduced Chains, which allowed developers to build sequential workflows by passing outputs from one step as inputs to the next. Over time, these legacy Chain classes have been deprecated in favor of more flexible and powerful approaches:

LCEL (LangChain Expression Language) – A declarative way to compose AI workflows.
LangGraph – A framework for agentic workflows with complex state management.
Runnables: The New Standard
The Runnable interface is now the core building block of LangChain. It standardizes how components—such as LLMs, output parsers, retrievers, and agent workflows—are executed and composed.

What Can Runnables Do?

Invoke – Process a single input into an output.
Batch – Handle multiple inputs at once.
Stream – Output data in chunks for real-time processing.
Inspect – Access input, output, and configuration details.
Compose – Chain multiple Runnables together for complex workflows.
Example of invoking a Runnable with custom configuration:

some_runnable.invoke(
        some_input, 
        config={
            'run_name': 'my_run', 
            'tags': ['tag1', 'tag2'], 
            'metadata': {'key': 'value'}   
        }
)
LCEL: The Declarative Approach to Chains
LCEL (LangChain Expression Language) enables composing Runnables efficiently using a syntax similar to Unix shell pipes:

chain = prompt | llm | output_parser
Instead of manually managing execution, the composed chain is itself a Runnable, so invoke, batch, and stream work on the whole pipeline without extra code, making it easier to build scalable AI applications.
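The pipe syntax relies on Python's `|` operator being overloadable. A toy version of the idea, not LangChain's implementation: `Step` is a hypothetical class whose `__or__` returns a new step that feeds one output into the next, and the "model" is a fake function.

```python
from typing import Any, Callable


class Step:
    """Tiny stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))


prompt_step = Step(lambda inputs: f"Tell me a joke about {inputs['topic']}")
llm_step = Step(lambda text: {"content": f"[model reply to: {text}]"})  # fake model
parser_step = Step(lambda message: message["content"])

demo_chain = prompt_step | llm_step | parser_step
print(demo_chain.invoke({"topic": "Python"}))
```

Because the result of `|` is again a `Step`, chains can be extended or nested without any special cases.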

Final Thoughts
The shift from legacy Chains to Runnables and LCEL provides greater flexibility, efficiency, and composability. Developers can now build complex AI pipelines with less boilerplate code, focusing on defining workflows rather than managing execution.

Summary: Chaining Executions and LCEL (LangChain Expression Language)
Overview
This demo explores chaining multiple LangChain components into structured workflows, culminating in the use of LCEL (LangChain Expression Language) for more concise and flexible chain creation. Learners see how to manually connect prompt templates, LLMs, and parsers into execution sequences and how to inspect and customize these executions.

Key Steps Covered
1. Foundations: Recap of Key Objects
Core components revisited:
PromptTemplate: Formats input.
ChatOpenAI (LLM): Generates AI messages.
StrOutputParser: Converts AI messages into strings.
Manual chaining:
Prompt is filled with input (e.g., topic: Python).
LLM generates a joke about the topic.
Parser extracts the response as a string.
prompt_output = prompt.invoke({"topic": "Python"})
llm_output = llm.invoke(prompt_output)
parser.invoke(llm_output)
2. Understanding Runnables
Each component is a runnable, meaning it supports:
invoke()
batch()
stream()
Runnables also provide introspection:
input_schema, output_schema, and config_schema for validation and structure understanding.
print(runnable.input_schema)
print(runnable.output_schema)
Configuration (config) can be passed during invocation to set metadata like run names and tags.
llm.invoke(input, config={"run_name": "demo_run", "tags": ["demo", "lcel"]})
3. Building Chains Manually
A RunnableSequence is introduced to combine multiple runnables.
Outputs are automatically passed to the next runnable.
Example:
Prompt → LLM → Parser, wrapped as a single chain.
chain = RunnableSequence(first=prompt, middle=[llm], last=parser)
result = chain.invoke({"topic": "Python"})
Batch execution is supported: multiple topics can be processed at once.
results = chain.batch([{"topic": "Python"}, {"topic": "Football"}])
Diagrams show the step-by-step data flow inside chains.
4. Advanced Chain Construction
Custom Functions as Runnables:
Simple Python functions (like doubling or tripling numbers) are wrapped in runnable form.
double = RunnableLambda(lambda x: x * 2)
triple = RunnableLambda(lambda x: x * 3)
Parallel Execution:
RunnableParallel runs multiple runnables simultaneously on the same input.
parallel_chain = RunnableParallel(double=double, triple=triple)
result = parallel_chain.invoke(3)
# Output: {"double": 6, "triple": 9}
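The same fan-out can be approximated with the standard library: submit every function with the same input, then collect the results under their keys. This is a sketch using a thread pool. `run_parallel` is a hypothetical helper, and whether RunnableParallel uses threads internally is not something this example claims.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_parallel(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Run every step on the same input and collect results under each step's key."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(func, value) for key, func in steps.items()}
        return {key: future.result() for key, future in futures.items()}


parallel_result = run_parallel({"double": lambda x: x * 2, "triple": lambda x: x * 3}, 3)
print(parallel_result)  # {'double': 6, 'triple': 9}
```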
5. Introduction to LCEL (LangChain Expression Language)
LCEL introduces a pipe (|) syntax to build chains more concisely.
Same chain as before, but constructed with just:
chain = prompt | llm | parser
This is functionally identical to manually creating a RunnableSequence.
LCEL enhances readability and composability of chains.
6. Summary of Features
Single and batched invocation.
Streaming support built into runnables.
Chain visualization through diagrams.
Parallel execution for more complex workflows.
LCEL for clean, expressive pipeline construction.
7. Conclusion
LangChain’s chaining system is flexible and composable.
LCEL simplifies construction and visualization of multi-step pipelines.
Streaming, parallelism, and structured outputs open the door for building robust AI-driven systems.