# Chapter 1: Prompt Chaining (Hands-On Code Example)

> Adapted and modified from https://docs.google.com/document/d/1flxKGrbnF2g8yh3F-oVD5Xx7ZumId56HbFpIiPdkqLI/edit?tab=t.0 
> 
> Fri 5 Sep 2025 10:41:12 CEST

Implementing prompt chaining ranges from direct, sequential function calls within a script to the use of specialized frameworks that manage control flow, state, and component integration. Frameworks such as LangChain, LangGraph, CrewAI, and the Google Agent Development Kit (ADK) offer structured environments for constructing and executing these multi-step processes, which is particularly advantageous for complex architectures.
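To make the "direct, sequential function calls" end of that range concrete, here is a minimal sketch of a two-step chain in plain Python. The `fake_llm` function is a hypothetical stand-in for a real model call (it only wraps the last line of the prompt), and `run_chain` is an illustrative name, not part of any framework.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: wraps the prompt's last line."""
    last_line = prompt.splitlines()[-1]
    return f"[model output for: {last_line}]"


def run_chain(text: str, llm: Callable[[str], str]) -> str:
    # Step 1: ask the model to extract the relevant facts from the raw text.
    extracted = llm(
        f"Extract the technical specifications from the following text:\n\n{text}"
    )
    # Step 2: feed the output of step 1 into a second prompt.
    return llm(
        f"Transform the following specifications into a JSON object:\n\n{extracted}"
    )


result = run_chain("16GB of RAM and a 1TB SSD", fake_llm)
print(result)
```

The nested brackets in the printed result show that the second call received the first call's output as its input, which is the essence of a prompt chain. A framework adds composition, retries, tracing, and state handling on top of this basic pattern.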

For the purpose of demonstration, LangChain and LangGraph are suitable choices as their core APIs are explicitly designed for composing chains and graphs of operations. LangChain provides foundational abstractions for linear sequences, while LangGraph extends these capabilities to support stateful and cyclical computations, which are necessary for implementing more sophisticated agentic behaviors. This example will focus on a fundamental linear sequence.

The following code implements a two-step prompt chain that functions as a data processing pipeline. The initial stage is designed to parse unstructured text and extract specific information. The subsequent stage then receives this extracted output and transforms it into a structured data format.

To replicate this procedure, the required libraries must first be installed. This can be accomplished using the following command: 

```bash
pip install langchain langchain-community langchain-openai langgraph python-dotenv
```

Note that `langchain-openai` can be substituted with the appropriate package for a different model provider. Subsequently, the execution environment must be configured with the necessary API credentials for the selected language model provider, such as OpenAI, Google Gemini, or Anthropic. The code below loads `OPENAI_API_KEY` from a `.env` file with `load_dotenv()`.
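A missing credential usually surfaces only when the first model call fails, so it can help to check for it up front. The following is a small sketch using only the standard library; `require_env` and the variable name `DEMO_PROVIDER_API_KEY` are hypothetical names chosen for illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a throwaway variable (hypothetical name, not read by any provider).
os.environ["DEMO_PROVIDER_API_KEY"] = "dummy-value"
print(require_env("DEMO_PROVIDER_API_KEY"))
```

In the notebook itself, the equivalent check would be `require_env("OPENAI_API_KEY")`, called after `load_dotenv()` has run.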

```python
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# For better security, load environment variables from a .env file
from dotenv import load_dotenv
load_dotenv()
# Make sure your OPENAI_API_KEY is set in the .env file
```

```
True
```

```python
# Initialize the Language Model (using ChatOpenAI is recommended)
llm = ChatOpenAI(temperature=0)
```

```python
# --- Prompt 1: Extract Information ---
prompt_extract = ChatPromptTemplate.from_template(
    "Extract the technical specifications from the following text:\n\n{text_input}"
)
```

```python
# --- Prompt 2: Transform to JSON ---
prompt_transform = ChatPromptTemplate.from_template(
    "Transform the following specifications into a JSON object with 'cpu', 'memory', and 'storage' as keys:\n\n{specifications}"
)
```

```python
# --- Build the Chain using LCEL ---
# The StrOutputParser() converts the LLM's message output to a simple string.
extraction_chain = prompt_extract | llm | StrOutputParser()
```

```python
# The full chain passes the output of the extraction chain into the 'specifications'
# variable for the transformation prompt.
full_chain = (
    {"specifications": extraction_chain}
    | prompt_transform
    | llm
    | StrOutputParser()
)
```

```python
# --- Run the Chain ---
input_text = "The new laptop model features a 3.5 GHz octa-core processor, 16GB of RAM, and a 1TB NVMe SSD."
```

```python
# Execute the chain with the input text dictionary.
final_result = full_chain.invoke({"text_input": input_text})
```

```python
print("\n--- Final JSON Output ---")
print(final_result)
```

```
--- Final JSON Output ---
{
    "cpu": {
        "processor": "3.5 GHz octa-core"
    },
    "memory": {
        "RAM": "16GB"
    },
    "storage": {
        "Storage": "1TB NVMe SSD"
    }
}
```


This Python code demonstrates how to use the LangChain library to process text with two separate prompts: one extracts technical specifications from an input string, and the other formats those specifications as a JSON object. The `ChatOpenAI` model handles the language model interactions, and `StrOutputParser` converts each model reply into a plain string. The LangChain Expression Language (LCEL) chains these components together with the `|` operator.

The first chain, `extraction_chain`, extracts the specifications. In `full_chain`, the dict `{"specifications": extraction_chain}` runs the extraction and places its output under the `specifications` key, which fills the `{specifications}` placeholder of the transformation prompt. Invoking `full_chain` with a sample laptop description passes the text through both steps, and the final result, a JSON string containing the extracted and formatted specifications, is then printed.
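Because the chain returns a string rather than a Python object, a natural follow-up step is to parse and check it before using it downstream. The sketch below uses only the standard library; `parse_model_json` is a hypothetical helper, and the tolerance for a Markdown code fence around the reply is an assumption about how some models format their answers, not something observed in the run above.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_model_json(raw: str) -> dict:
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onward.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


# Sample reply copied from the chain output shown above.
sample_reply = (
    '{"cpu": {"processor": "3.5 GHz octa-core"}, '
    '"memory": {"RAM": "16GB"}, '
    '"storage": {"Storage": "1TB NVMe SSD"}}'
)
specs = parse_model_json(sample_reply)

# Check that the keys requested in the transformation prompt are all present.
missing = {"cpu", "memory", "storage"} - set(specs)
print(specs["memory"], missing)
```

In the notebook, `parse_model_json(final_result)` would take the place of the sample string. A reply that is not valid JSON raises `json.JSONDecodeError`, which makes a failed transformation step visible instead of passing silently.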

## Experiments 

Above, only the extraction step was given its own chain (`extraction_chain`); the transformation step was assembled inline in `full_chain`.

Here, we additionally build a `transformation_chain` and use it in `full_chain`:

```python
# --- Build the Chain using LCEL ---
# The StrOutputParser() converts the LLM's message output to a simple string.
transformation_chain = prompt_transform | llm | StrOutputParser()

# The full chain passes the output of the extraction chain into the 'specifications'
# variable for the transformation prompt.
full_chain = (
    {"specifications": extraction_chain}
    | transformation_chain
)

# Execute the chain with the input text dictionary.
final_result = full_chain.invoke({"text_input": input_text})
print("\n--- Final JSON Output ---")
print(final_result)
```

```
--- Final JSON Output ---
{
    "cpu": "",
    "memory": "",
    "storage": ""
}
```


```
--- Final JSON Output ---
{
    "cpu": {
        "processor": "3.5 GHz octa-core"
    },
    "memory": {
        "RAM": "16GB"
    },
    "storage": {
        "Storage": "1TB NVMe SSD"
    }
}
```

This works.

Next, we test whether a plain dict such as `{"text_input": input_text}` can be placed at the beginning of `full_chain`:

```python
# The full chain passes the output of the extraction chain into the 'specifications'
# variable for the transformation prompt.
full_chain = (
    {"text_input": input_text}
    | {"specifications": extraction_chain}
    | transformation_chain
)
```

```
TypeError: Expected a Runnable, callable or dict.Instead got an unsupported type: <class 'NoneType'>
```

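The error tells us that every value in a dict used as a chain step must be a "Runnable, callable or dict"; a plain value such as a string cannot be used there. The reported type `NoneType` suggests that `input_text` happened to be `None` when this cell ran.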
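One plausible contributing factor, assuming Python 3.9 or later, is that `|` between two dict literals is evaluated by Python itself as a dict union before LangChain is involved. The sketch below demonstrates only that Python behavior; `extraction_placeholder` is a hypothetical stand-in for `extraction_chain`.

```python
# Placeholder standing in for `extraction_chain`; any object works for this demonstration.
extraction_placeholder = object()
input_text = "The new laptop model features 16GB of RAM."

# Two dicts joined with `|` are merged by Python (dict union, Python 3.9+).
first_steps = {"text_input": input_text} | {"specifications": extraction_placeholder}

print(type(first_steps).__name__)                 # dict
print(sorted(first_steps))                        # ['specifications', 'text_input']
print(type(first_steps["text_input"]).__name__)   # str
```

Under this reading, the chain receives a single merged dict in which `text_input` maps to a plain value rather than to a chain step, which matches the error message. A possible remedy, not tested here, is to make the first step a Runnable that returns the dict, for example `RunnableLambda(lambda _: {"text_input": input_text})` from `langchain_core.runnables`, or simply to keep passing `{"text_input": input_text}` to `invoke` as in the earlier cells.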

Let's try scraping some text from the web and feeding it to our prompt chain:

```python
# Install selenium, webdriver-manager and beautifulsoup4 if you haven't already
# (works with selenium==4.35.0 and webdriver-manager==4.0.2)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

url = "https://www.amazon.co.uk/Lenovo-IdeaPad-Laptop-i5-12450H-Windows/dp/B0D7MLLWJ8/ref=sr_1_1?dib=eyJ2IjoiMSJ9.PbYBW-5-7rUuuHRHA3lYyBas94074aaRbm9FhL9mrAFq-dLy_dYZ0zpFJiOa82rLiD6dzx0-eN0sIrf8JULowVjUG0hkP1zKVEo0g1j-lbjCmYtsKlfhW5fohsnG4X__qFiDk_JtVz0_Hb8BWgVx0cq1lxXCmAGupUrwFUFzcPvA0qZFr-RHY66nBgsI28HGVcFKiCPevuLVQ3hOrcME15871kpnEDpU4JAxETlF0TDENPmaY85fsatNdl5y9lMuVD35SyL6V0Bp5AXn5Gj8bFhfWjChxIJVze_VpPf53tg.mGdnjIgfeP8Swo5h0FxGXZpFwWyCDW_tQiwOb3BwOe8&dib_tag=se&keywords=lenovo%2Blaptop&qid=1757064468&s=computers&sr=1-1&th=1"

# Set up Selenium WebDriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run in headless mode (no browser UI)
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
try:
    driver.get(url)
    time.sleep(5)  # Wait for the page to load
    soup = BeautifulSoup(driver.page_source, "html.parser")
finally:
    driver.quit()  # Always close the browser, even if loading fails

# Try to find the product information section
input_id = "productDetails_techSpec_section_1"
product_info_html = soup.find(id=input_id)
if product_info_html is None:
    # The page layout may differ or the request may have been blocked
    raise ValueError(f"No element with id '{input_id}' found on the page")

product_info_text = product_info_html.get_text()

product_info_text
```

'   Brand  \n                                                    \u200eLenovo     Product Dimensions  \n                                                    \u200e32.43 x 21.38 x 1.79 cm; 1.37 kg     Batteries  \n                                                    \u200e1 Lithium Polymer batteries required. (included)     Item model number  \n                                                    \u200e83EQ005UUK     Manufacturer  \n                                                    \u200eLenovo     Series  \n                                                    \u200eIdeaPad Slim 3 14IAH8     Colour\n  \n                                                    \u200eAbyss Blue     Form Factor  \n                                                    \u200eUltra-Portable     Standing screen display size  \n                                                    \u200e14 Inches     Screen Resolution  \n                                                    \u200e1920 x 1080 pixels     Resolution  \n       

The `product_info_text` is quite raw, but let's see whether the LLM can handle it:

```python
input_text = product_info_text

# Execute the chain with the input text dictionary.
final_result = full_chain.invoke({"text_input": input_text})
print("\n--- Final JSON Output ---")
print(final_result)
```

```
--- Final JSON Output ---
{
    "cpu": {
        "brand": "Intel",
        "type": "Core i5",
        "speed": "4.4 GHz",
        "count": 8
    },
    "memory": {
        "technology": "Lpddr 5",
        "type": "DDR5 RAM",
        "max_supported": "16 GB"
    },
    "storage": {
        "size": "512 GB",
        "description": "SSD",
        "interface": "Serial ATA"
    }
}
```


It looks like it did quite a good job.

Here is a screenshot of the actual webpage, against which you can verify the output of our prompt chain:

![Amazon Product Details](images/amazon-product-details.png)
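Although the model coped with the raw scraped text, a light cleanup step before the chain can reduce prompt size and noise. The following is a minimal sketch using only the standard library; `clean_scraped_text` is a hypothetical helper, and the sample string is a shortened excerpt in the style of the `product_info_text` output shown earlier.

```python
import re


def clean_scraped_text(raw: str) -> str:
    """Collapse whitespace and drop invisible direction marks from scraped text."""
    # Remove left-to-right and right-to-left marks (U+200E, U+200F).
    without_marks = raw.replace("\u200e", "").replace("\u200f", "")
    # Collapse any run of whitespace (spaces, newlines, tabs) into one space.
    return re.sub(r"\s+", " ", without_marks).strip()


sample = "   Brand  \n      \u200eLenovo     Series  \n     \u200eIdeaPad Slim 3 14IAH8   "
cleaned = clean_scraped_text(sample)
print(cleaned)  # Brand Lenovo Series IdeaPad Slim 3 14IAH8
```

In the notebook, this would be applied as `input_text = clean_scraped_text(product_info_text)` before invoking `full_chain`. Whether the cleanup changes the model's answer is not something this sketch establishes.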