# HUAPize Your LangChain App

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Mircus/HUAP/blob/main/notebooks/HUAPize_LangChain.ipynb)

<p align="center">
  <img src="https://raw.githubusercontent.com/Mircus/HUAP/main/HUAP-logo.png" alt="HUAP Logo" width="400"/>
</p>

**Add full traceability to your LangChain app — zero code changes to your chain logic.**

HUAP's `HuapCallbackHandler` plugs into LangChain's callback system.
Every LLM call, tool invocation, and chain step is automatically captured as a JSONL trace.

No API keys needed — this demo uses LangChain's built-in `FakeListChatModel`.

| Before HUAP | After HUAP |
|---|---|
| Chain runs, prints output | Every action recorded as `trace.jsonl` |
| "It worked yesterday" | Deterministic replay + diff |
| No regression testing | CI gates with golden baselines |
| No audit trail | Shareable HTML reports |

In [None]:
# Install HUAP from PyPI + LangChain (core only — no API keys needed)
!pip install -q huap-core langchain-core

---
## Step 1: A Real LangChain Chain (before HUAP)

A standard LangChain pipeline: **prompt template → chat model → output parser**.

We use `FakeListChatModel` so it runs without API keys, but the chain is 100% real LangChain —
swap in `ChatOpenAI` or `ChatAnthropic` and everything works the same.

In [None]:
# === BEFORE HUAP: a real LangChain chain, no tracing ===

from langchain_core.language_models import FakeListChatModel
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Fake LLM — returns canned responses (swap with ChatOpenAI for production)
llm = FakeListChatModel(responses=[
    "HUAP is a trace-first framework for building deterministic, testable AI agents. "
    "It records every LLM call, tool use, and decision as a JSONL trace, enabling "
    "replay, diff, and CI gating.",
])

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful technical writer."),
    ("human", "Explain {topic} in 2-3 sentences."),
])

chain = prompt | llm | StrOutputParser()

result = chain.invoke({"topic": "HUAP framework"})
print(result)
print("\n--- No trace. No diff. No CI. Just stdout. ---")

---
## Step 2: HUAPize It with Three Small Additions

Same chain. Same logic. **Zero changes to your pipeline code.**

Just:
1. Create a `HuapCallbackHandler`
2. Pass it in `config={"callbacks": [handler]}`
3. Call `handler.flush()`

LangChain fires the callbacks automatically — every LLM request, response, chain start/end,
tool call, and error is captured in the HUAP trace.
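
To make the mechanism concrete, here is a minimal stdlib-only sketch of the pattern a callback-based tracer follows: buffer one dict per event, then write them out as JSONL on flush. `TinyTraceHandler`, its simplified hook signatures, and the event kinds `llm_request` / `llm_response` are hypothetical and are not HUAP's implementation. Only the hook names `on_llm_start` and `on_llm_end` mirror LangChain's callback interface, and here they are called by hand rather than by LangChain.

```python
import json
import tempfile
from pathlib import Path


class TinyTraceHandler:
    """Hypothetical tracer: buffers events in memory, writes JSONL on flush()."""

    def __init__(self, out: str) -> None:
        self.out = Path(out)
        self.events: list[dict] = []

    def _record(self, kind: str, name: str, data: dict) -> None:
        self.events.append({"kind": kind, "name": name, "data": data})

    # Hook names mirror LangChain's callback interface; signatures are simplified.
    def on_llm_start(self, model: str, prompts: list[str]) -> None:
        self._record("llm_request", "llm", {"model": model, "prompts": prompts})

    def on_llm_end(self, model: str, text: str) -> None:
        self._record("llm_response", "llm", {"model": model, "text": text})

    def flush(self) -> None:
        # Write one JSON object per line
        self.out.parent.mkdir(parents=True, exist_ok=True)
        with self.out.open("w", encoding="utf-8") as f:
            for event in self.events:
                f.write(json.dumps(event) + "\n")


# Drive the hooks by hand, the way a framework would during a run
trace_path = Path(tempfile.mkdtemp()) / "sketch.jsonl"
tiny = TinyTraceHandler(str(trace_path))
tiny.on_llm_start("fake-model", ["Explain HUAP framework in 2-3 sentences."])
tiny.on_llm_end("fake-model", "HUAP is a trace-first framework.")
tiny.flush()

sketch_lines = trace_path.read_text(encoding="utf-8").splitlines()
print(len(sketch_lines), "events written")
```

Because the tracer only receives events and never touches the chain's inputs or outputs, the pipeline code itself needs no changes.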

In [None]:
# === AFTER HUAP: same chain, now fully traced — automatically ===

import json
from pathlib import Path
from hu_core.adapters.langchain import HuapCallbackHandler

Path("traces").mkdir(exist_ok=True)

# Same chain as before — no changes
llm = FakeListChatModel(responses=[
    "HUAP is a trace-first framework for building deterministic, testable AI agents. "
    "It records every LLM call, tool use, and decision as a JSONL trace, enabling "
    "replay, diff, and CI gating.",
])
chain = prompt | llm | StrOutputParser()

# >>> ADDITION 1 of 3: create a HUAP handler (then pass it via config and flush) <<<
handler = HuapCallbackHandler(out="traces/langchain_demo.jsonl", run_name="langchain_demo")

# Run with callbacks — LangChain fires all events automatically
result = chain.invoke(
    {"topic": "HUAP framework"},
    config={"callbacks": [handler]},
)
handler.flush()

print(result)
print()

# Show the auto-captured trace events
print("=" * 60)
print("Auto-captured trace events:")
print("=" * 60)
with open("traces/langchain_demo.jsonl") as f:
    for line in f:
        event = json.loads(line)
        kind = event.get("kind", "?")
        name = event.get("name", "?")
        data = event.get("data", {})
        label = data.get("tool", data.get("model", data.get("node", "")))
        print(f"  {kind:12s}  {name:20s}  {label}")
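
For longer traces, a per-kind count is often easier to scan than a line-by-line dump. The sketch below assumes only what the loop above already relies on, namely that each line is a JSON object with a `kind` field. `summarize_trace` is a hypothetical helper, and the sample events are made up so the block runs without HUAP installed.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_trace(path: str) -> Counter:
    """Count trace events by their `kind` field, skipping blank lines."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("kind", "?")] += 1
    return counts


# Made-up sample events; real HUAP event kinds may be named differently
sample_events = [
    {"kind": "llm_request", "name": "llm", "data": {"model": "fake"}},
    {"kind": "llm_response", "name": "llm", "data": {"model": "fake"}},
    {"kind": "llm_request", "name": "llm", "data": {"model": "fake"}},
]
sample_path = Path(tempfile.mkdtemp()) / "sample.jsonl"
sample_path.write_text(
    "\n".join(json.dumps(e) for e in sample_events) + "\n", encoding="utf-8"
)

kind_counts = summarize_trace(str(sample_path))
for kind, n in kind_counts.most_common():
    print(f"{kind:12s} {n}")
```

Pointing `summarize_trace` at `traces/langchain_demo.jsonl` gives a quick sanity check that the run produced the events you expected.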

---
## Step 3: Multi-Step Chain with Tools

Real apps have multiple steps and tools. Let's build a slightly more complex chain
and show that HUAP captures everything — including tool calls.

In [None]:
from langchain_core.language_models import FakeListChatModel
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.tools import tool
from langchain_core.runnables import RunnableLambda
from hu_core.adapters.langchain import HuapCallbackHandler

@tool
def word_count(text: str) -> str:
    """Count words in the given text."""
    count = len(text.split())
    return f"Word count: {count}"

# Multi-step: generate → count words → summarize
llm_step1 = FakeListChatModel(responses=[
    "AI agents are autonomous systems that perceive, reason, and act. "
    "They use LLMs as their reasoning engine and tools for real-world interaction. "
    "Traceability is critical for production deployment.",
])

llm_step2 = FakeListChatModel(responses=[
    "Summary: 3-sentence overview of AI agents covering autonomy, LLMs, and traceability. "
    "Word count is within target range.",
])

generate_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a technical writer."),
    ("human", "Write a brief overview of {topic}."),
])

# word_count already returns "Word count: N", so {wc} is inserted without a second label
summarize_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an editor."),
    ("human", "Summarize this draft and its word count:\n\nDraft: {draft}\n{wc}"),
])

# Build the chain
multi_chain = (
    generate_prompt
    | llm_step1
    | StrOutputParser()
    | RunnableLambda(lambda draft: {
        "draft": draft,
        "wc": word_count.invoke(draft),
    })
    | summarize_prompt
    | llm_step2
    | StrOutputParser()
)

# Run with HUAP tracing
handler2 = HuapCallbackHandler(out="traces/langchain_multi.jsonl", run_name="langchain_multi")

result = multi_chain.invoke(
    {"topic": "AI agents"},
    config={"callbacks": [handler2]},
)
handler2.flush()

print(result)
print()
print("=" * 60)
print("Multi-step trace events:")
print("=" * 60)
import json
with open("traces/langchain_multi.jsonl") as f:
    for line in f:
        event = json.loads(line)
        kind = event.get("kind", "?")
        name = event.get("name", "?")
        data = event.get("data", {})
        label = data.get("tool", data.get("model", data.get("node", "")))
        print(f"  {kind:12s}  {name:20s}  {label}")

---
## Step 4: Generate HTML Report

Turn the raw JSONL into a standalone HTML report anyone can open in a browser.

In [None]:
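from pathlib import Path

# Make sure the output directory exists before the CLI writes to it
Path("reports").mkdir(exist_ok=True)

!huap trace report traces/langchain_multi.jsonl --out reports/langchain_multi.html 2>&1 || true

report = Path("reports/langchain_multi.html")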
if report.exists():
    from IPython.display import HTML, display
    print("=" * 60)
    print("HTML Report (inline):")
    print("=" * 60)
    display(HTML(report.read_text()))
else:
    print("No HTML report was generated. The trace file is at: traces/langchain_multi.jsonl")
    print("View it with: huap trace view traces/langchain_multi.jsonl")
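
If the CLI is not available in your environment, the trace can still be turned into something browsable with the standard library alone. The sketch below is not HUAP's report format: `render_trace_table` is a hypothetical helper that assumes only the `kind`, `name`, and `data` fields used earlier, and the sample event is made up.

```python
import html
import json


def render_trace_table(jsonl_text: str) -> str:
    """Render trace events as a bare HTML table (hypothetical helper)."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Escape every value so trace content cannot inject markup
        cells = [
            html.escape(str(event.get("kind", "?"))),
            html.escape(str(event.get("name", "?"))),
            html.escape(json.dumps(event.get("data", {}))),
        ]
        rows.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>")
    header = "<tr><th>kind</th><th>name</th><th>data</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"


# Made-up event whose data contains markup that must be escaped
sample_jsonl = json.dumps(
    {"kind": "tool_call", "name": "word_count", "data": {"text": "<b>hi</b>"}}
)
table_html = render_trace_table(sample_jsonl)
print(table_html)
```

Escaping matters here because traces hold raw model output, which may contain HTML that should be displayed rather than rendered.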

---
## Step 5: Prove Reproducibility — Diff Two Runs

Run the same chain again and diff the two traces.
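With a fake LLM, both runs return the same canned responses, so the diff should show no differences in model output. This demonstrates that identical runs produce matching traces.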

In [None]:
from langchain_core.language_models import FakeListChatModel
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda
from hu_core.adapters.langchain import HuapCallbackHandler

# Second run — same chain, same inputs
llm_step1_v2 = FakeListChatModel(responses=[
    "AI agents are autonomous systems that perceive, reason, and act. "
    "They use LLMs as their reasoning engine and tools for real-world interaction. "
    "Traceability is critical for production deployment.",
])
llm_step2_v2 = FakeListChatModel(responses=[
    "Summary: 3-sentence overview of AI agents covering autonomy, LLMs, and traceability. "
    "Word count is within target range.",
])

multi_chain_v2 = (
    generate_prompt
    | llm_step1_v2
    | StrOutputParser()
    | RunnableLambda(lambda draft: {
        "draft": draft,
        "wc": word_count.invoke(draft),
    })
    | summarize_prompt
    | llm_step2_v2
    | StrOutputParser()
)

handler3 = HuapCallbackHandler(out="traces/langchain_multi_v2.jsonl", run_name="langchain_multi_v2")
result_v2 = multi_chain_v2.invoke(
    {"topic": "AI agents"},
    config={"callbacks": [handler3]},
)
handler3.flush()

# Diff the two runs
!huap trace diff traces/langchain_multi.jsonl traces/langchain_multi_v2.jsonl
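
Conceptually, a trace diff compares events pairwise after setting aside fields that legitimately change between runs. The sketch below illustrates that idea only; it is not how `huap trace diff` is implemented. The field names in `VOLATILE_KEYS` are assumptions, `normalize` and `diff_traces` are hypothetical helpers, and the sample traces are made up.

```python
import json

# Assumed names for fields that legitimately change between runs
VOLATILE_KEYS = {"ts", "run_id", "span_id"}


def normalize(event: dict) -> dict:
    """Drop top-level volatile fields so only stable content is compared."""
    return {k: v for k, v in event.items() if k not in VOLATILE_KEYS}


def diff_traces(a_lines: list[str], b_lines: list[str]) -> list[str]:
    """Return human-readable differences between two JSONL traces."""
    a = [normalize(json.loads(x)) for x in a_lines if x.strip()]
    b = [normalize(json.loads(x)) for x in b_lines if x.strip()]
    diffs = []
    if len(a) != len(b):
        diffs.append(f"event count: {len(a)} vs {len(b)}")
    # Compare events position by position over the shared prefix
    for i, (ea, eb) in enumerate(zip(a, b)):
        if ea != eb:
            diffs.append(f"event {i}: {ea} != {eb}")
    return diffs


# Made-up traces: run_b differs from run_a only in its timestamp
run_a = ['{"kind": "llm_response", "ts": 1, "data": {"text": "hello"}}']
run_b = ['{"kind": "llm_response", "ts": 2, "data": {"text": "hello"}}']
run_c = ['{"kind": "llm_response", "ts": 3, "data": {"text": "goodbye"}}']

same = diff_traces(run_a, run_b)
changed = diff_traces(run_a, run_c)
print(same)
print(len(changed), "difference(s)")
```

An empty result for two runs of the same chain is the property a CI gate can check against a stored baseline.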

---
## What You Just Did

| Before | After |
|---|---|
| A LangChain chain that runs and prints | A traced, diffable, CI-gatable agent system |
| No audit trail | Every LLM call, tool use, and chain step in `trace.jsonl` |
| "It worked yesterday" | Deterministic replay proves reproducibility |
| No regression testing | `huap ci run` gates regressions against baselines |
| No shareable artifacts | `trace.html` — send to anyone |

### What changed in your code?

**One import and three lines, with zero changes to your chain logic:**

```python
from hu_core.adapters.langchain import HuapCallbackHandler

handler = HuapCallbackHandler(out="traces/my_trace.jsonl")
chain.invoke(input, config={"callbacks": [handler]})
handler.flush()
```

Your prompts, models, tools, and chains stay exactly the same.
HUAP just observes — it never modifies your pipeline.

### Using with a real LLM

Replace `FakeListChatModel` with any LangChain chat model:

```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")
```

The `HuapCallbackHandler` captures everything automatically — model names, token usage, latency, and full message content.

### Next steps
- [GitHub](https://github.com/Mircus/HUAP) — source, examples, contributing guide
- [Try HUAP](https://colab.research.google.com/github/Mircus/HUAP/blob/main/notebooks/Try_HUAP.ipynb) — the flagship demo notebook
- [HUAPize Any Agent](https://colab.research.google.com/github/Mircus/HUAP/blob/main/notebooks/HUAPize_Agents.ipynb) — manual tracing for any framework
- [Getting Started](https://github.com/Mircus/HUAP/blob/main/GETTING_STARTED.md) — full tutorial