# Summarize large text

<p style="font-size: 20px; font-weight: bold; font-style: italic;">...via LangGraph</p>

Anton Antonov   
February 2026


---

## Introduction


This notebook illustrates how to specify a Large Language Model (LLM) graph for deriving comprehensive summaries of large texts.
The graph's nodes are a mix of LLM-based functions (title, summary, topics, mind map, etc.) and plain Python functions (text ingestion, report assembly, export).

The Python package [LangGraph](https://github.com/langchain-ai/langgraph) is used.
A similar implementation based on the Raku package ["LLM::Graph"](https://raku.land/zef:antononcube/LLM::Graph) is given in the notebook
["Summarize-large-text"](Summarize-large-text.ipynb).


---

## Setup


Load the required packages (LangGraph, LangChain, and standard-library modules):

In [None]:
from typing import TypedDict, Optional, Dict, Any, List

import json
import re
from pathlib import Path

import requests

from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama


Load the IPython display utilities:

In [None]:
from IPython.display import HTML, Markdown, display


LLM access configuration:

In [None]:
# Make sure Ollama is running and the model is available.
llm_ollama = ChatOllama(model="gpt-oss:20b", temperature=0.5)


In [None]:
llm = llm_ollama
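The nodes below only rely on `llm.invoke(messages)` returning an object with a `content` attribute (see the `_invoke` helper). To check the graph wiring without a running Ollama server, a stand-in object with that shape can be used. This is a minimal sketch; `CannedLLM` and `CannedReply` are hypothetical names, not LangChain classes:

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class CannedReply:
    """Mimics the `.content` attribute of a chat-model reply."""
    content: str


class CannedLLM:
    """Hypothetical offline stand-in for a chat model that always returns the same text.

    Only the `invoke(messages)` call used by `_invoke` is imitated.
    """

    def __init__(self, default: str = "stub reply") -> None:
        self.default = default
        self.prompts: List[str] = []  # prompts received, kept for inspection

    def invoke(self, messages: List[Any]) -> CannedReply:
        # Use the text of the last message (a HumanMessage in this notebook).
        last = messages[-1]
        self.prompts.append(getattr(last, "content", str(last)))
        return CannedReply(self.default)
```

Assigning `llm = CannedLLM()` before calling `app.invoke(...)` with a plain-text input would run every node offline; the resulting report is, of course, meaningless.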


---

## LLM graph


State schema and node functions:

In [None]:
class SummaryState(TypedDict, total=False):
    input: str
    with_title: Optional[str]
    export_and_open: Optional[str]
    input_type: str
    ingest_text: str
    title: str
    summary: str
    topics_table: str
    thinking_hats: str
    mind_map: str
    most_provocative: str
    report: str


def _invoke(prompt: str) -> str:
    msg = llm.invoke([HumanMessage(content=prompt)])
    return getattr(msg, "content", str(msg))


def _truthy(value: Any) -> bool:
    if isinstance(value, bool):
        return value
    if value is None:
        return False
    return str(value).strip().lower() in {"true", "yes", "1", "y", "open"}


def _strip_fences(text: str) -> str:
    text = text.strip()
    text = re.sub(r"^```[a-zA-Z0-9_-]*", "", text)
    text = re.sub(r"```$", "", text)
    return text.strip()


def _extract_json(text: str) -> List[Dict[str, Any]]:
    cleaned = _strip_fences(text)
    try:
        data = json.loads(cleaned)
        if isinstance(data, list):
            return data
    except Exception:
        pass
    match = re.search(r"(\[.*\])", cleaned, re.S)
    if match:
        try:
            data = json.loads(match.group(1))
            if isinstance(data, list):
                return data
        except Exception:
            return []
    return []


def _table_to_html(rows: List[Dict[str, Any]], columns: Optional[List[str]] = None) -> str:
    if not rows:
        return "<p><em>No data.</em></p>"
    if columns is None:
        columns = sorted({key for row in rows for key in row.keys()})
    header = "".join(f"<th>{col}</th>" for col in columns)
    body_rows = []
    for row in rows:
        body_rows.append("<tr>" + "".join(f"<td>{row.get(col, '')}</td>" for col in columns) + "</tr>")
    body = "".join(body_rows)
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"


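def _guess_input_type(text: str) -> str:
    if not isinstance(text, str):
        return "Other"
    stripped = text.strip()
    if not stripped:
        return "Other"
    if re.match(r"^https?://", stripped):
        return "URL"
    # Only a single-line string can name a file. Checking emptiness first matters:
    # Path("") means "." (which exists); very long strings can make Path checks raise.
    if "\n" not in stripped:
        try:
            if Path(stripped).is_file():
                return "FilePath"
        except (OSError, ValueError):
            pass
    return "Text"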


def type_of_input(state: SummaryState) -> Dict[str, Any]:
    if state.get("input_type"):
        return {}
    raw = state.get("input", "")
    guessed = _guess_input_type(raw)
    if guessed in {"URL", "FilePath"}:
        return {"input_type": guessed}
    # Adjacent string literals are concatenated into a single prompt string.
    prompt = (
        "Determine the input type of:\n"
        f"{raw}\n\n"
        "The result should be one of: 'Text', 'URL', 'FilePath', or 'Other'. "
        "Return only the type name."
    )
    result = _invoke(prompt).strip().strip("'\"` ")
    if result not in {"Text", "URL", "FilePath", "Other"}:
        result = guessed
    return {"input_type": result}


def ingest_text(state: SummaryState) -> Dict[str, Any]:
    if state.get("ingest_text"):
        return {}
    raw = state.get("input", "")
    input_type = state.get("input_type", "Other")
    if input_type == "URL":
        response = requests.get(raw.strip(), timeout=30)
        response.raise_for_status()
        return {"ingest_text": response.text}
    if input_type == "FilePath":
        return {"ingest_text": Path(raw.strip()).read_text(encoding="utf-8")}
    return {"ingest_text": raw}


def title_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("title"):
        return {}
    with_title = state.get("with_title")
    if isinstance(with_title, str) and with_title.strip():
        return {"title": with_title.strip()}
    prompt = (
        "Suggest a short title (6 words or fewer) for the following article:\n\n"
        f"{state.get('ingest_text', '')}\n\n"
        "Return only the title."
    )
    return {"title": _invoke(prompt).strip()}


def summary_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("summary"):
        return {}
    prompt = (
        "Summarize the following text in a comprehensive but concise way:\n\n"
        f"{state.get('ingest_text', '')}"
    )
    return {"summary": _invoke(prompt).strip()}


def topics_table_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("topics_table"):
        return {}
    prompt = (
        "Create a JSON array of objects with fields 'theme' and 'content' summarizing up to 20 key topics in this article:\n\n"
        f"{state.get('ingest_text', '')}\n\n"
        "Return JSON only."
    )
    return {"topics_table": _invoke(prompt).strip()}


def thinking_hats_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("thinking_hats"):
        return {}
    prompt = (
        "Provide De Bono thinking hats feedback using ONLY yellow and grey hats.\n\n"
        "Return HTML only.\n\n"
        f"{state.get('ingest_text', '')}"
    )
    return {"thinking_hats": _invoke(prompt).strip()}


def mind_map_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("mind_map"):
        return {}
    prompt = (
        "Create a Mermaid mind map or flowchart that captures the main ideas of the text.\n\n"
        "Return only Mermaid code.\n\n"
        f"{state.get('ingest_text', '')}"
    )
    mermaid = _invoke(prompt).strip()
    return {"mind_map": mermaid}


def most_provocative_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("most_provocative"):
        return {}
    prompt = (
        "Give a JSON array of the most important or provocative statements in the following text.\n\n"
        "Each item should have fields 'statement' and 'notes'.\n\n"
        f"{state.get('ingest_text', '')}\n\n"
        "Return JSON only."
    )
    return {"most_provocative": _invoke(prompt).strip()}


def report_node(state: SummaryState) -> Dict[str, Any]:
    if state.get("report"):
        return {}
    topics = _extract_json(state.get("topics_table", ""))
    provocative = _extract_json(state.get("most_provocative", ""))
    topics_html = _table_to_html(topics, columns=["theme", "content"])
    prov_html = _table_to_html(provocative, columns=["statement", "notes"])
    thinking_hats = _strip_fences(state.get("thinking_hats", ""))
    # Re-wrap the mind map as one mermaid-fenced block, whether or not the LLM added fences.
    mind_map = _strip_fences(state.get("mind_map", ""))
    mind_map = f"```mermaid\n{mind_map}\n```"

    report = "\n\n".join(
        [
            f"# {state.get('title', '')}",
            "### *LLM summary report*",
            "## Summary",
            state.get("summary", ""),
            "## Topics",
            topics_html,
            "## Mind map",
            mind_map,
            "## Thinking hats",
            thinking_hats,
            "## Most important or provocative statements",
            prov_html,
        ])
    return {"report": report}


def export_and_open_node(state: SummaryState) -> Dict[str, Any]:
    if not _truthy(state.get("export_and_open")):
        return {}
    report = state.get("report", "")
    Path("Report.md").write_text(report)
    return {"export_and_open": "Report.md"}
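The `ExportAndOpen` node above only writes `Report.md`; it does not open it. A minimal sketch of the "open" half, assuming the default web browser is an acceptable viewer for a Markdown file (the helper `export_report` is hypothetical):

```python
import webbrowser
from pathlib import Path


def export_report(report: str, path: str = "Report.md", open_file: bool = True) -> str:
    """Write `report` to `path`, optionally open it, and return its file:// URI."""
    target = Path(path).resolve()
    target.write_text(report, encoding="utf-8")
    uri = target.as_uri()  # absolute path -> file:///... URI
    if open_file:
        # Hands the URI to the default browser; may silently do nothing on a headless machine.
        webbrowser.open(uri)
    return uri
```

Inside `export_and_open_node`, one could then return `{"export_and_open": export_report(report)}` and keep the node's single-key update.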


Make the graph:

In [None]:
graph = StateGraph(SummaryState)

graph.add_node("TypeOfInput", type_of_input)
graph.add_node("IngestText", ingest_text)
graph.add_node("Title", title_node)
graph.add_node("Summary", summary_node)
graph.add_node("TopicsTable", topics_table_node)
graph.add_node("ThinkingHats", thinking_hats_node)
graph.add_node("MindMap", mind_map_node)
graph.add_node("MostProvocative", most_provocative_node)
graph.add_node("Report", report_node)
graph.add_node("ExportAndOpen", export_and_open_node)

graph.set_entry_point("TypeOfInput")

graph.add_edge("TypeOfInput", "IngestText")
graph.add_edge("IngestText", "Title")
graph.add_edge("IngestText", "Summary")
graph.add_edge("IngestText", "TopicsTable")
graph.add_edge("IngestText", "ThinkingHats")
graph.add_edge("IngestText", "MindMap")
graph.add_edge("IngestText", "MostProvocative")
graph.add_edge("Title", "Report")
graph.add_edge("Summary", "Report")
graph.add_edge("TopicsTable", "Report")
graph.add_edge("ThinkingHats", "Report")
graph.add_edge("MindMap", "Report")
graph.add_edge("MostProvocative", "Report")

def _should_export(state: SummaryState) -> str:
    return "export" if _truthy(state.get("export_and_open")) else "end"

graph.add_conditional_edges("Report", _should_export, {"export": "ExportAndOpen", "end": END})
graph.add_edge("ExportAndOpen", END)

app = graph.compile()


In [None]:
app
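The graph has a fan-out/fan-in shape: after `IngestText`, the six analysis nodes depend only on the ingested text, and `Report` depends on all six. A dependency-only sketch with the standard library's `graphlib` (no LangGraph, no LLM) groups the nodes into stages; the dependency dictionary is transcribed by hand from the `add_edge` calls above, and the conditional `ExportAndOpen` branch is assumed to be taken:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Predecessors of each node, transcribed from the add_edge calls above.
# Assumption: the conditional Report -> ExportAndOpen edge is always taken.
ANALYSES = ["Title", "Summary", "TopicsTable", "ThinkingHats", "MindMap", "MostProvocative"]
DEPENDENCIES: Dict[str, Set[str]] = {
    "TypeOfInput": set(),
    "IngestText": {"TypeOfInput"},
    **{name: {"IngestText"} for name in ANALYSES},
    "Report": set(ANALYSES),
    "ExportAndOpen": {"Report"},
}


def stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into stages whose members depend only on nodes in earlier stages."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose predecessors are done
        result.append(ready)
        sorter.done(*ready)
    return result


for number, stage in enumerate(stages(DEPENDENCIES), start=1):
    print(number, stage)
```

The five stages correspond only loosely to LangGraph's step-by-step execution; the actual scheduling is done by LangGraph.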

---

## Full computation


URL and text statistics:

In [None]:
def text_stats(text: str) -> Dict[str, int]:
    words = re.findall(r"\w+", text)
    return {
        "characters": len(text),
        "words": len(words),
        "lines": len(text.splitlines()),
    }

url = "https://raw.githubusercontent.com/antononcube/RakuForPrediction-blog/refs/heads/main/Data/Graph-neat-examples-in-Raku-Set-2-YouTube.txt"
response = requests.get(url, timeout=30)
response.raise_for_status()
txt_focus = response.text

text_stats(txt_focus)


Computation:

In [None]:
result = app.invoke({"input": url, "with_title": "«Graph» neat examples, set 3"})
# result

Show the corresponding graph plot:

In [None]:
app

To get the Mermaid code of the graph plot, use:

```python
from IPython.display import Markdown, display
display(Markdown("```mermaid\n" + app.get_graph().draw_mermaid() + "\n```"))
```


Final result:

In [None]:
display(Markdown(result.get("report", "")))


---

## Partial evaluation


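Here most intermediate results are pre-assigned in the input state. Since each node returns an empty update when its output is already present (and the title comes from `with_title`), only the "most provocative statements" node calls the LLM before the report is assembled and exported: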

In [None]:
partial = app.invoke(
    {
        "input": url,
        "with_title": "«Graph» neat examples, set 3",
        "export_and_open": "yes",
        "input_type": "Other",
        "summary": "In brief",
        "ingest_text": "Ingest text",
        "topics_table": '[{"theme": "TopicsTable", "content": "..."}]',
        "thinking_hats": "<p>Thinking hats</p>",
        "mind_map": "mind map graph",
    }
)

In [None]:
display(Markdown(partial.get("report", "")))
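Partial evaluation works because every node starts with the same guard: if its output key is already set in the state, it returns an empty update. A sketch of factoring that guard into a decorator; `skip_if_present` and `summary_stub` are hypothetical names, not LangGraph features:

```python
from functools import wraps
from typing import Any, Callable, Dict

Node = Callable[[Dict[str, Any]], Dict[str, Any]]


def skip_if_present(key: str) -> Callable[[Node], Node]:
    """Wrap a node so that it returns an empty update when `state[key]` is already truthy."""

    def decorate(node: Node) -> Node:
        @wraps(node)
        def wrapper(state: Dict[str, Any]) -> Dict[str, Any]:
            if state.get(key):
                return {}  # pre-assigned (or already computed): nothing to do
            return node(state)

        return wrapper

    return decorate


@skip_if_present("summary")
def summary_stub(state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for an LLM call: report the length of the ingested text.
    return {"summary": f"{len(state.get('ingest_text', ''))} characters"}
```

With such a decorator, each node body would contain only its computation, and the pre-assignment behavior shown above would stay the same.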

