# Context engineering with WRITER and Strands Agents SDK

Using WRITER's Palmyra-X5 model and its integration with the Strands Agents SDK, this notebook walks through the core components of **context engineering** and shows how to build agents that scale in the enterprise while mitigating problems such as **context rot**.

## What is context engineering?

**Context engineering** is the discipline of structuring, retrieving, and managing information so that LLMs can reason effectively. It goes beyond prompts to include instructions, knowledge, tools, retrieval, memory, and governance, all orchestrated so that the model has the *right information in the right format at the right time*. As conversations grow longer, the context window becomes a critical bottleneck that can lead to:

- **Context rot**: Degradation in agent performance as conversations exceed model context limits. It shows up in several forms:
    - **Context poisoning**: When a hallucination makes it into the context
    - **Context distraction**: When the context grows so large that the model over-focuses on it and neglects what it learned in training
    - **Context confusion**: When superfluous context influences the response
    - **Context clash**: When parts of the context disagree

## How context is generated

As enterprises move beyond simple Q&A and into multi-turn, agentic systems that can reason and act over long time horizons, single-prompt engineering alone can’t manage the flood of context these systems depend on. That context comes from three broad sources:

![Types of Context Management](resources/types_of_context.png)

- **Instructions (the rules)**: system prompts, task instructions, few-shot examples, formatting rules, and tool descriptions
- **Knowledge (the facts)**: documents, embeddings, domain-specific data, chat history, and long-term memory
- **Tools (the actions)**: outputs and feedback from external systems such as APIs, search engines, calculators, and retrieval functions

The diagram above illustrates the different approaches to context management that we'll explore in this notebook, from simple sliding windows to intelligent summarization strategies.
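As a rough illustration of how the three context types listed above might come together in a single request, the sketch below flattens them into a message list. The `ContextBundle` class, its field names, and the sample data are hypothetical and exist only for this example; they are not part of Strands or the WRITER SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContextBundle:
    """Hypothetical container for the three context types (not an SDK class)."""

    instructions: str
    knowledge: List[str] = field(default_factory=list)
    tool_outputs: List[str] = field(default_factory=list)

    def to_messages(self, user_query: str) -> List[Dict[str, str]]:
        """Flatten the bundle into a system message plus one user message."""
        sections = []
        if self.knowledge:
            sections.append("Knowledge:\n" + "\n".join(f"- {k}" for k in self.knowledge))
        if self.tool_outputs:
            sections.append("Tool results:\n" + "\n".join(f"- {t}" for t in self.tool_outputs))
        sections.append("Question: " + user_query)
        return [
            {"role": "system", "content": self.instructions},
            {"role": "user", "content": "\n\n".join(sections)},
        ]


# Made-up sample data, for illustration only
bundle = ContextBundle(
    instructions="Reply very concisely.",
    knowledge=["Refund window is 30 days."],
    tool_outputs=["order_lookup: order 1001 shipped"],
)
messages = bundle.to_messages("Can I still return my order?")
for m in messages:
    print(m["role"], "->", m["content"])
```

Keeping the three sources separate until the last moment makes it easier to trim or summarize one of them without touching the others.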

## Setup and installation

First, let's install the required dependencies for working with Strands Agents SDK and WRITER integration.


In [None]:
%pip install 'strands-agents[writer]' strands-agents-tools

## Environment configuration

We need to set up the WRITER API key to authenticate with the Palmyra-X5 model.


In [None]:
import getpass
import os

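# Prompt for the key only if it is not already set in the environment
if not os.environ.get("WRITER_API_KEY"):
    os.environ["WRITER_API_KEY"] = getpass.getpass("WRITER API key: ")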

## Model and agent setup

Let's create a basic agent with WRITER's Palmyra-X5 model to test our setup.


In [None]:
from strands import Agent
from strands.models.writer import WriterModel

model = WriterModel(
    model_id="palmyra-x5",
)

Now let's test our agent with a simple question to ensure everything is working correctly.


In [None]:
agent = Agent(
    model=model,
    name="Assistant",
    system_prompt="Reply very concisely.",
)
result = agent("Tell me why it is important to evaluate AI agents.")

## State & history: proactive context management with persistence

A central challenge in **context engineering** is deciding *how much of a conversation’s history should remain in the model’s context window.*  
Too little history, and the agent loses continuity. Too much, and it wastes tokens, increases latency, and risks “context rot.”

This section demonstrates **proactive context management** — a strategy that trims older messages *before* they are passed to the model.  
By doing this early, the system ensures that only the most recent and relevant user turns are preserved, keeping context focused and performant.


### How It Works

We extend Strands’ built-in `SlidingWindowConversationManager` into a custom **`ProactiveTurnTrimmer`**, which:

- **Trims proactively** → Always keeps only the last *N* user turns before sending the next model call.  
- **Preserves continuity** → Maintains conversational flow without unnecessary historical baggage.  
- **Optimizes resources** → Ensures predictable token usage and model performance in long-running sessions.

This differs from a *reactive* approach (which trims *after* overflow).  
Proactive trimming is like an automatic cleanup crew — keeping your model’s short-term memory clean and coherent before every turn.
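The difference between the two approaches can be sketched in plain Python, without any SDK calls. This is a minimal illustration under simplifying assumptions: messages are plain dicts with a `role` key, and the function names `trim_proactive` and `trim_reactive` are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_proactive(messages: List[Message], max_turns: int) -> List[Message]:
    """Keep everything from the Nth-most-recent user message onward."""
    seen = 0
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            seen += 1
            if seen == max_turns:
                return messages[i:]
    return messages  # fewer than max_turns user turns: nothing to trim


def trim_reactive(messages: List[Message], max_messages: int) -> List[Message]:
    """Do nothing until the hard limit is exceeded, then drop the oldest messages."""
    if len(messages) <= max_messages:
        return messages
    return messages[-max_messages:]


# Four user/assistant exchanges, eight messages in total
history: List[Message] = []
for n in range(1, 5):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

proactive = trim_proactive(history, max_turns=2)
reactive = trim_reactive(history, max_messages=10)
print(len(proactive), len(reactive))  # prints: 4 8
```

The proactive version already cuts the history to the last two turns, while the reactive version leaves all eight messages untouched because its limit of ten has not been reached.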

---

### FileSessionManager — Persistent Session Memory

In parallel, we use **`FileSessionManager`** to handle **long-term persistence** of the agent’s state.  
While the `ProactiveTurnTrimmer` manages *active context*, `FileSessionManager` preserves *historical memory* between runs.

#### What It Does
- Stores all messages as structured JSON under `.strands/sessions/...`
- Allows sessions to be reloaded or replayed later
- Creates a lightweight local “memory layer” for experimentation or auditability
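To show the general idea of file-backed persistence, here is a small self-contained sketch that writes each message to its own JSON file and reloads the set later. The helpers `save_message` and `load_messages` and the directory layout are assumptions made for illustration; the real `FileSessionManager` may organize and name its files differently.

```python
import json
import pathlib
import tempfile
from typing import Dict, List


def save_message(session_dir: pathlib.Path, index: int, message: Dict) -> pathlib.Path:
    """Write one message to its own JSON file (hypothetical layout)."""
    session_dir.mkdir(parents=True, exist_ok=True)
    path = session_dir / f"message_{index}.json"
    path.write_text(json.dumps(message, indent=2), encoding="utf-8")
    return path


def load_messages(session_dir: pathlib.Path) -> List[Dict]:
    """Reload messages in numeric order so message_10 follows message_9."""
    files = sorted(
        session_dir.glob("message_*.json"),
        key=lambda p: int(p.stem.split("_")[1]),
    )
    return [json.loads(p.read_text(encoding="utf-8")) for p in files]


with tempfile.TemporaryDirectory() as tmp:
    session_dir = pathlib.Path(tmp) / "session_demo"
    conversation = [
        {"role": "user", "content": "How is your day?"},
        {"role": "assistant", "content": "Going well, thanks."},
    ]
    for i, msg in enumerate(conversation):
        save_message(session_dir, i, msg)

    restored = load_messages(session_dir)
    print(restored == conversation)  # prints: True
```

One file per message keeps writes append-only and makes individual turns easy to inspect or audit by hand.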

#### Why It Matters
Together, these components embody the **inner and outer loops** of context engineering:

| Layer | Mechanism | Function | Context Principle |
|:------|:-----------|:----------|:------------------|
| **Inner Loop** | `ProactiveTurnTrimmer` | Manage active conversation window | *Isolate* |
| **Outer Loop** | `FileSessionManager` | Persist full conversation to disk | *Write* |

This pairing ensures that:
- The **active context** (short-term) stays clean and relevant.
- The **session history** (long-term) remains available for recall, analysis, or summarization later.

---

### Why It Matters in Enterprise Contexts

In production-scale systems — customer assistants, multi-agent workflows, or regulated industries — proactive context management is critical for:

- **Consistency** → Prevent drift by constraining context size and order.  
- **Cost-efficiency** → Control token growth across multi-turn conversations.  
- **Compliance & Auditability** → Persist all interactions for review via session storage.  
- **Scalability** → Maintain stable reasoning as conversations evolve over time.
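To make the cost-efficiency point concrete, here is a back-of-the-envelope sketch. It assumes every turn adds the same number of tokens and ignores the system prompt; the figures (50 turns, 200 tokens per turn, a window of 3 turns) are made up for illustration and are not measurements.

```python
def tokens_sent_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every call resends the whole history."""
    # Call k carries k turns, so the total is t * (1 + 2 + ... + n).
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_sent_windowed(turns: int, tokens_per_turn: int, window: int) -> int:
    """Total input tokens when each call carries at most `window` turns."""
    return sum(tokens_per_turn * min(k, window) for k in range(1, turns + 1))


full = tokens_sent_full_history(turns=50, tokens_per_turn=200)
windowed = tokens_sent_windowed(turns=50, tokens_per_turn=200, window=3)
print(full, windowed)  # prints: 255000 29400
```

Under these assumptions, resending the full history grows quadratically with the number of turns, while a fixed window grows linearly.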

> **Context engineering isn’t just about prompt design — it’s about lifecycle management.**  
> The proactive trimming strategy governs what stays *in the model’s mind* (state/history),  
> while session persistence governs what stays *on record* (memory).

✅ **Next Step:**  
We’ll build on this by adding summarization — compressing older context while keeping recent turns verbatim, merging *windowing* with *memory abstraction* for full context lifecycle management.


In [None]:
import json
import pathlib
from typing import Any, Dict, List

from strands import Agent
from strands.models.writer import WriterModel
from strands.agent.conversation_manager.sliding_window_conversation_manager import SlidingWindowConversationManager
from strands.session import FileSessionManager

# --- Storage setup (local) ---
storage_dir = pathlib.Path(".strands/sessions").resolve()

# --- Helper: identify user messages ---
def _is_user_msg(item: Dict[str, Any]) -> bool:
    return item.get("role") == "user"

# --- Proactive Turn-Based Conversation Manager ---
class ProactiveTurnTrimmer(SlidingWindowConversationManager):
    def __init__(self, max_turns: int = 3, **kwargs: Any):
        """Trim proactively to the last N user turns each time."""
        super().__init__(**kwargs)
        self.max_turns = max(1, int(max_turns))

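    def select_messages(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Return the last `max_turns` user turns plus the replies that follow them."""
        # Custom helper: this notebook calls it explicitly in show_effective_context().
        # Whether the SDK also invokes it before each model call is an assumption
        # worth verifying against your installed Strands version.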
        if not messages:
            return messages

        count = 0
        start_idx = 0
        for i in range(len(messages) - 1, -1, -1):
            if _is_user_msg(messages[i]):
                count += 1
                if count == self.max_turns:
                    start_idx = i
                    break
        return messages[start_idx:]

# --- Setup Agent with (local) FileSessionManager ---
session_proactive = FileSessionManager(
    session_id="proactive-demo",
    storage_dir=storage_dir
)

model = WriterModel(model_id="palmyra-x5")

proactive_agent = Agent(
    model=model,
    name="Proactive Assistant",
    system_prompt="Be concise.",
    conversation_manager=ProactiveTurnTrimmer(max_turns=2),
    session_manager=session_proactive,
)

# --- Inspector Utility ---
def show_effective_context(agent: Agent):
    trimmed = agent.conversation_manager.select_messages(agent.messages)
    print(f"\n Effective Context for {agent.name}:")
    for msg in trimmed:
        print(msg)

# --- Test run ---
def test_proactive_agent():
    print(f"\n=== {proactive_agent.name} ===")
    proactive_agent("How is your day?")
    proactive_agent("What did I just say?")
    proactive_agent("What model are we using?")

    # Show what the model actually "sees" (proactive state/history)
    show_effective_context(proactive_agent)

# Run test
test_proactive_agent()


## Session storage inspection

To understand how Strands persists conversation data, we can inspect what’s stored on disk.  
The `FileSessionManager` saves each message (user, assistant, tool, etc.) as a structured JSON file with metadata like timestamps, roles, and IDs.

This snippet adds **observability** to session storage — helping you verify what’s actually persisted versus what remains in active memory.

---

### inspect_session() — observing stored history

#### Purpose
Adds **visibility** into the agent’s stored state by reading session files and printing their contents.

#### What it does
- Iterates through all saved messages on disk.  
- Extracts key details such as role (`user`, `assistant`, etc.) and text content.  
- Displays the full chronological order of a session, providing a transparent view of the stored memory state.

#### Why it matters
Observability is a critical part of **context engineering**.  
It allows you to debug *when* and *why* context was trimmed, summarized, or persisted —  
bridging the gap between the model’s **active state** (what it’s reasoning over now)  
and its **stored history** (what it remembers across turns and sessions).

> Use this utility after running a conversation to inspect your `.strands/sessions` directory  
> and confirm that messages are being written, trimmed, or summarized as expected.

In [None]:
import json
import pathlib

def extract_message(m: dict):
    """Return (role, text) from a stored message record."""
    # Messages are nested under "message"; fall back to the record itself
    msg = m.get("message", m)

    role = msg.get("role", "unknown")

    content = msg.get("content", "")
    if isinstance(content, list):
        text = " ".join(c.get("text", "") for c in content if isinstance(c, dict))
    elif isinstance(content, str):
        text = content
    else:
        text = str(content)

    return role, text


def inspect_session(session_id: str):
    base_dir = pathlib.Path(f".strands/sessions/session_{session_id}/agents")
    if not base_dir.exists():
        print(f" No session found for {session_id}")
        return

    for agent_dir in base_dir.iterdir():
        print(f"\n Agent: {agent_dir.name}")
        messages_dir = agent_dir / "messages"
        if not messages_dir.exists():
            print("   (no messages yet)")
            continue

        print(" Conversation:")
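        # Order by name length, then name, so message_10 follows message_9
        msg_files = sorted(
            messages_dir.glob("message_*.json"),
            key=lambda p: (len(p.name), p.name),
        )
        for msg_file in msg_files:
            with open(msg_file, encoding="utf-8") as f:
                raw = json.load(f)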
            role, text = extract_message(raw)
            print(f"{role}: {text}")


# Example usage:
inspect_session("proactive-demo")

## Context compression: summarizing conversation history

After exploring proactive context trimming, the next step in **context engineering**  
is to **compress** older conversation history without losing important information.  

The `SummarizingConversationManager` introduces an automatic summarization layer —  
it shortens the *oldest parts* of a conversation while preserving recent turns verbatim.  
This bridges the gap between **short-term state management** and **long-term memory retention**.


### How it works

When the conversation grows too long and context reduction is needed, this manager:
1. Selects the *oldest* portion of the message history, sized by `summary_ratio`.
2. Uses the model itself to summarize that portion of the conversation.  
3. Keeps the most recent messages intact (`preserve_recent_messages`).
4. Replaces the summarized messages with a single concise summary message.

This allows the agent to:
- Maintain awareness of prior topics and facts.
- Reduce token usage, because one short summary replaces many full messages.
- Preserve natural flow without overwhelming the model’s context window.
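The steps above can be sketched without calling a model. The following is a simplified illustration, not the library's implementation: `compress_history` and `fake_summarize` are hypothetical names, the way `summary_ratio` and `preserve_recent_messages` interact is an assumption, and the role given to the summary message is chosen arbitrarily.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compress_history(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    summary_ratio: float = 0.3,
    preserve_recent_messages: int = 4,
) -> List[Message]:
    """Replace the oldest slice of the history with a single summary message."""
    # How many of the oldest messages to fold into the summary.
    count = max(1, int(len(messages) * summary_ratio))
    # Never reach into the protected tail of recent messages.
    count = min(count, len(messages) - preserve_recent_messages)
    if count <= 0:
        return messages  # too short to summarize safely

    summary = summarize(messages[:count])
    # Assumption: the summary is stored as a user-role message.
    summary_msg = {"role": "user", "content": f"Summary of earlier conversation: {summary}"}
    return [summary_msg] + messages[count:]


def fake_summarize(old: List[Message]) -> str:
    """Stand-in for a model call; only reports how much was compressed."""
    return f"{len(old)} earlier messages condensed"


history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(10)
]
compressed = compress_history(history, fake_summarize)
print(len(history), "->", len(compressed))  # prints: 10 -> 8
```

With ten messages and a ratio of 0.3, the three oldest messages collapse into one summary and the remaining seven, including the protected last four, are kept verbatim.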

---

### Parameters used in this example

| Parameter | Description |
|------------|-------------|
| `summary_ratio=0.3` | Summarize roughly 30% of the oldest messages when the window fills. |
| `preserve_recent_messages=4` | Always keep the latest four messages unaltered. |
| `summarization_system_prompt` | Custom instruction defining how to summarize. |

---

### Why it matters

In **context engineering**, summarization is the **“compress”** phase —  
it’s how we preserve meaning across long conversations while keeping the model efficient.

Where the **ProactiveTurnTrimmer** focused on short-term state and history,  
the **SummarizingConversationManager** introduces a middle-layer memory —  
an abstraction that turns verbose logs into a compact, meaningful summary.

> This is how context evolves from “raw conversation history”  
> into “semantic memory” that still fits within the model’s context window.

---

✅ **Next step:**  
We’ll combine both strategies — sliding-window trimming for state  
and summarization for memory — into a single hybrid conversation manager.


In [None]:
import pathlib
from strands import Agent
from strands.models.writer import WriterModel
from strands.agent.conversation_manager.summarizing_conversation_manager import SummarizingConversationManager
from strands.session.file_session_manager import FileSessionManager

# --- Storage directory for persistence ---
storage_dir = pathlib.Path(".strands/sessions").resolve()

# --- Setup summarizing manager ---
# summary_ratio=0.3 means: summarize about 30% of the oldest messages
# preserve_recent_messages=4 means: always keep at least the last 4 raw messages
summarizing_manager = SummarizingConversationManager(
    summary_ratio=0.3,
    preserve_recent_messages=4,
    summarization_system_prompt="Summarize the following conversation briefly but keep all important details."
)

# --- Session + model setup ---
session_summarizing = FileSessionManager(
    session_id="summarizing-demo",
    storage_dir=storage_dir
)

model = WriterModel(model_id="palmyra-x5")

summarizing_agent = Agent(
    model=model,
    name="Summarizing Assistant",
    system_prompt="You are a helpful summarizing assistant.",
    conversation_manager=summarizing_manager,
    session_manager=session_summarizing,
)

# --- Run a demo conversation ---
print("=== Summarizing Agent Demo ===")
summarizing_agent("Hello, I’m researching space travel history.")
summarizing_agent("Tell me about the Apollo missions.")
summarizing_agent("What happened during Apollo 13?")
summarizing_agent("Summarize what we’ve talked about so far.")

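# --- Inspect current active context ---
# Note: summarization only runs when context reduction is needed, so in a
# conversation this short the messages below may still be the raw history.
print("\n✂️ Active Context:")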
for msg in summarizing_agent.messages:
    print(msg)

# --- Inspect persisted session on disk ---
base_dir = storage_dir / "session_summarizing-demo" / "agents"
if base_dir.exists():
    for agent_dir in base_dir.iterdir():
        print(f"\n Agent: {agent_dir.name}")
        msgs_dir = agent_dir / "messages"
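        # Order by name length, then name, so message_10 follows message_9
        for msg_file in sorted(msgs_dir.glob("message_*.json"), key=lambda p: (len(p.name), p.name)):
            # read_text() closes the file handle, unlike a bare open().read()
            print(msg_file.name, "->", msg_file.read_text(encoding="utf-8"))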


## 🧠 Hybrid Context Management: Combining Trimming and Summarization

So far, we’ve explored two key patterns:
1. **Proactive Trimming** → Keep only the most recent turns (short-term state).  
2. **Summarization** → Compress older turns into a concise summary (long-term memory).

The next evolution is to **combine both strategies** into a single, adaptive system.  
This `HybridSummarizingTurnTrimmer` demonstrates how an agent can intelligently balance *context depth* and *context efficiency*.

### How It Works

This implementation enforces two key parameters:

| Parameter | Description |
|------------|-------------|
| `context_limit` | Maximum number of user turns to keep in raw history before summarization triggers. |
| `keep_last_n_turns` | Number of most recent turns to preserve exactly as they occurred. |

When the number of user turns exceeds `context_limit`, the manager:
1. **Counts user turns** in the current conversation.  
2. **Summarizes every message before** the last `keep_last_n_turns` user turns.  
3. **Injects a synthetic conversation pair**:
   - `user`: “Summarize the conversation we had so far.”  
   - `assistant`: *(generated summary)*  
4. **Appends the recent turns** verbatim after the summary.

This creates a natural conversation flow where the agent remembers prior context through a summary — just as a human might recall *“We already talked about that earlier…”* — while keeping the most recent details intact.
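To see the shape of the resulting context without making any model calls, the steps above can be traced offline with a placeholder summary. This is a sketch only: `hybrid_select` is a hypothetical stand-alone function, messages are simplified to plain dicts, and the summary text is a stub where a real implementation would call the model.

```python
from typing import Dict, List

Message = Dict[str, str]


def hybrid_select(messages: List[Message], context_limit: int, keep_last_n_turns: int) -> List[Message]:
    """Return a summary pair plus the last N turns once user turns exceed context_limit."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    if user_turns <= context_limit:
        return messages

    # Walk backward to the start of the Nth-most-recent user turn.
    seen, start = 0, 0
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            seen += 1
            if seen == keep_last_n_turns:
                start = i
                break

    old, recent = messages[:start], messages[start:]
    # Placeholder summary; a real implementation would call the model here.
    summary = f"(summary of {len(old)} earlier messages)"
    return [
        {"role": "user", "content": "Summarize the conversation we had so far."},
        {"role": "assistant", "content": summary},
    ] + recent


# Five user/assistant exchanges, ten messages in total
history: List[Message] = []
for n in range(1, 6):
    history.append({"role": "user", "content": f"question {n}"})
    history.append({"role": "assistant", "content": f"answer {n}"})

selected = hybrid_select(history, context_limit=4, keep_last_n_turns=2)
for m in selected:
    print(m["role"], "->", m["content"])
```

With five user turns and a limit of four, the first six messages are replaced by the synthetic pair, and turns 4 and 5 follow unchanged, for six messages in total.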

---

### Why It Matters for Context Engineering

The hybrid model captures the full **context lifecycle**:

| Context Layer | Strategy | Component | Goal |
|:---------------|:----------|:-----------|:------|
| **Active Context (short-term)** | Sliding window trimming | `SlidingWindowConversationManager` | Maintain focus and coherence |
| **Summarized Memory (mid-term)** | Context compression | `_summarize_old_history()` | Preserve meaning, reduce token load |
| **Persistent State (long-term)** | File-based storage | `FileSessionManager` | Enable recovery, auditing, continuity |

Together, these mechanisms form the backbone of **enterprise-grade context engineering**:
- **Scalable** → Handles long, multi-session dialogues gracefully.  
- **Composable** → Fits within multi-agent systems where each agent manages its own memory.  
- **Explainable** → Summaries provide interpretable, inspectable history snapshots.

---

> 💬 In short: this hybrid design teaches your agent to *think like a human* —
> remember the essence of what was said, forget unnecessary detail, and stay focused on what matters right now.


In [None]:
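class HybridSummarizingTurnTrimmer(SlidingWindowConversationManager):
    """Summarize older turns and keep the last N user turns verbatim."""

    def __init__(self, model: WriterModel, context_limit: int = 8, keep_last_n_turns: int = 3, **kwargs):
        super().__init__(**kwargs)
        # Raise instead of assert so the check survives `python -O`
        if not 1 <= keep_last_n_turns <= context_limit:
            raise ValueError("keep_last_n_turns must be between 1 and context_limit")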
        self.model = model
        self.context_limit = context_limit
        self.keep_last_n_turns = keep_last_n_turns

    def _is_user_msg(self, item: Dict[str, Any]) -> bool:
        return item.get("role") == "user"

    def _count_turns(self, messages: List[Dict[str, Any]]) -> int:
        return sum(1 for m in messages if self._is_user_msg(m))

    def _trim_to_last_turns(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        count = 0
        start_idx = 0
        for i in range(len(messages) - 1, -1, -1):
            if self._is_user_msg(messages[i]):
                count += 1
                if count == self.keep_last_n_turns:
                    start_idx = i
                    break
        return messages[start_idx:]

    def _summarize_old_history(self, old_messages: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Use the model to summarize earlier context."""
        if not old_messages:
            return {"role": "assistant", "content": [{"text": "(no previous context)"}]}
    
        # Create a temporary summarizer agent using the same model
        summarizer = Agent(
            model=self.model,
            system_prompt="You are a concise assistant that summarizes prior conversation context for continuity."
        )
    
        # Construct the summarization prompt
        prompt = "Summarize the conversation we had so far briefly but keep all important details."
    
        # Run the summarizer agent to generate the summary
        # default=str keeps json.dumps from failing on non-JSON values such as bytes
        response = summarizer(prompt + "\n\n" + json.dumps(old_messages, indent=2, default=str))
    
        # Extract the output safely
        summary_text = getattr(response, "output_text", str(response))
        return {"role": "assistant", "content": [{"text": summary_text}]}


    def select_messages(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Proactively trim and summarize when exceeding context_limit."""
        if self._count_turns(messages) <= self.context_limit:
            return messages

        trimmed = self._trim_to_last_turns(messages)
        old_msgs = messages[: len(messages) - len(trimmed)]

        synthetic_user = {"role": "user", "content": [{"text": "Summarize the conversation we had so far."}]}
        synthetic_assistant = self._summarize_old_history(old_msgs)

        # Replace old history with summary + last turns
        return [synthetic_user, synthetic_assistant] + trimmed


## Initialize the Hybrid Agent

Set up the hybrid agent using the `HybridSummarizingTurnTrimmer` we defined above.

In [None]:
storage_dir = pathlib.Path(".strands/sessions").resolve()
model = WriterModel(model_id="palmyra-x5")

session = FileSessionManager(session_id="hybrid-demo", storage_dir=storage_dir)

agent = Agent(
    model=model,
    name="Hybrid Assistant",
    system_prompt="Be concise but preserve important context.",
    # context_limit is set below the five demo queries so summarization actually triggers
    conversation_manager=HybridSummarizingTurnTrimmer(model, context_limit=4, keep_last_n_turns=2),
    session_manager=session,
)

## Running the Hybrid Context Demo

Let’s simulate a multi-turn conversation to see the hybrid summarization in action.  
As the dialogue progresses, older turns will be summarized once the `context_limit` is exceeded — keeping the recent turns verbatim and compressing the rest.  
Afterward, we’ll inspect both the **active** and **effective** contexts to confirm how the memory evolved.


In [None]:
print("=== Hybrid Summarization Demo ===")

queries = [
    "Hi there, I’m researching the history of financial crises.",
    "Can you tell me what caused the 2008 global financial crisis?",
    "How did central banks respond to the crisis?",
    "What were the long-term policy changes that came from it?",
    "Now, what are the key similarities between 2008 and the 2023 banking stress events?",
]

for q in queries:
    print(f"\n User: {q}")
    agent(q)

# --- Inspect the message list currently held by the agent ---
print("\n Active Context (agent.messages):")
for msg in agent.messages:
    print(msg)

# --- Inspect effective context (trimmed + summarization pair) ---
print("\n Effective Context (with synthetic summarization pair):")
effective = agent.conversation_manager.select_messages(agent.messages)
for msg in effective:
    print(msg)
