101 changes: 101 additions & 0 deletions docs/adapters/frameworks-agno.md
# Agno framework adapter

`layerlens.instrument.adapters.frameworks.agno.AgnoAdapter` instruments
[Agno](https://github.com/agno-agi/agno) agents — single-agent and
multi-agent teams — by wrapping `Agent.run()` and `Agent.arun()`.

## Install

```bash
pip install 'layerlens[agno]'
```

Pulls `agno>=0.1,<1.0`. Requires Python 3.10+.

## Quick start

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat

from layerlens.instrument.adapters.frameworks.agno import AgnoAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="agno")
adapter = AgnoAdapter()
adapter.add_sink(sink)
adapter.connect()

try:
    agent = Agent(model=OpenAIChat(id="gpt-4o-mini"), instructions="Be concise.")
    adapter.instrument_agent(agent)

    response = agent.run("What is 2 + 2?")
    print(response.content)
finally:
    # Restore the patched methods and close the sink even if the run fails.
    adapter.disconnect()
    sink.close()
```

The module-level `instrument_agent(agent)` function, importable from the same module, is the one-liner equivalent.

## What's wrapped

`adapter.instrument_agent(agent)` patches the following on each Agent:

- `run` — sync entry point. Emits `agent.input` + `agent.output` and any
inner `model.invoke` / `tool.call` events.
- `arun` — async entry point. Same semantics.
- `_run_tool` — emits `tool.call` per tool invocation (when present in the
Agno version).
- Model adapter hooks — emit `model.invoke` per LLM call.

`disconnect()` restores all original methods.
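The patch-and-restore cycle described above can be pictured with the minimal, stdlib-only sketch below. It is not the LayerLens implementation: `FakeAgent` and `PatchingSketch` are hypothetical stand-ins, and only the sync `run` path is shown (an `arun` wrapper would follow the same pattern with `async def`). The key idea is that assigning a wrapper as an instance attribute shadows the class method for that one agent, and deleting the attribute on disconnect re-exposes the original.

```python
import functools
from typing import Any, Callable


class FakeAgent:
    """Stand-in for agno.agent.Agent; only the method name matters here."""

    def run(self, message: str) -> str:
        return f"echo: {message}"


class PatchingSketch:
    """Hypothetical adapter: wraps `run` per instance and restores it on disconnect."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []
        # (target, attribute name, previous instance attribute or None)
        self._patched: list[tuple[object, str, Any]] = []

    def instrument_agent(self, agent: Any) -> None:
        original: Callable[..., Any] = agent.run  # bound method
        # Remember whether the instance already shadowed the class method.
        previous = vars(agent).get("run")
        self._patched.append((agent, "run", previous))

        @functools.wraps(original)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            self.events.append(("agent.input", args))
            result = original(*args, **kwargs)
            self.events.append(("agent.output", result))
            return result

        # An instance attribute shadows the class method for this agent only.
        agent.run = wrapped

    def disconnect(self) -> None:
        # Undo in reverse order so stacked patches unwind correctly.
        for target, name, previous in reversed(self._patched):
            if previous is None:
                delattr(target, name)  # re-exposes the class method
            else:
                setattr(target, name, previous)
        self._patched.clear()


adapter = PatchingSketch()
agent = FakeAgent()
adapter.instrument_agent(agent)
agent.run("What is 2 + 2?")
adapter.disconnect()
```

Patching per instance (rather than on the class) keeps uninstrumented agents in the same process untouched.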

## Events emitted

| Event | Layer | When |
|---|---|---|
| `environment.config` | L4a | First `run` per agent. |
| `agent.input` | L1 | Beginning of every `run` / `arun`. |
| `agent.output` | L1 | End of every `run` / `arun`. |
| `agent.action` | L4a | Per intermediate reasoning step. |
| `agent.handoff` | L4a | When a team agent delegates to a sub-agent. |
| `agent.state.change` | cross-cutting | Memory mutations. |
| `tool.call` | L5a | Per tool invocation. |
| `model.invoke` | L3 | Per LLM call. |

## Agno specifics

- **Teams**: Agno supports multi-agent teams via `Team(members=[...])`.
Either instrument each team member individually with
`adapter.instrument_agent(team_member)`, or pass the team to the
convenience helper `instrument_agent(team)`, which recurses into its members.
- **Reasoning agents**: when `reasoning=True` is set on an Agent, the
intermediate reasoning steps emit `agent.action` events with a
`step_index` field.
- **Storage backends**: Agno session storage (Postgres, SQLite, Redis,
etc.) emits `agent.state.change` on every save.
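The recursive team instrumentation mentioned under **Teams** amounts to a depth-first walk over team members. The sketch below uses hypothetical `FakeAgent` / `FakeTeam` dataclasses and assumes, as Agno does, that a team lists its members (agents or nested teams) in a `members` attribute; it is an illustration of the traversal, not the helper's actual code. A visited set ensures an agent shared by two sub-teams is instrumented only once.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Union


@dataclass
class FakeAgent:
    name: str


@dataclass
class FakeTeam:
    name: str
    members: list = field(default_factory=list)  # FakeAgent or nested FakeTeam


def instrument_team(
    node: Union[FakeAgent, FakeTeam],
    instrument: Callable[[FakeAgent], None],
    _seen: Optional[set[int]] = None,
) -> list[str]:
    """Depth-first walk that calls `instrument` once per reachable agent."""
    seen = set() if _seen is None else _seen
    if id(node) in seen:
        return []  # already visited (e.g. an agent shared by two sub-teams)
    seen.add(id(node))
    if isinstance(node, FakeTeam):
        names: list[str] = []
        for member in node.members:
            names.extend(instrument_team(member, instrument, seen))
        return names
    instrument(node)
    return [node.name]


shared = FakeAgent("researcher")
team = FakeTeam("lead", members=[FakeAgent("writer"), FakeTeam("sub", members=[shared, shared])])
instrumented = instrument_team(team, lambda agent: None)
```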

## Capture config

```python
from layerlens.instrument.adapters._base import CaptureConfig
from layerlens.instrument.adapters.frameworks.agno import AgnoAdapter

# Recommended.
adapter = AgnoAdapter(capture_config=CaptureConfig.standard())

# Heavy: also capture reasoning steps (the chain-of-thought) as agent.code.
adapter = AgnoAdapter(
    capture_config=CaptureConfig(
        l1_agent_io=True,
        l2_agent_code=True,
        l3_model_metadata=True,
        l5a_tool_calls=True,
    ),
)
```
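One way to read the capture flags is as per-layer gates over the "Events emitted" table. The sketch below is a hypothetical model of that gating, not the real `CaptureConfig`: `CaptureFlags`, `LAYER_FLAGS`, and `should_emit` are invented for illustration, and it assumes that layers without a corresponding flag here (L4a, cross-cutting) are always emitted, which the actual adapter may not do.

```python
from dataclasses import dataclass

# Layer per event type, copied from the "Events emitted" table.
EVENT_LAYERS = {
    "environment.config": "L4a",
    "agent.input": "L1",
    "agent.output": "L1",
    "agent.action": "L4a",
    "agent.handoff": "L4a",
    "agent.state.change": "cross-cutting",
    "tool.call": "L5a",
    "model.invoke": "L3",
}

# Hypothetical: which flag gates which layer in this sketch.
LAYER_FLAGS = {"L1": "l1_agent_io", "L3": "l3_model_metadata", "L5a": "l5a_tool_calls"}


@dataclass
class CaptureFlags:
    """Hypothetical stand-in for CaptureConfig, reusing field names from the doc."""

    l1_agent_io: bool = False
    l3_model_metadata: bool = False
    l5a_tool_calls: bool = False


def should_emit(event_type: str, flags: CaptureFlags) -> bool:
    flag = LAYER_FLAGS.get(EVENT_LAYERS[event_type])
    # Layers without a flag in this sketch (L4a, cross-cutting) are always emitted.
    return flag is None or getattr(flags, flag)
```

For example, with only `l1_agent_io` and `l5a_tool_calls` enabled, `agent.input` and `tool.call` pass while `model.invoke` is dropped.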

## BYOK

Agno model adapters (`OpenAIChat`, `AnthropicClaude`, etc.) read their own
credentials; the Agno adapter does not manage them. For platform-managed
BYOK see `docs/adapters/byok.md` (atlas-app M1.B).
113 changes: 113 additions & 0 deletions docs/adapters/frameworks-bedrock_agents.md
# AWS Bedrock Agents framework adapter

`layerlens.instrument.adapters.frameworks.bedrock_agents.BedrockAgentsAdapter`
instruments AWS Bedrock Agent runtime calls by registering boto3 event hooks
and parsing the `InvokeAgent` response stream's `trace` blocks.

## Install

```bash
pip install 'layerlens[bedrock-agents]'
```

Pulls `boto3>=1.34`. AWS credentials and region must be configured through
the standard boto3 mechanisms (environment variables, a shared config
profile, or an IAM role).

## Quick start

```python
import boto3

from layerlens.instrument.adapters.frameworks.bedrock_agents import BedrockAgentsAdapter
from layerlens.instrument.transport.sink_http import HttpEventSink

sink = HttpEventSink(adapter_name="bedrock_agents")
adapter = BedrockAgentsAdapter()
adapter.add_sink(sink)
adapter.connect()

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
adapter.instrument_client(client)

response = client.invoke_agent(
    agentId="ABCDEFGHIJ",
    agentAliasId="TSTALIASID",
    sessionId="my-session",
    inputText="What is 2+2?",
    enableTrace=True,  # Bedrock only includes trace blocks in the stream when this is set.
)

# Consume the response stream; trace events are captured automatically as it is read.
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)

adapter.disconnect()
sink.close()
```

The module-level `instrument_client(client)` function, importable from the same module, is the convenience helper.

## What's wrapped

`adapter.instrument_client(client)` registers two boto3 event hooks on the
provided `bedrock-agent-runtime` client:

- `provide-client-params.bedrock-agent-runtime.InvokeAgent` — fires before
the request goes out. Captures `agentId`, `sessionId`, and `inputText`,
emits `agent.input`, and emits `environment.config` the first time a given
`agentId` is seen.
- `after-call.bedrock-agent-runtime.InvokeAgent` — fires after the response
comes back. Walks the `trace` blocks in the streamed events and emits
`model.invoke` / `tool.call` / `agent.action` per trace step.

`disconnect()` unregisters both hooks.
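The register/unregister lifecycle can be sketched without AWS by using a stand-in emitter. In real boto3 the emitter is `client.meta.events`, whose `register(event_name, handler)` and `unregister(event_name, handler)` take the same dotted event names used here; `provide-client-params` handlers receive the request `params`, and `after-call` handlers receive the `parsed` response. `StandInEmitter` and `HookSketch` are hypothetical and only mirror the behavior described above; they are not the adapter's code.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[..., None]

BEFORE = "provide-client-params.bedrock-agent-runtime.InvokeAgent"
AFTER = "after-call.bedrock-agent-runtime.InvokeAgent"


class StandInEmitter:
    """Stand-in for botocore's client.meta.events (register/unregister/emit)."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def register(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def unregister(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].remove(handler)

    def emit(self, event_name: str, **kwargs: Any) -> None:
        for handler in list(self._handlers[event_name]):
            handler(**kwargs)


class HookSketch:
    """Hypothetical adapter mirroring the two hooks described above."""

    def __init__(self, events: StandInEmitter) -> None:
        self._events = events
        self._seen_agents: set[str] = set()
        self.emitted: list[tuple[str, dict[str, Any]]] = []

    def _before(self, params: dict[str, Any], **_: Any) -> None:
        agent_id = params.get("agentId")
        if agent_id not in self._seen_agents:
            self._seen_agents.add(agent_id)
            self.emitted.append(("environment.config", {"agentId": agent_id}))
        fields = ("agentId", "sessionId", "inputText")
        self.emitted.append(("agent.input", {k: params.get(k) for k in fields}))

    def _after(self, parsed: dict[str, Any], **_: Any) -> None:
        # A real adapter would walk the trace blocks of parsed["completion"] here.
        self.emitted.append(("after-call", {"keys": sorted(parsed)}))

    def connect(self) -> None:
        self._events.register(BEFORE, self._before)
        self._events.register(AFTER, self._after)

    def disconnect(self) -> None:
        # Bound methods compare equal, so the same handlers are removed.
        self._events.unregister(BEFORE, self._before)
        self._events.unregister(AFTER, self._after)
```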

## Events emitted

| Event | Layer | When |
|---|---|---|
| `environment.config` | L4a | First `InvokeAgent` per `agentId`. |
| `agent.input` | L1 | Beginning of every `InvokeAgent`. |
| `agent.output` | L1 | End of every `InvokeAgent` (after stream consumption). |
| `agent.action` | L4a | Per `orchestrationTrace.modelInvocationInput` block. |
| `agent.handoff` | L4a | Per cross-agent collaboration step. |
| `tool.call` | L5a | Per `actionGroupInvocationInput` / `knowledgeBaseLookupInput` block. |
| `model.invoke` | L3 | Per `modelInvocationOutput` block (with token usage). |

## Bedrock Agents specifics

- **Action groups**: each `actionGroup` invocation maps to a `tool.call`
with `tool_name = "{actionGroupName}::{apiPath}"` and the typed
parameters in the payload.
- **Knowledge bases**: every KB lookup emits a `tool.call` with
`tool_name = "knowledge_base::{knowledgeBaseId}"`, the query text, and the
retrieved citations.
- **Multi-agent collaboration**: when a supervisor agent delegates to a
collaborator, an `agent.handoff` event is emitted with both agent IDs.
- **Session attributes**: the request's `sessionState.sessionAttributes` are
passed through into `agent.input` payloads as `session_attributes`.
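The mapping from trace blocks to events can be sketched as a pure function over the streamed events. This is an illustration, not the adapter's parser: the nested shape (`trace` → `trace` → `orchestrationTrace` → `invocationInput` / `modelInvocationInput` / `modelInvocationOutput`) reflects the Bedrock Agent Runtime trace format as we understand it and should be checked against the AWS API reference. Multi-agent handoff traces are not covered here.

```python
from typing import Any, Iterable, Iterator

Event = tuple[str, dict[str, Any]]


def map_trace_events(stream: Iterable[dict[str, Any]]) -> Iterator[Event]:
    """Yield (event_type, payload) for orchestration trace parts of an InvokeAgent stream."""
    for event in stream:
        # Trace parts look like {"trace": {"agentId": ..., "trace": {...}}}; chunks are skipped.
        orch = event.get("trace", {}).get("trace", {}).get("orchestrationTrace")
        if not orch:
            continue
        invocation = orch.get("invocationInput", {})
        if "actionGroupInvocationInput" in invocation:
            ag = invocation["actionGroupInvocationInput"]
            yield "tool.call", {
                "tool_name": f"{ag.get('actionGroupName')}::{ag.get('apiPath')}",
                "parameters": ag.get("parameters", []),
            }
        elif "knowledgeBaseLookupInput" in invocation:
            kb = invocation["knowledgeBaseLookupInput"]
            yield "tool.call", {
                "tool_name": f"knowledge_base::{kb.get('knowledgeBaseId')}",
                "query": kb.get("text"),
            }
        if "modelInvocationInput" in orch:
            yield "agent.action", {"type": orch["modelInvocationInput"].get("type")}
        if "modelInvocationOutput" in orch:
            usage = orch["modelInvocationOutput"].get("metadata", {}).get("usage", {})
            yield "model.invoke", {
                "input_tokens": usage.get("inputTokens"),
                "output_tokens": usage.get("outputTokens"),
            }
```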

## Capture config

```python
from layerlens.instrument.adapters._base import CaptureConfig
from layerlens.instrument.adapters.frameworks.bedrock_agents import BedrockAgentsAdapter

# Recommended.
adapter = BedrockAgentsAdapter(capture_config=CaptureConfig.standard())

# Compliance: drop user input/output content but keep tool/model metadata.
adapter = BedrockAgentsAdapter(
    capture_config=CaptureConfig(
        l1_agent_io=True,
        l3_model_metadata=True,
        l5a_tool_calls=True,
        capture_content=False,
    ),
)
```
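To picture what `capture_content=False` means for payloads, here is a hypothetical redaction pass: it drops content-bearing keys at any depth while keeping identifiers and token counts. The key set in `CONTENT_KEYS` and the `strip_content` function are assumptions made for this sketch; the adapter's actual rules for what counts as content may differ.

```python
from typing import Any

# Hypothetical set of keys treated as user content in this sketch.
CONTENT_KEYS = frozenset({"inputText", "outputText", "query", "citations", "parameters"})


def strip_content(value: Any, capture_content: bool) -> Any:
    """Drop content-bearing keys at any depth; keep everything else (IDs, token counts)."""
    if capture_content:
        return value
    if isinstance(value, dict):
        return {k: strip_content(v, False) for k, v in value.items() if k not in CONTENT_KEYS}
    if isinstance(value, list):
        return [strip_content(v, False) for v in value]
    return value
```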

## BYOK

Bedrock Agents usage is billed to your AWS account and authorized through
your IAM identity, so there is no separate API key to manage. The model the
agent uses is configured server-side in the agent definition.
108 changes: 108 additions & 0 deletions docs/adapters/frameworks-benchmark_import.md
# Benchmark import framework adapter

`layerlens.instrument.adapters.frameworks.benchmark_import.BenchmarkImportAdapter`
imports external benchmark datasets into Stratix evaluation spaces. Unlike
the other framework adapters, this is a **data importer**, not a runtime
instrumentation adapter — it reads benchmarks from disk or from
HuggingFace and produces normalized rows.

## Install

```bash
pip install 'layerlens[benchmark-import]'
```

The `benchmark-import` extra has no required dependencies. Install
`datasets` for the HuggingFace import path and `pyarrow` for
`import_parquet`:

```bash
pip install datasets  # HuggingFace import
pip install pyarrow   # Parquet import
```

## Quick start (CSV)

```python
from layerlens.instrument.adapters.frameworks.benchmark_import import (
    BenchmarkImportAdapter,
)

adapter = BenchmarkImportAdapter()

result = adapter.import_csv(
    path="my_benchmark.csv",
    schema_mapping={"question": "prompt", "answer": "expected_output"},
    max_records=1000,
    tags=["custom", "qa"],
)

print(f"Imported {result.records_imported} records into {result.benchmark_id}")
```

## Quick start (HuggingFace)

```python
result = adapter.import_huggingface(
    dataset_name="squad",
    split="validation",
    max_records=200,
    tags=["public", "qa"],
)
```

## Quick start (HELM)

```python
result = adapter.import_helm(
    path="/path/to/helm_results.json",
    tags=["helm", "leaderboard"],
)
```

## Public API

| Method | Description |
|---|---|
| `import_huggingface(dataset_name, split=, subset=, schema_mapping=, max_records=, tags=)` | Stream a HuggingFace dataset into Stratix. |
| `import_helm(path, tags=)` | Import HELM JSON results. |
| `import_csv(path, schema_mapping=, delimiter=, max_records=, tags=)` | Import a CSV benchmark. |
| `import_json(path, schema_mapping=, records_key=, max_records=, tags=)` | Import a JSON benchmark. |
| `import_parquet(path, schema_mapping=, max_records=, tags=)` | Import a Parquet benchmark (requires `pyarrow`). |

All methods return `ImportResult` with `success`, `benchmark_id`,
`records_imported`, `records_skipped`, `duration_ms`, `errors`, and
`metadata` (a `BenchmarkMetadata` Pydantic model).

## Schema mapping

A `schema_mapping` dict renames source columns (keys) to fields of the
canonical Stratix evaluation schema (values):

| Stratix field | Common source columns |
|---|---|
| `prompt` | `question`, `input`, `query` |
| `expected_output` | `answer`, `target`, `reference`, `ground_truth` |
| `difficulty` | `difficulty`, `level` |
| `category` | `category`, `subject`, `topic` |

When no mapping is provided, the adapter applies a small set of automatic
heuristics (case-insensitive name match against the canonical fields).
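The mapping and its fallback heuristic can be sketched with the standard `csv` module. This is not the adapter's code: `resolve_mapping` and `normalize_csv` are hypothetical helpers, the fallback implements only the case-insensitive match against canonical field names described above, and the rule that rows without a `prompt` are skipped is an assumption made to show how a `records_skipped` count could arise.

```python
import csv
import io
from typing import Optional

CANONICAL_FIELDS = ("prompt", "expected_output", "difficulty", "category")


def resolve_mapping(columns: list[str], schema_mapping: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Return {source_column: canonical_field}."""
    if schema_mapping:
        return dict(schema_mapping)
    # Fallback heuristic: case-insensitive match of column names to canonical fields.
    by_lower = {column.lower(): column for column in columns}
    return {by_lower[f]: f for f in CANONICAL_FIELDS if f in by_lower}


def normalize_csv(
    text: str,
    schema_mapping: Optional[dict[str, str]] = None,
    max_records: Optional[int] = None,
) -> tuple[list[dict[str, str]], int]:
    """Return (normalized rows, number of skipped rows)."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = resolve_mapping(list(reader.fieldnames or []), schema_mapping)
    rows: list[dict[str, str]] = []
    skipped = 0
    for raw in reader:
        if max_records is not None and len(rows) >= max_records:
            break
        row = {canonical: raw[source] for source, canonical in mapping.items() if source in raw}
        if not row.get("prompt"):
            skipped += 1  # assumption: rows without a prompt are skipped
            continue
        rows.append(row)
    return rows, skipped
```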

## Persistence

If you pass a `store=` argument to `BenchmarkImportAdapter(...)` (any object
that exposes `save_benchmark(metadata, records)`), the adapter writes
imported benchmarks through it. Otherwise records are returned to the
caller and held in `adapter._benchmarks` keyed by `benchmark_id`.
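Any object with a matching `save_benchmark(metadata, records)` method should work as a store. The sketch below is a hypothetical SQLite-backed store built only on the standard library; `BenchmarkStore` and `SqliteBenchmarkStore` are illustrative names, and the assumption that `records` is a list of dicts is ours. Because `metadata` is documented as a Pydantic `BenchmarkMetadata` model, the sketch calls its `model_dump()` (Pydantic v2) when present and falls back to `dict(...)` otherwise.

```python
import json
import sqlite3
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BenchmarkStore(Protocol):
    """Assumed shape of the `store=` object: one save_benchmark method."""

    def save_benchmark(self, metadata: Any, records: list[dict[str, Any]]) -> None: ...


class SqliteBenchmarkStore:
    """Hypothetical store that persists each record as a JSON row in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS records (metadata TEXT, record TEXT)")

    def save_benchmark(self, metadata: Any, records: list[dict[str, Any]]) -> None:
        # Pydantic v2 models expose model_dump(); plain dicts are used as-is.
        dump = getattr(metadata, "model_dump", None)
        meta = dump() if callable(dump) else dict(metadata)
        meta_json = json.dumps(meta, default=str)  # default=str handles datetimes etc.
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO records (metadata, record) VALUES (?, ?)",
                [(meta_json, json.dumps(r, default=str)) for r in records],
            )
```

It would be passed as `BenchmarkImportAdapter(store=SqliteBenchmarkStore("benchmarks.db"))`.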

## Events emitted

This adapter does not emit telemetry events — it produces benchmark rows.
Once stored in atlas-app, the platform's evaluation runner can iterate the
benchmark and produce `model.invoke` / `evaluation.score` events through
the standard provider adapters.

## BYOK

Not applicable. The adapter reads files locally or downloads from
HuggingFace using the standard `datasets` library — no model API keys are
involved.