refactor: extract common base for purple-agent adapters (#40) #51

Merged
bordeauxred merged 2 commits into aganthos:main from kiranannadatha8:40-purple-base on Apr 20, 2026
@kiranannadatha8
Contributor

Summary

  • Moves shared scaffolding into _purple_base.py so CAR and entropic adapters only carry their bench-specific prompt/envelope logic.
  • Net −221 LOC across the three files; _convert_tools_to_openai and friends live in one place now.
  • No behavior change — same wire format on both sides.

Test plan

  • pytest tests/test_car_purple.py tests/test_entropic_purple.py tests/test_car_adapter.py tests/test_entropic_adapter.py — 52/52 green
  • Full suite: 958 passed, 49 skipped
  • ruff check --select E,F,I + ruff format --check clean

Closes #40.

…aganthos#40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.
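
A minimal sketch of the shape this commit describes, assuming a template-method layout: a base class owns the shared flow and subclasses override only `_build_initial_messages` and `_format_a2a_response`. The class names `PurpleBaseSketch` and `DemoAgent`, the `handle` method, the echo reply, and the return shapes are illustrative assumptions, not the code in `_purple_base.py`.

```python
from abc import ABC, abstractmethod


class PurpleBaseSketch(ABC):
    """Hypothetical stand-in for the shared base; not the real _PurpleAgentBase."""

    def __init__(self) -> None:
        # Per-context conversation history, keyed by context id.
        self._sessions: dict[str, list[dict]] = {}

    @abstractmethod
    def _build_initial_messages(self, text_parts: list[str]) -> list[dict]:
        """Bench-specific seam: turn the first request into chat messages."""

    @abstractmethod
    def _format_a2a_response(self, assistant_text: str) -> dict:
        """Bench-specific seam: wrap the reply in the bench's envelope."""

    def handle(self, context_id: str, text_parts: list[str]) -> dict:
        # Shared flow: session lookup, first-turn prompt, reply, envelope.
        messages = self._sessions.setdefault(context_id, [])
        if not messages:
            messages.extend(self._build_initial_messages(text_parts))
        else:
            messages.extend({"role": "user", "content": t} for t in text_parts)
        # Placeholder for the LLM call so the sketch runs offline.
        reply = f"echo: {messages[-1]['content']}"
        messages.append({"role": "assistant", "content": reply})
        return self._format_a2a_response(reply)


class DemoAgent(PurpleBaseSketch):
    """Hypothetical adapter that only fills in the two seams."""

    def _build_initial_messages(self, text_parts: list[str]) -> list[dict]:
        return [
            {"role": "system", "content": "You are a demo agent."},
            {"role": "user", "content": "\n".join(text_parts)},
        ]

    def _format_a2a_response(self, assistant_text: str) -> dict:
        return {"parts": [{"kind": "text", "text": assistant_text}]}
```

With this layout, a new bench adapter would need only its own prompt builder and envelope formatter; everything else stays in one place.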

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a shared base class, _PurpleAgentBase, to consolidate common logic for A2A purple agents, significantly reducing code duplication in the CAR and Entropic agent implementations. Feedback highlights a logic error in _reconcile_tool_call_id where an early return prematurely stops the search for assistant messages, a lack of error handling for LLM completion calls which could lead to inconsistent session states, and a suggestion to move imports to the top of the file for better performance and idiomatic structure.

Comment thread clawloop/environments/_purple_base.py
Comment on lines +153 to +209
    def handle_message_sync(self, jsonrpc_request: dict) -> dict:
        """Handle one ``message/send`` request (sync — litellm is sync)."""
        params = jsonrpc_request["params"]
        msg = params["message"]
        context_id = params.get("contextId", "default")

        text_parts = [p["text"] for p in msg["parts"] if p.get("kind") == "text"]
        data_parts = [p["data"] for p in msg["parts"] if p.get("kind") == "data"]

        if context_id not in self._sessions:
            self._sessions[context_id] = []
        messages = self._sessions[context_id]

        if not messages:
            messages.extend(self._build_initial_messages(text_parts))
            for d in data_parts:
                if "tools" in d:
                    self._tool_cache[context_id] = self._convert_tools_to_openai(d["tools"])
        else:
            for d in data_parts:
                if "tool_results" in d:
                    for tr in d["tool_results"]:
                        green_id = tr["tool_call_id"]
                        tool_name = tr.get("tool_name", "")
                        self._reconcile_tool_call_id(messages, tool_name, green_id)
                        messages.append(
                            {
                                "role": "tool",
                                "tool_call_id": green_id,
                                "content": tr["content"],
                            }
                        )
            for text in text_parts:
                if text.strip():
                    messages.append({"role": "user", "content": text})

        tools = self._tool_cache.get(context_id)
        completion_kwargs: dict[str, Any] = {
            "model": self.model,
            "messages": messages,
            "temperature": 0.0,
        }
        if tools:
            completion_kwargs["tools"] = tools
        if self.api_base:
            completion_kwargs["api_base"] = self.api_base
        if self.api_key:
            completion_kwargs["api_key"] = self.api_key

        response = litellm.completion(**completion_kwargs)
        assistant_msg = response.choices[0].message

        normalized = self._normalize_assistant_msg(assistant_msg)
        messages.append(normalized)
        self._capture_assistant(context_id, normalized)

        return self._format_a2a_response(assistant_msg)
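
For orientation, here is a sketch of requests that the handler above could parse, with the same part-splitting applied to them. The keys `params`, `message`, `parts`, `kind`, `text`, `data`, `contextId`, `tools`, `tool_results`, `tool_call_id`, `tool_name`, and `content` come from the quoted code; the `jsonrpc`, `id`, and `method` fields follow JSON-RPC 2.0, and the helper name `split_parts` and all payload values are made up for illustration.

```python
# First turn: task text plus a data part carrying the tool list.
first_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "message/send",
    "params": {
        "contextId": "ctx-1",
        "message": {
            "parts": [
                {"kind": "text", "text": "Book a table for two."},
                {"kind": "data", "data": {"tools": []}},
            ]
        },
    },
}

# Later turn: a data part carrying results for earlier tool calls.
follow_up_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "message/send",
    "params": {
        "contextId": "ctx-1",
        "message": {
            "parts": [
                {
                    "kind": "data",
                    "data": {
                        "tool_results": [
                            {
                                "tool_call_id": "green-1",
                                "tool_name": "search",
                                "content": "3 tables free",
                            }
                        ]
                    },
                }
            ]
        },
    },
}


def split_parts(jsonrpc_request: dict) -> tuple[str, list[str], list[dict]]:
    """Mirror the handler's parsing: context id, text parts, data parts."""
    params = jsonrpc_request["params"]
    parts = params["message"]["parts"]
    # A missing contextId falls back to the shared "default" session.
    context_id = params.get("contextId", "default")
    text_parts = [p["text"] for p in parts if p.get("kind") == "text"]
    data_parts = [p["data"] for p in parts if p.get("kind") == "data"]
    return context_id, text_parts, data_parts
```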

medium

The handle_message_sync method lacks error handling for the litellm.completion call. Network issues, API timeouts, or model-specific errors will raise uncaught exceptions, potentially leaving the session state (self._sessions) in an inconsistent state since messages are appended before the call. Consider wrapping the completion call in a try-except block to handle common LLM provider errors.
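
One way the concern above could be addressed is sketched below, assuming a simple "checkpoint and truncate" rollback. This is not the project's design: the helper name `complete_with_rollback` and the injected `complete` callable (standing in for the `litellm.completion` call) are hypothetical, and the sketch does not undo in-place edits such as tool-call id reconciliation.

```python
from typing import Callable


def complete_with_rollback(
    messages: list[dict],
    new_messages: list[dict],
    complete: Callable[[list[dict]], dict],
) -> dict:
    """Append new_messages, call complete, and undo the append on failure."""
    checkpoint = len(messages)
    messages.extend(new_messages)
    try:
        assistant = complete(messages)
    except Exception:
        # Truncate in place so every holder of this session list sees the rollback.
        del messages[checkpoint:]
        raise
    messages.append(assistant)
    return assistant
```

Re-raising keeps the failure visible to the caller, which can then decide how to report it over JSON-RPC; the session simply looks as if the failed turn never arrived.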

Contributor Author


Out of scope for a pure refactor — worth a separate issue to design the fail-up / session-rollback semantics.

Comment thread clawloop/environments/_purple_base.py
@bordeauxred
Contributor

Hi @kiranannadatha8,
thanks a lot for your PR! Please address the comments from Gemini.

Addresses Gemini review on aganthos#51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.
@kiranannadatha8
Contributor Author

Thanks for the review! Addressed in aabec30:

  • Imports: hoisted socket, time, httpx to module top.
  • Early return in _reconcile_tool_call_id: kept as-is (pre-existing behavior — tool results always follow their triggering assistant call, so walking older messages would reconcile against stale ids). Expanded the docstring + inline comment so the intent is clear.
  • litellm.completion error handling: also pre-existing. Out of scope for a pure refactor — worth a separate issue to design the fail-up / session-rollback semantics.
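
To illustrate the reasoning behind the early return, here is a rough sketch of a reconciliation pass that stops at the most recent assistant message. Only the argument order `(messages, tool_name, green_id)` comes from the quoted call site; the function body, the boolean return value, the `answered` bookkeeping, and the OpenAI-style `tool_calls` layout are assumptions and may differ from the real `_reconcile_tool_call_id`.

```python
def reconcile_tool_call_id(messages: list[dict], tool_name: str, green_id: str) -> bool:
    """Rewrite one tool-call id in the latest assistant message to green_id.

    Returns True if an id was rewritten. Older assistant messages are never
    inspected, so stale ids from earlier turns cannot be matched by mistake.
    """
    answered: set[str] = set()  # ids that already have a tool result after them
    for msg in reversed(messages):
        role = msg.get("role")
        if role == "tool":
            answered.add(msg["tool_call_id"])
            continue
        if role != "assistant":
            continue
        calls = msg.get("tool_calls") or []
        if any(call["id"] == green_id for call in calls):
            return False  # ids already agree, nothing to rewrite
        for call in calls:
            if call["function"]["name"] == tool_name and call["id"] not in answered:
                call["id"] = green_id
                return True
        # Intentional early return: do not walk into older assistant messages.
        return False
    return False
```

The caller is expected to append the matching `tool` message right after each call, as the handler under review does, so that repeated calls to the same tool are reconciled in order.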

@bordeauxred
Contributor

bordeauxred commented Apr 20, 2026

Tested locally against real Gemini (gemini-2.0-flash-lite via litellm):

  • CarPurpleAgent: full HTTP path through start_purple_server → agent-card + JSON-RPC message/send → live completion → correct {message: {...}} envelope; _captured hook fires.
  • EntropicPurpleAgent: handle_message_sync with JSON task payload → _format_crm_task + harness injection → correct flat {kind: "message", ...} envelope; session ordering preserved.

Both bench-specific quirks preserved end-to-end. Refactor LGTM, merging. Thanks @kiranannadatha8
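
The two envelope shapes named in the test notes above can be pictured roughly as follows. Only the outer shapes (`{message: {...}}` versus a flat `{kind: "message", ...}`) come from the comment; the function names and the `role` and `parts` fields are assumptions for illustration, not the adapters' actual output.

```python
def nested_envelope(text: str) -> dict:
    """CAR-style shape as described: the message sits under a 'message' key."""
    return {
        "message": {
            "role": "agent",  # assumed field
            "parts": [{"kind": "text", "text": text}],  # assumed field
        }
    }


def flat_envelope(text: str) -> dict:
    """Entropic-style shape as described: the message itself is the top level."""
    return {
        "kind": "message",
        "role": "agent",  # assumed field
        "parts": [{"kind": "text", "text": text}],  # assumed field
    }
```

Keeping both formatters behind the single `_format_a2a_response` seam is what lets each bench keep its own wire format while sharing the rest of the handler.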

@bordeauxred bordeauxred merged commit 2e91e32 into aganthos:main Apr 20, 2026
5 checks passed
dantp-ai pushed a commit to dantp-ai/clawloop that referenced this pull request Apr 24, 2026
…aganthos#51)

* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes aganthos#40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.

* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope

Addresses Gemini review on aganthos#51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.
bordeauxred pushed a commit that referenced this pull request May 3, 2026
* feat: Weights & Biases sink integration

* refactor: extract common base for purple-agent adapters (#40) (#51)

* refactor: extract _PurpleAgentBase for CAR + entropic adapters — fixes #40

Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg
normalization, session state, tool-call id reconciliation, harness update)
into clawloop/environments/_purple_base.py. CAR and entropic adapters now
override only the two bench-specific seams: _build_initial_messages and
_format_a2a_response.

No behavior change. 600 lines deleted, 379 added across the three files.

* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope

Addresses Gemini review on #51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why
  it intentionally stops at the most-recent assistant message.

No behavior change.

* refactor: extract helpers from learning_loop — fixes #39 (#58)

Split the 486-line learning_loop() god-function into three focused
helper classes plus five module-private glue functions.

New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner — task sampling
    + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder — owns run-level
    counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction — two-phase
    fb→optim→rollback protocol with cross-layer rollback invariant)

learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.

Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can
    query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in
    ArchiveRecorder so total_cost_tokens is tracked even when
    log_iteration fails.
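
The second point can be illustrated with a small sketch, assuming a recorder that takes its archive writer as a callable. `total_cost_tokens`, `iter_cost`, and `log_iteration` are names from the commit message; the class name `ArchiveRecorderSketch`, the `record` method, and the warning-only error handling are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class ArchiveRecorderSketch:
    """Hypothetical stand-in showing cost accounting outside the try-block."""

    def __init__(self, log_iteration: Callable[[int], None]) -> None:
        self._log_iteration = log_iteration
        self.total_cost_tokens = 0

    def record(self, iter_cost: int) -> None:
        # Accumulate first, so a failing archive write cannot lose the cost.
        self.total_cost_tokens += iter_cost
        try:
            self._log_iteration(iter_cost)
        except Exception:
            logger.warning("archive write failed; cost still counted", exc_info=True)
```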

No public API change on learning_loop. Full suite unchanged
(961 passed / 42 skipped) plus 28 new unit tests covering the
extracted helpers (runner 97%, archive_recorder 92%,
transaction 91% coverage).

* fix: sync log_iteration with self._step

* Ignore wandb dir

---------

Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>
bordeauxred pushed a commit that referenced this pull request May 22, 2026
bordeauxred pushed a commit that referenced this pull request May 22, 2026

Development

Successfully merging this pull request may close these issues.

Refactor: extract common base for purple-agent adapters (CAR + entropic)

2 participants