refactor: extract common base for purple-agent adapters (#40) by kiranannadatha8 · Pull Request #51 · aganthos/clawloop

kiranannadatha8 · 2026-04-19T21:54:03Z

Summary

Moves shared scaffolding into _purple_base.py so CAR and entropic adapters only carry their bench-specific prompt/envelope logic.
Net −221 LOC across the three files; _convert_tools_to_openai and friends live in one place now.
No behavior change — same wire format on both sides.

Test plan

pytest tests/test_car_purple.py tests/test_entropic_purple.py tests/test_car_adapter.py tests/test_entropic_adapter.py — 52/52 green
Full suite: 958 passed, 49 skipped
ruff check --select E,F,I + ruff format --check clean

Closes #40.

…aganthos#40 Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files.
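
For readers who have not opened the diff, the split described in this commit message can be pictured with the sketch below. It is illustrative only: the class names, the `_complete` stub, the placeholder prompt, and the placeholder response shape are all assumptions, not the contents of `_purple_base.py`. Only the two hook names, `_build_initial_messages` and `_format_a2a_response`, come from the commit message.

```python
from abc import ABC, abstractmethod
from typing import Any

Message = dict[str, Any]


class PurpleBaseSketch(ABC):
    """Hypothetical stand-in for the shared base class (not the real one)."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[Message]] = {}

    def handle(self, context_id: str, text_parts: list[str]) -> dict[str, Any]:
        # Shared flow: look up the session, seed or extend it, call the
        # model, record the reply, then hand off to the bench-specific hook.
        messages = self._sessions.setdefault(context_id, [])
        if not messages:
            messages.extend(self._build_initial_messages(text_parts))
        else:
            messages.extend(
                {"role": "user", "content": t} for t in text_parts if t.strip()
            )
        reply = self._complete(messages)
        messages.append(reply)
        return self._format_a2a_response(reply)

    def _complete(self, messages: list[Message]) -> Message:
        # Assumption: a stub that echoes the last message instead of an LLM call.
        return {"role": "assistant", "content": f"echo: {messages[-1]['content']}"}

    @abstractmethod
    def _build_initial_messages(self, text_parts: list[str]) -> list[Message]: ...

    @abstractmethod
    def _format_a2a_response(self, assistant_msg: Message) -> dict[str, Any]: ...


class DemoAgent(PurpleBaseSketch):
    """Hypothetical adapter: only the two seams are overridden."""

    def _build_initial_messages(self, text_parts: list[str]) -> list[Message]:
        return [
            {"role": "system", "content": "placeholder bench prompt"},
            {"role": "user", "content": "\n".join(text_parts)},
        ]

    def _format_a2a_response(self, assistant_msg: Message) -> dict[str, Any]:
        # Placeholder shape; the real adapters emit bench-specific envelopes.
        return {"reply": assistant_msg["content"]}
```

With this shape, a new bench adapter would only need to supply its own prompt construction and response envelope, which matches the stated goal of the refactor.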

gemini-code-assist

Code Review

This pull request introduces a shared base class, _PurpleAgentBase, to consolidate common logic for A2A purple agents, significantly reducing code duplication in the CAR and Entropic agent implementations. Feedback highlights a logic error in _reconcile_tool_call_id where an early return prematurely stops the search for assistant messages, a lack of error handling for LLM completion calls which could lead to inconsistent session states, and a suggestion to move imports to the top of the file for better performance and idiomatic structure.

gemini-code-assist · 2026-04-19T21:59:39Z

+    def handle_message_sync(self, jsonrpc_request: dict) -> dict:
+        """Handle one ``message/send`` request (sync — litellm is sync)."""
+        params = jsonrpc_request["params"]
+        msg = params["message"]
+        context_id = params.get("contextId", "default")
+
+        text_parts = [p["text"] for p in msg["parts"] if p.get("kind") == "text"]
+        data_parts = [p["data"] for p in msg["parts"] if p.get("kind") == "data"]
+
+        if context_id not in self._sessions:
+            self._sessions[context_id] = []
+        messages = self._sessions[context_id]
+
+        if not messages:
+            messages.extend(self._build_initial_messages(text_parts))
+            for d in data_parts:
+                if "tools" in d:
+                    self._tool_cache[context_id] = self._convert_tools_to_openai(d["tools"])
+        else:
+            for d in data_parts:
+                if "tool_results" in d:
+                    for tr in d["tool_results"]:
+                        green_id = tr["tool_call_id"]
+                        tool_name = tr.get("tool_name", "")
+                        self._reconcile_tool_call_id(messages, tool_name, green_id)
+                        messages.append(
+                            {
+                                "role": "tool",
+                                "tool_call_id": green_id,
+                                "content": tr["content"],
+                            }
+                        )
+            for text in text_parts:
+                if text.strip():
+                    messages.append({"role": "user", "content": text})
+
+        tools = self._tool_cache.get(context_id)
+        completion_kwargs: dict[str, Any] = {
+            "model": self.model,
+            "messages": messages,
+            "temperature": 0.0,
+        }
+        if tools:
+            completion_kwargs["tools"] = tools
+        if self.api_base:
+            completion_kwargs["api_base"] = self.api_base
+        if self.api_key:
+            completion_kwargs["api_key"] = self.api_key
+
+        response = litellm.completion(**completion_kwargs)
+        assistant_msg = response.choices[0].message
+
+        normalized = self._normalize_assistant_msg(assistant_msg)
+        messages.append(normalized)
+        self._capture_assistant(context_id, normalized)
+
+        return self._format_a2a_response(assistant_msg)


The handle_message_sync method lacks error handling for the litellm.completion call. Network issues, API timeouts, or model-specific errors will raise uncaught exceptions, potentially leaving the session state (self._sessions) in an inconsistent state since messages are appended before the call. Consider wrapping the completion call in a try-except block to handle common LLM provider errors.

Out of scope for a pure refactor — worth a separate issue to design the fail-up / session-rollback semantics.
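
As a starting point for that separate issue, one possible rollback shape is sketched below. This is an assumption about what the semantics could look like, not a proposal taken from the PR: `complete_with_rollback` and its `complete` callable are hypothetical names, and the callable stands in for `litellm.completion` so the sketch stays self-contained.

```python
from typing import Any, Callable

Message = dict[str, Any]


def complete_with_rollback(
    messages: list[Message],
    new_messages: list[Message],
    complete: Callable[[list[Message]], Message],
) -> Message:
    """Append new_messages, call the model, and undo the append on failure."""
    checkpoint = len(messages)
    messages.extend(new_messages)
    try:
        reply = complete(messages)
    except Exception:
        # Restore the history to its pre-request length, then re-raise so the
        # caller can decide how to report the failure.
        del messages[checkpoint:]
        raise
    messages.append(reply)
    return reply
```

Truncating back to a checkpoint only undoes appended messages. In-place edits to earlier messages, such as the id rewrite done by `_reconcile_tool_call_id`, and the tool cache populated on the first turn would need their own handling.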

bordeauxred · 2026-04-19T22:08:22Z

Hi @kiranannadatha8,
thanks a lot for your PR! Please address the comments from Gemini.

Addresses Gemini review on aganthos#51:
- Move socket, time, httpx to module-level imports (PEP 8).
- Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message.
No behavior change.

kiranannadatha8 · 2026-04-19T23:28:57Z

Thanks for the review! Addressed in aabec30:

Imports: hoisted socket, time, httpx to module top.
Early return in _reconcile_tool_call_id: kept as-is (pre-existing behavior — tool results always follow their triggering assistant call, so walking older messages would reconcile against stale ids). Expanded the docstring + inline comment so the intent is clear.
litellm.completion error handling: also pre-existing. Out of scope for a pure refactor — worth a separate issue to design the fail-up / session-rollback semantics.
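
To make the early-return argument concrete, here is a minimal sketch of a reconcile step that stops at the most recent assistant message. It is not the code from `_purple_base.py`: the function body, the return value, and the OpenAI-style `tool_calls` layout are assumptions made for illustration.

```python
from typing import Any

Message = dict[str, Any]


def reconcile_tool_call_id(messages: list[Message], tool_name: str, green_id: str) -> bool:
    """Align a tool-call id in the latest assistant message with green_id.

    Returns True if the id already matched or was rewritten, False otherwise.
    """
    for msg in reversed(messages):
        if msg.get("role") != "assistant":
            continue
        calls = msg.get("tool_calls") or []
        if any(call["id"] == green_id for call in calls):
            return True  # already consistent, nothing to rewrite
        for call in calls:
            if call["function"]["name"] == tool_name:
                call["id"] = green_id
                return True
        # Stop here even without a match: older assistant turns belong to
        # earlier tool rounds, so their ids are left untouched.
        return False
    return False
```

This sketch rewrites the first call whose name matches, so it does not cover several calls to the same tool within one assistant turn.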

bordeauxred · 2026-04-20T23:45:42Z

Tested locally against real Gemini (gemini-2.0-flash-lite via litellm):

CarPurpleAgent — full HTTP path through start_purple_server → agent-card + JSON-RPC message/send → live completion → correct {message: {...}} envelope; _captured hook fires.
EntropicPurpleAgent — handle_message_sync with JSON task payload → _format_crm_task + harness injection → correct flat {kind: "message", ...} envelope; session ordering preserved.
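
The two envelope shapes mentioned above differ only in where the message fields sit. The sketch below illustrates that difference. The function names and the inner `role` and `parts` fields are assumptions, and the surrounding JSON-RPC wrapper is omitted; only the nested `{message: {...}}` and flat `{kind: "message", ...}` outlines come from the test notes.

```python
from typing import Any


def car_envelope(text: str) -> dict[str, Any]:
    """Nested shape: the message sits under a "message" key."""
    return {
        "message": {
            "role": "agent",
            "parts": [{"kind": "text", "text": text}],
        }
    }


def entropic_envelope(text: str) -> dict[str, Any]:
    """Flat shape: the message fields sit at the top level, tagged by "kind"."""
    return {
        "kind": "message",
        "role": "agent",
        "parts": [{"kind": "text", "text": text}],
    }
```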

Both bench-specific quirks preserved end-to-end. Refactor LGTM, merging. Thanks @kiranannadatha8

* feat: Weights & Biases sink integration
* refactor: extract common base for purple-agent adapters (#40) (#51)
* refactor: extract _PurpleAgentBase for CAR + entropic adapters - fixes #40
  Pulls the shared A2A scaffolding (tool schema conversion, assistant-msg normalization, session state, tool-call id reconciliation, harness update) into clawloop/environments/_purple_base.py. CAR and entropic adapters now override only the two bench-specific seams: _build_initial_messages and _format_a2a_response. No behavior change. 600 lines deleted, 379 added across the three files.
* refactor: hoist stdlib imports in _purple_base, clarify reconcile scope
  Addresses Gemini review on #51:
  - Move socket, time, httpx to module-level imports (PEP 8).
  - Expand docstring + comment on _reconcile_tool_call_id to explain why it intentionally stops at the most-recent assistant message.
  No behavior change.
* refactor: extract helpers from learning_loop - fixes #39 (#58)
  Split the 486-line learning_loop() god-function into three focused helper classes plus five module-private glue functions.
  New modules:
  - clawloop/core/runner.py (EpisodeCollectorRunner: task sampling + 3-way adapter dispatch)
  - clawloop/core/archive_recorder.py (ArchiveRecorder: owns run-level counters + all archive writes)
  - clawloop/core/transaction.py (LayerTransaction: two-phase fb→optim→rollback protocol with cross-layer rollback invariant)
  learning_loop() body: 486 → 100 lines. loop.py total: 742 → 337 lines.
  Also:
  - Add Harness.pending_paradigm_insights() so LayerTransaction can query paradigm-tagged insights without touching _pending directly.
  - Move iter_cost accumulation outside the archive try-block in ArchiveRecorder so total_cost_tokens is tracked even when log_iteration fails.
  No public API change on learning_loop. Full suite unchanged (961 passed / 42 skipped) plus 28 new unit tests covering the extracted helpers (runner 97%, archive_recorder 92%, transaction 91% coverage).
* fix: sync log_iteration with self._step
* Ignore wandb dir
---------
Co-authored-by: kiranannadatha8 <87536091+kiranannadatha8@users.noreply.github.com>

bordeauxred merged commit 2e91e32 into aganthos:main Apr 20, 2026
5 checks passed

bordeauxred merged 2 commits into aganthos:main from kiranannadatha8:40-purple-base

2 participants
