Summarization Agent #170

jjasinsk-amd · 2026-01-12T09:50:03Z

Changes

This PR introduces a summarization agent and exposes its endpoint via the MCP bridge.

Features

MCP Bridge Integration: Data is sent as multipart/form-data to leverage OCR capabilities from GAIA and Lemonade Server.
Streaming Responses: Summarization results are streamed back to the frontend in real time.
Iterative Summarization: The summary is produced iteratively, chunk by chunk, to minimize time to first token (TTFT).
PDF Text Caching: Extracted text from PDF files is cached for improved performance.
Preservation of Existing Features: Original features of the summarization app (transcripts, e-mail summarization) remain functional.
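
For readers unfamiliar with the iterative approach, here is a minimal sketch of the general idea, not the PR's implementation. The `summarize_chunk` callable is a hypothetical stand-in for the LLM call, and the character-based chunking and `max_chars` default are assumptions made for illustration. Yielding after each chunk is what lets a caller start streaming before the whole document has been processed.

```python
from typing import Callable, Iterator, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def iterative_summary(
    text: str,
    summarize_chunk: Callable[[str, str], str],
    max_chars: int = 4000,
) -> Iterator[str]:
    """Yield a refined summary after each chunk so output can start early."""
    summary = ""
    for chunk in split_into_chunks(text, max_chars):
        # Hypothetical LLM call: receives the running summary and the next chunk.
        summary = summarize_chunk(summary, chunk)
        yield summary
```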

kovtcharov-amd · 2026-01-12T09:52:25Z

@claude review this PR.

github-actions · 2026-01-12T09:53:07Z

Claude finished @kovtcharov-amd's task

PR Review: Summarization Agent

Read PR diff and understand changes
Review src/gaia/agents/summarize/agent.py
Review src/gaia/mcp/mcp_bridge.py changes
Review test files
Review other changed files
Provide comprehensive feedback

Summary

This PR introduces a well-structured SummarizerAgent that consolidates summarization functionality into a reusable agent, with MCP bridge integration for streaming summarization. The overall architecture is good, but there are several issues that should be addressed before merging.

Issues to Address

Critical

Copyright Header Year Inconsistency (src/gaia/agents/summarize/agent.py:1, prompts.yml:1)
- Files use 2025-2026 but project standard is 2024-2025
- Should match the pattern in CLAUDE.md: Copyright(C) 2024-2025 Advanced Micro Devices, Inc.
Silenced Exceptions Hide Errors (src/gaia/agents/summarize/agent.py:96-101, 203-204, 336, 371-372, 375-376, 676-678)
- Multiple bare except Exception: blocks with pass or break hide potential errors
- Example at line 336: except Exception: just breaks silently, making debugging difficult
- At minimum, log warnings instead of silently ignoring errors
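
One possible shape for the logging suggestion, as a sketch only: the generator below stops on error just as a silent `break` would, but records a warning with the traceback first. The logger name and the `safe_stream` helper are hypothetical and do not appear in the PR.

```python
import logging
from typing import Iterable, Iterator

log = logging.getLogger("summarizer_sketch")  # hypothetical logger name


def safe_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield chunks, logging a warning if the underlying stream fails."""
    try:
        for chunk in chunks:
            yield chunk
    except Exception as exc:
        # Streaming still ends here, but the failure is now visible in the logs.
        log.warning("Streaming stopped early: %s", exc, exc_info=True)
```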

High Priority

Potential Security: Temp File Cleanup Race Condition (src/gaia/mcp/mcp_bridge.py:402-409)
```
with tempfile.NamedTemporaryFile(delete=False, suffix=ext or ".pdf") as tmpfile:
    buf = file_rec.get("file_object")
    buf.seek(0)
    shutil.copyfileobj(buf, tmpfile)
    tmpfile_path = tmpfile.name
```
- Temp file created with delete=False and cleaned up later - ensure cleanup happens in all error paths
- Consider using a context manager pattern that guarantees cleanup
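
A sketch of what such a context manager could look like, assuming the uploaded file is available as a binary file-like object. The `temp_copy` name is hypothetical; the cleanup runs in a `finally` block, so the file is removed whether the body succeeds or raises.

```python
import contextlib
import os
import shutil
import tempfile
from typing import BinaryIO, Iterator


@contextlib.contextmanager
def temp_copy(buf: BinaryIO, suffix: str = ".pdf") -> Iterator[str]:
    """Copy buf to a temporary file and delete it on exit, even on error."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmpfile:
            buf.seek(0)
            shutil.copyfileobj(buf, tmpfile)
        yield path
    finally:
        # Runs on success and on any exception raised inside the with-block.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

Callers would then write `with temp_copy(buf, suffix=ext or ".pdf") as tmpfile_path:` and do all processing inside the block.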
Missing Style Validation in Streaming (src/gaia/agents/summarize/agent.py:430)
- summarize_stream validates style but _execute_summarize in mcp_bridge accepts any style string
- Invalid styles could cause runtime errors during streaming
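
A minimal sketch of validating the style at the bridge boundary, before streaming starts. The style names in `ALLOWED_STYLES` are placeholders for illustration; the real set would come from the PR's prompt configuration, and `validate_style` is a hypothetical helper.

```python
from typing import Iterable

# Placeholder style names; the actual values are defined by the PR's prompt config.
ALLOWED_STYLES = frozenset({"brief", "detailed", "bullets"})


def validate_style(style: str, allowed: Iterable[str] = ALLOWED_STYLES) -> str:
    """Return the normalized style, or raise ValueError before any streaming begins."""
    allowed_set = set(allowed)
    normalized = style.strip().lower()
    if normalized not in allowed_set:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(allowed_set)}"
        )
    return normalized
```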
Hardcoded Model Default (src/gaia/agents/summarize/agent.py:109)
```
DEFAULT_MODEL = "Qwen3-4B-Instruct-2507-GGUF"
```
- This differs from the project default Qwen2.5-0.5B-Instruct-CPU mentioned in CLAUDE.md
- Consider using from gaia.llm.lemonade_client import DEFAULT_MODEL_NAME for consistency

Medium Priority

Token Counting is Very Approximate (src/gaia/agents/summarize/agent.py:29-37)
```
def count_tokens(self, text: str) -> int:
    chars = len(text)
    words = len(text.split())
    est_by_chars = chars // 4
    est_by_words = int(words * 1.3)
    num_tokens = max(est_by_chars, est_by_words)
```
- This approximation may lead to context window overflows or under-utilization
- Consider adding a safety margin or documenting the limitation
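
One way to add a safety margin on top of the same heuristic, as a sketch. The 20% default and the `estimate_tokens` name are assumptions for illustration, not values taken from the PR; the estimate is rounded up so it errs toward over-counting.

```python
import math


def estimate_tokens(text: str, safety_margin: float = 0.2) -> int:
    """Heuristic token estimate, inflated by safety_margin and rounded up."""
    est_by_chars = len(text) // 4               # roughly 4 characters per token
    est_by_words = int(len(text.split()) * 1.3)  # roughly 1.3 tokens per word
    base = max(est_by_chars, est_by_words)
    return math.ceil(base * (1 + safety_margin))
```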

Verbose Logging During Iteration (src/gaia/agents/summarize/agent.py:319-321)
```
self.log.info(
    f"[LLM Prompt - chunk {i+1}/{len(chunks)}] {base_prompt}..."
)
```
- Logs the entire prompt which can be very large - truncate in log message
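
A small helper along these lines could bound the log size. The name `truncate_for_log` and the 200-character default are hypothetical choices for this sketch.

```python
def truncate_for_log(text: str, limit: int = 200) -> str:
    """Return text unchanged if short, else its first `limit` chars plus a count of the rest."""
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... [{len(text) - limit} more chars]"
```

The log call would then pass `truncate_for_log(base_prompt)` instead of the full prompt.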
Repeated Code in Fallback Path (src/gaia/agents/summarize/agent.py:336-372)
- The fallback logic in _iterative_summary_events duplicates the main streaming logic
- Consider extracting to a helper method

Low Priority / Suggestions

Missing Type Hints in Test File (tests/test_summarizer.py)
- Test methods could benefit from return type hints for consistency
Test File Date in Future (tests/test_summarizer.py:233)
```
"Document generated for testing purposes - Date: 2026-01-08",
```
- Hardcoded future date - consider using dynamic date or removing
Redundant Path Conversion (src/gaia/agents/summarize/agent.py:838-840)
```
if not isinstance(file_path, Path):
    file_path = Path(file_path)
```
- This pattern appears multiple times - consider a decorator or single entry point validation
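
As a sketch of the single-entry-point idea: `pathlib.Path` accepts both strings and existing `Path` objects, so the `isinstance` check can be replaced by one unconditional conversion at the public boundary. The `as_path` helper name is hypothetical.

```python
from pathlib import Path
from typing import Union


def as_path(file_path: Union[str, Path]) -> Path:
    """Normalize a str or Path argument to a Path; Path(Path(...)) is safe."""
    return Path(file_path)
```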

Positive Aspects

Clean Separation of Concerns: Agent logic cleanly separated from app wrapper
Comprehensive Test Coverage: Good integration tests covering CLI, streaming, and error cases
Streaming Support: Iterative summarization with SSE streaming is well-implemented
Caching: PDF text caching improves performance for repeated operations
Content Type Detection: Robust detection with heuristics + LLM fallback
Style System: Flexible YAML-based prompt configuration

Recommendation

Request Changes - The copyright header issue and silent exception handling should be addressed before merging. The other items are improvements that could be addressed in follow-up PRs.

kovtcharov-amd · 2026-01-14T00:39:08Z

Is this a good replacement for the existing summarizer that's already checked in?

jjasinsk-amd · 2026-01-14T10:18:04Z

> Is this a good replacement for the existing summarizer that's already checked in?

The core features of the original summarizer application, such as processing transcripts and emails, were not changed. This PR adds the ability to summarize PDF files and encapsulates the summarization logic within a dedicated agent, improving modularity and reusability.


jjasinsk-amd added 2 commits January 8, 2026 17:12

Summarizer agent - initial commit

44962ae

Merge remote-tracking branch 'origin' into jjasinsk/summarize_agent

1258711

jjasinsk-amd added 3 commits January 12, 2026 13:44

Review changes

13ea38f

Add agent tests to CI workflows

68693ff

Update CI workflow

6c9293d

kovtcharov-amd self-requested a review January 12, 2026 21:42

jjasinsk-amd added 2 commits January 13, 2026 12:45

Merge remote-tracking branch 'origin' into jjasinsk/summarize_agent

26d9909

Update CI workflow

6406c9a

github-actions bot added the dependencies, devops, agents, mcp, cli, and tests labels Jan 13, 2026

kovtcharov-amd added 2 commits January 13, 2026 13:59

Merge branch 'main' into jjasinsk/summarize_agent

7820247

Merge branch 'main' into jjasinsk/summarize_agent

3312675

Merge branch 'main' into jjasinsk/summarize_agent

42416b6

jjasinsk-amd added 2 commits January 14, 2026 12:28

Merge remote-tracking branch 'origin' into jjasinsk/summarize_agent

528f7ff

Install missing dependencies for summarizer tests in CI

0e77aa3

kovtcharov-amd approved these changes Jan 14, 2026


Merge branch 'main' into jjasinsk/summarize_agent

5f7f2ca

kovtcharov-amd reviewed Jan 14, 2026

src/gaia/agents/summarize/agent.py

kovtcharov-amd reviewed Jan 14, 2026

src/gaia/apps/summarize/app.py

kovtcharov-amd reviewed Jan 14, 2026

src/gaia/agents/summarize/agent.py

kovtcharov-amd added 2 commits January 14, 2026 10:03

Merge branch 'main' into jjasinsk/summarize_agent

4bce0c9

Merge branch 'main' into jjasinsk/summarize_agent

71c16e8

jjasinsk-amd and others added 6 commits January 15, 2026 10:19

Merge remote-tracking branch 'origin' into jjasinsk/summarize_agent

bf49ad5

Post-review changes

a678bd0

Fix tests

e90fad7

Merge branch 'main' into jjasinsk/summarize_agent

1e0ea61

Merge branch 'main' into jjasinsk/summarize_agent

dff3915

Merge branch 'main' into jjasinsk/summarize_agent

02ede19

jjasinsk-amd enabled auto-merge January 16, 2026 16:38

jjasinsk-amd added this pull request to the merge queue Jan 16, 2026

Merged via the queue into main with commit baf2fe1 Jan 16, 2026
68 checks passed

jjasinsk-amd deleted the jjasinsk/summarize_agent branch January 16, 2026 16:58
