feat: eval narrative summaries #387

Merged — reybahl merged 8 commits into master from eval-summaries on Mar 1, 2026

Conversation

@reybahl (Member) commented Feb 12, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added AI-powered summarization of narrative course evaluations, generating concise summaries from student comments
    • Support for OpenAI-compatible LLM providers with customizable models
  • Chores

    • Introduced GitHub Actions workflow for automated evaluation summarization
    • Added openai library as a runtime dependency


coderabbitai Bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds AI-assisted evaluation summarization pipeline including GitHub Actions workflow, OpenAI-compatible LLM client, database model for storing narrative summaries, evaluation processing module with async concurrency control, and transformation logic to integrate summaries into the database.

Changes

  • GitHub Workflows & Configuration — .github/workflows/summarize-evals.yml, config/dev_summarize_evals.yml, config/release_summarize_evals.yml: Adds a GitHub Actions workflow and development/release configurations for the evaluation summarization pipeline, with seasonal processing and OpenAI API integration.
  • LLM & AI Infrastructure — ferry/ai/client.py, ferry/ai/__init__.py: Introduces an OpenAI-compatible LLM client with a DEFAULT_MODEL constant ("gpt-4.1-mini"), async chat completion, and configurable base_url and model overrides. Exports the public API.
  • Evaluation Summarization — ferry/summarize/summarize_evals.py, ferry/summarize/__init__.py: Implements an async evaluation summarization module that reads parsed evaluations, generates AI summaries via the LLM with configurable concurrency (max 10 concurrent requests), enforces a minimum comment threshold (3), and writes results to evaluation_summaries/*.json. Re-exports the API.
  • Argument Parsing — ferry/args_parser.py: Extends RawArgs and Args with openai_api_key, llm_model, llm_base_url, and summarize_evals fields; adds corresponding CLI arguments and environment-based initialization; prompts for the API key when summarization is enabled.
  • Database Models — ferry/database/models.py, ferry/database/__init__.py: Adds an EvaluationNarrativeSummary ORM model with a composite primary key (course_id, question_code), foreign keys to courses and evaluation_questions, and a summary text field. Exports the model.
  • Data Transformation & Import — ferry/transform/import_evaluations.py, ferry/transform/__init__.py: Introduces an import_evaluation_summaries helper that loads AI-generated summaries from the cache, maps them to course_ids via a season/CRN lookup, and returns a DataFrame; adds evaluation_narrative_summaries to EvalTables; integrates into the main import flow with deduplication.
  • Integration & Dependencies — main.py, pyproject.toml: Updates start_crawl to return the list of processed seasons; integrates the summarize_evals call into the main pipeline with an assertion for API key presence; adds the openai>=1.0.0 dependency.

Sequence Diagram

sequenceDiagram
    participant User
    participant Main as main.py
    participant Crawl as start_crawl
    participant Summarize as summarize_evals
    participant Cache as Disk Cache
    participant LLMClient as LLMClient
    participant OpenAI as OpenAI API
    participant Database as Database

    User->>Main: Run with --summarize-evals
    Main->>Crawl: start_crawl(args)
    Crawl->>Database: Fetch/process course data
    Crawl-->>Main: Return seasons list
    
    Main->>Summarize: summarize_evals(seasons, api_key)
    loop For each season
        Summarize->>Cache: Load parsed_evaluations/{season}.json
        Summarize->>Cache: Load existing evaluation_summaries/{season}.json
        loop For each course with 3+ comments
            Summarize->>LLMClient: complete(narrative_comments)
            LLMClient->>OpenAI: POST /chat/completions
            OpenAI-->>LLMClient: AI-generated summary
            LLMClient-->>Summarize: Return summary text
        end
        Summarize->>Cache: Write evaluation_summaries/{season}.json
    end
    
    Summarize->>Database: Insert/update EvaluationNarrativeSummary records
    Database-->>Main: Summaries stored
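The per-season loop in the diagram — filter courses by the minimum-comment threshold, fan out LLM requests under a concurrency cap, collect summaries — can be sketched with stdlib asyncio alone. The `summarize` callable stands in for the LLM client and the record shapes are illustrative, not the module's actual API.

```python
import asyncio
from typing import Any, Awaitable, Callable

MAX_CONCURRENT_REQUESTS = 10  # concurrency cap from the walkthrough
MIN_COMMENTS = 3              # minimum comment threshold from the walkthrough


async def summarize_season(
    courses: list[dict[str, Any]],
    summarize: Callable[[list[str]], Awaitable[str]],
) -> list[dict[str, str]]:
    """Summarize every eligible course, bounding the requests in flight.

    `courses` holds {"crn": ..., "comments": [...]} records as loaded from
    the parsed-evaluations cache; `summarize` stands in for the LLM call.
    """
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

    async def one(course: dict[str, Any]) -> dict[str, str]:
        async with semaphore:  # at most 10 concurrent LLM requests
            summary = await summarize(course["comments"])
        return {"crn": str(course["crn"]), "summary": summary}

    eligible = [c for c in courses if len(c["comments"]) >= MIN_COMMENTS]
    # gather runs all eligible courses concurrently rather than one at a time
    return await asyncio.gather(*(one(c) for c in eligible))
```

Coercing the CRN to str and using gather rather than sequential awaits anticipate two points raised later in the review.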

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hop along, evaluations flow,
LLMs now help summaries grow,
Async hops with wisdom's glow,
AI whispers what students know,
Comments bloom where threads now sow! 🌱✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat: eval narrative summaries' accurately reflects the main objective of the pull request, which adds AI-assisted summarization of narrative course evaluations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment thread .github/workflows/summarize-evals.yml Fixed

@coderabbitai (Bot) left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/summarize-evals.yml:
- Around line 29-32: Update the GitHub Actions step that currently uses
actions/setup-python@v4 to the recommended v5; locate the workflow step with the
name "Set up Python 3.12" and change the uses value from actions/setup-python@v4
to actions/setup-python@v5 while keeping the existing python-version: "3.12"
configuration intact.
- Around line 42-49: The "Summarize Evaluations" workflow step currently
interpolates `${{ inputs.seasons }}` directly into the shell which risks command
injection; change the step to set an environment variable (e.g., SEASONS) from
`${{ inputs.seasons }}` and then call python main.py with --seasons "$SEASONS"
(preserving other envs like OPENAI_API_KEY and SENTRY_URL), ensuring the value
is passed as a single quoted/escaped argument instead of being expanded into the
shell command.
- Around line 10-13: Add an explicit least-privilege permissions block to the
summarize_evals job to avoid default broad permissions; inside the job
definition (jobs -> summarize_evals) add a permissions mapping (for example at
minimum permissions: contents: read) or other exact scopes the job requires so
the workflow no longer relies on default permissions.
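A least-privilege permissions block of the kind this item asks for might look like the fragment below; the `contents: read` scope is the illustrative minimum, and the job should grant only the scopes it actually uses.

```yaml
jobs:
  summarize_evals:
    permissions:
      contents: read  # least privilege; add further scopes only as required
    # ...rest of the job definition unchanged...
```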

In `@ferry/summarize/summarize_evals.py`:
- Around line 83-94: The code assumes response.choices[0].message.content is
always a string and calls .strip(), which can raise AttributeError if content is
None; update the return logic in summarize_evals.py (the block using
openai_client.chat.completions.create and returning
response.choices[0].message.content.strip()) to safely handle None by checking
the value (e.g., assign content = response.choices[0].message.content or "" /
fallback text) and then call .strip() on the confirmed string, optionally
logging or raising a clear error if content is unexpectedly missing.
🧹 Nitpick comments (3)
ferry/summarize/summarize_evals.py (1)

14-14: Unused import: ujson.

The ujson import appears unused in this module. JSON operations are handled by the imported load_cache_json and save_cache_json functions from ferry.crawler.cache.

♻️ Proposed fix
-import ujson
main.py (1)

136-142: Consider using ValueError instead of assert for runtime validation.

Using assert for input validation can be bypassed if Python runs with -O (optimizations enabled). A ValueError provides a cleaner user experience and is more robust.

♻️ Proposed fix
     if args.summarize_evals:
-        assert args.openai_api_key, "OpenAI API key is required for --summarize-evals"
+        if not args.openai_api_key:
+            raise ValueError("OpenAI API key is required for --summarize-evals")
         await summarize_evals(
             seasons=seasons,
             data_dir=args.data_dir,
             openai_api_key=args.openai_api_key,
         )
pyproject.toml (1)

30-30: Consider pinning the openai version for consistency and reproducibility.

All other dependencies use exact version pinning (e.g., ==4.13.3), but openai>=1.0.0 uses a minimum version constraint. This inconsistency could lead to non-reproducible builds across environments—if a future OpenAI SDK release introduces behavioral changes, different versions might be installed in different builds. Either pin to a specific version matching your requirements or document the intentional flexibility if loose pinning is necessary for compatibility.
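Pinning would look like the fragment below in pyproject.toml; the exact version shown is hypothetical — pin whichever openai release is actually tested.

```toml
[project]
dependencies = [
    # ...existing exactly-pinned dependencies...
    "openai==1.54.0",  # hypothetical pin; replace with the tested version
]
```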

Comment thread .github/workflows/summarize-evals.yml
Comment thread ferry/summarize/summarize_evals.py Outdated

@coderabbitai (Bot) left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
ferry/transform/import_evaluations.py (1)

53-104: Consider extracting the shared mapping logic.

The season_crn_to_course_id mapping (lines 65-75) duplicates the same logic from match_evaluations_to_courses (lines 21-31). Consider extracting this to a shared helper function to reduce duplication.

♻️ Example refactor
+def _build_season_crn_to_course_id(
+    listings: pd.DataFrame,
+) -> dict[str, dict[int, float]]:
+    """Build a nested dict mapping (season_code, crn) -> course_id."""
+    return (
+        listings[["season_code", "course_id", "crn"]]
+        .groupby("season_code")
+        .apply(
+            lambda x: x[["crn", "course_id"]]
+            .set_index("crn")["course_id"]
+            .astype(float)
+            .to_dict()
+        )
+        .to_dict()
+    )

Then use _build_season_crn_to_course_id(listings) in both match_evaluations_to_courses and import_evaluation_summaries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/transform/import_evaluations.py` around lines 53 - 104, The code
duplicates the season_crn_to_course_id construction inside
import_evaluation_summaries; extract that logic into a shared helper named e.g.
_build_season_crn_to_course_id(listings) that returns the same dict used now,
and replace the inline construction in both import_evaluation_summaries and
match_evaluations_to_courses with calls to
_build_season_crn_to_course_id(listings), keeping the existing behavior and
types (including the int casting and to_dict conversion).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ferry/summarize/summarize_evals.py`:
- Around line 151-155: The CRN may be an int in course_eval but
CourseSummary.crn is declared as str, so normalize CRNs to strings everywhere:
convert course_eval["crn"] to str when building the returned dict (the block
returning {"crn": course_eval["crn"], ...}) and also ensure any places that add
or compare CRNs (e.g., the membership check using already_done and the addition
where c["crn"] is used) convert values to str first; update instances involving
CourseSummary creation, the return value from summarize function, and the
membership logic that checks c["crn"] not in already_done to use str(c["crn"])
so types match consistently.
- Around line 28-34: Update the invalid model constant OPENAI_MODEL to the
correct production model name "gpt-4.1-mini-2025-04-14" (replace the current
"gpt-4.1-mini") so runtime API calls succeed, and remove the development cap by
setting MAX_COURSES_PER_SEASON to None (instead of 10) to avoid limiting
production summarization; change these two constants (OPENAI_MODEL and
MAX_COURSES_PER_SEASON) in summarize_evals.py accordingly.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd7d821 and 97fe6f7.

📒 Files selected for processing (7)
  • ferry/database/__init__.py
  • ferry/database/models.py
  • ferry/database/sync_db_courses_old.py
  • ferry/database/sync_db_evals.py
  • ferry/summarize/summarize_evals.py
  • ferry/transform/__init__.py
  • ferry/transform/import_evaluations.py
💤 Files with no reviewable changes (2)
  • ferry/database/sync_db_courses_old.py
  • ferry/database/sync_db_evals.py

Comment thread ferry/summarize/summarize_evals.py Outdated
Comment thread ferry/summarize/summarize_evals.py
@reybahl changed the title from "draft: eval summaries" to "feat: eval narrative summaries" on Mar 1, 2026

@coderabbitai (Bot) left a comment


♻️ Duplicate comments (2)
.github/workflows/summarize-evals.yml (2)

31-34: ⚠️ Potential issue | 🟠 Major

Upgrade actions/setup-python to @v5.

Line [32] still uses actions/setup-python@v4, which is flagged by actionlint as too old for current runners.

🔧 Proposed fix
       - name: Set up Python 3.12
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
Verification script:

#!/bin/bash
set -euo pipefail
# Verify outdated setup-python usage in workflows
rg -n 'uses:\s*actions/setup-python@v4' .github/workflows
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/summarize-evals.yml around lines 31 - 34, Update the
workflow step that sets up Python to use the latest setup-python action: replace
the uses value "actions/setup-python@v4" with "actions/setup-python@v5" in the
step named "Set up Python 3.12" so the workflow uses the supported action
version; keep the python-version: "3.12" line unchanged.

44-51: ⚠️ Potential issue | 🟠 Major

Avoid direct shell interpolation of workflow input.

Line [50] passes ${{ inputs.seasons }} directly into a shell command. This is injection-prone and can break argument boundaries.

🛡️ Proposed fix
       - name: Summarize Evaluations
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          SEASONS: ${{ inputs.seasons }}
         run: |
           python main.py \
             -f config/release_summarize_evals.yml \
-            --seasons ${{ inputs.seasons }} \
+            --seasons "$SEASONS" \
             --sentry-url "${{ secrets.SENTRY_URL }}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/summarize-evals.yml around lines 44 - 51, The workflow
step named "Summarize Evaluations" is directly interpolating ${{ inputs.seasons
}} into the shell command (python main.py ... --seasons ${{ inputs.seasons }}),
which is injection-prone; move the input into an environment variable (e.g., add
SEASONS: ${{ inputs.seasons }} alongside OPENAI_API_KEY) and update the python
invocation to use the env var with proper quoting (--seasons "$SEASONS") so the
--seasons argument is passed safely and preserves boundaries.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97fe6f7 and 27302d3.

📒 Files selected for processing (1)
  • .github/workflows/summarize-evals.yml


@coderabbitai (Bot) left a comment


♻️ Duplicate comments (1)
ferry/summarize/summarize_evals.py (1)

28-34: ⚠️ Potential issue | 🔴 Critical

Fix invalid model name and remove development cap before merging.

These issues were flagged in a previous review and remain unaddressed:

  1. Line 28: "gpt-4.1-mini" is not a valid OpenAI model name. Use "gpt-4.1-mini-2025-04-14" or another valid model identifier.
  2. Line 34: MAX_COURSES_PER_SEASON = 10 is a development/testing artifact that will limit production summarization to only 10 courses per season. Set to None for production.
🔧 Proposed fix
 # OpenAI model to use for summarization.
-OPENAI_MODEL = "gpt-4.1-mini"
+OPENAI_MODEL = "gpt-4.1-mini-2025-04-14"

 # Maximum concurrent API requests to avoid rate-limit pressure.
 MAX_CONCURRENT_REQUESTS = 10

 # Cap courses per season for testing (set to None for no limit).
-MAX_COURSES_PER_SEASON = 10
+MAX_COURSES_PER_SEASON = None


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/summarize/summarize_evals.py` around lines 28 - 34, Update the invalid
model constant OPENAI_MODEL to a valid model identifier (e.g.,
"gpt-4.1-mini-2025-04-14") so API calls use a supported model, and remove the
development cap by setting MAX_COURSES_PER_SEASON to None (instead of 10) so
production summarization isn’t limited; adjust these constants where they are
defined (OPENAI_MODEL and MAX_COURSES_PER_SEASON) in summarize_evals.py.
🧹 Nitpick comments (2)
ferry/summarize/summarize_evals.py (2)

234-235: Verbose per-summary output may flood logs in production.

Line 235 prints each individual summary as it's generated. With many courses and multiple questions per course, this could produce excessive output. Consider reducing verbosity or making it conditional (e.g., via a --verbose flag or logging.debug).

♻️ Suggested refinement
             if result is not None:
                 new_summaries.append(result)
-                for ns in result["narrative_summaries"]:
-                    print(f"  [{result['season']}] crn={result['crn']} {ns['question_code']}: {ns['summary']}")
+                logging.debug(
+                    "Summarised crn=%s: %d questions",
+                    result["crn"],
+                    len(result["narrative_summaries"]),
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/summarize/summarize_evals.py` around lines 234 - 235, The loop that
prints each per-summary line (iterating over result["narrative_summaries"] and
using print(f"  [{result['season']}] crn={result['crn']} {ns['question_code']}:
{ns['summary']}")) is too verbose for production; change it to honor a verbosity
setting by replacing the print with a logging.debug (or logging.info) call and
wire that to a command-line --verbose flag (or an existing verbosity/config
variable) in summarize_evals.py so detailed per-summary lines are only emitted
when verbose mode is enabled.

146-153: Consider narrowing the exception type.

Catching bare Exception (flagged by static analysis) is intentional here for resilience, but could mask unexpected errors. Consider catching openai.OpenAIError (or its subclasses) specifically, while letting truly unexpected exceptions propagate.

♻️ Optional refinement
+from openai import OpenAIError
+
 # ... in _summarize_course:
-        except Exception as exc:
+        except (OpenAIError, ValueError) as exc:
             logging.warning(
                 "Failed to summarize %s/%s crn=%s: %s",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/summarize/summarize_evals.py` around lines 146 - 153, Narrow the except
clause in the summarization try/except block to catch OpenAI-specific errors
instead of a bare Exception: replace "except Exception as exc" with an exception
handler for openai.OpenAIError (or its relevant subclasses) around the code that
calls the OpenAI API, log the error with the same message including course_eval
and question_code, and allow other unexpected exceptions to propagate (or
re-raise them) so they aren't silently swallowed; locate the block referencing
course_eval.get("season"), question_code, and course_eval.get("crn") in
summarize_evals.py to make this change.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27302d3 and b84706b.

📒 Files selected for processing (1)
  • ferry/summarize/summarize_evals.py


@coderabbitai (Bot) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
ferry/ai/client.py (1)

32-32: Unnecessary super().__init__() call.

LLMClient doesn't inherit from any custom class (only implicitly from object), so the super().__init__() call is unnecessary.

♻️ Suggested fix
     def __init__(
         self,
         api_key: str,
         *,
         base_url: str | None = None,
         model: str = DEFAULT_MODEL,
     ) -> None:
-        super().__init__()
         self._client = self._create_client(api_key=api_key, base_url=base_url)
         self.model = model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/ai/client.py` at line 32, The constructor of LLMClient contains an
unnecessary super().__init__() call since LLMClient does not inherit from any
base class other than object; remove the super().__init__() invocation from
LLMClient.__init__ to clean up the constructor and avoid redundant calls (search
for the LLMClient class and its __init__ method to locate and delete the
super().__init__() line).
ferry/summarize/summarize_evals.py (2)

99-118: Tasks are created but awaited sequentially, losing parallelism.

The code creates tasks with asyncio.create_task() but then awaits them one-by-one in a sequential loop. This pattern doesn't fully leverage async parallelism since each task completes before the next is checked. While the semaphore allows up to 10 concurrent requests, the sequential awaiting here means only one question per course is processed at a time.

Consider using asyncio.gather() to await all tasks concurrently:

♻️ Proposed refactor
     if not tasks:
         return None

-    for question_code, question_text, task in tasks:
-        try:
-            summary_text = await task
+    results = await asyncio.gather(*[t[2] for t in tasks], return_exceptions=True)
+    
+    for (question_code, question_text, _), result in zip(tasks, results):
+        if isinstance(result, Exception):
+            logging.warning(
+                "Failed to summarize %s/%s crn=%s: %s",
+                course_eval.get("season"),
+                question_code,
+                course_eval.get("crn"),
+                result,
+            )
+            continue
+        
+        if result:
             summaries.append(
                 {
                     "question_code": question_code,
                     "question_text": question_text,
-                    "summary": summary_text,
+                    "summary": result,
                 }
             )
-        except Exception as exc:
-            logging.warning(
-                "Failed to summarize %s/%s crn=%s: %s",
-                course_eval.get("season"),
-                question_code,
-                course_eval.get("crn"),
-                exc,
-            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/summarize/summarize_evals.py` around lines 99 - 118, The loop currently
awaits each asyncio.create_task sequentially, losing parallelism; change the
logic that creates tasks (tasks: list[tuple[str, str, asyncio.Task[str]]]) to
await all tasks concurrently using asyncio.gather on the task objects (e.g.,
gather(*(t for _, _, t in tasks))), then iterate the gathered results and pair
each summary result back to its question_code and question_text to build the
return value; ensure you handle exceptions from gather (use
return_exceptions=True or wrap handling) so a failing summary doesn't cancel
others, and preserve the use of semaphore and _summarize_comments.

219-224: Courses are processed sequentially, limiting throughput.

The for loop with await _summarize_course() processes one course at a time. Combined with the sequential task awaiting inside _summarize_course, this significantly limits throughput despite having a semaphore for 10 concurrent requests.

Consider batching course processing to better utilize the semaphore:

♻️ Proposed approach
# Process courses in batches to better utilize concurrency
async def process_batch(batch):
    return await asyncio.gather(
        *[_summarize_course(llm, c, semaphore) for c in batch],
        return_exceptions=True
    )

batch_size = MAX_CONCURRENT_REQUESTS
for i in tqdm(range(0, len(to_process), batch_size), desc=f"Summarizing {season}"):
    batch = to_process[i:i + batch_size]
    results = await process_batch(batch)
    for result in results:
        if result is not None and not isinstance(result, Exception):
            new_summaries.append(result)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ferry/summarize/summarize_evals.py` around lines 219 - 224, The loop over
to_process calls await _summarize_course sequentially, underutilizing the
semaphore; change to batch-based concurrent execution by grouping to_process
into chunks (size matching your semaphore/MAX_CONCURRENT_REQUESTS) and use
asyncio.gather to run [_summarize_course(llm, c, semaphore) for c in batch]
concurrently, then iterate gather results, skip Exceptions and None, and append
valid results to new_summaries (ensuring you handle return_exceptions=True and
filter exceptions); reference functions/variables: _summarize_course, semaphore,
to_process, new_summaries, and the progress bar (tqdm) to show batch progress.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b84706b and 66c6fac.

📒 Files selected for processing (7)
  • config/release_summarize_evals.yml
  • ferry/ai/__init__.py
  • ferry/ai/client.py
  • ferry/args_parser.py
  • ferry/summarize/__init__.py
  • ferry/summarize/summarize_evals.py
  • main.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • config/release_summarize_evals.yml

Comment thread main.py
Comment on lines +136 to +144
if args.summarize_evals:
    assert args.openai_api_key, "API key is required for --summarize-evals"
    await summarize_evals(
        seasons=seasons,
        data_dir=args.data_dir,
        api_key=args.openai_api_key,
        model=args.llm_model or DEFAULT_MODEL,
        base_url=args.llm_base_url,
    )


⚠️ Potential issue | 🟡 Minor

Replace assert with explicit error handling for API key validation.

Using assert for runtime validation is problematic because assertions can be disabled when Python runs with optimization flags (-O or -OO). This would silently bypass the API key check in optimized builds.

🛡️ Proposed fix
     if args.summarize_evals:
-        assert args.openai_api_key, "API key is required for --summarize-evals"
+        if not args.openai_api_key:
+            raise SystemExit("API key is required for --summarize-evals")
         await summarize_evals(
             seasons=seasons,
             data_dir=args.data_dir,
             api_key=args.openai_api_key,
             model=args.llm_model or DEFAULT_MODEL,
             base_url=args.llm_base_url,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 136 - 144, Replace the runtime assertion with explicit
validation: in the block that runs summarize_evals (check for
args.summarize_evals), replace the line "assert args.openai_api_key, ..." with
an if-statement that checks "if not args.openai_api_key:" and then terminate
with a clear error, e.g. raise SystemExit("API key is required for
--summarize-evals") or call sys.exit(1) after printing to stderr; keep the call
to summarize_evals (and its parameters: seasons, data_dir=args.data_dir,
api_key=args.openai_api_key, model=args.llm_model or DEFAULT_MODEL,
base_url=args.llm_base_url) unchanged.

@reybahl merged commit 4e3c7fb into master Mar 1, 2026
5 checks passed
@reybahl deleted the eval-summaries branch March 1, 2026 20:54