web_answering: stabilize control flow, fix stray try/except, and harden error handling #30
Conversation
Walkthrough
Replaces module-level logging with a project-wide JSON logging_config (context-aware, rotating file support, adapter-based loggers) and rewrites the web_answering endpoint into a stepwise multi-phase pipeline (search → fetch → rank → generate) with per-phase timing, timeouts, granular logging, and new request/response models and helpers.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Client
    participant API as web_answering endpoint
    participant Logger
    participant SearchSvc as Search
    participant FetchSvc as Fetch
    participant RankSvc as Rank
    participant GenSvc as Generate
    Client->>API: POST /web-answer (WebSearchAnswerRequest)
    API->>Logger: set_log_context(request_id,...)
    API->>SearchSvc: search(query) %% phase: search
    alt search timed out or empty
        SearchSvc-->>API: timeout/empty
        API->>Logger: log phase timeout/empty
        API-->>Client: response (no results)
    else search results
        SearchSvc-->>API: snippets/ids
        API->>FetchSvc: fetch(snippets/ids) %% phase: fetch
        alt fetch includes snippets
            FetchSvc-->>API: documents
        else fetch failed/timeout
            FetchSvc-->>API: timeout/error
            API->>Logger: log phase timeout/error
        end
        API->>RankSvc: rank(evidence) %% phase: rank
        RankSvc-->>API: ranked_evidence
        API->>GenSvc: synthesize(ranked_evidence) %% phase: generate
        GenSvc-->>API: answer, citations
        API->>Logger: clear_log_context()
        API-->>Client: WebSearchAnswerResponse (answer, citations, timings)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (8)
api/endpoints/web_answering.py (8)
153-166: Unify search time budget in a single variable
Avoid recomputing the 40% time budget twice and risking drift. Compute once and reuse for both the inner call and wait_for.
Apply this diff:
```diff
-    search_start = time.time()
+    search_start = time.time()
+    search_budget_s = min(request.timeout_seconds * 0.4, 10.0)
     try:
         search_results = await asyncio.wait_for(
             search_service.search(
                 query=request.query,
                 k=request.top_k_results,
                 region=request.region,
                 language=request.language,
-                timeout_seconds=min(request.timeout_seconds * 0.4, 10)  # Allocate 40% of timeout
+                timeout_seconds=search_budget_s  # Allocate 40% of timeout (capped at 10s)
             ),
-            timeout=request.timeout_seconds * 0.4
+            timeout=search_budget_s
         )
```
182-191: Align fetch timeouts and consider removing the double timeout
Use a single computed budget to avoid mismatches. If fetch_search_results already enforces its own timeout, consider dropping the outer wait_for to prevent double cancellations.
Apply this diff:
```diff
-    fetch_start = time.time()
+    fetch_start = time.time()
+    fetch_budget_s = min(request.timeout_seconds * 0.3, 8.0)
     try:
         fetched_docs = await asyncio.wait_for(
             fetch_service.fetch_search_results(
                 search_results=search_results,
-                timeout_seconds=min(request.timeout_seconds * 0.3, 8),  # Allocate 30% of timeout
+                timeout_seconds=fetch_budget_s,  # Allocate 30% of timeout (capped at 8s)
                 preserve_snippets=request.include_snippets
             ),
-            timeout=request.timeout_seconds * 0.3
+            timeout=fetch_budget_s
         )
```
206-215: Unify ranking time budget and add generic exception handling
- Compute the 15% budget once and reuse.
- Also catch unexpected exceptions to log and return a generic 500 instead of leaking errors.
Apply this diff:
```diff
-    rank_start = time.time()
+    rank_start = time.time()
+    rank_budget_s = request.timeout_seconds * 0.15
     try:
         ranked_evidence = await asyncio.wait_for(
             ranking_service.rank_documents(
                 query=request.query,
                 docs=fetched_docs,
                 max_context_chars=request.max_context_chars
             ),
-            timeout=request.timeout_seconds * 0.15  # Allocate 15% of timeout
+            timeout=rank_budget_s  # Allocate 15% of timeout
         )
     except asyncio.TimeoutError:
         logger.warning(f"Ranking timed out after {request.timeout_seconds * 0.15}s", extra=logger_ctx)
         raise HTTPException(
             status_code=status.HTTP_504_GATEWAY_TIMEOUT,
             detail="Ranking phase timed out."
         )
+    except Exception as e:
+        logger.error("Ranking failed: %s", e, extra=logger_ctx)
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Ranking phase failed."
+        )
```
232-242: Unify synthesis time budget and add generic exception handling
- Compute the 15% budget once and reuse.
- Catch unexpected exceptions and return a generic 500 while logging.
Apply this diff:
```diff
-    generate_start = time.time()
+    generate_start = time.time()
+    synth_budget_s = request.timeout_seconds * 0.15
     try:
         synthesis_result = await asyncio.wait_for(
             synthesis_service.generate_answer(
                 query=request.query,
                 evidence_list=ranked_evidence,
                 answer_tokens=request.answer_tokens,
                 style_profile_id=request.style_profile_id
             ),
-            timeout=request.timeout_seconds * 0.15  # Allocate remaining time
+            timeout=synth_budget_s  # Allocate remaining time
         )
     except asyncio.TimeoutError:
         logger.warning(f"Synthesis timed out after {request.timeout_seconds * 0.15}s", extra=logger_ctx)
         raise HTTPException(
             status_code=status.HTTP_504_GATEWAY_TIMEOUT,
             detail="Synthesis phase timed out."
         )
+    except Exception as e:
+        logger.error("Synthesis failed: %s", e, extra=logger_ctx)
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Synthesis phase failed."
+        )
```
156-171: Add generic exception handling to all phases
To fully meet the goal of not leaking exception details and to ensure consistent logging, add broad exception handling (in addition to TimeoutError) for search and fetch too.
Apply these diffs:
```diff
 try:
     search_results = await asyncio.wait_for(
         search_service.search(
             query=request.query,
             k=request.top_k_results,
             region=request.region,
             language=request.language,
             timeout_seconds=search_budget_s  # Allocate 40% of timeout (capped at 10s)
         ),
         timeout=search_budget_s
     )
 except asyncio.TimeoutError:
     logger.warning(f"Search timed out after {request.timeout_seconds * 0.4}s", extra=logger_ctx)
     raise HTTPException(
         status_code=status.HTTP_504_GATEWAY_TIMEOUT,
         detail="Search phase timed out."
     )
+except Exception as e:
+    logger.error("Search failed: %s", e, extra=logger_ctx)
+    raise HTTPException(
+        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+        detail="Search phase failed."
+    )
```

```diff
 try:
     fetched_docs = await asyncio.wait_for(
         fetch_service.fetch_search_results(
             search_results=search_results,
             timeout_seconds=fetch_budget_s,  # Allocate 30% of timeout (capped at 8s)
             preserve_snippets=request.include_snippets
         ),
         timeout=fetch_budget_s
     )
 except asyncio.TimeoutError:
     logger.warning(f"Fetch timed out after {request.timeout_seconds * 0.3}s", extra=logger_ctx)
     # Continue with whatever we got
     fetched_docs = []
+except Exception as e:
+    logger.error("Fetch failed: %s", e, extra=logger_ctx)
+    # Continue with empty docs; ranking will handle and return 422 if needed
+    fetched_docs = []
```

Also applies to: 183-203, 208-221, 234-248
262-275: Harden citation snippet building against None/non-str passages
Guard against None or non-string passage to avoid TypeError during slicing.
Apply this diff:
```diff
-    for evidence in ranked_evidence:
+    for evidence in ranked_evidence:
         if evidence.id in synthesis_result.used_citation_ids:
+            passage = evidence.passage or ""
+            snippet = (passage[:200] + "...") if len(passage) > 200 else passage
             citations.append(Citation(
                 id=evidence.id,
                 url=evidence.url,
                 title=evidence.title,
                 site_name=evidence.site_name,
                 published_at=evidence.published_at,
-                snippet=evidence.passage[:200] + "..." if len(evidence.passage) > 200 else evidence.passage,
+                snippet=snippet,
                 score=evidence.score
             ))
```
126-131: Prefer parameterized logging over f-strings
Use the logger's lazy formatting to avoid building strings when the log level is disabled and for consistency.
Example:
```diff
-logger.info(
-    f"Web search answer request: query='{request.query}', "
-    f"top_k={request.top_k_results}, timeout={request.timeout_seconds}s",
-    extra=logger_ctx
-)
+logger.info(
+    "Web search answer request: query='%s', top_k=%d, timeout=%ss",
+    request.query, request.top_k_results, request.timeout_seconds,
+    extra=logger_ctx
+)
```
Also applies to: 167-167, 179-179, 193-193, 203-203, 217-217, 229-229, 244-244, 294-297
79-85: Consider time.perf_counter for higher-resolution timings
perf_counter is monotonic and better for measuring elapsed durations.
Example:
```diff
-start_time = time.time()
+start_time = time.perf_counter()
 ...
-timings["search"] = int((time.time() - search_start) * 1000)
+timings["search"] = int((time.perf_counter() - search_start) * 1000)
 ...
-timings["total"] = int((time.time() - start_time) * 1000)
+timings["total"] = int((time.perf_counter() - start_time) * 1000)
```
Also applies to: 251-253
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
api/endpoints/web_answering.py (1 hunks)
🔇 Additional comments (2)
api/endpoints/web_answering.py (2)
120-120: LGTM: start_time placement
Starting the timer immediately after the docstring is fine and keeps service initialization within the total timing.
47-67: Pydantic v2 dependency verified — no changes needed
requirements.txt pins pydantic>=2.6.0,<3.0.0 and api/endpoints/web_answering.py correctly imports field_validator (Pydantic v2 API).
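For context, a minimal illustration of the Pydantic v2 field_validator API that the endpoint relies on; the model name, field, and bounds below are hypothetical and are not the actual validators from WebSearchAnswerRequest:

```python
from pydantic import BaseModel, field_validator


class ExampleRequest(BaseModel):
    query: str
    timeout_seconds: float = 30.0

    @field_validator("timeout_seconds")
    @classmethod
    def check_timeout(cls, v: float) -> float:
        # Reject non-positive or oversized budgets (bounds are illustrative only).
        if not 0 < v <= 120:
            raise ValueError("timeout_seconds must be in (0, 120]")
        return v
```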
* Enhance structured logging with improved JSON formatting and exception handling
* Refactor logging setup to prevent duplicate stdout handlers and improve handler management
…xt-aware logging, and update JSON formatter for better structured output.
Actionable comments posted: 1
🧹 Nitpick comments (11)
api/endpoints/web_answering.py (7)
155-167: Log service init failures with stack traces and preserve cause; satisfy Ruff TRY400/B904.
Use logger.exception() and chain the HTTPException.
Apply:
```diff
-    except Exception as e:
-        logger.error("Failed to initialize services: %s", e)
-        raise HTTPException(
+    except Exception as err:  # noqa: BLE001
+        logger.exception("Failed to initialize services")
+        raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
             detail="Service initialization error.",
-        )
+        ) from err
```
168-187: Align inner/outer timeouts; fix exception chaining per Ruff B904.
Compute a single search budget and use it for both the service call and wait_for. Chain the HTTPException with from None to avoid misleading tracebacks.
```diff
-    search_start = time.time()
+    search_start = time.time()
+    budget_search = min(request.timeout_seconds * 0.4, 10)
     try:
         search_results = await asyncio.wait_for(
             search_service.search(
                 query=request.query,
                 k=request.top_k_results,
                 region=request.region,
                 language=request.language,
-                timeout_seconds=min(request.timeout_seconds * 0.4, 10),  # Allocate 40% of timeout
+                timeout_seconds=budget_search,  # Allocate 40% of timeout
             ),
-            timeout=request.timeout_seconds * 0.4,
+            timeout=budget_search,
         )
     except asyncio.TimeoutError:
         logger.warning(f"Search timed out after {request.timeout_seconds * 0.4}s")
-        raise HTTPException(
+        raise HTTPException(
             status_code=status.HTTP_504_GATEWAY_TIMEOUT,
             detail="Search phase timed out.",
-        )
+        ) from None
```
197-207: Reuse a single fetch budget for consistency.
Keeps the actual operation and wait bound aligned.
```diff
-    fetch_start = time.time()
+    fetch_start = time.time()
+    budget_fetch = min(request.timeout_seconds * 0.3, 8)
     try:
         fetched_docs = await asyncio.wait_for(
             fetch_service.fetch_search_results(
                 search_results=search_results,
-                timeout_seconds=min(request.timeout_seconds * 0.3, 8),  # Allocate 30% of timeout
+                timeout_seconds=budget_fetch,  # Allocate 30% of timeout
                 preserve_snippets=request.include_snippets,
             ),
-            timeout=request.timeout_seconds * 0.3,
+            timeout=budget_fetch,
         )
```
207-218: Graceful fallback when fetch times out and snippets were requested.
Currently, a fetch timeout yields [], leading to 422 later. If snippets were requested, consider falling back to ranking over search snippets to avoid unnecessary failures.
Would you like me to draft a fallback that maps search_results into minimal doc objects (id/url/title/passage=snippet) when fetched_docs is empty and include_snippets is True?
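A minimal sketch of such a fallback, assuming each search result exposes id/url/title/snippet attributes (attribute names are assumptions inferred from the fields used elsewhere in this endpoint):

```python
from typing import Any, Dict, List


def snippets_as_docs(search_results: List[Any]) -> List[Dict[str, Any]]:
    """Map search results to minimal doc-like records so ranking can still proceed."""
    return [
        {
            "id": getattr(r, "id", None),
            "url": getattr(r, "url", None),
            "title": getattr(r, "title", None),
            "passage": getattr(r, "snippet", "") or "",
        }
        for r in search_results
    ]


# In the fetch-timeout handler, instead of leaving fetched_docs empty:
# if not fetched_docs and request.include_snippets:
#     fetched_docs = snippets_as_docs(search_results)
```

Whether plain dicts are acceptable depends on what rank_documents expects; a small dataclass or Pydantic model may be needed instead.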
231-236: Chain HTTPException to satisfy Ruff B904.
```diff
-        raise HTTPException(
+        raise HTTPException(
             status_code=status.HTTP_504_GATEWAY_TIMEOUT,
             detail="Ranking phase timed out.",
-        )
+        ) from None
```
259-263: Chain HTTPException to satisfy Ruff B904.
```diff
-        raise HTTPException(
+        raise HTTPException(
             status_code=status.HTTP_504_GATEWAY_TIMEOUT,
             detail="Synthesis phase timed out.",
-        )
+        ) from None
```
270-276: 422 for no citations is correct; consider including a hint in logs about budgets.
Good guard. Consider logging the remaining overall budget for debugging flakiness.
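For example, a possible log line before raising the 422, assuming the start_time, request, logger, and logger_ctx names used elsewhere in this endpoint:

```python
remaining_s = request.timeout_seconds - (time.time() - start_time)
logger.warning(
    "No citations produced; remaining overall budget %.2fs",
    remaining_s,
    extra=logger_ctx,
)
```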
services/logging_config.py (4)
26-26: Mutable default for ContextVar; silence Ruff B039 or switch to an immutable default.
Either mark the line or use an immutable mapping to avoid the warning.
Option A (minimal):
```diff
-_log_context: contextvars.ContextVar[Dict[str, Any]] = contextvars.ContextVar("_log_context", default={})
+_log_context: contextvars.ContextVar[Dict[str, Any]] = contextvars.ContextVar("_log_context", default={})  # noqa: B039
```
Option B (immutable default, preferred):
```diff
+from types import MappingProxyType
-_log_context: contextvars.ContextVar[Dict[str, Any]] = contextvars.ContextVar("_log_context", default={})
+_log_context = contextvars.ContextVar("_log_context", default=MappingProxyType({}))
```
If you choose option B, adjust calls like `_log_context.get({})` to `_log_context.get()` and keep using `dict(...)` where mutation is needed.
66-69: Don't catch broad Exception around json.dumps; catch TypeError.
Keeps noise out of logs and satisfies linters.
```diff
-        except Exception:
+        except TypeError:
             base[k] = str(v)
```
90-99: Narrow the exception when parsing the LOG_LEVEL env var; satisfies Ruff BLE001.
Only ValueError is expected from int().
```diff
-    try:
-        level = int(env_level)
-    except Exception:
-        level = getattr(logging, env_level.upper(), logging.INFO)
+    try:
+        level = int(env_level)
+    except ValueError:
+        level = getattr(logging, env_level.upper(), logging.INFO)
```
128-137: Make add_rotating_file_handler idempotent; add UTF‑8 encoding.
Prevents duplicate file handlers on reload and ensures encoding.
```diff
-def add_rotating_file_handler(path: str, max_bytes: int = 10_000_000, backup_count: int = 3, *, level: Optional[int] = None,
-                              pretty: bool = False) -> RotatingFileHandler:
+def add_rotating_file_handler(path: str, max_bytes: int = 10_000_000, backup_count: int = 3, *, level: Optional[int] = None,
+                              pretty: bool = False) -> RotatingFileHandler:
     """Add a RotatingFileHandler to the root logger and return it."""
-    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
+    import os as _os
+    abs_path = _os.path.abspath(path)
+    root = logging.getLogger()
+    for h in root.handlers:
+        if isinstance(h, RotatingFileHandler) and getattr(h, "baseFilename", None) == abs_path:
+            return h
+    handler = RotatingFileHandler(abs_path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8")
     handler.setFormatter(JsonFormatter(pretty=pretty))
     if level is None:
         level = logging.getLogger().level
     handler.setLevel(level)
-    logging.getLogger().addHandler(handler)
+    root.addHandler(handler)
     return handler
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
api/endpoints/web_answering.py (2 hunks)
services/logging_config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
api/endpoints/web_answering.py (1)
services/logging_config.py (5)
setup_logging (76-125)
get_logger (171-174)
set_log_context (157-160)
clear_log_context (167-168)
add_rotating_file_handler (128-137)
🪛 Ruff (0.13.1)
services/logging_config.py
26-26: Do not use mutable data structures for ContextVar defaults (B039)
38-38: Unused method argument: datefmt (ARG002)
68-68: Do not catch blind exception: Exception (BLE001)
95-95: Do not catch blind exception: Exception (BLE001)
api/endpoints/web_answering.py
161-161: Do not catch blind exception: Exception (BLE001)
162-162: Use logging.exception instead of logging.error (TRY400)
163-166: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
183-186: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
233-236: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
260-263: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
🔇 Additional comments (3)
api/endpoints/web_answering.py (2)
315-317: Nice success log with totals and citations. LGTM.
320-322: Good hygiene clearing per-request log context.
Prevents context bleed between requests.
services/logging_config.py (1)
171-175: get_logger adapter looks good.
API is clear and composes well with contextvars.
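For reference, a usage sketch of how these helpers compose in a request handler; the keyword arguments passed to set_log_context are assumptions based on the sequence diagram above:

```python
from services.logging_config import (
    clear_log_context,
    get_logger,
    set_log_context,
    setup_logging,
)

setup_logging()  # configure root JSON logging once (ideally at app startup)
logger = get_logger(__name__)  # adapter that merges context vars into each record


def handle(request_id: str) -> None:
    set_log_context(request_id=request_id)  # attach per-request context
    try:
        logger.info("processing request")  # context fields appear in the JSON output
    finally:
        clear_log_context()  # avoid context bleed between requests
```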
```python
setup_logging()
LOG_FILE = os.getenv("LOG_FILE")
if LOG_FILE:
    # Add a rotating file handler if LOG_FILE is set in the environment
    try:
        add_rotating_file_handler(LOG_FILE)
    except Exception:
        # Avoid raising on startup if file handler can't be added
        logging.getLogger(__name__).exception("Failed to add rotating file handler")

logger = get_logger(__name__)
```
Avoid configuring logging in an importable module; dedup rotating file handler.
Module import side effects can add duplicate handlers (especially under reload) and override app-level logging. Move setup to app startup, or at minimum guard file handler addition.
Apply this diff to make the endpoint side‑effect free (preferred):
```diff
-setup_logging()
-LOG_FILE = os.getenv("LOG_FILE")
-if LOG_FILE:
-    # Add a rotating file handler if LOG_FILE is set in the environment
-    try:
-        add_rotating_file_handler(LOG_FILE)
-    except Exception:
-        # Avoid raising on startup if file handler can't be added
-        logging.getLogger(__name__).exception("Failed to add rotating file handler")
-
 logger = get_logger(__name__)
```
If you must keep file logging here, dedup handlers by path:
```diff
-LOG_FILE = os.getenv("LOG_FILE")
-if LOG_FILE:
-    # Add a rotating file handler if LOG_FILE is set in the environment
-    try:
-        add_rotating_file_handler(LOG_FILE)
-    except Exception:
-        # Avoid raising on startup if file handler can't be added
-        logging.getLogger(__name__).exception("Failed to add rotating file handler")
+LOG_FILE = os.getenv("LOG_FILE")
+if LOG_FILE:
+    try:
+        root = logging.getLogger()
+        # prevent duplicates on reload
+        if not any(
+            getattr(h, "baseFilename", None) == os.path.abspath(LOG_FILE)
+            for h in root.handlers
+            if h.__class__.__name__ == "RotatingFileHandler"
+        ):
+            add_rotating_file_handler(LOG_FILE)
+    except Exception:
+        logging.getLogger(__name__).exception("Failed to add rotating file handler")
```
Add this import outside the changed range (needed only if you switch the dedup check to isinstance):
from logging.handlers import RotatingFileHandler
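One way to keep the endpoint module side-effect free, as suggested above, is to perform logging setup in the application's lifespan hook. A sketch, assuming a FastAPI app factory (the module layout and names here are assumptions, not part of this PR):

```python
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI

from services.logging_config import add_rotating_file_handler, setup_logging


@asynccontextmanager
async def lifespan(app: FastAPI):
    setup_logging()  # configure root JSON logging exactly once
    log_file = os.getenv("LOG_FILE")
    if log_file:
        # Safe to call repeatedly if the handler helper is made idempotent (see above).
        add_rotating_file_handler(log_file)
    yield


app = FastAPI(lifespan=lifespan)
```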
🤖 Prompt for AI Agents
In api/endpoints/web_answering.py around lines 29 to 39, the module configures
logging at import time which causes side effects and can add duplicate rotating
file handlers; move setup_logging() and any add_rotating_file_handler(LOG_FILE)
calls into the application startup/init routine so the endpoint module remains
side‑effect free. If you cannot move it, add an import for RotatingFileHandler
and guard handler addition by scanning logging.getLogger().handlers for an
existing RotatingFileHandler with the same baseFilename (or same handler class
and filename) and only create/add a new RotatingFileHandler when none exists,
and keep the try/except but log the exception including context.
…xt-aware logging, and update JSON formatter for better structured output.
…rmaton/core into fix/web-answering-refactor
Stabilizes `web_answering.py` by removing syntax-breaking try/except blocks, consolidating service initialization, fixing citation assembly, and replacing leaked exception strings with generic 500 responses. Includes small timing/logging cleanups and (optionally) a CI-safe tweak for Tavily tests.
Features
- Consistent timings & telemetry: `start_time` and per-phase timestamps with a uniform `timings = {search|fetch|rank|generate|total}` (see the sketch after this list).
- Safer HTTP responses: generic `500` detail; full diagnostics stay in logs.
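A minimal sketch of the timing pattern (key names follow the review excerpts above; the surrounding endpoint code is assumed):

```python
import time

timings: dict[str, int] = {}
start_time = time.time()  # overall request timer

search_start = time.time()
# ... run the search phase ...
timings["search"] = int((time.time() - search_start) * 1000)  # milliseconds

# fetch, rank, and generate record their phases the same way, then:
timings["total"] = int((time.time() - start_time) * 1000)
```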
Fixes
- Import/parsing errors: removed stray `try:` and mismatched `except` blocks that caused `SyntaxError`/indentation failures on import.
- Broken response assembly: citations are assembled only when `used_citation_ids` exist; `422` now triggers only when no citations are produced, as intended.
- Error leakage: replaced `str(e)` in responses with generic messages; details are logged via `logger.error`.
Refactors
- Initialization: service initialization consolidated.
- Cleanup: `start_time = time.time()` moved immediately after the docstring for clarity.
Tests (related, optional to include)
- `tests/test_tavily_search.py` is now CI-safe: tests are skipped when `TAVILY_API_KEY` is absent, and they use the `pytest-asyncio` marker.
How to QA
Import smoke test:
```bash
python -c "import sys, pathlib; sys.path.insert(0, str(pathlib.Path('core').resolve())); import importlib; importlib.import_module('api.endpoints.web_answering'); print('IMPORT_OK')"
```
Runtime: hit `/websearch/answer` with a valid payload (or mocked services); verify timings, citations, and generic 500s (a sample call sketch follows below).
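A hypothetical manual QA call; the field names follow the request attributes referenced in the review, and the exact `WebSearchAnswerRequest` schema may differ:

```python
import requests

resp = requests.post(
    "http://localhost:8000/websearch/answer",
    json={
        "query": "latest developments in structured logging",
        "top_k_results": 5,
        "timeout_seconds": 30,
        "include_snippets": True,
    },
    timeout=60,
)
print(resp.status_code)  # expect 200, or a generic 500/504 detail on failure
print(resp.json())       # expect answer, citations, and per-phase timings
```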
Lint/Typecheck (optional)
Risks & Mitigations
Summary by CodeRabbit
New Features
Refactor
Notes