Web Answering API: Cited Answers via Fetch→Rank→Synthesize #29
Walkthrough

Adds a Perplexity-style web-answering FastAPI endpoint with internal API-key auth, plus new RankingService and SynthesisService modules, a per-phase timeout pipeline (search → fetch → rank → synthesize), request/response models, an integration test for Tavily, and related dataclasses/models.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor Client
participant API as FastAPI /websearch/answer
participant Search as WebSearchService
participant Fetch as WebFetchService
participant Rank as RankingService
participant Synth as SynthesisService
Client->>API: POST /websearch/answer (query, params, X-Internal-API-Key)
API->>API: verify_internal_api_key
rect rgb(235,245,255)
note over API: Phase 1 — Search (<=40% timeout)
API->>Search: search(query, top_k_results, region, language, timeout)
Search-->>API: results or timeout/empty
end
alt no results
API-->>Client: 422 Unprocessable Entity
else results
rect rgb(235,245,255)
note over API: Phase 2 — Fetch (<=30% timeout)
API->>Fetch: fetch(results, timeout, include_snippets)
Fetch-->>API: fetched docs or empty
end
alt no docs (and snippets not allowed)
API-->>Client: 422 Unprocessable Entity
else docs
rect rgb(235,245,255)
note over API: Phase 3 — Rank (<=15% timeout)
API->>Rank: rank_documents(query, docs, max_context_chars)
Rank-->>API: ranked evidence or empty
end
alt no evidence
API-->>Client: 422 Unprocessable Entity
else evidence
rect rgb(235,245,255)
note over API: Phase 4 — Synthesis (remaining timeout)
API->>Synth: generate_answer(query, evidence, answer_tokens, style_profile_id)
Synth-->>API: answer_markdown, used_citation_ids, token usage
end
API->>API: assemble citations, timings_ms, meta
API-->>Client: 200 WebSearchAnswerResponse
end
end
end
par Any phase timeout
API-->>Client: 504 Gateway Timeout
end
par Unexpected error
API-->>Client: 500 Internal Server Error
end
```
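The per-phase budgets in the diagram follow one simple pattern. Below is a minimal sketch of that pattern, assuming the 40/30/15/15 split shown above; `run_phase` and `PHASE_BUDGETS` are illustrative names, not code from the PR (the real endpoint inlines its `asyncio.wait_for` calls):

```python
# Minimal sketch of the per-phase timeout pattern from the diagram.
# PHASE_BUDGETS and run_phase are hypothetical helpers, not PR code.
import asyncio

from fastapi import HTTPException, status

PHASE_BUDGETS = {"search": 0.40, "fetch": 0.30, "rank": 0.15, "synthesize": 0.15}

async def run_phase(name: str, coro, timeout_seconds: int):
    """Run one pipeline phase under its slice of the total request timeout."""
    budget = timeout_seconds * PHASE_BUDGETS[name]
    try:
        return await asyncio.wait_for(coro, timeout=budget)
    except asyncio.TimeoutError:
        # Mirrors the 504 branch in the diagram
        raise HTTPException(
            status_code=status.HTTP_504_GATEWAY_TIMEOUT,
            detail=f"{name.capitalize()} phase timed out.",
        ) from None
```

A call site would then look like `results = await run_phase("search", search_service.search(query), request.timeout_seconds)`.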
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ Failed checks (1 warning) · ✅ Passed checks (2 passed)
Actionable comments posted: 5
🧹 Nitpick comments (11)
services/synthesis_service.py (2)
133-148: Harden citation extraction to body-only and filter to valid IDs.

Regex currently counts markers in the References block and any hallucinated IDs. Limit to the body (before “References”) and intersect with evidence IDs.
Apply this diff:
```diff
-def _extract_citation_ids(self, text: str) -> Set[int]:
+def _extract_citation_ids(self, text: str, valid_ids: Optional[Set[int]] = None) -> Set[int]:
@@
-    # Find all [^i] citations
-    citation_markers = re.findall(r'\[\^(\d+)\]', text)
-
-    # Convert to integers and return as a set
-    return {int(id) for id in citation_markers if id.isdigit()}
+    # Consider citations only in the answer body (before "References")
+    body = text.split("\nReferences", 1)[0]
+    ids = {int(m) for m in re.findall(r'\[\^(\d+)\]', body)}
+    if valid_ids is not None:
+        ids &= set(valid_ids)
+    return ids
```

And pass the valid IDs when extracting:
```diff
-used_citation_ids = self._extract_citation_ids(answer)
+used_citation_ids = self._extract_citation_ids(
+    answer, valid_ids={e.id for e in evidence_list}
+)
```

Also applies to: 195-195
50-56: Prompt looks good. Consider moving model name and tone defaults to settings.

Keeps config centralized and testable.
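If the defaults do move to settings, a small sketch of what that could look like with pydantic-settings (class, field, and env-var names here are hypothetical, not from the PR):

```python
# Hypothetical sketch: centralize synthesis defaults in a settings object.
# Env var names (SYNTHESIS_LLM_MODEL, SYNTHESIS_TONE) are illustrative.
from pydantic_settings import BaseSettings, SettingsConfigDict

class SynthesisSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="SYNTHESIS_")

    llm_model: str = "gpt-4o"  # overridable via SYNTHESIS_LLM_MODEL
    tone: str = "neutral"      # overridable via SYNTHESIS_TONE

settings = SynthesisSettings()  # e.g. SynthesisService(model=settings.llm_model)
```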
services/ranking_service.py (2)
129-140: Avoid relying on exceptions for control flow; handle empty embedding responses.

`embed_texts([query])[0]` will raise `IndexError` if the API returns `[]`. Check lengths before indexing.

Apply this diff:
```diff
-# Embed query
-query_embedding = embed_texts([query])[0]
-
-# Embed passages
-passage_embeddings = embed_texts(passages)
+# Embed query
+q_embeds = embed_texts([query])
+if not q_embeds:
+    raise RuntimeError("Empty query embedding")
+query_embedding = q_embeds[0]
+
+# Embed passages
+passage_embeddings = embed_texts(passages)
+if not passage_embeddings:
+    raise RuntimeError("Empty passage embeddings")
```
141-144: Prefer `logger.exception` and avoid f-strings in logs.

Improves stack traces and avoids needless string interpolation.
Apply this diff:
```diff
-logger.error(f"Error during embedding or similarity computation: {str(e)}")
+logger.exception("Error during embedding or similarity computation")
```

api/endpoints/web_answering.py (5)
111-112: Unused dependency variable; use `_` to signal intent.

Keeps the API key check while silencing “unused variable” warnings.
Apply this diff:
```diff
-    api_key: str = Depends(verify_internal_api_key)
+    _: None = Depends(verify_internal_api_key)
```
21-25: Import FetchedDoc for snippet-only fallback.

Needed to build docs when fetch times out.
Apply this diff:
```diff
-from services.web_fetch_service import WebFetchService
+from services.web_fetch_service import WebFetchService, FetchedDoc
```
195-199: Timeout fallback: preserve snippets instead of dropping all docs.

If fetch times out, degrade gracefully by constructing snippet-backed docs when `include_snippets=True`.

Apply this diff:
```diff
-    except asyncio.TimeoutError:
-        logger.warning(f"Fetch timed out after {request.timeout_seconds * 0.3}s", extra=logger_ctx)
-        # Continue with whatever we got
-        fetched_docs = []
+    except asyncio.TimeoutError:
+        logger.warning(
+            "Fetch timed out after %.2fs", request.timeout_seconds * 0.3, extra=logger_ctx
+        )
+        if request.include_snippets:
+            fetched_docs = [
+                FetchedDoc(
+                    url=r.url,
+                    title=r.title,
+                    site_name=r.site_name,
+                    text=(r.snippet or ""),
+                    published_at=getattr(r, "published_at", None),
+                    fetch_ms=0,
+                )
+                for r in search_results
+                if getattr(r, "snippet", None)
+            ]
+        else:
+            fetched_docs = []
```
121-125: Use parameterized logging and avoid f-strings.

Saves formatting cost and standardizes logs.
Apply this diff:
```diff
-    logger.info(
-        f"Web search answer request: query='{request.query}', "
-        f"top_k={request.top_k_results}, timeout={request.timeout_seconds}s",
-        extra=logger_ctx
-    )
+    logger.info(
+        "Web search answer request: query=%r, top_k=%d, timeout=%ds",
+        request.query, request.top_k_results, request.timeout_seconds, extra=logger_ctx
+    )
@@
-    logger.info(f"Search completed with {len(search_results)} results", extra=logger_ctx)
+    logger.info("Search completed with %d results", len(search_results), extra=logger_ctx)
@@
-    logger.info(f"Fetch completed with {len(fetched_docs)} documents", extra=logger_ctx)
+    logger.info("Fetch completed with %d documents", len(fetched_docs), extra=logger_ctx)
@@
-    logger.info(f"Ranking completed with {len(ranked_evidence)} evidence passages", extra=logger_ctx)
+    logger.info("Ranking completed with %d evidence passages", len(ranked_evidence), extra=logger_ctx)
@@
-    logger.warning(f"Synthesis timed out after {request.timeout_seconds * 0.15}s", extra=logger_ctx)
+    logger.warning("Synthesis timed out after %.2fs", request.timeout_seconds * 0.15, extra=logger_ctx)
@@
-    logger.info(
-        f"Answer generated successfully in {timings['total']}ms with {len(citations)} citations",
-        extra=logger_ctx
-    )
+    logger.info(
+        "Answer generated successfully in %dms with %d citations",
+        timings["total"], len(citations), extra=logger_ctx
+    )
```

Also applies to: 181-182, 208-209, 237-238, 251-256, 303-306
121-125: Minor: avoid logging full queries (possible PII).

Consider masking or sampling in prod logs.
tests/test_tavily_search.py (2)
9-24: Make this a stable, skippable pytest async test.

As written, this will be flaky and may fail under pytest (async test without marker). Gate on the API key and remove “today” from the query.
Apply this diff:
```diff
+import pytest
@@
-async def test_tavily_search():
+@pytest.mark.asyncio
+async def test_tavily_search():
+    if not os.getenv("TAVILY_API_KEY"):
+        pytest.skip("TAVILY_API_KEY not set; skipping Tavily integration test")
@@
-    query = "WHat is the result of the asia cup match between ban vs sri today?"
+    query = "Asia Cup cricket Bangladesh vs Sri Lanka result site:icc-cricket.com"
@@
-    assert len(results) > 0, "No search results returned"
+    assert len(results) > 0, "No search results returned"
```
39-44: Drop broad try/except and prints in tests.

Let pytest report failures; reserve prints for debugging.
```diff
-    except Exception as e:
-        print(f"Error: {str(e)}")
-        import traceback
-        traceback.print_exc()
-        return False
+    except Exception:
+        # Let pytest surface the traceback naturally
+        raise
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- api/endpoints/web_answering.py (1 hunks)
- services/ranking_service.py (1 hunks)
- services/synthesis_service.py (1 hunks)
- tests/test_tavily_search.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
services/ranking_service.py (3)
- services/chunking_service.py (1): chunk_text (9-39)
- services/embedding_service.py (1): embed_texts (19-42)
- services/web_fetch_service.py (1): FetchedDoc (30-37)
api/endpoints/web_answering.py (4)
- services/web_search_service.py (1): WebSearchService (197-298)
- services/web_fetch_service.py (2): WebFetchService (39-274), fetch_search_results (235-274)
- services/ranking_service.py (2): RankingService (31-190), rank_documents (95-190)
- services/synthesis_service.py (2): SynthesisService (29-224), generate_answer (149-224)
services/synthesis_service.py (1)
- services/ranking_service.py (1): RankedEvidence (21-29)
tests/test_tavily_search.py (1)
- services/web_search_service.py (1): WebSearchService (197-298)
🪛 Ruff (0.13.1)
services/ranking_service.py
- 140-140: Do not catch blind exception: `Exception` (BLE001)
- 141-141: Use `logging.exception` instead of `logging.error` (TRY400)
- 141-141: Use explicit conversion flag (RUF010)

api/endpoints/web_answering.py
- 142-142: Expected an indented block after `try` statement (invalid-syntax)

services/synthesis_service.py
- 45-45: Avoid specifying long messages outside the exception class (TRY003)
- 210-210: Do not catch blind exception: `Exception` (BLE001)
- 211-211: Use `logging.exception` instead of `logging.error` (TRY400)
- 211-211: Use explicit conversion flag (RUF010)
- 216-216: Use explicit conversion flag (RUF010)

tests/test_tavily_search.py
- 38-38: Consider moving this statement to an `else` block (TRY300)
- 39-39: Do not catch blind exception: `Exception` (BLE001)
- 40-40: Use explicit conversion flag (RUF010)
🔇 Additional comments (2)
services/ranking_service.py (1)

183-189: LGTM: sequential ID reassignment for citations.

Ensures [^i] markers align with final selection order.
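For readers skimming the review, the idea being praised is roughly this (simplified stand-in types; the real RankedEvidence lives in services/ranking_service.py):

```python
# Sketch of sequential ID reassignment: after final selection, renumber
# evidence so [^1], [^2], ... match presentation order exactly.
from dataclasses import dataclass

@dataclass
class RankedEvidence:  # simplified stand-in for the real dataclass
    id: int
    passage: str

def reassign_ids(selected: list[RankedEvidence]) -> list[RankedEvidence]:
    for new_id, evidence in enumerate(selected, start=1):
        evidence.id = new_id  # citation marker [^new_id] now matches list position
    return selected
```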
api/endpoints/web_answering.py (1)

107-107: Endpoint path prefix “/internal” — verify router inclusion.

Spec expects POST /internal/websearch/answer, but api/endpoints/web_answering.py defines @router.post("/websearch/answer", ...). A search returned no matches for app.include_router(router, prefix='/internal'); confirm the application mounts this router with prefix='/internal', or update the route to include '/internal'.
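For reference, one way the mount could look in the app factory (a sketch; a later review round confirms api/main.py does mount it with prefix="/internal"):

```python
# Sketch of mounting the router under /internal so the final path is
# POST /internal/websearch/answer.
from fastapi import FastAPI

from api.endpoints.web_answering import router as web_answering_router

app = FastAPI()
app.include_router(web_answering_router, prefix="/internal", tags=["websearch"])
```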
…d ranking and synthesis services for enhanced query responses (force-pushed from a714bcc to 7d1bfb9)
Actionable comments posted: 2
🧹 Nitpick comments (6)
tests/test_tavily_search.py (1)

1-5: Drop unused imports.

sys and Path are unused.
```diff
-import sys
-import asyncio
-import os
-from pathlib import Path
+import os
+import asyncio
```

services/ranking_service.py (2)
139-144: Use logger.exception and keep logs structured; avoid f-strings.

Improves observability and aligns with lint hints.

```diff
-    except Exception as e:
-        logger.error(f"Error during embedding or similarity computation: {str(e)}")
+    except Exception:
+        logger.exception("Error during embedding or similarity computation")
         # Fallback: assign decreasing scores based on original order
         similarity_scores = [1.0 - (i / len(passages)) for i in range(len(passages))]
```
135-140: Defensive check for embedding length mismatches.

If the embedding service returns fewer vectors than passages (partial failure), fall back instead of raising later.

```diff
     # Embed passages
     passage_embeddings = embed_texts(passages)
-    # Compute similarity scores
+    # Defensive: ensure shapes match
+    if len(passage_embeddings) != len(passages):
+        raise RuntimeError("Embedding count mismatch")
+    # Compute similarity scores
     similarity_scores = self._compute_similarity(query_embedding, passage_embeddings)
```

api/endpoints/web_answering.py (2)
161-164: Pass integer timeouts to providers and wait_for.

Avoid float seconds where ints are expected.

```diff
-        timeout_seconds=min(request.timeout_seconds * 0.4, 10)  # Allocate 40% of timeout
+        timeout_seconds=int(min(request.timeout_seconds * 0.4, 10))  # Allocate 40% of timeout
     ),
-    timeout=request.timeout_seconds * 0.4
+    timeout=float(request.timeout_seconds) * 0.4
@@
-        timeout_seconds=min(request.timeout_seconds * 0.3, 8),  # Allocate 30% of timeout
+        timeout_seconds=int(min(request.timeout_seconds * 0.3, 8)),  # Allocate 30% of timeout
     ),
-    timeout=request.timeout_seconds * 0.3
+    timeout=float(request.timeout_seconds) * 0.3
@@
-    timeout=request.timeout_seconds * 0.15  # Allocate 15% of timeout
+    timeout=float(request.timeout_seconds) * 0.15  # Allocate 15% of timeout
@@
-    timeout=request.timeout_seconds * 0.15  # Allocate remaining time
+    timeout=float(request.timeout_seconds) * 0.15  # Allocate remaining time
```

Also applies to: 189-193, 219-221, 249-250
33-35: Verify internal route prefix.

PR says the endpoint is internal under /internal. Confirm a parent router prefixes “/internal”; if not, add the prefix or inline it on this route.

Possible change:

```diff
-router = APIRouter(tags=["websearch"])
+router = APIRouter(prefix="/internal", tags=["websearch"])
@@
-@router.post("/websearch/answer", response_model=WebSearchAnswerResponse, status_code=200)
+@router.post("/websearch/answer", response_model=WebSearchAnswerResponse, status_code=200)
```

Or keep the router as-is and change the path to "/internal/websearch/answer".
Also applies to: 107-112
services/synthesis_service.py (1)

190-201: Defensively access token usage and use structured logs.

Avoid attribute errors and f-strings in logs.

```diff
-    prompt_tokens = response.usage.prompt_tokens
-    completion_tokens = response.usage.completion_tokens
+    prompt_tokens = getattr(response.usage, "prompt_tokens", 0) or 0
+    completion_tokens = getattr(response.usage, "completion_tokens", 0) or 0
@@
-    logger.info(f"Generated answer in {generation_ms}ms, {completion_tokens} tokens, {len(used_citation_ids)} citations")
+    logger.info(
+        "Generated answer in %dms, %d tokens, %d citations",
+        generation_ms, completion_tokens, len(used_citation_ids)
+    )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- api/endpoints/web_answering.py (1 hunks)
- services/ranking_service.py (1 hunks)
- services/synthesis_service.py (1 hunks)
- tests/test_tavily_search.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
services/ranking_service.py (3)
- services/chunking_service.py (1): chunk_text (9-39)
- services/embedding_service.py (1): embed_texts (19-42)
- services/web_fetch_service.py (1): FetchedDoc (31-38)
api/endpoints/web_answering.py (4)
- services/web_search_service.py (1): WebSearchService (202-303)
- services/web_fetch_service.py (2): WebFetchService (40-301), fetch_search_results (262-301)
- services/ranking_service.py (2): RankingService (31-190), rank_documents (95-190)
- services/synthesis_service.py (2): SynthesisService (29-224), generate_answer (149-224)
services/synthesis_service.py (1)
- services/ranking_service.py (1): RankedEvidence (21-29)
tests/test_tavily_search.py (1)
- services/web_search_service.py (1): WebSearchService (202-303)
🪛 Ruff (0.13.1)
services/ranking_service.py
- 140-140: Do not catch blind exception: `Exception` (BLE001)
- 141-141: Use `logging.exception` instead of `logging.error` (TRY400)
- 141-141: Use explicit conversion flag (RUF010)

api/endpoints/web_answering.py
- 142-142: Expected an indented block after `try` statement (invalid-syntax)

services/synthesis_service.py
- 45-45: Avoid specifying long messages outside the exception class (TRY003)
- 210-210: Do not catch blind exception: `Exception` (BLE001)
- 211-211: Use `logging.exception` instead of `logging.error` (TRY400)
- 211-211: Use explicit conversion flag (RUF010)
- 216-216: Use explicit conversion flag (RUF010)

tests/test_tavily_search.py
- 38-38: Consider moving this statement to an `else` block (TRY300)
- 39-39: Do not catch blind exception: `Exception` (BLE001)
- 40-40: Use explicit conversion flag (RUF010)
🔇 Additional comments (5)
services/ranking_service.py (1)

86-93: Guard cosine normalization against zero-norm vectors.

Zero norms yield NaNs and break ranking; add safe normalization.

```diff
-    # Normalize vectors
-    query_vec = query_vec / np.linalg.norm(query_vec)
-    passage_vecs = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
+    # Normalize vectors with zero-norm guards
+    q_norm = np.linalg.norm(query_vec)
+    if q_norm == 0:
+        return [0.0] * len(passage_embeddings)
+    query_vec = query_vec / q_norm
+    p_norms = np.linalg.norm(passage_vecs, axis=1, keepdims=True)
+    p_norms[p_norms == 0] = 1.0
+    passage_vecs = passage_vecs / p_norms
```

api/endpoints/web_answering.py (2)
18-18: Migrate validators to Pydantic v2.

Replace v1 @validator with @field_validator and update the import.

```diff
-from pydantic import BaseModel, Field, validator
+from pydantic import BaseModel, Field, field_validator
@@
-    # Validators
-    @validator("top_k_results")
-    def validate_top_k_results(cls, v):
+    # Validators
+    @field_validator("top_k_results")
+    def validate_top_k_results(cls, v: int) -> int:
         return max(3, min(15, v))  # Clamp between 3 and 15
@@
-    @validator("max_context_chars")
-    def validate_max_context_chars(cls, v):
+    @field_validator("max_context_chars")
+    def validate_max_context_chars(cls, v: int) -> int:
         return max(1000, min(50000, v))  # Clamp between 1000 and 50000
@@
-    @validator("answer_tokens")
-    def validate_answer_tokens(cls, v):
+    @field_validator("answer_tokens")
+    def validate_answer_tokens(cls, v: int) -> int:
         return max(100, min(2000, v))  # Clamp between 100 and 2000
@@
-    @validator("timeout_seconds")
-    def validate_timeout_seconds(cls, v):
+    @field_validator("timeout_seconds")
+    def validate_timeout_seconds(cls, v: int) -> int:
         return max(5, min(60, v))  # Clamp between 5 and 60 seconds
```

Also applies to: 47-63
139-154: Fix syntax error: stray/duplicated try; endpoint won’t import.

Remove the extra try and keep a single init block.

```diff
-    try:
-        # Initialize services
-        # Initialize services
-        try:
+    # Initialize services
+    try:
         search_service = WebSearchService()
         fetch_service = WebFetchService()
         ranking_service = RankingService()
         synthesis_service = SynthesisService()
     except Exception as e:
-        logger.error(f"Failed to initialize services: {str(e)}", extra=logger_ctx)
+        logger.error("Failed to initialize services: %s", e, extra=logger_ctx)
         raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail=f"Service initialization error: {str(e)}"
-        )  # 1. Search phase
+            detail="Service initialization error."
+        )
+    # 1. Search phase
```

services/synthesis_service.py (2)
15-16: Make the client async to avoid blocking the event loop.

Use AsyncOpenAI and await the call so endpoint timeouts work.

```diff
-from openai import OpenAI
+from openai import AsyncOpenAI
@@
 def __init__(self, api_key: Optional[str] = None, model: str = "gpt-4o"):
@@
-    self.model = model
-    self.client = OpenAI(api_key=self.api_key)
+    self.model = model
+    self.client = AsyncOpenAI(api_key=self.api_key, timeout=30.0)
@@
-    response = self.client.chat.completions.create(
+    response = await self.client.chat.completions.create(
         model=self.model,
         messages=[
             {"role": "system", "content": system_prompt},
             {"role": "user", "content": user_prompt}
         ],
         temperature=0.2,
         max_tokens=answer_tokens,
     )
```

Also applies to: 35-49, 174-185
210-224: Don’t leak exception details; log stack and return a generic message.

Aligns with security/observability practices.

```diff
-    except Exception as e:
-        logger.error(f"Error generating answer: {str(e)}")
+    except Exception:
+        logger.exception("Error generating answer")
@@
-    fallback_answer = (
-        f"I apologize, but I encountered an error while generating an answer to your question: \"{query}\"\n\n"
-        f"Error: {str(e)}\n\n"
-        f"Please try again with a different query or contact support if the problem persists."
-    )
+    fallback_answer = (
+        "I encountered an internal error while generating the answer. "
+        "Please try again or contact support if the problem persists."
+    )
```
…r handling and performance; update synthesis service for better logging and timeout management; enhance test for Tavily search service.
Actionable comments posted: 2
🧹 Nitpick comments (4)
services/ranking_service.py (1)

146-150: Prefer logger.exception and avoid f-string in logs; keep broad fallback.

Catching Exception is acceptable here given the downstream fallback, but use logger.exception for stack traces and structured formatting.

Apply this diff:

```diff
-    except Exception as e:
-        logger.error(f"Error during embedding or similarity computation: {str(e)}")
+    except Exception:
+        logger.exception("Error during embedding or similarity computation")
         # Fallback: assign decreasing scores based on original order
         similarity_scores = [1.0 - (i / len(passages)) for i in range(len(passages))]
```

api/endpoints/web_answering.py (2)
111-116: Don’t bind unused dependency parameter.

The dependency is only for auth; bind to underscore to avoid “unused” and clarify intent.

Apply this diff:

```diff
-async def web_search_answer(
-    request: WebSearchAnswerRequest,
-    req: Request,
-    api_key: str = Depends(verify_internal_api_key)
-):
+async def web_search_answer(
+    request: WebSearchAnswerRequest,
+    req: Request,
+    _: None = Depends(verify_internal_api_key),
+):
```
126-130: Prefer structured logging over f-strings.

Reduces formatting overhead when the log level is higher.

Apply this diff:

```diff
-    logger.info(
-        f"Web search answer request: query='{request.query}', "
-        f"top_k={request.top_k_results}, timeout={request.timeout_seconds}s",
-        extra=logger_ctx
-    )
+    logger.info(
+        "Web search answer request: query='%s', top_k=%d, timeout=%ss",
+        request.query, request.top_k_results, request.timeout_seconds,
+        extra=logger_ctx,
+    )
```

services/synthesis_service.py (1)
15-18: Remove unused OpenAI sync client import.

Cleanup to avoid confusion; you’re using AsyncOpenAI.

Apply this diff:

```diff
-from openai import OpenAI
-from services.ranking_service import RankedEvidence
-from openai import AsyncOpenAI
+from services.ranking_service import RankedEvidence
+from openai import AsyncOpenAI
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- api/endpoints/web_answering.py (1 hunks)
- services/ranking_service.py (1 hunks)
- services/synthesis_service.py (1 hunks)
- tests/test_tavily_search.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_tavily_search.py
🧰 Additional context used
🧬 Code graph analysis (3)
api/endpoints/web_answering.py (4)
- services/web_search_service.py (1): WebSearchService (202-303)
- services/web_fetch_service.py (2): WebFetchService (40-301), fetch_search_results (262-301)
- services/ranking_service.py (2): RankingService (31-196), rank_documents (101-196)
- services/synthesis_service.py (2): SynthesisService (30-227), generate_answer (150-227)
services/ranking_service.py (3)
- services/chunking_service.py (1): chunk_text (9-39)
- services/embedding_service.py (1): embed_texts (19-42)
- services/web_fetch_service.py (1): FetchedDoc (31-38)
services/synthesis_service.py (1)
- services/ranking_service.py (1): RankedEvidence (21-29)
🪛 Ruff (0.13.1)
api/endpoints/web_answering.py
- 301-301: Expected a statement (invalid-syntax)
- 301-302: Expected an expression (invalid-syntax)
- 303-303: Unexpected indentation (invalid-syntax)
- 304-304: Expected a statement (invalid-syntax)
- 304-304: Expected a statement (invalid-syntax)
- 304-305: Expected an expression (invalid-syntax)
- 306-306: Unexpected indentation (invalid-syntax)
- 310-310: Expected a statement (invalid-syntax)

services/ranking_service.py
- 146-146: Do not catch blind exception: `Exception` (BLE001)
- 147-147: Use `logging.exception` instead of `logging.error` (TRY400)
- 147-147: Use explicit conversion flag (RUF010)

services/synthesis_service.py
- 46-46: Avoid specifying long messages outside the exception class (TRY003)
🔇 Additional comments (4)
services/ranking_service.py (1)

86-95: Good fix: cosine similarity now guards zero-norms.

Prevents NaNs and stabilizes scores. Looks correct.
services/synthesis_service.py (2)
44-51: LGTM: async client initialization with timeout.

Appropriate default and aligns with endpoint’s asyncio.wait_for.
176-186: Await the AsyncOpenAI call; current code returns a coroutine and will break.

This is a runtime blocker.

Apply this diff:

```diff
-    response = self.client.chat.completions.create(
+    response = await self.client.chat.completions.create(
         model=self.model,
         messages=[
             {"role": "system", "content": system_prompt},
             {"role": "user", "content": user_prompt}
         ],
         temperature=0.2,
         max_tokens=answer_tokens,
     )
```

api/endpoints/web_answering.py (1)
111-112: Confirmed — router is mounted under /internal.
web_answering router is included in api/main.py via app.include_router(web_answering_router, prefix="/internal", tags=["websearch"]).
… endpoint for improved clarity and logging
Actionable comments posted: 0
🧹 Nitpick comments (6)
api/endpoints/web_answering.py (6)
140-151: Consider more specific exception handling and structured logging.

The broad `Exception` catch could mask specific initialization failures. Also, the error handling could follow exception-chaining patterns for better error context.

```diff
 # Initialize services
 try:
     search_service = WebSearchService()
     fetch_service = WebFetchService()
     ranking_service = RankingService()
     synthesis_service = SynthesisService()
-except Exception as e:
-    logger.error("Failed to initialize services: %s", e, extra=logger_ctx)
+except (ImportError, ValueError, ConnectionError) as e:
+    logger.exception("Failed to initialize services", extra=logger_ctx)
+    raise HTTPException(
+        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+        detail="Service initialization error."
+    ) from e
+except Exception as e:
+    logger.exception("Unexpected error during service initialization", extra=logger_ctx)
     raise HTTPException(
         status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
         detail="Service initialization error."
-    )
+    ) from e
```
166-171: Use exception chaining for better error context.

Following Python best practices for exception handling in nested contexts.

```diff
 except asyncio.TimeoutError:
     logger.warning(f"Search timed out after {request.timeout_seconds * 0.4}s", extra=logger_ctx)
     raise HTTPException(
         status_code=status.HTTP_504_GATEWAY_TIMEOUT,
         detail="Search phase timed out."
-    )
+    ) from None
```
216-221: Use exception chaining for better error context.

Following Python best practices for exception handling in nested contexts.

```diff
 except asyncio.TimeoutError:
     logger.warning(f"Ranking timed out after {request.timeout_seconds * 0.15}s", extra=logger_ctx)
     raise HTTPException(
         status_code=status.HTTP_504_GATEWAY_TIMEOUT,
         detail="Ranking phase timed out."
-    )
+    ) from None
```
241-241: Timeout allocation comment is inconsistent.

The comment says “Allocate remaining time” but hard-codes another 15%; after allocating 40% + 30% + 15% = 85%, the remaining budget is the final 15%, so make the comment explicit and consider a small floor so synthesis is never starved.

```diff
-    timeout=request.timeout_seconds * 0.15  # Allocate remaining time
+    timeout=max(request.timeout_seconds * 0.15, 2)  # Allocate remaining time (~15%)
```
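For a quick sanity check of how the split adds up under the default budget (plain arithmetic, not code from the PR):

```python
# With the default timeout_seconds=25 and the 40/30/15/15 split:
timeout_seconds = 25
search_budget = timeout_seconds * 0.40      # 10.0 s
fetch_budget = timeout_seconds * 0.30       # 7.5 s
rank_budget = timeout_seconds * 0.15        # 3.75 s
synthesis_budget = timeout_seconds * 0.15   # 3.75 s (the remaining 15%)
total = search_budget + fetch_budget + rank_budget + synthesis_budget
assert abs(total - timeout_seconds) < 1e-9  # budgets cover the full timeout
```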
244-248: Use exception chaining for better error context.

Following Python best practices for exception handling in nested contexts.

```diff
 except asyncio.TimeoutError:
     logger.warning(f"Synthesis timed out after {request.timeout_seconds * 0.15}s", extra=logger_ctx)
     raise HTTPException(
         status_code=status.HTTP_504_GATEWAY_TIMEOUT,
         detail="Synthesis phase timed out."
-    )
+    ) from None
```
272-272: Potential issue with snippet truncation for citation.

The citation snippet truncation uses a hardcoded 200-character limit without considering word boundaries, potentially creating awkward cuts mid-word.

```diff
-    snippet=evidence.passage[:200] + "..." if len(evidence.passage) > 200 else evidence.passage,
+    snippet=(
+        evidence.passage[:197] + "..."
+        if len(evidence.passage) > 200
+        else evidence.passage
+    ),
```

Or for better word boundary handling:
```python
def smart_truncate(text: str, max_length: int = 200) -> str:
    if len(text) <= max_length:
        return text
    # Find the last space before max_length
    last_space = text.rfind(' ', 0, max_length - 3)
    if last_space > max_length * 0.8:  # Only cut at a word boundary if it's reasonably close
        return text[:last_space] + "..."
    return text[:max_length - 3] + "..."
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- api/endpoints/web_answering.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
api/endpoints/web_answering.py (4)
- services/web_search_service.py (1): WebSearchService (202-303)
- services/web_fetch_service.py (2): WebFetchService (40-301), fetch_search_results (262-301)
- services/ranking_service.py (2): RankingService (31-196), rank_documents (101-196)
- services/synthesis_service.py (2): SynthesisService (30-227), generate_answer (150-227)
🪛 Ruff (0.13.1)
api/endpoints/web_answering.py
- 115-115: Unused function argument: `api_key` (ARG001)
- 146-146: Do not catch blind exception: `Exception` (BLE001)
- 147-147: Use `logging.exception` instead of `logging.error` (TRY400)
- 148-151: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
- 168-171: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)
- 218-221: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)
- 245-248: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` (B904)
🔇 Additional comments (2)
api/endpoints/web_answering.py (2)
47-66: Already fixed: Pydantic v2 @field_validator correctly implemented.

The code correctly uses @field_validator decorators with the proper syntax for Pydantic v2. The validator functions follow the correct pattern with @classmethod methods that return the validated value.
115-115: The api_key parameter serves a functional dependency injection purpose.

The `api_key` parameter from `Depends(verify_internal_api_key)` is necessary for FastAPI’s dependency injection system to enforce authentication. While the parameter isn’t used within the function body, removing it would disable the authentication check.
@fehranbit please review the update.
Short summary: Adds an internal, citation-first web answering endpoint that searches, fetches, ranks, and synthesizes Markdown answers with inline footnotes and a References block. Handles partial failures and returns timings/meta for observability.
Summary by CodeRabbit

New Features
- Endpoint (POST /internal/websearch/answer, X-Internal-API-Key): accepts query, top_k_results=8, max_context_chars=20000, region="auto", language="en", style_profile_id?, answer_tokens=800, include_snippets=true, timeout_seconds=25; returns { answer_markdown, citations[], used_sources_count, timings_ms, meta } (see the request sketch after this list).
- Router: included under /internal with tag websearch.
- Web Fetch: httpx.AsyncClient with semaphore MAX_FETCH_CONCURRENCY.
- Ranking: passage split (~800–1200 chars + small overlap), cosine similarity via EmbeddingService, trimmed to max_context_chars.
- Synthesis: evidence-first prompt, inline [^i] markers, References block, (citation needed) for unsupported claims, optional style capsule; return only actually-cited sources.

Improvements
- timings_ms + request_id in logs.
- citations[].

Security & Config
- X-Internal-API-Key; internal-only under /internal.
- dummy and tavily providers; per-request timeout_seconds; controlled concurrency via env.
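Based on the request and response fields listed above, a call could look like this (host, port, and key are placeholders; the sketch uses httpx, which the service already depends on):

```python
# Hypothetical client call for POST /internal/websearch/answer.
import asyncio

import httpx

async def ask(query: str) -> dict:
    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        resp = await client.post(
            "/internal/websearch/answer",
            headers={"X-Internal-API-Key": "<internal-key>"},
            json={
                "query": query,
                "top_k_results": 8,
                "answer_tokens": 800,
                "timeout_seconds": 25,
            },
            timeout=30.0,
        )
        resp.raise_for_status()
        # Response carries answer_markdown, citations[], used_sources_count,
        # timings_ms, and meta.
        return resp.json()

# answer = asyncio.run(ask("example question"))
```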