Skip to content

Conversation

@jharris1679
Copy link
Contributor

@jharris1679 jharris1679 commented Feb 21, 2025

Summary by CodeRabbit

  • Documentation
    • Added comprehensive French documentation alongside an updated English guide with new sections on interactive usage, architecture, and troubleshooting.
  • New Features
    • Launched an interactive chat interface for natural language queries on government spending data.
    • Enhanced deployment with containerized services for a streamlined setup.
    • Introduced a new FastAPI application for handling queries and health checks.
  • Chores
    • Refined project configurations to improve file management and maintenance.

@coderabbitai
Copy link

coderabbitai bot commented Feb 21, 2025

Walkthrough

This pull request updates the repository’s configuration and documentation while introducing a new Dockerized query engine for exploring Canadian government spending data. The changes include modifications to the .gitignore file, updated English and French documentation, and several new files under the query-engine directory. New Docker configurations, environment variable examples, and a set of Python modules implementing a FastAPI application with chat and indexing capabilities are also added, along with a comprehensive requirements file.

Changes

File(s) Change Summary
.gitignore Removed entry x; added entries for .env, __pycache__, and .DS_Store to ignore these files.
README.fr.md, README.md Added French documentation for the CanadaSpends tool; updated English documentation to reflect the interactive government spending data explorer with expanded sections (Overview, Features, Chat Interface, Architecture, Development, Troubleshooting).
query-engine/Dockerfile, query-engine/docker-compose.yaml, query-engine/.env.example Introduced Docker configuration for the query engine. The Dockerfile builds the image using Python 3.11-slim and sets up the application environment; docker-compose defines services (weaviate, api, chat) and network; .env.example provides environment variable declarations.
query-engine/requirements.txt Created a requirements file listing 18 dependencies needed for indexing, vector search, API services, and related functionalities.
query-engine/src/chat_interface.py, query-engine/src/indexer.py, query-engine/src/main.py, query-engine/src/query_engine.py Added Python modules that implement a chat interface, data indexing with XML and CSV parsing, a FastAPI application with /query and /health endpoints, and a QueryEngine class for processing user questions with error handling and result fetching.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ChatInterface
    participant API
    participant QueryEngine

    User->>ChatInterface: Start chat session\nEnter query
    ChatInterface->>API: Send GET /query?question=...
    API->>QueryEngine: Process query
    QueryEngine-->>API: Return result
    API-->>ChatInterface: Send response
    ChatInterface-->>User: Display response
Loading
sequenceDiagram
    participant API
    participant CSVIndexer
    participant Weaviate

    API->>CSVIndexer: Initialize indexing (startup)
    CSVIndexer->>Weaviate: Create/update schema & index CSV data
    Weaviate-->>CSVIndexer: Acknowledge operations
    CSVIndexer-->>API: Report indexing status (ready/failed)
    API->>API: /health endpoint returns current status
Loading

Poem

I'm a bunny with a happy hop,
Code carrots in a vibrant crop!
Docker dreams and chats so clear,
Indexing data without fear.
With every line, I bound with glee—
A code garden for you and me!
🥕🐇 Happy coding!

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (23)
query-engine/launch.sh (1)

1-1: Consider Enhancing Error Handling

Although the chained commands with && ensure that subsequent steps only run if the prior succeed, for improved robustness consider adding explicit error handling (for instance, including set -e at the beginning of the script). This approach makes the script exit immediately on any error and can simplify debugging.

🧰 Tools
🪛 Shellcheck (0.10.0)

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

query-engine/src/indexer.py (5)

4-4: Remove unused import
The typing.Optional import is never used. Removing it will keep the imports clean and reduce potential confusion.

-from typing import Optional
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


6-10: Remove the unused Settings import
The Settings symbol is never referenced in this file. Consider removing it to avoid clutter.

-from llama_index.core import (
-    Document, 
-    VectorStoreIndex, 
-    StorageContext, 
-    Settings
-)
+from llama_index.core import (
+    Document, 
+    VectorStoreIndex, 
+    StorageContext
+)
🧰 Tools
🪛 Ruff (0.8.2)

10-10: llama_index.core.Settings imported but unused

Remove unused import: llama_index.core.Settings

(F401)


47-83: Check for blocking I/O in an async method
This method uses blocking calls (pandas.read_csv, self.weaviate_client.connect) within an async function. Consider offloading file I/O or network calls to an executor or using async-capable libraries, to prevent blocking the event loop.


80-82: Use consistent logging instead of print
The rest of the file uses loguru for logging. Replacing the print statement in the exception handler with logger.error(...) would ensure consistency and better traceability.

-except Exception as e:
-    print(f"Error during initialization: {str(e)}")
-    raise
+except Exception as e:
+    logger.error(f"Error during initialization: {str(e)}")
+    raise

87-90: Use contextlib.suppress instead of bare try-except-pass
Catching all exceptions silently can mask unexpected errors. contextlib.suppress(Exception) provides a cleaner, more Pythonic approach that makes your intent clearer.

-try:
-    self.weaviate_client.collections.delete(collection_name)
-except:
-    pass
+from contextlib import suppress
+with suppress(Exception):
+    self.weaviate_client.collections.delete(collection_name)
🧰 Tools
🪛 Ruff (0.8.2)

87-90: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


89-89: Do not use bare except

(E722)

query-engine/src/query_engine.py (2)

2-2: Remove unused import
typing.Optional is never referenced in this file. Remove it to keep dependencies minimal.

-from typing import Optional
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


17-17: Raise exceptions with raise ... from e
Use raise Exception(...) from e when re-raising exceptions so the original traceback is preserved.

-except Exception as e:
-    raise Exception(f"Error querying index: {str(e)}")
+except Exception as e:
+    raise Exception(f"Error querying index: {str(e)}") from e
🧰 Tools
🪛 Ruff (0.8.2)

17-17: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

query-engine/src/main.py (5)

1-1: Remove unused os import
This import is never referenced in the file. Removing it will tidy up the code.

-import os
🧰 Tools
🪛 Ruff (0.8.2)

1-1: os imported but unused

Remove unused import: os

(F401)


5-5: Remove unused import
typing.Optional is not used. Consider removing it to avoid confusion.

-from typing import Optional
🧰 Tools
🪛 Ruff (0.8.2)

5-5: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


9-9: Remove unused QueryEngine import
QueryEngine is not directly referenced in this file, even though the global query_engine variable is used. If the code never needs the class type, remove this import.

-from query_engine import QueryEngine
🧰 Tools
🪛 Ruff (0.8.2)

9-9: query_engine.QueryEngine imported but unused

Remove unused import: query_engine.QueryEngine

(F401)


22-22: Use a configurable path for CSV data
Hard-coding the CSV path can be limiting. Consider using an environment variable or application config to specify it for better flexibility.


44-44: Raise exceptions with raise ... from e
Using raise HTTPException(...) from e can help preserve the original exception traceback in logs.

-except Exception as e:
-    raise HTTPException(status_code=500, detail=str(e))
+except Exception as e:
+    raise HTTPException(status_code=500, detail=str(e)) from e
🧰 Tools
🪛 Ruff (0.8.2)

44-44: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

query-engine/src/chat_interface.py (4)

30-32: Improve error handling specificity.

The current error handling catches all exceptions. Consider catching specific exceptions that might occur during initialization.

-        except Exception as e:
+        except (FileNotFoundError, PermissionError) as e:
             console.print(f"[red]Error initializing query engine: {str(e)}")
             sys.exit(1)
+        except Exception as e:
+            console.print(f"[red]Unexpected error initializing query engine: {str(e)}")
+            sys.exit(1)

1-12: Remove unused imports.

The following imports are not used in the code:

  • rich.markdown.Markdown
  • rich.print
  • rich.spinner.Spinner
-from rich.markdown import Markdown
-from rich import print
-from rich.spinner import Spinner
🧰 Tools
🪛 Ruff (0.8.2)

6-6: rich.markdown.Markdown imported but unused

Remove unused import: rich.markdown.Markdown

(F401)


8-8: rich.print imported but unused

Remove unused import: rich.print

(F401)


9-9: rich.spinner.Spinner imported but unused

Remove unused import: rich.spinner.Spinner

(F401)


30-32: Enhance error handling with more specific information.

The current error handling could be more informative by providing context about what went wrong during initialization (e.g., file not found, permission issues, etc.).

         except Exception as e:
-            console.print(f"[red]Error initializing query engine: {str(e)}")
+            error_context = "Error initializing query engine:"
+            if "No such file" in str(e):
+                error_context = "CSV file not found:"
+            elif "Permission denied" in str(e):
+                error_context = "Permission denied while accessing CSV file:"
+            console.print(f"[red]{error_context} {str(e)}")
             sys.exit(1)

56-98: Consider implementing a more graceful shutdown mechanism.

While the current error handling is good, consider implementing a more graceful shutdown that:

  1. Saves any necessary state
  2. Closes any pending operations
  3. Provides feedback about the shutdown process
     try:
         indexer, query_engine = await initialize_engine()
         console.print("\n[green]✓ Query engine ready![/green]\n")
         await chat_loop(query_engine)
     except (KeyboardInterrupt, asyncio.CancelledError):
-        # Handle both keyboard interrupt and asyncio cancellation
-        pass
+        console.print("\n[yellow]Initiating graceful shutdown...[/yellow]")
     finally:
         if indexer:
+            console.print("[dim]Closing connections...[/dim]")
             indexer.close()
             console.print("[dim]Connection closed.[/dim]")
+            console.print("[green]✓ Shutdown complete[/green]")
         # Exit cleanly without showing the asyncio cancellation traceback
         sys.exit(0)
query-engine/Dockerfile (1)

1-23: Optimize Docker image size using multi-stage build.

Consider using a multi-stage build to reduce the final image size by removing build dependencies.

-FROM python:3.11-slim
+# Build stage
+FROM python:3.11-slim as builder
+
+WORKDIR /app
+
+# Install build dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Python packages
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Final stage
+FROM python:3.11-slim
 
 WORKDIR /app
 
-# Install system dependencies
-RUN apt-get update && apt-get install -y \
-    build-essential \
-    && rm -rf /var/lib/apt/lists/*
-
-# Copy requirements first to leverage Docker cache
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
+# Copy Python packages from builder
+COPY --from=builder /usr/local/lib/python3.11/site-packages/ /usr/local/lib/python3.11/site-packages/
 
 # Copy the rest of the application
 COPY src/ ./src/
query-engine/docker-compose.yaml (2)

14-15: Document required environment variables.

The compose file uses environment variables without documenting their requirements. Consider adding a .env.example file to document the required variables.

Create a new file .env.example:

OPENAI_API_KEY=your_openai_api_key_here
COHERE_API_KEY=your_cohere_api_key_here

24-48: Add health check for API service.

Consider adding a health check to ensure the API service is ready before marking it as healthy.

   api:
     build: 
       context: .
       dockerfile: Dockerfile
     command: python src/main.py
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 10s
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 25-25: trailing spaces

(trailing-spaces)

README.en.md (1)

111-125: Development and Directory Structure

The directory tree under the “Development” section is well presented.
Suggestion: Add a language specifier (e.g., use "plaintext" instead of just "") for the fenced code block on line 115 to meet markdownlint guidelines.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

README.fr.md (2)

111-125: Développement et Organisation du Code

La présentation de l’arborescence du dépôt via un bloc de code est claire.
Suggestion: Comme pour la version anglaise, ajoutez un spécificateur de langage (par exemple "```plaintext") pour ce bloc de code afin de respecter les recommandations de markdownlint.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


127-149: Section Dépannage et Remarques sur la Conformité Typographique

La section Dépannage est très complète et fournit des indications utiles pour résoudre les problèmes courants.
En parallèle, voici quelques remarques basées sur les suggestions de l'analyse statique :

  • Ligne 3 : Vérifiez que la syntaxe des liens est correctement interprétée (le linter signale un potentiel déséquilibre de crochets, bien que visuellement la syntaxe soit correcte).
  • Ligne 53 : Assurez-vous qu’une espace soit présente après le point pour respecter les conventions typographiques.
  • Ligne 71 : Contrôlez l’usage des accents pour respecter la grammaire française, surtout pour les participes passés.
  • Ligne 76 : Envisagez de reformuler légèrement cette phrase pour améliorer la fluidité de la lecture.
  • Ligne 87 : Pour les commandes disponibles, un tiret long ou une ponctuation plus appropriée pourrait être utilisé afin d’éviter l’usage du trait d’union, réservé à la formation de mots composés.
  • Ligne 89 : Vérifiez que l’apostrophe utilisée est la bonne (éviter les caractères erronés pour assurer la cohérence typographique).
  • Ligne 101 et 104 : Assurez-vous que les participes passés et verbes utilisés respectent un registre formel adapté au document.
  • Ligne 135 : Veillez à ce que le mot signalé commence par une minuscule pour maintenir la cohérence des points de listes.
  • Ligne 149 : Une espace est requise après la ponctuation pour une présentation correcte.

Si vous avez besoin d'assistance pour ajuster ces points ou pour valider la conformité générale du document, n'hésitez pas à demander.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~135-~135: Ce mot doit commencer par une minuscule.
Context: ...des services : docker-compose ps - Consultez les logs : docker-compose logs 2. **...

(DEUX_POINTS_MAJ)


[typographical] ~149-~149: Une espace est requise.
Context: ...t échoue, contactez hi@canadaspends.com

(DEUX_POINTS_ESPACE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5552d9e and 853ee07.

⛔ Files ignored due to path filters (1)
  • query-engine/csv-data/otpmopeom-apdtmacdpam-2024.csv is excluded by !**/*.csv
📒 Files selected for processing (13)
  • .gitignore (1 hunks)
  • README.en.md (1 hunks)
  • README.fr.md (1 hunks)
  • README.md (0 hunks)
  • query-engine/Dockerfile (1 hunks)
  • query-engine/docker-compose.yaml (1 hunks)
  • query-engine/launch.sh (1 hunks)
  • query-engine/requirements.txt (1 hunks)
  • query-engine/src/chat_interface.py (1 hunks)
  • query-engine/src/indexer.py (1 hunks)
  • query-engine/src/main.py (1 hunks)
  • query-engine/src/query_engine.py (1 hunks)
  • query-engine/src/start.py (1 hunks)
💤 Files with no reviewable changes (1)
  • README.md
✅ Files skipped from review due to trivial changes (2)
  • .gitignore
  • query-engine/requirements.txt
🧰 Additional context used
🪛 YAMLlint (1.35.1)
query-engine/docker-compose.yaml

[error] 1-1: trailing spaces

(trailing-spaces)


[error] 2-2: trailing spaces

(trailing-spaces)


[error] 3-3: trailing spaces

(trailing-spaces)


[error] 4-4: trailing spaces

(trailing-spaces)


[error] 5-5: trailing spaces

(trailing-spaces)


[error] 6-6: trailing spaces

(trailing-spaces)


[error] 10-10: trailing spaces

(trailing-spaces)


[error] 11-11: trailing spaces

(trailing-spaces)


[error] 25-25: trailing spaces

(trailing-spaces)


[error] 50-50: trailing spaces

(trailing-spaces)


[error] 77-77: no new line character at the end of file

(new-line-at-end-of-file)

🪛 LanguageTool
README.en.md

[uncategorized] ~132-~132: The preposition ‘to’ seems more likely in this position.
Context: ...all required API keys are correctly set in .env - Verify CSV data is present ...

(AI_HYDRA_LEO_REPLACE_IN_TO)

README.fr.md

[typographical] ~3-~3: Pas de correspondance fermante ou ouvrante pour le caractère « ] »
Context: ...rnementales English | Français Un outil puiss...

(UNPAIRED_BRACKETS)


[typographical] ~53-~53: Il manque une espace après le point.
Context: ...l'environnement :** Créez un fichier .env dans le répertoire racine : ```env ...

(ESPACE_APRES_POINT)


[uncategorized] ~71-~71: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ... données vectorielle Weaviate - Service API - Interface de chat interactive ## Utilisation de ...

(NOM_A_V_PPA)


[style] ~76-~76: Cette structure peut être allégée afin d’apporter de la fluidité à votre texte.
Context: ... données des dépenses gouvernementales. Il suffit de taper vos questions en langage naturel pour o...

(IL_SUFFIT_DE)


[style] ~87-~87: Un tiret long ou un autre signe de ponctuation plus approprié, car le trait d’union ne s’utilise que pour former un mot.
Context: ... naturel pour interroger les données de dépenses - clear - Efface l'écran - q, quit, ou exit - Quit...

(TIRET_LONG_1)


[typographical] ~89-~89: Caractère d’apostrophe incorrect.
Context: ...enses - clear - Efface l'écran - q, quit, ou exit - Quitte l'interface de chat...

(APOS_INCORRECT)


[uncategorized] ~101-~101: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ...onnées entre les sessions 2. Service API - Gère l'indexation des données - Traite le...

(NOM_A_V_PPA)


[style] ~104-~104: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...te les requêtes en langage naturel - Gère la communication avec les modèles d'IA ...

(VERBES_FAMILIERS_PREMIUM)


[style] ~109-~109: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...nses pour une meilleure lisibilité - Gère les commandes utilisateur ## Développe...

(VERBES_FAMILIERS_PREMIUM)


[uncategorized] ~135-~135: Ce mot doit commencer par une minuscule.
Context: ...des services : docker-compose ps - Consultez les logs : docker-compose logs 2. **...

(DEUX_POINTS_MAJ)


[typographical] ~149-~149: Une espace est requise.
Context: ...t échoue, contactez hi@canadaspends.com

(DEUX_POINTS_ESPACE)

🪛 markdownlint-cli2 (0.17.2)
README.en.md

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

README.fr.md

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🪛 Ruff (0.8.2)
query-engine/src/start.py

2-2: uvicorn imported but unused

Remove unused import: uvicorn

(F401)


28-28: Redefinition of unused uvicorn from line 2

Remove definition: uvicorn

(F811)

query-engine/src/query_engine.py

2-2: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


17-17: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

query-engine/src/main.py

1-1: os imported but unused

Remove unused import: os

(F401)


5-5: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


9-9: query_engine.QueryEngine imported but unused

Remove unused import: query_engine.QueryEngine

(F401)


44-44: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

query-engine/src/indexer.py

4-4: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


10-10: llama_index.core.Settings imported but unused

Remove unused import: llama_index.core.Settings

(F401)


87-90: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


89-89: Do not use bare except

(E722)

query-engine/src/chat_interface.py

6-6: rich.markdown.Markdown imported but unused

Remove unused import: rich.markdown.Markdown

(F401)


8-8: rich.print imported but unused

Remove unused import: rich.print

(F401)


9-9: rich.spinner.Spinner imported but unused

Remove unused import: rich.spinner.Spinner

(F401)

🪛 Shellcheck (0.10.0)
query-engine/launch.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

🔇 Additional comments (25)
query-engine/src/indexer.py (1)

106-120:

Details

✅ Verification successful

Confirm presence of expected CSV columns
If any column (e.g. 'MINE', 'MINF', 'FSCL_YR', 'Description_EN') is missing or renamed in the CSV, this code will raise a KeyError.


🏁 Script executed:

#!/bin/bash
# Verify usage of expected columns throughout the codebase
rg "row\['(MINE|MINF|FSCL_YR|Description_EN|Description_FR|Ttl-min-net-exp_Ttl-dép-min-nettes|Cons-spec-purpose-accnts_Cmpts-fins-dét-cons|Accrual-and-other-adjs_Redressements-courus-et-autres|Cons-Crown-corps-and-other-ents_Socs-d'État-et-autres-ents-cons|Tax-credits-and-repayments_Crédits-et-remboursements-fiscaux|Internal-expenses_Charges-internes)'\]"

Length of output: 1238


CSV Columns Verified – No Missing Columns Detected

I ran a search for all the expected CSV columns in query-engine/src/indexer.py and confirmed that the code references the expected columns (MINE, MINF, FSCL_YR, Description_EN, Description_FR, Ttl-min-net-exp_Ttl-dép-min-nettes, Cons-spec-purpose-accnts_Cmpts-fins-dét-cons, Accrual-and-other-adjs_Redressements-courus-et-autres, Tax-credits-and-repayments_Crédits-et-remboursements-fiscaux, and Internal-expenses_Charges-internes). Although the Cons-Crown-corps-and-other-ents_Socs-d'État-et-autres-ents-cons column is accessed using double quotes—unlike the rest—it is still valid Python syntax and correctly interacts with the CSV data.

  • Location: query-engine/src/indexer.py (Lines 106–120)
  • Outcome: All expected columns are present and referenced properly.

As a best practice, consider adding explicit error handling for missing columns to provide clearer error messages if the CSV schema ever changes. Otherwise, no modifications are needed.

README.en.md (12)

1-4: Header and Navigation Links Clarity

The title and the navigation links (for English and French versions) are well defined and immediately guide the user to the appropriate documentation.


5-6: Introduction Section Effectiveness

The introductory paragraph succinctly explains the purpose of CanadaSpends, providing a clear value proposition for users curious about government spending data in natural language.


7-19: Comprehensive Table of Contents

The Table of Contents is detailed and logically arranged, helping users quickly locate sections like Overview, Features, and Troubleshooting.


20-26: Overview Section

The Overview section clearly outlines the three main components of the tool. This high-level description is effective for new users.


27-36: Features Section

The features are presented clearly and enhanced with emojis, which improve readability and user engagement.


37-43: Getting Started – Prerequisites

The prerequisites are explained in a straightforward manner, allowing users to quickly determine the system requirements (Docker, Docker Compose, and a Cohere API key).


44-61: Installation Instructions

The installation steps are very clear. The use of fenced code blocks with language identifiers (e.g., bash and env) makes the commands easy to understand and follow.


62-68: Service Startup Instructions

The commands for starting the services are concise and precise. Users are provided with clear instructions on how to launch the query engine chat using Docker.


69-73: Overview of Launched Services

The “This launches:” section effectively lists the different services (Weaviate, API service, and chat interface). It might be useful later to verify these names match the actual service identifiers in the codebase.


74-91: Chat Interface Usage Details

This section explains how to use the chat interface with examples and available commands, which is very helpful. Be sure that these commands and descriptions remain consistent with the implemented functionality.


92-110: Architecture Description

The architecture section is neatly broken down into its components with numbered lists, which improves clarity and helps users understand the system’s structure.


127-150: Troubleshooting Guidance

The Troubleshooting section comprehensively addresses common issues. The clear steps and commands provided (like checking service status and viewing logs) will help users quickly diagnose problems.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~132-~132: The preposition ‘to’ seems more likely in this position.
Context: ...all required API keys are correctly set in .env - Verify CSV data is present ...

(AI_HYDRA_LEO_REPLACE_IN_TO)

README.fr.md (11)

1-4: Titre et Liens de Navigation Clairs

Le titre et les liens de navigation pour les versions en anglais et en français sont bien définis, ce qui facilite l’orientation de l’utilisateur dès le départ.

🧰 Tools
🪛 LanguageTool

[typographical] ~3-~3: Pas de correspondance fermante ou ouvrante pour le caractère « ] »
Context: ...rnementales English | Français Un outil puiss...

(UNPAIRED_BRACKETS)


5-6: Clarté de la Section d'Introduction

La description initiale résume clairement l’objectif du projet tout en invitant l’utilisateur à explorer l’outil.


7-19: Table des Matières Complète

La table des matières est exhaustive et offre une navigation fluide entre les différentes sections du document.


20-26: Section Aperçu

La section Aperçu décrit de manière concise les trois composants principaux du projet, ce qui est très utile pour comprendre rapidement l’outil.


27-36: Section Fonctionnalités

La liste des fonctionnalités est claire et attrayante, notamment grâce aux icônes qui améliorent la lisibilité.


37-43: Prérequis

Les prérequis sont énoncés de façon claire et succincte, permettant aux utilisateurs de vérifier rapidement leur environnement avant de procéder à l'installation.


44-61: Instructions d'Installation

Les étapes d’installation sont détaillées et comprennent des blocs de code pour aider à la compréhension des commandes.

🧰 Tools
🪛 LanguageTool

[typographical] ~53-~53: Il manque une espace après le point.
Context: ...l'environnement :** Créez un fichier .env dans le répertoire racine : ```env ...

(ESPACE_APRES_POINT)


62-68: Démarrage des Services

Les instructions pour démarrer les services (avec Docker) sont concises et devraient être faciles à suivre pour les utilisateurs.


69-73: Lancement des Services

La liste des services lancés est présentée clairement, permettant à l’utilisateur de savoir exactement quels composants seront démarrés.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~71-~71: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ... données vectorielle Weaviate - Service API - Interface de chat interactive ## Utilisation de ...

(NOM_A_V_PPA)


74-91: Utilisation de l'Interface de Chat

La section détaillant l’utilisation de l’interface de chat est bien structurée, avec des exemples de questions et des indications sur les commandes disponibles.

🧰 Tools
🪛 LanguageTool

[style] ~76-~76: Cette structure peut être allégée afin d’apporter de la fluidité à votre texte.
Context: ... données des dépenses gouvernementales. Il suffit de taper vos questions en langage naturel pour o...

(IL_SUFFIT_DE)


[style] ~87-~87: Un tiret long ou un autre signe de ponctuation plus approprié, car le trait d’union ne s’utilise que pour former un mot.
Context: ... naturel pour interroger les données de dépenses - clear - Efface l'écran - q, quit, ou exit - Quit...

(TIRET_LONG_1)


[typographical] ~89-~89: Caractère d’apostrophe incorrect.
Context: ...enses - clear - Efface l'écran - q, quit, ou exit - Quitte l'interface de chat...

(APOS_INCORRECT)


92-110: Description de l'Architecture

L’architecture du système est expliquée de manière détaillée et organisée, facilitant la compréhension des différents composants et de leurs interactions.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~101-~101: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ...onnées entre les sessions 2. Service API - Gère l'indexation des données - Traite le...

(NOM_A_V_PPA)


[style] ~104-~104: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...te les requêtes en langage naturel - Gère la communication avec les modèles d'IA ...

(VERBES_FAMILIERS_PREMIUM)


[style] ~109-~109: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...nses pour une meilleure lisibilité - Gère les commandes utilisateur ## Développe...

(VERBES_FAMILIERS_PREMIUM)

query-engine/src/chat_interface.py (1)

34-55: Well-structured query processing and command handling!

The code demonstrates good practices:

  • Clear separation of concerns between query processing and command handling
  • Appropriate error handling in query processing
  • Simple and effective command handling logic

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
query-engine/src/start.py (1)

28-29: Make server configuration configurable.

Consider making the host and port configurable via environment variables for better flexibility.

+import os
+
 def run_api_server():
-    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=False)
+    host = os.getenv("API_HOST", "0.0.0.0")
+    port = int(os.getenv("API_PORT", "8000"))
+    reload = os.getenv("API_RELOAD", "false").lower() == "true"
+    uvicorn.run("main:app", host=host, port=port, reload=reload)
README.md (1)

115-125: Add language specification to code block.

Specify the language for the code block to enable proper syntax highlighting.

-```
+```text
 CanadaSpends/
 ├── query-engine/
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 853ee07 and 023c207.

📒 Files selected for processing (5)
  • README.fr.md (1 hunks)
  • README.md (2 hunks)
  • query-engine/docker-compose.yaml (1 hunks)
  • query-engine/src/chat_interface.py (1 hunks)
  • query-engine/src/start.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • query-engine/src/chat_interface.py
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~132-~132: The preposition ‘to’ seems more likely in this position.
Context: ...all required API keys are correctly set in .env - Verify CSV data is present ...

(AI_HYDRA_LEO_REPLACE_IN_TO)

README.fr.md

[typographical] ~53-~53: Il manque une espace après le point.
Context: ...l'environnement :** Créez un fichier .env dans le répertoire racine : ```env ...

(ESPACE_APRES_POINT)


[uncategorized] ~71-~71: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ... données vectorielle Weaviate - Service API - Interface de chat interactive ## Utilisation de ...

(NOM_A_V_PPA)


[style] ~76-~76: Cette structure peut être allégée afin d’apporter de la fluidité à votre texte.
Context: ... données des dépenses gouvernementales. Il suffit de taper vos questions en langage naturel pour o...

(IL_SUFFIT_DE)


[style] ~87-~87: Un tiret long ou un autre signe de ponctuation plus approprié, car le trait d’union ne s’utilise que pour former un mot.
Context: ... naturel pour interroger les données de dépenses - clear - Efface l'écran - q, quit, ou exit - Quit...

(TIRET_LONG_1)


[typographical] ~89-~89: Caractère d’apostrophe incorrect.
Context: ...enses - clear - Efface l'écran - q, quit, ou exit - Quitte l'interface de chat...

(APOS_INCORRECT)


[uncategorized] ~101-~101: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ...onnées entre les sessions 2. Service API - Gère l'indexation des données - Traite le...

(NOM_A_V_PPA)


[style] ~104-~104: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...te les requêtes en langage naturel - Gère la communication avec les modèles d'IA ...

(VERBES_FAMILIERS_PREMIUM)


[style] ~109-~109: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...nses pour une meilleure lisibilité - Gère les commandes utilisateur ## Développe...

(VERBES_FAMILIERS_PREMIUM)


[uncategorized] ~135-~135: Ce mot doit commencer par une minuscule.
Context: ...des services : docker-compose ps - Consultez les logs : docker-compose logs 2. **...

(DEUX_POINTS_MAJ)


[typographical] ~149-~149: Une espace est requise.
Context: ...t échoue, contactez hi@canadaspends.com

(DEUX_POINTS_ESPACE)

🪛 markdownlint-cli2 (0.17.2)
README.md

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

README.fr.md

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🪛 YAMLlint (1.35.1)
query-engine/docker-compose.yaml

[error] 1-1: trailing spaces

(trailing-spaces)


[error] 2-2: trailing spaces

(trailing-spaces)


[error] 3-3: trailing spaces

(trailing-spaces)


[error] 4-4: trailing spaces

(trailing-spaces)


[error] 5-5: trailing spaces

(trailing-spaces)


[error] 6-6: trailing spaces

(trailing-spaces)


[error] 10-10: trailing spaces

(trailing-spaces)


[error] 11-11: trailing spaces

(trailing-spaces)


[error] 25-25: trailing spaces

(trailing-spaces)


[error] 50-50: trailing spaces

(trailing-spaces)

🪛 Ruff (0.8.2)
query-engine/src/start.py

9-9: chat_interface.main imported but unused

Remove unused import: chat_interface.main

(F401)


32-32: Redefinition of unused main from line 9

Remove definition: main

(F811)

🔇 Additional comments (10)
query-engine/src/start.py (5)

9-9: Remove unused import.

The chat_interface.main import is unused and later redefined at line 32.

🧰 Tools
🪛 Ruff (0.8.2)

9-9: chat_interface.main imported but unused

Remove unused import: chat_interface.main

(F401)


13-27: LGTM! Well-formatted welcome message.

The function effectively uses rich formatting to display a clear and informative welcome message.


31-33: LGTM! Proper async handling.

The function correctly uses asyncio.run to handle the async chat interface.

🧰 Tools
🪛 Ruff (0.8.2)

32-32: Redefinition of unused main from line 9

Remove definition: main

(F811)


43-44: Replace fixed sleep with health check.

Using a fixed sleep duration is not a reliable way to wait for the API server to start.


46-63: LGTM! Well-implemented error handling.

The code effectively handles user input validation and provides a clean exit mechanism with proper error handling.

query-engine/docker-compose.yaml (5)

1-78: Fix YAML formatting: Remove trailing spaces and ensure a newline at the end of the file.
The YAML file contains multiple trailing space issues (e.g., lines 1, 2, 3, 4, 5, 6, 10, 11, 25, and 50) as flagged by YAMLlint. Please remove these trailing spaces and confirm that the file ends with a newline.

Consider running the following command to address these formatting issues:

sed -i 's/[[:space:]]*$//' query-engine/docker-compose.yaml && echo >> query-engine/docker-compose.yaml
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 1-1: trailing spaces

(trailing-spaces)


[error] 2-2: trailing spaces

(trailing-spaces)


[error] 3-3: trailing spaces

(trailing-spaces)


[error] 4-4: trailing spaces

(trailing-spaces)


[error] 5-5: trailing spaces

(trailing-spaces)


[error] 6-6: trailing spaces

(trailing-spaces)


[error] 10-10: trailing spaces

(trailing-spaces)


[error] 11-11: trailing spaces

(trailing-spaces)


[error] 25-25: trailing spaces

(trailing-spaces)


[error] 50-50: trailing spaces

(trailing-spaces)


4-23: Weaviate service configuration approved.
The configuration for the weaviate service—including image, port mappings, environment variables, and network alias settings—is well structured and clear.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 4-4: trailing spaces

(trailing-spaces)


[error] 5-5: trailing spaces

(trailing-spaces)


[error] 6-6: trailing spaces

(trailing-spaces)


[error] 10-10: trailing spaces

(trailing-spaces)


[error] 11-11: trailing spaces

(trailing-spaces)


24-48: API service configuration validated.
The api service is correctly configured with build context, Dockerfile specification, command, volume mounts, environment variables, and dependency on the weaviate service.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 25-25: trailing spaces

(trailing-spaces)


49-74: Chat service configuration verified with interactive terminal support.
The chat service is set up properly with the appropriate build context, command, volumes, environment settings, and particularly the interactive terminal options (stdin_open and tty) required for an interactive chat interface.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 50-50: trailing spaces

(trailing-spaces)


75-78: Network configuration approved.
The custom bridge network (query-network) is correctly defined with the bridge driver, ensuring proper communication among services.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
query-engine/src/indexer.py (3)

78-80: Improve exception handling.

The current exception handling is too broad. Consider catching specific exceptions for better error handling.

-        except Exception as e:
-            logger.error(f"Error during initialization: {str(e)}")
-            raise
+        except pd.errors.EmptyDataError:
+            logger.error("CSV file is empty")
+            raise
+        except pd.errors.ParserError as e:
+            logger.error(f"Error parsing CSV file: {str(e)}")
+            raise
+        except weaviate.exceptions.WeaviateBaseError as e:
+            logger.error(f"Weaviate error: {str(e)}")
+            raise
+        except Exception as e:
+            logger.error(f"Unexpected error during initialization: {str(e)}")
+            raise

82-89: Improve exception handling in schema creation.

The bare except block should be replaced with contextlib.suppress as suggested by static analysis.

+from contextlib import suppress
+
 def _create_weaviate_schema(self, collection_name: str):
     # Delete if exists
-    try:
+    with suppress(Exception):
         self.weaviate_client.collections.delete(collection_name)
-    except:
-        pass
🧰 Tools
🪛 Ruff (0.8.2)

85-88: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


87-87: Do not use bare except

(E722)


101-121: Add column validation and type hints in document preparation.

The method should validate DataFrame columns before processing and include proper type hints.

-def _prepare_documents(self, df: pd.DataFrame) -> list[Document]:
+def _prepare_documents(self, df: pd.DataFrame) -> list[Document]:
+    required_columns = [
+        'MINE', 'MINF', 'FSCL_YR', 'Description_EN', 'Description_FR',
+        'Ttl-min-net-exp_Ttl-dép-min-nettes',
+        'Cons-spec-purpose-accnts_Cmpts-fins-dét-cons',
+        'Accrual-and-other-adjs_Redressements-courus-et-autres',
+        "Cons-Crown-corps-and-other-ents_Socs-d'État-et-autres-ents-cons",
+        'Tax-credits-and-repayments_Crédits-et-remboursements-fiscaux',
+        'Internal-expenses_Charges-internes'
+    ]
+    
+    missing_columns = [col for col in required_columns if col not in df.columns]
+    if missing_columns:
+        raise ValueError(f"Missing required columns: {', '.join(missing_columns)}")
+
     documents = []
     
-    for _, row in df.iterrows():
+    for _, row in df.iterrows():  # type: tuple[int, pd.Series]
README.md (1)

85-91: Available Commands Explained

The available commands for the chat interface are clearly enumerated. Although the hyphens work well in a markdown list, you may consider using an em dash (—) for a slightly more formal look if desired.

README.fr.md (4)

44-61: Instructions d’Installation et Configuration

Les étapes pour cloner le dépôt et configurer l’environnement (dont la création du fichier .env) sont bien structurées.

Note (selon les suggestions LanguageTool) :

  • Vérifiez qu’après certains signes de ponctuation (notamment dans "Configurer l'environnement :") une espace adéquate est utilisée pour améliorer la fluidité du texte.
🧰 Tools
🪛 LanguageTool

[typographical] ~53-~53: Il manque une espace après le point.
Context: ...l'environnement :** Créez un fichier .env dans le répertoire racine : ```env ...

(ESPACE_APRES_POINT)


74-77: Utilisation de l'Interface de Chat

La description sur l’utilisation de l’interface de chat est fluide.

Suggestion (selon LanguageTool) : Considérez d’alléger légèrement la formulation pour encore plus de fluidité.

🧰 Tools
🪛 LanguageTool

[style] ~76-~76: Cette structure peut être allégée afin d’apporter de la fluidité à votre texte.
Context: ... données des dépenses gouvernementales. Il suffit de taper vos questions en langage naturel pour o...

(IL_SUFFIT_DE)


85-91: Commandes Disponibles

La liste des commandes disponibles est claire.

Suggestion (LanguageTool) : Pour une présentation plus formelle, envisagez d’utiliser un tiret long (—) au lieu d’un simple trait d’union pour séparer les éléments, et vérifiez l’usage des apostrophes qui pourraient bénéficier d’un formatage typographique soigné.

🧰 Tools
🪛 LanguageTool

[style] ~87-~87: Un tiret long ou un autre signe de ponctuation plus approprié, car le trait d’union ne s’utilise que pour former un mot.
Context: ... naturel pour interroger les données de dépenses - clear - Efface l'écran - q, quit, ou exit - Quit...

(TIRET_LONG_1)


[typographical] ~89-~89: Caractère d’apostrophe incorrect.
Context: ...enses - clear - Efface l'écran - q, quit, ou exit - Quitte l'interface de chat...

(APOS_INCORRECT)


127-150: Section Dépannage Précise

La section "Dépannage" fournit des conseils pratiques pour résoudre des problèmes liés aux services, aux données et à l’API, et offre même des possibilités de support en cas de besoin.

Note (LanguageTool) :

  • Sur la ligne ~135, assurez-vous que la première lettre du contenu dans les listes reste cohérente (par exemple, « consultez les logs » pourrait être en minuscule pour l’uniformité).
  • À la ligne ~149, vérifiez qu’une espace est bien présente après la ponctuation avant l’adresse e-mail pour respecter les normes typographiques.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~135-~135: Ce mot doit commencer par une minuscule.
Context: ...des services : docker-compose ps - Consultez les logs : docker-compose logs 2. **...

(DEUX_POINTS_MAJ)


[typographical] ~149-~149: Une espace est requise.
Context: ...t échoue, contactez hi@canadaspends.com

(DEUX_POINTS_ESPACE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 023c207 and a11e6aa.

📒 Files selected for processing (7)
  • README.fr.md (1 hunks)
  • README.md (2 hunks)
  • query-engine/.env.example (1 hunks)
  • query-engine/src/chat_interface.py (1 hunks)
  • query-engine/src/indexer.py (1 hunks)
  • query-engine/src/main.py (1 hunks)
  • query-engine/src/query_engine.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • query-engine/.env.example
🚧 Files skipped from review as they are similar to previous changes (3)
  • query-engine/src/query_engine.py
  • query-engine/src/main.py
  • query-engine/src/chat_interface.py
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~132-~132: The preposition ‘to’ seems more likely in this position.
Context: ...all required API keys are correctly set in .env - Verify CSV data is present ...

(AI_HYDRA_LEO_REPLACE_IN_TO)

README.fr.md

[typographical] ~53-~53: Il manque une espace après le point.
Context: ...l'environnement :** Créez un fichier .env dans le répertoire racine : ```env ...

(ESPACE_APRES_POINT)


[uncategorized] ~71-~71: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ... données vectorielle Weaviate - Service API - Interface de chat interactive ## Utilisation de ...

(NOM_A_V_PPA)


[style] ~76-~76: Cette structure peut être allégée afin d’apporter de la fluidité à votre texte.
Context: ... données des dépenses gouvernementales. Il suffit de taper vos questions en langage naturel pour o...

(IL_SUFFIT_DE)


[style] ~87-~87: Un tiret long ou un autre signe de ponctuation plus approprié, car le trait d’union ne s’utilise que pour former un mot.
Context: ... naturel pour interroger les données de dépenses - clear - Efface l'écran - q, quit, ou exit - Quit...

(TIRET_LONG_1)


[typographical] ~89-~89: Caractère d’apostrophe incorrect.
Context: ...enses - clear - Efface l'écran - q, quit, ou exit - Quitte l'interface de chat...

(APOS_INCORRECT)


[uncategorized] ~101-~101: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ...onnées entre les sessions 2. Service API - Gère l'indexation des données - Traite le...

(NOM_A_V_PPA)


[style] ~104-~104: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...te les requêtes en langage naturel - Gère la communication avec les modèles d'IA ...

(VERBES_FAMILIERS_PREMIUM)


[style] ~109-~109: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...nses pour une meilleure lisibilité - Gère les commandes utilisateur ## Développe...

(VERBES_FAMILIERS_PREMIUM)


[uncategorized] ~135-~135: Ce mot doit commencer par une minuscule.
Context: ...des services : docker-compose ps - Consultez les logs : docker-compose logs 2. **...

(DEUX_POINTS_MAJ)


[typographical] ~149-~149: Une espace est requise.
Context: ...t échoue, contactez hi@canadaspends.com

(DEUX_POINTS_ESPACE)

🪛 Ruff (0.8.2)
query-engine/src/indexer.py

85-88: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


87-87: Do not use bare except

(E722)

🪛 markdownlint-cli2 (0.17.2)
README.fr.md

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🔇 Additional comments (27)
query-engine/src/indexer.py (2)

1-22: LGTM! Well-organized imports.

The imports are properly organized and grouped by functionality, with standard library imports followed by third-party imports.


122-125: LGTM! Proper cleanup in close method.

The close method properly handles the cleanup of the Weaviate client connection.

README.md (14)

1-5: Title and Introduction Updated

The title now clearly reflects the new interactive focus, and the introductory paragraph concisely explains the tool’s purpose.


3-4: French Documentation Link Added

The link to the French documentation (*[Français](README.fr.md)*) is a welcome addition that improves accessibility.


7-13: Enhanced Table of Contents

New entries such as "Using the Chat Interface," "Example Questions," "Available Commands," "Architecture," "Development," and "Troubleshooting" have been added. This expanded TOC greatly improves navigation and usability.


22-26: Overview Section Improvements

The overview now enumerates the three main components—Interactive Chat Interface, Data Processing, and Vector Search—clearly setting the stage for users.


29-36: Detailed Features List

The features list now covers key functionalities (e.g. natural language query interface, AI-powered response, semantic search, Docker deployment). The bullet points are clear and descriptive.


41-43: Prerequisites Clarified

The prerequisites section now mentions both Docker/Docker Compose and the Cohere API key. This is succinct and helpful for users to set up the environment.


46-61: Installation Instructions Enhanced

The installation steps now provide clear, step‐by‐step instructions for cloning the repository, configuring the environment (with a detailed .env file example), and starting the services. Every step is explicit and easy to follow.


62-68: Service Startup Instructions

The directions for starting the services using docker-compose commands are well documented. Including both the detached mode and the one‑off run for the chat interface makes it clear how to launch the tool.


69-73: Launch Process Description

The “This launches:” section clearly lists the components that are started. This additional explanation enhances user understanding of the system architecture.


74-77: Chat Interface Usage Overview

The new "Using the Chat Interface" section provides a concise explanation of how users can interact with the tool. This is an excellent addition that clarifies the user experience.


78-84: Example Questions Section

The list of example questions is practical and gives users a good idea of how to phrase their queries.


92-110: Architecture Breakdown

The Architecture section succinctly outlines the three main components—Vector Database, API Service, and Chat Interface—with clear bullet points describing each. This is well structured and informative.


111-126: Source Code Structure and Development

The "Development" section detailing the file organization is clear, and the fenced code block’s language is specified (text), which improves readability.


127-150: Troubleshooting Guidelines

The Troubleshooting section provides comprehensive steps and tips for resolving common issues related to services, data, and API connectivity. This thoroughness is very beneficial for end users.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~132-~132: The preposition ‘to’ seems more likely in this position.
Context: ...all required API keys are correctly set in .env - Verify CSV data is present ...

(AI_HYDRA_LEO_REPLACE_IN_TO)

README.fr.md (11)

1-6: Titre et Introduction en Français

Le titre en français et le paragraphe introductif décrivent clairement l’objectif du projet pour un public francophone.


3-4: Lien vers la Documentation en Anglais

Le lien vers la version anglaise (*[English](README.md)*) a été corrigé conformément aux remarques précédentes.


7-16: Table des Matières Complète

La table des matières a été enrichie avec les sections "Utilisation de l'interface de chat", "Exemples de questions", "Commandes disponibles", "Architecture", "Développement" et "Dépannage". Ceci améliore la navigation dans le document.


20-26: Section Aperçu Améliorée

La section "Aperçu" présente clairement les trois composantes principales (interface de chat interactive, traitement des données, recherche vectorielle), ce qui aide à comprendre rapidement les fonctionnalités du projet.


27-36: Liste des Fonctionnalités en Français

La liste des fonctionnalités est bien traduite et couvre l’ensemble des points clés tels que l’interface en langage naturel, la génération de réponses par IA, l’accès en temps réel aux données et le déploiement via Docker.


37-43: Instructions de Prérequis Claires

La section "Pour commencer" et notamment "Prérequis" indique clairement l’installation de Docker, Docker Compose et l’utilisation de la clé API Cohere.


62-68: Démarrage des Services

Les commandes pour démarrer les services via Docker Compose sont claires et concises.


69-73: Description du Processus de Lancement

La section qui décrit ce que les commandes lancent (base de données vectorielle, service API et interface de chat) est très informative et aide à comprendre l’architecture globale.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~71-~71: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ... données vectorielle Weaviate - Service API - Interface de chat interactive ## Utilisation de ...

(NOM_A_V_PPA)


78-84: Exemples de Questions

Les exemples de questions permettent aux utilisateurs de comprendre rapidement comment interroger le système.


92-110: Section Architecture Détaillée

La section "Architecture" décrit de manière structurée les trois composantes du système avec des explications claires pour chacune.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~101-~101: Un accent peut être ajouté pour former un participe passé, car l’adverbe « API » suit toujours un verbe conjugué.
Context: ...onnées entre les sessions 2. Service API - Gère l'indexation des données - Traite le...

(NOM_A_V_PPA)


[style] ~104-~104: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...te les requêtes en langage naturel - Gère la communication avec les modèles d'IA ...

(VERBES_FAMILIERS_PREMIUM)


[style] ~109-~109: Ce verbe peut être considéré comme familier dans un contexte formel.
Context: ...nses pour une meilleure lisibilité - Gère les commandes utilisateur ## Développe...

(VERBES_FAMILIERS_PREMIUM)


111-126: Structure du Code Source

La section "Développement" présente correctement l’arborescence du code source. Le bloc de code en « text » améliore la lisibilité et la compréhension de l’organisation du projet.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

115-115: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
query-engine/src/chat_interface.py (3)

8-8: Remove unused import.

The rich.text.Text import is not used in the code.

-from rich.text import Text
🧰 Tools
🪛 Ruff (0.8.2)

8-8: rich.text.Text imported but unused

Remove unused import: rich.text.Text

(F401)


37-62: Consider making retry parameters configurable.

The retry logic is well-implemented, but consider making the number of attempts and delay configurable through environment variables for better flexibility in different environments.

+# At the top with other constants
+MAX_ATTEMPTS = int(os.getenv('API_INIT_MAX_ATTEMPTS', '30'))
+RETRY_DELAY = int(os.getenv('API_INIT_RETRY_DELAY', '4'))

 async def initialize_connection():
     """Wait for API to be ready."""
     with console.status("[bold blue]Waiting for API and index to initialize...", spinner="dots"):
-        for attempt in range(30):  # Try for 2 minutes (30 attempts * 4 seconds)
+        for attempt in range(MAX_ATTEMPTS):
             health_data = await check_api_health()
             
             if health_data:
                 status = health_data.get("status", "unknown")
                 message = health_data.get("message", "No status message")
                 
                 if status == "ready" and health_data.get("query_engine_ready"):
                     return True
                     
                 if status == "failed":
                     console.print(f"\n[red]API initialization failed: {message}[/red]")
                     return False
                     
-                if attempt < 29:  # Don't show this message on the last attempt
+                if attempt < MAX_ATTEMPTS - 1:  # Don't show this message on the last attempt
                     console.print(f"[dim]Status: {status} - {message} (attempt {attempt + 1}/30)[/dim]")
             else:
-                if attempt < 29:
+                if attempt < MAX_ATTEMPTS - 1:
                     console.print(f"[dim]Waiting for API to start (attempt {attempt + 1}/30)[/dim]")
                     
-            await asyncio.sleep(4)
+            await asyncio.sleep(RETRY_DELAY)
         return False

63-79: Add timeout handling for API requests.

Consider adding a timeout to prevent the request from hanging indefinitely.

 async def process_query(session: aiohttp.ClientSession, question: str):
     """Process a single query and display the response."""
     with console.status("[bold blue]Thinking...", spinner="dots"):
         try:
-            async with session.get(f"{API_URL}/query", params={"question": question}) as response:
+            timeout = aiohttp.ClientTimeout(total=30)  # 30 seconds timeout
+            async with session.get(
+                f"{API_URL}/query",
+                params={"question": question},
+                timeout=timeout
+            ) as response:
                 data = await response.json()
query-engine/src/main.py (2)

31-35: Enhance error handling with specific exceptions.

The generic exception handling could mask specific initialization issues. Consider catching and handling specific exceptions.

-    except Exception as e:
+    except (ValueError, ConnectionError) as e:
         error_msg = f"Failed to initialize query engine: {str(e)}"
         logger.error(error_msg)
         indexing_status = {"status": "failed", "message": error_msg}
         raise
+    except Exception as e:
+        error_msg = f"Unexpected error during initialization: {str(e)}"
+        logger.error(error_msg)
+        indexing_status = {"status": "failed", "message": "Internal server error"}
+        raise HTTPException(status_code=500, detail="Internal server error")

63-77: Standardize response structure between success and error cases.

The success and error response structures differ. Consider using a consistent format.

         return JSONResponse({
             "status": "success",
+            "error": null,
             "question": question,
             "answer": str(response)
         })
     except Exception as e:
         logger.error(f"Query error: {str(e)}")
         return JSONResponse(
             status_code=500,
             content={
                 "status": "error",
                 "error": str(e),
-                "question": question
+                "question": question,
+                "answer": null
             }
         )
query-engine/src/indexer.py (5)

183-188: Use contextlib.suppress for cleaner exception handling.

Replace the try-except-pass block with contextlib.suppress for better readability.

-        try:
-            self.weaviate_client.collections.delete(collection_name)
-        except:
-            pass
+        from contextlib import suppress
+        with suppress(Exception):
+            self.weaviate_client.collections.delete(collection_name)
🧰 Tools
🪛 Ruff (0.8.2)

184-187: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


186-186: Do not use bare except

(E722)


200-249: Add batch processing for better performance.

The document preparation processes one row at a time. Consider using batch processing for better performance with large datasets.

     def _prepare_documents(self, df: pd.DataFrame) -> list[Document]:
+        BATCH_SIZE = 1000
         documents = []
         
-        for _, row in df.iterrows():
+        for batch_idx in range(0, len(df), BATCH_SIZE):
+            batch = df.iloc[batch_idx:batch_idx + BATCH_SIZE]
+            batch_documents = []
+            
+            for _, row in batch.iterrows():
                 content_parts = []
                 # ... rest of the processing ...
-                documents.append(Document(text=content))
+                batch_documents.append(Document(text=content))
+            
+            documents.extend(batch_documents)
+            logger.info(f"Processed batch of {len(batch_documents)} documents. Total: {len(documents)}")
         
         return documents

28-59: Add XML validation and improve error handling.

The XML parsing could be more robust with these improvements:

  1. Validate XML schema before parsing
  2. Handle specific XML parsing exceptions
  3. Provide more context in error messages
 def _parse_xml(self, xml_path: str):
     """Parse the XML data dictionary and store field definitions."""
     try:
+        if not os.path.exists(xml_path):
+            raise FileNotFoundError(f"XML file not found: {xml_path}")
+
         tree = ET.parse(xml_path)
         root = tree.getroot()
+        
+        # Validate root element
+        if root.tag != 'dataDictionary':  # adjust tag name as per your schema
+            raise ValueError(f"Invalid XML schema: root element should be 'dataDictionary', got '{root.tag}'")
         
         for heading in root.findall('.//heading'):
             label = heading.find('label')
             if label is not None:
                 # Handle both simple labels and language-specific labels
                 if label.text:
                     label_text = label.text
                 else:
                     # Try to get English label
                     label_text = label.get('{http://www.w3.org/XML/1998/namespace}lang') or ''
                 
                 # Get English and French descriptions
                 descriptions = heading.findall('description')
                 en_desc = fr_desc = ""
                 for desc in descriptions:
                     if desc.get('{http://www.w3.org/XML/1998/namespace}lang') == 'en':
                         en_desc = desc.text
                     elif desc.get('{http://www.w3.org/XML/1998/namespace}lang') == 'fr':
                         fr_desc = desc.text
                 
                 self.field_definitions[label_text] = {
                     'en_description': en_desc or label_text,
                     'fr_description': fr_desc or label_text
                 }
+    except FileNotFoundError as e:
+        logger.error(f"XML file not found: {str(e)}")
+        raise
+    except ET.ParseError as e:
+        logger.error(f"XML parsing error in {xml_path}: {str(e)}")
+        raise
     except Exception as e:
-        logger.error(f"Error parsing data dictionary XML: {str(e)}")
+        logger.error(f"Unexpected error parsing {xml_path}: {str(e)}")
         raise

181-199: Improve error handling in schema creation.

The current implementation uses a bare except clause which is not recommended.

+from contextlib import suppress
+
 def _create_weaviate_schema(self, collection_name: str):
     # Delete if exists
-    try:
+    with suppress(Exception):
         self.weaviate_client.collections.delete(collection_name)
-    except:
-        pass
     
     # Create collection with v4 syntax
     self.weaviate_client.collections.create(
🧰 Tools
🪛 Ruff (0.8.2)

184-187: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


186-186: Do not use bare except

(E722)


200-250: Optimize document preparation performance.

Consider these performance improvements:

  1. Use list comprehension for better performance
  2. Add tqdm for better progress visualization
  3. Use string joining more efficiently
+from tqdm import tqdm
+
 def _prepare_documents(self, df: pd.DataFrame) -> list[Document]:
     """
     Prepare documents from DataFrame rows dynamically.
     Handles any columns present in the data and uses data dictionary definitions.
     """
-    documents = []
+    def format_value(value):
+        if isinstance(value, (int, float)):
+            return f"{value:,}"
+        return str(value).strip()
     
-    for _, row in df.iterrows():
-        content_parts = []
-        
+    def process_row(row):
+        content_parts = []        
         # Process each column in the DataFrame
         for col in df.columns:
             value = row[col]
             if pd.isna(value):  # Handle missing values
                 continue
             
             # Get field information from data dictionary
             field_info = (self.data_dictionary.get_field_info(col) 
                         if self.data_dictionary else {'en_description': col})
             
             # Format the field value
-            if isinstance(value, (int, float)):
-                formatted_value = f"{value:,}"
-            else:
-                formatted_value = str(value).strip()
+            formatted_value = format_value(value)
             
             # Skip empty values
             if not formatted_value:
                 continue
             
             # Add to content with both English and French descriptions when available
             if (field_info.get('fr_description') and 
                 field_info.get('fr_description') != field_info.get('en_description')):
                 content_parts.append(
                     f"{field_info['en_description']} ({field_info['fr_description']}): {formatted_value}"
                 )
             else:
                 content_parts.append(
                     f"{field_info['en_description']}: {formatted_value}"
                 )
         
-        # Join all parts with proper formatting
-        content = "\n".join(content_parts)
-        documents.append(Document(text=content))
-        
-        # Log progress periodically
-        if len(documents) % 1000 == 0:
-            logger.info(f"Processed {len(documents)} documents")
+        return Document(text="\n".join(content_parts))
     
-    return documents
+    # Process all rows with progress bar
+    documents = [
+        process_row(row) for _, row in tqdm(
+            df.iterrows(),
+            total=len(df),
+            desc="Preparing documents",
+            unit="doc"
+        )
+    ]
+    return documents
query-engine/docker-compose.yaml (1)

29-31: Use named volumes for data persistence.

Using bind mounts for data directory could lead to permission issues. Consider using named volumes.

     volumes:
-      - ./data:/app/data
+      - query_data:/app/data
       - ./src:/app/src
+
+volumes:
+  query_data:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a11e6aa and 19ecf50.

⛔ Files ignored due to path filters (1)
  • query-engine/data/transfer_payments/transfer_payments.csv is excluded by !**/*.csv
📒 Files selected for processing (6)
  • .gitignore (1 hunks)
  • query-engine/Dockerfile (1 hunks)
  • query-engine/docker-compose.yaml (1 hunks)
  • query-engine/src/chat_interface.py (1 hunks)
  • query-engine/src/indexer.py (1 hunks)
  • query-engine/src/main.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .gitignore
  • query-engine/Dockerfile
🧰 Additional context used
🧠 Learnings (1)
query-engine/src/chat_interface.py (1)
Learnt from: jharris1679
PR: BuildCanada/CanadaSpends#2
File: query-engine/src/chat_interface.py:27-27
Timestamp: 2025-02-21T22:29:26.681Z
Learning: The hardcoded CSV path in `query-engine/src/chat_interface.py` will be replaced with flat storage in a future update, so suggestions about making the path configurable can be ignored.
🪛 Ruff (0.8.2)
query-engine/src/chat_interface.py

8-8: rich.text.Text imported but unused

Remove unused import: rich.text.Text

(F401)

query-engine/src/indexer.py

184-187: Use contextlib.suppress(Exception) instead of try-except-pass

Replace with contextlib.suppress(Exception)

(SIM105)


186-186: Do not use bare except

(E722)

🪛 YAMLlint (1.35.1)
query-engine/docker-compose.yaml

[error] 1-1: trailing spaces

(trailing-spaces)


[error] 2-2: trailing spaces

(trailing-spaces)


[error] 3-3: trailing spaces

(trailing-spaces)


[error] 4-4: trailing spaces

(trailing-spaces)


[error] 5-5: trailing spaces

(trailing-spaces)


[error] 6-6: trailing spaces

(trailing-spaces)


[error] 10-10: trailing spaces

(trailing-spaces)


[error] 11-11: trailing spaces

(trailing-spaces)


[error] 25-25: trailing spaces

(trailing-spaces)


[error] 50-50: trailing spaces

(trailing-spaces)

🔇 Additional comments (15)
query-engine/src/chat_interface.py (7)

12-13: LGTM! Well-configured API URL.

Good practice using environment variables with a sensible default for Docker service.


15-24: LGTM! Clear and well-formatted welcome message.

The welcome message provides clear instructions and good use of rich formatting.


25-36: LGTM! Robust API health check implementation.

Good implementation with proper async session management and error handling.


80-91: LGTM! Clean command handling implementation.

Simple and effective implementation of command handling.


92-110: LGTM! Well-structured chat loop.

Excellent implementation with proper session management and error handling.


111-136: LGTM! Comprehensive main function implementation.

Excellent error handling with informative messages and proper cleanup.


137-141: LGTM! Clean entry point implementation.

Standard Python entry point with proper asyncio integration.

query-engine/src/main.py (3)

46-51: LGTM! Well-structured FastAPI initialization.

The FastAPI app is properly initialized with clear metadata and lifespan management.


79-86: LGTM! Comprehensive health check endpoint.

The health check provides useful information about both indexing status and query engine readiness.


88-89: Disable reload mode for production.

The reload mode is enabled which is not recommended for production use as it can lead to memory leaks and unexpected behavior.

Consider using an environment variable to control the reload mode:

-    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True) 
+    reload_mode = os.getenv('DEBUG', 'false').lower() == 'true'
+    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=reload_mode)
query-engine/src/indexer.py (4)

70-91: Add environment variable validation and improve error handling.

The constructor should validate environment variables before use to prevent runtime errors.


130-180: Review async implementation and module organization.

Several issues need attention:

  1. The method is marked async but doesn't use any async operations.
  2. Late import of QueryEngine suggests a potential circular dependency.
  3. Collection name is now derived from schema name but could be made more configurable.

70-91: Add environment variable validation and improve error handling.

The constructor should validate environment variables before use to prevent runtime errors.


130-180: Review async implementation and module organization.

Several issues need attention:

  1. The method is marked async but doesn't use any async operations.
  2. Late import of QueryEngine suggests a potential circular dependency.
  3. Hardcoded collection name reduces reusability.
query-engine/docker-compose.yaml (1)

1-79: Fix YAML formatting.

Remove trailing spaces from the YAML file and add a newline at the end of the file.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 1-1: trailing spaces

(trailing-spaces)


[error] 2-2: trailing spaces

(trailing-spaces)


[error] 3-3: trailing spaces

(trailing-spaces)


[error] 4-4: trailing spaces

(trailing-spaces)


[error] 5-5: trailing spaces

(trailing-spaces)


[error] 6-6: trailing spaces

(trailing-spaces)


[error] 10-10: trailing spaces

(trailing-spaces)


[error] 11-11: trailing spaces

(trailing-spaces)


[error] 25-25: trailing spaces

(trailing-spaces)


[error] 50-50: trailing spaces

(trailing-spaces)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
query-engine/src/main.py (4)

13-15: Consider using a class-based approach instead of global variables.

Global variables can make testing difficult and potentially create thread-safety issues in a FastAPI application. Consider encapsulating the state in a class:

-# Global variables
-query_engine = None
-indexing_status = {"status": "not_started", "message": "Initialization not started"}
+class AppState:
+    def __init__(self):
+        self.query_engine = None
+        self.indexing_status = {"status": "not_started", "message": "Initialization not started"}
+
+app_state = AppState()

46-51: Consider using dynamic version from package metadata.

Instead of hardcoding the version, consider reading it from your package metadata:

+from importlib.metadata import version
+
 app = FastAPI(
     title="Canada Spends Query Engine",
     description="API for querying Canadian government spending data",
-    version="1.0.0",
+    version=version("query-engine"),
     lifespan=lifespan
 )

53-78: Enhance error handling and response format.

Consider the following improvements:

  1. Add specific error handling for different types of exceptions
  2. Include a timestamp in the response
  3. Use Pydantic models for request/response validation
+from datetime import datetime
+from pydantic import BaseModel
+
+class QueryResponse(BaseModel):
+    status: str
+    question: str
+    answer: str
+    timestamp: datetime
+
 @app.get("/query")
-async def query_data(question: str):
+async def query_data(question: str) -> QueryResponse:
     """
     Query the spending data with a natural language question.
     Returns both the question and the answer.
     """
     if not query_engine:
         raise HTTPException(status_code=503, detail="Query engine not initialized")
     try:
         response = await query_engine.query(question)
-        return JSONResponse({
-            "status": "success",
-            "question": question,
-            "answer": str(response)
-        })
+        return QueryResponse(
+            status="success",
+            question=question,
+            answer=str(response),
+            timestamp=datetime.utcnow()
+        )
     except Exception as e:
         logger.error(f"Query error: {str(e)}")
+        if isinstance(e, ValueError):
+            status_code = 400
+        else:
+            status_code = 500
         return JSONResponse(
-            status_code=500,
+            status_code=status_code,
             content={
                 "status": "error",
                 "error": str(e),
-                "question": question
+                "question": question,
+                "timestamp": datetime.utcnow().isoformat()
             }
         )

79-86: Enhance health check with additional metrics.

Consider adding more useful metrics to the health check:

 @app.get("/health")
 async def health_check():
     """Check if the API and query engine are healthy."""
     return {
         "status": indexing_status["status"],
         "message": indexing_status["message"],
-        "query_engine_ready": query_engine is not None
+        "query_engine_ready": query_engine is not None,
+        "uptime": get_uptime(),
+        "memory_usage": get_memory_usage(),
+        "total_queries": get_query_count(),
+        "timestamp": datetime.utcnow().isoformat()
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19ecf50 and 93800cb.

⛔ Files ignored due to path filters (1)
  • query-engine/data/detailed-transfer-payments/pt-tp-2024.csv is excluded by !**/*.csv
📒 Files selected for processing (1)
  • query-engine/src/main.py (1 hunks)
🔇 Additional comments (1)
query-engine/src/main.py (1)

10-11: Add environment variable validation.

Consider validating required environment variables after loading them to prevent runtime errors.

@jharris1679 jharris1679 merged commit 1a10cd8 into main Feb 24, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants