
Conversation


@galshubeli galshubeli commented Aug 28, 2025

Summary by CodeRabbit

  • New Features

    • Memory-powered personalization influencing analysis/SQL generation; follow-up question generator for ambiguous/off-topic queries.
    • Persistent memory tooling for saving/searching queries, conversation summaries, and background maintenance.
  • Improvements

    • More accurate relevancy detection with standardized on/off-topic responses and suggestions.
    • Configurable short-term memory length for chat context.
  • UI

    • Final responses are persisted to result history while avoiding duplicate entries.
  • Chores

    • Added a VCS-sourced dependency; installation may require Git.
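
The configurable short-term memory length mentioned above can be pictured as a simple slice over the chat history. This is an illustrative sketch only: the value SHORT_MEMORY_LENGTH = 5 comes from api/config.py, while the helper name is hypothetical.

```python
SHORT_MEMORY_LENGTH = 5  # value added to api/config.py in this PR

def short_term_context(queries_history: list[str]) -> list[str]:
    """Keep only the most recent messages as chat context (hypothetical helper)."""
    return queries_history[-SHORT_MEMORY_LENGTH:]

# e.g. a 10-message history is trimmed to its last 5 entries
trimmed = short_term_context([f"msg {i}" for i in range(10)])
```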

@galshubeli galshubeli requested review from Copilot and gkorland August 28, 2025 15:03

coderabbitai bot commented Aug 28, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Graphiti-backed memory tooling and Azure OpenAI wiring; threads memory_context into AnalysisAgent and the graphs route flows; refactors the follow-up and relevancy agents; persists query and conversation memories; updates frontend streaming to store final responses; and introduces a VCS dependency on graphiti-core.

Changes

Cohort / File(s): Summary

  • Dependencies (Pipfile): Adds VCS dependency graphiti-core from https://github.com/FalkorDB/graphiti.git (ref: "staging").
  • Analysis Agent, memory-aware prompts (api/agents/analysis_agent.py): Adds a memory_context parameter to get_analysis and _build_prompt; injects a memory section and personalization guidance into the prompt.
  • Follow-up Agent Refactor (api/agents/follow_up_agent.py): Replaces the JSON-based follow-up flow with natural-language generation, renames the prompt constant, subclasses BaseAgent, and changes the API to generate_follow_up_question(...) -> str.
  • Relevancy Agent Prompt/Init (api/agents/relevancy_agent.py): Removes the explicit __init__, rewrites RELEVANCY_PROMPT to require full conversation context, focuses on SQL-translatability, and standardizes output categories.
  • Memory Module Export (api/memory/__init__.py): Adds a module docstring, re-exports MemoryTool, and defines __all__.
  • Graphiti Memory Tool, new (api/memory/graphiti_tool.py): New MemoryTool class with an async factory, node-ensuring helpers, memory ingestion, query persistence/deduplication, similarity/search APIs, summarization, and cleanup routines, plus AzureOpenAIConfig, client wiring, and Graphiti client creation helpers.
  • Graphs Route Integration (api/routes/graphs.py): Creates/uses MemoryTool, obtains memory_context before analysis, passes it to AnalysisAgent.get_analysis, persists query/conversation memories, schedules cleanup, integrates FollowUpAgent, and extends streaming payloads with final_response flags.
  • Config (api/config.py): Adds SHORT_MEMORY_LENGTH = 5; removes several commented-out AWS credential lines.
  • Frontend Stream Handling (app/ts/modules/chat.ts): Stores any streamed step with final_response === true into state.result_history before other handling.
  • Frontend Message Handling (app/ts/modules/messages.ts): Removes pushing final-result messages into state.result_history (now handled by the stream handler).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant FE as Frontend
  participant API as API / graphs route
  participant Mem as MemoryTool
  participant Ana as AnalysisAgent
  participant DB as Database
  participant FU as FollowUpAgent

  U->>FE: Ask question
  FE->>API: Stream request (question)
  API->>Mem: create(user_id, graph_id)
  API->>Mem: search_memories(question)
  Mem-->>API: memory_context
  API->>Ana: get_analysis(question, tables, db_desc, instructions, memory_context)
  Ana-->>API: analysis_result (SQL / flags)
  alt On-topic
    API->>DB: Execute SQL
    DB-->>API: Results / Error
    API-->>FE: Stream query_result / ai_response (with final_response)
    par Persist memory
      API->>Mem: save_query_memory(query, sql, success/error)
      API->>Mem: add_new_memory(full_response)
      API->>Mem: clean_memory()
    end
  else Off-topic
    API->>FU: generate_follow_up_question(question, analysis_result, found_tables)
    FU-->>API: follow-up text
    API-->>FE: Stream followup_questions (final_response)
    par Persist off-topic attempt
      API->>Mem: save_query_memory(query, "", false, "off-topic")
      API->>Mem: add_new_memory(full_response)
      API->>Mem: clean_memory()
    end
  end
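
The on-topic path above can be condensed into a small asyncio sketch. The MemoryTool method names (create, search_memories, save_query_memory, add_new_memory, clean_memory) come from the walkthrough; everything else here is a stub standing in for the real agents and route.

```python
import asyncio

class MemoryTool:
    """Stub standing in for api/memory/graphiti_tool.MemoryTool (bodies are placeholders)."""

    @classmethod
    async def create(cls, user_id: str, graph_id: str) -> "MemoryTool":
        # The real factory also ensures user/database nodes and a vector index.
        return cls()

    async def search_memories(self, query: str) -> dict:
        return {"memory_context": f"similar past queries for: {query}"}

    async def save_query_memory(self, query: str, sql: str, success: bool, error: str = "") -> None:
        pass

    async def add_new_memory(self, full_response: str) -> None:
        pass

    async def clean_memory(self) -> None:
        pass

async def handle_question(user_id: str, graph_id: str, question: str) -> str:
    mem = await MemoryTool.create(user_id, graph_id)
    context = (await mem.search_memories(question))["memory_context"]
    # In the real route this is AnalysisAgent.get_analysis(question, tables,
    # db_description, instructions, memory_context=context).
    sql = f"SELECT 1 /* generated with context: {context} */"
    # In the real route persistence runs in the background so streaming isn't blocked.
    await asyncio.gather(
        mem.save_query_memory(question, sql, True),
        mem.add_new_memory(sql),
        mem.clean_memory(),
    )
    return sql
```

Running `asyncio.run(handle_question("u1", "g1", "list all users"))` yields the stubbed SQL with the retrieved context embedded.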
sequenceDiagram
  autonumber
  participant FE as Frontend
  participant API as API / destructive confirm
  participant Mem as MemoryTool
  participant DB as Database

  FE->>API: Confirm destructive operation
  API->>DB: Execute SQL (destructive)
  DB-->>API: Success / Failure
  API-->>FE: Stream destructive_confirmation + final_result (final_response)
  par Persist outcome
    API->>Mem: save_query_memory(query, sql, success, error?)
    API->>Mem: add_new_memory(full_response)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

  • Async graph worker #98 — Modifies api/agents/relevancy_agent.py; related to prompt and init changes.
  • Staging #123 — Overlaps agents and api/routes/graphs.py edits; likely touches the same integration.
  • Fix login handler #128 — Also touches api/agents/relevancy_agent.py and output formatting; strong code-level overlap.

Suggested reviewers

  • swilly22

Poem

Thump-thump, I tuck your queries in my burrowed brain,
I nibble context crumbs so answers bloom again.
I hop to schemas, hum a tiny tune,
I store a memory, then fetch it soon.
Together we query, and hop toward noon. 🐇✨


github-actions bot commented Aug 28, 2025

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

Pipfile.lock

Package   Version   License   Issue Type
posthog   6.7.0     Null      Unknown License

OpenSSF Scorecard

Package             Version   Score
pip/graphiti-core   Unknown   Unknown
pip/backoff         2.2.1     Unknown
pip/diskcache       5.6.3     Unknown
pip/hf-xet          1.1.9     Unknown
pip/neo4j           5.28.2    🟢 6.8
pip/numpy           2.3.2     🟢 7.2
pip/posthog         6.7.0     Unknown
pip/pytz            2025.2    🟢 4.6
pip/tenacity        9.1.2     🟢 5.8

Details: pip/neo4j 5.28.2

Check                 Score   Reason
Packaging             ⚠️ -1   packaging workflow not detected
Maintained            🟢 10   30 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 10
Code-Review           🟢 10   all changesets reviewed
Dangerous-Workflow    ⚠️ -1   no workflows found
Token-Permissions     ⚠️ -1   No tokens found
CII-Best-Practices    ⚠️ 0    no effort to earn an OpenSSF best practices badge detected
Vulnerabilities       🟢 10   0 existing vulnerabilities detected
License               🟢 9    license file detected
Binary-Artifacts      🟢 10   no binaries found in the repo
Branch-Protection     ⚠️ -1   internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
Signed-Releases       ⚠️ -1   no releases found
Security-Policy       🟢 10   security policy file detected
Fuzzing               ⚠️ 0    project is not fuzzed
Pinned-Dependencies   ⚠️ 0    dependency not pinned by hash detected -- score normalized to 0
SAST                  ⚠️ 0    SAST tool is not run on all commits -- score normalized to 0

Details: pip/numpy 2.3.2

Check                    Score   Reason
Code-Review              🟢 10   all changesets reviewed
Dependency-Update-Tool   🟢 10   update tool detected
Maintained               🟢 10   30 commit(s) and 16 issue activity found in the last 90 days -- score normalized to 10
Dangerous-Workflow       🟢 10   no dangerous workflow patterns detected
Packaging                ⚠️ -1   packaging workflow not detected
Binary-Artifacts         🟢 10   no binaries found in the repo
Token-Permissions        🟢 9    detected GitHub workflow tokens with excessive permissions
CII-Best-Practices       ⚠️ 0    no effort to earn an OpenSSF best practices badge detected
Pinned-Dependencies      🟢 3    dependency not pinned by hash detected -- score normalized to 3
License                  🟢 9    license file detected
Fuzzing                  🟢 10   project is fuzzed
Branch-Protection        🟢 3    branch protection is not maximal on development and all release branches
Vulnerabilities          ⚠️ 0    26 existing vulnerabilities detected
Signed-Releases          ⚠️ 0    Project has not signed or included provenance with any releases.
SAST                     🟢 9    SAST tool detected but not run on all commits
Security-Policy          🟢 9    security policy file detected
CI-Tests                 🟢 10   16 out of 16 merged PRs checked by a CI test -- score normalized to 10
Contributors             🟢 10   project has 99 contributing companies or organizations

Details: pip/pytz 2025.2

Check                 Score   Reason
Maintained            ⚠️ 0    0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Code-Review           ⚠️ 1    Found 3/25 approved changesets -- score normalized to 1
Security-Policy       🟢 9    security policy file detected
Dangerous-Workflow    🟢 10   no dangerous workflow patterns detected
Binary-Artifacts      🟢 10   no binaries found in the repo
Token-Permissions     ⚠️ 0    detected GitHub workflow tokens with excessive permissions
CII-Best-Practices    ⚠️ 0    no effort to earn an OpenSSF best practices badge detected
Vulnerabilities       🟢 10   0 existing vulnerabilities detected
License               🟢 10   license file detected
Pinned-Dependencies   ⚠️ 0    dependency not pinned by hash detected -- score normalized to 0
Fuzzing               🟢 10   project is fuzzed
Packaging             🟢 10   packaging workflow detected
Signed-Releases       ⚠️ 0    Project has not signed or included provenance with any releases.
Branch-Protection     ⚠️ 0    branch protection not enabled on development/release branches
SAST                  ⚠️ 0    SAST tool is not run on all commits -- score normalized to 0

Details: pip/tenacity 9.1.2

Check                 Score   Reason
Code-Review           🟢 8    Found 17/21 approved changesets -- score normalized to 8
Dangerous-Workflow    🟢 10   no dangerous workflow patterns detected
Maintained            ⚠️ 0    1 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Binary-Artifacts      🟢 10   no binaries found in the repo
Pinned-Dependencies   ⚠️ 0    dependency not pinned by hash detected -- score normalized to 0
CII-Best-Practices    ⚠️ 0    no effort to earn an OpenSSF best practices badge detected
Token-Permissions     🟢 9    detected GitHub workflow tokens with excessive permissions
Vulnerabilities       🟢 10   0 existing vulnerabilities detected
License               🟢 10   license file detected
Fuzzing               ⚠️ 0    project is not fuzzed
Signed-Releases       ⚠️ -1   no releases found
Packaging             🟢 10   packaging workflow detected
Security-Policy       ⚠️ 0    security policy file not detected
Branch-Protection     ⚠️ -1   internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration
SAST                  ⚠️ 0    SAST tool is not run on all commits -- score normalized to 0

Scanned Files

  • Pipfile
  • Pipfile.lock


Copilot AI left a comment


Pull Request Overview

This PR introduces a comprehensive memory component to QueryWeaver that stores and retrieves conversation history and database interactions using Graphiti (a graph-based memory system) with Azure OpenAI integration. The memory system helps improve SQL generation by providing context from previous user interactions and database queries.

  • Adds Graphiti-based memory storage for user conversations and database interactions
  • Integrates memory context into the SQL analysis pipeline to improve query generation
  • Implements follow-up question generation for failed or incomplete queries

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Summary per file

File                            Description
api/memory/graphiti_tool.py     New comprehensive memory management tool with Graphiti integration
api/memory/__init__.py          Memory module initialization
api/routes/graphs.py            Integrates the memory tool into the query-processing pipeline and adds final_response flags
api/agents/analysis_agent.py    Updates the analysis agent to accept and use memory context
api/agents/follow_up_agent.py   Refactored to generate conversational follow-up questions
api/agents/relevancy_agent.py   Updates relevancy checking with conversation-context awareness
app/ts/modules/messages.ts      Removes result_history tracking from final-result messages
app/ts/modules/chat.ts          Adds result_history tracking for final responses based on the final_response flag
Pipfile                         Adds the graphiti-core dependency


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
api/agents/analysis_agent.py (1)

14-21: Make instructions and memory_context keyword-only and update callsites

  • Change get_analysis signature to:
    def get_analysis(
        self,
        user_query: str,
        combined_tables: list,
        db_description: str,
        *,
        instructions: str | None = None,
        memory_context: str | None = None,
    ) -> dict:
  • Update the call in api/routes/graphs.py:376–378 to:
        answer_an = agent_an.get_analysis(
            queries_history[-1],
            result,
            db_description,
            instructions=instructions,
            memory_context=memory_context,
        )
  • Apply the same keyword-only refactor to _build_prompt and its invocation in analysis_agent.py.
api/agents/follow_up_agent.py (1)

80-83: Remove unused exception variable and shorten fallback (Ruff F841, pylint C0301).

-        except Exception as e:
-            # Fallback response if LLM call fails
-            return "I'm having trouble generating a follow-up question right now. Could you try rephrasing your question or providing more specific details about what you're looking for?"
+        except Exception:
+            # Fallback response if LLM call fails
+            return ("Sorry—I couldn’t generate a follow‑up. Could you rephrase or share the "
+                    "specific details you’re after?")
api/agents/relevancy_agent.py (1)

84-89: Async method calls sync LLM; switch to acompletion to avoid blocking.

This prevents event-loop blocking and aligns with async API.

-from litellm import completion
+from litellm import acompletion
@@
-        completion_result = completion(
+        completion_result = await acompletion(
             model=Config.COMPLETION_MODEL,
             messages=self.messages,
             temperature=0,
         )
♻️ Duplicate comments (2)
api/agents/follow_up_agent.py (1)

71-75: Lower temperature for consistency; prior feedback suggested ≤0.8.

Set to 0.8 for more predictable follow-ups.

-                temperature=0.9
+                temperature=0.8
api/routes/graphs.py (1)

610-613: Enhance callback error handling to ensure proper string formatting.

The callback should handle the case where t.exception() might return None more explicitly.

🧹 Nitpick comments (9)
api/agents/analysis_agent.py (3)

121-140: Fix trailing whitespace and long lines (pylint C0303/C0301).

Trim trailing spaces and wrap overlong prompt bullets.

-        
-        # Include memory context in the prompt if available
+        # Include memory context in the prompt if available
@@
-            13. CRITICAL PERSONALIZATION CHECK: If missing user identification/personalization is a significant or primary component of the query (e.g., "show my orders", "my account balance", "my recent purchases", "how many employees I have", "products I own") AND no user identification is available in memory context or schema, set "is_sql_translatable" to false. However, if memory context contains user identification (like user name or previous successful personal queries), then personal queries ARE translatable even if they are the primary component of the query.
+            13. CRITICAL PERSONALIZATION CHECK:
+                - If the query is primarily personal (e.g., "my orders", "my account balance")
+                  AND no user identification exists in memory or schema → set
+                  "is_sql_translatable" to false.
+                - If memory context identifies the user (name/ID/prior personal queries) →
+                  treat such queries as translatable.

Also applies to: 235-239


52-103: Too many locals in _format_schema (pylint R0914). Extract helpers.

Small extraction reduces locals and improves readability.

@@
-    def _format_schema(self, schema_data: List) -> str:
+    def _format_schema(self, schema_data: List) -> str:
+        def _format_column(column: dict) -> str:
+            col_name = column.get("columnName", "")
+            col_type = column.get("dataType", None)
+            col_description = column.get("description", "")
+            col_key = column.get("keyType", None)
+            nullable = column.get("nullable", False)
+            key_info = (
+                ", PRIMARY KEY" if col_key == "PRI" else ", FOREIGN KEY" if col_key == "FK" else ""
+            )
+            return (f"  - {col_name} ({col_type},{key_info},{col_key},{nullable}): "
+                    f"{col_description}")
@@
-            for column in columns:
-                col_name = column.get("columnName", "")
-                col_type = column.get("dataType", None)
-                col_description = column.get("description", "")
-                col_key = column.get("keyType", None)
-                nullable = column.get("nullable", False)
-
-                key_info = (
-                    ", PRIMARY KEY"
-                    if col_key == "PRI"
-                    else ", FOREIGN KEY" if col_key == "FK" else ""
-                )
-                column_str = (f"  - {col_name} ({col_type},{key_info},{col_key},"
-                             f"{nullable}): {col_description}")
-                table_str += column_str + "\n"
+            for column in columns:
+                table_str += _format_column(column) + "\n"

49-50: Consider storing full analysis JSON in history, not only SQL.

Appending only sql_query degrades future-context quality.

-        self.messages.append({"role": "assistant", "content": analysis["sql_query"]})
+        self.messages.append({"role": "assistant", "content": response})
api/agents/follow_up_agent.py (2)

56-60: Wrap long lines and strip trailing whitespace (pylint C0301/C0303).

Also stringify list fields for a cleaner prompt.

-        is_translatable = analysis_result.get("is_sql_translatable", False) if analysis_result else False
-        missing_info = analysis_result.get("missing_information", []) if analysis_result else []
-        ambiguities = analysis_result.get("ambiguities", []) if analysis_result else []
-        explanation = analysis_result.get("explanation", "No detailed explanation available") if analysis_result else "No analysis result available"
+        is_translatable = (
+            analysis_result.get("is_sql_translatable", False)
+            if analysis_result else False
+        )
+        missing_info = (
+            ", ".join(analysis_result.get("missing_information", []))
+            if analysis_result else ""
+        )
+        ambiguities = (
+            ", ".join(analysis_result.get("ambiguities", []))
+            if analysis_result else ""
+        )
+        explanation = (
+            analysis_result.get("explanation", "No detailed explanation available")
+            if analysis_result else "No analysis result available"
+        )
@@
-            MISSING_INFO=missing_info,
-            AMBIGUITIES=ambiguities,
+            MISSING_INFO=missing_info,
+            AMBIGUITIES=ambiguities,
             EXPLANATION=explanation
         )

Also applies to: 69-79


36-41: found_tables is unused; either use or mark as intentionally unused.

Minimal change: rename to _found_tables.

-        found_tables: list = None
+        _found_tables: list | None = None
api/memory/graphiti_tool.py (3)

225-225: Remove f-strings without placeholders (Ruff F541).

-                print(f"Query with same user_query and sql_query already exists, skipping creation")
+                print("Query with same user_query and sql_query already exists, skipping creation")
@@
-                memory_context += f"\n"
+                memory_context += "\n"

Also applies to: 484-486


376-377: Replace prints with logging; avoid noisy stdout in production.

Introduce a module logger and use logger.info/debug/exception.

+import logging
+logger = logging.getLogger(__name__)
@@
-                    print(f'Found database node: {node.name} with UUID: {node.uuid}')
+                    logger.debug("Found database node: %s (uuid=%s)", node.name, node.uuid)
@@
-                print(f'\nDatabase Facts Search Results for {self.graph_id}:')
+                logger.debug("Database Facts search results for %s", self.graph_id)

Also applies to: 393-398


594-607: Avoid mutating environment in AzureOpenAIConfig; derive from env only.

Setting MODEL_NAME has side effects; remove it.

-    def __init__(self):
-        # Set the model name as requested
-        os.environ["MODEL_NAME"] = "gpt-4.1"
-        
+    def __init__(self):
api/routes/graphs.py (1)

296-297: Consider adding error handling for memory tool initialization.

The memory tool creation is awaited later but lacks explicit error handling. If memory tool creation fails, it could cause issues downstream.

Add try-catch to handle potential memory tool initialization failures:

-    memory_tool_task = asyncio.create_task(MemoryTool.create(request.state.user_id, graph_id))
-
+    memory_tool_task = asyncio.create_task(MemoryTool.create(request.state.user_id, graph_id))

Then wrap the memory tool usage in error handling:

try:
    memory_tool = await memory_tool_task
    search_memories = await memory_tool.search_memories(
        query=queries_history[-1]
    )
    memory_context = search_memories.get("memory_context", "")
except Exception as e:
    logging.error(f"Memory tool initialization or search failed: {e}")
    memory_context = ""  # Fallback to empty context
    memory_tool = None  # Set to None to skip memory operations later
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d3a69e2 and a76f547.

⛔ Files ignored due to path filters (1)
  • Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Pipfile (1 hunks)
  • api/agents/analysis_agent.py (5 hunks)
  • api/agents/follow_up_agent.py (1 hunks)
  • api/agents/relevancy_agent.py (2 hunks)
  • api/memory/__init__.py (1 hunks)
  • api/memory/graphiti_tool.py (1 hunks)
  • api/routes/graphs.py (18 hunks)
  • app/ts/modules/chat.ts (1 hunks)
  • app/ts/modules/messages.ts (0 hunks)
💤 Files with no reviewable changes (1)
  • app/ts/modules/messages.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Adhere to pylint standards across all Python files (repository uses make lint)

Files:

  • api/memory/__init__.py
  • api/memory/graphiti_tool.py
  • api/agents/relevancy_agent.py
  • api/agents/analysis_agent.py
  • api/routes/graphs.py
  • api/agents/follow_up_agent.py
{Pipfile,Pipfile.lock}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use Pipenv for dependency management; add and lock dependencies via Pipfile and Pipfile.lock

Files:

  • Pipfile
app/**

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Keep the TypeScript frontend sources in app/ and build them before production runs

Files:

  • app/ts/modules/chat.ts
🧬 Code graph analysis (3)
app/ts/modules/chat.ts (1)
app/ts/modules/config.ts (1)
  • state (74-78)
api/memory/graphiti_tool.py (1)
api/config.py (3)
  • Config (47-137)
  • get_vector_size (33-43)
  • embed (18-31)
api/routes/graphs.py (1)
api/memory/graphiti_tool.py (6)
  • MemoryTool (29-591)
  • create (47-56)
  • search_memories (419-502)
  • save_query_memory (166-253)
  • add_new_memory (132-164)
  • clean_memory (504-518)
🪛 Ruff (0.12.2)
api/memory/graphiti_tool.py

225-225: f-string without any placeholders

Remove extraneous f prefix

(F541)


245-245: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


484-484: f-string without any placeholders

Remove extraneous f prefix

(F541)


508-508: Local variable driver is assigned to but never used

Remove assignment to unused variable driver

(F841)


509-509: Local variable query is assigned to but never used

Remove assignment to unused variable query

(F841)

api/agents/follow_up_agent.py

80-80: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🪛 GitHub Actions: Pylint
api/agents/analysis_agent.py

[error] 106-106: C0301: Line too long (115/100)


[error] 121-121: C0303: Trailing whitespace


[error] 140-140: C0303: Trailing whitespace


[error] 237-237: C0301: Line too long (206/100)


[error] 238-238: C0301: Line too long (573/100)


[error] 14-14: R0913: Too many arguments (6/5)


[error] 14-14: R0917: Too many positional arguments (6/5)


[error] 52-52: R0914: Too many local variables (21/15)


[error] 105-105: R0913: Too many arguments (6/5)


[error] 105-105: R0917: Too many positional arguments (6/5)

api/agents/follow_up_agent.py

[error] 37-37: C0303: Trailing whitespace


[error] 56-56: C0301: Line too long (105/100)


[error] 59-59: C0301: Line too long (148/100)


[error] 60-60: C0303: Trailing whitespace


[error] 69-69: C0303: Trailing whitespace


[error] 76-76: C0303: Trailing whitespace


[error] 79-79: C0303: Trailing whitespace


[error] 82-82: C0301: Line too long (187/100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e-tests
  • GitHub Check: e2e-tests
🔇 Additional comments (10)
Pipfile (1)

19-19: Fix Pipenv VCS dependency spec and pin to a commit for reproducibility
Pipenv expects git = "https://..." (no git+) and a stable commit SHA instead of a moving branch (staging). After updating, run pipenv lock --clear and verify the import locally since pipenv isn’t available in this environment.
Apply:

- graphiti-core = {ref = "staging", git = "git+https://github.com/FalkorDB/graphiti.git"}
+ graphiti-core = {git = "https://github.com/FalkorDB/graphiti.git", ref = "<PINNED_COMMIT_SHA>"}
api/agents/relevancy_agent.py (1)

69-75: RelevancyAgent inherits BaseAgent’s constructor; existing callsites are correct.
RelevancyAgent doesn’t define its own __init__ and inherits BaseAgent.__init__(queries_history, result_history) (api/agents/utils.py:10–13), so RelevancyAgent(queries_history, result_history) remains valid. Ignore the original constructor-removal warning.

Likely an incorrect or invalid review comment.

api/memory/__init__.py (1)

1-7: LGTM! Clean module initialization.

The module correctly exposes the public API for memory tooling with proper docstring and explicit __all__ declaration.

app/ts/modules/chat.ts (1)

111-115: Good implementation for tracking final responses.

The logic correctly captures messages marked with final_response: true into the result history, ensuring memory persistence across different message types. This aligns well with the backend's memory context features.

api/routes/graphs.py (6)

367-374: Good integration of memory context into analysis.

The memory search is properly integrated before SQL generation, providing relevant context from past interactions and similar queries.


561-575: Well-structured follow-up question generation for non-translatable queries.

The implementation properly handles cases where SQL generation fails by providing helpful follow-up questions to guide the user.


616-619: Good async implementation for memory persistence.

The background task for saving conversation memory is properly implemented with error handling callbacks, ensuring the main flow isn't blocked.


661-663: Good memory tool initialization for destructive operations.

The memory tool is properly created to track confirmed destructive operations.


746-758: Consistent memory tracking for confirmed operations.

The implementation properly saves successful confirmed queries to memory with appropriate error handling.


763-775: Good error recovery with memory persistence.

Failed destructive operations are properly tracked in memory, which is important for audit trails and learning from failures.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (6)
api/memory/graphiti_tool.py (4)

320-331: Fix docstring to match signature/return.

Remove non-existent arg, clarify return.

Apply:

     async def search_user_summary(self, limit: int = 5) -> str:
-        """
-        Search for user node summary extracts the user's personal information and general preferences.
-        
-        Args:
-            query: Natural language query to search for
-            limit: Maximum number of results to return
-            
-        Returns:
-            List of user node summaries with metadata
-        """
+        """
+        Retrieve summary text for the current user node.
+
+        Args:
+            limit: Maximum number of results to inspect.
+
+        Returns:
+            A summary string if found; otherwise an empty string.
+        """

46-55: Initialize user node and make vector index creation idempotent.

Ensure user node exists and ignore “already exists” errors to avoid startup failures.

Apply:

     @classmethod
     async def create(cls, user_id: str, graph_id: str) -> "MemoryTool":
         """Async factory to construct and initialize the tool."""
         self = cls(user_id, graph_id)
-        await self._ensure_database_node(graph_id, user_id)
+        await asyncio.gather(
+            self._ensure_user_node(user_id),
+            self._ensure_database_node(graph_id, user_id),
+        )
 
         vector_size = Config.EMBEDDING_MODEL.get_vector_size()
         driver = self.graphiti_client.driver
-        await driver.execute_query(f"CREATE VECTOR INDEX FOR (p:Query) ON (p.embeddings) OPTIONS {{dimension:{vector_size}, similarityFunction:'euclidean'}}")
+        try:
+            await driver.execute_query(
+                f"CREATE VECTOR INDEX FOR (p:Query) ON (p.embeddings) "
+                f"OPTIONS {{dimension:{vector_size}, similarityFunction:'euclidean'}}"
+            )
+        except Exception as e:
+            if "already exists" not in str(e).lower():
+                raise

205-241: Parameterize Cypher, remove manual escaping, and fix result handling.

String interpolation risks injection, and manual escaping corrupts the text fed to embeddings. Also treat the execute_query result as a tuple.

Apply:

-            relationship_type = "SUCCESS" if success else "FAILED"
-            
-            # Escape quotes in the query and SQL for Cypher
-            escaped_query = query.replace("'", "\\'").replace('"', '\\"')
-            escaped_sql = sql_query.replace("'", "\\'").replace('"', '\\"')
-            escaped_error = error.replace("'", "\\'").replace('"', '\\"') if error else ""
-            embeddings = Config.EMBEDDING_MODEL.embed(escaped_query)[0]
+            relationship_type = "SUCCESS" if success else "FAILED"
+            embeddings = Config.EMBEDDING_MODEL.embed(query)[0]
@@
-            check_query = f"""
-            MATCH (db:Entity {{uuid: "{database_node_uuid}"}})
-            MATCH (db)-[r]->(q:Query)
-            WHERE q.user_query = "{escaped_query}" AND q.sql_query = "{escaped_sql}"
-            RETURN q.uuid as existing_query_uuid
-            LIMIT 1
-            """
+            check_query = """
+            MATCH (db:Entity {uuid: $db_uuid})-[]->(q:Query)
+            WHERE q.user_query = $user_query AND q.sql_query = $sql_query
+            RETURN q.uuid as existing_query_uuid
+            LIMIT 1
+            """
@@
-            check_result = await graph_driver.execute_query(check_query)
-            
-            # If query already exists, don't create a duplicate
-            if check_result[0]:  # If records exist
+            records, _, _ = await graph_driver.execute_query(
+                check_query,
+                db_uuid=database_node_uuid,
+                user_query=query,
+                sql_query=sql_query,
+            )
+            if records:
                 print(f"Query with same user_query and sql_query already exists, skipping creation")
                 return True
@@
-            cypher_query = f"""
-            MATCH (db:Entity {{uuid: "{database_node_uuid}"}})
-            MERGE (q:Query {{
-                user_query: "{escaped_query}",
-                sql_query: "{escaped_sql}",
-                success: {str(success).lower()},
-                error: "{escaped_error}",
-                timestamp: timestamp(),
-                embeddings: vecf32($embedding)
-            }})
-            CREATE (db)-[:{relationship_type} {{timestamp: timestamp()}}]->(q)
-            RETURN q.uuid as query_uuid
-            """
+            cypher_query = """
+            MATCH (db:Entity {uuid: $db_uuid})
+            MERGE (q:Query {
+                user_query: $user_query,
+                sql_query: $sql_query,
+                success: $success,
+                error: $error,
+                timestamp: timestamp(),
+                embeddings: vecf32($embedding)
+            })
+            CREATE (db)-[:REL {type: $rel_type, timestamp: timestamp()}]->(q)
+            RETURN q.uuid as query_uuid
+            """
@@
-                result = await graph_driver.execute_query(cypher_query, embedding=embeddings)
-                return True
+                await graph_driver.execute_query(
+                    cypher_query,
+                    db_uuid=database_node_uuid,
+                    user_query=query,
+                    sql_query=sql_query,
+                    success=bool(success),
+                    error=error or "",
+                    rel_type=relationship_type,
+                    embedding=embeddings,
+                )
+                return True
             except Exception as cypher_error:
                 print(f"Error executing Cypher query: {cypher_error}")
                 return False

Also applies to: 243-249, 221-226


573-579: Don’t block the event loop: use litellm.acompletion in async context.

Switch to the async API and await it.

Apply:

-from litellm import completion
+from litellm import acompletion
@@
-            response = completion(
+            response = await acompletion(
                 model=self.config.COMPLETION_MODEL,
                 messages=[{"role": "user", "content": prompt}],
                 temperature=0.1
             )
api/routes/graphs.py (2)

620-622: Done callbacks: guard None and stringify exceptions.

Avoid interpolating a possibly-None exception into the f-string.

Apply:

-            save_query_task.add_done_callback(
-                lambda t: logging.error(f"Query memory save failed: {t.exception()}") 
-                if t.exception() else logging.info("Query memory saved successfully")
-            )
+            save_query_task.add_done_callback(
+                lambda t: logging.error(f"Query memory save failed: {str(t.exception())}")
+                if t.exception() else logging.info("Query memory saved successfully")
+            )
@@
-            save_task.add_done_callback(lambda t: logging.error(f"Memory save failed: {t.exception()}") if t.exception() else logging.info("Conversation saved to memory tool"))
+            save_task.add_done_callback(
+                lambda t: logging.error(f"Memory save failed: {str(t.exception())}")
+                if t.exception() else logging.info("Conversation saved to memory tool")
+            )

Also applies to: 627-627


630-636: Cleanup scheduling runs on every request; throttle it.

Comment says “once per week,” but it’s per request. Add simple rate limiting until a scheduler is introduced.

Apply:

-            # Clean old memory in background (once per week cleanup)
-            clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
-            clean_memory_task.add_done_callback(
-                lambda t: logging.error(f"Memory cleanup failed: {t.exception()}") 
-                if t.exception() else logging.info("Memory cleanup completed successfully")
-            )
+            # Clean old memory occasionally; replace with a real scheduler later
+            import random
+            if random.random() < 0.01:
+                clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
+                clean_memory_task.add_done_callback(
+                    lambda t: logging.error(f"Memory cleanup failed: {str(t.exception())}") 
+                    if t.exception() else logging.info("Memory cleanup completed successfully")
+                )
🧹 Nitpick comments (4)
api/memory/graphiti_tool.py (2)

58-94: Replace prints with logging and avoid bare except Exception.

Use logging.{info,warning,exception} and catch narrower exceptions where possible. Improves observability and lint compliance.

Also applies to: 95-131, 160-163


291-305: Use parameters for k/limit and avoid f-strings; align k with requested limit.

Prevents injection and keeps results bounded.

Apply:

-            cypher_query = f"""
-                    CALL db.idx.vector.queryNodes('Query', 'embeddings', 10, vecf32($embedding))
+            cypher_query = """
+                    CALL db.idx.vector.queryNodes('Query', 'embeddings', $k, vecf32($embedding))
                         YIELD node, score
-                        MATCH (db:Entity {{uuid: $database_node_uuid}})-[r]->(node)
+                        MATCH (db:Entity {uuid: $database_node_uuid})-[r]->(node)
                         RETURN node {
                             .user_query,
                             .sql_query,
                             .success,
                             .error
                         } AS query,
                         score
                         ORDER BY score ASC
-                        LIMIT {limit}
+                        LIMIT $limit
             """
@@
-                records, header, _  = await graph_driver.execute_query(cypher_query, embedding=query_embedding, database_node_uuid=database_node_uuid)
+                records, header, _  = await graph_driver.execute_query(
+                    cypher_query,
+                    embedding=query_embedding,
+                    database_node_uuid=database_node_uuid,
+                    k=max(10, int(limit)),
+                    limit=int(limit),
+                )
api/routes/graphs.py (2)

379-387: Pass memory_context safely; guard against non-str returns.

Given the MemoryTool fix above, this should always be a string; if not, coerce it to an empty string to be defensive.

Apply:

-            memory_context = await memory_tool.search_memories(
+            memory_context = await memory_tool.search_memories(
                 query=queries_history[-1]
             )
+            if not isinstance(memory_context, str):
+                memory_context = ""

768-783: Use logging.exception and stringify callback exceptions.

Improves debuggability.

Apply:

-            except Exception as e:
-                logging.error("Error executing confirmed SQL query: %s", str(e))
+            except Exception:
+                logging.exception("Error executing confirmed SQL query")
@@
-                save_query_task.add_done_callback(
-                    lambda t: logging.error(f"Failed confirmed query memory save failed: {t.exception()}") 
-                    if t.exception() else logging.info("Failed confirmed query memory saved successfully")
-                )
+                save_query_task.add_done_callback(
+                    lambda t: logging.error(f"Failed confirmed query memory save failed: {str(t.exception())}") 
+                    if t.exception() else logging.info("Failed confirmed query memory saved successfully")
+                )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a76f547 and c9f75c9.

📒 Files selected for processing (3)
  • api/config.py (1 hunks)
  • api/memory/graphiti_tool.py (1 hunks)
  • api/routes/graphs.py (17 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
api/config.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Define AI model configuration and system prompts in api/config.py rather than scattering them across modules

Files:

  • api/config.py
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Adhere to pylint standards across all Python files (repository uses make lint)

Files:

  • api/config.py
  • api/routes/graphs.py
  • api/memory/graphiti_tool.py
🧬 Code graph analysis (2)
api/routes/graphs.py (4)
api/agents/analysis_agent.py (2)
  • AnalysisAgent (9-241)
  • get_analysis (14-50)
api/agents/follow_up_agent.py (2)
  • FollowUpAgent (33-82)
  • generate_follow_up_question (36-82)
api/config.py (1)
  • Config (47-130)
api/memory/graphiti_tool.py (6)
  • MemoryTool (29-590)
  • create (47-56)
  • search_memories (419-497)
  • save_query_memory (166-253)
  • add_new_memory (132-164)
  • clean_memory (499-518)
api/memory/graphiti_tool.py (1)
api/config.py (3)
  • Config (47-130)
  • get_vector_size (33-43)
  • embed (18-31)
🪛 Ruff (0.12.2)
api/config.py

58-58: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

api/routes/graphs.py

559-559: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


768-768: Do not catch blind exception: Exception

(BLE001)


769-769: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


781-781: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

api/memory/graphiti_tool.py

89-89: Consider moving this statement to an else block

(TRY300)


91-91: Do not catch blind exception: Exception

(BLE001)


126-126: Consider moving this statement to an else block

(TRY300)


128-128: Do not catch blind exception: Exception

(BLE001)


160-160: Do not catch blind exception: Exception

(BLE001)


166-166: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)


225-225: f-string without any placeholders

Remove extraneous f prefix

(F541)


245-245: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


246-246: Consider moving this statement to an else block

(TRY300)


247-247: Do not catch blind exception: Exception

(BLE001)


251-251: Do not catch blind exception: Exception

(BLE001)


310-310: Consider moving this statement to an else block

(TRY300)


312-312: Do not catch blind exception: Exception

(BLE001)


316-316: Do not catch blind exception: Exception

(BLE001)


346-346: Consider moving this statement to an else block

(TRY300)


348-348: Do not catch blind exception: Exception

(BLE001)


415-415: Do not catch blind exception: Exception

(BLE001)


484-484: f-string without any placeholders

Remove extraneous f prefix

(F541)


488-488: Consider moving this statement to an else block

(TRY300)


490-490: Do not catch blind exception: Exception

(BLE001)


516-516: Do not catch blind exception: Exception

(BLE001)


582-584: Consider moving this statement to an else block

(TRY300)


586-586: Do not catch blind exception: Exception

(BLE001)


622-622: Avoid specifying long messages outside the exception class

(TRY003)


624-624: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
api/config.py (1)

57-57: Good addition: bounded short-term memory length.

Clear, sensible default for trimming chat history.

api/routes/graphs.py (1)

295-305: History truncation logic looks correct.

Keeps N latest queries and aligns results size accordingly.
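
The truncation behavior being approved can be illustrated with a small standalone sketch (`truncate_history` is a hypothetical helper for illustration, not the route's actual code):

```python
def truncate_history(queries, results, max_len):
    """Keep the newest max_len queries and align results to the same window."""
    if max_len <= 0:
        return [], []
    queries = queries[-max_len:]
    # Results may be shorter than queries; align to however many queries remain
    results = results[-len(queries):] if queries else []
    return queries, results

q, r = truncate_history([1, 2, 3, 4], ["a", "b", "c", "d"], 2)
```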

galshubeli and others added 3 commits August 31, 2025 18:27
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (9)
api/routes/graphs.py (2)

621-631: Harden done callbacks: guard against a None exception and stringify it.

Avoid NoneType access and ensure readable logs.

-            save_query_task.add_done_callback(
-                lambda t: logging.error(f"Query memory save failed: {t.exception()}") 
-                if t.exception() else logging.info("Query memory saved successfully")
-            )
+            save_query_task.add_done_callback(
+                lambda t: logging.error(f"Query memory save failed: {str(t.exception())}") 
+                if t.exception() else logging.info("Query memory saved successfully")
+            )
@@
-            save_task.add_done_callback(lambda t: logging.error(f"Memory save failed: {t.exception()}") if t.exception() else logging.info("Conversation saved to memory tool"))
+            save_task.add_done_callback(
+                lambda t: logging.error(f"Memory save failed: {str(t.exception())}")
+                if t.exception() else logging.info("Conversation saved to memory tool")
+            )
@@
-                lambda t: logging.error(f"Confirmed query memory save failed: {t.exception()}") 
+                lambda t: logging.error(f"Confirmed query memory save failed: {str(t.exception())}") 

Also applies to: 766-769
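
Rather than repeating the guarded lambda at every call site, the suggested hardening can be factored into a small callback factory — a sketch under the same assumptions as the diff, not the project's code:

```python
import asyncio
import logging

def make_logging_callback(label: str):
    """Return a done callback that logs success or failure for a task."""
    def _callback(task: asyncio.Task) -> None:
        exc = task.exception()  # None when the task succeeded
        if exc is not None:
            logging.error("%s failed: %s", label, str(exc))
        else:
            logging.info("%s completed successfully", label)
    return _callback

async def flaky(fail: bool) -> None:
    if fail:
        raise RuntimeError("boom")

async def main():
    ok = asyncio.create_task(flaky(False))
    ok.add_done_callback(make_logging_callback("Query memory save"))
    bad = asyncio.create_task(flaky(True))
    bad.add_done_callback(make_logging_callback("Memory cleanup"))
    # return_exceptions=True keeps the failed task from aborting the gather
    return await asyncio.gather(ok, bad, return_exceptions=True)

results = asyncio.run(main())
```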


632-637: Run memory cleanup periodically, not per request.

Comment says weekly but code triggers every call; add simple rate-limit/scheduler.

-            # Clean old memory in background (once per week cleanup)
-            clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
-            clean_memory_task.add_done_callback(
-                lambda t: logging.error(f"Memory cleanup failed: {t.exception()}") 
-                if t.exception() else logging.info("Memory cleanup completed successfully")
-            )
+            # Clean memory occasionally; TODO: move to scheduler/startup task
+            import random
+            if random.random() < 0.01:
+                clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
+                clean_memory_task.add_done_callback(
+                    lambda t: logging.error(f"Memory cleanup failed: {str(t.exception())}") 
+                    if t.exception() else logging.info("Memory cleanup completed successfully")
+                )
api/memory/graphiti_tool.py (7)

32-36: Sanitize user_id before composing DB name.

Prevents invalid identifiers and abuse.

-        user_memory_db = f"{user_id}-memory"
+        import re
+        def _sanitize(s: str) -> str:
+            return re.sub(r"[^A-Za-z0-9_-]", "-", s)[:64] or "user"
+        user_memory_db = f"{_sanitize(user_id)}-memory"

320-331: Fix docstring: remove non-existent query arg and clarify return.

-    async def search_user_summary(self, limit: int = 5) -> str:
-        """
-        Search for user node summary extracts the user's personal information and general preferences.
-        
-        Args:
-            query: Natural language query to search for
-            limit: Maximum number of results to return
-            
-        Returns:
-            List of user node summaries with metadata
-        """
+    async def search_user_summary(self, limit: int = 5) -> str:
+        """
+        Retrieve summary text for the current user node.
+
+        Args:
+            limit: Max results to inspect.
+
+        Returns:
+            A summary string; empty if none.
+        """

419-431: Return type docstring mismatch.

Function returns a string; update docstring accordingly.

-        Returns:
-            Dict containing user_summary, database_facts, similar_queries, and memory_context
+        Returns:
+            memory_context: a single string composed of user, database, and similar-query context

591-604: Centralize model names via Config to avoid drift.

Remove hardcoded "gpt-4.1"/"text-embedding-ada-002" and read from api.config.Config.

 class AzureOpenAIConfig:
@@
-        # Set the model name as requested
-        os.environ["MODEL_NAME"] = "gpt-4.1"
-        
         self.api_key = os.getenv('AZURE_API_KEY')
-        self.endpoint = os.getenv('AZURE_API_BASE') 
+        self.endpoint = os.getenv('AZURE_API_BASE')
         self.api_version = os.getenv('AZURE_API_VERSION', '2024-02-01')
-        self.model_choice = "gpt-4.1"  # Use the model name directly
-        self.embedding_model = "text-embedding-ada-002"  # Use model name, not deployment
+        # Derive names from Config (strip optional provider prefix like "azure/")
+        self.model_choice = Config.COMPLETION_MODEL.split("/", 1)[-1]
+        self.embedding_model = Config.EMBEDDING_MODEL_NAME.split("/", 1)[-1]
         self.small_model = os.getenv('AZURE_SMALL_MODEL', 'gpt-4o-mini')
@@
-        small_model=config.small_model_deployment,
-        model=config.llm_deployment,
+        small_model=config.small_model_deployment,
+        model=config.llm_deployment,
@@
-            config=OpenAIEmbedderConfig(embedding_model=config.embedding_deployment),
+            config=OpenAIEmbedderConfig(embedding_model=config.embedding_deployment),

Also applies to: 645-648, 655-663


46-55: Make initialization idempotent and ensure User node exists.

Create user+db nodes together and guard vector index creation.

     async def create(cls, user_id: str, graph_id: str) -> "MemoryTool":
@@
-        await self._ensure_database_node(graph_id, user_id)
-
-        vector_size = Config.EMBEDDING_MODEL.get_vector_size()
-        driver = self.graphiti_client.driver
-        await driver.execute_query(f"CREATE VECTOR INDEX FOR (p:Query) ON (p.embeddings) OPTIONS {{dimension:{vector_size}, similarityFunction:'euclidean'}}")
+        await asyncio.gather(
+            self._ensure_user_node(user_id),
+            self._ensure_database_node(graph_id, user_id),
+        )
+        vector_size = Config.EMBEDDING_MODEL.get_vector_size()
+        driver = self.graphiti_client.driver
+        try:
+            await driver.execute_query(
+                f"CREATE VECTOR INDEX FOR (p:Query) ON (p.embeddings) "
+                f"OPTIONS {{dimension:{vector_size}, similarityFunction:'euclidean'}}"
+            )
+        except Exception as e:
+            if "already exists" not in str(e).lower():
+                raise

571-576: Don't block the event loop: use the async LLM call.

Switch to acompletion in async context.

-from litellm import completion
+from litellm import acompletion
@@
-            response = completion(
+            response = await acompletion(
                 model=self.config.COMPLETION_MODEL,
                 messages=[{"role": "user", "content": prompt}],
                 temperature=0.1
             )

166-254: Cypher injection and duplicate-check bugs in save_query_memory.

Manual escaping and f-strings are unsafe; the code also mishandles the execute_query result tuple and leaves an unused variable. Parameterize the query and normalize the relationship type.

@@
-            relationship_type = "SUCCESS" if success else "FAILED"
-            
-            # Escape quotes in the query and SQL for Cypher
-            escaped_query = query.replace("'", "\\'").replace('"', '\\"')
-            escaped_sql = sql_query.replace("'", "\\'").replace('"', '\\"')
-            escaped_error = error.replace("'", "\\'").replace('"', '\\"') if error else ""
-            embeddings = Config.EMBEDDING_MODEL.embed(escaped_query)[0]
+            relationship_type = "SUCCESS" if success else "FAILED"
+            embeddings = Config.EMBEDDING_MODEL.embed(query)[0]
@@
-            check_query = f"""
-            MATCH (db:Entity {{uuid: "{database_node_uuid}"}})
-            MATCH (db)-[r]->(q:Query)
-            WHERE q.user_query = "{escaped_query}" AND q.sql_query = "{escaped_sql}"
-            RETURN q.uuid as existing_query_uuid
-            LIMIT 1
-            """
+            check_query = """
+            MATCH (db:Entity {uuid: $db_uuid})-[]->(q:Query)
+            WHERE q.user_query = $user_query AND q.sql_query = $sql_query
+            RETURN q.uuid as existing_query_uuid
+            LIMIT 1
+            """
@@
-            check_result = await graph_driver.execute_query(check_query)
-            
-            # If query already exists, don't create a duplicate
-            if check_result[0]:  # If records exist
+            records, _, _ = await graph_driver.execute_query(
+                check_query,
+                db_uuid=database_node_uuid,
+                user_query=query,
+                sql_query=sql_query,
+            )
+            if records:
                 print(f"Query with same user_query and sql_query already exists, skipping creation")
                 return True
@@
-            cypher_query = f"""
-            MATCH (db:Entity {{uuid: "{database_node_uuid}"}})
-            MERGE (q:Query {{
-                user_query: "{escaped_query}",
-                sql_query: "{escaped_sql}",
-                success: {str(success).lower()},
-                error: "{escaped_error}",
-                timestamp: timestamp(),
-                embeddings: vecf32($embedding)
-            }})
-            CREATE (db)-[:{relationship_type} {{timestamp: timestamp()}}]->(q)
-            RETURN q.uuid as query_uuid
-            """
+            cypher_query = """
+            MATCH (db:Entity {uuid: $db_uuid})
+            MERGE (q:Query {
+                user_query: $user_query,
+                sql_query: $sql_query,
+                success: $success,
+                error: $error,
+                timestamp: timestamp(),
+                embeddings: vecf32($embedding)
+            })
+            CREATE (db)-[:REL {type: $rel_type, timestamp: timestamp()}]->(q)
+            RETURN q.uuid as query_uuid
+            """
@@
-                result = await graph_driver.execute_query(cypher_query, embedding=embeddings)
-                return True
+                await graph_driver.execute_query(
+                    cypher_query,
+                    db_uuid=database_node_uuid,
+                    user_query=query,
+                    sql_query=sql_query,
+                    success=bool(success),
+                    error=error or "",
+                    rel_type=relationship_type,
+                    embedding=embeddings,
+                )
+                return True
🧹 Nitpick comments (3)
api/routes/graphs.py (2)

391-395: Normalize error handling to Optional[str] and avoid bool/string mixing.

Use None as the sentinel and keep a consistent type; this simplifies truthiness checks and serialization.

@@
-            # Initialize response variables
-            user_readable_response = ""
-            follow_up_result = ""
-            execution_error = False
+            # Initialize response variables
+            user_readable_response = ""
+            follow_up_result = ""
+            execution_error: Optional[str] = None
@@
-                except Exception as e:
-                    execution_error = str(e)
+                except Exception as e:
+                    execution_error = str(e)
@@
-                    ) + MESSAGE_DELIMITER
+                    ) + MESSAGE_DELIMITER
@@
-            else:
-                execution_error = "Missing information"
+            else:
+                execution_error = "Missing information"
@@
-            if execution_error:
+            if execution_error:
                 full_response["error"] = execution_error
                 full_response["success"] = False
             else:
                 full_response["success"] = True
@@
-                memory_tool.save_query_memory(
+                memory_tool.save_query_memory(
                     query=queries_history[-1],
                     sql_query=answer_an["sql_query"],
                     success=full_response["success"],
-                    error=execution_error
+                    error=execution_error
                 )

Add import (top of file):

from typing import Optional

Also applies to: 559-559, 567-568, 571-577, 605-611, 614-620


561-561: Prefer logging.exception for errors with stack traces.

Improves observability and aligns with linter hint TRY400.

-                    logging.error("Error executing SQL query: %s", str(e))
+                    logging.exception("Error executing SQL query")
@@
-                logging.error("Error executing confirmed SQL query: %s", str(e))
+                logging.exception("Error executing confirmed SQL query")
@@
-                save_query_task.add_done_callback(
-                    lambda t: logging.error(f"Failed confirmed query memory save failed: {t.exception()}") 
+                save_query_task.add_done_callback(
+                    lambda t: logging.error(f"Failed confirmed query memory save failed: {str(t.exception())}") 

Also applies to: 772-772, 784-784

api/memory/graphiti_tool.py (1)

290-305: Parameterize/guard LIMIT and clamp result size.

Avoid f-string LIMIT; enforce integer bounds.

-            cypher_query = f"""
+            cypher_query = """
                     CALL db.idx.vector.queryNodes('Query', 'embeddings', 10, vecf32($embedding))
                         YIELD node, score
                         MATCH (db:Entity {{uuid: $database_node_uuid}})-[r]->(node)
                         RETURN node {
                             .user_query,
                             .sql_query,
                             .success,
                             .error
                         } AS query,
                         score
                         ORDER BY score ASC
-                        LIMIT {limit}
+                        LIMIT $limit
             """
@@
-                records, header, _  = await graph_driver.execute_query(cypher_query, embedding=query_embedding, database_node_uuid=database_node_uuid)
+                records, header, _  = await graph_driver.execute_query(
+                    cypher_query,
+                    embedding=query_embedding,
+                    database_node_uuid=database_node_uuid,
+                    limit=int(max(0, min(limit, 50))),
+                )

Also applies to: 308-314

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c9f75c9 and 70340ba.

⛔ Files ignored due to path filters (1)
  • Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Pipfile (1 hunks)
  • api/memory/graphiti_tool.py (1 hunks)
  • api/routes/graphs.py (17 hunks)
  • app/ts/modules/chat.ts (1 hunks)
  • app/ts/modules/messages.ts (0 hunks)
💤 Files with no reviewable changes (1)
  • app/ts/modules/messages.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • Pipfile
  • app/ts/modules/chat.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Adhere to pylint standards across all Python files (repository uses make lint)

Files:

  • api/routes/graphs.py
  • api/memory/graphiti_tool.py
🧠 Learnings (1)
📚 Learning: 2025-08-24T17:15:21.337Z
Learnt from: CR
PR: FalkorDB/QueryWeaver#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-08-24T17:15:21.337Z
Learning: Applies to api/config.py : Define AI model configuration and system prompts in `api/config.py` rather than scattering them across modules

Applied to files:

  • api/memory/graphiti_tool.py
🧬 Code graph analysis (2)
api/routes/graphs.py (4)
api/agents/analysis_agent.py (2)
  • AnalysisAgent (9-241)
  • get_analysis (14-50)
api/agents/follow_up_agent.py (2)
  • FollowUpAgent (33-82)
  • generate_follow_up_question (36-82)
api/config.py (1)
  • Config (47-130)
api/memory/graphiti_tool.py (6)
  • MemoryTool (29-588)
  • create (47-56)
  • search_memories (419-492)
  • save_query_memory (166-253)
  • add_new_memory (132-164)
  • clean_memory (494-517)
api/memory/graphiti_tool.py (1)
api/config.py (3)
  • Config (47-130)
  • get_vector_size (33-43)
  • embed (18-31)
🪛 Ruff (0.12.2)
api/routes/graphs.py

561-561: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


771-771: Do not catch blind exception: Exception

(BLE001)


772-772: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


784-784: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

api/memory/graphiti_tool.py

89-89: Consider moving this statement to an else block

(TRY300)


91-91: Do not catch blind exception: Exception

(BLE001)


126-126: Consider moving this statement to an else block

(TRY300)


128-128: Do not catch blind exception: Exception

(BLE001)


160-160: Do not catch blind exception: Exception

(BLE001)


225-225: f-string without any placeholders

Remove extraneous f prefix

(F541)


245-245: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


246-246: Consider moving this statement to an else block

(TRY300)


247-247: Do not catch blind exception: Exception

(BLE001)


251-251: Do not catch blind exception: Exception

(BLE001)


310-310: Consider moving this statement to an else block

(TRY300)


312-312: Do not catch blind exception: Exception

(BLE001)


316-316: Do not catch blind exception: Exception

(BLE001)


346-346: Consider moving this statement to an else block

(TRY300)


348-348: Do not catch blind exception: Exception

(BLE001)


415-415: Do not catch blind exception: Exception

(BLE001)


484-484: f-string without any placeholders

Remove extraneous f prefix

(F541)


488-488: Consider moving this statement to an else block

(TRY300)


490-490: Do not catch blind exception: Exception

(BLE001)


514-514: Consider moving this statement to an else block

(TRY300)


515-515: Do not catch blind exception: Exception

(BLE001)


580-582: Consider moving this statement to an else block

(TRY300)


584-584: Do not catch blind exception: Exception

(BLE001)


620-620: Avoid specifying long messages outside the exception class

(TRY003)


622-622: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e-tests
  • GitHub Check: e2e-tests
🔇 Additional comments (3)
api/routes/graphs.py (2)

297-307: Short-term memory truncation logic looks correct.

History/results are trimmed consistently with Config.SHORT_MEMORY_LENGTH.
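For readers outside the diff, the trimming pattern being approved looks roughly like this. A standalone sketch: `SHORT_MEMORY_LENGTH` stands in for `Config.SHORT_MEMORY_LENGTH`, and the variable names are assumptions from the PR context, not the actual implementation.

```python
SHORT_MEMORY_LENGTH = 3  # stand-in for Config.SHORT_MEMORY_LENGTH

def truncate_short_term_memory(queries_history: list, result_history: list):
    """Keep the last N queries and the N-1 results that pair with them.

    The newest query has no result yet, so results always lag queries by one.
    """
    queries = queries_history[-SHORT_MEMORY_LENGTH:]
    results = result_history[-(SHORT_MEMORY_LENGTH - 1):] if SHORT_MEMORY_LENGTH > 1 else []
    return queries, results

queries, results = truncate_short_term_memory(
    ["q1", "q2", "q3", "q4"], ["r1", "r2", "r3"]
)
# queries == ["q2", "q3", "q4"]; results == ["r2", "r3"] — q4 is still pending
```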


310-311: Good: warm up MemoryTool early.

Creating the tool concurrently hides its latency before first use.
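The warm-up pattern, reduced to a self-contained sketch. The coroutine body is a placeholder: the real `MemoryTool.create` does network I/O, which is exactly why starting it early pays off.

```python
import asyncio

async def create_memory_tool(user_id: str, graph_id: str) -> dict:
    # placeholder for MemoryTool.create(); the real version opens connections
    await asyncio.sleep(0.05)
    return {"user": user_id, "graph": graph_id}

async def handle_request(user_id: str, graph_id: str) -> dict:
    # kick off creation immediately so its latency overlaps the other steps
    tool_task = asyncio.create_task(create_memory_tool(user_id, graph_id))
    await asyncio.sleep(0.05)  # stands in for relevancy check / prompt building
    return await tool_task  # usually already resolved by the time it is needed

tool = asyncio.run(handle_request("user-1", "graph-1"))
```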

api/memory/graphiti_tool.py (1)

498-515: Cleanup query looks good.

Correct ordering (DESC) and parameters; safe fallback return.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
api/routes/graphs.py (3)

475-485: Avoid blocking the event loop on DB execution

execute_sql_query appears synchronous; calling it inside the async generator can block the FastAPI event loop. Offload to a thread.

-                    is_schema_modifying, operation_type = (
-                        loader_class.is_schema_modifying_query(sql_query)
-                    )
-
-                    query_results = loader_class.execute_sql_query(answer_an["sql_query"], db_url)
+                    is_schema_modifying, operation_type = loader_class.is_schema_modifying_query(sql_query)
+                    query_results = await asyncio.to_thread(
+                        loader_class.execute_sql_query, answer_an["sql_query"], db_url
+                    )

51-60: Fix mutable default in Pydantic model

chat: list = [] creates shared state across requests. Use Field(default_factory=list).

-from pydantic import BaseModel
+from pydantic import BaseModel, Field
@@
 class ConfirmRequest(BaseModel):
@@
-    chat: list = []
+    chat: list = Field(default_factory=list)

684-689: Stream consistency: include final_response flags in confirm route

Align with the main route and frontend expectations by flagging interim vs final messages.

-                if not loader_class:
-                    yield json.dumps({
-                        "type": "error",
-                        "message": "Unable to determine database type"
-                    }) + MESSAGE_DELIMITER
+                if not loader_class:
+                    yield json.dumps({
+                        "type": "error",
+                        "final_response": True,
+                        "message": "Unable to determine database type"
+                    }) + MESSAGE_DELIMITER
                     return
@@
-                step = {"type": "reasoning_step",
-                       "message": "Step 2: Executing confirmed SQL query"}
+                step = {"type": "reasoning_step",
+                        "final_response": False,
+                        "message": "Step 2: Executing confirmed SQL query"}
                 yield json.dumps(step) + MESSAGE_DELIMITER
@@
-                yield json.dumps(
+                yield json.dumps(
                     {
                         "type": "query_result",
                         "data": query_results,
+                        "final_response": False
                     }
                 ) + MESSAGE_DELIMITER
@@
-                    step = {"type": "reasoning_step",
-                           "message": "Step 3: Schema change detected - refreshing graph..."}
+                    step = {"type": "reasoning_step",
+                            "final_response": False,
+                            "message": "Step 3: Schema change detected - refreshing graph..."}
                     yield json.dumps(step) + MESSAGE_DELIMITER
@@
-                        yield json.dumps(
+                        yield json.dumps(
                             {
                                 "type": "schema_refresh",
+                                "final_response": False,
                                 "message": (f"✅ Schema change detected ({operation_type} "
                                           "operation)\n\n🔄 Graph schema has been automatically "
                                           "refreshed with the latest database structure."),
                                 "refresh_status": "success"
                             }
                         ) + MESSAGE_DELIMITER
@@
-                        yield json.dumps(
+                        yield json.dumps(
                             {
                                 "type": "schema_refresh",
+                                "final_response": False,
                                 "message": (f"⚠️ Schema was modified but graph refresh failed: "
                                           f"{refresh_message}"),
                                 "refresh_status": "failed"
                             }
                         ) + MESSAGE_DELIMITER
@@
-                yield json.dumps(
+                yield json.dumps(
                     {
                         "type": "ai_response",
+                        "final_response": True,
                         "message": user_readable_response,
                     }
                 ) + MESSAGE_DELIMITER
@@
-                yield json.dumps(
-                    {"type": "error", "message": "Error executing query"}
+                yield json.dumps(
+                    {"type": "error", "final_response": True, "message": "Error executing query"}
                 ) + MESSAGE_DELIMITER

Also applies to: 691-694, 700-705, 717-727, 728-735, 751-756, 789-791

♻️ Duplicate comments (3)
api/routes/graphs.py (3)

391-395: Normalize error sentinel to Optional[str] (follow prior suggestion)

Use None instead of False for “no error” to avoid type ambiguity.

-            execution_error = False
+            execution_error: Optional[str] = None

Additional import needed:

from typing import Optional

633-639: Throttle memory cleanup (weekly) or move to a scheduler

Cleanup currently runs on every request, contradicting the comment and adding overhead. Gate by time or probability, or schedule at app startup.

Example (module-level throttle):

# at module top
_CLEAN_LAST_RUN = 0
_CLEAN_LOCK = asyncio.Lock()
_CLEAN_INTERVAL_SECS = 7 * 24 * 3600
-            # Clean old memory in background (once per week cleanup)
-            clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
-            clean_memory_task.add_done_callback(
-                lambda t: logging.error(f"Memory cleanup failed: {t.exception()}") 
-                if t.exception() else logging.info("Memory cleanup completed successfully")
-            )
+            # Clean old memory occasionally (approx weekly)
+            now = time.time()
+            async with _CLEAN_LOCK:
+                if now - _CLEAN_LAST_RUN >= _CLEAN_INTERVAL_SECS:
+                    _CLEAN_LAST_RUN = now
+                    clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
+                    clean_memory_task.add_done_callback(
+                        lambda t: logging.error(f"Memory cleanup failed: {t.exception()}") 
+                        if t.exception() else logging.info("Memory cleanup completed successfully")
+                    )

297-307: Remove stray extra blank line after truncation block

Minor style cleanup to keep linters happy.

-                result_history = []
-
+                result_history = []
🧹 Nitpick comments (7)
api/routes/graphs.py (7)

310-311: Create MemoryTool only on on-topic path to avoid wasted work/leaks

The task is created before relevancy is known; on off-topic it’s never awaited/cancelled. Instantiate only when needed.

-    memory_tool_task = asyncio.create_task(MemoryTool.create(request.state.user_id, graph_id))
+    # Defer MemoryTool creation until query is confirmed on-topic
-            memory_tool = await memory_tool_task
+            memory_tool = await MemoryTool.create(request.state.user_id, graph_id)
             memory_context = await memory_tool.search_memories(
                 query=queries_history[-1]
             )

Also applies to: 381-385


559-566: Log stack trace for SQL execution failures

Prefer logging.exception to capture traceback.

-                except Exception as e:
-                    execution_error = str(e)
+                except Exception as e:
+                    execution_error = str(e)
                     overall_elapsed = time.perf_counter() - overall_start
-                    logging.error("Error executing SQL query: %s", str(e))
+                    logging.exception("Error executing SQL query")

89-93: Docstring mismatch with implementation

The function does not wrap with repr(); either update docstring or implement repr wrapping. Recommend updating docstring for accuracy.

-    Sanitize input for safe logging—remove newlines, 
-    carriage returns, tabs, and wrap in repr().
+    Sanitize input for safe logging — remove newlines, carriage returns, and tabs.

77-83: Broaden DB URL detection for MySQL variants

Support common SQLAlchemy-style prefixes.

-    elif db_url_lower.startswith('mysql://'):
+    elif db_url_lower.startswith(('mysql://', 'mysql+pymysql://', 'mysql+asyncmy://')):
         return 'mysql', MySQLLoader

364-371: Optional: generate a conversational off-topic reply

When off-topic, consider using FollowUpAgent to propose helpful rephrasing/questions instead of only echoing the reason.


411-411: Nit: fix spelling

“postgress” → “Postgres”.

-            # If the SQL query is valid, execute it using the postgress database db_url
+            # If the SQL query is valid, execute it using the Postgres database db_url

666-671: Security: confirm route executes client-sent SQL

The confirm endpoint executes sql_query from the request body. Bind this to a server-issued token or hash of the previously generated SQL to prevent arbitrary injection via the confirmation call.

  • Store the generated SQL (and a short-lived token) server-side when you emit the destructive_confirmation.
  • In confirm route, validate the token and ensure the SQL matches what was issued before executing.

Also applies to: 699-700
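One way to implement the bind-to-issued-SQL idea is an HMAC token over the generated query. A hedged sketch, not the PR's code: the signing key handling, TTL, and function names are illustrative.

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)  # per-process signing key (assumption)
TOKEN_TTL = 300  # seconds a confirmation stays valid

def issue_confirmation_token(sql_query: str) -> str:
    """Issue a short-lived token bound to the exact SQL the server generated."""
    issued_at = str(int(time.time()))
    digest = hmac.new(SECRET, f"{issued_at}:{sql_query}".encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}.{digest}"

def verify_confirmation_token(token: str, sql_query: str) -> bool:
    """Accept only if the token matches this SQL and has not expired."""
    try:
        issued_at, digest = token.split(".", 1)
        age = time.time() - int(issued_at)
    except ValueError:
        return False
    if age > TOKEN_TTL:
        return False
    expected = hmac.new(SECRET, f"{issued_at}:{sql_query}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(digest, expected)

token = issue_confirmation_token("DROP TABLE users;")
```

The server would emit `token` alongside the `destructive_confirmation` event and require it back in the confirm route before executing anything.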

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 70340ba and ef2ba17.

📒 Files selected for processing (1)
  • api/routes/graphs.py (17 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Adhere to pylint standards across all Python files (repository uses make lint)

Files:

  • api/routes/graphs.py
🧬 Code graph analysis (1)
api/routes/graphs.py (7)
api/agents/analysis_agent.py (1)
  • get_analysis (14-50)
api/agents/response_formatter_agent.py (1)
  • ResponseFormatterAgent (42-135)
api/agents/follow_up_agent.py (2)
  • FollowUpAgent (33-82)
  • generate_follow_up_question (36-82)
api/auth/user_management.py (1)
  • token_required (254-294)
api/config.py (1)
  • Config (47-130)
api/graph.py (2)
  • find (241-331)
  • get_db_description (39-54)
api/memory/graphiti_tool.py (6)
  • MemoryTool (29-588)
  • create (47-56)
  • search_memories (419-492)
  • save_query_memory (166-253)
  • add_new_memory (132-164)
  • clean_memory (494-517)
🪛 Ruff (0.12.2)
api/routes/graphs.py

493-493: SyntaxError: Expected ':', found '='


560-560: SyntaxError: Expected ',', found '='


561-561: SyntaxError: Expected ':', found name


561-561: SyntaxError: Expected ',', found '='


562-562: SyntaxError: Expected ':', found name


563-563: SyntaxError: Expected ',', found name


567-567: SyntaxError: Expected ':', found 'yield'


570-570: SyntaxError: Expected ',', found 'else'


570-570: SyntaxError: Expected ',', found ':'


571-571: SyntaxError: Expected ':', found '='


573-573: SyntaxError: Expected ':', found name


573-573: SyntaxError: Expected ',', found '='


580-580: SyntaxError: Expected ':', found 'yield'


588-588: SyntaxError: Expected ',', found name


588-588: SyntaxError: Expected ':', found '='


589-589: SyntaxError: Expected ':', found name


596-596: SyntaxError: Expected ',', found name


596-596: SyntaxError: Expected ':', found '='


599-599: SyntaxError: Expected ':', found name


599-599: SyntaxError: Expected ',', found '='


606-606: SyntaxError: Expected 'else', found ':'


607-607: SyntaxError: Expected ',', found '='


608-608: SyntaxError: Expected ':', found name


608-608: SyntaxError: Expected ',', found '='


609-609: SyntaxError: Expected ':', found 'else'


609-609: SyntaxError: Expected ',', found ':'


610-610: SyntaxError: Expected ':', found '='


614-614: SyntaxError: Expected ':', found name


614-614: SyntaxError: Expected ',', found '='


622-622: SyntaxError: Expected ':', found name


628-628: SyntaxError: Expected ',', found name


628-628: SyntaxError: Expected ':', found '='


630-630: SyntaxError: Expected ':', found name


631-631: SyntaxError: Expected ',', found name


634-634: SyntaxError: Expected ':', found name


634-634: SyntaxError: Expected ',', found '='


635-635: SyntaxError: Expected ':', found name

🪛 GitHub Actions: Pylint
api/routes/graphs.py

[error] 492-492: Pylint E0001: Parsing failed: '{' was never closed

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e-tests


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/routes/graphs.py (1)

59-59: Fix mutable default in ConfirmRequest.chat

Using a mutable default list can cause cross-request state leaks. Make it Optional and guard at use-site.

 class ConfirmRequest(BaseModel):
@@
-    chat: list = []
+    chat: Optional[List[Any]] = None

Also handle None when reading:

-    queries_history = confirm_data.chat if hasattr(confirm_data, 'chat') else []
+    queries_history = (confirm_data.chat or []) if hasattr(confirm_data, 'chat') else []

Add missing imports near the top of the file:

from typing import Any, List, Optional

Also applies to: 666-667

♻️ Duplicate comments (2)
api/routes/graphs.py (2)

391-395: Normalize error handling: keep Exception, log stacktrace, serialize lazily

Use an Exception sentinel (None when no error), log with logging.exception, and convert to str only when persisting/serializing. This aligns with Ruff TRY400 and preserves type info.

-            user_readable_response = ""
-            follow_up_result = ""
-            execution_error = False
+            user_readable_response = ""
+            follow_up_result = ""
+            execution_error: Optional[Exception] = None
@@
-                except Exception as e:
-                    execution_error = str(e)
+                except Exception as e:
+                    execution_error = e
                     overall_elapsed = time.perf_counter() - overall_start
-                    logging.error("Error executing SQL query: %s", str(e))
+                    logging.exception("Error executing SQL query")
@@
-            else:
-                execution_error = "Missing information"
+            else:
+                execution_error = ValueError("Missing information")
                 # SQL query is not valid/translatable - generate follow-up questions
                 follow_up_result = follow_up_agent.generate_follow_up_question(
                     user_question=queries_history[-1],
                     analysis_result=answer_an,
                     found_tables=result
                 )
@@
-            if execution_error:
-                full_response["error"] = execution_error
+            if execution_error:
+                full_response["error"] = str(execution_error)
                 full_response["success"] = False
             else:
                 full_response["success"] = True
@@
                 memory_tool.save_query_memory(
                     query=queries_history[-1],
                     sql_query=answer_an["sql_query"],
                     success=full_response["success"],
-                    error=execution_error
+                    error=str(execution_error) if execution_error else None
                 )

Also applies to: 558-561, 570-576, 605-610, 613-620


621-624: Harden background task callbacks and throttle cleanup

  • Convert exceptions to strings in callbacks to avoid None formatting issues.
  • Cleanup runs on every request; throttle until a scheduler is added.
 save_query_task.add_done_callback(
-    lambda t: logging.error(f"Query memory save failed: {t.exception()}") 
-    if t.exception() else logging.info("Query memory saved successfully")
+    lambda t: logging.error("Query memory save failed: %s", str(t.exception()))
+    if t.exception() else logging.info("Query memory saved successfully")
 )
@@
 save_task.add_done_callback(
-    lambda t: logging.error(f"Memory save failed: {t.exception()}") if t.exception() else logging.info("Conversation saved to memory tool")
+    lambda t: logging.error("Memory save failed: %s", str(t.exception()))
+    if t.exception() else logging.info("Conversation saved to memory tool")
 )
@@
-# Clean old memory in background (once per week cleanup)
-clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
-clean_memory_task.add_done_callback(
-    lambda t: logging.error(f"Memory cleanup failed: {t.exception()}") 
-    if t.exception() else logging.info("Memory cleanup completed successfully")
-)
+# Clean old memory in background (run occasionally, not on every request)
+import random
+if random.random() < 0.01:  # ~1% of requests
+    clean_memory_task = asyncio.create_task(memory_tool.clean_memory())
+    clean_memory_task.add_done_callback(
+        lambda t: logging.error("Memory cleanup failed: %s", str(t.exception()))
+        if t.exception() else logging.info("Memory cleanup completed successfully")
+    )

Also applies to: 627-631, 632-637

🧹 Nitpick comments (4)
api/routes/graphs.py (4)

364-371: Consider generating a follow-up suggestion even for off-topic queries

You already instantiate FollowUpAgent. Using it here can return a helpful redirect question rather than only the reason.

If you want, I can wire a minimal prompt that turns the relevancy reason into a clarifying question.
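As a rough shape for that wiring: a pure helper that turns the relevancy-check reason plus suggestions into a redirect message. The function and its inputs are hypothetical; the real flow would presumably delegate to `FollowUpAgent`.

```python
def off_topic_reply(reason: str, suggestions: list[str]) -> str:
    """Turn a relevancy-check reason into a friendly redirect message."""
    lines = [f"That question is outside this database's scope: {reason}"]
    if suggestions:
        lines.append("Here are some questions I can help with:")
        lines.extend(f"- {s}" for s in suggestions)
    return "\n".join(lines)

msg = off_topic_reply(
    "the schema has no weather data",
    ["Which customers placed the most orders?", "What were last month's totals?"],
)
```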


690-692: Make streaming payloads consistent: include final_response flags in confirm flow

Downstream consumers rely on final_response. Add False for mid-stream steps and True for terminal events.

-                step = {"type": "reasoning_step",
-                       "message": "Step 2: Executing confirmed SQL query"}
+                step = {"type": "reasoning_step",
+                        "final_response": False,
+                        "message": "Step 2: Executing confirmed SQL query"}
@@
-                yield json.dumps(
-                    {
-                        "type": "query_result",
-                        "data": query_results,
-                    }
-                ) + MESSAGE_DELIMITER
+                yield json.dumps(
+                    {
+                        "type": "query_result",
+                        "data": query_results,
+                        "final_response": False
+                    }
+                ) + MESSAGE_DELIMITER
@@
-                    step = {"type": "reasoning_step",
-                           "message": "Step 3: Schema change detected - refreshing graph..."}
+                    step = {"type": "reasoning_step",
+                            "final_response": False,
+                            "message": "Step 3: Schema change detected - refreshing graph..."}
@@
-                        yield json.dumps(
-                            {
-                                "type": "schema_refresh",
-                                "message": (f"✅ Schema change detected ({operation_type} "
-                                          "operation)\n\n🔄 Graph schema has been automatically "
-                                          "refreshed with the latest database structure."),
-                                "refresh_status": "success"
-                            }
-                        ) + MESSAGE_DELIMITER
+                        yield json.dumps(
+                            {
+                                "type": "schema_refresh",
+                                "message": (f"✅ Schema change detected ({operation_type} "
+                                            "operation)\n\n🔄 Graph schema has been automatically "
+                                            "refreshed with the latest database structure."),
+                                "refresh_status": "success",
+                                "final_response": False
+                            }
+                        ) + MESSAGE_DELIMITER
@@
-                        yield json.dumps(
-                            {
-                                "type": "schema_refresh",
-                                "message": (f"⚠️ Schema was modified but graph refresh failed: "
-                                          f"{refresh_message}"),
-                                "refresh_status": "failed"
-                            }
-                        ) + MESSAGE_DELIMITER
+                        yield json.dumps(
+                            {
+                                "type": "schema_refresh",
+                                "message": (f"⚠️ Schema was modified but graph refresh failed: "
+                                            f"{refresh_message}"),
+                                "refresh_status": "failed",
+                                "final_response": False
+                            }
+                        ) + MESSAGE_DELIMITER
@@
-                step = {"type": "reasoning_step",
-                       "message": f"Step {step_num}: Generating user-friendly response"}
+                step = {"type": "reasoning_step",
+                        "final_response": False,
+                        "message": f"Step {step_num}: Generating user-friendly response"}
@@
-                yield json.dumps(
-                    {
-                        "type": "ai_response",
-                        "message": user_readable_response,
-                    }
-                ) + MESSAGE_DELIMITER
+                yield json.dumps(
+                    {
+                        "type": "ai_response",
+                        "message": user_readable_response,
+                        "final_response": True
+                    }
+                ) + MESSAGE_DELIMITER
@@
-                yield json.dumps(
-                    {"type": "error", "message": "Error executing query"}
-                ) + MESSAGE_DELIMITER
+                yield json.dumps(
+                    {"type": "error", "message": "Error executing query", "final_response": True}
+                ) + MESSAGE_DELIMITER
@@
-            yield json.dumps(
-                {
-                    "type": "operation_cancelled",
-                    "message": "Operation cancelled. The destructive SQL query was not executed."
-                }
-            ) + MESSAGE_DELIMITER
+            yield json.dumps(
+                {
+                    "type": "operation_cancelled",
+                    "message": "Operation cancelled. The destructive SQL query was not executed.",
+                    "final_response": True
+                }
+            ) + MESSAGE_DELIMITER

Also applies to: 699-705, 708-711, 717-725, 727-734, 738-740, 750-755, 789-791, 793-798


772-772: Log exceptions with stack traces in confirm flow

Use logging.exception in except paths to aid debugging and satisfy Ruff TRY400.

-                logging.error("Error executing confirmed SQL query: %s", str(e))
+                logging.exception("Error executing confirmed SQL query")
@@
-                    lambda t: logging.error(f"Failed confirmed query memory save failed: {t.exception()}") 
+                    lambda t: logging.error("Failed confirmed query memory save failed: %s", str(t.exception()))

Also applies to: 784-784


673-675: Initialize MemoryTool lazily in confirm flow

Create the MemoryTool only when the operation is confirmed to avoid unnecessary I/O on cancelled requests.

-        # Create memory tool for saving query results
-        memory_tool = await MemoryTool.create(request.state.user_id, graph_id)
-        
+        # Lazily create memory tool only when needed
@@
-        if confirmation == "CONFIRM":
+        if confirmation == "CONFIRM":
+            memory_tool = await MemoryTool.create(request.state.user_id, graph_id)

Also applies to: 676-676

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between ef2ba17 and 050a9c2.

📒 Files selected for processing (1)
  • api/routes/graphs.py (17 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Adhere to pylint standards across all Python files (repository uses make lint)

Files:

  • api/routes/graphs.py
🧬 Code graph analysis (1)
api/routes/graphs.py (5)
api/agents/analysis_agent.py (2)
  • AnalysisAgent (9-241)
  • get_analysis (14-50)
api/agents/follow_up_agent.py (2)
  • FollowUpAgent (33-82)
  • generate_follow_up_question (36-82)
api/auth/user_management.py (1)
  • token_required (254-294)
api/config.py (1)
  • Config (47-130)
api/memory/graphiti_tool.py (6)
  • MemoryTool (29-588)
  • create (47-56)
  • search_memories (419-492)
  • save_query_memory (166-253)
  • add_new_memory (132-164)
  • clean_memory (494-517)
🪛 Ruff (0.12.2)
api/routes/graphs.py

561-561: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


771-771: Do not catch blind exception: Exception

(BLE001)


772-772: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


784-784: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e-tests
  • GitHub Check: e2e-tests
🔇 Additional comments (2)
api/routes/graphs.py (2)

297-307: LGTM: short-term history truncation is correct

The slicing keeps chat/results aligned (N vs. N-1).


324-325: LGTM: consistent streaming final_response flags in main flow

Mid-stream steps are False; terminal responses are True. This prevents premature stream termination.

Also applies to: 340-340, 366-366, 460-461, 486-487, 493-496, 511-512, 522-523, 531-532, 546-546

Comment on lines +310 to +311
memory_tool_task = asyncio.create_task(MemoryTool.create(request.state.user_id, graph_id))



🛠️ Refactor suggestion

Avoid spawning an unused MemoryTool task on off-topic queries

Creating MemoryTool up-front wastes work and can leak a background task when the query is off-topic. Instantiate it only for on-topic flows.

-    memory_tool_task = asyncio.create_task(MemoryTool.create(request.state.user_id, graph_id))
@@
-            memory_tool = await memory_tool_task
+            memory_tool = await MemoryTool.create(request.state.user_id, graph_id)
             memory_context = await memory_tool.search_memories(
                 query=queries_history[-1]
             )

Also applies to: 381-384

🤖 Prompt for AI Agents
In api/routes/graphs.py around lines 310-311 (and similarly at 381-384), the
code unconditionally spawns MemoryTool.create as a background asyncio task which
wastes work and can leak when the query is off-topic; change the flow to avoid
creating the MemoryTool task up-front and instead instantiate or create the
MemoryTool only after you determine the query is on-topic (e.g., await or call
MemoryTool.create inside the branch that handles on-topic queries), removing or
deferring asyncio.create_task usage so no background task is spawned for
off-topic requests.

@gkorland gkorland merged commit c0bbe12 into main Aug 31, 2025
9 of 11 checks passed
@gkorland gkorland deleted the memory-component branch August 31, 2025 17:04
This was referenced Sep 3, 2025