Update inference results and budget #181
Conversation
Get update from RAG-6 to RAG-7
get update from RAG-6 into RAG-7
Update wip-temp from RAG-7
Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
Fixed merge conflicts
LLM connection creation changes (#108)
Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
Disable Re ranker and Add LLM cost tracking (buerokratt#112)
Get update from buerokratt/RAG-Module into rootcodelabs/RAG-Module
merge buerokratt-wip after fixing merge conflicts
Pre demo fixes (buerokratt#147)
Langfuse wip (buerokratt#148)
- partially completes prompt refiner
- integrate prompt refiner with llm_config_module
- fixed ruff lint issues
- complete prompt refiner, chunk retriever and reranker
- remove unnecessary comments
- updated .gitignore
- Remove data_sets from tracking
- update .gitignore file
- complete vault setup and response generator
- remove ignore comment
- removed old modules
- fixed merge conflicts
- added initial setup for the vector indexer
- initial llm orchestration service update with context generation
- added new endpoints
- vector indexer with contextual retrieval
- fixed requested changes
- fixed issue
- initial diff identifier setup
- uncomment docker compose file
- added test endpoint for orchestrate service
- fixed ruff linting issue
- Rag 103 budget related schema changes (#41)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
- Rag 93 update connection status (#47)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
  - Implement LLM connection status update functionality with API integration and UI enhancements
- Rag 99 production llm connections logic (#46)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
  - Add production connection retrieval and update related components
  - Implement LLM connection environment update and enhance connection management logic
- Rag 119 endpoint to update used budget (#42)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - Add functionality to update used budget for LLM connections with validation and response handling
  - Implement budget threshold checks and connection deactivation logic in update process
  - resolve pr comments
- Rag 113 warning and termination banners (#43)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - Add budget status check and update BudgetBanner component
  - rename commonUtils
  - resolve pr comments
- rag-105-reset-used-budget-cron-job (#44)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - Add cron job to reset used budget
  - rename commonUtils
  - resolve pr comments
  - Remove trailing slash from vault/agent-out in .gitignore
- Rag 101 budget check functionality (#45)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
  - budget check functionality
- gui running on 3003 issue fixed
- gui running on 3003 issue fixed (#50)
- added get-configuration.sqpl and updated llmconnections.ts
- Add SQL query to retrieve configuration values
- Hashicorp key saving (#51)
  - gui running on 3003 issue fixed
  - Add SQL query to retrieve configuration values
- Remove REACT_APP_NOTIFICATION_NODE_URL variable (removed the REACT_APP_NOTIFICATION_NODE_URL environment variable)
- added initial diff identifier functionality
- test phase1
- Refactor inference and connection handling in YAML and TypeScript files
- fixes (#52)
  - gui running on 3003 issue fixed
  - Add SQL query to retrieve configuration values
  - Refactor inference and connection handling in YAML and TypeScript files
- Add entry point script for Vector Indexer with command line interface
- fix (#53)
  - gui running on 3003 issue fixed
  - Add SQL query to retrieve configuration values
  - Refactor inference and connection handling in YAML and TypeScript files
  - Add entry point script for Vector Indexer with command line interface
- diff fixes
- uncomment llm orchestration service in docker compose file
- complete vector indexer
- Add YAML configurations and scripts for managing vault secrets
- Add vault secret management functions and endpoints for LLM connections
- Add Test Production LLM page with messaging functionality and styles
- fixed issue
- fixed merge conflicts
- fixed issue
- fixed issue
- updated with requested changes
- fixed test ui endpoint request responses schema issue
- fixed dvc path issue
- added dspy optimization
- filters fixed
- refactor: restructure llm_connections table for improved configuration and tracking
- feat: enhance LLM connection handling with AWS and Azure embedding credentials
- fixed issues
- refactor: remove redundant Azure and AWS credential assignments in vault secret functions
- fixed issue
- initial vault setup script
- complete vault authentication handling
- review requested change fix
- fixed issues according to the pr review
- fixed issues in docker compose file relevant to pr review

Co-authored-by: Charith Nuwan Bimsara <59943919+nuwangeek@users.noreply.github.com>
Co-authored-by: erangi-ar <erangika.ariyasena@rootcode.io>
Vault Authentication token handling (buerokratt#154)
Added English keys and Estonian translations (buerokratt#156)
Bug fixes (buerokratt#157)
Sync rootcodelabs/RAG-Module wip with buerokratt/RAG-Module wip
Sync wip branches
Chunk retrieval quality enhancement (buerokratt#172)
Rag 149- Show chunk context in Test LLM Connection Page (buerokratt#173)
QA bug fixes (buerokratt#174)
Sync wip branches
Get update from wip
…RAG-Module into RAG-test-bug-fixes Get update from remote branch
Pull request overview
This PR adds comprehensive production inference data storage and budget tracking capabilities to the LLM orchestration service. The changes enable storing detailed inference analytics (questions, chunks, embeddings, answers) for production and testing environments, while also implementing automatic budget tracking for LLM connections based on actual usage costs.
Key Changes:
- Added three new utility modules for production inference storage, connection ID fetching, and budget tracking
- Integrated budget updates throughout the orchestration service at all exit points (success, error, streaming, non-streaming)
- Added document reference extraction and delivery in both streaming (as SSE events) and non-streaming (appended to content) modes
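For illustration, here is a minimal sketch of what these two delivery modes could look like. The DocumentReference field names, the SSE event name, and the "Sources:" footer format are assumptions made for this sketch, not the PR's actual implementation:

```python
# Sketch only: delivering document references as an SSE event (streaming)
# or appended to the answer text (non-streaming). Field names, the
# "references" event name, and the "Sources:" footer are assumed.
import json
from pydantic import BaseModel


class DocumentReference(BaseModel):
    title: str
    url: str


def sse_references_event(refs: list[DocumentReference]) -> str:
    """Streaming mode: emit references as a dedicated SSE event."""
    payload = json.dumps([r.model_dump() for r in refs])
    return f"event: references\ndata: {payload}\n\n"


def append_references(content: str, refs: list[DocumentReference]) -> str:
    """Non-streaming mode: append references to the generated answer."""
    if not refs:
        return content
    lines = "\n".join(f"- {r.title}: {r.url}" for r in refs)
    return f"{content}\n\nSources:\n{lines}"
```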
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/utils/production_store.py | New utility for storing production inference data to Resql with sync/async support |
| src/utils/connection_id_fetcher.py | New utility for fetching and caching LLM connection IDs by environment |
| src/utils/budget_tracker.py | New utility for tracking and updating LLM connection budget usage |
| src/models/request_models.py | Added DocumentReference model for structured document URL references |
| src/llm_orchestrator_config/llm_ochestrator_constants.py | Added service endpoint constants and fixed technical issue message formatting |
| src/llm_orchestration_service.py | Integrated budget tracking, production storage, and document references throughout orchestration flows |
| Dockerfile.llm_orchestration_service | Minor whitespace formatting fix |
| DSL/Resql/rag-search/POST/store-production-inference-result.sql | Updated SQL to include llm_connection_id field |
| DSL/Resql/rag-search/POST/get-testing-connection.sql | New SQL query for fetching testing environment connection details |
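As a rough sketch of how a budget tracking utility might post usage back through the Resql layer, consider the snippet below. The base URL, endpoint name, and payload keys are assumptions for illustration, not the project's actual API:

```python
# Illustrative only: a best-effort budget update posted to a Resql query
# endpoint. The URL, route name, and payload keys are assumed for this sketch.
import requests

RESQL_BASE_URL = "http://resql:8082/rag-search"  # assumed service address


def update_used_budget(connection_id: int, cost: float, timeout: float = 5.0) -> bool:
    """Add `cost` to the used budget of one LLM connection; never raise."""
    try:
        response = requests.post(
            f"{RESQL_BASE_URL}/update-used-budget",
            json={"connection_id": connection_id, "cost": cost},
            timeout=timeout,
        )
        response.raise_for_status()
        return True
    except requests.RequestException:
        # Budget tracking is best-effort: a failure here should not
        # fail the user's inference request.
        return False
```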
Comments suppressed due to low confidence (1)
src/llm_orchestration_service.py:1068
- This import of module asyncio is redundant, as it was previously imported on line 7.
import asyncio
src/llm_orchestration_service.py
Outdated
log_step_timings(timing_dict, request.chatId)

# Update budget even on client disconnect
self._update_connection_budget(request.connection_id, costs_dict)
Copilot AI (Dec 1, 2025)
Missing environment parameter in _update_connection_budget call. The method signature requires three parameters (connection_id, costs_dict, environment), but only two are provided. This will cause the method to use the default value "development" instead of the actual request environment, potentially updating the wrong budget.
Suggested change:
- self._update_connection_budget(request.connection_id, costs_dict)
+ self._update_connection_budget(request.connection_id, costs_dict, request.environment)
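To make the risk concrete, here is a hypothetical signature matching the reviewer's description; the "development" default is what lets a missing third argument go unnoticed at the call site:

```python
# Hypothetical sketch based on the review comment above, not the actual code.
class OrchestrationServiceSketch:
    def _update_connection_budget(
        self,
        connection_id: int,
        costs_dict: dict,
        environment: str = "development",
    ) -> None:
        """Apply accumulated costs to the budget of the given connection."""
        ...

# Correct call shape (environment passed explicitly):
#     self._update_connection_budget(request.connection_id, costs_dict, request.environment)
```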
self.timeout = 5  # seconds

# Cache connection IDs to avoid repeated requests
self._connection_cache: Dict[str, int] = {}
Copilot AI (Dec 1, 2025)
The _connection_cache dictionary is not thread-safe. In a multi-threaded environment (like a web server), concurrent access to this cache could lead to race conditions. Consider using threading.Lock, or a structure such as collections.defaultdict combined with proper locking.
This is already fixed.
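For reference, a minimal sketch of the locking the comment suggests (already addressed, per the reply above). The class and method names follow the review thread, while the Resql lookup itself is stubbed out as an assumption:

```python
# Sketch only: a lock-guarded connection-ID cache. fetch_connection_id_sync
# mirrors the method name used in the review; _query_connection_id is a stub.
import threading
from typing import Dict, Optional


class ConnectionIdFetcher:
    def __init__(self) -> None:
        self.timeout = 5  # seconds
        self._connection_cache: Dict[str, int] = {}
        self._cache_lock = threading.Lock()  # guards concurrent access

    def fetch_connection_id_sync(self, environment: str) -> Optional[int]:
        with self._cache_lock:
            cached = self._connection_cache.get(environment)
        if cached is not None:
            return cached
        connection_id = self._query_connection_id(environment)
        if connection_id is not None:
            with self._cache_lock:
                self._connection_cache[environment] = connection_id
        return connection_id

    def _query_connection_id(self, environment: str) -> Optional[int]:
        """Stubbed lookup; the real call would be an HTTP request to Resql."""
        ...
```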
Production Inference Data Storage Utility

This module provides functionality to store production inference results
to the Ruuter endpoint for analytics and monitoring purposes.
Copilot AI (Dec 1, 2025)
Inconsistent documentation: The module docstring says "to the Ruuter endpoint" but the actual implementation uses Resql directly (line 26: RAG_SEARCH_RESQL). The class docstring (line 20) correctly says "via Resql endpoint". Consider updating this to match the implementation and say "to the Resql endpoint" for consistency.
Suggested change:
- to the Ruuter endpoint for analytics and monitoring purposes.
+ to the Resql endpoint for analytics and monitoring purposes.
This is also fixed.
src/llm_orchestration_service.py
Outdated
accumulated_response: str,
) -> None:
"""
Async version: Store production inference data to Ruuter endpoint for analytics.
Copilot AI (Dec 1, 2025)
The docstring says "Store production inference data to Ruuter endpoint" but the implementation uses Resql directly. Consider updating to say "to Resql endpoint" for accuracy.
Suggested change:
- Async version: Store production inference data to Ruuter endpoint for analytics.
+ Async version: Store production inference data to Resql endpoint for analytics.
src/llm_orchestration_service.py
Outdated
)
try:
    # Use synchronous fetch to avoid event loop issues
    production_id = budget_tracker.fetch_production_connection_id_sync()
Copilot AI (Dec 1, 2025)
The method fetch_production_connection_id_sync() is called on budget_tracker, but this method doesn't exist in the BudgetTracker class. The BudgetTracker uses connection_fetcher.fetch_connection_id_sync(). This should be budget_tracker.connection_fetcher.fetch_connection_id_sync("production") or use the connection_fetcher directly via get_connection_id_fetcher().
Suggested change:
- production_id = budget_tracker.fetch_production_connection_id_sync()
+ production_id = budget_tracker.connection_fetcher.fetch_connection_id_sync("production")
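A small sketch of the module-level accessor pattern the comment mentions (get_connection_id_fetcher); reusing one shared fetcher keeps the connection-ID cache warm across requests. The wiring below is an assumption for illustration, and ConnectionIdFetcher stands in for the utility in src/utils/connection_id_fetcher.py:

```python
# Sketch only: lazily create one shared fetcher instance.
import threading
from typing import Optional


class ConnectionIdFetcher:
    def fetch_connection_id_sync(self, environment: str) -> Optional[int]:
        ...  # real implementation queries Resql and caches the result


_fetcher: Optional[ConnectionIdFetcher] = None
_fetcher_lock = threading.Lock()


def get_connection_id_fetcher() -> ConnectionIdFetcher:
    """Return the shared fetcher, creating it on first use."""
    global _fetcher
    with _fetcher_lock:
        if _fetcher is None:
            _fetcher = ConnectionIdFetcher()
        return _fetcher

# The flagged call then becomes:
#     production_id = get_connection_id_fetcher().fetch_connection_id_sync("production")
```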
src/llm_orchestration_service.py
Outdated
| """LLM Orchestration Service - Business logic for LLM orchestration.""" | ||
|
|
||
| from typing import Optional, List, Dict, Union, Any, AsyncIterator | ||
| import json |
Copilot AI (Dec 1, 2025)
The json module is imported twice - once on line 4 and again as json_module on line 12. The first import appears to be unused and should be removed to avoid confusion.
Suggested change:
- import json
src/llm_orchestration_service.py
Outdated
question=refined_output.original_question,
chunks=relevant_chunks,
- max_blocks=ResponseGenerationConstants.DEFAULT_MAX_BLOCKS,
+ max_blocks=10,
Copilot AI (Dec 1, 2025)
[nitpick] The value 10 is hardcoded in multiple places (lines 362, 401, 2077) instead of using a constant. Consider defining a constant like MAX_CONTEXT_BLOCKS = 10 to improve maintainability and make it easier to adjust this value in the future. Previously this used ResponseGenerationConstants.DEFAULT_MAX_BLOCKS which was 5, so this is also a functional change that increases the context window from 5 to 10 blocks.
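A possible shape for the constant this nitpick asks for, assuming the existing ResponseGenerationConstants class is its natural home; the name MAX_CONTEXT_BLOCKS comes from the comment itself and is not in the codebase:

```python
# Sketch of centralising the limit instead of hardcoding 10 at each call site.
class ResponseGenerationConstants:
    DEFAULT_MAX_BLOCKS = 5    # previous default mentioned in the review
    MAX_CONTEXT_BLOCKS = 10   # new limit, defined once and reused

# call sites:
#     max_blocks=ResponseGenerationConstants.MAX_CONTEXT_BLOCKS,
```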
src/llm_orchestration_service.py
Outdated
final_response: OrchestrationResponse,
) -> None:
"""
Store production inference data to Ruuter endpoint for analytics.
Copilot AI (Dec 1, 2025)
The docstring says "Store production inference data to Ruuter endpoint" but the implementation uses Resql directly (via get_production_store() which uses RAG_SEARCH_RESQL). Consider updating to say "to Resql endpoint" for accuracy.
Suggested change:
- Store production inference data to Ruuter endpoint for analytics.
+ Store production inference data to Resql endpoint for analytics.
import json
import os
import time
import asyncio
Copilot AI (Dec 1, 2025)
Import of 'asyncio' is not used.
Suggested change:
- import asyncio
This is also fixed
Update inference results and budget (buerokratt#181)
No description provided.