Conversation

@nuwangeek
Collaborator

No description provided.

nuwangeek and others added 30 commits September 10, 2025 21:21
Get update from RAG-6 to RAG-7
get update from RAG-6 into RAG-7
Update wip-temp from RAG-7
Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
LLM connection creation changes (#108)
Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
Disable re-ranker and add LLM cost tracking (buerokratt#112)
Get update from buerokratt/RAG-Module into rootcodelabs/RAG-Module
merge buerokratt-wip after fixing merge conflicts
* partially completes prompt refiner

* integrate prompt refiner with llm_config_module

* fixed ruff lint issues

* complete prompt refiner, chunk retriever and reranker

* remove unnecessary comments

* updated .gitignore

* Remove data_sets from tracking

* update .gitignore file

* complete vault setup and response generator

* remove ignore comment

* removed old modules

* fixed merge conflicts

* added initial setup for the vector indexer

* initial llm orchestration service update with context generation

* added new endpoints

* vector indexer with contextual retrieval

* fixed requested changes

* fixed issue

* initial diff identifier setup

* uncomment docker compose file

* added test endpoint for orchestrate service

* fixed ruff linting issue

* Rag 103 budget related schema changes (#41)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* rename commonUtils

---------



* Rag 93 update connection status (#47)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* rename commonUtils

* Implement LLM connection status update functionality with API integration and UI enhancements

---------



* Rag 99 production llm connections logic (#46)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* rename commonUtils

* Add production connection retrieval and update related components

* Implement LLM connection environment update and enhance connection management logic

---------



* Rag 119 endpoint to update used budget (#42)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* Add functionality to update used budget for LLM connections with validation and response handling

* Implement budget threshold checks and connection deactivation logic in update process

* resolve pr comments

---------



* Rag 113 warning and termination banners (#43)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* Add budget status check and update BudgetBanner component

* rename commonUtils

* resolve pr comments

---------



* rag-105-reset-used-budget-cron-job (#44)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* Add cron job to reset used budget

* rename commonUtils

* resolve pr comments

* Remove trailing slash from vault/agent-out in .gitignore

---------



* Rag 101 budget check functionality (#45)

* Refactor llm_connections table: update budget tracking fields and reorder columns

* Add budget threshold fields and logic to LLM connection management

* Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections

* resolve pr comments & refactoring

* rename commonUtils

* budget check functionality

---------



* gui running on 3003 issue fixed

* gui running on 3003 issue fixed (#50)



* added get-configuration.sqpl and updated llmconnections.ts

* Add SQL query to retrieve configuration values

* Hashicorp key saving (#51)

* gui running on 3003 issue fixed

* Add SQL query to retrieve configuration values

---------



* Remove REACT_APP_NOTIFICATION_NODE_URL variable

Removed REACT_APP_NOTIFICATION_NODE_URL environment variable.

* added initial diff identifier functionality

* test phase1

* Refactor inference and connection handling in YAML and TypeScript files

* fixes (#52)

* gui running on 3003 issue fixed

* Add SQL query to retrieve configuration values

* Refactor inference and connection handling in YAML and TypeScript files

---------



* Add entry point script for Vector Indexer with command line interface

* fix (#53)

* gui running on 3003 issue fixed

* Add SQL query to retrieve configuration values

* Refactor inference and connection handling in YAML and TypeScript files

* Add entry point script for Vector Indexer with command line interface

---------



* diff fixes

* uncomment llm orchestration service in docker compose file

* complete vector indexer

* Add YAML configurations and scripts for managing vault secrets

* Add vault secret management functions and endpoints for LLM connections

* Add Test Production LLM page with messaging functionality and styles

* fixed issue

* fixed merge conflicts

* fixed issue

* fixed issue

* updated with requested changes

* fixed test ui endpoint request responses schema issue

* fixed dvc path issue

* added dspy optimization

* filters fixed

* refactor: restructure llm_connections table for improved configuration and tracking

* feat: enhance LLM connection handling with AWS and Azure embedding credentials

* fixed issues

* refactor: remove redundant Azure and AWS credential assignments in vault secret functions

* fixed issue

* initial vault setup script

* complete vault authentication handling

* review requested change fix

* fixed issues according to the pr review

* fixed issues in docker compose file relevant to pr review

---------

Co-authored-by: Charith Nuwan Bimsara <59943919+nuwangeek@users.noreply.github.com>
Co-authored-by: erangi-ar <erangika.ariyasena@rootcode.io>
Vault Authentication token handling (buerokratt#154)

Copilot AI left a comment


Pull request overview

This PR adds comprehensive production inference data storage and budget tracking capabilities to the LLM orchestration service. The changes enable storing detailed inference analytics (questions, chunks, embeddings, answers) for production and testing environments, while also implementing automatic budget tracking for LLM connections based on actual usage costs.

Key Changes:

  • Added three new utility modules for production inference storage, connection ID fetching, and budget tracking
  • Integrated budget updates throughout the orchestration service at all exit points (success, error, streaming, non-streaming)
  • Added document reference extraction and delivery in both streaming (as SSE events) and non-streaming (appended to content) modes
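The "all exit points" integration described above can be sketched as a try/finally around the pipeline, which runs the budget update whether the request succeeds, raises, or is cancelled by a client disconnect. All names below (budget_updates, update_connection_budget, orchestrate) are hypothetical stand-ins, not the service's actual helpers.

```python
# Hedged sketch, not the repo's actual code: a try/finally guarantees
# the budget update at every exit point of the orchestration flow.
budget_updates = []  # stand-in for the Resql "update used budget" call


def update_connection_budget(connection_id, costs, environment):
    # Record which connection/environment would be charged, and how much.
    budget_updates.append((connection_id, round(costs["total"], 6), environment))


def orchestrate(request):
    costs = {"total": 0.0}
    try:
        costs["total"] += 0.002  # the pipeline would accumulate real LLM costs here
        return "answer"
    finally:
        # Runs whether the try block returns, raises, or is cancelled.
        update_connection_budget(request["connection_id"], costs, request["environment"])
```

This keeps the update in one place instead of duplicating it on every success, error, and streaming branch.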

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file:

  • src/utils/production_store.py: New utility for storing production inference data to Resql with sync/async support
  • src/utils/connection_id_fetcher.py: New utility for fetching and caching LLM connection IDs by environment
  • src/utils/budget_tracker.py: New utility for tracking and updating LLM connection budget usage
  • src/models/request_models.py: Added DocumentReference model for structured document URL references
  • src/llm_orchestrator_config/llm_ochestrator_constants.py: Added service endpoint constants and fixed technical issue message formatting
  • src/llm_orchestration_service.py: Integrated budget tracking, production storage, and document references throughout orchestration flows
  • Dockerfile.llm_orchestration_service: Minor whitespace formatting fix
  • DSL/Resql/rag-search/POST/store-production-inference-result.sql: Updated SQL to include llm_connection_id field
  • DSL/Resql/rag-search/POST/get-testing-connection.sql: New SQL query for fetching testing environment connection details
Comments suppressed due to low confidence (1)

src/llm_orchestration_service.py:1068

  • This import of module asyncio is redundant, as it was previously imported on line 7.
        import asyncio


log_step_timings(timing_dict, request.chatId)

# Update budget even on client disconnect
self._update_connection_budget(request.connection_id, costs_dict)
Copilot AI Dec 1, 2025


Missing environment parameter in _update_connection_budget call. The method signature requires three parameters (connection_id, costs_dict, environment), but only two are provided. This will cause the method to use the default value "development" instead of the actual request environment, potentially updating the wrong budget.

Suggested change
self._update_connection_budget(request.connection_id, costs_dict)
self._update_connection_budget(request.connection_id, costs_dict, request.environment)

self.timeout = 5 # seconds

# Cache connection IDs to avoid repeated requests
self._connection_cache: Dict[str, int] = {}
Copilot AI Dec 1, 2025


The _connection_cache dictionary is not thread-safe. In a multi-threaded environment (like a web server), concurrent access to this cache could lead to race conditions. Consider using threading.Lock or a thread-safe data structure like collections.defaultdict with proper locking.
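One way to apply the suggested fix, sketched with illustrative names (the actual cache lives inside src/utils/connection_id_fetcher.py and may be shaped differently): every read and write of the dictionary goes through a single threading.Lock.

```python
import threading
from typing import Dict, Optional


class ConnectionIdCache:
    """Hedged sketch of a lock-guarded connection-ID cache."""

    def __init__(self) -> None:
        self._cache: Dict[str, int] = {}
        self._lock = threading.Lock()

    def get(self, environment: str) -> Optional[int]:
        with self._lock:  # serialize concurrent readers and writers
            return self._cache.get(environment)

    def put(self, environment: str, connection_id: int) -> None:
        with self._lock:
            self._cache[environment] = connection_id
```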

Collaborator Author


this is already fixed

Production Inference Data Storage Utility
This module provides functionality to store production inference results
to the Ruuter endpoint for analytics and monitoring purposes.
Copilot AI Dec 1, 2025


Inconsistent documentation: The module docstring says "to the Ruuter endpoint" but the actual implementation uses Resql directly (line 26: RAG_SEARCH_RESQL). The class docstring (line 20) correctly says "via Resql endpoint". Consider updating this to match the implementation and say "to the Resql endpoint" for consistency.

Suggested change
to the Ruuter endpoint for analytics and monitoring purposes.
to the Resql endpoint for analytics and monitoring purposes.

Collaborator Author


this is also fixed

accumulated_response: str,
) -> None:
"""
Async version: Store production inference data to Ruuter endpoint for analytics.
Copilot AI Dec 1, 2025


The docstring says "Store production inference data to Ruuter endpoint" but the implementation uses Resql directly. Consider updating to say "to Resql endpoint" for accuracy.

Suggested change
Async version: Store production inference data to Ruuter endpoint for analytics.
Async version: Store production inference data to Resql endpoint for analytics.

)
try:
# Use synchronous fetch to avoid event loop issues
production_id = budget_tracker.fetch_production_connection_id_sync()
Copilot AI Dec 1, 2025


The method fetch_production_connection_id_sync() is called on budget_tracker, but this method doesn't exist in the BudgetTracker class. The BudgetTracker uses connection_fetcher.fetch_connection_id_sync(). This should be budget_tracker.connection_fetcher.fetch_connection_id_sync("production") or use the connection_fetcher directly via get_connection_id_fetcher().

Suggested change
production_id = budget_tracker.fetch_production_connection_id_sync()
production_id = budget_tracker.connection_fetcher.fetch_connection_id_sync("production")
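The delegation this comment describes can be sketched as follows. Class and attribute names follow the comment itself; the method bodies are illustrative placeholders, not the repo's real Resql-backed logic.

```python
class ConnectionIdFetcher:
    """Illustrative stand-in for the fetcher in connection_id_fetcher.py."""

    def fetch_connection_id_sync(self, environment: str) -> int:
        # Real code would query Resql; return fixed IDs for illustration.
        return {"production": 1, "testing": 2}.get(environment, 0)


class BudgetTracker:
    """BudgetTracker owns a fetcher rather than exposing its own fetch method."""

    def __init__(self) -> None:
        self.connection_fetcher = ConnectionIdFetcher()


# Callers go through the fetcher, as the review comment suggests:
production_id = BudgetTracker().connection_fetcher.fetch_connection_id_sync("production")
```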

"""LLM Orchestration Service - Business logic for LLM orchestration."""

from typing import Optional, List, Dict, Union, Any, AsyncIterator
import json
Copilot AI Dec 1, 2025


The json module is imported twice - once on line 4 and again as json_module on line 12. The first import appears to be unused and should be removed to avoid confusion.

Suggested change
import json

question=refined_output.original_question,
chunks=relevant_chunks,
max_blocks=ResponseGenerationConstants.DEFAULT_MAX_BLOCKS,
max_blocks=10,
Copilot AI Dec 1, 2025


[nitpick] The value 10 is hardcoded in multiple places (lines 362, 401, 2077) instead of using a constant. Consider defining a constant like MAX_CONTEXT_BLOCKS = 10 to improve maintainability and make it easier to adjust this value in the future. Previously this used ResponseGenerationConstants.DEFAULT_MAX_BLOCKS which was 5, so this is also a functional change that increases the context window from 5 to 10 blocks.
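A sketch of the suggested change; MAX_CONTEXT_BLOCKS comes from the comment above, while build_context is an illustrative name for wherever the limit is applied.

```python
# Single source of truth for the context-block limit, per the review nitpick.
# Raised from the previous ResponseGenerationConstants.DEFAULT_MAX_BLOCKS of 5.
MAX_CONTEXT_BLOCKS = 10


def build_context(chunks, max_blocks=MAX_CONTEXT_BLOCKS):
    """Keep at most max_blocks chunks for the prompt context."""
    return chunks[:max_blocks]
```

Adjusting the limit later then means editing one constant instead of hunting down three hardcoded literals.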

final_response: OrchestrationResponse,
) -> None:
"""
Store production inference data to Ruuter endpoint for analytics.
Copilot AI Dec 1, 2025


The docstring says "Store production inference data to Ruuter endpoint" but the implementation uses Resql directly (via get_production_store() which uses RAG_SEARCH_RESQL). Consider updating to say "to Resql endpoint" for accuracy.

Suggested change
Store production inference data to Ruuter endpoint for analytics.
Store production inference data to Resql endpoint for analytics.

import json
import os
import time
import asyncio
Copilot AI Dec 1, 2025


Import of 'asyncio' is not used.

Suggested change
import asyncio

Collaborator Author


This is also fixed

@Thirunayan22 merged commit 58b23a1 into buerokratt:wip Dec 1, 2025
5 of 9 checks passed
nuwangeek added a commit to rootcodelabs/RAG-Module that referenced this pull request Dec 1, 2025
Update inference results and budget (buerokratt#181)
3 participants