Update inference results and budget #181
Conversation
Get update from RAG-6 to RAG-7
get update from RAG-6 into RAG-7
Update wip-temp from RAG-7
Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
Fixed merge conflicts
LLM connection creation changes (#108)
Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
Disable Re ranker and Add LLM cost tracking (buerokratt#112)
Get update from buerokratt/RAG-Module into rootcodelabs/RAG-Module
merge buerokratt-wip after fixing merge conflicts
Pre demo fixes (buerokratt#147)
Langfuse wip (buerokratt#148)
- partially completes prompt refiner
- integrate prompt refiner with llm_config_module
- fixed ruff lint issues
- complete prompt refiner, chunk retriever and reranker
- remove unnecessary comments
- updated .gitignore
- Remove data_sets from tracking
- update .gitignore file
- complete vault setup and response generator
- remove ignore comment
- removed old modules
- fixed merge conflicts
- added initial setup for the vector indexer
- initial llm orchestration service update with context generation
- added new endpoints
- vector indexer with contextual retrieval
- fixed requested changes
- fixed issue
- initial diff identifier setup
- uncomment docker compose file
- added test endpoint for orchestrate service
- fixed ruff linting issue
- Rag 103 budget related schema changes (#41)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
- Rag 93 update connection status (#47)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
  - Implement LLM connection status update functionality with API integration and UI enhancements
- Rag 99 production llm connections logic (#46)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
  - Add production connection retrieval and update related components
  - Implement LLM connection environment update and enhance connection management logic
- Rag 119 endpoint to update used budget (#42)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - Add functionality to update used budget for LLM connections with validation and response handling
  - Implement budget threshold checks and connection deactivation logic in update process
  - resolve pr comments
- Rag 113 warning and termination banners (#43)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - Add budget status check and update BudgetBanner component
  - rename commonUtils
  - resolve pr comments
- rag-105-reset-used-budget-cron-job (#44)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - Add cron job to reset used budget
  - rename commonUtils
  - resolve pr comments
  - Remove trailing slash from vault/agent-out in .gitignore
- Rag 101 budget check functionality (#45)
  - Refactor llm_connections table: update budget tracking fields and reorder columns
  - Add budget threshold fields and logic to LLM connection management
  - Enhance budget management: update budget status logic, adjust thresholds, and improve form handling for LLM connections
  - resolve pr comments & refactoring
  - rename commonUtils
  - budget check functionality
- gui running on 3003 issue fixed
- gui running on 3003 issue fixed (#50)
- added get-configuration.sqpl and updated llmconnections.ts
- Add SQL query to retrieve configuration values
- Hashicorp key saving (#51)
  - gui running on 3003 issue fixed
  - Add SQL query to retrieve configuration values
- Remove REACT_APP_NOTIFICATION_NODE_URL variable (removed the REACT_APP_NOTIFICATION_NODE_URL environment variable)
- added initial diff identifier functionality
- test phase1
- Refactor inference and connection handling in YAML and TypeScript files
- fixes (#52)
  - gui running on 3003 issue fixed
  - Add SQL query to retrieve configuration values
  - Refactor inference and connection handling in YAML and TypeScript files
- Add entry point script for Vector Indexer with command line interface
- fix (#53)
  - gui running on 3003 issue fixed
  - Add SQL query to retrieve configuration values
  - Refactor inference and connection handling in YAML and TypeScript files
  - Add entry point script for Vector Indexer with command line interface
- diff fixes
- uncomment llm orchestration service in docker compose file
- complete vector indexer
- Add YAML configurations and scripts for managing vault secrets
- Add vault secret management functions and endpoints for LLM connections
- Add Test Production LLM page with messaging functionality and styles
- fixed issue
- fixed merge conflicts
- fixed issue
- fixed issue
- updated with requested changes
- fixed test ui endpoint request responses schema issue
- fixed dvc path issue
- added dspy optimization
- filters fixed
- refactor: restructure llm_connections table for improved configuration and tracking
- feat: enhance LLM connection handling with AWS and Azure embedding credentials
- fixed issues
- refactor: remove redundant Azure and AWS credential assignments in vault secret functions
- fixed issue
- initial vault setup script
- complete vault authentication handling
- review requested change fix
- fixed issues according to the pr review
- fixed issues in docker compose file relevant to pr review

Co-authored-by: Charith Nuwan Bimsara <59943919+nuwangeek@users.noreply.github.com>
Co-authored-by: erangi-ar <erangika.ariyasena@rootcode.io>
Vault Authentication token handling (buerokratt#154)
Added English keys and Estonian translations (buerokratt#156)
Bug fixes (buerokratt#157)
Sync rootcodelabs/RAG-Module wip with buerokratt/RAG-Module wip
Sync wip branches
Chunk retrieval quality enhancement (buerokratt#172)
Rag 149- Show chunk context in Test LLM Connection Page (buerokratt#173)
QA bug fixes (buerokratt#174)
Sync wip branches
Get update from wip
…RAG-Module into RAG-test-bug-fixes Get update from remote branch
Pull request overview
This PR adds comprehensive production inference data storage and budget tracking capabilities to the LLM orchestration service. The changes enable storing detailed inference analytics (questions, chunks, embeddings, answers) for production and testing environments, while also implementing automatic budget tracking for LLM connections based on actual usage costs.
Key Changes:
- Added three new utility modules for production inference storage, connection ID fetching, and budget tracking
- Integrated budget updates throughout the orchestration service at all exit points (success, error, streaming, non-streaming)
- Added document reference extraction and delivery in both streaming (as SSE events) and non-streaming (appended to content) modes
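For illustration, here is a minimal sketch of what these two delivery modes could look like. The DocumentReference field names, the SSE event name, and the "Sources:" footer format are assumptions made for this sketch, not the PR's actual implementation:

```python
# Sketch only: delivering document references as an SSE event (streaming)
# or appended to the answer text (non-streaming). Field names, the
# "references" event name, and the "Sources:" footer are assumed.
import json
from pydantic import BaseModel


class DocumentReference(BaseModel):
    title: str
    url: str


def sse_references_event(refs: list[DocumentReference]) -> str:
    """Streaming mode: emit references as a dedicated SSE event."""
    payload = json.dumps([r.model_dump() for r in refs])
    return f"event: references\ndata: {payload}\n\n"


def append_references(content: str, refs: list[DocumentReference]) -> str:
    """Non-streaming mode: append references to the generated answer."""
    if not refs:
        return content
    lines = "\n".join(f"- {r.title}: {r.url}" for r in refs)
    return f"{content}\n\nSources:\n{lines}"
```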
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/utils/production_store.py | New utility for storing production inference data to Resql with sync/async support |
| src/utils/connection_id_fetcher.py | New utility for fetching and caching LLM connection IDs by environment |
| src/utils/budget_tracker.py | New utility for tracking and updating LLM connection budget usage |
| src/models/request_models.py | Added DocumentReference model for structured document URL references |
| src/llm_orchestrator_config/llm_ochestrator_constants.py | Added service endpoint constants and fixed technical issue message formatting |
| src/llm_orchestration_service.py | Integrated budget tracking, production storage, and document references throughout orchestration flows |
| Dockerfile.llm_orchestration_service | Minor whitespace formatting fix |
| DSL/Resql/rag-search/POST/store-production-inference-result.sql | Updated SQL to include llm_connection_id field |
| DSL/Resql/rag-search/POST/get-testing-connection.sql | New SQL query for fetching testing environment connection details |
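As a rough sketch of how a budget tracking utility might post usage back through the Resql layer, consider the snippet below. The base URL, endpoint name, and payload keys are assumptions for illustration, not the project's actual API:

```python
# Illustrative only: a best-effort budget update posted to a Resql query
# endpoint. The URL, route name, and payload keys are assumed for this sketch.
import requests

RESQL_BASE_URL = "http://resql:8082/rag-search"  # assumed service address


def update_used_budget(connection_id: int, cost: float, timeout: float = 5.0) -> bool:
    """Add `cost` to the used budget of one LLM connection; never raise."""
    try:
        response = requests.post(
            f"{RESQL_BASE_URL}/update-used-budget",
            json={"connection_id": connection_id, "cost": cost},
            timeout=timeout,
        )
        response.raise_for_status()
        return True
    except requests.RequestException:
        # Budget tracking is best-effort: a failure here should not
        # fail the user's inference request.
        return False
```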
Comments suppressed due to low confidence (1)
src/llm_orchestration_service.py:1068
- This import of module asyncio is redundant, as it was previously imported on line 7.
import asyncio
src/llm_orchestration_service.py
Outdated
log_step_timings(timing_dict, request.chatId)

# Update budget even on client disconnect
self._update_connection_budget(request.connection_id, costs_dict)
Copilot AI (Dec 1, 2025)
Missing environment parameter in _update_connection_budget call. The method signature requires three parameters (connection_id, costs_dict, environment), but only two are provided. This will cause the method to use the default value "development" instead of the actual request environment, potentially updating the wrong budget.
Suggested change:
- self._update_connection_budget(request.connection_id, costs_dict)
+ self._update_connection_budget(request.connection_id, costs_dict, request.environment)
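To make the risk concrete, here is a hypothetical signature matching the reviewer's description; the "development" default is what lets a missing third argument go unnoticed at the call site:

```python
# Hypothetical sketch based on the review comment above, not the actual code.
class OrchestrationServiceSketch:
    def _update_connection_budget(
        self,
        connection_id: int,
        costs_dict: dict,
        environment: str = "development",
    ) -> None:
        """Apply accumulated costs to the budget of the given connection."""
        ...

# Correct call shape (environment passed explicitly):
#     self._update_connection_budget(request.connection_id, costs_dict, request.environment)
```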
self.timeout = 5  # seconds

# Cache connection IDs to avoid repeated requests
self._connection_cache: Dict[str, int] = {}
Copilot AI (Dec 1, 2025)
The _connection_cache dictionary is not thread-safe. In a multi-threaded environment (like a web server), concurrent access to this cache could lead to race conditions. Consider using threading.Lock, or a structure such as collections.defaultdict combined with proper locking.
This is already fixed.
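For reference, a minimal sketch of the locking the comment suggests (already addressed, per the reply above). The class and method names follow the review thread, while the Resql lookup itself is stubbed out as an assumption:

```python
# Sketch only: a lock-guarded connection-ID cache. fetch_connection_id_sync
# mirrors the method name used in the review; _query_connection_id is a stub.
import threading
from typing import Dict, Optional


class ConnectionIdFetcher:
    def __init__(self) -> None:
        self.timeout = 5  # seconds
        self._connection_cache: Dict[str, int] = {}
        self._cache_lock = threading.Lock()  # guards concurrent access

    def fetch_connection_id_sync(self, environment: str) -> Optional[int]:
        with self._cache_lock:
            cached = self._connection_cache.get(environment)
        if cached is not None:
            return cached
        connection_id = self._query_connection_id(environment)
        if connection_id is not None:
            with self._cache_lock:
                self._connection_cache[environment] = connection_id
        return connection_id

    def _query_connection_id(self, environment: str) -> Optional[int]:
        """Stubbed lookup; the real call would be an HTTP request to Resql."""
        ...
```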
Production Inference Data Storage Utility

This module provides functionality to store production inference results
to the Ruuter endpoint for analytics and monitoring purposes.
Copilot AI (Dec 1, 2025)
Inconsistent documentation: The module docstring says "to the Ruuter endpoint" but the actual implementation uses Resql directly (line 26: RAG_SEARCH_RESQL). The class docstring (line 20) correctly says "via Resql endpoint". Consider updating this to match the implementation and say "to the Resql endpoint" for consistency.
Suggested change:
- to the Ruuter endpoint for analytics and monitoring purposes.
+ to the Resql endpoint for analytics and monitoring purposes.
This is also fixed.
src/llm_orchestration_service.py
Outdated
accumulated_response: str,
) -> None:
"""
Async version: Store production inference data to Ruuter endpoint for analytics.
Copilot AI (Dec 1, 2025)
The docstring says "Store production inference data to Ruuter endpoint" but the implementation uses Resql directly. Consider updating to say "to Resql endpoint" for accuracy.
Suggested change:
- Async version: Store production inference data to Ruuter endpoint for analytics.
+ Async version: Store production inference data to Resql endpoint for analytics.
src/llm_orchestration_service.py
Outdated
)
try:
    # Use synchronous fetch to avoid event loop issues
    production_id = budget_tracker.fetch_production_connection_id_sync()
Copilot AI (Dec 1, 2025)
The method fetch_production_connection_id_sync() is called on budget_tracker, but this method doesn't exist in the BudgetTracker class. The BudgetTracker uses connection_fetcher.fetch_connection_id_sync(). This should be budget_tracker.connection_fetcher.fetch_connection_id_sync("production") or use the connection_fetcher directly via get_connection_id_fetcher().
Suggested change:
- production_id = budget_tracker.fetch_production_connection_id_sync()
+ production_id = budget_tracker.connection_fetcher.fetch_connection_id_sync("production")
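A small sketch of the module-level accessor pattern the comment mentions (get_connection_id_fetcher); reusing one shared fetcher keeps the connection-ID cache warm across requests. The wiring below is an assumption for illustration, and ConnectionIdFetcher stands in for the utility in src/utils/connection_id_fetcher.py:

```python
# Sketch only: lazily create one shared fetcher instance.
import threading
from typing import Optional


class ConnectionIdFetcher:
    def fetch_connection_id_sync(self, environment: str) -> Optional[int]:
        ...  # real implementation queries Resql and caches the result


_fetcher: Optional[ConnectionIdFetcher] = None
_fetcher_lock = threading.Lock()


def get_connection_id_fetcher() -> ConnectionIdFetcher:
    """Return the shared fetcher, creating it on first use."""
    global _fetcher
    with _fetcher_lock:
        if _fetcher is None:
            _fetcher = ConnectionIdFetcher()
        return _fetcher

# The flagged call then becomes:
#     production_id = get_connection_id_fetcher().fetch_connection_id_sync("production")
```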
src/llm_orchestration_service.py
Outdated
| """LLM Orchestration Service - Business logic for LLM orchestration.""" | ||
|
|
||
| from typing import Optional, List, Dict, Union, Any, AsyncIterator | ||
| import json |
Copilot AI (Dec 1, 2025)
The json module is imported twice - once on line 4 and again as json_module on line 12. The first import appears to be unused and should be removed to avoid confusion.
Suggested change:
- import json
src/llm_orchestration_service.py
Outdated
question=refined_output.original_question,
chunks=relevant_chunks,
- max_blocks=ResponseGenerationConstants.DEFAULT_MAX_BLOCKS,
+ max_blocks=10,
Copilot AI (Dec 1, 2025)
[nitpick] The value 10 is hardcoded in multiple places (lines 362, 401, 2077) instead of using a constant. Consider defining a constant like MAX_CONTEXT_BLOCKS = 10 to improve maintainability and make it easier to adjust this value in the future. Previously this used ResponseGenerationConstants.DEFAULT_MAX_BLOCKS which was 5, so this is also a functional change that increases the context window from 5 to 10 blocks.
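A possible shape for the constant this nitpick asks for, assuming the existing ResponseGenerationConstants class is its natural home; the name MAX_CONTEXT_BLOCKS comes from the comment itself and is not in the codebase:

```python
# Sketch of centralising the limit instead of hardcoding 10 at each call site.
class ResponseGenerationConstants:
    DEFAULT_MAX_BLOCKS = 5    # previous default mentioned in the review
    MAX_CONTEXT_BLOCKS = 10   # new limit, defined once and reused

# call sites:
#     max_blocks=ResponseGenerationConstants.MAX_CONTEXT_BLOCKS,
```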
src/llm_orchestration_service.py
Outdated
final_response: OrchestrationResponse,
) -> None:
"""
Store production inference data to Ruuter endpoint for analytics.
Copilot AI (Dec 1, 2025)
The docstring says "Store production inference data to Ruuter endpoint" but the implementation uses Resql directly (via get_production_store() which uses RAG_SEARCH_RESQL). Consider updating to say "to Resql endpoint" for accuracy.
Suggested change:
- Store production inference data to Ruuter endpoint for analytics.
+ Store production inference data to Resql endpoint for analytics.
import json
import os
import time
import asyncio
Copilot AI (Dec 1, 2025)
Import of 'asyncio' is not used.
Suggested change:
- import asyncio
This is also fixed
Update inference results and budget (buerokratt#181)
No description provided.