
Conversation

Collaborator

@nuwangeek nuwangeek commented Oct 2, 2025

Add LLM cost tracking and update reranker flow

  • Removed reranker from the main retrieval pipeline
  • Commented out encoder model loading to simplify flow
  • Added metadata collection function in cost_utils to extract LLM costs and token usage
  • Implemented total cost calculation utility in cost_utils
  • Integrated logging in LLM Orchestrator Service to track cost and token metrics
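The total cost calculation utility described above could look roughly like the following. This is an illustrative sketch, not the actual `cost_utils` code from the PR; the function name `calculate_total_costs` appears in the diff, but the exact dict shape and field names here are assumptions.

```python
# Hypothetical sketch of the total-cost aggregation added in cost_utils.
# costs_dict is assumed to map an operation name (e.g. "prompt_refine",
# "response_generate") to a usage dict with cost and token counts.
from typing import Any, Dict


def calculate_total_costs(costs_dict: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-operation cost/token metadata into one summary dict."""
    totals = {"cost": 0.0, "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in costs_dict.values():
        totals["cost"] += float(usage.get("cost") or 0.0)
        totals["prompt_tokens"] += int(usage.get("prompt_tokens") or 0)
        totals["completion_tokens"] += int(usage.get("completion_tokens") or 0)
    totals["total_tokens"] = totals["prompt_tokens"] + totals["completion_tokens"]
    return totals
```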

Copilot AI left a comment

Pull Request Overview

This PR disables the reranker component for performance optimization and adds comprehensive LLM cost tracking throughout the system. The changes focus on removing reranker dependencies while implementing detailed usage monitoring for all LLM operations.

  • Completely disabled reranker functionality by commenting out initialization and usage code
  • Added new cost tracking utilities to monitor LLM usage, tokens, and costs across components
  • Integrated cost tracking into response generation and prompt refinement workflows

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/vector_indexer/hybrid_retrieval.py | Disabled reranker initialization and usage; fusion scores are now always used |
| src/utils/cost_utils.py | New utility module for LLM cost calculation and usage tracking |
| src/response_generator/response_generate.py | Added usage tracking to the response generation workflow |
| src/prompt_refine_manager/prompt_refiner.py | Added usage tracking to the prompt refinement process |
| src/llm_orchestration_service.py | Integrated cost tracking across the orchestration workflow with detailed logging |
| pyproject.toml | Removed the rerankers dependency from project requirements |


```python
# )
# self.reranker = None

# Reranker disabled - set to None
```
Copilot AI Oct 2, 2025


[nitpick] This comment is redundant since the code immediately below sets self.reranker = None and there's already a comprehensive comment block above explaining the reranker is disabled. Consider removing this line.

Suggested change

```python
# Reranker disabled - set to None
```

Copilot uses AI. Check for mistakes.
Comment on lines 93 to 129
```python
def track_lm_usage(
    operation: Callable[..., Any], *args, **kwargs
) -> tuple[Any, Dict[str, Any]]:
    """
    Context manager-like function to track LM usage for any operation.

    Args:
        operation: The function to execute and track
        *args: Positional arguments for the operation
        **kwargs: Keyword arguments for the operation

    Returns:
        Tuple of (operation_result, usage_info_dict)

    Example:
        result, usage = track_lm_usage(predictor, question="What is AI?")
    """
    # Get initial history length
    lm = dspy.settings.lm
    history_length_before = len(lm.history) if lm and hasattr(lm, "history") else 0

    # Execute the operation
    result = operation(*args, **kwargs)

    # Extract usage from new history entries
    usage_info = get_default_usage_dict()

    if lm and hasattr(lm, "history"):
        try:
            new_history = lm.history[history_length_before:]
            usage_info = extract_cost_from_lm_history(new_history)
        except Exception as e:
            logger.warning(f"Failed to extract usage info: {str(e)}")

    return result, usage_info
```

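For context on the function above: `track_lm_usage` leans on two helpers, `get_default_usage_dict` and `extract_cost_from_lm_history`. The PR's actual implementations are not shown in this conversation, but they could plausibly look like the sketch below. The assumed history-entry shape (dicts with a `cost` field and a nested `usage` mapping) is an illustration, not confirmed from the diff.

```python
# Hypothetical sketches of the helpers track_lm_usage relies on; the real
# cost_utils versions may differ. History entries are assumed to be dicts
# with a "cost" key and a nested "usage" mapping of token counts, roughly
# matching what dspy's LM keeps in lm.history.
from typing import Any, Dict, List


def get_default_usage_dict() -> Dict[str, Any]:
    """Zeroed usage record, returned when no LM history is available."""
    return {"cost": 0.0, "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}


def extract_cost_from_lm_history(history: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Sum cost and token counts over a slice of LM history entries."""
    usage_info = get_default_usage_dict()
    for entry in history:
        usage_info["cost"] += float(entry.get("cost") or 0.0)
        usage = entry.get("usage") or {}
        usage_info["prompt_tokens"] += int(usage.get("prompt_tokens") or 0)
        usage_info["completion_tokens"] += int(usage.get("completion_tokens") or 0)
    usage_info["total_tokens"] = usage_info["prompt_tokens"] + usage_info["completion_tokens"]
    return usage_info
```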

Copilot AI Oct 2, 2025


The function track_lm_usage is defined but not used anywhere in the codebase according to the diff. Consider removing it if it's not needed, or add test coverage if it's intended for future use.

Suggested change
```python
def track_lm_usage(
    operation: Callable[..., Any], *args, **kwargs
) -> tuple[Any, Dict[str, Any]]:
    """
    Context manager-like function to track LM usage for any operation.

    Args:
        operation: The function to execute and track
        *args: Positional arguments for the operation
        **kwargs: Keyword arguments for the operation

    Returns:
        Tuple of (operation_result, usage_info_dict)

    Example:
        result, usage = track_lm_usage(predictor, question="What is AI?")
    """
    # Get initial history length
    lm = dspy.settings.lm
    history_length_before = len(lm.history) if lm and hasattr(lm, "history") else 0

    # Execute the operation
    result = operation(*args, **kwargs)

    # Extract usage from new history entries
    usage_info = get_default_usage_dict()

    if lm and hasattr(lm, "history"):
        try:
            new_history = lm.history[history_length_before:]
            usage_info = extract_cost_from_lm_history(new_history)
        except Exception as e:
            logger.warning(f"Failed to extract usage info: {str(e)}")

    return result, usage_info
```

Comment on lines 26 to 29
"""Stateless service class for handling LLM orchestration business logic."""

def __init__(self) -> None:
"""Initialize the stateless orchestration service."""
Copilot AI Oct 2, 2025


[nitpick] The docstring mentions 'stateless orchestration service' but the method now has logic that tracks costs in a dictionary, making it somewhat stateful during execution. Consider updating the docstring to reflect this behavior.

Suggested change
"""Stateless service class for handling LLM orchestration business logic."""
def __init__(self) -> None:
"""Initialize the stateless orchestration service."""
"""
Service class for handling LLM orchestration business logic.
The service does not maintain state between requests (stateless in the architectural sense),
but tracks per-request state (such as costs) internally during the execution of a request.
"""
def __init__(self) -> None:
"""
Initialize the orchestration service.
Note: The service does not persist state between requests, but tracks per-request
information (e.g., costs) internally during request processing.
"""


```python
total_costs = calculate_total_costs(costs_dict)

logger.info("=" * 50)
```
Collaborator


can we remove this?
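One way to address this comment, if the banner is dropped: emit the summary as a single structured log line instead of `"=" * 50` separators. This is a hedged suggestion, not code from the PR; `total_costs` is assumed to be the dict produced by `calculate_total_costs`, and the field names are illustrative.

```python
# Sketch: replace banner-style separators with one parseable log record.
# Assumes total_costs carries "cost", "prompt_tokens", "completion_tokens".
import logging

logger = logging.getLogger("llm_orchestration_service")


def log_cost_summary(total_costs: dict) -> None:
    """Emit the cost summary as a single structured, grep-friendly line."""
    logger.info(
        "llm_cost_summary cost=%.6f prompt_tokens=%d completion_tokens=%d",
        total_costs.get("cost", 0.0),
        total_costs.get("prompt_tokens", 0),
        total_costs.get("completion_tokens", 0),
    )
```

A single key=value line is easier to filter in log aggregators than a multi-line banner block.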

nuwangeek and others added 3 commits October 3, 2025 16:27

  • Get update from buerokratt/RAG-Module wip to rootcodelabs/RAG-Module wip
  • Get update from wip into RAG-111
@Thirunayan22 Thirunayan22 merged commit 0294237 into buerokratt:wip Oct 3, 2025
6 of 7 checks passed
nuwangeek added a commit to rootcodelabs/RAG-Module that referenced this pull request Oct 3, 2025
Disable Re ranker and Add LLM cost tracking  (buerokratt#112)
