Open
Labels
P1 (Priority 1: Critical, fix as soon as possible), enhancement (New feature or request)
Description
Type: Infrastructure / Monitoring
Priority: P2 (Nice to have)
Effort: 3-4 hours
Description
Implement a centralized error ID system to improve error tracking, monitoring, and debugging across the application. Error IDs enable better correlation of errors across logs, metrics, and monitoring dashboards.
Background
During PR #78 review, we identified that error logs lack unique identifiers for tracking in monitoring systems like Sentry, DataDog, or Grafana. While we have good structured logging with context fields, we can't easily:
- Track how often specific errors occur across deployments
- Set up alerts for specific error conditions
- Correlate related errors across different services
- Link user-reported issues to specific error types
Current State
Error logging uses descriptive messages but no standardized IDs:
logger.error(
    "Community %s configured to use %s but env var not set, falling back to platform key.",
    community_id,
    env_var,
    extra={
        "community_id": community_id,
        "env_var_missing": True,
    },
)
Proposed Solution
Create a centralized error IDs module with standardized error codes:
1. Create Error IDs Module
# src/constants/error_ids.py
"""Centralized error IDs for tracking and monitoring.

Error ID Format: OSA_[Category][Number]
- OSA_E: General errors
- OSA_C: Configuration errors
- OSA_A: Authentication/Authorization errors
- OSA_S: Sync errors
- OSA_K: Knowledge base errors
"""

class ErrorIds:
    """Error IDs for tracking in monitoring systems."""

    # Configuration Errors (C001-C099)
    API_KEY_ENV_VAR_MISSING = "OSA_C001"
    API_KEY_NOT_CONFIGURED = "OSA_C002"
    COMMUNITY_CONFIG_INVALID = "OSA_C003"
    CORS_ORIGIN_INVALID = "OSA_C004"

    # Authentication Errors (A001-A099)
    API_KEY_INVALID = "OSA_A001"
    ORIGIN_NOT_AUTHORIZED = "OSA_A002"
    BYOK_REQUIRED = "OSA_A003"

    # Sync Errors (S001-S099)
    GITHUB_SYNC_FAILED = "OSA_S001"
    PAPERS_SYNC_FAILED = "OSA_S002"

    # Knowledge Base Errors (K001-K099)
    DOCUMENT_FETCH_FAILED = "OSA_K001"
    SEARCH_FAILED = "OSA_K002"

    # General Errors (E001-E099)
    REGISTRY_NOT_INITIALIZED = "OSA_E001"
    INTERNAL_SERVER_ERROR = "OSA_E002"
2. Update Logging Calls
from src.constants.error_ids import ErrorIds

logger.error(
    "Community %s configured to use %s but env var not set, falling back to platform key.",
    community_id,
    env_var,
    extra={
        "error_id": ErrorIds.API_KEY_ENV_VAR_MISSING,
        "community_id": community_id,
        "env_var": env_var,
        "env_var_missing": True,
        "fallback_to_platform": True,
    },
)
3. Include Error IDs in HTTPExceptions
raise HTTPException(
    status_code=403,
    detail={
        "error_id": ErrorIds.ORIGIN_NOT_AUTHORIZED,
        "message": "Origin not authorized. Please provide API key via X-OpenRouter-Key header.",
        "help_url": "https://docs.osa.osc.earth/errors/OSA_A002",
    },
)
Acceptance Criteria
- Create src/constants/error_ids.py with ErrorIds class
- Define error IDs for all major error categories
- Update all logger.error() calls to include error_id in extra
- Update all HTTPException raises to include error_id in detail
- Add error_id to JSON log formatter output
- Document error ID format and categories in module docstring
- Add tests verifying error IDs are present in logs
- Create error reference documentation (optional)
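The formatter and test criteria above could look roughly like the sketch below. It assumes the project's JSON formatter is a `logging.Formatter` subclass; `JsonFormatter`, `capture_error_log`, and the logger name `osa.error_id_demo` are hypothetical names for illustration, not the existing implementation. It relies only on the standard-library behavior that keys passed via `extra=` become attributes on the `LogRecord`.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: emits one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Keys passed via `extra=` are set as attributes on the record;
            # records logged without an error_id get null here.
            "error_id": getattr(record, "error_id", None),
        }
        return json.dumps(payload)


def capture_error_log() -> dict:
    """Log one error with an error_id and return the parsed JSON line."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())

    logger = logging.getLogger("osa.error_id_demo")  # hypothetical logger name
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    try:
        logger.error(
            "Community %s configured to use %s but env var not set.",
            "demo-community",
            "DEMO_API_KEY",
            extra={"error_id": "OSA_C001"},
        )
    finally:
        logger.removeHandler(handler)
    return json.loads(stream.getvalue())
```

A test for the "error IDs are present in logs" criterion can then assert on the parsed output, for example that `capture_error_log()["error_id"]` equals the expected ID.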
Benefits
- Monitoring: Easy to set up alerts for specific error IDs
- Debugging: Quickly find all instances of a specific error across logs
- Analytics: Track error frequency and trends over time
- User Support: Users can report error IDs for faster diagnosis
- Documentation: Error IDs can link to detailed error documentation
Example Monitoring Query
With error IDs, you can easily query:
# Grafana/Loki (LogQL): count occurrences of OSA_C001 over the last hour
count_over_time({app="osa"} | json | error_id="OSA_C001" [1h])
# Count API key missing errors per community
sum by (community_id) (count_over_time({app="osa"} | json | error_id="OSA_C001" [1h]))
Implementation Notes
- Follow existing pattern from HEDit project (referenced in CLAUDE.md)
- Keep error IDs stable - never reuse IDs for different errors
- Document deprecated error IDs if errors are removed
- Consider creating error documentation site (e.g., docs.osa.osc.earth/errors/OSA_C001)
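One way to enforce the stability notes above is a small check that every ID matches the documented format and that no value is assigned twice. The sketch below is an assumption, not existing project code: `ERROR_ID_PATTERN`, `collect_error_ids`, `find_problems`, and `help_url` are hypothetical names, the regex is inferred from the `OSA_[Category][Number]` format, and the base URL is taken from the example `help_url` in this issue. A subset of `ErrorIds` is repeated only to keep the block self-contained.

```python
import re

# Assumed pattern for the OSA_[Category][Number] format (three digits).
ERROR_ID_PATTERN = re.compile(r"^OSA_[ECASK]\d{3}$")

# Hypothetical base URL, taken from the example help_url in this issue.
DOCS_BASE_URL = "https://docs.osa.osc.earth/errors"


class ErrorIds:
    """Subset of the proposed IDs, repeated to keep the sketch self-contained."""

    API_KEY_ENV_VAR_MISSING = "OSA_C001"
    ORIGIN_NOT_AUTHORIZED = "OSA_A002"
    GITHUB_SYNC_FAILED = "OSA_S001"


def collect_error_ids(cls: type) -> dict[str, str]:
    """Return the upper-case string constants defined directly on cls."""
    return {
        name: value
        for name, value in vars(cls).items()
        if name.isupper() and isinstance(value, str)
    }


def find_problems(cls: type) -> list[str]:
    """List format violations and reused values among the IDs on cls."""
    problems: list[str] = []
    seen: dict[str, str] = {}
    for name, value in collect_error_ids(cls).items():
        if not ERROR_ID_PATTERN.match(value):
            problems.append(f"{name}: {value!r} does not match the ID format")
        if value in seen:
            problems.append(f"{name}: {value!r} already used by {seen[value]}")
        else:
            seen[value] = name
    return problems


def help_url(error_id: str) -> str:
    """Build the documentation link for an error ID (assumed URL scheme)."""
    return f"{DOCS_BASE_URL}/{error_id}"
```

Running `find_problems(ErrorIds)` in a unit test would catch an accidentally reused or malformed ID before it reaches monitoring dashboards. It does not detect an ID that was removed and later reassigned; that would need a separate list of deprecated IDs.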
Related
- PR #78: feat: API key validation and monitoring (Phase 6)
- Future: Sentry integration (would benefit from error IDs)
- Future: Error documentation site
Labels
enhancement, monitoring, P2, infrastructure