Implement health check backend API and storage functionality #11678

colesmcintosh · 2025-06-12T23:05:36Z

Title

Implement health check backend API and storage functionality

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature

Changes

Backend API Enhancements

Enhanced litellm/proxy/health_endpoints/_health_endpoints.py with comprehensive health check functionality
Added health check storage and retrieval logic in litellm/proxy/utils.py
Implemented individual model health checks with detailed error reporting
Added response time tracking and status monitoring

Key Features

Health Check Endpoints: Extended health check APIs to support individual model testing
Database Integration: Functions to save and retrieve health check results
Error Handling: Comprehensive error pattern recognition and reporting
Performance Monitoring: Response time tracking for model performance analysis

Screenshot of passing tests:

Dependencies:

Requires database schema changes from PR Add LiteLLM_HealthCheckTable to database schema #11677

- Introduced methods for saving health check results to the database, including validation and cleaning of data.
- Added new health check endpoints to retrieve health check history and latest health statuses for models.
- Updated model prices and context window configuration for new Azure transcription models.

vercel · 2025-06-12T23:05:40Z

The latest updates on your projects.

Name	Status	Preview	Comments	Updated (UTC)
litellm	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jun 13, 2025 10:05pm

- Introduced tests for PrismaClient health check methods, including saving results and retrieving health check history.
- Added tests for the _save_health_check_to_db function to ensure proper handling of healthy and unhealthy endpoints.
- Implemented mock objects to simulate database interactions and validate method behaviors.

ishaan-jaff

reviewed

litellm/proxy/health_endpoints/_health_endpoints.py

ishaan-jaff · 2025-06-13T14:35:03Z

litellm/proxy/health_endpoints/_health_endpoints.py

            )

+            # Optionally save health check result to database (non-blocking)
+            if prisma_client is not None and target_model is not None:
+                await _save_health_check_to_db(


use asyncio.create_task so we don't need to wait for writing to the DB
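
A minimal sketch of the pattern suggested in this review comment, assuming a stand-in `save_health_check` coroutine and an in-memory list in place of the database (both hypothetical, not the PR's actual code). The endpoint schedules the write with `asyncio.create_task` and returns without awaiting it. The task is kept in a set because the event loop holds only weak references to tasks.

```python
import asyncio

# Hypothetical stand-in for the PR's DB write; not the real litellm function.
async def save_health_check(store: list, model: str, status: str) -> None:
    await asyncio.sleep(0.01)  # simulate database latency
    store.append({"model": model, "status": status})

# Strong references so pending tasks are not garbage-collected mid-flight.
_background_tasks: set = set()

def schedule_save(store: list, model: str, status: str) -> asyncio.Task:
    task = asyncio.create_task(save_health_check(store, model, status))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task

async def health_endpoint(store: list, model: str) -> dict:
    result = {"model": model, "status": "healthy"}
    schedule_save(store, model, result["status"])  # returns immediately
    return result

async def main() -> tuple:
    store: list = []
    response = await health_endpoint(store, "gpt-4o")
    saved_before = len(store)  # the write has not run yet
    await asyncio.gather(*_background_tasks)  # wait only for this demo
    return response, saved_before, len(store)
```

One trade-off of this design: an exception raised inside the background task no longer reaches the caller, so the write path needs its own error logging.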

litellm/proxy/health_endpoints/_health_endpoints.py

ishaan-jaff · 2025-06-13T14:37:19Z

litellm/proxy/utils.py

+            if details is not None and isinstance(details, dict):
+                try:
+                    # Serialize and deserialize to ensure valid JSON and remove unsupported values
+                    serialized = json.dumps(details, default=str)


use safe_dumps in safe_json_dumps.py

ishaan-jaff · 2025-06-13T14:37:36Z

litellm/proxy/utils.py

+                try:
+                    # Serialize and deserialize to ensure valid JSON and remove unsupported values
+                    serialized = json.dumps(details, default=str)
+                    clean_details = json.loads(serialized)


use safe_json_loads.py
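
For context, the round trip under review can be sketched with the standard library alone. This is an illustration of what the quoted diff does, not the litellm helpers the reviewer points to; the function name `clean_details` and the choice to return `None` on failure are assumptions made here.

```python
import json
from datetime import datetime
from typing import Any, Optional

def clean_details(details: Any) -> Optional[dict]:
    """Return a JSON-safe copy of `details`, or None if it cannot be cleaned."""
    if not isinstance(details, dict):
        return None
    try:
        # default=str stringifies values json cannot encode (datetime, exceptions).
        serialized = json.dumps(details, default=str)
        return json.loads(serialized)
    except (TypeError, ValueError):
        # ValueError: circular reference; TypeError: unsupported key type.
        return None

example = clean_details({"error": ValueError("boom"), "at": datetime(2025, 6, 13)})
```

Centralising this in shared helpers, as the reviewer asks, keeps the failure handling in one place instead of repeating the `try`/`except` at each call site.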

- Updated health endpoint to use `get_deployment` for retrieving model names based on model IDs, enhancing error handling for missing models.
- Changed health check result saving to the database to be non-blocking by using `asyncio.create_task`.
- Cleaned up code for better readability and maintainability.

…nd error handling
- Removed unused imports and simplified exception handling in `_get_redoc_url` and `_get_docs_url` functions to manage circular imports.
- Cleaned up logging statements for consistency and clarity.
- Streamlined error message formatting in `handle_exception_on_proxy` function.

… improved clarity and robustness
- Added type hints for `_end_user_list_transactions` to specify it as a dictionary mapping end user IDs to spend amounts.
- Updated default values for optional fields in `SpendLogsPayload` to ensure they are initialized properly, enhancing error handling.
- Refactored `_premium_user_check` function to improve model validation logic and error handling.

- Updated the disable_spend_updates method to return False if the environment variable DISABLE_SPEND_UPDATES is not set or is None, improving robustness in configuration handling.
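
A rough sketch of the behavior this commit describes, assuming the flag is read directly from the environment. The set of accepted truthy strings is an assumption made for illustration; the actual parsing in litellm may differ.

```python
import os

def disable_spend_updates() -> bool:
    """Return False when DISABLE_SPEND_UPDATES is unset, else parse it as a flag."""
    raw = os.getenv("DISABLE_SPEND_UPDATES")
    if raw is None:
        return False  # unset: spend updates stay enabled
    # Assumed truthy spellings; anything else counts as False.
    return raw.strip().lower() in {"true", "1", "yes"}
```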

- Enhanced the join_paths function to better manage leading and trailing slashes, ensuring correct path concatenation.
- Added logic to handle cases where either base_path or route is empty, improving robustness and usability.
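
One way such slash handling could look, as a sketch only. The normalization rule chosen here (always a single leading slash, no trailing slash, empty segments dropped) is an assumption and may not match the PR's `join_paths`. It handles URL paths, not full URLs with a scheme.

```python
def join_paths(base_path: str, route: str) -> str:
    """Join two URL path fragments with exactly one slash between them."""
    # Strip slashes from both ends so "/api/" + "/health" cannot yield "//".
    parts = [p.strip("/") for p in (base_path, route)]
    # Drop empty fragments so an empty base_path or route is harmless.
    joined = "/".join(p for p in parts if p)
    return "/" + joined
```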

- Introduced a new method `_save_health_check_to_db` for saving health check results to the database, utilizing safe JSON functions for data integrity.
- Refactored existing health check methods to streamline the process and improve error logging.
- Updated email sending logic to ensure secure connections and better error handling.
- Improved spend update logic with batch processing and retry mechanisms for database operations.
- Added utility functions for projected spend calculations and enhanced validation for team configurations.

- Introduced `save_health_check_result` method to save health check results with detailed logging and validation.
- Added `get_health_check_history` method for retrieving health check records with optional filtering.
- Implemented `get_all_latest_health_checks` method to fetch the latest health checks for each model.
- Enhanced error handling and logging for all new methods to improve reliability and traceability.
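
The "latest health check for each model" idea can be illustrated in memory, independent of Prisma. This is a sketch under assumptions: the record keys `model_name` and `checked_at` are hypothetical and may not match the columns of `LiteLLM_HealthCheckTable`.

```python
from datetime import datetime
from typing import Dict, List

def latest_per_model(records: List[dict]) -> Dict[str, dict]:
    """Keep only the most recent record for each model name."""
    latest: Dict[str, dict] = {}
    for record in records:
        name = record["model_name"]
        current = latest.get(name)
        # Replace the stored record when this one is newer.
        if current is None or record["checked_at"] > current["checked_at"]:
            latest[name] = record
    return latest

rows = [
    {"model_name": "gpt-4o", "status": "unhealthy", "checked_at": datetime(2025, 6, 13, 9)},
    {"model_name": "gpt-4o", "status": "healthy", "checked_at": datetime(2025, 6, 13, 10)},
    {"model_name": "claude", "status": "healthy", "checked_at": datetime(2025, 6, 13, 8)},
]
summary = latest_per_model(rows)
```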

- Updated the `_save_health_check_to_db` function to call `save_health_check_result` with explicitly typed arguments instead of a dictionary spread, enhancing code clarity and type safety.
- Removed unused method bindings in the mock Prisma client tests to streamline the test setup.

…reamline code and improve maintainability.

…ck result saving
- Added `_validate_response_time` method to ensure response time values are valid and handle exceptions gracefully.
- Introduced `_clean_details` method to validate and clean details JSON, improving data integrity.
- Refactored `save_health_check_result` to utilize these new methods for optional fields, enhancing code clarity and maintainability.
- Updated tests to bind new methods to the mock Prisma client for comprehensive testing.
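
A sketch of what response time validation might involve. The function below is hypothetical and its rules (reject booleans, negative numbers, and non-finite values; return `None` rather than raise) are assumptions, not the PR's `_validate_response_time`.

```python
import math
from typing import Any, Optional

def validate_response_time(value: Any) -> Optional[float]:
    """Return a non-negative finite float, or None if `value` is unusable."""
    # bool is a subclass of int, so reject it explicitly.
    if value is None or isinstance(value, bool):
        return None
    try:
        parsed = float(value)
    except (TypeError, ValueError):
        return None  # e.g. "abc" or a list
    if not math.isfinite(parsed) or parsed < 0:
        return None  # NaN, infinity, or a negative duration
    return parsed
```

Returning `None` instead of raising lets the optional field be omitted while the rest of the record is still saved.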

- Introduced `_convert_health_check_to_dict` to standardize health check record conversion to dictionary format for JSON responses.
- Added `_check_prisma_client` helper function to streamline database availability checks and improve error handling.
- Refactored health check endpoints to utilize the new utility functions, enhancing code clarity and maintainability.

- Simplified the mock PrismaClient setup by consolidating method bindings.
- Updated health check result saving tests to use parameterized scenarios for better coverage.
- Added tests for health check history retrieval and graceful handling when no database client is provided.
- Removed redundant mock functions to streamline the test suite.

- Added `_perform_health_check_and_save` to encapsulate health check execution and optional database saving.
- Refactored health endpoint logic to utilize the new helper function, improving code clarity and reducing redundancy.
- Enhanced error handling and streamlined the process of saving health check results to the database.

ishaan-jaff

LGTM !

…11678)" This reverts commit 5f34cee.

…#11678)

* feat: Add health check functionality and endpoints
- Introduced methods for saving health check results to the database, including validation and cleaning of data.
- Added new health check endpoints to retrieve health check history and latest health statuses for models.
- Updated model prices and context window configuration for new Azure transcription models.

* test: Add unit tests for health check functionality
- Introduced tests for PrismaClient health check methods, including saving results and retrieving health check history.
- Added tests for the _save_health_check_to_db function to ensure proper handling of healthy and unhealthy endpoints.
- Implemented mock objects to simulate database interactions and validate method behaviors.

* Refactor health endpoint model ID handling and improve logging
- Updated health endpoint to use `get_deployment` for retrieving model names based on model IDs, enhancing error handling for missing models.
- Changed health check result saving to the database to be non-blocking by using `asyncio.create_task`.
- Cleaned up code for better readability and maintainability.

* Refactor utility functions in proxy module for improved readability and error handling
- Removed unused imports and simplified exception handling in `_get_redoc_url` and `_get_docs_url` functions to manage circular imports.
- Cleaned up logging statements for consistency and clarity.
- Streamlined error message formatting in `handle_exception_on_proxy` function.

* Enhance type hinting and default values in ProxyUpdateSpend class for improved clarity and robustness
- Added type hints for `_end_user_list_transactions` to specify it as a dictionary mapping end user IDs to spend amounts.
- Updated default values for optional fields in `SpendLogsPayload` to ensure they are initialized properly, enhancing error handling.
- Refactored `_premium_user_check` function to improve model validation logic and error handling.

* Fix disable_spend_updates method to handle None return value gracefully
- Updated the disable_spend_updates method to return False if the environment variable DISABLE_SPEND_UPDATES is not set or is None, improving robustness in configuration handling.

* Refactor join_paths function in utils.py for improved path handling
- Enhanced the join_paths function to better manage leading and trailing slashes, ensuring correct path concatenation.
- Added logic to handle cases where either base_path or route is empty, improving robustness and usability.

* Enhance health check functionality and improve error handling
- Introduced a new method `_save_health_check_to_db` for saving health check results to the database, utilizing safe JSON functions for data integrity.
- Refactored existing health check methods to streamline the process and improve error logging.
- Updated email sending logic to ensure secure connections and better error handling.
- Improved spend update logic with batch processing and retry mechanisms for database operations.
- Added utility functions for projected spend calculations and enhanced validation for team configurations.

* Add health check methods for database interaction
- Introduced `save_health_check_result` method to save health check results with detailed logging and validation.
- Added `get_health_check_history` method for retrieving health check records with optional filtering.
- Implemented `get_all_latest_health_checks` method to fetch the latest health checks for each model.
- Enhanced error handling and logging for all new methods to improve reliability and traceability.

* Refactor health check result saving to use typed arguments
- Updated the `_save_health_check_to_db` function to call `save_health_check_result` with explicitly typed arguments instead of a dictionary spread, enhancing code clarity and type safety.
- Removed unused method bindings in the mock Prisma client tests to streamline the test setup.

* Remove unused `_save_health_check_to_db` function from utils.py to streamline code and improve maintainability.

* Implement response time validation and details cleaning in health check result saving
- Added `_validate_response_time` method to ensure response time values are valid and handle exceptions gracefully.
- Introduced `_clean_details` method to validate and clean details JSON, improving data integrity.
- Refactored `save_health_check_result` to utilize these new methods for optional fields, enhancing code clarity and maintainability.
- Updated tests to bind new methods to the mock Prisma client for comprehensive testing.

* Add health check utility functions and refactor existing endpoints
- Introduced `_convert_health_check_to_dict` to standardize health check record conversion to dictionary format for JSON responses.
- Added `_check_prisma_client` helper function to streamline database availability checks and improve error handling.
- Refactored health check endpoints to utilize the new utility functions, enhancing code clarity and maintainability.

* Refactor health check tests for improved clarity and coverage
- Simplified the mock PrismaClient setup by consolidating method bindings.
- Updated health check result saving tests to use parameterized scenarios for better coverage.
- Added tests for health check history retrieval and graceful handling when no database client is provided.
- Removed redundant mock functions to streamline the test suite.

* Implement helper function for health check and database saving
- Added `_perform_health_check_and_save` to encapsulate health check execution and optional database saving.
- Refactored health endpoint logic to utilize the new helper function, improving code clarity and reducing redundancy.
- Enhanced error handling and streamlined the process of saving health check results to the database.

…erriAI#11678)" This reverts commit 5f34cee.

colesmcintosh mentioned this pull request Jun 12, 2025

Implement health check frontend UI components and dashboard integration #11679

Merged

4 tasks

vercel bot deployed to Preview June 12, 2025 23:07 View deployment

Merge branch 'BerriAI:main' into health-check-backend

03eab98

vercel bot deployed to Preview June 12, 2025 23:27 View deployment

vercel bot deployed to Preview June 12, 2025 23:34 View deployment

colesmcintosh marked this pull request as ready for review June 12, 2025 23:39

ishaan-jaff requested changes Jun 13, 2025

Merge branch 'BerriAI:main' into health-check-backend

599a695

vercel bot deployed to Preview June 13, 2025 15:20 View deployment

vercel bot deployed to Preview June 13, 2025 16:00 View deployment

vercel bot deployed to Preview June 13, 2025 16:42 View deployment

vercel bot deployed to Preview June 13, 2025 16:49 View deployment

Fix disable_spend_updates method to handle None return value gracefully

0644c1d

- Updated the disable_spend_updates method to return False if the environment variable DISABLE_SPEND_UPDATES is not set or is None, improving robustness in configuration handling.

vercel bot deployed to Preview June 13, 2025 17:29 View deployment

vercel bot deployed to Preview June 13, 2025 17:44 View deployment

Merge branch 'BerriAI:main' into health-check-backend

7ba046c

vercel bot deployed to Preview June 13, 2025 18:10 View deployment

vercel bot deployed to Preview June 13, 2025 19:24 View deployment

vercel bot deployed to Preview June 13, 2025 19:50 View deployment

vercel bot deployed to Preview June 13, 2025 20:32 View deployment

colesmcintosh added 2 commits June 13, 2025 14:45

Remove unused _save_health_check_to_db function from utils.py to st…

81bde7d

…reamline code and improve maintainability.

vercel bot deployed to Preview June 13, 2025 20:48 View deployment

vercel bot deployed to Preview June 13, 2025 21:23 View deployment

vercel bot deployed to Preview June 13, 2025 21:27 View deployment

colesmcintosh and others added 2 commits June 13, 2025 16:02

Merge branch 'BerriAI:main' into health-check-backend

5196831

vercel bot deployed to Preview June 13, 2025 22:05 View deployment

ishaan-jaff approved these changes Jun 14, 2025

ishaan-jaff merged commit 5f34cee into BerriAI:main Jun 14, 2025
6 checks passed

ishaan-jaff added a commit that referenced this pull request Jun 14, 2025

Revert "Implement health check backend API and storage functionality (#…

1c1e41c

…11678)" This reverts commit 5f34cee.

X4tar pushed a commit to X4tar/litellm that referenced this pull request Jun 17, 2025

Revert "Implement health check backend API and storage functionality (B…

1e2802e

…erriAI#11678)" This reverts commit 5f34cee.

colesmcintosh mentioned this pull request Jun 18, 2025

Implement health check backend API and storage functionality - fix ci/cd #11852

Merged

4 tasks
