feat: Add StreamMetadataProvider caching to prevent port exhaustion by xiangfu0 · Pull Request #16623 · apache/pinot

xiangfu0 · 2025-08-18T10:42:33Z

Problem

PR #15393 introduced unique client IDs to resolve JMX conflicts, but this created too many open Kafka connections/ports for Pinot servers, leading to resource exhaustion on high-traffic clusters.

Solution

Implements comprehensive caching for StreamMetadataProvider instances that:

Reuses connections while maintaining unique client ID benefits
Prevents orphan providers with explicit cleanup on recreation
Ensures thread safety with synchronized operations
Provides graceful shutdown with automatic cleanup hooks

Key Features

🔧 StreamMetadataProviderCacheManager

Singleton cache manager with separate caches for stream and partition providers
Configurable cache expiry (30 minutes) and size limits (1000 entries)
Automatic cleanup using Guava RemovalListeners
Base client ID extraction for proper caching despite unique suffixes

🔄 Enhanced StreamConsumerFactory

createCachedStreamMetadataProvider() - Get cached or create new stream provider
createCachedPartitionMetadataProvider() - Get cached or create new partition provider
recreateCachedStreamMetadataProvider() - Force recreation with explicit cleanup
recreateCachedPartitionMetadataProvider() - Force recreation with explicit cleanup

🛡️ Orphan Prevention

Explicit old provider cleanup before creating new ones during recreation
Synchronized recreation methods to prevent race conditions
Enhanced clearAll() that closes all providers before clearing cache
Shutdown hooks for graceful cleanup on application termination

📊 Updated Client Classes

PartitionGroupMetadataFetcher
PinotLLCRealtimeSegmentManager
RealtimeConsumptionRateManager
StreamMetadataProvider.computePartitionGroupMetadata

Testing

Includes comprehensive unit tests:

Cache reuse verification
Orphan prevention validation
Recreation cleanup testing
Thread safety verification

Benefits

✅ Reduced Port Usage: Reuses existing connections instead of creating new ones
✅ Better Performance: Cached connections avoid establishment overhead
✅ Resource Efficiency: Prevents resource exhaustion on high-traffic clusters
✅ Maintained Reliability: Still recreates connections on failures
✅ Backward Compatibility: Original methods continue to work

Impact

Resolves port exhaustion while maintaining all benefits of unique client IDs from PR #15393. Production-ready with comprehensive error handling and monitoring.

Related: Addresses issues introduced in #15393

Implements comprehensive caching solution for StreamMetadataProvider instances to address port exhaustion issues caused by random client IDs from PR apache#15393. Key improvements: * Add StreamMetadataProviderCacheManager with intelligent caching * Cache providers by table+topic+partition to enable reuse * Add explicit cleanup on recreation to prevent orphan providers * Implement synchronized recreation methods for thread safety * Add shutdown hooks for graceful cleanup on app termination * Update main client classes to use cached providers: - PartitionGroupMetadataFetcher - PinotLLCRealtimeSegmentManager - RealtimeConsumptionRateManager - StreamMetadataProvider.computePartitionGroupMetadata Benefits: * Reduces Kafka connection count and prevents port exhaustion * Maintains unique client ID benefits for failure isolation * Provides automatic recreation on failures for robustness * Ensures proper resource cleanup with no orphaned providers * Backwards compatible with existing code Includes comprehensive unit tests for cache functionality and proper provider lifecycle management.

xiangfu0 closed this Aug 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add StreamMetadataProvider caching to prevent port exhaustion#16623

feat: Add StreamMetadataProvider caching to prevent port exhaustion#16623
xiangfu0 wants to merge 1 commit intoapache:masterfrom
xiangfu0:cache-kafka-client

xiangfu0 commented Aug 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xiangfu0 commented Aug 18, 2025

Problem

Solution

Key Features

🔧 StreamMetadataProviderCacheManager

🔄 Enhanced StreamConsumerFactory

🛡️ Orphan Prevention

📊 Updated Client Classes

Testing

Benefits

Impact

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant