Currently, repeated queries trigger full LLM inference, increasing latency and cost.

Proposal:
- Add a Redis-based response cache
- Serve cached responses for repeated (and, where feasible, similar) queries
- Reduce redundant inference calls

Serving cache hits instead of re-running the model should cut latency and per-query cost for repeated traffic, and improve scalability since hits never reach the inference backend.
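A minimal sketch of the exact-match path, assuming redis-py, a local Redis instance, and a hypothetical synchronous `llm_infer()` helper. Keys are hashes of lightly normalized queries, so this catches repeats but not paraphrases; matching truly similar queries would need an embedding-based nearest-neighbor lookup (e.g. Redis vector search) on top of this:

```python
import hashlib

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 3600  # assumption: one hour of staleness is acceptable


def llm_infer(query: str) -> str:
    """Placeholder for the actual model call (assumed synchronous)."""
    raise NotImplementedError


def cache_key(query: str) -> str:
    # Lowercase and collapse whitespace so trivially different spellings of
    # the same query share a key; this is normalization, not semantic matching.
    normalized = " ".join(query.lower().split())
    return "llm:cache:" + hashlib.sha256(normalized.encode()).hexdigest()


def answer(query: str) -> str:
    key = cache_key(query)
    cached = r.get(key)
    if cached is not None:
        return cached  # cache hit: no inference call
    response = llm_infer(query)
    r.set(key, response, ex=CACHE_TTL_SECONDS)  # expire to bound staleness
    return response
```

The TTL is a tunable trade-off between freshness and hit rate; for non-deterministic generation settings (temperature > 0), caching also changes behavior by pinning one sampled response per key, which may or may not be desirable per endpoint.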