Currently, repeated queries trigger full LLM inference, increasing latency and cost.

Proposal:
- Add a Redis-based response cache
- Serve cached responses for repeated (and, where feasible, similar) queries
- Reduce redundant inference calls

Serving cache hits instead of re-running the model should cut latency and per-query cost for repeated traffic, and improve scalability since hits never reach the inference backend.
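A minimal sketch of the exact-match path, assuming redis-py, a local Redis instance, and a hypothetical synchronous `llm_infer()` helper. Keys are hashes of lightly normalized queries, so this catches repeats but not paraphrases; matching truly similar queries would need an embedding-based nearest-neighbor lookup (e.g. Redis vector search) on top of this:

```python
import hashlib

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 3600  # assumption: one hour of staleness is acceptable


def llm_infer(query: str) -> str:
    """Placeholder for the actual model call (assumed synchronous)."""
    raise NotImplementedError


def cache_key(query: str) -> str:
    # Lowercase and collapse whitespace so trivially different spellings of
    # the same query share a key; this is normalization, not semantic matching.
    normalized = " ".join(query.lower().split())
    return "llm:cache:" + hashlib.sha256(normalized.encode()).hexdigest()


def answer(query: str) -> str:
    key = cache_key(query)
    cached = r.get(key)
    if cached is not None:
        return cached  # cache hit: no inference call
    response = llm_infer(query)
    r.set(key, response, ex=CACHE_TTL_SECONDS)  # expire to bound staleness
    return response
```

The TTL is a tunable trade-off between freshness and hit rate; for non-deterministic generation settings (temperature > 0), caching also changes behavior by pinning one sampled response per key, which may or may not be desirable per endpoint.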