AIGW is an intelligent inference scheduler for large-scale inference services. It provides intelligent routing, overload protection, and multi-tenant QoS through a global routing solution that is aware of load, KV cache, and LoRA adapters. This helps achieve higher throughput, lower latency, and more efficient use of resources.
AIGW is at an early stage and under rapid development.
- A flexible, powerful, and easy-to-maintain Envoy Golang extension
- Near real-time load metric collection
- A balanced multi-factor composite decision-making algorithm
- A highly available architecture that supports horizontal scaling
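
As a rough illustration of what a multi-factor composite decision could look like, the sketch below scores each backend by combining load, KV cache prefix hit rate, and LoRA adapter residency, then picks the highest score. This is not AIGW's actual algorithm: the `Endpoint` and `Weights` types, the `Score` and `Pick` functions, the field names, and the weight values are all hypothetical and chosen only for the example.

```go
package main

import "fmt"

// Endpoint is a hypothetical snapshot of one inference backend.
// None of these fields are taken from AIGW itself.
type Endpoint struct {
	Name       string
	QueueDepth int     // requests currently queued or running
	MaxQueue   int     // depth at which the backend counts as fully loaded
	PrefixHit  float64 // estimated share of the prompt already in KV cache, 0..1
	LoRALoaded bool    // whether the requested adapter is already resident
}

// Weights are illustrative tuning knobs, assumed to sum to 1.
type Weights struct {
	Load, Cache, LoRA float64
}

// Score returns a value where higher means a better routing target.
func Score(e Endpoint, w Weights) float64 {
	load := 1.0 // treat an unknown capacity as fully loaded
	if e.MaxQueue > 0 {
		load = float64(e.QueueDepth) / float64(e.MaxQueue)
		if load > 1 {
			load = 1
		}
		if load < 0 {
			load = 0
		}
	}
	lora := 0.0
	if e.LoRALoaded {
		lora = 1
	}
	return w.Load*(1-load) + w.Cache*e.PrefixHit + w.LoRA*lora
}

// Pick returns the endpoint with the highest score.
// The boolean is false when no endpoints are available.
func Pick(eps []Endpoint, w Weights) (Endpoint, bool) {
	if len(eps) == 0 {
		return Endpoint{}, false
	}
	best, bestScore := eps[0], Score(eps[0], w)
	for _, e := range eps[1:] {
		if s := Score(e, w); s > bestScore {
			best, bestScore = e, s
		}
	}
	return best, true
}

func main() {
	eps := []Endpoint{
		{Name: "a", QueueDepth: 2, MaxQueue: 10, PrefixHit: 0.0, LoRALoaded: false},
		{Name: "b", QueueDepth: 6, MaxQueue: 10, PrefixHit: 0.9, LoRALoaded: true},
	}
	w := Weights{Load: 0.5, Cache: 0.3, LoRA: 0.2}
	if best, ok := Pick(eps, w); ok {
		fmt.Println("route to:", best.Name)
	}
}
```

With these example weights, a busier backend can still win when it already holds most of the prompt's KV cache and the requested adapter, which is the trade-off a composite score is meant to capture.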
AIGW is built on Envoy and Istio. We sincerely thank both projects.
- Precise cache-awareness
- SLO-aware scheduling algorithm based on latency prediction
- Prefill/decode (PD) disaggregation scheduling
- Data-parallel (DP) level scheduling
This project is licensed under the Apache 2.0 License.

