On-the-fly activation compression for LLMs that reduces memory usage
recursive-neural-networks llm-training llm-inference llm-framework llm-performance resource-efficient-ai inference-memory-optimization kongformer
Updated Mar 16, 2025 - Python