Neural Magic
Neural Magic (acquired by Red Hat) empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.
Repositories
- compressed-tensors Public
A safetensors extension to efficiently store sparse quantized tensors on disk
- lmms-eval Public Forked from EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module.
- vllm-flash-attention Public Forked from vllm-project/flash-attention
Fast and memory-efficient exact attention