@neuralmagic

Neural Magic

Neural Magic (acquired by Red Hat) empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.

Pinned Loading

  1. deepsparse Public archive

    Sparsity-aware deep learning inference runtime for CPUs

    Python · 3.2k stars · 189 forks

Repositories

Showing 10 of 78 repositories
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 13 stars · Apache-2.0 license · 8,899 forks · 0 issues · 8 pull requests · Updated Jul 20, 2025
  • axolotl Public Forked from axolotl-ai-cloud/axolotl

    Go ahead and axolotl questions

    Python · 0 stars · Apache-2.0 license · 1,100 forks · 0 issues · 5 pull requests · Updated Jul 20, 2025
  • research Public

    Repository to enable research flows

    Python · 1 star · 0 forks · 0 issues · 3 pull requests · Updated Jul 18, 2025
  • speculators Public
    Python · 10 stars · Apache-2.0 license · 1 fork · 23 issues (3 need help) · 10 pull requests · Updated Jul 18, 2025
  • flashinfer Public Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Cuda · 0 stars · Apache-2.0 license · 389 forks · 0 issues · 0 pull requests · Updated Jul 18, 2025
  • DeepGEMM Public Forked from deepseek-ai/DeepGEMM

    DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

    Python · 0 stars · MIT license · 652 forks · 0 issues · 0 pull requests · Updated Jul 18, 2025
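    "Fine-grained scaling" in DeepGEMM means each small block along the K dimension gets its own quantization scale, instead of one scale for the whole tensor, which keeps FP8/int8 rounding error bounded by the local maximum rather than a global outlier. A pure-Python toy of that idea (not the actual CUDA kernels; the block size and int8-style rounding here are assumptions for illustration):

    ```python
    # Toy GEMM with fine-grained (per-K-block) scaling. Pure Python,
    # not the real DeepGEMM implementation; BLOCK and the rounding
    # scheme are invented for illustration.

    BLOCK = 2  # scaling granularity along the K dimension (assumption)

    def quantize_block(values, levels=127):
        """Scale a block so its max magnitude maps to `levels` (int8-style)."""
        amax = max(abs(v) for v in values) or 1.0
        scale = amax / levels
        return [round(v / scale) for v in values], scale

    def scaled_gemm(a, b):
        """C = A @ B where each K-block is quantized with its own scale,
        and each integer partial product is rescaled by both scales."""
        m, k, n = len(a), len(a[0]), len(b[0])
        c = [[0.0] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                for k0 in range(0, k, BLOCK):
                    a_blk = a[i][k0:k0 + BLOCK]
                    b_blk = [b[t][j] for t in range(k0, min(k0 + BLOCK, k))]
                    qa, sa = quantize_block(a_blk)
                    qb, sb = quantize_block(b_blk)
                    # integer dot product, rescaled by both block scales
                    c[i][j] += sum(x * y for x, y in zip(qa, qb)) * sa * sb
        return c

    A = [[1.0, 2.0, 3.0, 4.0]]
    B = [[1.0], [0.5], [0.25], [0.125]]
    C = scaled_gemm(A, B)  # close to the exact product, 3.25
    ```

    Shrinking BLOCK tightens the error bound (each scale only has to cover its own block's range) at the cost of storing and applying more scales, which is the trade-off the fused kernels are built to make cheap.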
  • compressed-tensors Public

    A safetensors extension to efficiently store sparse quantized tensors on disk

    Python · 137 stars · Apache-2.0 license · 17 forks · 4 issues · 24 pull requests · Updated Jul 17, 2025
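    The storage idea — persist a sparse, quantized tensor as indices plus int8-style values plus a scale, rather than dense floats — can be sketched in plain Python. This format is invented for illustration; the real compressed-tensors library builds on safetensors with its own schema and API:

    ```python
    # Toy sparse + quantized tensor storage: keep only nonzero entries,
    # quantized to int8-style levels with one shared scale. Invented
    # format, not the compressed-tensors on-disk schema.

    def compress(dense, levels=127):
        """Return a compact record: nonzero indices, quantized values, scale."""
        amax = max((abs(v) for v in dense if v != 0.0), default=1.0)
        scale = amax / levels
        idx, vals = [], []
        for i, v in enumerate(dense):
            if v != 0.0:
                idx.append(i)
                vals.append(round(v / scale))  # int8-style level
        return {"shape": len(dense), "indices": idx, "values": vals, "scale": scale}

    def decompress(blob):
        """Rebuild the dense tensor; zeros are implicit, never stored."""
        out = [0.0] * blob["shape"]
        for i, q in zip(blob["indices"], blob["values"]):
            out[i] = q * blob["scale"]
        return out

    w = [0.0, 0.5, 0.0, -1.0, 0.0, 0.25]
    blob = compress(w)          # stores 3 entries instead of 6 floats
    restored = decompress(blob)  # ≈ w, up to quantization error
    ```

    The space saving compounds: sparsity drops the entry count, and quantization shrinks each stored value from a float to a small integer plus one shared scale.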
  • arena-hard-auto Public Forked from lmarena/arena-hard-auto

    Arena-Hard-Auto: An automatic LLM benchmark.

    Python · 0 stars · Apache-2.0 license · 113 forks · 0 issues · 1 pull request · Updated Jul 16, 2025
  • pplx-kernels Public Forked from ppl-ai/pplx-kernels

    Perplexity GPU Kernels

    C++ · 0 stars · MIT license · 49 forks · 0 issues · 0 pull requests · Updated Jul 15, 2025