@SqueezeBits

SqueezeBits Inc.

We are squeezing bits.

Popular repositories

  1. QUICK Public

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Python 111 5

  2. owlite Public

    OwLite is a low-code model compression toolkit for AI models.

    Python 37 3

  3. owlite-examples Public

    The OwLite Examples repository offers example code that helps users compress PyTorch deep learning models and convert them into TensorRT engines.

    Python 8

  4. vllm-quick Public

    Python 1

  5. .github Public

  6. mlperf_inference_results_v4.0 Public

    C++ 1

Repositories

Showing 8 of 8 repositories
  • vllm-fork Public Forked from HabanaAI/vllm-fork

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 4,482 0 0 Updated Oct 30, 2024
  • vllm-hpu-extension
    Python 0 Apache-2.0 5 0 0 Updated Oct 16, 2024
  • owlite-examples Public

    The OwLite Examples repository offers example code that helps users compress PyTorch deep learning models and convert them into TensorRT engines.

    Python 8 0 0 0 Updated Sep 27, 2024
  • owlite Public

    OwLite is a low-code model compression toolkit for AI models.

    Python 37 AGPL-3.0 3 0 0 Updated Sep 27, 2024
  • mlperf_inference_results_v4.0 Public
    C++ 0 Apache-2.0 1 0 1 Updated Jul 23, 2024
  • .github Public
    0 0 0 0 Updated Jul 22, 2024
  • vllm-quick Public
    Python 1 Apache-2.0 0 0 0 Updated Mar 13, 2024
  • QUICK Public

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Python 111 MIT 5 5 0 Updated Mar 6, 2024
