ModelTC

Model Infra

Pinned

  1. MQBench Public

    Model Quantization Benchmark

    Shell 712 135

  2. United-Perception Public

    United Perception

    Python 421 65

  3. NNLQP Public

    Python 32 3

  4. Dipoorlet Public

    Offline quantization tools for deployment.

    Python 97 12

  5. lightllm Public

    LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

    Python 1.7k 157

Repositories

Showing 10 of 33 repositories
  • lightllm Public

    LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

    Python 1,747 Apache-2.0 157 47 5 Updated Apr 12, 2024
  • TFMQ-DM Public

    [CVPR 2024 Highlight] TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

    Jupyter Notebook 10 Apache-2.0 2 0 0 Updated Apr 11, 2024
  • statecs Public
    Rust 1 Apache-2.0 1 0 0 Updated Apr 9, 2024
  • llmc Public

    llmc is an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.

    Python 32 Apache-2.0 2 0 0 Updated Apr 3, 2024
  • general-sam-py Public

    Python bindings for general-sam and some utilities

    Python 1 Apache-2.0 0 0 1 Updated Apr 1, 2024
  • Python 9 Apache-2.0 0 1 0 Updated Mar 31, 2024
  • DeepSpeed Public Forked from microsoft/DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

    Python 0 Apache-2.0 3,939 0 0 Updated Mar 28, 2024
  • general-sam Public

    A general suffix automaton implementation in Rust with Python bindings

    Rust 2 Apache-2.0 0 0 1 Updated Mar 28, 2024
  • greedy-tokenizer Public

    Greedily tokenize strings with the longest tokens iteratively.

    Python 0 Apache-2.0 0 0 0 Updated Mar 27, 2024
  • QLLM Public

    [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

    Python 20 Apache-2.0 0 0 0 Updated Mar 11, 2024
