baotonglu

🐈‍⬛ Focusing

Highlights

  • Pro

Organizations

@sfu-dis

Showing results

NVIDIA Inference Xfer Library (NIXL)

C++ 209 20 Updated Mar 26, 2025

BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library

C++ 2,457 276 Updated Dec 20, 2024

High performance C++11 thread pool

C++ 573 128 Updated Oct 4, 2020

FlashInfer: Kernel Library for LLM Serving

Cuda 2,520 261 Updated Mar 29, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,015 74 Updated Jan 31, 2025

Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt? (SIGMOD 2025)

C++ 41 1 Updated Jan 21, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

358 8 Updated Mar 25, 2025

[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLMs' inference, approximate the attention with dynamic sparse calculation, which reduces inference latency by up to 10x for pre-filling on an …

Python 948 47 Updated Mar 28, 2025

[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

Python 4,984 284 Updated Mar 11, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,945 188 Updated Mar 29, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,195 1,182 Updated Mar 21, 2025

Header-only C++/python library for fast approximate nearest neighbors

C++ 4,610 692 Updated Aug 11, 2024

VLDB 2024 paper repo. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

C++ 41 9 Updated Sep 16, 2024

STREAM benchmark

C 375 147 Updated Feb 17, 2025

Transformers for Longer Sequences

Python 600 106 Updated Sep 1, 2022

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 43,040 6,537 Updated Mar 29, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 33,998 3,820 Updated Mar 29, 2025

ai4db and db4ai work

753 89 Updated Dec 26, 2024

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,336 450 Updated Apr 13, 2024

LLM inference in C/C++

C++ 77,343 11,244 Updated Mar 29, 2025

Learning material for CMU10-714: Deep Learning System

Jupyter Notebook 242 38 Updated May 12, 2024

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

HTML 71,626 15,337 Updated Feb 18, 2025

DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (PVLDB 2022, VLDB 2023)

Python 36 4 Updated Apr 21, 2023

The Art of Latency Hiding in Modern Database Engines (VLDB 2024)

C++ 55 5 Updated Oct 2, 2024

CLHT is a very fast and scalable (lock-based and lock-free) concurrent hash table with cache-line sized buckets.

C 158 23 Updated Oct 4, 2021

MICA: A Fast In-memory Key-Value Store (see isca2015 branch for the ISCA2015 version)

C 207 49 Updated Jan 18, 2016

Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3

Rust 5,088 157 Updated Mar 25, 2025
Java 27 7 Updated Jan 17, 2022

C++ 15 8 Updated Jun 11, 2023

Source code for the book Exploring BeagleBone, by Derek Molloy (see www.exploringbeaglebone.com)

C++ 471 444 Updated Sep 12, 2020