baotonglu

🐈‍⬛ Focusing

Highlights

  • Pro

Organizations

@sfu-dis

Showing results

NVIDIA Inference Xfer Library (NIXL)

C++ 209 20 Updated Mar 26, 2025

BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library

C++ 2,457 276 Updated Dec 20, 2024

High performance C++11 thread pool

C++ 573 128 Updated Oct 4, 2020

FlashInfer: Kernel Library for LLM Serving

Cuda 2,520 261 Updated Mar 29, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,015 74 Updated Jan 31, 2025

Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt? (SIGMOD 2025)

C++ 41 1 Updated Jan 21, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

358 8 Updated Mar 25, 2025

[NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLMs' inference, approximate the attention with dynamic sparse calculation, which reduces inference latency by up to 10x for pre-filling on an …

Python 948 47 Updated Mar 28, 2025

[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

Python 4,984 284 Updated Mar 11, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,945 188 Updated Mar 29, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,195 1,182 Updated Mar 21, 2025

Header-only C++/python library for fast approximate nearest neighbors

C++ 4,610 692 Updated Aug 11, 2024

VLDB 2024 paper repo. RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

C++ 41 9 Updated Sep 16, 2024

STREAM benchmark

C 375 147 Updated Feb 17, 2025

Transformers for Longer Sequences

Python 600 106 Updated Sep 1, 2022

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 43,040 6,537 Updated Mar 29, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 33,998 3,820 Updated Mar 29, 2025

ai4db and db4ai work

753 89 Updated Dec 26, 2024

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,336 450 Updated Apr 13, 2024

LLM inference in C/C++

C++ 77,343 11,244 Updated Mar 29, 2025

Learning material for CMU10-714: Deep Learning System

Jupyter Notebook 242 38 Updated May 12, 2024

12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

HTML 71,626 15,337 Updated Feb 18, 2025

DINOMO: An Elastic, Scalable, High-Performance Key-Value Store for Disaggregated Persistent Memory (PVLDB 2022, VLDB 2023)

Python 36 4 Updated Apr 21, 2023

The Art of Latency Hiding in Modern Database Engines (VLDB 2024)

C++ 55 5 Updated Oct 2, 2024

CLHT is a very fast and scalable (lock-based and lock-free) concurrent hash table with cache-line sized buckets.

C 158 23 Updated Oct 4, 2021

MICA: A Fast In-memory Key-Value Store (see isca2015 branch for the ISCA2015 version)

C 207 49 Updated Jan 18, 2016

Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3

Rust 5,088 157 Updated Mar 25, 2025
Java 27 7 Updated Jan 17, 2022

C++ 15 8 Updated Jun 11, 2023

Source code for the book Exploring BeagleBone, by Derek Molloy (see www.exploringbeaglebone.com)

C++ 471 444 Updated Sep 12, 2020