
Starred repositories
Fully open reproduction of DeepSeek-R1
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Zero Bubble Pipeline Parallelism
Official Repo for Open-Reasoner-Zero
Sky-T1: Train your own O1 preview model within $450
x86 PC emulator and x86-to-wasm JIT, running in the browser
SGLang is a fast serving framework for large language models and vision language models.
verl: Volcano Engine Reinforcement Learning for LLMs
♞ lichess.org: the forever free, adless and open source chess server ♞
Must-read papers on KV Cache Compression (constantly updating).
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Fast OS-level support for GPU checkpoint and restore
Triton-based implementation of Sparse Mixture of Experts.
Trio β a friendly Python library for async concurrency and I/O
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
prime is a framework for efficient, globally distributed training of AI models over the internet.
nsync is a C library that exports various synchronization primitives, such as mutexes
Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Lightning fast C++/CUDA neural network framework
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
Efficient Triton Kernels for LLM Training
A fast communication-overlapping library for tensor/expert parallelism on GPUs.