dvmazur
πŸ‹
πŸ‹

Block or report dvmazur

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Please don't include any personal information such as legal names or email addresses. Maximum 100 characters, markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse

Starred repositories

Collected solutions from codeforces.com.

18 2 Updated May 1, 2022

Fully open reproduction of DeepSeek-R1

Python 23,496 2,141 Updated Mar 30, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,679 283 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,112 536 Updated Mar 28, 2025

Zero Bubble Pipeline Parallelism

Python 376 22 Updated Mar 4, 2025

Official Repo for Open-Reasoner-Zero

Python 1,689 81 Updated Mar 5, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,391 812 Updated Mar 1, 2025

Sky-T1: Train your own O1 preview model within $450

Python 3,168 322 Updated Mar 25, 2025

x86 PC emulator and x86-to-wasm JIT, running in the browser

JavaScript 20,497 1,472 Updated Mar 25, 2025

Jupyter Notebook 12 1 Updated Mar 28, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 12,643 1,391 Updated Mar 30, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 5,902 589 Updated Mar 30, 2025

♞ lichess.org: the forever free, adless and open source chess server ♞

Scala 16,421 2,364 Updated Mar 30, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

358 8 Updated Mar 25, 2025

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 235 8 Updated Feb 23, 2025

Inspirational Mapping

Vue 2,445 61 Updated Sep 25, 2024

Fast OS-level support for GPU checkpoint and restore

C++ 170 15 Updated Mar 4, 2025

C++ 848 122 Updated Sep 10, 2023

Triton-based implementation of Sparse Mixture of Experts.

Python 209 17 Updated Nov 28, 2024

Trio – a friendly Python library for async concurrency and I/O

Python 6,411 352 Updated Mar 28, 2025

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python 940 54 Updated Dec 6, 2024

prime is a framework for efficient, globally distributed training of AI models over the internet.

Python 689 67 Updated Mar 28, 2025

nsync is a C library that exports various synchronization primitives, such as mutexes

C 1,147 86 Updated Jul 23, 2024

🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Python 34,640 3,031 Updated Mar 28, 2025

Lightning fast C++/CUDA neural network framework

C++ 3,945 486 Updated Jan 27, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,948 652 Updated Mar 28, 2025

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python 8,594 1,436 Updated Mar 26, 2025

Efficient Triton Kernels for LLM Training

Python 4,752 289 Updated Mar 28, 2025

Go 7 Updated Feb 10, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 815 53 Updated Mar 19, 2025