withlin (GuangZhou, China)

Starred repositories

A Datacenter Scale Distributed Inference Serving Framework

Rust 3,614 275 Updated Apr 10, 2025

kntrl is an eBPF based runtime agent that monitors and prevents anomalous behaviour defined by you on your pipeline. kntrl achieves this by monitoring kernel calls, and denying access as soon as yo…

C 91 5 Updated Mar 21, 2025

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,849 381 Updated Jul 11, 2024

FireFlyer Record file format, writer and reader for DL training samples.

Python 210 19 Updated Dec 1, 2022

Automated Machine Learning on Kubernetes

Python 1,564 468 Updated Apr 9, 2025

The Open Source Feature Store for AI/ML

Python 5,927 1,068 Updated Apr 10, 2025

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Go 2,896 1,404 Updated Apr 10, 2025

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala 2,173 936 Updated Apr 10, 2025

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Java 944 387 Updated Apr 10, 2025

A better radix-2 fast Fourier transform in Go.

Go 16 2 Updated Jun 20, 2023

The open-source AIOps and alert management platform

Python 9,965 926 Updated Apr 10, 2025

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"

HTML 9,374 1,458 Updated Apr 15, 2023

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution

Python 5,923 672 Updated Apr 9, 2025

open-source MLOps platform

Shell 399 40 Updated Apr 10, 2025

MLOps tutorial using Python, Docker and Kubernetes.

Python 388 108 Updated Oct 18, 2024

A curated list of references for MLOps

13,027 1,934 Updated Nov 21, 2024

cnvrg operator for deploying cnvrg.io K8s native AI/MLOps platform

Go 18 6 Updated Mar 26, 2025

AirLLM 70B inference with single 4GB GPU

Jupyter Notebook 5,757 456 Updated Nov 24, 2024

Distributed RL System for LLM Reasoning

Python 1,045 46 Updated Apr 7, 2025

Serving multiple LoRA finetuned LLM as one

Python 1,050 49 Updated May 8, 2024

Distributed Triton for Parallel Systems

MLIR 362 18 Updated Apr 8, 2025

Curated collection of papers in MoE model inference

134 6 Updated Feb 19, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 352 46 Updated Feb 11, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. 🎉🎉

Python 3,809 268 Updated Apr 6, 2025

Awesome LLM compression research papers and tools.

1,460 93 Updated Apr 10, 2025

A curated list for Efficient Large Language Models

Python 1,597 127 Updated Apr 6, 2025

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda 3,361 358 Updated Apr 10, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 6,905 684 Updated Apr 10, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,612 276 Updated Apr 10, 2025

A collection of MCP servers.

35,279 2,473 Updated Apr 10, 2025