Starred repositories
A Datacenter Scale Distributed Inference Serving Framework
kntrl is an eBPF based runtime agent that monitors and prevents anomalous behaviour defined by you on your pipeline. kntrl achieves this by monitoring kernel calls, and denying access as soon as yo…
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
FireFlyer Record file format, writer and reader for DL training samples.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
argusdusty/gofft
Forked from ktye/fft. A better radix-2 fast Fourier transform in Go.
The open-source AIOps and alert management platform
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
MLOps tutorial using Python, Docker and Kubernetes.
cnvrg operator for deploying cnvrg.io K8s native AI/MLOps platform
AirLLM 70B inference with single 4GB GPU
Distributed Triton for Parallel Systems
Curated collection of papers in MoE model inference
collection of benchmarks to measure basic GPU capabilities
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc. 🎉🎉
Awesome LLM compression research papers and tools.
A curated list for Efficient Large Language Models
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Accessible large language models via k-bit quantization for PyTorch.
FlashInfer: Kernel Library for LLM Serving