
Starred repositories
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
Applied AI experiments and examples for PyTorch
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Get your documents ready for gen AI
An easy-to-understand TensorOp Matmul Tutorial
Ongoing research training transformer models at scale
Fully open reproduction of DeepSeek-R1
Code release for the book "Efficient Training in PyTorch"
A project for training a large language model from scratch, covering pretraining, fine-tuning, and direct preference optimization; the model has 1B parameters and supports Chinese and English.
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't Know'"
Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
Stretching GPU performance for GEMMs and tensor contractions.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
This sample shows how to deploy the glm-edge model series using OpenVINO
Implemented purely in torch: "bi-mamba2" and "vision-mamba2-torch"; supports 1D/2D/3D/nD and export via jit.script/ONNX
Run Generative AI models with simple C++/Python API and using OpenVINO Runtime
A Python package that extends official PyTorch to easily obtain performance gains on Intel platforms
LLM implementation one matrix multiplication at a time
Library for modelling performance costs of different Neural Network workloads on NPU devices