- Peking University
- https://fxmeng.github.io
Stars
Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"
Train a 1B LLM with 1T tokens from scratch as an individual
TransMLA: Multi-Head Latent Attention Converter
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
A library for generative social simulation
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Fully open reproduction of DeepSeek-R1
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Continual Learning of Large Language Models: A Comprehensive Survey
PyTorch native quantization and sparsity for training and inference
MineStudio: A Streamlined Package for Minecraft AI Agent Development
Must-read Papers on Knowledge Editing for Large Language Models.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, covering base models, vertical-domain fine-tunes and applications, datasets, tutorials, and more.
Awesome LLM compression research papers and tools.
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
A framework for the evaluation of autoregressive code generation language models.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Modeling, training, eval, and inference code for OLMo
Video+code lecture on building nanoGPT from scratch
Create animations for the optimization trajectory of neural nets