Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"

Python 55 3 Updated Mar 24, 2025

Train a 1B LLM on 1T tokens from scratch as an individual

Jupyter Notebook 589 64 Updated Mar 9, 2025

TransMLA: Multi-Head Latent Attention Converter

Python 4 2 Updated Mar 2, 2025

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

Python 148 16 Updated Mar 24, 2025

A library for generative social simulation

Python 822 177 Updated Mar 27, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,944 230 Updated Mar 4, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,381 811 Updated Mar 1, 2025

Fully open reproduction of DeepSeek-R1

Python 23,400 2,127 Updated Mar 27, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

356 8 Updated Mar 25, 2025

Continual Learning of Large Language Models: A Comprehensive Survey

374 17 Updated Mar 3, 2025

PyTorch native quantization and sparsity for training and inference

Python 1,923 234 Updated Mar 27, 2025

PyTorch native post-training library

Python 5,032 564 Updated Mar 28, 2025

MineStudio: A Streamlined Package for Minecraft AI Agent Development

Python 214 10 Updated Mar 23, 2025

Must-read Papers on Knowledge Editing for Large Language Models.

1,042 71 Updated Mar 7, 2025

Black Myth: Wukong monster biographies

671 51 Updated Sep 2, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

661 33 Updated Mar 27, 2025

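A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.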

19,191 1,846 Updated Sep 19, 2024

Awesome LLM compression research papers and tools.

1,444 93 Updated Mar 26, 2025

🧱 Modula software package

Python 186 13 Updated Mar 26, 2025

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Python 749 52 Updated Oct 1, 2024

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Python 423 40 Updated Feb 1, 2024

A framework for the evaluation of autoregressive code generation language models.

Python 914 237 Updated Oct 31, 2024
Jupyter Notebook 185 8 Updated Oct 21, 2024

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 11,879 1,196 Updated Mar 27, 2025

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,346 520 Updated May 3, 2024

Modeling, training, eval, and inference code for OLMo

Python 5,445 583 Updated Mar 26, 2025
Jupyter Notebook 126 16 Updated Mar 4, 2025

Video+code lecture on building nanoGPT from scratch

Python 4,011 599 Updated Aug 13, 2024

Create animations for the optimization trajectory of neural nets

Python 147 24 Updated Jan 30, 2024